import pandas as pd
from scipy.sparse import csr_matrix
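# In the CSV, each row is a word and each column is an article title.
df = pd.read_csv('wikipedia-vectors.csv', index_col=0)
# Transpose so that each row is an article, then store the result as a sparse matrix.
articles = csr_matrix(df.transpose())
# The article titles are the column labels of the original DataFrame.
titles = list(df.columns)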
To preprocess wikipedia-vectors.csv into the format in which you used it in the exercises, you have to take its transpose, as the code above does.
The file stores the data transposed because otherwise it would have 13,000 columns (one for each of the 13,000 words in the file), which is a lot of columns for a CSV to have.
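As a rough illustration of this layout, the sketch below builds a tiny in-memory stand-in for the file and applies the same transpose. The words, article titles, and values are invented for the example and are not taken from wikipedia-vectors.csv; the `toy_` names are likewise hypothetical.

```python
import io

import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical miniature of the CSV layout: one row per word, one column per article.
toy_csv = io.StringIO(
    ",Article A,Article B\n"
    "apple,0.0,0.5\n"
    "banana,0.2,0.0\n"
    "cherry,0.0,0.1\n"
)

toy_df = pd.read_csv(toy_csv, index_col=0)

# After the transpose, each row is an article and each column is a word.
toy_articles = csr_matrix(toy_df.transpose())
toy_titles = list(toy_df.columns)

print(toy_df.shape)        # (words, articles)
print(toy_articles.shape)  # (articles, words)
print(toy_titles)
```

With the real file the same steps apply, except that the word dimension has about 13,000 entries, which is why a sparse matrix is a sensible container for the transposed data.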