python - Print Wikipedia Article Title from Gensim WikiCorpus


I believe this question is easy, but I'm new to Python, and I think that is blinding me a bit.

I've downloaded the Wikipedia dump as explained under "Preparing the corpus" here: https://radimrehurek.com/gensim/wiki.html. Then I ran the following lines of code:

import gensim

# These next 2 lines take around 16 hours
wikidocs = gensim.corpora.wikicorpus.WikiCorpus('enwiki-latest-pages-articles.xml.bz2')
gensim.corpora.MmCorpus.serialize('wiki_en_vocab200k', wikidocs)

These lines of code are taken from the link above. Now, in a separate script, I've done some text analysis. The result of that analysis is a number representing the index of a particular article in the wikidocs corpus. The problem is that I don't know how to print out the text of that article. The obvious thing to try is:

wikidocs[index_of_article] 

but this returns the error

TypeError: 'WikiCorpus' object does not support indexing

I've tried a few other things, but I'm stuck. Thanks for the help.

It's not such an easy question. The reason that didn't work is that WikiCorpus isn't a list or other indexable sequence; it's a class with a few methods, some of them for saving and loading. You can see those methods by typing WikiCorpus. and pressing Tab in IPython (this shows the options for tab completion):

In [8]: wikidocs = gensim.corpora.wikicorpus.WikiCorpus.
gensim.corpora.wikicorpus.WikiCorpus.get_texts    gensim.corpora.wikicorpus.WikiCorpus.load    gensim.corpora.wikicorpus.WikiCorpus.save_corpus
gensim.corpora.wikicorpus.WikiCorpus.getstream    gensim.corpora.wikicorpus.WikiCorpus.save

It looks like you want get_texts. It returns an iterator rather than a list, though (and iterators don't directly support indexing either), so you'll have to use

list(wikidocs.get_texts())[i] 

or

from itertools import islice
next(islice(wikidocs.get_texts(), i, i + 1))
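To try the islice approach without waiting on the full dump, here is a minimal sketch. FakeCorpus and nth_text are hypothetical names made up for illustration, not part of gensim; FakeCorpus only imitates the one property that matters here, a get_texts() method that yields documents one at a time with no indexing support.

```python
from itertools import islice


class FakeCorpus:
    """Hypothetical stand-in for WikiCorpus: streams documents, has no __getitem__."""

    def __init__(self, docs):
        self._docs = docs

    def get_texts(self):
        # Yield one tokenized document at a time, as a streaming corpus would.
        for doc in self._docs:
            yield doc


def nth_text(corpus, n):
    """Return the n-th document (0-based) from corpus.get_texts(), or None if out of range."""
    # islice skips the first n items without building a list of the whole corpus.
    return next(islice(corpus.get_texts(), n, n + 1), None)


corpus = FakeCorpus([["anarchism", "is"], ["autism", "is"], ["albedo", "is"]])

try:
    corpus[1]  # fails for the same reason as wikidocs[index_of_article]
    indexing_failed = False
except TypeError:
    indexing_failed = True

second = nth_text(corpus, 1)
print(indexing_failed, second)
```

With the real corpus the call would look like nth_text(wikidocs, index_of_article). Both variants still have to stream through the dump up to the requested article, so neither is fast; the difference is that list(...) also keeps every article in memory, while islice holds only one at a time.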
