python 3.x - Python34 word2vec.Word2Vec OverflowError
I'm studying word2vec. When I use word2vec to train on text data, an OverflowError occurs in NumPy.
The message is:
    Warning (from warnings module):
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 636
        warnings.warn("C extension not loaded for Word2Vec, training will be slow. "
    UserWarning: C extension not loaded for Word2Vec, training will be slow. Install a C compiler and reinstall gensim for fast training.
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "c:\python34\lib\threading.py", line 920, in _bootstrap_inner
        self.run()
      File "c:\python34\lib\threading.py", line 868, in run
        self._target(*self._args, **self._kwargs)
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 675, in worker_loop
        if not worker_one_job(job, init):
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 666, in worker_one_job
        job_words = self._do_train_job(items, alpha, inits)
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 623, in _do_train_job
        tally += train_sentence_sg(self, sentence, alpha, work)
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 112, in train_sentence_sg
        word_vocabs = [model.vocab[w] for w in sentence if w in model.vocab and
      File "c:\python34\lib\site-packages\gensim\models\word2vec.py", line 113, in <listcomp>
        model.vocab[w].sample_int > model.random.randint(2**32)]
      File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:9520)
    OverflowError: Python int too large to convert to C long

Can anyone tell me what causes this?
My machine is x64 and the OS is Windows 7, but Python 3.4 is the 32-bit build. NumPy and SciPy are 32-bit as well.
I ran into this as well. It looks like gensim has a potential workaround in its dev branch:
https://github.com/piskvorky/gensim/commit/726102df66000f2afcea82d95634b055e6521dc8
This doesn't solve the core issue of navigating between different hardware and installed int sizes, but I think it should alleviate the problem on this particular line.
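To see which int size a given install is working with, something like the following can help. It is only a diagnostic sketch, not part of gensim: the helper names `c_long_bits` and `fits_in_c_long` are made up for illustration. It relies on the traceback above, which shows `randint` trying to convert its argument to a C long; on Windows builds a C long is 32 bits wide, so `2**32` cannot fit.

```python
import ctypes
import struct


def c_long_bits():
    """Width in bits of the platform's C long (hypothetical helper)."""
    return ctypes.sizeof(ctypes.c_long) * 8


def fits_in_c_long(value):
    """True if value fits in a signed C long on this platform (hypothetical helper)."""
    bits = c_long_bits()
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


if __name__ == "__main__":
    # Pointer size tells you whether the interpreter itself is a 32- or 64-bit build.
    print("Python build:", struct.calcsize("P") * 8, "bit")
    print("C long size:", c_long_bits(), "bit")
    print("2**32 fits in a C long:", fits_in_c_long(2 ** 32))
```

If the last line prints `False`, any call that passes `2**32` through a C long conversion can be expected to raise the same OverflowError.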
The necessary change involves switching out
model.vocab[w].sample_int > model.random.randint(2**32)
for
model.vocab[w].sample_int > model.random.rand() * 2**32
This avoids the 64-bit / 32-bit int issue created in randint.
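Here is a small standalone sketch of why the replacement works, assuming `model.random` is a NumPy `RandomState` (as the `mtrand` frame in the traceback suggests). The function name `keep_word` and the sample values are hypothetical and only illustrate the comparison; this is not gensim's actual code.

```python
import numpy as np


def keep_word(sample_int, rng):
    """Downsampling test in the style of the patched line (hypothetical helper).

    rng.rand() returns a float in [0, 1), so the threshold is a float in
    [0, 2**32). No Python int is ever passed to NumPy, so nothing needs to
    be converted to a C long.
    """
    return bool(sample_int > rng.rand() * 2**32)


rng = np.random.RandomState(1)

# Hypothetical sample_int values: never kept, sometimes kept, always kept.
for sample_int in (0, 2**31, 2**32):
    print(sample_int, keep_word(sample_int, rng))
```

The threshold is now a float rather than an integer, which makes no practical difference for a greater-than comparison against `sample_int`.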
Update: I manually incorporated the change into my gensim install, and it prevents the error.