Apache Solr and Apache Tika: slow document indexing


I have 4 GB of RAM and am running Solr with 3 GB of memory.

I am extracting text and metadata using the Apache Tika server (tika-server.jar).

Files are taking longer than usual: a 20 MB file takes 2-3 minutes.

My server is hosted on the Amazon cloud and runs Ubuntu 14.04.

I tested on my local machine, which extracts the data from the same file in 1-2 seconds.

Is any special configuration needed for an Amazon cloud instance? My local machine has 4 GB of RAM and runs Mac OS.

I am using tika-python to index the documents.

I have around 1 million documents in different file formats (PDF, HTML, DOC, PPT, XML, TXT).

Please suggest a remedy or an alternative to Apache Tika.

Thanks.


My system runs Ubuntu without LibreOffice installed.

The slow indexing happens with .doc files only.
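
To confirm that only .doc files are slow, and to compare the cloud instance with the local machine on the same files, the extraction time can be measured per file extension. This is only a minimal sketch: it assumes tika-server is listening on its default address `http://localhost:9998`, sends each file to the `/tika` endpoint with a plain HTTP PUT instead of going through tika-python, and the names `extract_text` and `time_by_extension` are hypothetical helpers made up for this illustration.

```python
import time
import urllib.request
from collections import defaultdict
from pathlib import Path

# Assumption: tika-server runs locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def extract_text(path, url=TIKA_URL, timeout=300):
    """PUT the raw file to tika-server and return the extracted plain text."""
    data = Path(path).read_bytes()
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def time_by_extension(paths, extractor=extract_text):
    """Run the extractor on each path and return {extension: [seconds, ...]}."""
    timings = defaultdict(list)
    for path in paths:
        start = time.perf_counter()
        extractor(path)
        elapsed = time.perf_counter() - start
        # Group by lower-cased suffix so ".DOC" and ".doc" count together.
        timings[Path(path).suffix.lower()].append(elapsed)
    return dict(timings)


if __name__ == "__main__":
    import sys

    # Usage: python time_tika.py file1.doc file2.pdf ...
    for ext, seconds in sorted(time_by_extension(sys.argv[1:]).items()):
        average = sum(seconds) / len(seconds)
        print(f"{ext or '(none)'}: {len(seconds)} file(s), avg {average:.2f}s")
```

Running the same script against the same sample files on both machines would show whether the gap is specific to .doc files or affects every format.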

Is there an alternative way to parse the full text and metadata of MS Office files (doc, docx, etc.) that gives better speed?

