Apache Solr and Apache Tika: indexing documents is slow
I have 4 GB of RAM and am running Solr with 3 GB of memory.
I am extracting text and metadata using the Apache Tika server (tika-server.jar).
Files are taking longer than usual: a 20 MB file takes 2-3 minutes.
My server is hosted on Amazon cloud and runs Ubuntu 14.04.
I have tested on my local machine, which extracts the data from the same file in 1-2 seconds.
Is there any special configuration needed for an Amazon cloud instance? The local machine has 4 GB of RAM and runs Mac OS.
I am using tika-python to index the documents.
I have around 1 million documents in different file formats (pdf, html, doc, ppt, xml, txt).
Please suggest a remedy or an alternative to Apache Tika.
Thanks
My system is Ubuntu without LibreOffice installed.
The slow indexing happens with .doc files only.
Is there an alternative way to parse the full text and metadata of MS Office files (doc, docx, etc.) that gives better speed?
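
In case it helps to reproduce the observation that only .doc files are slow, here is a minimal sketch of how the extraction time per file type could be measured. The helper name time_by_extension and its extract callback are hypothetical, not part of Tika or tika-python; with tika-python the callback could be parser.from_file, assuming the Tika server is already running.

```python
import os
import time
from collections import defaultdict


def time_by_extension(paths, extract):
    """Return {extension: average seconds per file} for the given files.

    `extract` is a hypothetical callback that takes a file path and runs
    the extraction, e.g. tika-python's parser.from_file.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for path in paths:
        # Group by lower-cased extension so ".DOC" and ".doc" count together.
        ext = os.path.splitext(path)[1].lower()
        start = time.perf_counter()
        extract(path)
        totals[ext] += time.perf_counter() - start
        counts[ext] += 1
    return {ext: totals[ext] / counts[ext] for ext in totals}
```

Running this once on the cloud instance and once on the local machine with the same sample files should show whether the gap is specific to .doc or affects every format.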