elasticsearch - Elastic Search multilingual field


I have read through a few articles and pieces of advice, but unfortunately haven't found a solution that works for me.

The problem is that I have a field in the index that can hold content in any possible language, and I don't know which language it is in. I need to search and sort on it. This is not localisation, just values in different languages.

The first language (excluding a few European ones) I have tried it on is Japanese. To begin with, I set a single analyzer on this field and tried to search for Japanese words/phrases. I took the example from here. Here is what I used:

'analysis': {
    "filter": {
        ...
        "ja_pos_filter": {
            "type": "kuromoji_part_of_speech",
            "stoptags": [
                "\\u52a9\\u8a5e-\\u683c\\u52a9\\u8a5e-\\u4e00\\u822c",
                "\\u52a9\\u8a5e-\\u7d42\\u52a9\\u8a5e"]
        },
        ...
    },
    "analyzer": {
        ...
        "ja_analyzer": {
            "type": "custom",
            "filter": ["kuromoji_baseform", "ja_pos_filter", "icu_normalizer", "icu_folding", "cjk_width"],
            "tokenizer": "kuromoji_tokenizer"
        },
        ...
    },
    "tokenizer": {
        "kuromoji": {
            "type": "kuromoji_tokenizer",
            "mode": "search"
        }
    }
}

The mapping:

'name': {
    'type': 'string',
    'index': 'analyzed',
    'analyzer': 'ja_analyzer',
}
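
To double-check how the settings and the mapping fit together, here is a minimal sketch in plain Python 3 (no Elasticsearch client involved). It copies only the analyzer and tokenizer sections of the settings plus the mapping, and checks two things: that the analyzer named in the mapping is defined, and whether any custom tokenizer is never referenced by an analyzer. The helper names `unknown_analyzers` and `unreferenced_tokenizers` are hypothetical, made up for this illustration.

```python
# Sketch only: plain dictionaries, nothing is sent to Elasticsearch.
# Only the "analyzer" and "tokenizer" sections of the settings are copied;
# the elided "..." entries and the filter definitions are left out.
analysis = {
    "analyzer": {
        "ja_analyzer": {
            "type": "custom",
            "filter": ["kuromoji_baseform", "ja_pos_filter", "icu_normalizer",
                       "icu_folding", "cjk_width"],
            "tokenizer": "kuromoji_tokenizer",
        },
    },
    "tokenizer": {
        "kuromoji": {"type": "kuromoji_tokenizer", "mode": "search"},
    },
}

mapping = {
    "name": {"type": "string", "index": "analyzed", "analyzer": "ja_analyzer"},
}


def unknown_analyzers(mapping, analysis):
    """Hypothetical helper: mapped fields whose analyzer is not defined in `analysis`.

    Built-in analyzers (e.g. "standard") are not known to this check,
    so a field using one would be reported here as well.
    """
    defined = set(analysis.get("analyzer", {}))
    return sorted(
        field for field, spec in mapping.items()
        if "analyzer" in spec and spec["analyzer"] not in defined
    )


def unreferenced_tokenizers(analysis):
    """Hypothetical helper: custom tokenizers that no custom analyzer refers to by name."""
    used = {a.get("tokenizer") for a in analysis.get("analyzer", {}).values()}
    return sorted(set(analysis.get("tokenizer", {})) - used)


print(unknown_analyzers(mapping, analysis))   # fields with an undefined analyzer
print(unreferenced_tokenizers(analysis))      # custom tokenizers never referenced
```

With the fragments as posted, the first list is empty and the second contains `kuromoji`: `ja_analyzer` refers to `kuromoji_tokenizer` by name rather than to the custom `kuromoji` tokenizer, so the `"mode": "search"` setting is defined but not picked up through that name. Whether this changes the tokens produced is something I have not verified.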

Here are a few of the queries I tried against this field:

{
    'filter': {
        'query': {
            'bool': {
                'must': [
                    {
                        # 'wildcard': {'name': u'*ネバーランド福島*'}
                        # 'match': {'name': u'ネバーランド福島'
                        # },
                        "query_string": {
                            "fields": ['name'],
                            "query": u'ネバーランド福島',
                            "default_operator": 'and'
                        }
                    },
                ],
                'boost': 1.0
            }
        }
    }
}

None of them work.

If I take the standard analyser and query with query_string, or break the phrase myself (breaking on whitespace, which I don't have here) and use a wildcard *<>*, it finds nothing again. The analyser says that ネバーランド and 福島 are separate words/parts:

curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ja_analyzer&pretty' -d 'ネバーランド福島'
{
  "tokens" : [ {
    "token" : "ネハラント",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "福島",
    "start_offset" : 6,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}

In the case of the standard analyser I do get a result if I search for ネバーランド, which is what I want. If I use the customised analyser and try the same, or even a single symbol, I still get nothing.
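
To see exactly what the custom analyzer stores, here is a minimal sketch, assuming Python 3, that reads the `_analyze` response shown above and pairs each emitted token with the slice of the original text it was produced from. The response is pasted in as a string, so nothing is sent to Elasticsearch; `token_spans` is a hypothetical helper name.

```python
import json

# The _analyze response shown above, pasted as a JSON string.
response_text = """
{
  "tokens": [
    {"token": "ネハラント", "start_offset": 0, "end_offset": 6, "type": "word", "position": 1},
    {"token": "福島", "start_offset": 6, "end_offset": 8, "type": "word", "position": 2}
  ]
}
"""


def token_spans(response_text, original):
    """Hypothetical helper: pair each emitted token with the source text it covers."""
    tokens = json.loads(response_text)["tokens"]
    return [(t["token"], original[t["start_offset"]:t["end_offset"]]) for t in tokens]


for token, source in token_spans(response_text, "ネバーランド福島"):
    print(token, "<-", source)
```

The first token (`ネハラント`) is not the same string as the text it came from (`ネバーランド`), so any query text that is not run through the same analyzer would be compared against a different string than the one typed.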

The behaviour I'm looking for is: the query string is broken into words/parts, and all of those words/parts should be present in the resulting name field.
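
To make the intended behaviour concrete, here is a minimal sketch, assuming Python 3 and no Elasticsearch connection. `matches_all_parts` is a hypothetical helper that states the rule on already-analyzed tokens, and `all_parts_query` builds a `match` query body with `"operator": "and"`, which is my assumption about the closest standard query for this rule, not something confirmed to work against the index above. The token `東京` is an extra example value that does not come from my data.

```python
def matches_all_parts(query_tokens, field_tokens):
    """Hypothetical helper: True when every query token occurs among the field's tokens."""
    return set(query_tokens) <= set(field_tokens)


def all_parts_query(field, text):
    """Build a match query body in which every analyzed term of `text` is required."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


# Tokens as reported by _analyze for the indexed value.
indexed = ["ネハラント", "福島"]

print(matches_all_parts(["福島"], indexed))               # one part, present
print(matches_all_parts(["福島", "ネハラント"], indexed))  # both parts, any order
print(matches_all_parts(["福島", "東京"], indexed))        # one part missing
print(all_parts_query("name", "ネバーランド福島"))
```

The first two checks describe documents I would expect back, the third one a document I would not.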

Thank you in advance.

