Elasticsearch multilingual field
I have read through a few articles and pieces of advice, but unfortunately I haven't found a solution that works for me.
The problem: I have a field in the index that can contain content in any possible language, and I don't know which language it is. I need to search and sort on it. This is not localisation; the values themselves are in different languages.
The first language (apart from a few European ones) that I have tried is Japanese. To begin with, I set the field to a single analyzer and tried searching for Japanese words/phrases. I took the example from here, and this is what I used:
'analysis': {
    "filter": {
        ...
        "ja_pos_filter": {
            "type": "kuromoji_part_of_speech",
            "stoptags": [
                "\\u52a9\\u8a5e-\\u683c\\u52a9\\u8a5e-\\u4e00\\u822c",
                "\\u52a9\\u8a5e-\\u7d42\\u52a9\\u8a5e"
            ]
        },
        ...
    },
    "analyzer": {
        ...
        "ja_analyzer": {
            "type": "custom",
            "filter": ["kuromoji_baseform", "ja_pos_filter", "icu_normalizer", "icu_folding", "cjk_width"],
            "tokenizer": "kuromoji_tokenizer"
        },
        ...
    },
    "tokenizer": {
        "kuromoji": {
            "type": "kuromoji_tokenizer",
            "mode": "search"
        }
    }
}

The mapping:
'name': {
    'type': 'string',
    'index': 'analyzed',
    'analyzer': 'ja_analyzer',
}

And here are a few of the queries I tried with it:
{
    'filter': {
        'query': {
            'bool': {
                'must': [
                    {
                        # 'wildcard': {'name': u'*ネバーランド福島*'}
                        # 'match': {'name': u'ネバーランド福島'}
                        "query_string": {
                            "fields": ['name'],
                            "query": u'ネバーランド福島',
                            "default_operator": 'and'
                        }
                    },
                ],
                'boost': 1.0
            }
        }
    }
}

None of them works.
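For reference, the analysis settings and mapping above can be assembled into one create-index body from Python. This is a minimal sketch, assuming Elasticsearch 1.x/2.x style mappings (which need a document type; `item` here is a hypothetical name) and leaving out the filters and analyzers elided as `...`. Two things differ from the settings as posted. First, the stoptags are decoded to real characters: in a Python string, `"\\u52a9"` is a literal backslash followed by `u52a9`, and `json.dumps` escapes that backslash, so Elasticsearch probably receives the escape text rather than 助. Second, the analyzer names the custom `kuromoji` tokenizer that the settings define but never reference; since `search` is the default mode of the built-in `kuromoji_tokenizer`, this should not change tokenization. Neither point is likely to explain the empty results on its own.

```python
import json


def decode_escapes(text):
    """Turn literal \\uXXXX sequences (as written in the stoptags) into characters."""
    return text.encode("ascii").decode("unicode_escape")


# The two stoptags exactly as they appear in the question (literal backslashes).
ESCAPED_STOPTAGS = [
    "\\u52a9\\u8a5e-\\u683c\\u52a9\\u8a5e-\\u4e00\\u822c",
    "\\u52a9\\u8a5e-\\u7d42\\u52a9\\u8a5e",
]


def build_index_body(doc_type="item"):  # "item" is a hypothetical type name
    """Assemble settings + mapping (ES 1.x/2.x style) into one create-index body."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "ja_pos_filter": {
                        "type": "kuromoji_part_of_speech",
                        # Real characters, so the JSON sent to Elasticsearch contains them.
                        "stoptags": [decode_escapes(t) for t in ESCAPED_STOPTAGS],
                    }
                },
                "tokenizer": {
                    "kuromoji": {"type": "kuromoji_tokenizer", "mode": "search"}
                },
                "analyzer": {
                    "ja_analyzer": {
                        "type": "custom",
                        "tokenizer": "kuromoji",  # the custom tokenizer defined above
                        "filter": ["kuromoji_baseform", "ja_pos_filter",
                                   "icu_normalizer", "icu_folding", "cjk_width"],
                    }
                },
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "name": {"type": "string", "index": "analyzed",
                             "analyzer": "ja_analyzer"}
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_index_body()
    print(body["settings"]["analysis"]["filter"]["ja_pos_filter"]["stoptags"])
    # ['助詞-格助詞-一般', '助詞-終助詞']
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The decoded stoptags read 助詞-格助詞-一般 and 助詞-終助詞, i.e. the particle categories the filter is meant to drop.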
If I use the standard analyzer and put the query into query_string, or break the phrase up myself (breaking on whitespace, which I don't have here) and wrap each part in a wildcard (*part*), it again finds nothing. The analyzer says that ネバーランド and 福島 are separate words/parts:
curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ja_analyzer&pretty' -d 'ネバーランド福島'
{
  "tokens" : [ {
    "token" : "ネハラント",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "福島",
    "start_offset" : 6,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}

With the standard analyzer, I get the result I want if I search for ネバーランド alone. If I use the customised analyzer and try the same, or even a single character, I still get nothing.
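As far as I understand, a wildcard query is not analyzed: the pattern is compared as-is against each term stored in the index, one term at a time. The `_analyze` output above shows that ネバーランド福島 is stored as the two terms ネハラント and 福島 (the folding filters appear to have stripped the voiced marks and the long-vowel mark). The sketch below imitates this with Python's `fnmatch.fnmatchcase`; it is only an analogy for Lucene's wildcard matching, with the term list copied from the output above and `wildcard_hits` a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Terms reported by the _analyze call above for "ネバーランド福島" with ja_analyzer.
INDEXED_TERMS = ["ネハラント", "福島"]


def wildcard_hits(pattern, terms=INDEXED_TERMS):
    """Return the indexed terms that a non-analyzed wildcard pattern would match."""
    return [t for t in terms if fnmatchcase(t, pattern)]


for pattern in ["*ネバーランド福島*", "*ネバーランド*", "*福島*", "*ネハラント*"]:
    print(pattern, wildcard_hits(pattern))
# *ネバーランド福島* []
# *ネバーランド* []
# *福島* ['福島']
# *ネハラント* ['ネハラント']
```

Neither the whole phrase nor the unfolded katakana appears as a single indexed term, so under this reading a wildcard on them cannot match.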
The behaviour I'm looking for is: break the query string into words/parts, and every one of those words/parts must be present in the name field of each result.
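This behaviour maps fairly directly onto a `match` query with `operator` set to `and`: the query text is run through the field's analyzer (here `ja_analyzer`, since the mapping sets no separate search analyzer), and every resulting term must be present in the document. A minimal sketch of the request body as a Python dict; `all_parts_query` is a hypothetical helper name, and actually sending the body (for example with curl or elasticsearch-py) is left out.

```python
import json


def all_parts_query(field, text):
    """Build a match query: the text is analyzed with the field's analyzer,
    and operator "and" requires every resulting term to be present."""
    return {
        "query": {
            "match": {
                field: {"query": text, "operator": "and"}
            }
        }
    }


body = all_parts_query("name", u"ネバーランド福島")
print(json.dumps(body, ensure_ascii=False))
# {"query": {"match": {"name": {"query": "ネバーランド福島", "operator": "and"}}}}
```

Because the query goes through the same analyzer as the indexed value, ネバーランド福島 should be reduced to the same terms (ネハラント, 福島) on both sides, which is the point of preferring it over a wildcard here.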
Thanks in advance.