elasticsearch


Fields not getting sorted in alphabetical order in elasticsearch


I have a few documents with the a name field in it. I am using analyzed version of the name field for search and not_analyzed for sorting purposes. The sorting happens in one level, that is the names are sorted alphabetically at first. But within the list of an alphabet, the names are getting sorted lexicographically rather than alphabetically.
Here is the mapping I have used:
{
"mappings": {
"seing": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Can anyone provide a solution for the same?
Digging down into Elasticsearch documents, I stumbled upon this:
Sorting and Collations
Case-Insensitive Sorting
Imagine that we have three user documents whose name fields contain
Boffey, BROWN, and bailey, respectively. First we will apply the
technique described in String Sorting and Multifields of using a
not_analyzed field for sorting:
PUT /my_index
{
"mappings": {
"user": {
"properties": {
"name": { //1
"type": "string",
"fields": {
"raw": { //2
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
The analyzed name field is used for search.
The not_analyzed name.raw field is used for sorting.
The preceding search request would return the documents in this order:
BROWN, Boffey, bailey. This is known as lexicographical order as
opposed to alphabetical order. Essentially, the bytes used to
represent capital letters have a lower value than the bytes used to
represent lowercase letters, and so the names are sorted with the
lowest bytes first.
That may make sense to a computer, but doesn’t make much sense to
human beings who would reasonably expect these names to be sorted
alphabetically, regardless of case. To achieve this, we need to index
each name in a way that the byte ordering corresponds to the sort
order that we want.
In other words, we need an analyzer that will emit a single lowercase
token:
Following this logic, instead of storing raw document, you need to lowercase it using custom keyword analyzer:
PUT /my_index
{
"settings" : {
"analysis" : {
"analyzer" : {
"case_insensitive_sort" : {
"tokenizer" : "keyword",
"filter" : ["lowercase"]
}
}
}
},
"mappings" : {
"seing" : {
"properties" : {
"name" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"analyzer" : "case_insensitive_sort"
}
}
}
}
}
}
}
Now ordering by name.raw should sort in alphabetical order, rather than lexicographical.
Quick test done on my local machine using Marvel:
Index structure:
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_sort": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"user": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"keyword": {
"type": "string",
"analyzer": "case_insensitive_sort"
}
}
}
}
}
}
}
Test data:
PUT /my_index/user/1
{
"name": "Tim"
}
PUT /my_index/user/2
{
"name": "TOM"
}
Query using raw field:
POST /my_index/user/_search
{
"sort": "name.raw"
}
Result:
{
"_index" : "my_index",
"_type" : "user",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "TOM"
},
"sort" : [
"TOM"
]
},
{
"_index" : "my_index",
"_type" : "user",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "Tim"
},
"sort" : [
"Tim"
]
}
Query using lowercased string:
POST /my_index/user/_search
{
"sort": "name.keyword"
}
Result:
{
"_index" : "my_index",
"_type" : "user",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "Tim"
},
"sort" : [
"tim"
]
},
{
"_index" : "my_index",
"_type" : "user",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "TOM"
},
"sort" : [
"tom"
]
}
I'm suspecting that second result is correct in your case.

Related Links

Double wildcard in query causes weird highlighting for plain/fast vectors elasticsearch highlighters
Get Elasticsearch response in DataTable in C#.Net
Elasticsearch NoNodeAvailableException issue
Why Elasticsearch snapshot stats number_of_files differs from actual indices count?
Specifying data type and analyzer while creating index
elasticsearch regroups aggs by regex
get unique values from elasticsearch
Elasticsearch dies on primary node at random times. What should I look for in troubleshooting?
elasticsearch problems with queries
Normalize elasticsearch query/results
ElasticSearch max shard size
Elasticsearch: Group results of different types in response structure
Storing data in Elasticsearch - OLTP
Elasticsearch + Shield, security_exception happens when Document Level Security is enabled
ElasticSearch Parent/Child Restriction
How to configure kibana for multiple nodes in the same cluster?

Categories

HOME
magento2
whmcs
magento-2.0
consul
nsis
bittorrent
ui-automation
cdi
kurento
migrate
fireloop
getopenfilename
chromecast
rpmbuild
office-ui-fabric
string-formatting
salt-cloud
raml
bsd
pdf.js
react-dnd
beego
uri
history.js
math.js
amazon-cloudtrail
sparkle
jsonresult
mapnik
advanced-filter
cloveretl
restful-architecture
db2-luw
keyboard-layout
subclassing
django-smart-selects
aurelia-http-client
stat
hdiv
opal-framework
coverage.py
contactless-smartcard
type-inference
reverse-dns
ipp-protocol
nothing
photoswipe
hibernate-ogm
sigsegv
subforms
android-webservice
patternlab.io
glimpse
dblink
ltrace
front-camera
cognos-tm1
gawk
controllers
jsf-2.3
nvda
dojo-build
powerpoint-2013
codecharge
voronoi
calcite
netdatacontractserializer
blockquote
supercomputers
mobilefirst-server
wicked-gem
innerhtml
pass-by-value
fasterxml
dc
image-rotation
adwords-apiv201402
jini
meteor-collections
oxygene
django-sites
illuminate-container
django-facebook
google-code-prettify
node.js-stream
handshake
workflow-manager-1.x
popup-blocker
mkannotation
mediacontroller
jmenu
mongrel2
principles
raw-data
.net-services
mysqli-multi-query
index.dat
comment-conventions

Resources

Encrypt Message



code
soft
python
ios
c
html
jquery
cloud
mobile