elasticsearch


How is the memory allocation done for ElasticSearch types in an index?


I was reading the Elasticsearch documentation and came across an interesting line. In "Index vs. Type", under the heading "What is type", the second point says:
Fields that exist in one type will also consume resources for documents of types where this field does not exist.
I am not able to understand what it actually means. Does it mean that if I create two types:
Type 1: [a: string, b: text, c: keyword]
Type 2: [c: keyword, d: string]
then, even if I store a document of type 2, Elasticsearch will take up space for all four distinct fields (a, b, c and d)? I don't think that should be the case, but the way the documentation is written suggests it is.
Elasticsearch is built on top of Lucene, which does not have the concept of a "type". With Lucene, you just have an index and you fill it with documents. A type is an abstraction that only exists at the Elasticsearch layer.
So when you write a document for type 1, it is like writing this document to Lucene:
{
"a": "Foo",
"b": "Bar",
"c": "Foobar",
"d": null
}
or, when writing to type 2:
{
"a": null,
"b": null,
"c": "Foobar",
"d": "Foobazz"
}
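The two documents above can be reproduced with a small Python sketch. This is only a conceptual model of the "one shared field set per index" idea, not how Elasticsearch or Lucene is implemented; the names TYPE_1, TYPE_2, index_fields and as_lucene_view are hypothetical and chosen for illustration.

```python
# Hypothetical mappings mirroring the question; names are illustrative only.
TYPE_1 = {"a": "string", "b": "text", "c": "keyword"}
TYPE_2 = {"c": "keyword", "d": "string"}


def index_fields(*type_mappings):
    """Return the sorted union of field names across all types in one index."""
    return sorted(set().union(*type_mappings))


def as_lucene_view(doc, fields):
    """Pad a document with None for every index field it does not set."""
    return {field: doc.get(field) for field in fields}


fields = index_fields(TYPE_1, TYPE_2)
print(fields)
print(as_lucene_view({"a": "Foo", "b": "Bar", "c": "Foobar"}, fields))
print(as_lucene_view({"c": "Foobar", "d": "Foobazz"}, fields))
```

Note that the shared field "c" appears only once in the union, so the index ends up with four fields rather than five.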
Even though a document written for one type leaves the other type's fields blank, those empty fields can still consume resources in Lucene. For example, both norms and doc_values are still computed on empty fields (assuming they are enabled; the defaults depend on the field type).
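To get a feel for how sparse an index with several types becomes, here is a rough Python sketch that counts the (document, field) pairs left empty. It is a toy model under the assumption that every document is viewed against the full field set of the index; it says nothing about Lucene's actual on-disk format or byte sizes, and the helper name empty_slots is made up for this example.

```python
def empty_slots(docs):
    """Count (document, field) pairs where the document has no value."""
    fields = set()
    for doc in docs:
        fields.update(doc)  # collect every field name seen in the index
    return sum(1 for doc in docs for field in fields if doc.get(field) is None)


docs = [
    {"a": "Foo", "b": "Bar", "c": "Foobar"},  # a type 1 document
    {"c": "Foobar", "d": "Foobazz"},          # a type 2 document
]
print(empty_slots(docs))  # 3: "d" is empty in the first doc, "a" and "b" in the second
```

The more types with disjoint fields share one index, the larger this count grows relative to the number of values actually stored.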
Also worth reading: https://www.elastic.co/blog/great-mapping-refactoring
