elasticsearch


Logstash import leads to a different document count on every run


I'm importing data from MySQL into Elasticsearch using Logstash, but after every Logstash run I get a different document count in Elasticsearch.
Import 1: 1,877.865 docs
Import 2: 1,877.891 docs
Import 3: 1,879.259 docs
Import 4: 1,876.162 docs
This takes place in my dev environment, where the data in MySQL is static and never changes. I think the problem is related to a nested field that results from a one-to-many import from MySQL using this filter:
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['title'] = event.get('title')
      map['id'] = event.get('id')
      map['locations'] ||= []
      map['locations'] << {
        'location_name' => event.get('location_name'),
        'location' => {
          'lat' => event.get('lat'),
          'lon' => event.get('lon')
        }
      }
    "
    push_previous_map_as_event => true
  }
}
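To make the intent of the filter explicit, here is a minimal Python sketch of the grouping I expect: consecutive rows with the same id are folded into one document with a nested locations list. It only mimics the behaviour as I understand it and is not how Logstash is implemented. The function name aggregate, the id 1 and the title "Example job" are hypothetical; the other values are taken from the data shown in this question.

```python
from itertools import groupby
from operator import itemgetter

# Rows as the SQL query would return them, sorted by id (ORDER BY id).
# The id 1 and the title "Example job" are hypothetical placeholders.
rows = [
    {"id": 1, "title": "Example job", "location_name": "Barcelona",
     "lat": 41.385064, "lon": 2.173404},
    {"id": 1, "title": "Example job", "location_name": "Salzbergen",
     "lat": 52.321644, "lon": 7.343625},
    {"id": 1334, "title": "Qualitäts- und Lean-Manager (Dingelstädt oder Kassel/Lohfelden)",
     "location_name": "Kassel", "lat": 51.312711, "lon": 9.479746},
]


def aggregate(rows):
    """Fold consecutive rows that share an id into one document each."""
    docs = []
    # groupby only merges *adjacent* rows with equal keys, so, like the
    # filter, this sketch depends on the rows of one id arriving together.
    for doc_id, group in groupby(rows, key=itemgetter("id")):
        group = list(group)
        docs.append({
            "id": doc_id,
            "title": group[-1]["title"],  # the last row wins, as in the filter code
            "locations": [
                {
                    "location_name": row["location_name"],
                    "location": {"lat": row["lat"], "lon": row["lon"]},
                }
                for row in group
            ],
        })
    return docs


docs = aggregate(rows)
```

With these three rows the sketch yields two documents, the first one carrying both Barcelona and Salzbergen in its locations list.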
In my tests I saw a record with a nested locations field that should look like this:
"locations": [
  {
    "location_name": "Barcelona",
    "location": {
      "lon": 2.173404,
      "lat": 41.385064
    }
  },
  {
    "location_name": "Salzbergen",
    "location": {
      "lon": 7.343625,
      "lat": 52.321644
    }
  }
]
But after test 2 or 3 the first location was suddenly missing completely. After running the Logstash import once more, the record was fine and had both locations again. So I guess there is some small issue which randomly skips some of the nested fields, but I have no idea what the reason could be.
Edit: The problem even occurs if I reduce the SQL query to a much smaller test set (LIMIT 3):
My query:
SELECT sa_data.id AS id,
       jobtitle AS title,
       sa_locations.location AS location_name,
       IFNULL(geo_coordinates.lat, 0) AS lat,
       IFNULL(geo_coordinates.lon, 0) AS lon
FROM sa_data
LEFT JOIN sa_locations ON sa_data.id = sa_locations.id
LEFT JOIN geo_coordinates ON sa_locations.location = geo_coordinates.location_name
ORDER BY id
LIMIT 3
The query always leads to these static results:
id    title                                                   location_name      lat        lon
1332  Service Manager w/m - Public                            Frankfurt am Main  50.110922  8.682127
1333  Service Manager w/m - Public                            Ratingen           51.296415  6.840184
1334  Qualitäts- und Lean-Manager (Dingelstädt oder Kass...  Kassel             51.312711  9.479746
However, the result of my Logstash import is sometimes 5 docs and sometimes 6 docs. In the 5 docs version, the incomplete record looks like this:
{
  "_index": "jk_1494862027",
  "_type": "jobposting",
  "_id": "1334",
  "_score": 1,
  "_source": {
    "location_name": "Kassel",
    "@timestamp": "2017-05-15T15:27:14.047Z",
    "@version": "1",
    "lon": 9.479746,
    "id": 1334,
    "title": "Qualitäts- und Lean-Manager (Dingelstädt oder Kassel/Lohfelden)",
    "lat": 51.312711
  }
}
If it's the 6 docs version the record looks like this:
{
  "_index": "jk_1494862063",
  "_type": "jobposting",
  "_id": "1334",
  "_score": 1,
  "_source": {
    "title": "Qualitäts- und Lean-Manager (Dingelstädt oder Kassel/Lohfelden)",
    "tags": [
      "_aggregatefinalflush"
    ],
    "@timestamp": "2017-05-15T15:27:49.169Z",
    "@version": "1",
    "locations": [
      {
        "location_name": "Kassel",
        "location": {
          "lon": 9.479746,
          "lat": 51.312711
        }
      }
    ],
    "id": 1334
  }
}
It seems completely random to me whether an import results in 5 or 6 docs. What is going wrong here?
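To tell the two outcomes apart after each run, a small check like the following could classify the indexed documents by shape. It is only a sketch under assumptions: the helper names is_aggregated and split_by_shape are hypothetical, and the input is assumed to be a list of _source objects already fetched from the index (trimmed here to the fields that matter).

```python
def is_aggregated(source):
    """True if the document has the nested shape produced by the aggregate map."""
    return isinstance(source.get("locations"), list) and "location_name" not in source


def split_by_shape(sources):
    """Return (ids of aggregated docs, ids of flat, un-aggregated docs)."""
    aggregated = [s["id"] for s in sources if is_aggregated(s)]
    flat = [s["id"] for s in sources if not is_aggregated(s)]
    return aggregated, flat


# The two shapes of record 1334 shown above, reduced to the relevant fields.
flat_doc = {"id": 1334, "location_name": "Kassel", "lat": 51.312711, "lon": 9.479746}
nested_doc = {
    "id": 1334,
    "tags": ["_aggregatefinalflush"],
    "locations": [
        {"location_name": "Kassel", "location": {"lon": 9.479746, "lat": 51.312711}}
    ],
}

aggregated_ids, flat_ids = split_by_shape([flat_doc, nested_doc])
```

Running this over the documents of a 5 docs import and a 6 docs import would show which ids ended up without the nested locations field in each case.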
