elasticsearch


Logstash import leads to different docs count


I'm importing data from MySQL into Elasticsearch using Logstash, but after every Logstash run I get a different document count in Elasticsearch.
Import 1: 1,877.865 docs
Import 2: 1,877.891 docs
Import 3: 1,879.259 docs
Import 4: 1,876.162 docs
This happens in my dev environment, where the data in MySQL is static and never changes. I think the problem is related to a nested field that results from a one-to-many import from MySQL using this filter:
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['title'] = event.get('title')
      map['id'] = event.get('id')
      map['locations'] ||= []
      map['locations'] << {
        'location_name' => event.get('location_name'),
        'location' => {
          'lat' => event.get('lat'),
          'lon' => event.get('lon')
        }
      }
    "
    push_previous_map_as_event => true
  }
}
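To make the intent of the filter explicit, here is a minimal Python sketch of the grouping I expect: rows arrive sorted by id, and each run of rows with the same id becomes one document with a nested locations list. This only illustrates the expected result and is not how Logstash implements it; the function name `aggregate_rows`, the ids, and the titles are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_rows(rows):
    """Collapse rows (already sorted by id) into one document per id."""
    docs = []
    for row_id, group in groupby(rows, key=itemgetter("id")):
        group = list(group)
        docs.append({
            "id": row_id,
            # Like the filter code, the title of the last row for this id wins.
            "title": group[-1]["title"],
            "locations": [
                {
                    "location_name": row["location_name"],
                    "location": {"lat": row["lat"], "lon": row["lon"]},
                }
                for row in group
            ],
        })
    return docs


# Hypothetical input: ids and titles are invented for illustration.
rows = [
    {"id": 1, "title": "Example job", "location_name": "Barcelona",
     "lat": 41.385064, "lon": 2.173404},
    {"id": 1, "title": "Example job", "location_name": "Salzbergen",
     "lat": 52.321644, "lon": 7.343625},
    {"id": 2, "title": "Other job", "location_name": "Kassel",
     "lat": 51.312711, "lon": 9.479746},
]

docs = aggregate_rows(rows)
print(len(docs), [len(d["locations"]) for d in docs])
```

With static input like this, the number of resulting documents and the number of locations per document should be identical on every run, which is what I expect from the import as well.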
In my tests I saw a record with a nested locations field that should look like this:
"locations": [
  {
    "location_name": "Barcelona",
    "location": {
      "lon": 2.173404,
      "lat": 41.385064
    }
  },
  {
    "location_name": "Salzbergen",
    "location": {
      "lon": 7.343625,
      "lat": 52.321644
    }
  }
]
But after test 2 or 3 the first location was suddenly missing entirely. After running the Logstash import once more, the record was fine and had both locations again. So I guess there is some small issue that randomly skips some of the nested fields, but I have no idea what the cause could be.
Edit: The problem occurs even if I reduce the SQL query to a much smaller test set (LIMIT 3):
My query:
SELECT sa_data.id AS id, jobtitle AS title, sa_locations.location AS location_name, IFNULL(geo_coordinates.lat, 0) AS lat, IFNULL(geo_coordinates.lon, 0) AS lon
FROM sa_data
LEFT JOIN sa_locations ON sa_data.id = sa_locations.id
LEFT JOIN geo_coordinates ON sa_locations.location = geo_coordinates.location_name
ORDER BY id
LIMIT 3
The query always leads to these static results:
id | title | location_name | lat | lon
1332 | Service Manager w/m - Public | Frankfurt am Main | 50.110922 | 8.682127
1333 | Service Manager w/m - Public | Ratingen | 51.296415 | 6.840184
1334 | Qualitäts- und Lean-Manager (Dingelstädt oder Kass... | Kassel | 51.312711 | 9.479746
However, the result of my Logstash import is sometimes 5 docs and sometimes 6 docs. In the 5-doc version, the incomplete record looks like this:
{
  "_index": "jk_1494862027",
  "_type": "jobposting",
  "_id": "1334",
  "_score": 1,
  "_source": {
    "location_name": "Kassel",
    "@timestamp": "2017-05-15T15:27:14.047Z",
    "@version": "1",
    "lon": 9.479746,
    "id": 1334,
    "title": "Qualitäts- und Lean-Manager (Dingelstädt oder Kassel/Lohfelden)",
    "lat": 51.312711
  }
}
In the 6-doc version, the record looks like this:
{
  "_index": "jk_1494862063",
  "_type": "jobposting",
  "_id": "1334",
  "_score": 1,
  "_source": {
    "title": "Qualitäts- und Lean-Manager (Dingelstädt oder Kassel/Lohfelden)",
    "tags": [
      "_aggregatefinalflush"
    ],
    "@timestamp": "2017-05-15T15:27:49.169Z",
    "@version": "1",
    "locations": [
      {
        "location_name": "Kassel",
        "location": {
          "lon": 9.479746,
          "lat": 51.312711
        }
      }
    ],
    "id": 1334
  }
}
It seems completely random to me whether an import results in 5 or 6 docs. What is going wrong here?
