

NLP - Text tagging tools


I have a set of keywords/phrases I would like to find in a user's input, e.g. cities. My set contains about 100,000 items.
The most important requirement is that I want to recognize them even if the user makes a typo. For example:
The best city i've ever visited is Barceloma, however New Yuork is also pretty cool, because of amazing views.
The expected result is Barcelona and New York.
I have a basic algorithm, but unfortunately it takes a very long time. It's not possible to parallelize it by splitting the text on whitespace, because I would like to recognize multi-word names, e.g. New York, San Francisco.
Is there any tool that could help me? I thought NLTK might be one such library, but I'm not able to use it correctly for this. Is it possible to use Elasticsearch or a neural network to achieve it?
Thanks for the help.
I think using neural networks might not be very effective unless you have access to a really fast computer. It would be hard to train a single neural network on 100,000 words. So here's how I would do it:
For every word in your text:
    For every search_word in your keywords:
        var diff = calculateDifference(word, search_word);
        if(diff < threshold): log('Found a word!')
So basically, instead of checking whether a word exactly matches a word you're searching for (==), you calculate the difference between the word and the word you're searching for. An example of this difference might be:
word = Text
search_word = Test
// Every letter that is different adds 1 to the difference, which is then divided by the length of the longer word
difference = 1 / 4 = 0.25

word = Key
search_word = Bees
// K vs B and y vs e differ, and Key has no fourth letter to match s (the second letter, e, matches)
difference = (1 + 1 + 1) / 4 = 0.75
However, doing it this way might cause unwanted matches: maybe the user did not mistype Test at all and really meant to type Text. If you want matching to depend on context, you would need to switch to neural networks (LSTMs), but that is pretty advanced.
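The position-by-position difference described above can be sketched in Python roughly as follows. This is only an illustration of the idea: the names `calculate_difference` and `find_keywords`, the lowercasing, and the default threshold of 0.3 are assumptions, not something given in the answer.

```python
from itertools import zip_longest


def calculate_difference(word: str, search_word: str) -> float:
    """Fraction of letter positions at which the two words differ.

    A missing letter (one word is shorter) counts as a difference, and the
    total is divided by the length of the longer word.
    """
    longest = max(len(word), len(search_word))
    if longest == 0:
        return 0.0
    # zip_longest pads the shorter word with None, which never equals a letter.
    mismatches = sum(1 for a, b in zip_longest(word, search_word) if a != b)
    return mismatches / longest


def find_keywords(text: str, keywords: list, threshold: float = 0.3) -> list:
    """Return (word_in_text, matched_keyword) pairs below the threshold."""
    matches = []
    for word in text.split():
        for search_word in keywords:
            diff = calculate_difference(word.lower(), search_word.lower())
            if diff < threshold:
                matches.append((word, search_word))
    return matches


print(calculate_difference("Text", "Test"))  # 0.25
print(calculate_difference("Key", "Bees"))   # 0.75
print(find_keywords("I visited Barceloma", ["Barcelona", "Madrid"]))
```

One thing to watch for with this measure: an inserted or dropped letter shifts every following position, so `calculate_difference("Yuork", "York")` comes out at 0.8 even though only one letter was added. An edit distance such as Levenshtein is probably a better fit for that kind of typo.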
As you brought up multi-word searches, here is how you could do it:
For every word in your text:
    For every search_word in your keywords:
        var diff = calculateDifference(word, search_word);
        if(diff < threshold): log('Found a word!')
    For every search_phrase in your keyphrases:
        // level = position (counting from 1) of the phrase word we are waiting for
        var level = search_phrase.level;
        var diff = calculateDifference(word, search_phrase[level]);
        if(diff < threshold):
            search_phrase.level++;
            if(level == search_phrase.wordcount):
                log('Found a phrase!')
                search_phrase.level = 1;
        else:
            search_phrase.level = 1;
So basically, you keep an array of phrases as well (one-word phrases are also possible, so you can keep everything in one array). Each phrase has a word count (e.g. for New York, wordcount = 2) and a level that selects the word to look for next:
New York, level = 1 > New
New York, level = 2 > York
As long as you haven't found the first word of a phrase, that phrase's level is 1, so you keep looking for the level = 1 word, i.e. the first word. When you find it, you increase the level by 1 and start looking for the word at level = 2. If the next word in the text is not that word, you reset the level to 1. If it is that word, you increase the level again; however, if the level you just matched equals wordcount, you have found the whole phrase, so you also reset the level to 1.
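Here is a rough Python sketch of the level-based phrase matching described above. Two things in it are my own assumptions rather than part of the answer: the difference is a normalized Levenshtein edit distance (swapped in for the position-by-position measure so that an extra letter, as in "Yuork", counts as a single error), and levels are counted from 0 instead of 1. The names `edit_distance`, `calculate_difference` and `find_phrases`, the punctuation stripping, and the 0.3 threshold are likewise only illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def calculate_difference(word: str, search_word: str) -> float:
    """Edit distance divided by the length of the longer word."""
    longest = max(len(word), len(search_word))
    if longest == 0:
        return 0.0
    return edit_distance(word.lower(), search_word.lower()) / longest


def find_phrases(text: str, phrases: list, threshold: float = 0.3) -> list:
    """Scan the text once, tracking for each phrase which word comes next."""
    split_phrases = [phrase.split() for phrase in phrases]
    levels = [0] * len(split_phrases)  # 0-based; the text above counts from 1
    found = []
    for raw_word in text.split():
        word = raw_word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if not word:
            continue
        for index, parts in enumerate(split_phrases):
            if calculate_difference(word, parts[levels[index]]) < threshold:
                levels[index] += 1
                if levels[index] == len(parts):  # last word matched
                    found.append(phrases[index])
                    levels[index] = 0
            else:
                levels[index] = 0  # sequence broken, start over
    return found


text = ("The best city i've ever visited is Barceloma, however New Yuork "
        "is also pretty cool, because of amazing views.")
print(find_phrases(text, ["Barcelona", "New York", "San Francisco"]))
# ['Barcelona', 'New York']
```

Like the pseudocode, this resets a phrase without re-checking the current word against the phrase's first word, so an input such as "New New York" would be missed. It also still compares every word against every phrase, so it does not by itself solve the speed problem for 100,000 items.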
I hope you understand...
