word2vec


Why is the cosine_similarity of a pretrained fastText model high between two sentences that are not related at all?


I am wondering why the pre-trained fastText model trained on Korean Wikipedia doesn't seem to work well! :(
model = fasttext.load_model("./fasttext/wiki.ko.bin")
model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로")
(in English)
model.cosine_similarity("test test this is a test sentence", "now a completely unrelated text really really")
0.99....??
Those sentences are not related in meaning at all, so I would expect the cosine similarity to be low. However, it was 0.997383...
Is it impossible to compare long sentences with fastText?
Is using doc2vec the only alternative?
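One likely explanation is that the wrapper is averaging the word vectors of each sentence; averaged vectors tend to collapse toward a shared dominant direction, so even unrelated sentences score near 1.0. A minimal sketch with hypothetical toy 3-dimensional "word vectors" (not real fastText output) illustrates the effect:

```python
import math

def cosine_similarity(a, b):
    # standard cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def sentence_vector(word_vectors):
    # naive averaging of word vectors, as many wrappers do
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]

# two "sentences" built from different hypothetical word vectors
sent1 = sentence_vector([[1.0, 0.2, 0.1], [0.9, 0.3, 0.0]])
sent2 = sentence_vector([[1.1, 0.1, 0.2], [0.8, 0.4, 0.1]])

print(round(cosine_similarity(sent1, sent2), 3))  # → 0.995
```

Because every word vector here shares a large first component (as real embeddings often share frequency-related directions), the averaged sentence vectors come out nearly parallel, giving a cosine similarity of about 0.995 despite the "sentences" sharing no words.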
Which 'fasttext' code package are you using?
Are you sure its cosine_similarity() is designed to take such raw strings and automatically tokenize/combine the words of each example to give sentence-level similarities? (Is that capability implied by its documentation or illustrative examples? Or does it perhaps expect pre-tokenized lists of words?)

