word2vec


Why is the cosine_similarity of a pretrained fastText model high between two sentences that are not related at all?


I'd like to know why the pre-trained fastText model trained on Korean Wikipedia doesn't seem to work well. :(
model = fasttext.load_model("./fasttext/wiki.ko.bin")
model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로")
(In English:)
model.cosine_similarity("test test this is a test sentence", "now an unrelated piece of text really really")
Result: 0.99...??
These sentences are not related in meaning at all, so I expected the cosine similarity to be much lower. However, it was 0.997383...
Is it impossible to compare long sentences with fastText?
Is the only option to use doc2vec?
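For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1, and a value like 0.997 means the two vectors point in almost exactly the same direction. The plain-Python sketch below shows the formula; the function name is hypothetical and it is not the fasttext package's own method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: zero vectors have no direction
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because only direction matters, scaling a vector does not change the score: [1, 2, 3] and [2, 4, 6] have similarity 1.0.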
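One possible explanation, assuming the wrapper hands the whole raw string to fastText as if it were a single "word": fastText builds a vector for such a token from its character n-grams. It wraps the token in "<" and ">" boundary markers, collects every substring of length minn..maxn (3 to 6 by default), and averages the n-gram embeddings. The sketch below only shows the n-gram extraction step (the function name is hypothetical; the real implementation also hashes n-grams into buckets, which is omitted here).

```python
def char_ngrams(token, minn=3, maxn=6):
    """Character n-grams in the style of fastText's subword model.

    The token is wrapped in '<' and '>' boundary markers, then every
    substring whose length is between minn and maxn is collected.
    """
    wrapped = "<" + token + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(len(char_ngrams("test")))                             # 10
print(len(char_ngrams("test test this is test sentence")))  # 118
```

A short word yields a handful of n-grams, while a whole sentence treated as one token yields over a hundred, many of them spanning spaces. Averaging that many vectors tends to smooth out differences, which could push unrelated long strings toward similar directions.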
Which 'fasttext' code package are you using?
Are you sure its cosine_similarity() is designed to take such raw strings, and automatically tokenize/combine the words of each example to give sentence-level similarities? (Is that capability documented or shown in its examples? Or does it perhaps expect pre-tokenized lists of words?)
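If the package only compares individual words, one common baseline for sentence-level similarity is to tokenize each sentence, average the word vectors, and take the cosine of the two averages. The sketch below assumes simple whitespace tokenization and uses a hypothetical toy vector table in place of a real model; with the official fasttext Python bindings, `model.get_word_vector(word)` could serve as the lookup instead. This is an illustration of the averaging technique, not what the questioner's package does internally.

```python
import math

# Hypothetical toy word vectors standing in for a real model's word lookup.
TOY_VECTORS = {
    "test": [0.9, 0.1, 0.0],
    "sentence": [0.7, 0.3, 0.1],
    "this": [0.1, 0.2, 0.9],
    "is": [0.1, 0.1, 0.8],
    "now": [0.0, 0.9, 0.2],
    "really": [0.1, 0.8, 0.3],
}


def sentence_vector(sentence, lookup):
    """Whitespace-tokenize, then average the vectors of words found in lookup."""
    vectors = [lookup[w] for w in sentence.lower().split() if w in lookup]
    if not vectors:
        return None  # no known words, so no sentence vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """dot(u, v) / (|u| * |v|), returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def sentence_similarity(s1, s2, lookup):
    """Cosine similarity of the averaged word vectors of two sentences."""
    v1, v2 = sentence_vector(s1, lookup), sentence_vector(s2, lookup)
    if v1 is None or v2 is None:
        return None
    return cosine(v1, v2)


print(sentence_similarity("test sentence", "test", TOY_VECTORS))
print(sentence_similarity("test sentence", "now really", TOY_VECTORS))
```

With these toy vectors the first pair scores much higher than the second. Real results depend entirely on the model and the tokenizer; for Korean, whitespace splitting may be too coarse, and a morphological tokenizer is often used instead.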
