Why is the cosine_similarity of a pretrained fastText model high between two sentences that are not related at all?
I would like to know why the pre-trained fastText model for Korean Wikipedia does not seem to work well.

    model = fasttext.load_model("./fasttext/wiki.ko.bin")
    model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로")

In English:

    model.cosine_similarity("test test this is test sentence", "now a text with no relation at all really really")

This returns 0.99...? The two sentences are not related in meaning at all, so I would expect the cosine similarity to be much lower. However, it was 0.997383...

Is it impossible to compare long sentences with fastText? Is doc2vec the only way?
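One common way to compare sentences with word vectors, without switching to doc2vec, is to split each sentence into tokens, look up one vector per token, average those vectors, and compare the averages with cosine similarity. The following is a minimal sketch of that idea in plain Python, not the method of any particular package. `word_vector` is a hypothetical parameter standing in for whatever per-word lookup your package provides; with the official facebookresearch `fasttext` binding that would presumably be `model.get_word_vector`, and that binding also has `get_sentence_vector`, which as far as I know averages L2-normalized word vectors in a similar way. Whitespace tokenization is an assumption and may be too coarse for Korean.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: "a function that maps a token to its vector",
# e.g. model.get_word_vector in the official fasttext binding (assumption).
WordVectorFn = Callable[[str], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(tokens: Sequence[str], word_vector: WordVectorFn) -> List[float]:
    """Average of the L2-normalized token vectors (zero vectors are skipped)."""
    total: List[float] = []
    count = 0
    for token in tokens:
        vec = [float(x) for x in word_vector(token)]
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue
        unit = [x / norm for x in vec]
        total = unit if not total else [a + b for a, b in zip(total, unit)]
        count += 1
    if count == 0:
        raise ValueError("no token produced a non-zero vector")
    return [x / count for x in total]


def sentence_similarity(a: str, b: str, word_vector: WordVectorFn) -> float:
    """Tokenize on whitespace (an assumption), average, then compare."""
    return cosine(sentence_vector(a.split(), word_vector),
                  sentence_vector(b.split(), word_vector))


if __name__ == "__main__":
    # Toy 2-d vectors so the sketch runs without a model file.
    toy = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [3.0, 0.0]}
    print(sentence_similarity("a c", "a", toy.__getitem__))  # 1.0
    print(sentence_similarity("a", "b", toy.__getitem__))    # 0.0
    # With a real model (assumption: official facebookresearch binding):
    # import fasttext
    # model = fasttext.load_model("./fasttext/wiki.ko.bin")
    # print(sentence_similarity(s1, s2, model.get_word_vector))
```

Averaging is a deliberately simple baseline: it ignores word order, so whether it separates the two example sentences well enough is something to check against the real model.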
Which 'fasttext' code package are you using? Are you sure its cosine_similarity() is designed to take such raw strings, and to automatically tokenize and combine the words of each example to give sentence-level similarities? (Is that capability implied by its documentation or illustrative examples? Or does it perhaps expect pre-tokenized lists of words?)
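To illustrate the comment's point, assuming (unverified) that the package passes the whole string to a single word-vector lookup: fastText builds vectors for out-of-vocabulary "words" from character n-grams, by default of length 3 to 6 with `<` and `>` added as boundary markers. A sentence containing spaces would then be treated as one long "word" whose n-grams span the spaces, rather than as separate words. The sketch below only lists such n-grams; it is an approximation that ignores fastText's hashing of n-grams into buckets and computes no similarity, and `char_ngrams` is a hypothetical helper, not part of any fastText API.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Approximate fastText subwords: every substring of '<' + word + '>'
    with length between minn and maxn. Real fastText also hashes these
    into a fixed number of buckets, which is not modelled here."""
    wrapped = "<" + word + ">"
    grams: List[str] = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if start + n > len(wrapped):
                break
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    sentence = "테스트 테스트 이건 테스트 문장"
    # Whole sentence treated as one "word": some n-grams contain spaces.
    as_one_word = char_ngrams(sentence)
    # Pre-tokenized: each word gets its own, separate n-grams.
    per_token = [g for tok in sentence.split() for g in char_ngrams(tok)]
    print(len(as_one_word), len(per_token))
    print([g for g in as_one_word if " " in g][:5])
```

Comparing the two lists shows how different the input to the model is depending on whether the text is tokenized before the lookup.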