calc_similarity.Rd
This function computes similarities between the documents in a document-feature matrix (dfm())
and returns a data frame with one row per distinct pair of document ids.
It relies heavily on the quanteda package.
calc_similarity(data, method, min_sim)
data: a document-feature matrix (dfm()) with document ids (doc_id) set
method: character; the similarity or distance measure to use, see ?quanteda::textstat_simil
min_sim: numeric; threshold below which similarity values are not returned; 0.75-0.8 seems reasonable
A data frame containing the two document ids and their similarity value.
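
Examples

The following is a minimal sketch of what a call might look like, not the package's actual implementation. It assumes quanteda version 3 or later, where textstat_simil() lives in the companion package quanteda.textstats. The helper calc_similarity_sketch() and the toy texts are hypothetical stand-ins used only for illustration.

```r
library(quanteda)
library(quanteda.textstats)  # assumption: quanteda >= 3.0, where textstat_simil() is provided here

# Toy texts; the vector names become the document ids (doc_id)
txt <- c(
  a = "the quick brown fox jumps over the lazy dog",
  b = "the quick brown fox jumps over the lazy cat",
  c = "an entirely unrelated sentence about statistics"
)
toy_dfm <- dfm(tokens(txt))

# Hypothetical stand-in for calc_similarity(); the real function may differ
calc_similarity_sketch <- function(data, method = "cosine", min_sim = 0.75) {
  # Pairwise document similarities; values below min_sim are dropped
  sims <- textstat_simil(data, margin = "documents",
                         method = method, min_simil = min_sim)
  # Long format: two id columns plus one column holding the similarity
  as.data.frame(sims)
}

res <- calc_similarity_sketch(toy_dfm, method = "cosine", min_sim = 0.75)
print(res)
```

With these toy texts only the pair of documents a and b clears the 0.75 threshold, so the result should contain a single row. Lowering min_sim returns more pairs at the cost of more weak matches.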