calc_similarity.Rd
This function computes similarities between entries in a document-feature matrix (dfm()) and returns a data frame with distinct id combinations.
It relies heavily on the quanteda package.
calc_similarity(data, method, min_sim)
data: a document-feature matrix (dfm()) with a doc_id set for each document
method: character; the similarity or distance measure to use, see ?quanteda::textstat_simil
min_sim: numeric; the threshold below which similarity values are not returned; 0.75-0.8 seems reasonable
A data frame containing the two ids and the similarity value.
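The internals of calc_similarity() are not shown on this page, so the following is only a minimal base-R sketch of the behaviour described above: pairwise similarity, one row per distinct id pair, and a min_sim cutoff. It is not the package's implementation. The function name sketch_similarity, the plain count matrix standing in for a dfm, the choice of cosine similarity, and the column names id1, id2 and similarity are all assumptions made for illustration.

```r
# Hypothetical stand-in for calc_similarity(), written in base R only.
# `counts` plays the role of the dfm: one row per document, one column per feature.
sketch_similarity <- function(counts, min_sim = 0.75) {
  # Cosine similarity between all rows: dot products divided by the row norms.
  norms <- sqrt(rowSums(counts^2))
  sim <- (counts %*% t(counts)) / (norms %o% norms)

  # Keep each unordered pair once by reading only the upper triangle.
  idx <- which(upper.tri(sim), arr.ind = TRUE)
  pairs <- data.frame(
    id1 = rownames(counts)[idx[, "row"]],
    id2 = rownames(counts)[idx[, "col"]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )

  # Drop pairs whose similarity falls below the threshold.
  pairs[pairs$similarity >= min_sim, , drop = FALSE]
}

# Toy data: doc1 and doc2 share most features, doc3 is largely different.
counts <- matrix(
  c(2, 1, 0, 0,
    2, 1, 1, 0,
    0, 0, 3, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("cat", "dog", "fish", "bird"))
)

result <- sketch_similarity(counts, min_sim = 0.75)
print(result)
```

With this toy input only the doc1/doc2 pair passes the 0.75 threshold, so the result has a single row. Reading only the upper triangle is what keeps each id combination distinct, since the pair (doc1, doc2) is never repeated as (doc2, doc1).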