寻找最佳余弦相似度在一组向量余弦、向量、相似

2023-09-11 02:13:53 作者:千囚栀愿

予有n个向量,每个m个元素(实数)。我想找到这对那里的余弦相似度最大所有对之间。

I have n vectors, each with m elements (real number). I want to find the pair where there cosine similarity is maximum among all pairs.

这种简单的解决方案需要为O(n 2 M)的时间。

The straightforward solution would require O(n2m) time.

有没有更好的解决办法?

Is there any better solution?

更新

Cosine相似度/距离和三角式激励着我,我可以用弦长代替余弦相似,这  失去precision,但速度提高了很多。 (有许多现有的解决方案解决近邻度量空间,如 ANN )

Cosine similarity / distance and triangle equation Inspires me that I could replace "cosine similarity" with "chord length" which loses precision but increases speed a lot. ( there are many existing solutions solving Nearest Neighbor in metric space, like ANN )

推荐答案

余弦相似 SIM卡(A,B)是有关欧氏距离 | A - B | 通过

|a - b|² = 2(1 - sim(a,b))

有关单位向量 A B

这意味着余弦相似度最大的时候的欧几里德距离最小的L2范数归一化后,与该问题降低到最近对点问题,它可以在O解决(N LG n)的时间。

That means cosine similarity is greatest when Euclidean distance is smallest after normalizing by the L2 norm, and the problem reduces to the closest pair of points problem, which can be solved in O(n lg n) time.