Difference between sorted lists

2023-09-11 04:08:14 Author: 忠于你

I have the following problem.

I have a set of elements that I can sort with a certain algorithm A. The sorting is good, but very expensive.

There is also an algorithm B that can approximate the result of A. It is much faster, but the ordering will not be exactly the same.

Taking the output of A as a 'gold standard', I need a meaningful estimate of the error that results from using B on the same data.

Could anyone please suggest any resources I could look at to solve my problem? Thanks in advance!

Edit:

As requested, here is an example to illustrate the case. If the data are the first 10 letters of the alphabet,

A outputs: a,b,c,d,e,f,g,h,i,j

B outputs: a,b,d,c,e,g,h,f,j,i

What are the possible measures of the resulting error that would allow me to tune the internal parameters of algorithm B to get results closer to the output of A?

Recommended answer

I think what you want is Spearman's rank correlation coefficient. Using the index [rank] vectors for the two sortings (perfect A and approximate B), you calculate the rank correlation rho, which ranges from -1 (exactly reversed order) to 1 (exactly the same):

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d(i)^2}{n (n^2 - 1)}$$

where d(i) is the difference in rank of character i between A and B, and n is the number of elements.

You can define your measure of error as a distance D := (1 - rho)/2, which is 0 when B reproduces A exactly and 1 when B is the exact reverse of A.
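To make this concrete, here is a minimal Python sketch applied to the 10-letter example from the question. The function names `spearman_rho` and `rank_distance` are made up for illustration, and the sketch assumes both orderings contain the same distinct items (no ties), so the standard no-ties formula for rho applies.

```python
def spearman_rho(reference, approx):
    """Spearman's rho between two orderings of the same distinct items.

    Assumption: no ties, so an item's rank is simply its position in the list.
    """
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two items")
    if len(approx) != n or len(set(reference)) != n or set(reference) != set(approx):
        raise ValueError("both orderings must contain the same distinct items")

    # Rank of each item in the approximate ordering (output of B).
    rank_in_approx = {item: pos for pos, item in enumerate(approx)}

    # d(i): rank of item i in A minus its rank in B; accumulate d(i)^2.
    sum_d2 = sum((pos - rank_in_approx[item]) ** 2
                 for pos, item in enumerate(reference))

    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def rank_distance(reference, approx):
    """Error measure D = (1 - rho) / 2: 0 for identical, 1 for reversed."""
    return (1 - spearman_rho(reference, approx)) / 2


a_out = list("abcdefghij")   # output of A (gold standard)
b_out = list("abdceghfji")   # output of B (approximation)

rho = spearman_rho(a_out, b_out)
print(round(rho, 4), round(rank_distance(a_out, b_out), 4))  # 0.9394 0.0303
```

For this example the squared rank differences sum to 10, so rho = 1 - 60/990 ≈ 0.939 and D ≈ 0.030. When tuning B's internal parameters, D could serve as the quantity to minimize.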