I have the following problem.
I have a set of elements that I can sort with a certain algorithm A. The sorting is good, but very expensive.
There is also an algorithm B that approximates the result of A. It is much faster, but the ordering will not be exactly the same.
Taking the output of A as a 'gold standard', I need a meaningful estimate of the error that results from using B on the same data.
Could anyone please suggest any resources I could look at to solve my problem? Thanks in advance!
EDIT:
As requested, here is an example to illustrate the case. If the data are the first 10 letters of the alphabet:
A outputs: a,b,c,d,e,f,g,h,i,j
B outputs: a,b,d,c,e,g,h,f,j,i
What are possible measures of the resulting error that would allow me to tune the internal parameters of algorithm B to get results closer to the output of A?
I think what you want is Spearman's rank correlation coefficient. Using the index (rank) vectors of the two sortings (perfect A and approximate B), you calculate the rank correlation rho, which ranges from -1 (exactly reversed order) to 1 (exactly the same order):

$$\rho = 1 - \frac{6 \sum_{i} d(i)^2}{n (n^2 - 1)}$$

where d(i) is the difference between the ranks of element i in A and in B, and n is the number of elements.
You can then define your measure of error as a distance D := (1 - rho)/2, which is 0 when B reproduces the ordering of A exactly and 1 when B returns the exact reverse.
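As a rough illustration, here is a minimal Python sketch of that distance applied to the example from the question. The function names `spearman_rho` and `rank_distance` are made up for this sketch, and it assumes both outputs contain the same distinct elements (no ties), which is the case where the simple sum-of-squared-differences formula for Spearman's rho applies.

```python
def spearman_rho(reference, approx):
    """Spearman's rho between two orderings of the same distinct elements."""
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two elements")
    if len(approx) != n or len(set(reference)) != n or set(reference) != set(approx):
        raise ValueError("both orderings must contain the same distinct elements")
    # Position (rank) of every element in the approximate ordering.
    rank_in_approx = {item: rank for rank, item in enumerate(approx)}
    # Sum of squared rank differences d(i)^2 over all elements.
    sum_d2 = sum(
        (rank - rank_in_approx[item]) ** 2
        for rank, item in enumerate(reference)
    )
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def rank_distance(reference, approx):
    """Error measure D = (1 - rho) / 2, between 0 and 1."""
    return (1 - spearman_rho(reference, approx)) / 2


a_out = list("abcdefghij")  # output of A (gold standard)
b_out = list("abdceghfji")  # output of B from the example

print(spearman_rho(a_out, b_out))
print(rank_distance(a_out, b_out))
```

For the example the squared rank differences sum to 10, so rho = 1 - 60/990, roughly 0.939, and D is roughly 0.030. When tuning B, you would rerun it with different parameter values and keep the setting that gives the smallest D.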