Algorithm of the greatest intersect of a word in a set of words

2023-09-11 23:16:04 Author: 旧城旧伤忆旧人

The story behind

I am creating a voice-controlled application using x-webkit-speech, which is surprisingly good (the feature, not my app), but sometimes the user (me) mumbles a bit. It would be nice to accept the command if some reasonable part of the word matches some reasonable part of some reasonable command. So I am searching for the holy grail called the Algorithm of the Greatest Intersect of a Word in a Set of Words. Could some fresh bright mind drive me out of the cave of despair?

Example

"rotation" in ["notable","tattoo","onclick","statistically"]

should match tattoo, because it has the longest intersect with rotation (tat_o). statistically is the second best (tati intersect), because a longer part of the word needs to be ignored (this is a bonus condition; a solution without it would be acceptable).

Notes

I use the Czech language, where the pronunciation is very close to the written form.
JavaScript is the preferred language, but any pseudocode is acceptable.
The minimal length of the intersect should be a parameter of the algorithm.

What have I tried?

Well, it is pretty embarrassing...

for(var i=10; i>=4; --i) // reasonable substring
for(var word in words) // for all words in the set
for(var j=0; j<word.length-i; ++j) // search for any i substring
// aaargh... three levels of abstraction is too much for me

Solution

This is an algorithm that seems to work. I have no idea how well it performs compared to other, already established algorithms (I suspect it performs worse), but maybe it gives you an idea of how you could do it:

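var minInt = 3; // minimal number of matching letters to accept a candidate
var arr = ["notable","tattoo","onclick","statistically"];
var word = "rotation";

var res = []; // res[i]: best match count for arr[i], or null if below minInt
if (word.length >= minInt) {
    for (var i = 0; i < arr.length; i++) {
        var comp = arr[i];
        var m = 0; // best match count over all alignments of word and comp
        if (comp.length >= minInt) {
            // l enumerates every alignment whose overlap is at least minInt letters
            for (var l = 0; l < comp.length - minInt + word.length - minInt + 1; l++) {
                // after the full-overlap position, drop leading letters of comp
                var subcomp = l > word.length - minInt ? comp.substring(l - word.length + minInt) : comp;
                // before the full-overlap position, keep only the tail of word
                var subword = l < word.length - minInt ? word.substring(word.length - minInt - l) : word;
                var minL = Math.min(subcomp.length, subword.length);
                var matches = 0;
                // count the positions where both strings have the same letter
                for (var k = 0; k < minL; k++) {
                    if (subcomp[k] === subword[k]) {
                        matches++;
                    }
                }
                if (matches > m) {
                    m = matches;
                }
            }
        }
        res[i] = m >= minInt ? m : null;
    }
}

console.log(res);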
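
The script above leaves one match count per candidate in res; picking a command still needs a ranking step. The following is only a minimal sketch of that step. The helper names maxAlignedMatches and rankCandidates are hypothetical (they are not from the original answer), and breaking ties in favour of the shorter candidate is an assumption, meant as one simple way to approximate the bonus condition from the question:

```javascript
// Hypothetical helper: best count of equal letters over all alignments
// of word and comp that overlap in at least minInt letters.
function maxAlignedMatches(word, comp, minInt) {
    if (word.length < minInt || comp.length < minInt) {
        return 0;
    }
    let best = 0;
    // offset is the index in comp that lines up with word[0] (may be negative)
    for (let offset = minInt - word.length; offset <= comp.length - minInt; offset++) {
        let matches = 0;
        for (let k = 0; k < word.length; k++) {
            const j = k + offset;
            if (j >= 0 && j < comp.length && word[k] === comp[j]) {
                matches++;
            }
        }
        best = Math.max(best, matches);
    }
    return best;
}

// Hypothetical helper: keep candidates scoring at least minInt, best first.
// Assumption: on equal scores the shorter candidate wins, because fewer of
// its letters are left unmatched.
function rankCandidates(word, candidates, minInt) {
    return candidates
        .map((comp) => ({ comp: comp, score: maxAlignedMatches(word, comp, minInt) }))
        .filter((r) => r.score >= minInt)
        .sort((a, b) => b.score - a.score || a.comp.length - b.comp.length);
}

console.log(
    rankCandidates("rotation", ["notable", "tattoo", "onclick", "statistically"], 3)
        .map((r) => r.comp + ":" + r.score)
);
// → [ 'tattoo:4', 'statistically:4', 'notable:3' ]
```

With minInt set to 3, onclick is dropped because its best alignment has only two matching letters.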

What happens is that it compares the two strings by "moving" one against the other and counts the matching letters at each position. Here you see the compared "sub"words for rotation vs. notable:

ion / notable --> one match on index 1
tion / notable --> no match
ation / notable --> no match
tation / notable --> one match on index 2
otation / notable --> no match
rotation / notable --> three matches on index 1,2,3
rotation / otable --> no match
rotation / table --> no match
rotation / able --> no match
rotation / ble  --> no match

As you see, the maximum number of matches is 3 and that is what it would return.
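
The trace above can be reproduced programmatically, which may help when checking other word pairs. This is a minimal sketch, not part of the original answer: alignmentTrace is a hypothetical helper name, and the reported indices are positions within the two overlapped substrings, which is how the trace above appears to count them.

```javascript
// Hypothetical helper: list every alignment of word against comp whose
// overlap is at least minInt letters, with the indices of matching letters.
function alignmentTrace(word, comp, minInt) {
    const rows = [];
    if (word.length < minInt || comp.length < minInt) {
        return rows;
    }
    // offset is the index in comp that lines up with word[0] (may be negative)
    for (let offset = minInt - word.length; offset <= comp.length - minInt; offset++) {
        // negative offset: only the tail of word overlaps the start of comp
        const subword = offset < 0 ? word.substring(-offset) : word;
        // positive offset: the start of word overlaps a later part of comp
        const subcomp = offset > 0 ? comp.substring(offset) : comp;
        const overlap = Math.min(subword.length, subcomp.length);
        const hits = [];
        for (let k = 0; k < overlap; k++) {
            if (subword[k] === subcomp[k]) {
                hits.push(k);
            }
        }
        rows.push({ subword: subword, subcomp: subcomp, hits: hits });
    }
    return rows;
}

alignmentTrace("rotation", "notable", 3).forEach((row) => {
    const note = row.hits.length ? "match on index " + row.hits.join(",") : "no match";
    console.log(row.subword + " / " + row.subcomp + " --> " + note);
});
```

The largest hits array over all rows gives the value that the algorithm stores for this pair.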