Java字符串的排列组合查询字符串、排列组合、Java

2023-09-11 01:52:23 作者:爱在心口难开ゞ

我正在写一个的的Andr​​oid 的单词应用。我的code包括会发现该字符串的所有组合和7字母串的子带的最小长度3.然后所有可用的组合比较每一个字在字典中查找所有有效字的方法。我使用的是递归方法。这里的code。

  //获取一个字符串的所有排列。
无效permuteString(字符串beginningString,字符串endingString){
    如果(endingString.length()&其中; = 1){
        如果((Arrays.binarySearch(mDictionary,beginningString.toLowerCase()+ endingString.toLowerCase()))> = 0){
            mWordSet.add(beginningString + endingString);
        }
    }
    其他
        的for(int i = 0; I< endingString.length();我++){
            串newString = endingString.substring(0,I)+ endingString.substring第(i + 1);
            permuteString(beginningString + endingString.charAt(i)中,newString);
      }
}
//获取子串的组合。最少3个字母的组合
空字符串(String s)将{
    字符串newString =;
    如果(s.length()→3){
        为(中间体X = 0 X  - 其中; s.length(); X ++){
            newString = removeCharAt(X,S);
            permuteString(,newString);
            子(newString);
        }
    }
}
 

以上code运行良好,但是当我在我的Nexus S的安装了它,我意识到,它运行慢了一点。这需要几秒钟即可完成。约3或4秒,这是不可接受的。 现在,我已经打了我的手机上的一些文字游戏,他们计算出一个字符串瞬间这让我相信,我的算法不是很有效,它可以改善的所有组合。任何人都可以帮忙吗?

 公共类TrieNode {
TrieNode A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y ,Z;
TrieNode []孩子= {A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V, W,X,Y,Z};
私人的ArrayList<字符串>字=新的ArrayList<字符串>();

公共无效addWord(串词){
    words.add(字);
}
公众的ArrayList<字符串> getWords(){
    返回的话;
}
}
 

 公共类特里{

静态字符串myWord;
静态字符串myLetters =afinnrty;
静态的char [] myChars;
静态排序排序;
静态TrieNode MYNODE =新TrieNode();
静态TrieNode currentNode;
静态INT Y = 0;
静态的ArrayList<字符串> availableWords =新的ArrayList<字符串>();

公共静态无效的主要(字串[] args){

    readWords();
    getPermutations();
}
公共静态无效getPermutations(){
    currentNode = MYNODE;
    为(中间体X = 0 X  - 其中; myLetters.length(); X ++){
        如果(currentNode.children [myLetters.charAt(X) - '一']!= NULL){
            //availableWords.addAll(currentNode.getWords());
            currentNode = currentNode.children [myLetters.charAt(X) - '一'];
            的System.out.println(currentNode.getWords()++ myLetters.charAt(X));
        }
    }
    //System.out.println(availableWords);
}
公共静态无效readWords(){
    尝试 {
        的BufferedReader在=新的BufferedReader(新的FileReader(C://scrabbledictionary.txt));
        字符串str;
        而((海峡= in.readLine())!= NULL){
            myWord =海峡;
            myChars = str.toCharArray();
            排序=新的排序(myChars);
            插入(MYNODE,myChars,0);
        }
        附寄();
    }赶上(IOException异常E){
    }
}
公共静态无效的插入(TrieNode节点的char [] myChars,INT X){
    如果(X GT; = myChars.length){
        node.addWord(myWord);
        //System.out.println(node​​.getWords()+""+y);
        ÿ++;
        返回;
    }
    如果(node.children [myChars [X]  - '一'] == NULL){
        插入(node.children [myChars [X]  - '一'] =新TrieNode(),myChars其中x = x + 1);
    }其他{
        插入(node.children [myChars [X]  - '一'],myChars其中x = x + 1);
    }
}
}
 
字符数组应用 对3个字符串求最大字符串和排序

解决方案

在你目前的做法,你正在寻找了每一子的每一个排列。因此,对于ABC,你需要查找ABCACB BACBCA驾驶室CBA。如果你想找到置换的所有排列,您查找的数量几乎是 5亿,而这之前,你甚至看其子。但是我们可以减少这种为的一个的查找,长度不管,由preprocessing字典。

的想法是把每个字在字典中成一些数据结构,其中每个元素包含一个字符集,和含有(只)的那些字符的所有字的列表。因此,例如,你可以建立一个二叉树,这将对包含(排序)字符集ABD和单词列表中的节点 [ 坏,民建联] 。现在,如果我们想找到的所有排列DBA,我们排序给ABD,并期待它在树中检索列表。

正如鲍曼指出,试图非常适合于存储这种数据。该线索的美妙之处在于查找时间只取决于搜索字符串的长度 - 这是独立于字典的大小。既然你要存储相当多的话,和大部分的搜索字符串将是微小的(大部分将是从你的递归的最低水平3个字符的字符串),这种结构是理想的。

在这种情况下,降低您的线索路径将反映的字符集,而不是文字本身。所以,如果你的整个词典是 [坏,轻拍,出租车,电缆] ,您查找的结构将最终看起来是这样的:

还有一点时间/空间权衡实施该方式。在最简单的(和最快)的方法,每个节点包含单词只是列表中,数组节点[26] 。这可以让你找到你后在固定时间内的孩子,只要看一眼儿童[s.charAt(I) - '一'] (其中取值是搜索字符串和是当前深度的线索)。

缺点是,大部分的孩子阵列将大部分是空的。如果空间是一个问题时,可以使用一个更紧凑的重新presentation像一个链表,动态数组,哈希表等,但是,这些都在潜在需要若干存储器的成本和存取在每个节点处的比较,而不是以上简单的数组访问。不过,我会感到惊讶,如果浪费空间超过几兆在你的整个的字典,所以基于阵列的方法可能是你最好的选择。

有了线索,你的整个置换函数替换为一个查找,从使复杂性降低的 O(N!登录D)(其中ð是你的字典的大小, N 您的字符串的大小)为 O(N日志N)(因为你需要的字符排序,查找自身的 0 (N))。

编辑:我一起抛出的(未经测试)实现这种结构: HTTP:/ /pastebin.com/Qfu93E80

I'm writing an Android word app. My code includes a method that would find all combinations of the string and the substrings of a 7 letter string with a minimum of length 3. Then compare all available combination to every word in the dictionary to find all the valid words. I'm using a recursive method. Here's the code.

// Gets all the permutations of a string.
void permuteString(String beginningString, String endingString) {
    if (endingString.length() <= 1){
        if((Arrays.binarySearch(mDictionary, beginningString.toLowerCase() +   endingString.toLowerCase())) >= 0){
            mWordSet.add(beginningString + endingString);
        }
    }
    else
        for (int i = 0; i < endingString.length(); i++) {
            String newString = endingString.substring(0, i) + endingString.substring(i + 1);
            permuteString(beginningString + endingString.charAt(i), newString);
      }
}
// Get the combinations of the sub-strings. Minimum 3 letter combinations
void subStrings(String s){
    String newString = "";
    if(s.length() > 3){
        for(int x = 0; x < s.length(); x++){
            newString = removeCharAt(x, s);
            permuteString("", newString);
            subStrings(newString);
        }
    }
}

The above code runs fine but when I installed it on my Nexus s I realized that it runs a bit too slow. It takes a few seconds to complete. About 3 or 4 seconds which is unacceptable. Now I've played some word games on my phone and they compute all the combinations of a string instantly which makes me believe that my algorithm is not very efficient and it can be improved. Can anyone help?

public class TrieNode {
TrieNode a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z;
TrieNode[] children = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z};
private ArrayList<String> words = new ArrayList<String>();

public void addWord(String word){
    words.add(word);
}
public ArrayList<String> getWords(){
    return words;
}
}

public class Trie {

static String myWord;
static String myLetters = "afinnrty";
static char[] myChars;
static Sort sort;
static TrieNode myNode = new TrieNode();
static TrieNode currentNode;
static int y = 0;
static ArrayList<String> availableWords = new ArrayList<String>();

public static void main(String[] args) {

    readWords();
    getPermutations();
}
public static void getPermutations(){
    currentNode = myNode;
    for(int x = 0; x < myLetters.length(); x++){
        if(currentNode.children[myLetters.charAt(x) - 'a'] != null){
            //availableWords.addAll(currentNode.getWords());
            currentNode = currentNode.children[myLetters.charAt(x) - 'a'];
            System.out.println(currentNode.getWords() + "" + myLetters.charAt(x));
        }
    }
    //System.out.println(availableWords);
}
public static void readWords(){
    try {
        BufferedReader in = new BufferedReader(new FileReader("c://scrabbledictionary.txt"));
        String str;
        while ((str = in.readLine()) != null) {
            myWord = str;
            myChars = str.toCharArray();
            sort = new Sort(myChars);
            insert(myNode, myChars, 0);
        }
        in.close();
    } catch (IOException e) {
    }
}
public static void insert(TrieNode node, char[] myChars, int x){    
    if(x >= myChars.length){
        node.addWord(myWord);
        //System.out.println(node.getWords()+""+y);
        y++;
        return;
    }
    if(node.children[myChars[x]-'a'] == null){
        insert(node.children[myChars[x]-'a'] = new TrieNode(), myChars, x=x+1);
    }else{
        insert(node.children[myChars[x]-'a'], myChars, x=x+1);
    }
}
}

解决方案

In your current approach, you're looking up every permutation of each substring. So for "abc", you need to look up "abc", "acb", "bac", "bca", "cab" and "cba". If you wanted to find all permutations of "permutations", your number of lookups is nearly 500,000,000, and that's before you've even looked at its substrings. But we can reduce this to one lookup, regardless of length, by preprocessing the dictionary.

The idea is to put each word in the dictionary into some data structure where each element contains a set of characters, and a list of all words containing (only) those characters. So for example, you could build a binary tree, which would have a node containing the (sorted) character set "abd" and the word list ["bad", "dab"]. Now, if we want to find all permutations of "dba", we sort it to give "abd" and look it up in the tree to retrieve the list.

As Baumann pointed out, tries are well suited to storing this kind of data. The beauty of the trie is that the lookup time depends only on the length of your search string - it is independent of the size of your dictionary. Since you'll be storing quite a lot of words, and most of your search strings will be tiny (the majority will be the 3-character substrings from the lowest level of your recursion), this structure is ideal.

In this case, the paths down your trie would reflect the character sets rather than the words themselves. So if your entire dictionary was ["bad", "dab", "cab", "cable"], your lookup structure would end up looking like this:

There's a bit of a time/space trade-off in the way you implement this. In the simplest (and fastest) approach, each Node contains just the list of words, and an array Node[26] of children. This allows you to locate the child you're after in constant time, just by looking at children[s.charAt(i)-'a'] (where s is your search string and i is your current depth in the trie).

The downside is that most of your children arrays will be mostly empty. If space is an issue, you can use a more compact representation like a linked list, dynamic array, hash table, etc. However, these come at the cost of potentially requiring several memory accesses and comparisons at each node, instead of the simple array access above. But I'd be surprised if the wasted space was more than a few megabytes over your whole dictionary, so the array-based approach is likely your best bet.

With the trie in place, your whole permutation function is replaced with one lookup, bringing the complexity down from O(N! log D) (where D is the size of your dictionary, N the size of your string) to O(N log N) (since you need to sort the characters; the lookup itself is O(N)).

EDIT: I've thrown together an (untested) implementation of this structure: http://pastebin.com/Qfu93E80