How to count repeated words?

2023-09-11 06:46:37  Author: 用更多谎言丶圆一个谎。


Given a 1 GB (very large) file containing words (some repeated), we need to read the file and output how many times each word is repeated. Please let me know whether my solution is performant or not.


(For simplicity, let's assume we have already captured the words in an ArrayList<String>.)

I think the time complexity is O(n). Am I correct?

public static void main(String[] args) {

    ArrayList al = new ArrayList();
    al.add("math1");
    al.add("raj1");
    al.add("raj2");
    al.add("math");
    al.add("rj2");

    al.add("math");
    al.add("rj3");
    al.add("math2");
    al.add("rj1");
    al.add("is");

    Map<String, Integer> map = new HashMap<String, Integer>();

    for (int i = 0; i < al.size(); i++) {
        String s = (String) al.get(i);
        map.put(s, null);
    }

    for (int i = 0; i < al.size(); i++) {
        String s = (String) al.get(i);
        if (map.get(s) == null) {
            map.put(s, 1);
        } else {
            int count = (int) map.get(s);
            count = count + 1;
            map.put(s, count);
        }
    }

    System.out.println("");
}

Solution

Theoretically, since HashMap access is generally O(1), I guess your algorithm is O(n), but in practice it has several inefficiencies.

Ideally you would iterate over the contents of the file just once, processing (i.e. counting) the words as you read them in. There is no need to store the entire file contents in memory (your ArrayList). You loop over the contents three times: once to read them, and a second and third time in the two loops in your code above. In particular, the first loop in your code above is completely unnecessary.

Finally, your use of HashMap will be slower than needed because its default capacity at construction is very small, so it will have to grow internally a number of times, forcing a rebuild of the hash table each time. It is better to construct it with a capacity appropriate for what you expect it to hold, and you should also take the load factor into account when choosing that capacity.
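For reference, here is a minimal sketch of the single-pass approach described above: the file is read line by line with a BufferedReader, each word is counted as it is read (nothing is stored beyond the counts), and the HashMap is pre-sized to avoid repeated rehashing. The file name words.txt and the expected number of distinct words are placeholder assumptions, not values taken from the question.

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class WordCount {

    public static void main(String[] args) throws IOException {
        // Assumed figure for illustration; adjust to your data.
        int expectedDistinctWords = 1_000_000;

        // Pre-size the map: with the default load factor of 0.75, a capacity of
        // expectedDistinctWords / 0.75 avoids repeated internal rehashing.
        Map<String, Integer> counts =
                new HashMap<>((int) (expectedDistinctWords / 0.75f) + 1);

        // "words.txt" is a placeholder path for the 1 GB input file.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get("words.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Single pass over the stream: split each line on whitespace
                // and bump the count for every word encountered.
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }

        // Output how many times each word occurred.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}

Using merge (or getOrDefault) replaces the get/put pair from the original code, and counting while reading removes both the ArrayList and the extra loops, so the whole job stays O(n) with a single pass over the file.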