最小的文字片段,其中k关键词片段、最小、关键词、文字

2023-09-10 23:39:12 作者:划船不用桨全靠浪

我的朋友问的这个问题采访。 给定一个文本文档(比如一个新闻文章)和一组关键词(为前谷歌搜索词),找到最小的段,其中包含这些关键字(按任何顺序)的文本文档中

My friend was asked this question in a interview. Given a text document ( say an news article ) and a set of keywords (for ex google search terms), find the smallest segment in the text document which contains these keywords (in any order).

我能想到的方法是使用含有关键字和文本关键词的位置的地图。然后,我们用这个地图选择文本片段。不过,我一直在寻找更好的方法。

The method I could think of is to use a map containing the keyword and the position of the keyword in the text. We then use this map to select the text fragment. But I was looking for better methods.

推荐答案

我可能会提出这样的事情:

I would probably have proposed something like that:

1. create a new map of <String, Integer> of size k to associate each keyword 
     with its number of occurences in the text
2. go through the text and each time you encounter a keyword, increment its 
     number of occurences in the map
3. check that each keyword has been encountered at least once, otherwise 
     return a proper statement
4. start from the beginning of the text and each time you encounter a keyword, 
     decrement its number of occurences:
    4.1. if the number reaches 0, stop iterating and set 
           int startIndex = currentIndex
    4.2. otherwise keep iterating
5. start from the end of the text and each time you encounter a keyword, 
     decrement its number of occurences:
    5.1. if the number reaches 0, stop iterating and set 
           int endIndex = currentIndex
    5.2 otherwise keep reverse iterating
6. return [startIndex, endIndex] 

整体复杂度为O(n)。

The overall complexity is O(n).