Yet another Lucene.net question from an extreme newbie.
This time, I have found an interesting issue when using a query that contains a range together with highlighting.
I am writing this from memory, so please forgive any syntax errors.
I have a hypothetical Lucene index like this:
---------------------------------------------------------
| date | text |
---------------------------------------------------------
| 1317809124 | a crazy block of text |
---------------------------------------------------------
| 1317809284 | programmers are crazy |
---------------------------------------------------------
** date is a Unix timestamp
... and the documents have been added to the index like this:
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("text", "some block of text", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.Add(new Lucene.Net.Documents.Field("date", "some unix timestamp", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED));
This is how I am querying Lucene:
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
Lucene.Net.Search.Query query = parser.Parse(queryPhrase);
Lucene.Net.Search.Hits hits = searcher.Search(query);
// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query);
Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);
for (int i = 0; i < hits.Length(); i++)
{
Lucene.Net.Documents.Document doc = hits.Doc(i);
Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
Console.WriteLine("--> " + highlightedText);
}
Here is an example of my query:
crazy AND date:[1286273266 TO 32503680000]
When this is queried, it finds all the results for "crazy" but does not output any highlighted text.
When the date range is removed and you simply query the term:
crazy
... this time highlighting works properly.
Is there something I am doing wrong in my implementation, should I be looking at a new implementation, or is this a known issue with a potential workaround?
Thank you in advance, stackoverflow'ers :)
- Edit -
I have implemented the suggestions from LB (amazing, btw!). I still have no idea why this works, as I think Lucene is complete voodoo or programming witchcraft, but it does work and I am happy :)
For completeness, here is the modified code:
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
// new line here
parser.SetMultiTermRewriteMethod(Lucene.Net.Search.MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Lucene.Net.Search.Query query = parser.Parse(queryPhrase);
// new line here
Lucene.Net.Search.Query query2 = query.Rewrite(searcher.GetIndexReader());
Lucene.Net.Search.Hits hits = searcher.Search(query);
// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);
// changed to use query2
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query2);
Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);
for (int i = 0; i < hits.Length(); i++)
{
Lucene.Net.Documents.Document doc = hits.Doc(i);
Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
Console.WriteLine("--> " + highlightedText);
}
If you could, let me know whether I have implemented the suggestions accurately.
First, invoke QueryParser's
SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
method, then create a new query as
Query newQuery = query.Rewrite(indexReader);
Now you can use "newQuery" to make your searches.