A parser that parses search terms and extracts valuable information

2023-09-11 23:02:00 Author: 我这么帅当然是女孩子

I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it's a location search where the keyword is "staples" and the location is New York. Conversely, if someone types "cat in hat", the parser should not flag that as a location search; here the entire keyword is "cat in hat". Is there any algorithm or open-source library that can parse a search term and tell whether it's a comparison (like "A vs B") or a location-based search (like "A in X")?

Recommended answer

The problem you describe is called information extraction. Many algorithms exist, ranging from the simplest, regexp matching, to the most powerful, structured machine learning. Try regexps first, and if you know Python, look at something like NLTK.
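A minimal sketch of the regexp-first approach, assuming plain Python with only the standard `re` module. The pattern names, the `classify` function, and the shape of the returned dictionaries are hypothetical illustrations, not part of any library. It recognizes the "A vs B" and "A in X" shapes, but on its own it cannot separate "staples in NY" from "cat in hat", because both fit the same pattern.

```python
import re

# Hypothetical patterns for two query shapes; a real system would need many more.
COMPARISON = re.compile(
    r"^\s*(?P<a>.+?)\s+(?:vs\.?|versus)\s+(?P<b>.+?)\s*$", re.IGNORECASE
)
# Greedy keyword group, so the split happens at the LAST " in ".
IN_PHRASE = re.compile(r"^\s*(?P<keyword>.+)\s+in\s+(?P<place>.+?)\s*$", re.IGNORECASE)


def classify(query: str) -> dict:
    """Label a query by its surface shape only (no knowledge of places)."""
    m = COMPARISON.match(query)
    if m:
        return {"type": "comparison", "a": m.group("a"), "b": m.group("b")}
    m = IN_PHRASE.match(query)
    if m:
        # Only says "looks like A in X"; it cannot tell if X is really a place.
        return {"type": "in_phrase", "keyword": m.group("keyword").strip(),
                "place": m.group("place")}
    return {"type": "keyword", "keyword": query.strip()}
```

Both `classify("staples in NY")` and `classify("cat in hat")` come back with type `in_phrase`, which is exactly the ambiguity the rest of the answer addresses.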

Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. It can tell either from the capitalization or because "NY" appears in a list of place names called a gazetteer.
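A rough sketch of both signals, again assuming plain Python and the standard `re` module. `GAZETTEER`, `looks_like_place`, and `parse_query` are hypothetical names, and the five-entry gazetteer is a made-up stand-in for a real list of place names (which would hold many thousands of entries and abbreviations). The capitalization check is a crude optional heuristic, off by default, since users often type queries in lower case or capitalize words that are not places.

```python
import re

# Toy gazetteer (hypothetical); entries are stored in lower case.
GAZETTEER = {"ny", "nyc", "new york", "boston", "london"}

# Greedy keyword group, so "cat in the hat in NY" splits at the LAST " in ".
IN_PHRASE = re.compile(r"^\s*(?P<keyword>.+)\s+in\s+(?P<place>.+?)\s*$", re.IGNORECASE)


def looks_like_place(place: str, gazetteer: set, use_capitals: bool) -> bool:
    """True if `place` is in the gazetteer, or (optionally) starts with a capital."""
    if place.lower() in gazetteer:
        return True
    # Crude optional heuristic: a capitalized word hints at a proper noun.
    return use_capitals and place[:1].isupper()


def parse_query(query: str, gazetteer: set = GAZETTEER,
                use_capitals: bool = False) -> dict:
    m = IN_PHRASE.match(query)
    if m:
        keyword, place = m.group("keyword").strip(), m.group("place")
        if looks_like_place(place, gazetteer, use_capitals):
            return {"type": "location", "keyword": keyword, "location": place}
    # Otherwise the whole query is the keyword, e.g. "cat in hat".
    return {"type": "keyword", "keyword": query.strip()}
```

With this sketch, `parse_query("staples in NY")` yields a location search with keyword "staples", while `parse_query("cat in hat")` keeps "cat in hat" as a single keyword, because "hat" is not in the gazetteer.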

The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.