Parser to parse a search term and extract valuable information

2023-09-11 23:02:00 Author: 我这么帅当然是女孩子

I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it's a location search where the keyword is "staples" and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag that as a location search; here the entire keyword is "cat in hat". Is there any algorithm or open source library that can parse a search term and tell whether it's a comparison (like A vs B) or a location-based search (like A in X)?

Recommended answer

The problem you describe is called information extraction. A host of algorithms exist, the simplest being regexp matching and the best being structured machine learning. Try regexps first, and look at something like NLTK if you know Python.
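As a rough illustration of the regexp-first approach, here is a minimal sketch that only recognizes the surface shape of a query ("A vs B" or "A in X"). The pattern names and the `match_shape` function are made up for this example, and the patterns are deliberately naive; note that "cat in hat" still matches the "A in X" shape, so a regexp alone cannot settle the location question.

```python
import re

# Hypothetical patterns for illustration; real queries need broader coverage.
COMPARISON = re.compile(r"^(?P<a>.+?)\s+(?:vs\.?|versus)\s+(?P<b>.+)$", re.IGNORECASE)
IN_PATTERN = re.compile(r"^(?P<keyword>.+?)\s+in\s+(?P<place>.+)$", re.IGNORECASE)


def match_shape(query):
    """Return a (shape, first_part, second_part) tuple for a raw query string."""
    query = query.strip()
    m = COMPARISON.match(query)
    if m:
        return ("comparison", m.group("a"), m.group("b"))
    m = IN_PATTERN.match(query)
    if m:
        # Only a candidate: the second part may or may not be a place.
        return ("in_candidate", m.group("keyword"), m.group("place"))
    return ("keyword", query, None)
```

Calling `match_shape("iphone vs android")` yields a comparison with the two sides split out, while both "staples in NY" and "cat in hat" come back as `in_candidate` and need a further check.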

Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. You can tell either from the capitalization or because "NY" occurs in a list of place names, called a gazetteer.
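The gazetteer idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `GAZETTEER` dictionary is a tiny hand-written stand-in for a real place-name list, and `classify` is a hypothetical helper, not part of NLTK or any other library.

```python
import re

# Tiny hypothetical gazetteer: lower-cased place names and abbreviations
# mapped to a canonical form. A real one would hold many thousands of entries.
GAZETTEER = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
    "london": "London",
}

IN_PATTERN = re.compile(r"^(?P<keyword>.+?)\s+in\s+(?P<place>.+)$", re.IGNORECASE)


def classify(query):
    """Treat 'A in X' as a location search only when X is a known place."""
    query = query.strip()
    m = IN_PATTERN.match(query)
    if m:
        place = GAZETTEER.get(m.group("place").lower())
        if place is not None:
            return {"type": "location", "keyword": m.group("keyword"), "location": place}
    # Unknown place or no 'in' at all: keep the whole query as the keyword.
    return {"type": "keyword", "keyword": query}
```

With this sketch, "staples in NY" is classified as a location search for "staples" in New York, while "cat in hat" stays a plain keyword because "hat" is not in the list. The lookup is case-insensitive on purpose, since search users often type place names in lower case, which makes the capitalization cue less reliable than the list.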

The problem in general is AI-complete, so expect to put in a lot of hard work if you want good results.
