Methods for Geotagging or Geolabelling Text Content
Tags: tagging, text, geography, methods

2023-09-12 21:17:06 Author: 凉薄i

What are some good algorithms for automatically labeling text with its city, region, or place of origin? That is, if a blog is about New York, how can I tell programmatically? Are there packages or papers that claim to do this with any degree of certainty?

I have looked at some TF-IDF based approaches and proper noun intersections, but so far with no spectacular successes, and I'd appreciate ideas!
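For readers unfamiliar with the TF-IDF idea mentioned above, here is a minimal sketch of one way it could be applied: build a TF-IDF vector per location from some reference text, then pick the location whose vector has the highest cosine similarity to the new text. The `REFERENCE` texts, the function names, and the smoothed IDF formula are all assumptions made for illustration; they are not taken from the question.

```python
import math
import re
from collections import Counter

# Hypothetical reference text per location (an assumption for illustration).
REFERENCE = {
    "New York": "manhattan brooklyn subway broadway yankees central park hudson",
    "London": "thames tube westminster soho arsenal hyde park camden",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_tfidf(labelled_docs):
    """Return (per-label TF-IDF vectors, idf table) for a dict of label -> text."""
    counts = {label: Counter(tokenize(text)) for label, text in labelled_docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed IDF: terms shared by every label get a lower, but non-zero, weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        label: {t: tf * idf[t] for t, tf in term_counts.items()}
        for label, term_counts in counts.items()
    }
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_label(text, vectors, idf):
    """Return (best label, all scores); terms unseen in the reference are ignored."""
    term_counts = Counter(tokenize(text))
    query = {t: c * idf[t] for t, c in term_counts.items() if t in idf}
    scores = {label: cosine(query, vec) for label, vec in vectors.items()}
    return max(scores, key=scores.get), scores

vectors, idf = build_tfidf(REFERENCE)
label, scores = best_label("We took the subway from Brooklyn to a Broadway show", vectors, idf)
print(label, scores)
```

If a text shares no terms with any reference, every score is 0.0 and `max` simply returns the first label, so a real system would want a minimum-score threshold before trusting the result.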

The more general question is about assigning texts to topics, given some list of topics.

Simple / naive approaches are preferred to full-on Bayesian approaches, but I'm open.
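As a point of reference for what a "simple / naive" baseline for the general topic-assignment problem might look like, here is a minimal sketch that counts keyword hits per topic and keeps the topics that reach a threshold. The keyword lists, the `assign_topics` name, and the `min_hits` parameter are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list per topic (an assumption for illustration).
TOPIC_KEYWORDS = {
    "new york": {"manhattan", "brooklyn", "subway", "broadway"},
    "cooking": {"recipe", "oven", "simmer", "garlic"},
}

def assign_topics(text, topic_keywords, min_hits=2):
    """Return the topics with at least min_hits keyword occurrences, best first."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    hits = {
        topic: sum(tokens[keyword] for keyword in keywords)
        for topic, keywords in topic_keywords.items()
    }
    ranked = sorted(((n, topic) for topic, n in hits.items() if n >= min_hits), reverse=True)
    return [topic for _, topic in ranked]

print(assign_topics("Simmer the garlic, then follow the recipe.", TOPIC_KEYWORDS))
```

This only matches single-word keywords and ignores word order, so it is best treated as a baseline to compare smarter methods against.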

Recommended answer

You're looking for a named entity recognition system, or NER for short. There are several good toolkits available to help you out. LingPipe in particular has a very decent tutorial. CAGEclass seems to be oriented around NER on geographical place names, but I haven't used it yet.
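To make the idea concrete without depending on any of the toolkits above, here is a minimal sketch of the simplest dictionary-based form of place-name recognition: a gazetteer lookup with longest-match-first scanning. It does not use or imitate the LingPipe, OpenNLP, or CAGEclass APIs; the `GAZETTEER` entries and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical gazetteer: surface form -> canonical place (an assumption).
GAZETTEER = {
    "new york": "New York",
    "nyc": "New York",
    "manhattan": "New York",
    "brooklyn": "New York",
    "london": "London",
    "camden": "London",
}

def find_places(text, gazetteer, max_words=3):
    """Count canonical places mentioned in text, trying the longest phrase first."""
    words = re.findall(r"[a-z]+", text.lower())
    found = Counter()
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in gazetteer:
                found[gazetteer[candidate]] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starting at this word matched
    return found

def label_text(text, gazetteer):
    """Return the most frequently mentioned place, or None if none is found."""
    found = find_places(text, gazetteer)
    return found.most_common(1)[0][0] if found else None

print(label_text("I moved from London to New York and now live in Brooklyn.", GAZETTEER))
```

String matching alone cannot tell which of several places sharing a name is meant, or whether a word is a place at all, which is the gap a trained NER model is meant to close.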

Here's a nice blog entry about the difficulties of NER with geographical place names.

If you're going with Java, I'd recommend using the LingPipe NER classes. OpenNLP also has some, but the former has better documentation.

If you're looking for some theoretical background, Chavez et al. (2005) have constructed an interesting system and documented it.

 