How does the Google "Did you mean?" algorithm work?

2023-09-10 22:23:32 Author: 乖就亲你

I've been developing an internal website for a portfolio management tool. There is a lot of text data, company names, etc. I've been really impressed with some search engines' ability to respond very quickly to queries with "Did you mean: xxxx".

I need to be able to intelligently take a user query and respond not only with raw search results but also with a "Did you mean?" response when there is a highly likely alternative answer.

[I'm developing in ASP.NET (VB - don't hold it against me!)]

UPDATE: OK, how can I mimic this without the millions of 'unpaid users'?

Generate typos for each known or correct term and perform lookups? Or is there some other, more elegant method?
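The typo-generation idea raised in the question can be sketched roughly as follows. This is only an illustration, not anything the search engines are known to do: the function names (`single_edit_variants`, `build_typo_index`, `did_you_mean`) and the sample company names are hypothetical, and only single-character edits over lowercase ASCII letters are considered.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_lowercase  # assumption: terms are plain lowercase ASCII words


def single_edit_variants(term):
    """Return every string exactly one edit away from `term`
    (one deletion, adjacent swap, replacement, or insertion)."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in ALPHABET}
    inserts = {left + ch + right for left, right in splits for ch in ALPHABET}
    return (deletes | swaps | replaces | inserts) - {term}


def build_typo_index(known_terms):
    """Map each generated typo to the set of known terms it could stand for."""
    index = defaultdict(set)
    for term in known_terms:
        for variant in single_edit_variants(term.lower()):
            index[variant].add(term.lower())
    return index


def did_you_mean(query, known_terms, index):
    """Return suggestions for `query`, or an empty list if it is already a known term."""
    q = query.lower()
    if q in {t.lower() for t in known_terms}:
        return []
    return sorted(index.get(q, ()))


KNOWN = ["microsoft", "google", "oracle"]  # hypothetical company names
INDEX = build_typo_index(KNOWN)
suggestions = did_you_mean("gogle", KNOWN, INDEX)
```

The index is built once up front, so each lookup is a single dictionary access. The cost is memory: every known term contributes a few hundred variants, which is fine for a list of company names but grows quickly if two-edit typos are added.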

Recommended answer

Here's the explanation directly from the source (almost).

At minute 22:03.

Worth watching!

Basically, according to Douglas Merrill, former CIO of Google, it works like this:

1) You type a (misspelled) word into Google.

2) You don't find what you wanted (you don't click on any results).

3) You realize you misspelled the word, so you retype it in the search box.

4) You find what you want (you click on one of the first links).

This pattern, multiplied millions of times, shows which misspellings are most common and what the most "common" corrections are.
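The four-step pattern above can be mined from a query log along these lines. This is a minimal sketch under stated assumptions, not Google's actual pipeline: the log is assumed to be a chronological list of `(user_id, query, clicked)` tuples, and `mine_corrections`, `best_corrections` and `min_count` are hypothetical names.

```python
from collections import Counter


def mine_corrections(events):
    """Count (abandoned query, follow-up query) pairs.

    `events` is an iterable of (user_id, query, clicked) tuples in
    chronological order (an assumed log format). A pair is counted when a
    user issues a query without clicking and then immediately issues a
    different query that does end in a click.
    """
    last_unclicked = {}  # user_id -> most recent query that got no click
    pair_counts = Counter()
    for user_id, query, clicked in events:
        previous = last_unclicked.pop(user_id, None)
        if clicked:
            if previous is not None and previous != query:
                pair_counts[(previous, query)] += 1
        else:
            last_unclicked[user_id] = query
    return pair_counts


def best_corrections(pair_counts, min_count=2):
    """Keep the most frequent correction for each abandoned query."""
    best = {}
    for (wrong, right), count in pair_counts.most_common():
        if count >= min_count and wrong not in best:
            best[wrong] = right
    return best


log = [
    ("u1", "nigth", False), ("u1", "night", True),
    ("u2", "nigth", False), ("u2", "night", True),
    ("u3", "nigth", False), ("u3", "nite", True),
    ("u4", "night", True),
]
corrections = best_corrections(mine_corrections(log))
```

The `min_count` threshold stands in for the "millions of times" part: a single user's rewrite proves little, so a pair is only trusted once enough independent users have produced it.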

This way Google can offer spelling correction almost instantaneously, in every language.

This also means that if, overnight, everyone started spelling "night" as "nigth", Google would suggest that word instead.

EDIT

@ThomasRutter: Douglas describes it as "statistical machine learning".

They know who corrects the query, because they know which query comes from which user (using cookies).

If users perform a query, and only 10% of them click on a result while 90% go back and type another query (with the corrected word), and this time that 90% click on a result, then they know they have found a correction.
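The 10% / 90% rule of thumb can be written down as a simple threshold check. The sketch below is an illustration only: the `QueryStats` record, the `find_correction` function and the three threshold parameters are hypothetical, and the default values are just the percentages quoted above, not known production settings.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    searches: int      # how many times the query was issued
    clicks: int        # how many of those searches ended in a click
    rewrites_to: dict  # follow-up query -> number of users who retyped it


def click_rate(stats):
    """Fraction of searches for a query that ended in a click."""
    return stats.clicks / stats.searches if stats.searches else 0.0


def find_correction(query, stats_by_query, max_click_rate=0.1,
                    min_rewrite_share=0.9, min_followup_click_rate=0.9):
    """Return a follow-up query that looks like a correction of `query`, or None."""
    stats = stats_by_query.get(query)
    if stats is None or stats.searches == 0:
        return None
    if click_rate(stats) > max_click_rate:
        return None  # enough users click, so the query is probably fine
    for followup, count in stats.rewrites_to.items():
        if count / stats.searches < min_rewrite_share:
            continue  # too few users retyped the query this way
        followup_stats = stats_by_query.get(followup)
        if followup_stats and click_rate(followup_stats) >= min_followup_click_rate:
            return followup
    return None


stats_by_query = {
    "nigth": QueryStats(searches=100, clicks=10, rewrites_to={"night": 90}),
    "night": QueryStats(searches=1000, clicks=950, rewrites_to={}),
}
suggestion = find_correction("nigth", stats_by_query)
```

Because the rule only looks at behaviour, it needs no dictionary, which is why the same mechanism works for any language and adapts when usage changes.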

They can also tell whether two different queries are "related", because they have information on all the links they show.

Furthermore, they now include context in the spell check, so they can even suggest a different word depending on the context.
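One simple way to make a suggestion depend on context is to count which word pairs occur together and prefer the candidate seen most often after the preceding word. This is a toy sketch of that idea and not a description of how Google does it: `train_bigrams`, `pick_by_context` and the tiny corpus are all hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count adjacent word pairs across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_by_context(previous_word, candidates, bigram_counts):
    """Return the candidate seen most often after `previous_word`,
    or None if no candidate was ever seen in that context."""
    if not candidates:
        return None
    scored = [(bigram_counts[(previous_word.lower(), c)], c) for c in candidates]
    best_count, best = max(scored)
    return best if best_count > 0 else None


corpus = [
    "a piece of cake",
    "a piece of paper",
    "world peace now",
    "make peace not war",
]
bigrams = train_bigrams(corpus)
after_a = pick_by_context("a", ["peace", "piece"], bigrams)
after_world = pick_by_context("world", ["peace", "piece"], bigrams)
```

Both candidates are correctly spelled words, so a dictionary lookup alone cannot choose between them. Only the neighbouring word separates "a piece" from "world peace".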

See this demo of Google Wave (@ 44m 06s) that shows how context is taken into account to automatically correct the spelling.

Here it is explained how that natural language processing works.

And finally, here is an awesome demo of what can be done by adding automatic machine translation (@ 1h 12m 47s) to the mix.

I've added minute-and-second anchors to the videos so they skip directly to the content. If they don't work, try reloading the page or scrolling by hand to the mark.