Calculating the Dot Product for Word Proximity

2023-09-11 07:04:01 Author: 不期待就不会失望

I have already asked a similar question, "Calculating Word Proximity in an inverted Index". However, I felt that question was too general and not refined enough, so here is a more specific version.

I have a list that contains the positions of a token in a document. For each token it is declared as

public List<int> hitLocation;

Let's say the document is

Java programming language has a name similar to java island in Indonesia however
local language in java bears no resemblance to the programming language called java.

and the query is

java island language
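To make the example concrete, the hit lists for this document can be produced with a small tokenizer. This is a minimal Python sketch, not the indexer from the question: `build_hit_lists`, the lowercase regex tokenization, and the 0-based word positions are all assumptions made for illustration.

```python
import re
from collections import defaultdict

DOCUMENT = (
    "Java programming language has a name similar to java island in "
    "Indonesia however local language in java bears no resemblance to "
    "the programming language called java."
)

def build_hit_lists(text):
    """Map each lowercased token to the sorted list of its word positions."""
    hits = defaultdict(list)
    # Assumed tokenization: runs of letters/digits, case-insensitive.
    for position, token in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        hits[token].append(position)
    return dict(hits)

hit_lists = build_hit_lists(DOCUMENT)
for term in "java island language".split():
    print(term, hit_lists[term])
# java [0, 8, 16, 25]
# island [9]
# language [2, 14, 23]
```

Each list here plays the role of one `hitLocation` list from the declaration above.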

So say I lock onto the java HitList and attempt to directly calculate the distance between the java HitList, the island HitList and the language HitList.

The first problem is that the java token occurs 4 times in the document. Which one do I select? Assume I select the first one.

I move on to the island token list and, after comparing, find that it is adjacent to the second occurrence of java. So I change my selection and lock onto the second occurrence of java.

Proceeding to the third token, language, I find that it is situated quite a distance from my current selection, but quite near the first occurrence of java.
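The comparison described so far can be written down in a few lines of Python. This is only a sketch under assumptions: the positions are 0-based word offsets worked out by hand from the example document, and `nearest_distance` is a hypothetical helper that expects a non-empty, sorted hit list.

```python
from bisect import bisect_left

# Assumed 0-based word positions in the example document.
JAVA = [0, 8, 16, 25]
ISLAND = [9]
LANGUAGE = [2, 14, 23]

def nearest_distance(sorted_hits, position):
    """Distance from position to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_hits, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_hits[max(i - 1, 0):i + 1]
    return min(abs(position - hit) for hit in candidates)

# For each java occurrence: (distance to island, distance to language).
distances = {
    pos: (nearest_distance(ISLAND, pos), nearest_distance(LANGUAGE, pos))
    for pos in JAVA
}
print(distances)
# {0: (9, 2), 8: (1, 6), 16: (7, 2), 25: (16, 2)}
```

Summing each pair gives one possible cost per java occurrence (11, 7, 9 and 18 here), so under that assumed scoring the second occurrence, at position 8, would be the best anchor.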

So you see the dilemma here: if I revert to my original selection, i.e. the first occurrence of java, the distance to the second token, "island", increases; and if I stay with my current selection, the sheer distance to the nearest occurrence of the token "language" will ruin the relevance.
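One way to avoid locking onto a single occurrence is to look at all hit lists together and find the smallest window that contains at least one hit for every query term. The Python sketch below shows that idea; it is one possible approach rather than the method suggested earlier, and `smallest_window` as well as the hand-computed 0-based positions are assumptions.

```python
import heapq

def smallest_window(hit_lists):
    """Return (start, end) of the narrowest span holding one hit per list.

    Each hit list must be sorted in ascending order. Returns None if any
    list is empty, since no window can cover that term.
    """
    if not hit_lists or any(not hits for hits in hit_lists):
        return None
    # Heap entries: (position, which list, index inside that list).
    heap = [(hits[0], i, 0) for i, hits in enumerate(hit_lists)]
    heapq.heapify(heap)
    current_max = max(hits[0] for hits in hit_lists)
    best = None
    while True:
        current_min, i, j = heap[0]
        if best is None or current_max - current_min < best[1] - best[0]:
            best = (current_min, current_max)
        if j + 1 == len(hit_lists[i]):
            # The list holding the minimum is exhausted: no narrower window.
            return best
        nxt = hit_lists[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heapreplace(heap, (nxt, i, j + 1))

java, island, language = [0, 8, 16, 25], [9], [2, 14, 23]
print(smallest_window([java, island, language]))
# (8, 14)
```

For the example this selects java at 8, island at 9 and language at 14. The window width (6 here) could then feed into a relevance score, for instance 1 / (1 + width), though that formula is only an illustrative assumption.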

Previously there was a suggestion to use the dot product; however, I am at a loss as to how to proceed with that option.

Any other solution would also be welcome.

I understand that this question is quite detailed. However, I have searched long and hard and haven't found any question like this on the topic.

I feel that if this question is answered, it will be a great addition to the community and will make anybody who is designing anything related to relevancy quite happy.

Thank you.

Recommended answer

OK, so I realize that I am answering my own question, and a bit late.

Anyone trying to calculate word proximity starting from an inverted index should take a look at this link: http://www.ardendertat.com/2011/05/31/how-to-implement-a-search-engine-part-2-query-index/
