Finding a user's most liked items from rating values (tags: most liked, ratings, items, user)

2023-09-11 07:13:24 Author: 你给的伤数不清

Let's assume that a user rates some movies on a scale of 1 to 5. These movies have genre information, and a movie can have more than one genre. For example:

Movie A Rating 4
Action/Sci-Fi

Movie B Rating 5
Comedy/Action

Movie C Rating 4
Comedy/Drama

We want to learn which genres our user likes. Here is our result set:

Genre    Movie_Count  Average_Rating
-------  -----------  --------------
Action   2            4.5
Comedy   2            4.5
Sci-Fi   1            4
Drama    1            4
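As a minimal sketch of how such a result set could be derived, the snippet below counts each movie once for every genre it belongs to and averages the ratings per genre. The `ratings` list and the `genre_stats` helper are illustrative names, not part of the original question.

```python
from collections import defaultdict

# The three example movies from the question: (title, rating, genres).
ratings = [
    ("Movie A", 4, ["Action", "Sci-Fi"]),
    ("Movie B", 5, ["Comedy", "Action"]),
    ("Movie C", 4, ["Comedy", "Drama"]),
]


def genre_stats(rows):
    """Return {genre: (movie_count, average_rating)}.

    A movie with several genres contributes its full rating to each of them.
    """
    totals = defaultdict(lambda: [0, 0.0])  # genre -> [count, sum of ratings]
    for _title, rating, genres in rows:
        for genre in genres:
            totals[genre][0] += 1
            totals[genre][1] += rating
    return {genre: (n, s / n) for genre, (n, s) in totals.items()}


stats = genre_stats(ratings)
# Print genres from highest to lowest average rating.
for genre, (count, avg) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
    print(f"{genre:<8}{count:>3}{avg:>6.2f}")
```

Counting a multi-genre movie fully toward every one of its genres is the simplest choice; it is also what makes it hard to say which genre actually earned the rating.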

Obviously, we cannot predict anything from such a small result set, but let us assume that we have a larger dataset.

Using this data, how can we rank this user's most liked genres? Should we simply calculate a weighted average, or use something more complex?

Recommended answer

The main problem I see here is:

The user rates 1000 comedy movies with an average score of 4.

The user rates 10 action movies with an average score of 4.1.

How do you order them?

See http://www.evanmiller.org/how-not-to-sort-by-average-rating.html for a discussion and one possible solution.
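The linked article ranks items with positive/negative votes by the lower bound of a Wilson score interval. For 1-to-5 ratings, one simple alternative (a swapped-in technique, not the article's formula) is a damped mean that pulls each genre's average toward a prior value, so that genres with few ratings cannot jump to the top on a handful of votes. The following is only a sketch: `damped_mean`, `prior_mean=3.0` and `prior_weight=20` are assumptions chosen for illustration.

```python
def damped_mean(count, average, prior_mean=3.0, prior_weight=20):
    """Shrink a genre's average toward prior_mean.

    prior_weight behaves like that many imaginary ratings at prior_mean,
    so a genre needs real evidence before its own average dominates.
    """
    return (prior_weight * prior_mean + count * average) / (prior_weight + count)


# The two cases from the answer: genre -> (number of ratings, average rating).
genres = {"Comedy": (1000, 4.0), "Action": (10, 4.1)}

ranked = sorted(genres, key=lambda g: damped_mean(*genres[g]), reverse=True)
print(ranked)
```

With these assumed prior values, Comedy ranks above Action even though its raw average is slightly lower, because 1000 ratings are much stronger evidence than 10. A larger `prior_weight` makes the ranking more conservative.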

Another problem is:

If a movie is both comedy and action and was given a rating of 4.0, how much of that rating is due to it being a comedy, and how much to it being an action movie?

You can solve this using expectation maximization: http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm .
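The answer does not say which model to fit, so the following is only one possible sketch. It assumes that each rating is explained by exactly one (hidden) genre of the movie, that ratings for a genre are normally distributed around that genre's mean with a fixed spread `sigma`, and that all genres of a movie are equally likely a priori. The function name `em_genre_means` and the toy data are hypothetical.

```python
import math


def em_genre_means(ratings, sigma=1.0, iterations=50):
    """ratings: list of (rating, [genres]).

    Returns (means, resp): the estimated mean rating per genre, and for each
    rating the share of responsibility assigned to each of its genres.
    """
    genres = sorted({g for _rating, gs in ratings for g in gs})
    overall = sum(rating for rating, _gs in ratings) / len(ratings)
    means = {g: overall for g in genres}  # start every genre at the overall mean
    resp = []
    for _ in range(iterations):
        # E-step: split each rating among its genres according to how well
        # each genre's current mean explains it (Gaussian likelihood).
        resp = []
        for rating, gs in ratings:
            weights = {
                g: math.exp(-((rating - means[g]) ** 2) / (2 * sigma ** 2))
                for g in gs
            }
            total = sum(weights.values())
            resp.append({g: w / total for g, w in weights.items()})
        # M-step: each genre mean becomes the responsibility-weighted average.
        for g in genres:
            num = sum(r[g] * rating for (rating, _gs), r in zip(ratings, resp) if g in r)
            den = sum(r[g] for r in resp if g in r)
            means[g] = num / den
    return means, resp


# Toy data: pure comedies rated 5, pure action movies rated 2,
# and one comedy/action movie rated 5.
data = [
    (5, ["Comedy"]), (5, ["Comedy"]),
    (2, ["Action"]), (2, ["Action"]),
    (5, ["Comedy", "Action"]),
]
means, resp = em_genre_means(data)
print(means)
print(resp[-1])  # how the mixed movie's rating is split between its genres
```

On this toy data the mixed movie's high rating is attributed almost entirely to Comedy, so the Action estimate stays close to 2 instead of being pulled up as it would be by a plain per-genre average.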

 