Matching two social media profiles

2023-09-11 05:48:07 Author: 可不可以,再爱我一次

How can you check whether two profiles from two different social media sites belong to the same person? What algorithms exist to accomplish this and to assign a weight measuring how good the match is?

Let's say that I have a profile from LinkedIn and another profile from Facebook. I know the properties of these two profiles. What algorithm can I implement to find the matching distance between these two profiles?

Thanks, Abhishek S.

Recommended answer

You can try machine learning algorithms, specifically classification.

For simplicity, let's assume you want a binary answer: yes or no (this can be improved later).

What you need to do:

1. Extract the features you have from the two profiles and combine them into a single instance representing the pair. This is the instance that needs to be classified.
2. Create a training set. A training set is a set of "instances" whose classification you already know (usually because you labeled them manually).
3. Run a classification algorithm on the training set. It will "guess" the classification of the unclassified instances you get later.
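As a rough illustration of steps 1 and 2, here is a minimal Python sketch using only the standard library. It assumes each profile is a dict with hypothetical fields "name", "location" and "employer"; the similarity measures (difflib's SequenceMatcher ratio, exact match, word-level Jaccard overlap) are illustrative choices, not something the answer prescribes.

```python
from difflib import SequenceMatcher


def _norm(value):
    """Lower-case and trim a profile field; a missing field becomes ''."""
    return (value or "").strip().lower()


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (0.0 if both are empty)."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def pair_features(p1, p2):
    """Step 1: combine two profiles into ONE instance, a list of per-field similarities.

    The field names "name", "location" and "employer" are hypothetical;
    use whichever attributes both sites actually expose.
    """
    name1, name2 = _norm(p1.get("name")), _norm(p2.get("name"))
    loc1, loc2 = _norm(p1.get("location")), _norm(p2.get("location"))
    # difflib returns 1.0 for two empty strings, so a missing name counts as no evidence.
    name_sim = SequenceMatcher(None, name1, name2).ratio() if name1 and name2 else 0.0
    return [
        name_sim,                                   # fuzzy name similarity in [0, 1]
        1.0 if loc1 and loc1 == loc2 else 0.0,      # exact location match (missing != match)
        jaccard(_norm(p1.get("employer")), _norm(p2.get("employer"))),  # word overlap
    ]


def build_training_set(labeled_pairs):
    """Step 2: turn hand-labeled (profile_a, profile_b, label) triples into
    (instance, label) pairs; label 1 = same person, 0 = different people."""
    return [(pair_features(a, b), label) for a, b, label in labeled_pairs]
```

For example, {"name": "Jane Doe", "location": "Berlin", "employer": "Acme Corp"} and {"name": "jane doe", "location": "berlin", "employer": "Acme"} become the instance [1.0, 1.0, 0.5]. Each instance is a short list of numbers in [0, 1], which any of the classifiers listed in the answer can consume.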

Some algorithms you might want to use:

- SVM: considered by many to be the best classification algorithm available today.
- Decision trees, especially C4.5: a very intuitive (human-readable!) classifier that is simple to use and also has a very short classification time.
- K-nearest neighbors: intuitive and simple to use, but performs poorly when the number of features is large.

You can also use cross-validation to evaluate how good your results are.

For Java, there is an open-source project called Weka that implements these classification algorithms and more.
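For a sense of what the k-nearest-neighbor classifier and cross-validation do under the hood, here is a minimal sketch in plain Python (standard library only). It is not a substitute for Weka. It assumes each instance is a list of numeric per-field similarity scores labeled 1 (same person) or 0 (different people). Using the fraction of "same person" votes among the k neighbors as the match weight is one simple illustrative choice that turns the binary answer into a score.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_score(train, x, k=3):
    """Fraction of the k nearest training instances labeled 1 (same person).

    This doubles as a match weight in [0, 1].
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], x))[:k]
    return sum(label for _, label in neighbors) / len(neighbors)


def knn_predict(train, x, k=3):
    """Binary yes/no answer: 1 if a strict majority of neighbors say 'same person'."""
    return 1 if knn_score(train, x, k) > 0.5 else 0


def cross_validate(data, folds=5, k=3, seed=0):
    """Mean accuracy of k-NN over `folds` held-out splits (assumes len(data) >= folds)."""
    data = data[:]                       # shuffle a copy, not the caller's list
    random.Random(seed).shuffle(data)
    accuracies = []
    for f in range(folds):
        test = data[f::folds]            # every folds-th item is held out
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With a training set such as [([1.0, 1.0, 1.0], 1), ([0.9, 1.0, 0.5], 1), ([0.95, 0.0, 1.0], 1), ([0.1, 0.0, 0.0], 0), ([0.2, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 0)], the instance [1.0, 1.0, 0.8] gets a score of 1.0 with k=3. An odd k avoids ties in the vote, and because Euclidean distance weighs all features equally, features on very different scales should be normalized first.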