Hashing and encryption techniques for a huge dataset containing phone numbers

2023-09-11 23:03:48 Author: 你的名字写到顺手念到顺口

Problem description: I am working with a highly sensitive dataset that contains people's phone numbers as one of its columns. I need to apply an encryption or hash function to convert them to encoded values before doing my analysis. A one-way hash is fine: after processing the encoded data, we won't convert the values back to the original phone numbers. Essentially, I am looking for an anonymizer that takes phone numbers and converts them to random-looking values that I can process instead. Please suggest the best way to go about this. Recommendations on the best algorithms to use are welcome.

Update: size of the dataset. My dataset is really huge, on the order of hundreds of GB.

Update: sensitivity. By sensitive, I mean that the phone numbers themselves should not be part of our analysis. So basically I need a one-way hash function, but without collisions: each phone number should map to a unique value, and two phone numbers should never map to the same value.

Update: implementation

Thanks for your answers. I am looking for a more detailed implementation. I was going through Python's hashlib library for hashing. Does it necessarily perform the same set of steps that you suggested? Here is the link

Can you give me some example code for this process, preferably in Python?

Recommended answer

Generate a key for your data set (16 or 32 bytes) and keep it secret. Compute HMAC-SHA1 over each phone number with this key and base64-encode the result. That gives you one random-looking string per phone number that is unique for all practical purposes and isn't reversible without the key.
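Since the question mentions Python's hashlib, here is a minimal sketch of the same idea using only the standard library (`hmac`, `hashlib`, `base64`, `secrets`). The function name `anonymize_phone` and the in-memory `SECRET_KEY` are assumptions for illustration. In practice the key would be generated once, stored somewhere safe, and reused for the whole dataset, otherwise the same phone number would map to different tokens between runs.

```python
import base64
import hashlib
import hmac
import secrets

# Hypothetical key for illustration: 32 random bytes.
# Generate it once, keep it secret, and reuse it for every run.
SECRET_KEY = secrets.token_bytes(32)

def anonymize_phone(phone_num: str, key: bytes = SECRET_KEY) -> str:
    """Return the base64-encoded HMAC-SHA1 of a phone number."""
    digest = hmac.new(key, phone_num.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

With a fixed key the mapping is deterministic, so the tokens can still be used for grouping and joins. It may be worth normalizing the numbers to a single format first, since differently formatted spellings of the same number would otherwise produce different tokens.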

Example (HMAC-SHA1 with a 256-bit key) using Keyczar:

Create a random key:

$> python keyczart.py create --location=path_to_key_set --purpose=sign
$> python keyczart.py addkey --location=path_to_key_set --status=primary

Anonymize a phone number:

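from keyczar import keyczar

# Load the key set once; re-reading it on every call is wasteful on a large dataset.
signer = keyczar.Signer.Read("path_to_key_set")

def anonymize(phone_num):
  # The signature produced with the secret key serves as the anonymized token.
  return signer.Sign(phone_num)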
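Because the dataset is hundreds of GB, it probably makes sense to stream it row by row rather than load it into memory. The sketch below assumes the data is a CSV file with a header row and a column named `phone`; the function name `anonymize_column`, the column name, and the file layout are all hypothetical. It uses the standard library's HMAC-SHA1 instead of Keyczar so that it is self-contained.

```python
import base64
import csv
import hashlib
import hmac

def anonymize_column(src, dst, key, column="phone"):
    """Copy CSV rows from src to dst, replacing `column` with its HMAC-SHA1 token.

    src and dst are open text-mode file objects; key is the secret key as bytes.
    Rows are processed one at a time, so memory use stays constant.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        mac = hmac.new(key, row[column].encode("utf-8"), hashlib.sha1).digest()
        row[column] = base64.b64encode(mac).decode("ascii")
        writer.writerow(row)
```

A call might look like `anonymize_column(open("in.csv", newline=""), open("out.csv", "w", newline=""), key)`, where the file names are placeholders. Since each row is independent, the input could also be split into chunks and processed in parallel with the same key.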