How do I generate a unique hash for a URL?

2023-09-11 03:23:08 Author: 挖个坑埋了你

Given these two images from Twitter:

https://m.xsw88.com/allimgs/daicuo/20230911/2604.png.jpg
https://m.xsw88.com/allimgs/daicuo/20230911/2605.png.jpg

I want to download them to the local filesystem and store them in a single directory. How can I avoid name conflicts?

In the example above, I cannot store both of them as *lowres_profilepic.jpg*. My design idea is to treat the URLs as opaque strings except for the last segment. What algorithm (implemented as f) can I use to hash the prefixes into unique strings?

f( "http://a3.twimg.com/profile_images/130500759/" ) = 6tgjsdjfjdhgf
f( "http://a1.twimg.com/profile_images/58079916/" )  = iuhd87ysdfhdk

That way, I can save the files as:

6tgjsdjfjdhgf_lowres_profilepic.jpg
iuhd87ysdfhdk_lowres_profilepic.jpg

I don't want a cryptographic algorithm, as this needs to be a performant operation.
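One possible shape for f, sketched in Python under stated assumptions: it uses 64-bit FNV-1a, a well-known non-cryptographic hash, which the question does not name. The helper names `fnv1a_64` and `local_name` are hypothetical, and the full URLs in the usage lines are reconstructed from the prefixes and file name given above. No fixed-width hash can guarantee unique output, so collisions are only unlikely, not impossible.

```python
# Constants from the published 64-bit FNV-1a definition.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data (non-cryptographic)."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte                          # mix in the next byte
        h = (h * FNV_PRIME_64) & MASK_64   # multiply, keep the low 64 bits
    return h


def f(prefix: str) -> str:
    """Hash a URL prefix into a fixed-width, filename-safe hex string."""
    return format(fnv1a_64(prefix.encode("utf-8")), "016x")


def local_name(url: str) -> str:
    """Build '<hash of prefix>_<last segment>' for a URL (hypothetical helper)."""
    prefix, _, last_segment = url.rpartition("/")
    return f"{f(prefix + '/')}_{last_segment}"


name_a = local_name("http://a3.twimg.com/profile_images/130500759/lowres_profilepic.jpg")
name_b = local_name("http://a1.twimg.com/profile_images/58079916/lowres_profilepic.jpg")
print(name_a)
print(name_b)
```

A pure-Python loop like this is not fast in itself; if speed matters in Python specifically, a C-implemented function such as `zlib.crc32` (32 bits, so more collision-prone) may be a better fit.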

Recommended answer

Irrespective of how you do it (hashing, encoding, database lookup), I recommend that you don't try to map a huge number of URLs to files in one big flat directory.

The reason is that file lookup on most file systems involves a linear scan through the filenames in a directory. So if all N of your files are in one directory, a lookup will involve about N/2 comparisons on average, i.e. O(N). (Note that ReiserFS organizes the names in a directory as a B-tree. However, ReiserFS seems to be the exception rather than the rule.)

Instead of one big flat directory, it would be better to map the URIs to a tree of directories. Depending on the shape of the tree, lookup can be as good as O(log N). For example, if you organized the tree so that it had 3 levels of directory with at most 100 entries in each directory, you could accommodate 1 million URLs (100 × 100 × 100). If you designed the mapping to use 2-character filenames, each directory should easily fit into a single disk block, and a pathname lookup (assuming that the required directories are already cached) should take a few microseconds.
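A minimal Python sketch of such a mapping, under stated assumptions: the hash is `zlib.crc32` (my choice for brevity, not something the answer specifies), the function name `shard_path` and the root directory `images` are hypothetical, and the two directory levels use 2-digit decimal names so that each holds at most 100 entries. Because files are spread by hash, the leaf directories hold about the same number of files on average rather than a guaranteed maximum of 100.

```python
import zlib
from pathlib import PurePosixPath


def shard_path(url: str, root: str = "images") -> PurePosixPath:
    """Map a URL to root/NN/NN/<hash>_<last segment> (hypothetical layout)."""
    prefix, _, last_segment = url.rpartition("/")
    h = zlib.crc32(prefix.encode("utf-8"))   # unsigned 32-bit integer

    level1 = f"{h % 100:02d}"                # first directory level: 00..99
    level2 = f"{(h // 100) % 100:02d}"       # second directory level: 00..99

    # Keep the hash in the filename so that equal last segments from
    # different prefixes do not overwrite each other in the same leaf.
    return PurePosixPath(root) / level1 / level2 / f"{h:08x}_{last_segment}"


path = shard_path("http://a3.twimg.com/profile_images/130500759/lowres_profilepic.jpg")
print(path)
```

Before writing a file, the parent directories would need to exist, for example via `os.makedirs(parent, exist_ok=True)`.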