How evenly spread are the first four bytes of a GUID created in .NET?

2023-09-03 04:02:58 Author: 无奈我和你 写不出结局つ

There is a good deal of information about GUIDs on the net and on Stack Overflow, including endless questions about uniqueness. This is not a question about 2^128 uniqueness.

My question is just how random the first section, specifically the first four bytes, of a GUID is in .NET. Based on my research, it is supposedly the least significant 32 bits of a timestamp. But how is the timestamp converted? Just how random is this?

Does anybody know how the first section is constructed by .NET, and whether it is truly evenly spread across the 4 bytes?

How is the timestamp used to build the first 32 bits?

How does clock precision affect it?

Did Microsoft make any attempt to ensure that the first 4 bytes tend toward random?

WHY: High-volume GUID use has 2 main business cases for good randomness in the first 4 bytes.

The first is table partitioning. If each new GUID is evenly spread, you can partition a table on the first 1, 2, 3 or 4 bytes, depending on how many partitions you need. I have seen a 2 billion row table with 10 million inserts a day that used 128 partitions with the first 2 bytes as the partition key. Note that under DB2 the first part of the key had to be used, according to the DB2 DBA. This greatly improved throughput on the database.

The second is parallel key allocation for batch jobs. If you know a batch task has approximately N rows, you can allocate key ranges to parallel jobs. Without a homogeneous split, the dispatcher must first calculate the from and to keys for each job. If that means reading 100 million keys and managing them in memory just to dispatch work, the first x minutes are lost to job dispatch. In the example I have seen it was around 15 minutes.

So there are 2 excellent reasons to want evenly spread GUIDs.
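
Both use cases can be sketched in a few lines. This is only an illustration under stated assumptions: the names `GuidPartitioning`, `Prefix16`, `PartitionOf` and `RangeStart` are hypothetical, the prefix is read from `Guid.ToByteArray()`, and the buckets are simple equal-width slices of the prefix range. A real database would apply its own partitioning rules to however the key is stored.

```csharp
using System;

public static class GuidPartitioning
{
    // Reads the first two bytes of the GUID's byte array as a 16-bit prefix.
    // Guid.ToByteArray() stores the first group little-endian, so these are
    // not the first two hex pairs of the string form.
    public static int Prefix16(Guid g)
    {
        byte[] b = g.ToByteArray();
        return (b[0] << 8) | b[1];
    }

    // Maps the 16-bit prefix (0..65535) onto one of partitionCount
    // equal-width buckets. Intended for partitionCount up to a few thousand.
    public static int PartitionOf(Guid g, int partitionCount)
    {
        return Prefix16(g) * partitionCount / 65536;
    }

    // Inclusive lower bound of the 32-bit prefix range for a batch job,
    // where job is in 0..jobCount-1. No rows need to be read to compute it.
    public static uint RangeStart(int job, int jobCount)
    {
        return (uint)((long)job * 4294967296L / jobCount);
    }

    public static void Main()
    {
        Guid g = Guid.NewGuid();
        Console.WriteLine("{0} -> partition {1} of 128", g, PartitionOf(g, 128));
        for (int job = 0; job < 4; job++)
            Console.WriteLine("job {0} starts at prefix {1:X8}", job, RangeStart(job, 4));
    }
}
```

The buckets and ranges only carry similar row counts if the prefix bytes really are evenly spread, which is exactly what the question is asking about.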

The SAP banking system actually introduced a custom GUID routine to resolve the lack of randomness in the first section of the GUID. For those with access to an SAP banking system, the function is BANK_DISTRIBUTED_ID_CREATE; the comments in the code explain why they did it. For those with access to SAP support, note 496904 explains why they saw it as necessary to fix GUIDs.

Prior to the custom routine there were clear skews in the GUIDs under AIX (C++ kernel). Unique, yes, but random, especially in the first section, clearly not.

Update: I decided to write a program to investigate, running .NET 4 on Windows XP on a Dell with an Intel Core 2 Duo.

I have included the test program results in case they are of interest. The GUIDs were generated using

var G = Guid.NewGuid();

The results look OK on a sample of 100,000,000 GUIDs (a larger set is still running). For my purposes, that looks evenly spread enough to assume it is OK.

Byte 0: with Value 6A was least frequent : 389140 times
Byte 0: with Value 58 was most  frequent : 392241 times
Byte 1: with Value 25 was least frequent : 388905 times
Byte 1: with Value B3 was most  frequent : 392552 times
Byte 2: with Value D2 was least frequent : 389114 times
Byte 2: with Value CC was most  frequent : 391984 times
Byte 3: with Value 66 was least frequent : 388744 times
Byte 3: with Value 16 was most  frequent : 392838 times
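
The test program itself was not posted. A minimal sketch of how figures like these could be collected follows, assuming the bytes are taken from `Guid.ToByteArray()`; the class name `GuidByteHistogram`, the method `Count` and the smaller sample size are assumptions for illustration, not the original code.

```csharp
using System;

public static class GuidByteHistogram
{
    // counts[i, v] = how often byte position i (0..3) held value v (0..255).
    public static long[,] Count(int samples)
    {
        var counts = new long[4, 256];
        for (int n = 0; n < samples; n++)
        {
            byte[] b = Guid.NewGuid().ToByteArray();
            for (int i = 0; i < 4; i++)
                counts[i, b[i]]++;
        }
        return counts;
    }

    public static void Main()
    {
        const int samples = 1000000; // the post used 100,000,000
        long[,] counts = Count(samples);
        for (int i = 0; i < 4; i++)
        {
            // Find the least and most frequent value for this byte position.
            int min = 0, max = 0;
            for (int v = 1; v < 256; v++)
            {
                if (counts[i, v] < counts[i, min]) min = v;
                if (counts[i, v] > counts[i, max]) max = v;
            }
            Console.WriteLine("Byte {0}: with Value {1:X2} was least frequent : {2} times", i, min, counts[i, min]);
            Console.WriteLine("Byte {0}: with Value {1:X2} was most  frequent : {2} times", i, max, counts[i, max]);
        }
    }
}
```

For reference, a perfectly uniform byte would give each of the 256 values an expected 100,000,000 / 256 = 390,625 hits, with a standard deviation of roughly 624. The extremes reported above sit about 3 to 3.5 standard deviations from that mean, which is the kind of spread one would expect for the minimum and maximum of 256 values. This is a rough sanity check, not a formal test.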

Edit: background added based on comments

I have seen samples of GUIDs on an AIX system. We have over 2 billion already. They are NOT evenly spread. There are noticeable skews in the 2 bytes. As a result, a special routine was introduced to generate homogeneous GUIDs. I was wondering whether .NET has a similar skew.

Recommended answer

The GUIDs appear to be evenly spread. Tests on 1 billion GUIDs look good when considering the first 4 bytes. This means they are useful for partitioning, and key ranges can be roughly deduced rather than read from the database.
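
One possible explanation for the even spread is the GUID version. The timestamp layout mentioned in the question belongs to version 1 GUIDs as defined in RFC 4122, whereas version 4 GUIDs fill the first 32 bits with random data. The sketch below reads the version field so this can be checked on a given system; `GuidVersion` and `VersionOf` are hypothetical names. On Windows I would expect `Guid.NewGuid()` to report version 4, but treat that as an assumption to verify on your own platform rather than a guarantee.

```csharp
using System;

public static class GuidVersion
{
    // RFC 4122 keeps the version in the top 4 bits of the third group.
    // That group is stored little-endian by Guid.ToByteArray(), so the
    // version nibble ends up in byte 7.
    public static int VersionOf(Guid g)
    {
        return (g.ToByteArray()[7] >> 4) & 0x0F;
    }

    public static void Main()
    {
        Guid g = Guid.NewGuid();
        Console.WriteLine("{0} has version {1}", g, VersionOf(g));
    }
}
```

In the string form the same digit is the first character of the third group, so it can also be checked by eye.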