Space-efficient way to encode numbers as sortable strings

2023-09-11 06:27:30 Author: 我只想要有你在的未来

Starting with a list of integers, the task is to convert each integer into a string such that the resulting list of strings will be in numeric order when sorted lexicographically.

This is needed so that a particular system that is only capable of sorting strings will produce an output that is in numeric order.

For example, given the integers

1, 23, 3

we could convert them to strings like this:

"01", "23", "03"

so that when sorted they become:

"01", "03", "23"

which is correct. A wrong result would be:

"1", "23", "3"

because that list is sorted in "string order", not in numeric order.

I'm looking for something more efficient than the simple zero-padding scheme. In order to cover all possible 32-bit integers we'd need to pad to 10 digits, which is inefficient.
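
To make the problem concrete, here is a small Python sketch (not part of the question) contrasting plain str() conversion with the zero-padding baseline. It assumes unsigned 32-bit inputs, so the pad width is the 10 digits of 2**32 - 1; the function names are hypothetical.

```python
# Sketch: why str() alone fails, and the zero-padding baseline from the question.
# Assumption: inputs are unsigned 32-bit integers (0 .. 2**32 - 1).

MAX_U32 = 2**32 - 1
WIDTH = len(str(MAX_U32))  # 4294967295 has 10 digits


def naive_key(n: int) -> str:
    """Plain decimal string: sorts in "string order", not numeric order."""
    return str(n)


def zero_pad_key(n: int) -> str:
    """Left-pad with zeros to a fixed width so string order equals numeric order."""
    if not 0 <= n <= MAX_U32:
        raise ValueError("expected an unsigned 32-bit integer")
    return str(n).zfill(WIDTH)


if __name__ == "__main__":
    numbers = [1, 23, 3]
    print(sorted(naive_key(n) for n in numbers))
    # ['1', '23', '3']
    print(sorted(zero_pad_key(n) for n in numbers))
    # ['0000000001', '0000000003', '0000000023']
```

Every padded key costs 10 characters regardless of the value, which is the overhead the question wants to avoid.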

Recommended answer

TL;DR

Encode digits according to their order of magnitude (OM), and encode other characters so they sort as desired relative to numbers: jj-a123 would be encoded as zjzjz-zaC1B2A3

This would depend somewhat upon the sorting algorithm that will finally be used to sort and how one would want any given punctuation characters to be sorted in relation to letters and numbers, but if it's "ascii-betical" or similar, you could encode each digit of a number to represent its order of magnitude (OM) in the number, while encoding other characters such that they would sort according to your desired sort order.

For simplicity, I would suggest beginning with encoding every non-numeric character with a "high" value (e.g. lower case z or even ~ if final value is ASCII), so that it sorts after encoded digits. Then cache each digit encountered until another non-numeric is encountered, then encode each cached digit with a value representing its OM. If the number 12945 was encountered in between non-numerics, you would output an E to encode an OM of 5, then the digit that is that order of magnitude, 1, followed by the next OM of 4 (D) and its associated digit, 2. Continue until all numeric digits have been flushed, then continue with non-numerics.
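
The steps above can be sketched in Python as follows. This is an illustrative sketch, not the answerer's code: it assumes only the ASCII digits 0-9 count as numeric, uses A, B, C, ... as OM letters (capped below z so they never collide with the non-numeric prefix), and the name encode is hypothetical.

```python
# Minimal sketch of the order-of-magnitude (OM) encoding described above.
# Assumptions: only ASCII digits 0-9 are numeric, OM letters run "A", "B", "C", ...,
# and every other character is prefixed with "z" so it sorts after encoded digits.

OM_BASE = "A"        # OM 1 (the units digit) is "A", OM 2 is "B", ...
NON_DIGIT = "z"      # "high" prefix for non-numeric characters
MAX_OM = ord(NON_DIGIT) - ord(OM_BASE)  # highest OM whose letter is still below "z"


def encode(text: str) -> str:
    out: list[str] = []
    digits: list[str] = []  # cache of consecutive digits not yet written

    def flush() -> None:
        n = len(digits)
        if n > MAX_OM:
            raise ValueError(f"number with {n} digits exceeds the OM range")
        for i, d in enumerate(digits):
            om = n - i  # the leftmost digit has the highest order of magnitude
            out.append(chr(ord(OM_BASE) + om - 1) + d)
        digits.clear()

    for ch in text:
        if "0" <= ch <= "9":
            digits.append(ch)       # keep caching until the number ends
        else:
            flush()                 # a non-numeric ends the current number
            out.append(NON_DIGIT + ch)
    flush()                         # the text may end with a number
    return "".join(out)
```

With this sketch, encode("12945") returns E1D2C9B4A5 and encode("jj-a123") returns zjzjz-zaC1B2A3, matching the TL;DR. Sorting the question's integers with key=lambda n: encode(str(n)) gives 1, 3, 23.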

Non-numerics would be treated individually and ranked relative to the OM of digits. If it is desired for them to sort "above" numbers (perhaps the space character or certain others deemed special) they would be encoded by prepending a low-value character (like the space character, if final value will be treated and sorted as ASCII). When/if another numeric is encountered, begin caching and encode according to OM once all consecutive numerics are cached.
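
A sketch of that refinement, assuming a hypothetical sort_first parameter that names the "special" characters. Those get a space prefix (0x20), which is lower than every OM letter, so they sort before numbers; all other non-numerics keep the high z prefix.

```python
# Sketch: OM encoding in which selected characters sort *before* numbers.
# Assumption: sort_first is a hypothetical parameter listing those characters; they get
# a space (0x20) prefix, which is lower than every OM letter, while all other
# non-numerics keep the high "z" prefix.

def encode(text: str, sort_first: frozenset[str] = frozenset()) -> str:
    out: list[str] = []
    digits: list[str] = []  # consecutive digits waiting to be written

    def flush() -> None:
        n = len(digits)
        if n > ord("z") - ord("A"):
            raise ValueError("too many digits for OM letters below 'z'")
        for i, d in enumerate(digits):
            out.append(chr(ord("A") + n - i - 1) + d)  # OM letter, then the digit
        digits.clear()

    for ch in text:
        if "0" <= ch <= "9":
            digits.append(ch)
        else:
            flush()
            prefix = " " if ch in sort_first else "z"  # low vs. high category
            out.append(prefix + ch)
    flush()
    return "".join(out)
```

For example, "a b" encodes to "zaz zb" by default but to "za  zb" with sort_first=frozenset({" "}), which moves it ahead of "a1" (encoded "zaA1").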

Certain levels of additional filtering - or even translation - could be applied. If one wanted to allow accurate sorting based upon Roman numerals, one could encode them as decimal (or even hexadecimal) numbers with an appropriate OM.
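
One way the Roman-numeral translation might look, as a sketch under stated assumptions: it only handles a token already known to be an upper-case Roman numeral (spotting numerals inside free text, where "I" may be a pronoun, is left open), uses standard subtractive notation without checking canonical form, and the function names are hypothetical.

```python
# Sketch: translate a Roman numeral to an integer, then give it the usual OM encoding.
# Assumptions: upper-case numerals only, standard subtractive notation, and no check
# that the numeral is written in canonical form (e.g. "IIII" is accepted as 4).

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    if not numeral or any(ch not in ROMAN for ch in numeral):
        raise ValueError(f"not a Roman numeral: {numeral!r}")
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller value before a larger one is subtracted (IV = 4, XC = 90).
        if i + 1 < len(numeral) and value < ROMAN[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def om_encode_int(n: int) -> str:
    """OM-encode a non-negative integer: "A" marks the units digit, "B" the tens, ..."""
    digits = str(n)
    k = len(digits)
    return "".join(chr(ord("A") + k - 1 - i) + d for i, d in enumerate(digits))


def encode_roman(numeral: str) -> str:
    return om_encode_int(roman_to_int(numeral))
```

Sorted as plain text, "C" (100) would land before "II" (2); after translation, encode_roman("XII") is B1A2 and the numerals sort by value.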

Treating decimal points (either periods or commas, depending) as actual decimal separators, distinct from other punctuation, would probably be beyond the true utility of this encoding scheme, as alphanumeric fields seldom use a period or comma as a decimal separator. If it is desired to use them that way, the algorithm would simply detect a decimal separator (either a period or a comma as appropriate, between digits) and encode the numeric portion after that separator as nothing but normal text. Fractional portions are actually sorted correctly during a normal ASCII-based sort, because more digits represent greater precision, not greater magnitude.
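
A sketch of that optional decimal mode. Assumptions not taken from the answer: separators is a hypothetical parameter, a separator counts as decimal only when it directly follows an integer part and precedes another digit, the digits after it get the ordinary z prefix like any other text, and the OM range check is omitted for brevity.

```python
# Sketch: OM encoding with optional decimal-separator handling.
# Assumptions: separators is a hypothetical parameter (e.g. "." or ","); a separator counts
# as decimal only when it sits between an integer part and another digit, and the digits
# after it are encoded like any other text ("z" prefix) instead of receiving OM letters.

def encode(text: str, separators: str = ".") -> str:
    out: list[str] = []
    digits: list[str] = []
    in_fraction = False  # True while emitting digits that follow a decimal separator

    def flush() -> None:
        n = len(digits)
        for i, d in enumerate(digits):
            out.append(chr(ord("A") + n - i - 1) + d)
        digits.clear()

    for i, ch in enumerate(text):
        is_digit = "0" <= ch <= "9"
        if is_digit and not in_fraction:
            digits.append(ch)
            continue
        next_is_digit = i + 1 < len(text) and "0" <= text[i + 1] <= "9"
        if ch in separators and digits and next_is_digit:
            flush()
            in_fraction = True   # the fractional part starts after this separator
        elif not is_digit:
            flush()
            in_fraction = False  # any other character ends the number
        out.append("z" + ch)     # separator, fractional digits and text: plain "z" prefix
    flush()
    return "".join(out)
```

Here x100.23 (zxC1B0A0z.z2z3) sorts before x100.5 (zxC1B0A0z.z5). In the table below, where the period is treated as ordinary text, x100.5 instead sorts before x100.23.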

non-encoded                 encoded
-----------                 -------
12345                       E1D2C3B4A5
a100                        zaC1B0A0
a20                         zaB2A0
a2000                       zaD2C0B0A0
x100.5                      zxC1B0A0z.A5
x100.23                     zxC1B0A0z.B2A3
1, 23, 3                    A1z,z B2A3z,z A3
1, 2, 3                     A1z,z A2z,z A3
1,2,3                       A1z,A2z,A3

Potential advantages

Going somewhat beyond simple numeric sorting, some advantages to this encoding method would be several aspects of flexibility with final effective sort order - you are essentially encoding a category for each character - digits get a category based upon their position within the greater string of digits known as a number, while other characters are simply told to sort in their normal way (e.g. ASCII), but after numbers. Any exceptions that should sort before numbers or in other orders would be in one or more additional categories. ASCII can effectively be re-encoded to sort in a non-ASCII way:

You could encode lower case letters to sort before or along with upper case letters. To make lower case sort before upper case, you encode lower case letters with a y and upper case letters with a z. For a pseudo-case-insensitive sort, categorizing both A and a with the same encoding character would sort both of them before B and b, though A would nonetheless always sort before a. If you want Extended ASCII characters (e.g. with diacritics) to sort along with their ASCII cousins, you encode À, Á, Â, Ã, Ä, Å, and Æ along with A by using an a as the category character, encode B with a b, C and Ç with a c, and E, È, É, Ê, and Ë with an e, etc. The same intra-category sort order caveat still applies, and some decisions need to be made on characters like capital Eth, and to a certain extent others like Thorn and Sharp S (Ð, Þ, and ß respectively), as to whether they will sort based on similarities in appearance or pronunciation, or instead, perhaps more properly, on alphabetical order.
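
A sketch of such letter categories, under assumptions that are mine rather than the answer's: the category character is the lower-case base letter obtained from Unicode NFD decomposition, Æ/æ are mapped to a by hand, and letters without a decomposition (Ð, Þ, ß) simply keep their own lower-case form as the category. Digits are not handled here; they would still go through the OM step. Because the tie-breaker is interleaved per character, the case of an earlier letter still outranks later letters ("Ab" sorts before "aa"), which is the "pseudo" in pseudo-case-insensitive.

```python
# Sketch: a "category" prefix per letter so accented and case variants sort together.
# Assumptions (not from the answer): the category is the lower-case base letter from
# Unicode NFD decomposition, Æ/æ are mapped to "a" by hand, and letters with no
# decomposition (e.g. Ð, Þ, ß) keep their own lower-case form as the category.
import unicodedata

OVERRIDES = {"\u00c6": "a", "\u00e6": "a"}  # Æ, æ: hypothetical hand-picked mapping


def letter_category(ch: str) -> str:
    if ch in OVERRIDES:
        return OVERRIDES[ch]
    base = unicodedata.normalize("NFD", ch)[0]  # "É" -> "E" + combining accent
    return base.lower()[0]


def encode_word(word: str) -> str:
    """Category character first, then the original letter as a tie-breaker."""
    if not word.isalpha():
        raise ValueError("this sketch only handles letters")
    return "".join(letter_category(ch) + ch for ch in word)
```

With this key, "Apple" sorts before "apple", and "eclair" before "Éclair", but both spellings of each word stay next to each other.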

Though this allows many 'categories' of characters to be defined, be sure to remember that each order of magnitude for digits is its own category - you need to know that the data will not contain numbers that are greater in OM than approximately 250, depending upon how many other categories you wish to define (ASCII 0 is reserved as the string terminator, and there needs to be at least one other character to indicate "not a digit" - at least for alphanumeric data - making the maximum perhaps 254 orders of magnitude), but that should be plenty for any situation I can imagine. I'm not sure what other issues quantum computing will bring about, but there's probably a quantum solution to it, whatever it is.
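
The 254 figure follows from byte arithmetic: 256 byte values, minus 0 for the terminator, minus one "not a digit" marker. A byte-oriented sketch under those assumptions (my choices, not the answer's): OM bytes 1..254, byte 0xFF as the single non-digit prefix, other characters stored as UTF-8 (whose byte order matches code point order). The name encode_bytes is hypothetical.

```python
# Sketch: byte-oriented OM encoding. Assumptions (not from the answer): OM bytes 1..254,
# byte 0xFF as the single "not a digit" prefix, other characters stored as UTF-8.
# Byte 0 is never produced, so the result can live in a NUL-terminated string.

MAX_OM = 254
NOT_A_DIGIT = 0xFF


def encode_bytes(text: str) -> bytes:
    out = bytearray()
    digits: list[str] = []

    def flush() -> None:
        n = len(digits)
        if n > MAX_OM:
            raise ValueError(f"{n}-digit number exceeds {MAX_OM} orders of magnitude")
        for i, d in enumerate(digits):
            out.append(n - i)   # OM byte: n for the leading digit, 1 for the units digit
            out.append(ord(d))  # the digit itself, b"0".."9"
        digits.clear()

    for ch in text:
        if ch == "\x00":
            raise ValueError("NUL is reserved as the string terminator")
        if "0" <= ch <= "9":
            digits.append(ch)
        else:
            flush()
            out.append(NOT_A_DIGIT)
            out += ch.encode("utf-8")
    flush()
    return bytes(out)
```

A 254-digit number still encodes (its first byte is 254); a 255-digit one raises ValueError.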

Finally, if the hyphen is encoded as a non-numeric character, and all non-numerics are encoded with a higher OM than digits, negative numbers would be encoded as greater than any positive number. If negative numbers need to sort before positive ones, the hyphen should be encoded with a value lower than any digit OM (perhaps only when it precedes a digit). Even then, negative numbers would sort among themselves by ascending magnitude (-5 before -10), so fully numeric order would additionally require inverting their encoding.
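
For the question's actual case (plain integers), a sketch that applies the hyphen advice and goes one step further. The extra step is an extension, not the answer's method: a negative number's OM letters count down from Z and each digit d becomes 9 - d, so larger magnitudes sort first. NEG_PREFIX = "!" is an arbitrary choice of a character below "A", and encode_int is a hypothetical name.

```python
# Sketch: integer-only OM keys that also order negative numbers correctly.
# Assumptions: NEG_PREFIX is an arbitrary character below "A"; mirroring the OM letters
# and complementing the digits of negative numbers is an extension, not from the answer.

NEG_PREFIX = "!"
MAX_DIGITS = 26  # one OM letter per digit position, "A".."Z"


def encode_int(n: int) -> str:
    digits = str(abs(n))
    k = len(digits)
    if k > MAX_DIGITS:
        raise ValueError(f"{k} digits exceed the {MAX_DIGITS} available OM letters")
    if n >= 0:
        # Leading digit gets the k-th letter, the units digit gets "A".
        return "".join(chr(ord("A") + k - 1 - i) + d for i, d in enumerate(digits))
    # Negative: the leading digit of a k-digit number gets the k-th letter counting
    # down from "Z", and each digit d becomes 9 - d, so bigger magnitudes sort first.
    return NEG_PREFIX + "".join(
        chr(ord("Z") - (k - 1 - i)) + str(9 - int(d)) for i, d in enumerate(digits)
    )
```

For instance, encode_int(-10) is !Y8Z9, encode_int(-5) is !Z4 and encode_int(3) is A3, so the keys sort as -10, -5, 3.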