Compressing 21 alphanumeric characters into 16 bytes

2023-09-11 00:02:33 Author: 浅如初夏

I'm trying to take 21 bytes of data that uniquely identify a trade and store them in a 16-byte char array. I'm having trouble coming up with the right algorithm for this.

The trade ID I'm trying to compress consists of two fields:

- 18 alphanumeric characters consisting of the ASCII characters 0x20 to 0x7E, inclusive (32-126)
- A 3-character numeric string, "000" to "999"

So a C++ class that would encompass this data looks like this:

class ID
{
public:
    char trade_num_[18];  // 18 characters, ASCII 0x20-0x7E (not only digits)
    char broker_[3];      // numeric string "000" to "999"
};

This data needs to be stored in a 16-char data structure, which looks like this:

class Compressed
{
public:
    char sku_[16];    
};

I tried to take advantage of the fact that the characters in trade_num_ are only 0-127, so there is 1 unused bit in each character. Similarly, 999 in binary is 1111100111, which is only 10 bits, 6 bits short of a 2-byte word. But when I work out how far I can squeeze this down, the smallest I can make it is 17 bytes: one byte too big.
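The bit count behind that 17-byte result can be checked directly. A minimal sketch, assuming 7 bits per character and 10 bits for the 000-999 field, as described above:

```python
import math

CHAR_BITS = 7                    # characters 0-127 need 7 bits each
NUM_BITS = (999).bit_length()    # 999 = 0b1111100111 needs 10 bits
TRADE_CHARS = 18

total_bits = TRADE_CHARS * CHAR_BITS + NUM_BITS   # 126 + 10 = 136 bits
total_bytes = math.ceil(total_bits / 8)           # 136 / 8 = 17 bytes

print(total_bits, total_bytes)   # 136 17
```

Packing each field on its own bit boundary wastes the fractional bits, which is why this naive layout cannot get below 17 bytes.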

Any ideas?

By the way, trade_num_ is a misnomer. It can contain letters and other characters. That's what the spec says.

EDIT: Sorry for the confusion. The trade_num_ field is indeed 18 bytes, not 16. After I posted this thread my internet connection died and I could not get back to it until just now.

EDIT2: I think it is safe to make an assumption about the dataset. For the trade_num_ field, we can assume that the non-printable ASCII characters 0-31 will not be present. Nor will ASCII codes 127 or 126 (~). All the others might be present, including upper- and lowercase letters, digits and punctuation. This leaves a set of 94 characters that trade_num_ can be composed of: ASCII codes 32 through 125, inclusive.
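Under this assumption each trade_num_ character maps onto one of 94 values. A minimal sketch of a check that an input respects the assumed range before it is compressed; the function name `fits_assumed_charset` is hypothetical:

```python
LOW, HIGH = 32, 125                # printable ASCII, excluding '~' (126) and DEL (127)
ALPHABET_SIZE = HIGH - LOW + 1     # 94 symbols

def fits_assumed_charset(trade_num: str) -> bool:
    """Return True if trade_num has 18 characters, all within ASCII 32..125."""
    return len(trade_num) == 18 and all(LOW <= ord(c) <= HIGH for c in trade_num)

print(ALPHABET_SIZE)                               # 94
print(fits_assumed_charset('abcdefghijklmnopqr'))  # True
print(fits_assumed_charset('abcdefghijklmnopq~'))  # False: '~' is 126
```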

Recommended answer

If you have 18 characters in the range 0-127 and a number in the range 0-999, and you compact this as much as possible, it will still require 17 bytes:

>>> math.log(128**18 * 1000, 256)
16.995723035582763
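The floating-point logarithm can be confirmed with exact integer arithmetic. Under the 0-127 assumption, the number of distinct (trade_num_, broker_) combinations exceeds the 256^16 values that 16 bytes can hold, so by counting alone no lossless encoding can fit every combination into 16 bytes. A minimal sketch:

```python
combinations = 128**18 * 1000   # 18 characters of 7 bits each, times 1000 broker values
capacity_16 = 256**16           # distinct values representable in 16 bytes
capacity_17 = 256**17           # distinct values representable in 17 bytes

print(combinations > capacity_16)    # True: too many values for 16 bytes
print(combinations <= capacity_17)   # True: 17 bytes are enough
```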

You may be able to take advantage of the fact that some characters are most likely never used. In particular, it is unlikely that there are any characters below value 32, and 127 is probably not used either. If you can find one more unused character, you can first convert the characters into base 94 and then pack them into the bytes as tightly as possible:

>>> math.log(94**18 * 1000, 256)
15.993547951857446

This just fits into 16 bytes!
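An exact integer comparison shows why that one extra unused character matters: with 95 symbols per character the combinations overflow 16 bytes, while 94 symbols fit. A minimal sketch; the helper name `fits_in_16_bytes` is hypothetical:

```python
CAPACITY_16 = 256**16   # distinct values 16 bytes can hold

def fits_in_16_bytes(alphabet_size: int) -> bool:
    """True if 18 symbols from the alphabet plus a 0-999 number fit in 16 bytes."""
    return alphabet_size**18 * 1000 <= CAPACITY_16

for size in (95, 94):
    print(size, fits_in_16_bytes(size))
# 95 False
# 94 True
```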

Example code

Here is some example code written in Python (but in a very imperative style, so that it can easily be understood by non-Python programmers). I'm assuming that there are no tildes (~) in the input. If there are, you should substitute them with another character before encoding the string.

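def encodeChar(c):
    # Map ASCII 32..125 onto the base-94 digits 0..93
    return ord(c) - 32

def encode(s, n):
    # Treat the 18 characters as base-94 digits, then append n as a base-1000 digit
    t = 0
    for c in s:
        t = t * 94 + encodeChar(c)
    t = t * 1000 + n

    # Emit the integer as 16 byte values, least significant byte first
    r = []
    for _ in range(16):
        r.append(t % 256)
        t //= 256    # integer division; plain / would produce a float in Python 3

    return r

print(encode('                  ', 0))    # smallest possible value
print(encode('abcdefghijklmnopqr', 123))
print(encode('}}}}}}}}}}}}}}}}}}', 999))  # largest possible value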

Output:

[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]
[ 59, 118, 192, 166, 108,  50, 131, 135, 174,  93,  87, 215, 177,  56, 170, 172]
[255, 255, 159, 243, 182, 100,  36, 102, 214, 109, 171,  77, 211, 183,   0, 247]
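In Python 3 the byte-extraction loop can also be written with the built-in `int.to_bytes`, which raises `OverflowError` if the value does not fit in the requested length. A minimal sketch, assuming the same ASCII 32..125 character range; the name `encode_bytes` and its input checks are hypothetical additions, and it returns a `bytes` object in the same little-endian order as the list produced by `encode()`:

```python
def encode_bytes(s: str, n: int) -> bytes:
    """Pack an 18-character string (ASCII 32..125) and n in 0..999 into 16 bytes."""
    if len(s) != 18 or not 0 <= n <= 999:
        raise ValueError("expected 18 characters and a number from 0 to 999")
    t = 0
    for c in s:
        d = ord(c) - 32
        if not 0 <= d < 94:
            raise ValueError(f"character {c!r} is outside ASCII 32..125")
        t = t * 94 + d
    t = t * 1000 + n
    # Least significant byte first, matching the list returned by encode()
    return t.to_bytes(16, 'little')

print(list(encode_bytes('}}}}}}}}}}}}}}}}}}', 999)))
```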

This algorithm relies on Python's built-in arbitrary-precision integers. To port this code to C++ you could use a big-integer library.
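The intermediate value never needs more than 128 bits, because the largest value produced is 94^18 × 1000 − 1. That suggests a 128-bit unsigned integer type (for example GCC and Clang's `unsigned __int128`, a compiler extension rather than standard C++) could stand in for a general big-integer library. The bound can be checked in Python:

```python
largest = 94**18 * 1000 - 1     # the encoding of '}' * 18 with broker 999
print(largest.bit_length())     # 128
print(largest < 2**128)         # True
```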

You will of course need a corresponding decoding function. The principle is the same; the operations are performed in reverse order.
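A minimal sketch of such a decoder, assuming the 16 byte values are in the least-significant-first order that `encode()` produces; the name `decode` is hypothetical. It rebuilds the integer, removes the base-1000 broker digit that was added last, and then peels off the 18 base-94 character digits:

```python
def decode(r):
    """Invert encode(): 16 little-endian byte values -> (18-char string, number)."""
    # Rebuild the integer, starting from the most significant byte (the last one)
    t = 0
    for b in reversed(r):
        t = t * 256 + b

    # The broker number was appended last, so it comes off first
    n = t % 1000
    t //= 1000

    # The last character encoded is the lowest base-94 digit
    chars = []
    for _ in range(18):
        chars.append(chr(t % 94 + 32))
        t //= 94
    return ''.join(reversed(chars)), n

print(decode([0] * 16))
```

For any valid input, `decode(encode(s, n))` should give back `(s, n)`.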

 
精彩推荐