C#/.NET - Custom binary file format - where to start?

2023-09-03 13:55:16 Author: 花貓

I need to be able to store some data in a custom binary file format. I've never designed my own file format before. It needs to be a friendly format for traveling between the C#, Java and Ruby/Perl/Python worlds.

To start with, the file will consist of records: a GUID field and a JSON/YAML/XML packet field. I'm not sure what to use as delimiters. A comma, tab or newline kind of thing seems too fragile. What does Excel do? Or the pre-XML OpenOffice formats? Should you use ASCII chars 0 or 1? Not sure where to begin. Any articles or books on the topic?

This file format may expand later to include a "header section".

Note: To start with I'll be working in .NET, but I'd like the format to be easily portable.

UPDATE: The processing of the "packets" can be slow, but navigation within the file format cannot. So I think XML is off the table.

Recommended answer

I'll try to add some general hints for creating a portable binary file format.

Note that inventing a binary file format means documenting how the bits in it must be laid out and what they mean. It's not coding, but documentation.

Now the hints:

Decide what to do with endianness. A good and simple way to go is to decide it once and for all. Little endian is the preferable choice when the format is used on common PCs (that is, x86), since it saves conversions (performance).
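As a small illustration of fixing the byte order once, here is a sketch in Python (chosen only for brevity; the helper names `pack_u32_le` and `unpack_u32_le` are made up for this example). The `<` prefix in the `struct` format string forces little endian no matter what CPU the code runs on.

```python
import struct

# "<" pins the byte order to little endian regardless of the host CPU;
# "I" is an unsigned 32-bit integer.
def pack_u32_le(value: int) -> bytes:
    return struct.pack("<I", value)

def unpack_u32_le(data: bytes) -> int:
    return struct.unpack("<I", data)[0]

encoded = pack_u32_le(0x01020304)
# In little endian the least significant byte is stored first.
assert encoded == b"\x04\x03\x02\x01"
assert unpack_u32_le(encoded) == 0x01020304
```

Whatever language reads the file later only has to follow the same documented rule, converting on platforms whose native order differs.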

Create a header. Yes, it is a good idea to always have a header. The first bytes of the file should be able to tell you what format you are dealing with.

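Start with a magic value so that your format can be recognized (an ASCII string will do the trick).

Add a version. It doesn't hurt to include the version of your file format, and it lets you handle backward compatibility later.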
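A header with a magic value and a version could look roughly like this. This is a sketch under assumptions: the magic `MYFM`, the 16-bit version field, and the function names are all invented for the example and are not part of any existing format.

```python
import io
import struct

MAGIC = b"MYFM"                  # hypothetical 4-byte ASCII magic
VERSION = 1                      # hypothetical current format version
HEADER = struct.Struct("<4sH")   # magic, then unsigned 16-bit version (little endian)

def write_header(stream) -> None:
    stream.write(HEADER.pack(MAGIC, VERSION))

def read_header(stream) -> int:
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("magic mismatch: not a MYFM file")
    if version > VERSION:
        raise ValueError(f"unsupported format version {version}")
    return version

buffer = io.BytesIO()
write_header(buffer)
buffer.seek(0)
print(read_header(buffer))
```

A reader that knows the version can branch on it and keep loading older files after the layout changes.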

Finally, add the data. Now, the format of the data will be specific, and it will always be based on your exact needs. Basically, the data will be stored as a binary image of some data structure. The data structure is what you need to come up with.
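For the records described in the question (a GUID plus a JSON/YAML/XML packet), one possible data structure avoids delimiters entirely by prefixing each packet with its length. The sketch below is only an assumption about how such a record might be laid out: 16 GUID bytes, a 32-bit little-endian length, then the packet bytes. The names `write_record` and `read_record` are hypothetical.

```python
import io
import struct
import uuid

LENGTH = struct.Struct("<I")   # assumed: unsigned 32-bit little-endian packet length

def write_record(stream, guid: uuid.UUID, packet: bytes) -> None:
    stream.write(guid.bytes)                 # 16 bytes, big-endian field order
    stream.write(LENGTH.pack(len(packet)))
    stream.write(packet)

def read_record(stream):
    head = stream.read(16 + LENGTH.size)
    if not head:
        return None                          # clean end of data
    if len(head) != 16 + LENGTH.size:
        raise ValueError("truncated record header")
    guid = uuid.UUID(bytes=head[:16])
    (length,) = LENGTH.unpack(head[16:])
    packet = stream.read(length)
    if len(packet) != length:
        raise ValueError("truncated record payload")
    return guid, packet

buffer = io.BytesIO()
write_record(buffer, uuid.uuid4(), b'{"name": "example"}')
buffer.seek(0)
guid, packet = read_record(buffer)
```

Because the length is known up front, a reader can seek past a packet without parsing it, which keeps navigation fast even when packet processing is slow. The GUID byte order should be written into the spec: as far as I know, .NET's `Guid.ToByteArray()` produces the layout Python exposes as `uuid.bytes_le`, not `uuid.bytes`.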

If you need random access to your data by some sort of index, B-Trees are the way to go, while if you just need to write a lot of numbers and then read them all back, an "array" will do the trick.
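The "array" case can be sketched as a count followed by fixed-size elements. The layout here is an assumption made for illustration (a 32-bit count, then 64-bit floats, with the array starting at offset 0 of the stream), and the function names are hypothetical. Fixed-size elements also give cheap positional access, since the offset of element `i` can be computed directly.

```python
import io
import struct

COUNT = struct.Struct("<I")   # assumed: element count as unsigned 32-bit
ITEM = struct.Struct("<d")    # assumed: each element is a 64-bit float

def write_array(stream, values) -> None:
    stream.write(COUNT.pack(len(values)))
    for value in values:
        stream.write(ITEM.pack(value))

def read_item(stream, index: int) -> float:
    # Assumes the array starts at offset 0 of the stream.
    stream.seek(0)
    (count,) = COUNT.unpack(stream.read(COUNT.size))
    if not 0 <= index < count:
        raise IndexError(index)
    # Fixed-size elements allow jumping straight to the requested one.
    stream.seek(COUNT.size + index * ITEM.size)
    return ITEM.unpack(stream.read(ITEM.size))[0]

buffer = io.BytesIO()
write_array(buffer, [1.5, 2.5, 3.5])
assert read_item(buffer, 2) == 3.5
```

Lookup by key rather than by position is where an index structure such as a B-Tree comes in, which is beyond this sketch.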

Additionally, you might use a TLV (Type-Length-Value) concept for forward compatibility.
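To show why TLV helps with forward compatibility, here is a minimal sketch. The field widths (16-bit type, 32-bit length) and the names `encode_tlv` and `decode_tlv` are assumptions for this example only. An older reader that meets a type it doesn't know can skip it using the length and carry on.

```python
import struct

TLV_HEAD = struct.Struct("<HI")   # assumed: 16-bit type, 32-bit length, little endian

def encode_tlv(entries) -> bytes:
    out = bytearray()
    for type_id, value in entries:
        out += TLV_HEAD.pack(type_id, len(value))
        out += value
    return bytes(out)

def decode_tlv(data: bytes, known_types):
    offset = 0
    result = []
    while offset < len(data):
        if offset + TLV_HEAD.size > len(data):
            raise ValueError("truncated TLV header")
        type_id, length = TLV_HEAD.unpack_from(data, offset)
        offset += TLV_HEAD.size
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        if type_id in known_types:
            result.append((type_id, data[offset:offset + length]))
        # Unknown types are skipped by their length instead of failing.
        offset += length
    return result

data = encode_tlv([(1, b"abc"), (99, b"future field"), (2, b"")])
assert decode_tlv(data, {1, 2}) == [(1, b"abc"), (2, b"")]
```

In this example type 99 stands in for a field added by a newer writer; the older reader ignores it without breaking.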