Streaming (lazy serialization) API for protobuf

2023-09-06 00:35:23 Author: 情比纸薄

We have an Android app that uses Protocol Buffers to store application data. The data format (roughly) is a single protobuf ("container") that contains a list of protobufs ("items") as a repeated field:

message Container {
    repeated Item item = 1;
}

When we want to save a change to an item, we must recreate the protobuf container, add all the items to it, then serialize it and write it to a file.

The problem with this approach is that it potentially triples the memory used when saving, because the data has to first be copied from the model class to the protobuf builder, and then to a byte array when the protobuf is serialized, all before writing it out to a file stream.
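
For illustration, a rough sketch of that save path in protobuf-java on Android might look like the following, assuming the generated Container and Item classes plus a hypothetical ModelItem model class and toProto(...) mapper; the comments mark where each in-memory copy lives:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Sketch only: Container/Item are the classes generated from the schema above;
// ModelItem and toProto(...) stand in for the app's own model layer.
static void saveAll(List<ModelItem> models, String path) throws IOException {
    Container.Builder builder = Container.newBuilder();
    for (ModelItem model : models) {              // copy 1: the model objects themselves
        builder.addItem(toProto(model));          // copy 2: model data -> protobuf builder
    }
    byte[] bytes = builder.build().toByteArray(); // copy 3: whole container -> byte array
    try (OutputStream out = new FileOutputStream(path)) {
        out.write(bytes);                         // only now does anything reach the file
    }
}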

What we would like is a way to create our protobuf container and lazily serialize it to a stream, then simply add each protobuf item (created from our model data) to the container, which serializes it and writes it to the stream, rather than keeping all the items in memory until we've created the entire container in memory.

Is there a way to build a protobuf and lazily serialize it to a stream?

If there's not a way to do this officially, are there any libraries that can help? Does anyone have any suggestions or ideas on how to solve this in other ways? Are there alternative data formats or technologies (e.g. JSON or XML containing protobufs) that would make this possible?

Recommended answer

For serialization:

protobuf is an appendable format, with individual items being merged, and repeated items being appended

Therefore, to write a sequence as a lazy stream, all you need to do is repeatedly write the same structure with only one item in the list: serializing a sequence of 200 x "Container with 1 Item" is 100% identical to serializing 1 x "Container with 200 Items".
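
A quick way to convince yourself of that equivalence in protobuf-java (a sketch, assuming the generated Container and Item classes) is to compare the bytes directly:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Writes two one-item containers back-to-back, then compares the result with a
// single two-item container; the byte sequences are identical.
static boolean appendedEqualsWhole(Item a, Item b) throws IOException {
    ByteArrayOutputStream streamed = new ByteArrayOutputStream();
    Container.newBuilder().addItem(a).build().writeTo(streamed); // "Container with 1 item"
    Container.newBuilder().addItem(b).build().writeTo(streamed); // appended to the same stream
    byte[] whole = Container.newBuilder().addItem(a).addItem(b).build().toByteArray();
    return Arrays.equals(streamed.toByteArray(), whole);
}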

So: just do that.
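
In protobuf-java that could look roughly like this (again a sketch, assuming the generated classes and the hypothetical ModelItem/toProto(...) from before); only one Item ever exists as a protobuf in memory at a time:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Wrap each item in its own one-item Container and append it to the file
// stream. Because repeated fields are concatenated when messages are merged,
// parsing the whole file later still yields one Container with every item.
static void saveStreaming(List<ModelItem> models, String path) throws IOException {
    try (OutputStream out = new FileOutputStream(path)) {
        for (ModelItem model : models) {
            Container.newBuilder()
                    .addItem(toProto(model))   // hypothetical model -> Item mapper
                    .build()
                    .writeTo(out);             // appends tag 0x0A + length + payload
        }
    }
}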

For deserialization:

That is technically very easy to read as a stream - it all, however, comes down to which library you are using. For example, I expose this in protobuf-net (a .NET / C# implementation) as Serializer.DeserializeItems<T>, which reads (fully lazy/streaming) a sequence of messages of type T, based on the assumption that they are in the form you describe in the question (so Serializer.DeserializeItems<Item> would be the streaming way that replaces Serializer.Deserialize<Container> - the outermost object kinda doesn't really exist in protobuf)

If this isn't available, but you have access to a raw reader API, what you need to do is:

- read one varint for the header - this will be the value 10 (0x0A), i.e. "(1 << 3) | 2" for the field-number (1) and wire-type (2) respectively - so this could also be phrased: "read a single byte from the stream, and check the value is 10"
- read one varint for the length of the following item
- now:
  - if the reader API allows you to restrict the maximum number of bytes to process, use this length to specify the length that follows
  - or wrap the stream API with a length-limiting stream, limited to that length
  - or just manually read that many bytes, and construct an in-memory stream from the payload
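
In protobuf-java the same steps can be expressed with CodedInputStream (a sketch, assuming the generated Item class and a hypothetical ItemHandler callback); readMessage(...) internally does the "read the length varint, then limit parsing to that many bytes" part described above:

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.ExtensionRegistryLite;
import java.io.IOException;
import java.io.InputStream;

// Lazily reads Items one at a time from a stream written as a sequence of
// "Container with 1 item" records (or as one big Container; the bytes are the
// same either way).
static void forEachItem(InputStream in, ItemHandler handler) throws IOException {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    int tag;
    while ((tag = cis.readTag()) != 0) {       // 0 means end of stream
        if (tag == 0x0A) {                     // (1 << 3) | 2: field number 1, wire-type 2
            Item.Builder builder = Item.newBuilder();
            cis.readMessage(builder, ExtensionRegistryLite.getEmptyRegistry());
            handler.handle(builder.build());   // hand off one Item at a time
        } else {
            cis.skipField(tag);                // ignore anything unexpected
        }
    }
}

// Hypothetical callback so items never have to be collected in memory.
interface ItemHandler {
    void handle(Item item) throws IOException;
}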