Performance comparison of IEnumerable and raising an event for each item in the source? (performance, events, items, IEnumerable)

2023-09-03 03:00:32 Author: 暖夏清风

I want to read a big binary file containing millions of records and produce some reports from those records. I use BinaryReader to read it (which I think has the best performance among the readers) and convert the bytes read into a data model. Because of the number of records, passing the model to the report layer is another issue: I prefer to use IEnumerable so that I have LINQ functionality and features when developing the reports.

Here is the sample data class:

Public Class MyData
    Public A1 As UInt64
    Public A2 As UInt64
    Public A3 As Byte
    Public A4 As UInt16
    Public A5 As UInt64
End Class

I used this sub to create the file:

Sub CreateSampleFile()
    Using streamWriter As New FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.Write)
        For i As Integer = 1 To 1000
            For j As Integer = 1 To 1000
                For k = 1 To 30
                    Dim item As New MyData With {.A1 = i, .A2 = j, .A3 = k, .A4 = j, .A5 = i * j}
                    Dim bytes() As Byte = BitConverter.GetBytes(item.A1).Concat(BitConverter.GetBytes(item.A2)).Concat({item.A3}).Concat(BitConverter.GetBytes(item.A4)).Concat(BitConverter.GetBytes(item.A5)).ToArray
                    streamWriter.Write(bytes, 0, bytes.Length)
                Next
            Next
        Next
    End Using
End Sub

And here is my reader class:

Imports System.IO

Public Class FileReader

    Public Const BUFFER_LENGTH As Long = 4096 * 256 * 27
    Public Const MY_DATA_LENGTH As Long = 27
    Private _buffer(BUFFER_LENGTH - 1) As Byte
    Private _streamWriter As FileStream
    Public Event OnByteRead(sender As FileReader, bytes() As Byte, index As Long)

    Public Sub StartReadBinary(fileName As String)
        Dim currentBufferReadCount As Long = 0
        Using fileStream As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using streamReader As New BinaryReader(fileStream)
                currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                While currentBufferReadCount > 0
                    For i As Integer = 0 To currentBufferReadCount - 1 Step MY_DATA_LENGTH
                        RaiseEvent OnByteRead(Me, Me._buffer, i)
                    Next
                    currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                End While
            End Using
        End Using
    End Sub

    Public Iterator Function GetAll(fileName As String) As IEnumerable(Of MyData)
        Dim currentBufferReadCount As Long = 0
        Using fileStream As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using streamReader As New BinaryReader(fileStream)
                currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                While currentBufferReadCount > 0
                    For i As Integer = 0 To currentBufferReadCount - 1 Step MY_DATA_LENGTH
                        Yield GetInstance(_buffer, i)
                    Next
                    currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                End While
            End Using
        End Using
    End Function

    Public Function GetInstance(bytes() As Byte, index As Long) As MyData
        Return New MyData With {.A1 = BitConverter.ToUInt64(bytes, index), .A2 = BitConverter.ToUInt64(bytes, index + 8), .A3 = bytes(index + 16), .A4 = BitConverter.ToUInt16(bytes, index + 17), .A5 = BitConverter.ToUInt64(bytes, index + 19)}
    End Function

End Class

I was wondering about the performance of IEnumerable, so I tried both approaches: using the GetAll method as an IEnumerable, and raising an event for each record read from the file. Here is the test module:

Imports System.IO

Module Module1

    Private fileName As String = "MyData.dat"
    Private readerJustTraverse As New FileReader
    Private WithEvents readerWithoutInstance As New FileReader
    Private WithEvents readerWithInstance As New FileReader
    Private readerIEnumerable As New FileReader

    Sub Main()

        Dim s As New Stopwatch

        s.Start()
        readerJustTraverse.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes: {0}", s.ElapsedMilliseconds)

        s.Restart()
        readerWithoutInstance.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes, raise event: {0}", s.ElapsedMilliseconds)

        s.Restart()
        readerWithInstance.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes, raise event, get instance: {0}", s.ElapsedMilliseconds)

        s.Restart()
        For Each item In readerIenumerable.GetAll(fileName)

        Next
        s.Stop()
        Console.WriteLine("Read bytes, get instance, return yield: {0}", s.ElapsedMilliseconds)

        Console.ReadLine()

    End Sub

    Private Sub readerWithInstance_OnByteRead(sender As FileReader, bytes() As Byte, index As Long) Handles readerWithInstance.OnByteRead
        Dim item As MyData = sender.GetInstance(bytes, index)
    End Sub

    Private Sub readerWithoutInstance_OnByteRead(sender As FileReader, bytes() As Byte, index As Long) Handles readerWithoutInstance.OnByteRead
        'do nothing
    End Sub

End Module

What I'm wondering about is the elapsed time of each process. Here are the test results in milliseconds (tested on an ASUS Zenbook Ultrabook, Core i7):

Read bytes: 384 (without touching the read bytes!)

Read bytes, raise event: 583

Read bytes, raise event, get instance: 3923

Read bytes, get instance, return yield: 4917

This shows that reading the file as bytes is incredibly fast, and that converting the bytes to the model is slow. Also, the IEnumerable version takes about 25% longer than raising an event (4917 ms versus 3923 ms).

Does iterating over an IEnumerable really carry this performance cost, or did I miss something?

Solution

Yes, using Iterator Functions carries a performance penalty.

I compiled your code and got the same results as you did. I looked at the generated IL code. The state machine created from the GetAll method does contain a lot of code, but most of the instructions are nops or simple operations.

The results with and without the iterator function differ, as you say, by about 25%. That's not too much. When you use StartReadBinary, there is simply one big loop that calls the OnByteRead handler (via the event) thirty million times (1000 × 1000 × 30 records). However, when you consume the objects in a For Each loop, each object requires a call to MoveNext() and a read of the Current property on the generated enumerator. MoveNext() is not trivial (most of the code from GetAll was moved there) and uses quite a number of compiler-generated variables.
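To make the per-item work concrete, here is a minimal sketch of roughly what a For Each loop over an iterator function expands to. The module name ForEachExpansion and the functions Numbers and Collect are hypothetical stand-ins for GetAll and the test loop, not part of the original code, and the real compiler output differs in detail.

```vbnet
Imports System
Imports System.Collections.Generic

Module ForEachExpansion

    ' Hypothetical stand-in for GetAll: yields three integers.
    Iterator Function Numbers() As IEnumerable(Of Integer)
        For i As Integer = 1 To 3
            Yield i
        Next
    End Function

    ' Roughly what "For Each item In Numbers()" does behind the scenes.
    Function Collect() As List(Of Integer)
        Dim result As New List(Of Integer)
        Dim e As IEnumerator(Of Integer) = Numbers().GetEnumerator()
        Try
            ' MoveNext resumes the iterator body and runs it up to the next Yield.
            While e.MoveNext()
                ' Current returns the value stored by that Yield.
                result.Add(e.Current)
            End While
        Finally
            ' Dispose lets the iterator run any pending End Using / Finally blocks.
            e.Dispose()
        End Try
        Return result
    End Function

    Sub Main()
        Console.WriteLine(String.Join(",", Collect()))
    End Sub

End Module
```

So every record costs two interface calls plus the state-machine dispatch inside MoveNext, whereas the event version pays one delegate invocation per record.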

Using "Yield" generally slows down your program somewhat, because the compiler has to generate fairly complicated IL code to represent the state machine, and that code runs on every iteration.
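To put the penalty in perspective, the measured times can be converted into a cost per record. This is only a back-of-the-envelope sketch: it assumes the test file was produced by CreateSampleFile as posted (1000 × 1000 × 30 records), and the module and function names are hypothetical.

```vbnet
Imports System

Module PerRecordCost

    ' Records written by CreateSampleFile as posted: 1000 * 1000 * 30.
    Public Const RecordCount As Long = 1000L * 1000L * 30L

    ' Extra nanoseconds per record between two elapsed times given in milliseconds.
    Function ExtraNanosecondsPerRecord(slowerMs As Long, fasterMs As Long) As Double
        Return (slowerMs - fasterMs) * 1000000.0 / RecordCount
    End Function

    Sub Main()
        ' Timings taken from the question's test results.
        Console.WriteLine("Records: {0}", RecordCount)
        Console.WriteLine("Iterator vs. event: {0:F1} ns per record",
                          ExtraNanosecondsPerRecord(4917, 3923))
        Console.WriteLine("GetInstance vs. empty handler: {0:F1} ns per record",
                          ExtraNanosecondsPerRecord(3923, 583))
    End Sub

End Module
```

With the numbers from the question, the iterator adds roughly 33 ns per record, while building the MyData instance costs roughly 111 ns per record. If the report does any real work per record, the iterator overhead is likely to be a small part of the total.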