This morning I asked here why my Python code was (a lot) slower than my F# version, but I'm wondering whether the F# version itself can be made faster. Any ideas on how I could create a faster version of the code below, which reads the values at a sorted list of unique indexes from a binary file of 32-bit integers? Note that I tried two approaches, one based on BinaryReader and the other on MemoryMappedFile (and some more on GitHub).
open System
open System.IO

module SimpleRead =

    let readValue (reader: BinaryReader) cellIndex =
        // set stream to correct location (each cell is a 4-byte integer)
        reader.BaseStream.Position <- cellIndex * 4L
        // Int32.MinValue marks a missing value
        match reader.ReadInt32() with
        | Int32.MinValue -> None
        | v -> Some(v)

    let readValues fileName indices =
        use reader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
        // Use list or array to force creation of values (otherwise reader gets disposed before the values are read)
        let values = List.map (readValue reader) (List.ofSeq indices)
        values

module MemoryMappedSimpleRead =

    open System.IO.MemoryMappedFiles

    let readValue (reader: MemoryMappedViewAccessor) offset cellIndex =
        // position is relative to the start of the view, not the start of the file
        let position = (cellIndex * 4L) - offset
        match reader.ReadInt32(position) with
        | Int32.MinValue -> None
        | v -> Some(v)

    let readValues fileName indices =
        use mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)
        // map only the byte range between the smallest and largest requested index
        let offset = (Seq.min indices) * 4L
        let last = (Seq.max indices) * 4L
        let length = 4L + last - offset
        use reader = mmf.CreateViewAccessor(offset, length, MemoryMappedFileAccess.Read)
        let values = (List.ofSeq indices) |> List.map (readValue reader offset)
        values
For comparison, here is my latest NumPy version:
import numpy as np

def convert(v):
    # -2147483648 (Int32.MinValue) marks a missing value
    if v != -2147483648:
        return v
    else:
        return None

def read_values(filename, indices):
    values_arr = np.memmap(filename, dtype='int32', mode='r')
    return list(map(convert, values_arr[indices]))
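To make the file layout that all of these readers assume concrete, here is a minimal standard-library sketch (no NumPy). It assumes the cells are stored as little-endian 32-bit integers, which is what BinaryReader reads, and that Int32.MinValue marks a missing value. The names `NO_DATA`, `write_cells`, `read_values_seek` and `demo` are made up for illustration and are not part of the code on GitHub.

```python
import os
import struct
import tempfile

NO_DATA = -2147483648  # Int32.MinValue, the sentinel the readers above map to None


def write_cells(path, cells):
    """Write cells as little-endian 32-bit integers; None becomes the sentinel."""
    with open(path, "wb") as f:
        for cell in cells:
            f.write(struct.pack("<i", NO_DATA if cell is None else cell))


def read_values_seek(path, indices):
    """Read one int32 per index by seeking to index * 4, like the BinaryReader version."""
    values = []
    with open(path, "rb") as f:
        for index in indices:
            f.seek(index * 4)
            (v,) = struct.unpack("<i", f.read(4))
            values.append(None if v == NO_DATA else v)
    return values


def demo():
    """Round-trip a tiny hypothetical file and return the values read back."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_cells(path, [10, None, 30, 40, None, 60])
        return read_values_seek(path, [0, 1, 3, 5])
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A small fixture like this can serve as a correctness check for any of the readers before timing them, since a bug in the test setup can easily distort a benchmark.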
Update: Contrary to what I said here earlier, my Python version is still a lot slower than the F# version; an error in my Python tests made it appear otherwise. I'm leaving this question here in case someone with in-depth knowledge of BinaryReader or MemoryMappedFile knows of some improvements.
I managed to make the SimpleReader 30% faster by using reader.BaseStream.Seek instead of setting reader.BaseStream.Position. I also replaced lists with arrays, but this didn't change much.
The full code of my simple reader is now:
open System
open System.IO

let readValue (reader: BinaryReader) cellIndex =
    // set stream to correct location; convert to int64 before multiplying to avoid 32-bit overflow
    reader.BaseStream.Seek(int64 cellIndex * 4L, SeekOrigin.Begin) |> ignore
    match reader.ReadInt32() with
    | Int32.MinValue -> None
    | v -> Some(v)

let readValues indices fileName =
    use reader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    // Use list or array to force creation of values (otherwise reader gets disposed before the values are read)
    let values = Array.map (readValue reader) indices
    values
The full code, along with versions in other languages, is on GitHub.