What is the best way to read and parse a large text file over the network?

2023-09-03 16:06:35 Author: 风过了无痕

I have a problem that requires me to parse several log files from a remote machine. There are a few complications:

1) The file may be in use.
2) The files can be quite large (100 MB+).
3) Each entry may span multiple lines.

To solve the in-use issue, I need to copy the file first. I'm currently copying it directly from the remote machine to the local machine and parsing it there. That leads to issue 2: because the files are large, copying them locally can take quite a while.

To reduce parsing time, I'd like to make the parser multi-threaded, but that makes dealing with multi-line entries a bit trickier.
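One common way to keep multi-line entries intact when sharing work between threads is to split at entry boundaries rather than line boundaries: group raw lines into complete entries first, then hand batches of whole entries to the workers. The sketch below assumes, hypothetically, that every new entry starts with a date such as `2023-09-03 16:06:35` and that any other line continues the previous entry. The names `iter_entries`, `batches`, `parse_batch` and `count_errors` are illustrative, and `parse_batch` is only a placeholder that counts entries containing `ERROR`.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical log format: a new entry starts with "YYYY-MM-DD ".
# Any line that does not match is treated as a continuation line.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def iter_entries(lines: Iterable[str]) -> Iterator[str]:
    """Group raw lines into complete (possibly multi-line) entries.

    Lines that appear before the first entry start are folded into
    the first entry.
    """
    current: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def batches(entries: Iterable[str], size: int) -> Iterator[List[str]]:
    """Cut the entry stream into lists of whole entries (never half an entry)."""
    it = iter(entries)
    while batch := list(islice(it, size)):
        yield batch


def parse_batch(batch: List[str]) -> int:
    """Placeholder parser: counts entries that contain 'ERROR'."""
    return sum("ERROR" in entry for entry in batch)


def count_errors(lines: Iterable[str], workers: int = 4, batch_size: int = 1000) -> int:
    # Executor.map submits every batch up front, which is fine for a sketch;
    # a bounded queue would cap memory use for very large inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(parse_batch, batches(iter_entries(lines), batch_size)))
```

Any iterable of lines works as input, including an open text file (`count_errors(f)`). In CPython, threads only speed this up if the parsing releases the GIL; for pure-Python parsing, `ProcessPoolExecutor` can be swapped in for `ThreadPoolExecutor`, since `parse_batch` is a top-level function.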

The two main issues are:

1) How do I speed up the file transfer? (Compression? Is transferring locally even necessary? Can I read an in-use file some other way?)
2) How do I deal with multi-line entries when splitting up the lines among threads?

UPDATE: The reason I didn't do the obvious thing and parse on the server is that I want as little CPU impact as possible. I don't want to affect the performance of the system I'm testing.

Recommended answer

If you are reading a sequential file, you want to read it line by line over the network rather than copying the whole file first. That requires a transfer method capable of streaming, so you'll need to review the IO streaming options available to you to work out which one fits.
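A minimal sketch of the streaming idea, assuming the log arrives as some readable binary stream: for example `socket.makefile("rb")` on a connection to a hypothetical process on the server that sends the file, or an HTTP response body. Lines are decoded and yielded one at a time, so memory use stays flat and parsing can start before the transfer finishes. The optional gzip step is also an assumption: it only applies if the sender compresses the data, and whether that pays off depends on link speed versus the CPU cost of compressing on the server.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def stream_lines(raw: BinaryIO, compressed: bool = False,
                 encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded lines from a binary stream without buffering the whole file.

    `raw` is any readable binary file-like object. If `compressed` is True,
    the stream is assumed to be gzip data and is decompressed on the fly.
    Line endings are normalized to "\n" (universal newlines).
    """
    source = gzip.GzipFile(fileobj=raw, mode="rb") if compressed else raw
    text = io.TextIOWrapper(source, encoding=encoding, errors="replace")
    try:
        for line in text:
            yield line
    finally:
        # Detach so the wrapper does not close the caller's stream;
        # the caller opened it and is responsible for closing it.
        text.detach()
```

Usage might look like `with socket.create_connection((host, port)) as sock, sock.makefile("rb") as f:` followed by `for line in stream_lines(f): ...`, where `host` and `port` belong to whatever hypothetical service streams the log.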

Large IO operations like this won't benefit much from multithreading, since you can probably process the entries as fast as you can read them over the network.

Your other good option is to run the log parser on the server and download only the results.
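A minimal sketch of the server-side option: read the log line by line on the server and write out only a small JSON summary, so just a few bytes cross the network. The log format (a timestamp followed by an upper-case level such as `INFO` or `ERROR`) and the names `summarize` and `write_summary` are hypothetical. This does spend some CPU on the server, which is exactly the trade-off the question's update wants to avoid, so it fits best when the summary is much cheaper to compute than the full parse.

```python
import json
import re
import sys
from collections import Counter
from typing import Iterable, TextIO

# Hypothetical format: "2023-09-03 16:06:35 LEVEL message ...".
# Continuation lines of multi-line entries do not match and are not counted.
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+)\b")


def summarize(lines: Iterable[str]) -> dict:
    """Count entries per log level; only this summary needs to leave the server."""
    counts: Counter = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            counts[match.group("level")] += 1
    return dict(counts)


def write_summary(path: str, out: TextIO = sys.stdout) -> None:
    """Stream the log at `path` and write the summary as JSON to `out`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        json.dump(summarize(f), out)
```

On the server, `write_summary("/path/to/app.log")` (a placeholder path) prints something like `{"INFO": 2, "ERROR": 1}`, and only that output needs to be copied back.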