How do I split a huge file into words?

2023-09-04 10:52:10 Author: 不倾世丶只倾你一人

How can I read a very long string from text file, and then process it (split into words)?

I tried the StreamReader.ReadLine() method, but I get an OutOfMemory exception. Apparently, my lines are extremely long. This is my code for reading file:

using (var streamReader = File.OpenText(_filePath))
{
    int lineNumber = 1;
    string currentString = String.Empty;
    while ((currentString = streamReader.ReadLine()) != null)
    {
        ProcessString(currentString, lineNumber);
        Console.WriteLine("Line {0}", lineNumber);
        lineNumber++;
    }
}

And the code which splits line into words:

var wordPattern = @"\w+";
var matchCollection = Regex.Matches(text, wordPattern);
var words = (from Match word in matchCollection
             select word.Value.ToLowerInvariant()).ToList();

Solution

You could read by char, building up words as you go, using yield to make it deferred so you don't have to read the entire file at once:

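// Requires: using System.Collections.Generic; using System.IO; using System.Text;
private static IEnumerable<string> ReadWords(string filename)
{
    using (var reader = new StreamReader(filename))
    {
        // Accumulates the characters of the word currently being read.
        var builder = new StringBuilder();

        while (!reader.EndOfStream)
        {
            char c = (char)reader.Read();

            // Mimics the regex class \w - almost.
            if (char.IsLetterOrDigit(c) || c == '_')
            {
                builder.Append(c);
            }
            else
            {
                // A non-word character ends the current word, if there is one.
                if (builder.Length > 0)
                {
                    yield return builder.ToString();
                    builder.Clear();
                }
            }
        }

        // Emit the last word only if the file did not end on a non-word character;
        // otherwise an empty string would be returned.
        if (builder.Length > 0)
        {
            yield return builder.ToString();
        }
    }
}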

The code reads the file one character at a time. When it reaches a non-word character, it yields the word built up so far. A run of consecutive non-word characters does not produce empty strings, because a word is only yielded inside the loop when the builder is non-empty. A StringBuilder accumulates the characters of each word.

Char.IsLetterOrDigit() matches almost the same characters as the regex word-character class \w, but \w also matches the underscore (among others), which is why the condition checks for '_' explicitly. If your input contains other characters that should count as part of a word, you'll have to extend the if() condition.
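
If you expect to change what counts as a word character, one option is to pass the test in as a parameter instead of editing the if(). The following is only a sketch under a few assumptions: the WordReader class, the isWordChar parameter, and the choice to treat apostrophes and hyphens as word characters are all hypothetical and not part of the original answer. It also takes a TextReader rather than a file name, so it can be tried on a string; a StreamReader over a large file can be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WordReader
{
    // Hypothetical variant of ReadWords: accepts any TextReader and a
    // caller-supplied predicate that decides which characters belong to a word.
    public static IEnumerable<string> ReadWords(TextReader reader, Func<char, bool> isWordChar)
    {
        var builder = new StringBuilder();
        int next;

        // TextReader.Read() returns -1 at the end of the input.
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (isWordChar(c))
            {
                builder.Append(c);
            }
            else if (builder.Length > 0)
            {
                yield return builder.ToString();
                builder.Clear();
            }
        }

        // Emit a trailing word, but never an empty string.
        if (builder.Length > 0)
        {
            yield return builder.ToString();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption for illustration: apostrophes and hyphens count as word characters.
        Func<char, bool> isWordChar =
            c => char.IsLetterOrDigit(c) || c == '_' || c == '\'' || c == '-';

        using (var reader = new StringReader("Don't split well-known words, OK?"))
        {
            // Words are produced lazily, one at a time, as the loop asks for them.
            foreach (var word in WordReader.ReadWords(reader, isWordChar))
            {
                Console.WriteLine(word.ToLowerInvariant());
            }
        }
    }
}
```

With this predicate the sample input yields five words: don't, split, well-known, words and ok. Because the method is an iterator, only the current word is held in memory, which is the property that avoids the OutOfMemory exception from ReadLine() on very long lines.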
