Possible duplicate: Handling commas in a CSV file
I wrote my own CSV parser, and it works fine until I hit this record:
B002VECGTG,B002VECGTG,HAS_17131_spaceshooter,"4,426",0.04%,"4,832",0.03%,0%,1,0.02%,$20.47 ,1
The quoted commas in "4,426" and in "4,832" break my parser.
This is what I am using to parse each line of text:
char[] comma = { ',' };
string[] words = line.Split(comma);
How do I prevent my program from breaking?
You can't just split on commas. To implement a proper parser for this case, you need to loop through the string yourself, keeping track of whether you are inside quotes. While you are inside a quoted string, keep going until you find the closing quote, and treat any commas along the way as part of the field.
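// Requires: using System.Collections.Generic;
IEnumerable<string> LineSplitter(string line)
{
    int fieldStart = 0;
    for (int i = 0; i < line.Length; i++)
    {
        if (line[i] == ',')
        {
            // Unquoted comma: emit the field that just ended.
            yield return line.Substring(fieldStart, i - fieldStart);
            fieldStart = i + 1;
        }
        else if (line[i] == '"')
        {
            // Skip to the closing quote so commas inside quotes are ignored.
            // The bounds check avoids running off the end on an unterminated quote.
            for (i++; i < line.Length && line[i] != '"'; i++) { }
        }
    }
    // Emit the last field, which has no trailing comma.
    yield return line.Substring(fieldStart);
}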
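The splitter above leaves the surrounding quotes in each field. If you also want them removed, one possible variation is sketched below. It is only a sketch under some assumptions: `CsvLine` and `SplitFields` are hypothetical names, each record is assumed to fit on a single line (no newlines inside quoted fields), and a doubled quote (`""`) inside a quoted field is assumed to mean a literal quote character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Hypothetical helper: splits one CSV line into fields, dropping the
    // surrounding quotes and turning "" inside a quoted field into ".
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: literal quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas in here are plain data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // unquoted comma ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing comma
        return fields;
    }
}
```

For the record in the question, `CsvLine.SplitFields(line)` yields 12 fields, and the fourth and sixth come back as `4,426` and `4,832` without their quotes.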