If an HTML file has no closing "/tr" or "/td" tags, HTML Agility Pack does not parse the table correctly

2023-09-04 09:25:55 Author: 智者不入爱河

I am using HTML Agility Pack to parse HTML content and extract table information. It works, but when a closing "/tr" or "/td" tag is missing, the affected rows and cells are not parsed correctly.

For example:

<html>
  <head>
    <meta name="generator" content=
    "HTML Tidy for Windows (vers 14 February 2006), see www.w3.org">
    <title></title>
  </head>
  <body>
    <table cellspacing="0" cellpadding="0" width="100%" border="0">
      <tbody>
        <tr>
          <td class="xl27" valign="bottom" colspan="9">
            Sir / Madam,<br>
            I/We have this day done by your order and on your account the
            following transactions:
          </td>
          <td class="xl27boTRL" align="middle" colspan="5">
            Stamp duty as required under the relevant stamp act to be paid on
            consolidated basis at the end of the month.
          </td>
        </tr>
        <tr height="30">
          <td class="xl27boTBL" align="middle" width="7%">
            Order No
          </td>
          <td class="xl27boTBL" align="middle" width="4%">
            Order Time
          </td>

          <td class="xl27boTBL" align="middle" width="5%">
            Net Rate
          </td>
          <td class="xl27boTBL" align="middle" width="5%">
            Service Tax
          </td>
          <td class="xl27boTBL" align="middle" width="5%">
           Amount
          </td>
          <td class="xl27boTRBL" style="BORDER-BOTTOM: windowtext 1pt solid;"
          align="middle" width="8%">
          Net Amount Rs
          </td>
        </tr>
        <tr height="20">
          <td class="xl27boL" nowrap width="7%">
            25222105
          </td>
          <td class="xl27boL" nowrap width="4%">
            14:02:39
          </td>


          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boRL" nowrap align="right" width="8%">
            125288.00 
          </td>

        <tr height="20">
          <td class="xl27boL" nowrap width="7%">
            122122141
          </td>
          <td class="xl27boL" nowrap width="4%">
            14:01:56
          </td>


          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boRL" nowrap align="right" width="8%">
            249612.64 
          </td>

        <tr height="20">
          <td class="xl27boL" nowrap width="7%">
             
          </td>
          <td class="xl27boL" nowrap width="4%">
             
          </td>
          <td class="xl27boL" nowrap width="7%">
             
          </td>
          <td class="xl27boL" nowrap width="4%">
             
          </td>
          <td class="xl27boL" nowrap align="left" width="15%">
            [SERVICE TAX]
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="7%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boL" nowrap align="right" width="5%">
             
          </td>
          <td class="xl27boRL" nowrap align="right" width="8%">
            61.66
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

What should I do in this case? Here is a second sample, in which the opening <tr> tags are missing:

<TABLE  cellpadding=1 cellspacing=0 Width='100%'  style='border:1px solid #FFFFFF;''>
<TRAlign='middle' VAlign='bottom' Class='clsTRFontBold'>
<TD NoWrap class=clsTRFontHdr>ORDER NO</TD><TD NoWrap class=clsTRFontHdr>ORD TIME</TD>
<TD  NoWrap class=clsTRFontHdr>TRADE NO</TD><TD  NoWrap class=clsTRFontHdr>TRD TIME</TD>
<TD  NoWrap class=clsTRFontHdr ALIGN=CENTER>SCRIPNAME</TD>
<TD  NoWrap class=clsTRFontHdr>BUY/SELL</TD><TD  NoWrap class=clsTRFontHdr>QUANTITY</TD>
<TD NoWrap class=clsTRFontHdr align=right>RATE (RS)</TD>
<TD NoWrap class=clsTRFontHdr align=right>TOTAL (RS)</TD>
<TD NoWrap class=clsTRFontHdr align=right>TOT BROK (RS)</TD>
<TD NoWrap class=clsTRFontHdr align=right>SER TAX (RS)</TD>
<TD NoWrap class=clsTRFontHdr align=right>STT (RS)</TD>
<TD NoWrap class=clsTRFontHdr align=right>NET TOTAL (RS)</TD>
</TR>

<TR Class='clsTRFont'>
<TD NoWrap>2009030267182768</TD>
<TD NoWrap>10:28:11</TD><TD NoWrap>66950592</TD>
<TD NoWrap>10:28:25</TD>
<TD NoWrap>SESA GOA LTD</TD>
<TD NoWrap>BUY</TD>
<TD NoWrap ALIGN='RIGHT'>366 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2000</TD>
<TD NoWrap ALIGN='RIGHT'>28621.20</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD>
<TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>-28621.21</TD></TR>
<!--tr tag missing-->
<TD NoWrap>2009030267182768</TD>
<TD NoWrap>10:28:11</TD><TD NoWrap>66950783</TD><TD NoWrap>10:28:27</TD>
<TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>100 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2000</TD><TD NoWrap ALIGN='RIGHT'>7820.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-7820.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267182768</TD><TD NoWrap>10:28:11</TD>
<TD NoWrap>66956828</TD><TD NoWrap>10:29:39</TD><TD NoWrap>SESA GOA LTD</TD>
<TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>534 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2000</TD><TD NoWrap ALIGN='RIGHT'>41758.80</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-41758.81</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD><TD NoWrap>11:06:12</TD><TD NoWrap>67137258</TD>
<TD NoWrap>11:09:24</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD>
<TD NoWrap ALIGN='RIGHT'>162 </TD><TD NoWrap ALIGN='RIGHT'>78.2500</TD>
<TD NoWrap ALIGN='RIGHT'>12676.50</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD>
<TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>3.1320</TD>
<TD NoWrap ALIGN='RIGHT'>12673.36</TD></TR><TD NoWrap>2009030267510894</TD>
<TD NoWrap>11:06:12</TD><TD NoWrap>67137465</TD><TD NoWrap>11:09:28</TD>
<TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>200 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2500</TD><TD NoWrap ALIGN='RIGHT'>15650.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>4.1010</TD><TD NoWrap ALIGN='RIGHT'>15645.89</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD><TD NoWrap>11:06:12</TD>
<TD NoWrap>67137479</TD><TD NoWrap>11:09:28</TD><TD NoWrap>SESA GOA LTD</TD>
<TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>4 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2500</TD><TD NoWrap ALIGN='RIGHT'>313.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.0773</TD><TD NoWrap ALIGN='RIGHT'>312.91</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD><TD NoWrap>11:06:12</TD><TD NoWrap>67137995</TD>
<TD NoWrap>11:09:32</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD>
<TD NoWrap ALIGN='RIGHT'>16 </TD><TD NoWrap ALIGN='RIGHT'>78.2500</TD>
<TD NoWrap ALIGN='RIGHT'>1252.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD>
<TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.3093</TD>
<TD NoWrap ALIGN='RIGHT'>1251.68</TD></TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD>
<TD NoWrap>11:06:12</TD><TD NoWrap>67138097</TD><TD NoWrap>11:09:34</TD>
<TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>100 </TD>
<TD NoWrap ALIGN='RIGHT'>78.2500</TD><TD NoWrap ALIGN='RIGHT'>7825.00</TD>
<TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD>
<TD NoWrap ALIGN='RIGHT'>1.9333</TD><TD NoWrap ALIGN='RIGHT'>7823.06</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD><TD NoWrap>11:06:12</TD><TD NoWrap>67138333</TD><TD NoWrap>11:09:39</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>200 </TD><TD NoWrap ALIGN='RIGHT'>78.2500</TD><TD NoWrap ALIGN='RIGHT'>15650.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>3.8666</TD><TD NoWrap ALIGN='RIGHT'>15646.12</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030267510894</TD><TD NoWrap>11:06:12</TD><TD NoWrap>67138344</TD><TD NoWrap>11:09:40</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>318 </TD><TD NoWrap ALIGN='RIGHT'>78.2500</TD><TD NoWrap ALIGN='RIGHT'>24883.50</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>6.1479</TD><TD NoWrap ALIGN='RIGHT'>24877.34</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268222556</TD><TD NoWrap>13:03:50</TD><TD NoWrap>67511545</TD><TD NoWrap>13:03:51</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>733 </TD><TD NoWrap ALIGN='RIGHT'>78.0000</TD><TD NoWrap ALIGN='RIGHT'>57174.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-57174.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268222556</TD><TD NoWrap>13:03:50</TD><TD NoWrap>67511621</TD><TD NoWrap>13:03:53</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>2 </TD><TD NoWrap ALIGN='RIGHT'>78.0000</TD><TD NoWrap ALIGN='RIGHT'>156.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-156.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268222556</TD><TD NoWrap>13:03:50</TD><TD NoWrap>67511797</TD><TD NoWrap>13:03:58</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>1 </TD><TD NoWrap ALIGN='RIGHT'>78.0000</TD><TD NoWrap ALIGN='RIGHT'>78.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-78.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268222556</TD><TD NoWrap>13:03:50</TD><TD NoWrap>67512082</TD><TD NoWrap>13:04:05</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>264 </TD><TD NoWrap ALIGN='RIGHT'>78.0000</TD><TD NoWrap ALIGN='RIGHT'>20592.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-20592.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268378000</TD><TD NoWrap>13:31:04</TD><TD NoWrap>67609079</TD><TD NoWrap>13:33:39</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>405 </TD><TD NoWrap ALIGN='RIGHT'>77.6000</TD><TD NoWrap ALIGN='RIGHT'>31428.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-31428.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268378000</TD><TD NoWrap>13:31:04</TD><TD NoWrap>67609374</TD><TD NoWrap>13:33:46</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>45 </TD><TD NoWrap ALIGN='RIGHT'>77.6000</TD><TD NoWrap ALIGN='RIGHT'>3492.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-3492.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030268779359</TD><TD NoWrap>14:32:04</TD><TD NoWrap>67870192</TD><TD NoWrap>14:32:41</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>BUY</TD><TD NoWrap ALIGN='RIGHT'>900 </TD><TD NoWrap ALIGN='RIGHT'>77.3000</TD><TD NoWrap ALIGN='RIGHT'>69570.00</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>-69570.01</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030269013760</TD><TD NoWrap>15:03:56</TD><TD NoWrap>68018179</TD><TD NoWrap>15:03:56</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>146 </TD><TD NoWrap ALIGN='RIGHT'>76.2500</TD><TD NoWrap ALIGN='RIGHT'>11132.50</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>2.8226</TD><TD NoWrap ALIGN='RIGHT'>11129.67</TD>
</TR>
<!--tr tag missing-->
<TD NoWrap>2009030269013760</TD><TD NoWrap>15:03:56</TD><TD NoWrap>68018180</TD><TD NoWrap>15:03:56</TD><TD NoWrap>SESA GOA LTD</TD><TD NoWrap>SELL</TD><TD NoWrap ALIGN='RIGHT'>10 </TD><TD NoWrap ALIGN='RIGHT'>76.2500</TD><TD NoWrap ALIGN='RIGHT'>762.50</TD><TD NoWrap ALIGN='RIGHT'>0.01</TD><TD NoWrap ALIGN='RIGHT'>0.00</TD><TD NoWrap ALIGN='RIGHT'>0.1933</TD><TD NoWrap ALIGN='RIGHT'>762.30</TD>
</TR>
<TABLE cellpadding=0 cellspacing=0 border=0><br>

Solution

Since you tested my other idea and it didn't work, I think you have only two options:

1. Modify HTML Agility Pack to handle your case, or
2. Fill in the missing </tr> tags yourself.

Here's a regex that might fill in the missing </tr>s for you:

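// Requires: using System.Text.RegularExpressions;
// Appends </tr> to a <tr> row that reaches the next <tr>, </tbody>, or </table> unclosed.
html = Regex.Replace(html, "<tr[^>]*>(?:(?!</?tr>|</tbody>|</table>).)*?(?=<tr[^>]*>|</tbody>|</table>)", "$&</tr>", RegexOptions.Singleline | RegexOptions.IgnoreCase);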

(If someone can improve my regex, please feel free.)
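
The regex above only adds missing </tr> tags; the second sample in the question is missing the opening <tr> tags instead. A minimal sketch of handling both cases as a preprocessing step might look like the following. The class name TableRepair and the method names CloseRows and OpenRows are hypothetical, not part of HTML Agility Pack, and the second pattern rests on the assumption that a <td> found right after a </tr> (ignoring whitespace and comments) always starts a new row.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HTML Agility Pack): repairs row tags with
// regexes before the markup is handed to a parser.
public static class TableRepair
{
    private const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // The answer's regex, unchanged: append </tr> to a <tr> row that runs into
    // the next <tr>, </tbody>, or </table> without having been closed.
    public static string CloseRows(string html) =>
        Regex.Replace(
            html,
            "<tr[^>]*>(?:(?!</?tr>|</tbody>|</table>).)*?(?=<tr[^>]*>|</tbody>|</table>)",
            "$&</tr>",
            Options);

    // Assumption: a <td> that directly follows </tr> (ignoring whitespace and
    // comments) begins a row whose opening <tr> was dropped, so insert one.
    public static string OpenRows(string html) =>
        Regex.Replace(
            html,
            @"(</tr>\s*(?:<!--(?:(?!-->).)*-->\s*)*)(?=<td\b)",
            "$1<tr>",
            Options);
}
```

Under those assumptions, running OpenRows and then CloseRows on the markup before passing it to HtmlDocument.LoadHtml should give the parser balanced rows. Neither pattern repairs missing </td> tags, and nested tables may need a different approach.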