.NET: Prevent XmlDocument.LoadXml from retrieving DTD

2023-09-04 23:49:33 Author: 左岸青春ぃ右岸年华

I have the following C# code. It takes too long and then throws an exception:

new XmlDocument().
LoadXml("<?xml version='1.0' ?><!DOCTYPE note SYSTEM 'http://someserver/dtd'><note></note>");

I understand why it does that. My question is: how do I make it stop? I don't care about DTD validation. I suppose I could just strip the DOCTYPE with a regex replace, but I am looking for a more elegant solution.

Background: The actual XML is received from a web site I do not own. When the site is undergoing maintenance, it returns XML with a DOCTYPE that points to a DTD which is not available during maintenance. As a result, my service becomes unnecessarily slow, because it tries to fetch the DTD for every XML document I need to parse.

Here is the exception stack trace:

Unhandled Exception: System.Net.WebException: The remote name could not be resolved: 'someserver'
at System.Net.HttpWebRequest.GetResponse()
at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials)
at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials)
at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
at System.Xml.XmlTextReaderImpl.OpenStream(Uri uri)
at System.Xml.XmlTextReaderImpl.DtdParserProxy_PushExternalSubset(String systemId, String publicId)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.System.Xml.IDtdParserAdapter.PushExternalSubset(String systemId, String publicId)
at System.Xml.DtdParser.ParseExternalSubset()
at System.Xml.DtdParser.ParseInDocumentDtd(Boolean saveInternalSubset)
at System.Xml.DtdParser.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ConsoleApplication36.Program.Main(String[] args) in c:\Projects\temp\ConsoleApplication36\Program.cs:line 11

Solution

In .NET 4.0, XmlTextReader has a property called DtdProcessing. Setting it to DtdProcessing.Ignore should disable DTD processing, so the reader skips the DOCTYPE declaration instead of trying to download the external DTD.
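
As a minimal sketch of that suggestion, assuming .NET 4.0 or later: the XML string is wrapped in an XmlTextReader with DtdProcessing set to DtdProcessing.Ignore, and the reader is passed to XmlDocument.Load instead of calling LoadXml directly. The class name DtdSafeLoader and the method name LoadIgnoringDtd are hypothetical, chosen only for illustration. Setting XmlResolver to null is an extra precaution that is not part of the answer above.

```csharp
using System.IO;
using System.Xml;

public static class DtdSafeLoader
{
    // Hypothetical helper (not from the original post): loads XML from a string
    // without fetching or processing any DTD referenced in its DOCTYPE.
    public static XmlDocument LoadIgnoringDtd(string xml)
    {
        var doc = new XmlDocument();
        // Extra precaution (an assumption, not part of the answer):
        // the document itself should not resolve external resources either.
        doc.XmlResolver = null;

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = new XmlTextReader(stringReader))
        {
            // Skip the DOCTYPE declaration; no DTD processing takes place.
            xmlReader.DtdProcessing = DtdProcessing.Ignore;
            // Do not resolve external entities or DTDs over the network.
            xmlReader.XmlResolver = null;

            doc.Load(xmlReader);
        }

        return doc;
    }
}
```

With the document from the question, a call such as DtdSafeLoader.LoadIgnoringDtd("<?xml version='1.0' ?><!DOCTYPE note SYSTEM 'http://someserver/dtd'><note></note>") should load the note element without any request to someserver. One caveat: because the DTD is ignored, documents that depend on entities declared in it will probably fail to parse, so this approach fits best when the DTD is used only for validation.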