I'm writing an HTML parser in Flex (AS3) and I need to remove some HTML tags that are not needed.
For example, I want to remove the divs from this code:
<div>
  <div>
    <div>
      <div>
        <div>
          <div>
            <div>
              <p style="padding-left: 18px; padding-right: 20px; text-align: center;">
                <span></span>
                <span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: bold; text-decoration: none; font-family: Arial;">20% OFF.</span>
                <span> </span>
                <span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: normal; text-decoration: none; font-family: Arial;">Do it NOW!</span>
                <span> </span>
              </p>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
and end up with this:
<div>
  <p style="padding-left: 18px; padding-right: 20px; text-align: center;">
    <span></span>
    <span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: bold; text-decoration: none; font-family: Arial;">20% OFF.</span>
    <span> </span>
    <span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: normal; text-decoration: none; font-family: Arial;">Do it NOW!</span>
    <span> </span>
  </p>
</div>
My question is: how can I write a regular expression to remove these unwanted divs? Is there a better way to do it?
Thanks in advance.
Assuming that your target HTML is actually valid XML, you can use a recursive function to pull out the non-div content.
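static function grabNonDivContents(xml:XML):XMLList {
    var out:XMLList = new XMLList();
    var kids:XMLList = xml.children();
    for each (var kid:XML in kids) {
        // localName() is null for text nodes, so they fall through to the else branch.
        if (kid.localName() == "div") {
            // Flatten the div: keep what it contains, drop the div itself.
            var grandkids:XMLList = grabNonDivContents(kid);
            for each (var grandkid:XML in grandkids) {
                out += grandkid;
            }
        } else {
            // Non-div nodes are kept as-is; their own children are not scanned.
            out += kid;
        }
    }
    return out;
}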
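
To get the single wrapping div shown in the question, the flattened result can be put back inside one new div. The following is only a rough usage sketch: it assumes the function above lives on a hypothetical class named HtmlCleaner, and the inline sample markup is a shortened stand-in for the real input.

```actionscript
// HtmlCleaner is a hypothetical class name; use whichever class holds grabNonDivContents.
var source:XML =
    <div>
        <div>
            <div>
                <p style="text-align: center;"><span>20% OFF.</span></p>
            </div>
        </div>
    </div>;

// Collect everything that is not a div, flattening the nested divs.
var contents:XMLList = HtmlCleaner.grabNonDivContents(source);

// Re-wrap the flattened contents in a single outer div.
var cleaned:XML = <div/>;
cleaned.setChildren(contents);

trace(cleaned.toXMLString());
```

Note that this only works if the input parses as XML in the first place; unclosed tags or unquoted attributes will make the XML constructor throw.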