What algorithm can I use to identify the content on a web page?

2023-09-11 03:20:17 Author: 後世續前緣

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
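To make the goal concrete, here is a minimal sketch of one common heuristic, not a definitive answer: credit each run of text to its nearest enclosing container element, then rank containers by how much non-link text they hold, so link-heavy menus score near zero. It works offline on an HTML string with Python's standard `html.parser` and ignores element positioning. The names `BlockScorer` and `rank_blocks` and the tag sets `CONTAINERS`, `VOID` and `SKIP` are assumptions made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical tag sets chosen for this sketch; tune them for real pages.
CONTAINERS = {"body", "div", "article", "section", "main", "td", "nav",
              "header", "footer", "aside", "ul", "ol"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}


class BlockScorer(HTMLParser):
    """Credits text to the nearest open container element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, index into blocks or None)
        self.blocks = []  # one record per container element seen

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements never hold text and have no end tag
        index = None
        if tag in CONTAINERS:
            index = len(self.blocks)
            self.blocks.append({"tag": tag, "id": dict(attrs).get("id"),
                                "text": 0, "link": 0})
        self.stack.append((tag, index))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.stack[pos][0] == tag:
                del self.stack[pos:]
                return

    def handle_data(self, data):
        length = len(data.strip())
        open_tags = [t for t, _ in self.stack]
        if not length or any(t in SKIP for t in open_tags):
            return
        owner = next((i for _, i in reversed(self.stack) if i is not None), None)
        if owner is None:
            return
        self.blocks[owner]["text"] += length
        if "a" in open_tags:  # text inside a link counts against the block
            self.blocks[owner]["link"] += length


def rank_blocks(html):
    """Return (score, tag, id) tuples, highest non-link text length first."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    scored = [(b["text"] - b["link"], b["tag"], b["id"]) for b in parser.blocks]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Calling `rank_blocks(page_html)` gives the sorted list of candidate blocks; the first entry is the most likely main content. In the browser, the same scoring could be applied to live DOM elements, and element positions or sizes could serve as extra signals.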