Python file parsing: Build tree from text file

2023-09-10 23:06:28 Author: 往事归零

I have an indented text file that will be used to build a tree. Each line represents a node, and the indentation represents both the depth and which node the current node is a child of.

For example, a file might look like this:


ROOT
   Node1
      Node2
         Node3
            Node4
   Node5
   Node6

This indicates that ROOT has three children (Node1, Node5, and Node6), Node1 has one child (Node2), Node2 has one child (Node3), and so on.

I have come up with a recursive algorithm and programmed it, and it works, but it's kind of ugly. In particular, it handles the example above very crudely when going from Node4 to Node5.

It uses the "indent count" as the basis for recursion: if the number of indents equals the current depth + 1, I go one level deeper. But this means that when I read a line with fewer indents, I have to come back up one level at a time, checking the depth each time.

Here is what I have:


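def _recurse_tree(node, parent, depth):
    tabs = 0
    prev = parent  # avoids an unbound name if the first line is already deeper

    while node:
        tabs = node.count("\t")
        if tabs == depth:
            # Same level: this line is a child of the current parent.
            print("%s: %s" % (parent.strip(), node.strip()))
        elif tabs == depth + 1:
            # One level deeper: the previous line becomes the parent.
            node = _recurse_tree(node, prev, depth + 1)
            tabs = node.count("\t")

            # check if we have to surface some more
            if tabs == depth:
                print("%s: %s" % (parent.strip(), node.strip()))
            else:
                return node
        else:
            # Shallower line: hand it back to the caller.
            return node

        prev = node
        node = inFile.readline().rstrip()

    # End of file: return the empty string so callers can still call .count()
    return node

inFile = open("test.txt")
root = inFile.readline().rstrip()
node = inFile.readline().rstrip()
_recurse_tree(node, root, 1)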

Right now I am just printing out the nodes to verify that the parent node is correct for each line, but maybe there is a cleaner way to do it? I'm especially unhappy with the elif block, where I have to handle coming back from each recursive call.

Recommended answer

The big issue is the "lookahead", which I think caused the ugliness in question. It can be shortened slightly:

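def _recurse_tree(parent, depth, source):
    # Read the next line; an empty string means end of file.
    last_line = source.readline().rstrip()
    while last_line:
        tabs = last_line.count('\t')
        if tabs < depth:
            # Shallower line: hand it back to the caller.
            break
        node = last_line.strip()
        if parent is not None:
            print("%s: %s" % (parent, node))
        # Consume this node's descendants; the call returns the first
        # line that does not belong to them (the lookahead line).
        last_line = _recurse_tree(node, tabs + 1, source)
    return last_line

with open("test.txt") as inFile:
    _recurse_tree(None, 0, inFile)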

Since we're talking recursion, I took pains to avoid any global variables (source and last_line). It would be more Pythonic to make them members of some parser object.
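
As a rough illustration of that last suggestion, here is a minimal sketch of what such a parser object might look like. The class name IndentTreeParser, its method names, and the edges list are all hypothetical and are not part of the code above. It also assumes tab-indented input and collects (parent, child) pairs instead of printing them, so the result is easier to check.

```python
import io


class IndentTreeParser(object):
    """Hypothetical parser object; names here are illustrative only."""

    def __init__(self, source):
        self.source = source   # any object with a readline() method
        self.last_line = ""    # lookahead: line read but not yet consumed
        self.edges = []        # collected (parent, child) pairs

    def _advance(self):
        # Read the next line; an empty string means end of input.
        self.last_line = self.source.readline().rstrip()

    def _recurse(self, parent, depth):
        while self.last_line:
            tabs = self.last_line.count("\t")
            if tabs < depth:
                # Shallower line: leave it in last_line for the caller.
                return
            node = self.last_line.strip()
            if parent is not None:
                self.edges.append((parent, node))
            self._advance()
            self._recurse(node, tabs + 1)

    def parse(self):
        self._advance()
        self._recurse(None, 0)
        return self.edges


# Assumed sample input: the example tree, indented with tabs.
sample = (
    "ROOT\n"
    "\tNode1\n"
    "\t\tNode2\n"
    "\t\t\tNode3\n"
    "\t\t\t\tNode4\n"
    "\tNode5\n"
    "\tNode6\n"
)
edges = IndentTreeParser(io.StringIO(sample)).parse()
for parent, child in edges:
    print("%s: %s" % (parent, child))
```

With the lookahead line stored on the object, the recursive method no longer needs to return it, and the jump from Node4 back up to Node5 is handled by each level simply returning when it sees a shallower line.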