Creating strings with all possible combinations

2023-09-11 05:55:07 Author: 莫念初

I am using an OCR algorithm (Tesseract-based) which has difficulty recognizing certain characters. I have partially solved that by creating my own "post-processing hash table" which holds pairs of characters. For example, since the text consists only of digits, I have figured out that if there is a Q character in the text, it should be a 9 instead.
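The post-processing table described above might look something like the following minimal sketch. The names `OCR_FIXES` and `fix_ocr` are made up for illustration, and only the Q to 9 pair comes from the question; any other pairs would have to be added from what the OCR actually gets wrong.

```python
# Hypothetical post-processing table: misread character -> intended digit.
# Only the 'Q' -> '9' pair is taken from the question.
OCR_FIXES = {"Q": "9"}


def fix_ocr(text, fixes=OCR_FIXES):
    """Replace every character that has an unambiguous fix; leave the rest alone."""
    return "".join(fixes.get(ch, ch) for ch in text)


print(fix_ocr("10Q7Q5"))  # -> 109795
```

A table like this only works for one-to-one mistakes. A character such as B, which can stand for more than one digit, is left untouched and needs separate handling.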

However, I have a more serious problem with the characters 6 and 8, since both of them are recognized as B. Since I know what I am looking for (when I am translating the image to text) and the strings are fairly short (6 to 8 digits), I thought I would create strings with all possible combinations of 6 and 8 and compare each one of them to the one I am looking for.

So for example, I have the following string recognized by the OCR:

L0B7B0B5

So each B here can be either 6 or 8.

Now I want to generate a list of every string that results from replacing each B with either 6 or 8 (L0676065, L0676085, and so on through L0878085).

So it is a kind of binary table with 3 digits, and in this case there are 8 options. But the number of B characters in the string can be other than 3 (it can be any number).
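The "binary table" idea can be sketched directly with an integer counter, with no imports at all. This is only an illustration: the function name `expand_ambiguous` and its parameters are hypothetical, and it assumes exactly two choices per ambiguous character.

```python
def expand_ambiguous(word, marker="B", choices="68"):
    """Return every string obtained by replacing each marker with one of two choices."""
    positions = [i for i, ch in enumerate(word) if ch == marker]
    n = len(positions)
    results = []
    for mask in range(2 ** n):  # one row of the binary table per value of mask
        chars = list(word)
        for bit, pos in enumerate(positions):
            # The most significant bit of mask controls the first marker.
            chosen = (mask >> (n - 1 - bit)) & 1
            chars[pos] = choices[chosen]
        results.append("".join(chars))
    return results


variants = expand_ambiguous("L0B7B0B5")
print(len(variants))               # -> 8
print(variants[0], variants[-1])   # -> L0676065 L0878085
```

With n markers the list has 2**n entries, so for strings of 6 to 8 characters it never exceeds 256.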

I have tried to use the Python itertools module with something like this:

list(itertools.product(*["86"] * 3))

which gives the following result:

[('8', '8', '8'), ('8', '8', '6'), ('8', '6', '8'), ('8', '6', '6'), ('6', '8', '8'), ('6', '8', '6'), ('6', '6', '8'), ('6', '6', '6')]

I assume I can then use these tuples to swap out the B characters. However, for some reason I can't make itertools work in my environment. I assume it has something to do with the fact that I am using Jython and not pure Python.

I would be happy to hear any other ideas on how to complete this task. Maybe there is a simpler solution I didn't think of?

Recommended answer

itertools.product accepts a repeat keyword that you can use:

In [92]: from itertools import product

In [93]: word = "L0B7B0B5"

In [94]: subs = product("68", repeat=word.count("B"))

In [95]: list(subs)
Out[95]: 
[('6', '6', '6'),
 ('6', '6', '8'),
 ('6', '8', '6'),
 ('6', '8', '8'),
 ('8', '6', '6'),
 ('8', '6', '8'),
 ('8', '8', '6'),
 ('8', '8', '8')]

A fairly concise way to then make the substitutions is a reduce operation over the string replace method:

In [97]: subs = product("68", repeat=word.count("B"))

In [98]: [reduce(lambda s, c: s.replace('B', c, 1), sub, word) for sub in subs]
Out[98]: 
['L0676065',
 'L0676085',
 'L0678065',
 'L0678085',
 'L0876065',
 'L0876085',
 'L0878065',
 'L0878085']
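The session above relies on reduce being a builtin, which holds on Python 2 (and so presumably on Jython 2.x) but not on Python 3. As a sketch, the same idea can be wrapped in a function that runs on both; the name `substitutions` and its parameters are my own, not part of the answer.

```python
from functools import reduce  # builtin on Python 2, must be imported on Python 3
from itertools import product


def substitutions(word, marker="B", choices="68"):
    """List every variant of word with each marker replaced by one of choices."""
    subs = product(choices, repeat=word.count(marker))
    # For each tuple, replace the first remaining marker with each character in turn.
    return [reduce(lambda s, c: s.replace(marker, c, 1), sub, word) for sub in subs]


print(substitutions("L0B7B0B5")[:2])  # -> ['L0676065', 'L0676085']
```

Note that this assumes none of the replacement characters is itself the marker; otherwise the count-limited replace would hit an already substituted position.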

Another way, using a couple more functions from itertools:

In [90]: from itertools import chain, izip_longest

In [91]: subs = product("68", repeat=word.count("B"))

In [92]: [''.join(chain(*izip_longest(word.split('B'), sub, fillvalue=''))) for sub in subs]
Out[92]: 
['L0676065',
 'L0676085',
 'L0678065',
 'L0678085',
 'L0876065',
 'L0876085',
 'L0878065',
 'L0878085']
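Since the goal in the question is to compare the variants against a known expected string, the split-and-interleave approach can be turned into a lazy generator plus a membership check. This is a sketch under a few assumptions: `candidates` and `matches` are hypothetical names, and the import fallback is there because izip_longest is the Python 2 name while Python 3 calls it zip_longest.

```python
from itertools import chain, product

try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2 name


def candidates(word, marker="B", choices="68"):
    """Yield each variant of word lazily, one per combination of choices."""
    parts = word.split(marker)  # the fixed pieces between the markers
    for sub in product(choices, repeat=len(parts) - 1):
        # Interleave fixed pieces with chosen characters; pad the shorter side with ''.
        yield "".join(chain(*zip_longest(parts, sub, fillvalue="")))


def matches(ocr_text, expected):
    """True if some substitution of the ambiguous characters gives expected."""
    return any(c == expected for c in candidates(ocr_text))


print(matches("L0B7B0B5", "L0876085"))  # -> True
print(matches("L0B7B0B5", "L0976085"))  # -> False
```

Because any() stops at the first hit, the full list of variants is never built when a match turns up early.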
