Algorithm to exclude numbers

2023-09-11 03:03:43 Author: 淇實ヤ心佷痛

You are given an integer N that fits in a long (less than 2^63-1) and 50 other integers. Your task is to find how many numbers from 1 to N contain none of those 50 numbers as a substring.

This question is from an interview.

Recommended answer

This is only an explanation of what larsmans already wrote. If you like this answer, please upvote his as well.

A finite automaton, FA, is just a set of states, and rules saying that if you are in state S and the next character you are fed is c then you transition to state T. Two of the states are special. One means, "start here" and the other means "I successfully matched". One of the characters is special, and means "the string just ended". So you take a string and a finite automaton, start in the starting state, keep feeding characters into the machine and changing states. You fail to match if you give any state unexpected input. You succeed in matching if you ever reach the state "I successfully matched".

Now there is a well-known algorithm for converting a regular expression into a finite automaton that matches a string if and only if that regular expression matches. (If you've read about regular expressions, this is how DFA engines work.) To illustrate I'll use the pattern ^.*(44|3).*$ which means "start of the string, any number of characters, followed by either 44 or 3, followed by any number of characters, followed by the end of the string."
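
Before building the automaton by hand, it can help to see what the pattern accepts. The following is a minimal sketch using Python's standard `re` module. Note that `re` is a backtracking engine rather than a DFA engine, so it only illustrates which strings the pattern matches, not how a DFA gets there. The helper name `contains_44_or_3` and the sample strings are my own.

```python
import re

# The pattern from the text: the string contains "44" or "3" somewhere.
PATTERN = re.compile(r"^.*(44|3).*$")

def contains_44_or_3(text):
    """Return True when the regular expression matches the string."""
    return PATTERN.search(text) is not None

# A few sample inputs (chosen for illustration only).
for sample in ["13", "14", "144", "2024"]:
    print(sample, contains_44_or_3(sample))
```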

First let's label all of the positions we can be in in the regular expression when we're looking for the next character: ^A.*(4B4|3)C.*$

The states of our regular expression engine will be subsets of those positions, plus the special state "matched". The result of a state transition will be the set of positions we could get to if we were at any of the positions in the current set and saw a particular character. Our starting position is at the start of the RE, which is {A}. Here are the states that can be reached:

S1: {A}   # start
S2: {A, B}
S3: {A, C}
S4: {A, B, C}
S5: matched

Here are the transition rules:

S1:
  3: S3
  4: S2
  end of string: FAIL
  any other char: S1
S2:
  3: S3
  4: S3
  end of string: FAIL
  any other char: S1
S3:
  4: S4
  end of string: S5 (match)
  any other char: S3
S4:
  end of string: S5 (match)
  any other char: S4
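
The rules above translate almost directly into a lookup table. Here is a minimal sketch, assuming we store each state's rules in a dictionary; the names `S_RULES`, `run`, and `matches`, and the `OTHER`/`END`/`FAIL` markers, are my own and not part of larsmans's answer.

```python
# Transition table for ^.*(44|3).*$, copied from the rules above.
# Each state maps specific characters to a next state; OTHER is the fallback
# and END stands for the special "end of string" input.
OTHER = "other"
END = "end"
FAIL = "FAIL"

S_RULES = {
    "S1": {"3": "S3", "4": "S2", END: FAIL, OTHER: "S1"},
    "S2": {"3": "S3", "4": "S3", END: FAIL, OTHER: "S1"},
    "S3": {"4": "S4", END: "S5", OTHER: "S3"},
    "S4": {END: "S5", OTHER: "S4"},
}

def run(rules, start, text):
    """Feed every character of text, then the end-of-string marker."""
    state = start
    for ch in text:
        row = rules[state]
        state = row.get(ch, row[OTHER])
        if state == FAIL:
            return FAIL
    return rules[state][END]

def matches(text):
    """True when the automaton ends in S5, the "matched" state."""
    return run(S_RULES, "S1", text) == "S5"
```

Calling `matches("13")` walks S1, S1, S3 and then reaches S5 on end of string, while `matches("14")` ends in S2 and fails.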

Now if you take any string, start in state S1, and follow the rules, you'll find out whether it matches that pattern. The process can be long and tedious, but luckily it can be automated. My guess is that larsmans has automated it for his own use. (Technical note: the expansion from "positions in the RE" to "sets of positions you might possibly be in" can be done either up front, as here, or at run time. For most REs it is better to do it up front. But a tiny fraction of pathological examples wind up with a very large number of states, and for those it can be better to do the expansion at run time.)

We can do this with any regular expression. For instance ^([1-9]|1[0-9]|2[0-7])$ can get labeled: ^A([1-9]|1B[0-9]|2C[0-7])D$ and we get the states:

T1: {A}
T2: {D}
T3: {B, D}
T4: {C, D}

And the transitions:

T1:
  1: T3
  2: T4
  3-9: T2
  any other char: FAIL
T2:
  end of string: MATCH
  any other char: FAIL
T3:
  0-9: T2
  end of string: MATCH
  any other char: FAIL
T4:
  0-7: T2
  end of string: MATCH
  any other char: FAIL
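
As a sanity check on this second table, here is a small sketch that writes the T rules as a step function and confirms that the automaton accepts exactly the decimal strings for 1 through 27. The names `t_step` and `t_matches` are hypothetical, and `END = None` is just my way of representing "end of string".

```python
FAIL, MATCH, END = "FAIL", "MATCH", None

def t_step(state, ch):
    """One transition of the automaton for ^([1-9]|1[0-9]|2[0-7])$."""
    if ch is END:
        # T2, T3 and T4 all contain position D, so they accept end of string.
        return MATCH if state in ("T2", "T3", "T4") else FAIL
    if state == "T1":
        if ch == "1":
            return "T3"
        if ch == "2":
            return "T4"
        if ch in "3456789":
            return "T2"
    elif state == "T3" and ch in "0123456789":
        return "T2"
    elif state == "T4" and ch in "01234567":
        return "T2"
    return FAIL  # T2 on any character, or any unexpected input

def t_matches(text):
    state = "T1"
    for ch in text:
        state = t_step(state, ch)
        if state == FAIL:
            return False
    return t_step(state, END) == MATCH

accepted = [n for n in range(100) if t_matches(str(n))]
print(accepted == list(range(1, 28)))
```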

OK, so we know what a regular expression is, what a finite automaton is, and how they relate. What is the intersection of two finite automata? It is just a finite automaton that matches when both finite automata individually match, and otherwise fails to match. It is easy to construct: its set of states is just the set of pairs of a state in the one and a state in the other. Its transition rule is to apply the transition rule for each member independently. If either fails, the whole does; if both match, the whole matches.

For the above pair, let's actually execute the intersection on the number 13. We start in state (S1, T1):

state: (S1, T1)  next char: 1
state: (S1, T3)  next char: 3
state: (S3, T2)  next char: end of string
state: (matched, matched) -> matched

And then on the number 14:

state: (S1, T1)  next char: 1
state: (S1, T3)  next char: 4
state: (S2, T2)  next char: end of string
state: (FAIL, matched) -> FAIL
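
The two traces above can be reproduced mechanically. Below is a sketch of the pair construction, assuming both automata are written as step functions; `s_step`, `t_step`, `pair_step`, and `trace` are names I made up, and I use a single `MATCH` marker where the tables say S5 or MATCH.

```python
FAIL, MATCH, END = "FAIL", "MATCH", None

def s_step(state, ch):
    """Rules for ^.*(44|3).*$ (states S1 to S4 from the text)."""
    if ch is END:
        return MATCH if state in ("S3", "S4") else FAIL
    if state in ("S1", "S2"):
        if ch == "3":
            return "S3"
        if ch == "4":
            return "S2" if state == "S1" else "S3"
        return "S1"
    if state == "S3":
        return "S4" if ch == "4" else "S3"
    return "S4"

def t_step(state, ch):
    """Rules for ^([1-9]|1[0-9]|2[0-7])$ (states T1 to T4 from the text)."""
    if ch is END:
        return MATCH if state in ("T2", "T3", "T4") else FAIL
    if state == "T1":
        if ch == "1":
            return "T3"
        if ch == "2":
            return "T4"
        if ch in "3456789":
            return "T2"
    elif state == "T3" and ch in "0123456789":
        return "T2"
    elif state == "T4" and ch in "01234567":
        return "T2"
    return FAIL

def pair_step(pair, ch):
    """Step both automata independently; the pair fails if either fails."""
    s, t = pair
    s_next, t_next = s_step(s, ch), t_step(t, ch)
    if FAIL in (s_next, t_next):
        return FAIL
    return MATCH if ch is END else (s_next, t_next)

def trace(text):
    """List every state the intersection passes through for text."""
    state = ("S1", "T1")
    seen = [state]
    for ch in list(text) + [END]:
        state = pair_step(state, ch)
        seen.append(state)
        if state == FAIL:
            break
    return seen

print(trace("13"))
print(trace("14"))
```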

Now we come to the whole point of this. Given that final finite automaton, we can use dynamic programming to figure out how many strings there are that match it. Here is that calculation:

0 chars:
  (S1, T1): 1
    -> (S1, T3): 1 # 1
    -> (S1, T4): 1 # 2
    -> (S3, T2): 1 # 3
    -> (S2, T2): 1 # 4
    -> (S1, T2): 5 # 5-9
1 chars:
  (S1, T2): 5      # dead end
  (S1, T3): 1
    -> (S1, T2): 8 # 0-2, 5-9
    -> (S3, T2): 1 # 3
    -> (S2, T2): 1 # 4
  (S1, T4): 1
    -> (S1, T2): 6 # 0-2, 5-7
    -> (S3, T2): 1 # 3
    -> (S2, T2): 1 # 4
  (S2, T2): 1      # dead end
  (S3, T2): 1
    -> match:    1 # end of string
2 chars:
  (S1, T2): 14     # dead end
  (S2, T2): 2      # dead end
  (S3, T2): 2
    -> match     2 # end of string
  match:    1
    -> match     1 # carry through the count
3 chars:
  match:    3
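
The table above is the kind of thing a short program can fill in. Here is a sketch of that dynamic programming step, assuming the same two rule sets written as step functions and an alphabet of decimal digits. It keeps a count per pair state for each string length and adds up the strings that match at end of string. All names (`s_step`, `t_step`, `pair_step`, `count_matches`) are my own.

```python
from collections import Counter

FAIL, MATCH, END = "FAIL", "MATCH", None

def s_step(state, ch):
    """Rules for ^.*(44|3).*$."""
    if ch is END:
        return MATCH if state in ("S3", "S4") else FAIL
    if state in ("S1", "S2"):
        if ch == "3":
            return "S3"
        if ch == "4":
            return "S2" if state == "S1" else "S3"
        return "S1"
    if state == "S3":
        return "S4" if ch == "4" else "S3"
    return "S4"

def t_step(state, ch):
    """Rules for ^([1-9]|1[0-9]|2[0-7])$."""
    if ch is END:
        return MATCH if state in ("T2", "T3", "T4") else FAIL
    if state == "T1":
        if ch == "1":
            return "T3"
        if ch == "2":
            return "T4"
        if ch in "3456789":
            return "T2"
    elif state == "T3" and ch in "0123456789":
        return "T2"
    elif state == "T4" and ch in "01234567":
        return "T2"
    return FAIL

def pair_step(pair, ch):
    """Intersection rule: fail if either automaton fails."""
    s_next, t_next = s_step(pair[0], ch), t_step(pair[1], ch)
    if FAIL in (s_next, t_next):
        return FAIL
    return MATCH if ch is END else (s_next, t_next)

def count_matches(max_len):
    """Count digit strings of length 0..max_len accepted by both automata."""
    counts = Counter({("S1", "T1"): 1})  # one way to have read nothing
    matched = 0
    for _ in range(max_len + 1):
        following = Counter()
        for state, ways in counts.items():
            if pair_step(state, END) == MATCH:
                matched += ways          # these strings end here and match
            for ch in "0123456789":
                new_state = pair_step(state, ch)
                if new_state != FAIL:
                    following[new_state] += ways
        counts = following
    return matched

print(count_matches(2))
```

Because the work per length depends only on the number of pair states, not on how many strings there are, the same loop stays cheap for much longer strings.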

OK, that's a lot of work, but we found that there are 3 strings that match both of those rules simultaneously. And we did it in a way that is automatable and scalable to much larger numbers.

Of course the question we were originally posed was how many matched the second but not the first. Well we know that 27 match the second rule, 3 match both, so 24 must match the second rule but not the first.
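
For a range this small, the subtraction can be checked by brute force. This sketch simply tests every number from 1 to 27 for the substrings "3" and "44"; it is only a cross-check on the arithmetic, not the automaton approach, and the function name is hypothetical.

```python
def contains_forbidden(n, forbidden=("3", "44")):
    """True when the decimal form of n contains any forbidden substring."""
    text = str(n)
    return any(piece in text for piece in forbidden)

in_range = list(range(1, 28))                       # matches the second rule
both = [n for n in in_range if contains_forbidden(n)]
clean = [n for n in in_range if not contains_forbidden(n)]

print(len(in_range), both, len(clean))
```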

As I said before, this is just larsmans's solution elucidated. If you learned something, upvote his answer. If this material sounds interesting, go buy a book like Programming Language Pragmatics and learn a lot more about finite automata, parsing, compilation, and the like. It is a very good skillset to have, and far too many programmers don't have it.