Which regular expression algorithm does JavaScript's regex use?

2023-09-11 03:44:25 Author: 人不风流枉少年

I was reading this article today about two different regular expression algorithms.

According to the article, old Unix tools like ed, sed, grep, egrep, awk, and lex all use what's called the Thompson NFA algorithm in their regular expressions...
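To make the Thompson NFA idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code of any of those tools: it supports only literals, `.` (any character), concatenation, `|`, `*`, `?` and parentheses, and the names `State`, `Parser` and `thompson_match` are hypothetical. Each sub-expression becomes a small NFA fragment, and matching advances the whole set of active states one character at a time, so it never backtracks.

```python
from typing import List, Optional, Set, Tuple


class State:
    """An NFA state. If `char` is set, the state consumes that character (any
    character when char == '.') and moves to out[0]. If `char` is None it is an
    epsilon state that may move to every state in `out` without consuming input."""

    def __init__(self, char: Optional[str] = None) -> None:
        self.char = char
        self.out: List["State"] = []


Fragment = Tuple[State, State]  # (start state, dangling end state)


class Parser:
    """Recursive-descent parser building Thompson fragments (hypothetical subset:
    literals, '.', concatenation, '|', '*', '?', parentheses; no escapes)."""

    def __init__(self, pattern: str) -> None:
        self.p, self.i = pattern, 0

    def peek(self) -> Optional[str]:
        return self.p[self.i] if self.i < len(self.p) else None

    def parse(self) -> Fragment:
        frag = self.parse_alt()
        if self.i != len(self.p):
            raise ValueError(f"unexpected {self.p[self.i]!r} at position {self.i}")
        return frag

    def parse_alt(self) -> Fragment:
        start, end = self.parse_concat()
        while self.peek() == "|":
            self.i += 1
            s2, e2 = self.parse_concat()
            split, join = State(), State()
            split.out = [start, s2]          # epsilon-branch into either side
            end.out.append(join)
            e2.out.append(join)
            start, end = split, join
        return start, end

    def parse_concat(self) -> Fragment:
        start = end = State()                # empty fragment: one epsilon state
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.parse_repeat()
            end.out.append(s2)               # chain fragments end-to-start
            end = e2
        return start, end

    def parse_repeat(self) -> Fragment:
        start, end = self.parse_atom()
        while self.peek() in ("*", "?"):
            op = self.peek()
            self.i += 1
            split, join = State(), State()
            split.out = [start, join]        # enter the body or skip it
            end.out.append(split if op == "*" else join)  # '*' loops back
            start, end = split, join
        return start, end

    def parse_atom(self) -> Fragment:
        c = self.p[self.i]
        self.i += 1
        if c == "(":
            frag = self.parse_alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return frag
        s, e = State(c), State()
        s.out = [e]
        return s, e


def closure(states) -> Set[State]:
    """All states reachable via epsilon moves; `seen` makes epsilon cycles safe."""
    stack, seen = list(states), set()
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            if s.char is None:
                stack.extend(s.out)
    return seen


def thompson_match(pattern: str, text: str) -> bool:
    """True if `pattern` matches the whole of `text`, by state-set simulation."""
    start, accept = Parser(pattern).parse()
    current = closure([start])
    for ch in text:
        current = closure(s.out[0] for s in current
                          if s.char is not None and s.char in (ch, "."))
    return accept in current
```

For example, `thompson_match("a(b|c)*d", "abcbd")` is true and `thompson_match("a(b|c)*d", "abxd")` is false. Because the active set can never hold more states than the NFA has, the work per input character is bounded by the size of the pattern, whatever the pattern looks like.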

However, newer tools like Java, Perl, PHP, and Python all use a different algorithm for their regular expressions, one that can be much, much slower.
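The slowdown shows up on pathological patterns such as `a?a?...a?aa...a` matched against a string of `a`s. The sketch below is a hypothetical illustration, not any engine's real code: a pattern is simplified to a list of `(char, optional)` tokens, `backtrack_match` tries each optional token "present" first and backtracks on failure (as a greedy backtracking engine does), and `set_simulation_match` advances the set of reachable token positions Thompson-style. Both record their work in a one-element list `counter`.

```python
from typing import List, Tuple

# Hypothetical simplified pattern: "a?a" -> [("a", True), ("a", False)].
Token = Tuple[str, bool]


def pathological(n: int) -> List[Token]:
    """Tokens for a?^n a^n: n optional 'a's followed by n mandatory 'a's."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(tokens: List[Token], text: str, counter: List[int]) -> bool:
    """Backtracking: try each optional token as present first, then as absent."""
    def go(ti: int, pos: int) -> bool:
        counter[0] += 1  # one recursive call = one step of work
        if ti == len(tokens):
            return pos == len(text)
        ch, optional = tokens[ti]
        if pos < len(text) and text[pos] == ch and go(ti + 1, pos + 1):
            return True
        return optional and go(ti + 1, pos)
    return go(0, 0)


def set_simulation_match(tokens: List[Token], text: str, counter: List[int]) -> bool:
    """Thompson-style: advance the whole set of reachable token indices at once."""
    def skip_optionals(indices):
        # Epsilon closure: an optional token may be skipped without consuming input.
        closed = set()
        for i in indices:
            closed.add(i)
            while i < len(tokens) and tokens[i][1]:
                i += 1
                closed.add(i)
        return closed

    current = skip_optionals({0})
    for ch in text:
        counter[0] += len(current)  # one unit of work per active state
        current = skip_optionals(
            {i + 1 for i in current if i < len(tokens) and tokens[i][0] == ch}
        )
    return len(tokens) in current


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        tokens, text = pathological(n), "a" * n
        bt, ss = [0], [0]
        assert backtrack_match(tokens, text, bt) and set_simulation_match(tokens, text, ss)
        print(f"n={n:2d}  backtracking steps={bt[0]:7d}  state-set steps={ss[0]:4d}")
```

On this input the only successful choice is "every optional token absent", which the present-first search reaches last, so the backtracking count is at least $2^{n+1}-1$. The state-set count stays at most $(2n+1) \cdot n$.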

This article makes no mention at all of JavaScript's regex algorithm (and yes, I know there are various JS engines out there), but I was wondering if anybody knew which of those algorithms they use, and whether those algorithms should perhaps be swapped out for Thompson NFA.

Recommended answer

The ECMAScript language specification doesn't require any particular implementation of regular expressions, so that part of the question isn't well-formed. What you're really asking about is the particular implementation in a particular browser.

The reason Perl, Python, etc. use a slower algorithm, though, is that the regex language they define isn't really regular expressions. A true regular expression can be expressed as a finite state machine, but regex features such as backreferences describe languages that no finite state machine can recognize (in general, not even context-free ones). That's why the fashion is to just call it "regex" instead of talking about regular expressions.

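Yes, in fact JavaScript regex isn't regular either. Bounded repetition such as `{n,m}` is not what breaks regularity: `x{n,m}` is shorthand for $n$ mandatory copies of `x` followed by $m-n$ optional ones, which a finite automaton handles without trouble. Backreferences are what break it. The pattern `/^(a*)b\1$/` accepts exactly the strings $a^k b a^k$ for $k \ge 0$. If that language were regular with pumping length $p$, take $s = a^p b a^p$. The pumping lemma for regular languages splits $s = xyz$ with $|xy| \le p$ and $|y| \ge 1$, so $y$ consists only of leading `a`s, and pumping gives $xy^2z = a^{p+|y|} b a^p$, which the pattern rejects. It follows that this is not a regular language.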
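The pumping step can be checked concretely on a pattern with a backreference, `(a*)b\1`. This sketch uses Python's `re` as a stand-in for JavaScript's engine (both give `\1` the meaning "repeat exactly what group 1 captured"); `in_language` and `pumping_splits` are hypothetical helper names. A finite check like this cannot prove non-regularity. It only confirms, for one value of $p$, that every split the lemma allows pumps out of the language.

```python
import re
from typing import Iterator, Tuple

# \1 must repeat exactly the text captured by group 1.
PATTERN = re.compile(r"(a*)b\1")


def in_language(s: str) -> bool:
    """True if s has the form a^k b a^k (fullmatch anchors both ends, like ^...$)."""
    return PATTERN.fullmatch(s) is not None


def pumping_splits(p: int) -> Iterator[Tuple[str, str, str, str]]:
    """For s = a^p b a^p, yield (x, y, z, x + y*2 + z) for every split the pumping
    lemma allows: |xy| <= p and |y| >= 1."""
    s = "a" * p + "b" + "a" * p
    for i in range(p + 1):              # i = |x|
        for j in range(i + 1, p + 1):   # j = |xy|
            x, y, z = s[:i], s[i:j], s[j:]
            yield x, y, z, x + y * 2 + z


if __name__ == "__main__":
    p = 6
    print(in_language("a" * p + "b" + "a" * p))                           # True
    print(all(not in_language(pumped) for *_, pumped in pumping_splits(p)))  # True
```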

(augh. Thinko corrected.)