Regular expression to match two or more consecutive characters

2023-09-03 21:57:11 Author: kimi会发光丫^

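Using regular expressions, I want to match a word which starts with a letter; contains only English letters, digits, periods (.), hyphens (-), and underscores (_); does not contain two or more consecutive periods, hyphens, or underscores; and may contain more than one period, hyphen, or underscore overall.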

For example,

flin..stones or flin__stones or flin--stones

are not allowed.

fl_i_stones or fli_st.ones or flin.stones or flinstones

are allowed.

So far, my regular expression is ^[a-zA-Z][a-zA-Z\d._-]+$

So my question is how to enforce the "no consecutive characters" rule using a regular expression.
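To make the gap concrete, here is a small Python sketch (Python is an assumption; the question does not name a language) that runs the pattern from the question against the examples. The name ORIGINAL is illustrative only.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"^[a-zA-Z][a-zA-Z\d._-]+$")

disallowed = ["flin..stones", "flin__stones", "flin--stones"]
allowed = ["fl_i_stones", "fli_st.ones", "flin.stones", "flinstones"]

# The character class permits '.', '_' and '-' anywhere after the first
# letter, so nothing stops them from appearing twice in a row.
for word in disallowed + allowed:
    print(word, ORIGINAL.match(word) is not None)
```

Every word, including the three that should be rejected, matches this pattern, which is why an extra constraint is needed.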

Recommended answer

You can use a lookahead and a backreference to solve this. Note, however, that your current pattern requires at least two characters: the starting letter and one more (because of the +). You probably want to change that + to a * so that the second character class can repeat zero or more times:

^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$

How does the lookahead work? First, it is a negative lookahead: if the pattern inside it finds a match, the lookahead makes the entire pattern fail, and vice versa. So the inner pattern should match exactly when the string does contain two consecutive identical characters. Inside the lookahead, .* first moves to an arbitrary position in the string, then (.) matches a single arbitrary character and captures it with the parentheses, so that character goes into capturing group 1. We then require that character to be followed by itself by referencing the group with \1. Because of backtracking, the inner pattern is tried at every position in the string, checking whether any character is followed by a copy of itself. If two such consecutive characters are found, the whole pattern fails. If they are not found, the engine jumps back to where the lookahead started (the beginning of the string) and continues matching the actual pattern.
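As a minimal sketch of the lookahead in action, assuming Python's re module (the answer itself is language-neutral), the pattern can be checked against the examples from the question. The names LOOKAHEAD, PUNCT_ONLY, and is_valid are hypothetical. One thing worth noting: (.)\1 rejects any doubled character, letters included, so a word such as "hello" fails. PUNCT_ONLY is a variation that is not part of the answer; it restricts the captured character to the three punctuation marks, in case only those should be limited.

```python
import re

# Pattern from the answer: no character at all may be followed by itself.
LOOKAHEAD = re.compile(r"^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$")

# Variation (an assumption, not from the answer): only '.', '_' and '-'
# are forbidden from repeating; doubled letters and digits are accepted.
PUNCT_ONLY = re.compile(r"^(?!.*([._-])\1)[a-zA-Z][a-zA-Z\d._-]*$")

def is_valid(word, pattern=LOOKAHEAD):
    """Return True if the whole word satisfies the given pattern."""
    return pattern.match(word) is not None

for word in ["flin..stones", "flin__stones", "flin--stones",
             "fl_i_stones", "fli_st.ones", "flin.stones", "flinstones",
             "hello"]:
    print(word, is_valid(word), is_valid(word, PUNCT_ONLY))
```

Neither pattern rejects a mixed run such as "._", because the backreference only matches a repeat of the same character.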

Alternatively, you can split this into two separate checks. One for the valid characters and the starting letter:

^[a-zA-Z][a-zA-Z\d._-]*$

And one for the consecutive characters (where you invert the match result):

(.)\1

This would greatly increase the readability of your code (because it is less obscure than the lookahead), and it would also let you detect which rule the input actually violates and return an appropriate, helpful error message.
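The two-check approach might look like the following sketch, again assuming Python. The function name validate and the wording of the messages are illustrative, not part of the answer.

```python
import re

# Check 1: starting letter and allowed characters (pattern from the answer).
ALLOWED = re.compile(r"^[a-zA-Z][a-zA-Z\d._-]*$")

# Check 2: any character immediately followed by itself. A hit here means
# the input is invalid, so the result is inverted by the caller.
REPEATED = re.compile(r"(.)\1")

def validate(word):
    """Return None if word is valid, otherwise a message describing the problem."""
    if not ALLOWED.match(word):
        return ("must start with a letter and contain only letters, "
                "digits, '.', '-' and '_'")
    repeat = REPEATED.search(word)
    if repeat:
        return (f"character {repeat.group(1)!r} is repeated "
                f"at position {repeat.start()}")
    return None

print(validate("flin.stones"))
print(validate("flin..stones"))
print(validate("9lives"))
```

Because each check is a separate step, the caller can tell a bad character apart from a repeated one, which the single combined pattern cannot report.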