How to parse a string to extract year range values? [string, range]

2023-09-11 07:17:19 Author: 如果爱可以重来

I received a change request and I'm unsure how to best approach it. If the client searches for something and they specify a year or year range greater than what we have in our database, I have to return the result that corresponds to the latest year range that we have.

Currently the results we have in the db all follow one of the following patterns:

Thing1 Thing2 S1 // some results have no year
Thing1 Thing2 2006-07 Series 6 // some results have 'Series X'
Thing1 Thing2 2006-2007 S12 RP // some results have SN or SN YZ
Thing1 Thing2 2020-21 S6 // some results don't have a full second year
Thing1 Thing2 2022-2024 S12
Thing1 Thing2 2024 Onwards // the result for the final year just has the year & 'Onwards'

There are more results for Thing1 Thing2 available in the world, going up to 2060, but we only keep 14 years' worth of data, because beyond 14 years out (say 2026 or 2028), the data is exactly the same as in the previous years.

Both the maximum year we have and the maximum year in existence increase by 2 years every 2 years. So in 2012 we'll have Thing1 Thing2 2026 Onwards, and the maximum in existence will be 2062.

So basically, I need to detect when the client searches for [Thing1 (or) Thing2 with a year range], and if the first year value is greater than [this year + 14], I have to return [this year + 14], but only if the current year is even; otherwise I have to return [this year + 13].
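The clamping rule above can be sketched as a small C# helper. This is only an illustration under the question's stated assumptions (the database holds data up to the current year + 14 in even years and + 13 in odd years); the `YearClamp` class and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper illustrating the rule described in the question.
public static class YearClamp
{
    // Latest "Onwards" year held in the database for a given current year:
    // current year + 14 in even years, current year + 13 in odd years.
    public static int LatestAvailableYear(int currentYear)
    {
        return currentYear % 2 == 0 ? currentYear + 14 : currentYear + 13;
    }

    // Returns the requested start year, or the latest available year
    // if the request goes beyond what the database holds.
    public static int ClampStartYear(int requestedStartYear, int currentYear)
    {
        return Math.Min(requestedStartYear, LatestAvailableYear(currentYear));
    }
}
```

With a current year of 2010 or 2011 this yields 2024 (the "2024 Onwards" row), and with 2012 it yields 2026, which matches the examples in the question.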

The trouble I'm having is how to identify a year in the middle of a string that doesn't follow a well-defined pattern, other than the fact that the first part of the year range is a 4-digit year.

What is the best way for me to go about this? Could somebody suggest how I could approach a solution to this issue? Thanks.

Solution

This regex pattern would work nicely: \b(?<Year1>\d{4})(?:-(?<Year2>\d{2,4}))?\b

Explanation:

- \b: a word boundary, which ensures we capture the years on their own rather than as part of another word (i.e., no partial matches). It is used to anchor both ends of the pattern.
- (?<Year1>\d{4}): a named capture group that matches exactly 4 digits.
- (?:-(?<Year2>\d{2,4}))?: matches the - dash, then uses a named capture group for the second year that matches 2 to 4 digits, since the second year varies in length. The (?: ... ) parentheses group this part of the pattern together without capturing it, and the trailing ? makes the entire group optional for cases where there is no second year.

Technically the \d{2,4} part accepts 07, 107 and 2007. A 3-digit year is obviously incorrect, so I suggest you perform additional error checking to catch such cases. You could prevent it in the regex by changing the Year2 group to (?<Year2>\d{2}|\d{4}), but then an input such as 2006-107 would match only Year1, and Year2 would be silently dropped, losing part of the user input.
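One way to do that additional error checking is to validate and normalize the captured strings after matching instead of in the regex. This is a sketch, not part of the original answer: `YearRangeParser` and `TryNormalize` are hypothetical names, and it assumes a two-digit second year belongs to the same century as the first year (rolling over to the next century when needed).

```csharp
using System;

// Hypothetical helper: post-match validation of the Year1/Year2 capture groups.
public static class YearRangeParser
{
    // Converts the captured text into a start year and an optional end year.
    // Returns false for malformed ranges such as a 3-digit second year ("2006-107")
    // or an end year earlier than the start year ("2007-2006").
    public static bool TryNormalize(string year1Text, string year2Text,
                                    out int startYear, out int? endYear)
    {
        startYear = int.Parse(year1Text);   // Year1 is 4 digits per the regex
        endYear = null;

        if (string.IsNullOrEmpty(year2Text))
            return true;                    // single year, e.g. "2024 Onwards"

        if (year2Text.Length == 3)
            return false;                   // \d{2,4} lets "107" through; reject it

        int year2 = int.Parse(year2Text);
        if (year2Text.Length == 2)
        {
            // Expand "07" to 2007 using the century of the start year,
            // rolling over to the next century for cases like "2099-00".
            year2 += startYear / 100 * 100;
            if (year2 < startYear)
                year2 += 100;
        }

        if (year2 < startYear)
            return false;                   // e.g. "2007-2006"

        endYear = year2;
        return true;
    }
}
```

It would be called with m.Groups["Year1"].Value and m.Groups["Year2"].Value after a successful match. Note that in .NET, \d also matches non-ASCII Unicode digits, which int.Parse rejects; if that matters, [0-9] can be used in the pattern instead of \d.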

Here's the code:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string[] inputs = { "Thing1 Thing2 S1", "Thing1 Thing2 2006-07 Series 6", "Thing1 Thing2 2006-2007 S12 RP", "Thing1 Thing2 2020-21 S6", "Thing1 Thing2 2022-2024 S12", "Thing1 Thing2 2024 Onwards" };
        // Year1: a standalone 4-digit year; Year2: optional "-" followed by 2-4 digits.
        string pattern = @"\b(?<Year1>\d{4})(?:-(?<Year2>\d{2,4}))?\b";
        Regex rx = new Regex(pattern);

        foreach (var input in inputs)
        {
            Match m = rx.Match(input);
            Console.WriteLine("{0}: {1}", m.Success, input);
            if (m.Success)
            {
                string year1 = m.Groups["Year1"].Value;
                // An optional group that did not participate yields an empty string.
                string year2 = m.Groups["Year2"].Value;
                Console.WriteLine("Year1: {0}, Year2: {1}", year1, year2 == "" ? "N/A" : year2);
            }
            Console.WriteLine();
        }
    }
}
```
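To connect the regex back to the original requirement, a possible sketch extracts Year1 from the search text and caps it at the latest year the database holds. It assumes the question's rule (current year + 14 in even years, + 13 in odd years); `SearchYearResolver` and `ResolveStartYear` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical end-to-end sketch: find the first year in a search string with the
// answer's regex, then cap it at the latest year held in the database.
public static class SearchYearResolver
{
    private static readonly Regex YearRx =
        new Regex(@"\b(?<Year1>\d{4})(?:-(?<Year2>\d{2,4}))?\b");

    // Returns null when the search text contains no year (e.g. "Thing1 Thing2 S1").
    public static int? ResolveStartYear(string searchText, int currentYear)
    {
        Match m = YearRx.Match(searchText);
        if (!m.Success)
            return null;

        int requested = int.Parse(m.Groups["Year1"].Value);
        // Latest year in the db: current year + 14 in even years, + 13 in odd years.
        int latest = currentYear % 2 == 0 ? currentYear + 14 : currentYear + 13;
        return Math.Min(requested, latest);
    }
}
```

For example, with a current year of 2010, a search for "Thing1 Thing2 2030-31 S6" resolves to 2024, while "Thing1 Thing2 2006-07 Series 6" keeps 2006. Looking up the matching row for that year is left to the surrounding query.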