Avoiding SQL NOT IN by using REPLACE and a length check

2023-09-03 22:57:20 Author: 身虽存〃心已死゛

I have a situation where I have to build my SQL strings dynamically, and I'm trying to use parameters and sp_executesql wherever possible so I can reuse query plans. From lots of reading online and from personal experience, I have found NOT IN and INNER/LEFT JOIN to be slow and expensive when the base (left-most) table is large (1.5M rows with about 50 columns). I have also read that using any type of function should be avoided because it slows down queries, so I'm wondering which is worse?

I have used this workaround in the past to avoid using NOT IN with a list of items, although I'm not sure it's the best thing to do. For example, I pass in a list of 3-character strings separated by a pipe delimiter (only between elements) and use:

LEN(@param1) = LEN(REPLACE(@param1, [col], '')) 

instead of:

[col] NOT IN('ABD', 'RDF', 'TRM', 'HYP', 'UOE') 

...imagine the list of strings being anywhere from 1 to about 80 values long; this approach doesn't lend itself to parameterization either.
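
A variable-length NOT IN list can still be parameterized by generating one placeholder per value. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (with sp_executesql you would declare named parameters such as @p0, @p1, ... instead of ?); the table t and column col are hypothetical.

```python
import sqlite3


def build_not_in(values):
    """Return (sql, params) for a NOT IN filter with one '?' per value.

    The values are sent as bound parameters, never spliced into the SQL
    text, so quoting and injection are not a concern.
    """
    if not values:
        # SQL Server rejects an empty "NOT IN ()", and an empty exclusion
        # list should match every row anyway, so drop the predicate.
        return "SELECT col FROM t ORDER BY col", []
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT col FROM t WHERE col NOT IN ({placeholders}) ORDER BY col"
    return sql, list(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(c,) for c in ["ABD", "RDF", "TRM", "HYP", "UOE", "XYZ", "QQQ"]],
)

sql, params = build_not_in(["ABD", "RDF", "TRM", "HYP", "UOE"])
rows = [r[0] for r in conn.execute(sql, params)]
print(rows)
```

The SQL text still changes with the number of values, so each distinct list length gets its own cached plan. With at most about 80 values that is at most about 80 plans, well under SQL Server's limit of 2,100 parameters per request.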

In this example I use "=" in place of NOT IN; for IN I would use a traditional list technique, or "!=" if that is faster, although I doubt it. Is this faster than using NOT IN?

As a possible third alternative, what if I knew all the other possibilities (the IN possibilities, which could be an 80-95x longer list) and passed those instead? This would be done in the application's business layer to take the workload off SQL Server. Not a very good option for query plan reuse, but if it shaves a second or two off a big nasty query, why the hell not.
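
Swapping NOT IN for an IN over the complement only gives the same rows if the business layer's idea of "all possible values" is complete. The minimal sketch below, again using Python's sqlite3 as a stand-in for SQL Server with hypothetical table and data, shows how a value the application does not know about yet ('NEW') silently drops out of the complement query. Both forms discard rows where col is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
# Hypothetical data: the known codes, a NULL, and a code ('NEW') that the
# business layer does not know about yet.
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("ABD",), ("RDF",), ("TRM",), ("XYZ",), ("QQQ",), (None,), ("NEW",)],
)

excluded = ["ABD", "RDF", "TRM"]
known_domain = ["ABD", "RDF", "TRM", "XYZ", "QQQ"]  # what the app believes exists
complement = [c for c in known_domain if c not in excluded]


def fetch(op, values):
    # op comes only from the fixed strings below, never from user input;
    # the values themselves are bound as parameters.
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT col FROM t WHERE col {op} ({placeholders}) ORDER BY col"
    return [r[0] for r in conn.execute(sql, values)]


not_in_rows = fetch("NOT IN", excluded)
in_rows = fetch("IN", complement)
print(not_in_rows)
print(in_rows)
```

Here NOT IN returns 'NEW', 'QQQ' and 'XYZ', while the complement IN returns only 'QQQ' and 'XYZ'.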

I'm also adept at creating SQL CLR functions. Since the above is string manipulation, would a CLR function be best?

Thoughts?

Thanks in advance for any and all help/advice/etc.

Recommended answer

As Donald Knuth is often (mis)quoted, "premature optimization is the root of all evil". So, first of all, are you sure that your code performs slowly when you write it in the clearest and simplest way (to both write and read)? If not, check that before starting to use any "clever" optimization tricks.

If the code is slow, check the query plans thoroughly. Most of the time query execution takes much longer than query compilation, so usually you do not have to worry about query plan reuse. Hence, building optimal indexes and/or table structures usually gives significantly better results than tweaking the way the query is built.

For instance, I seriously doubt that your query with LEN and REPLACE performs better than NOT IN: in either case all the rows will be scanned and checked for a match. For a long enough list, the MSSQL optimizer would automatically create a temp table to optimize the equality comparisons. Moreover, tricks like this tend to introduce bugs. For example, your expression would work incorrectly if [col] = 'AB': 'AB' occurs inside 'ABD', so REPLACE shortens the string and the row is treated as if it were in the list.
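
The 'AB' failure can be made concrete with a small Python emulation of the predicates. These are plain string operations, not SQL Server, so T-SQL details such as LEN ignoring trailing spaces and collation-dependent matching are left out. The second function shows a common repair that is not part of the original question: wrap both the list and the value in the delimiter so only whole elements can match. This assumes no element ever contains '|'.

```python
def len_replace_keeps(param: str, col: str) -> bool:
    """Emulates LEN(@param1) = LEN(REPLACE(@param1, [col], '')).

    Returns True when the row would be kept, i.e. col is treated as
    "not in the list".
    """
    return len(param) == len(param.replace(col, ""))


def delimited_keeps(param: str, col: str) -> bool:
    """Emulates CHARINDEX('|' + [col] + '|', '|' + @param1 + '|') = 0.

    Wrapping both sides in the delimiter means only whole elements match;
    this assumes no element ever contains '|'.
    """
    return ("|" + col + "|") not in ("|" + param + "|")


param = "ABD|RDF|TRM|HYP|UOE"
values = param.split("|")  # the list the parameter is meant to represent

for col in ["ABD", "XYZ", "AB"]:
    print(
        f"{col!r}: LEN/REPLACE keeps={len_replace_keeps(param, col)}, "
        f"delimited keeps={delimited_keeps(param, col)}, "
        f"true NOT IN={col not in values}"
    )
```

For 'AB' the LEN/REPLACE version returns False (the row is dropped), while the real NOT IN and the delimited version both return True.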

IN queries are often faster than NOT IN, because for an IN query only part of the rows needs to be checked (for example, through an index seek on [col]). The efficiency of this approach depends on whether you can get a correct list for IN quickly enough.
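
Comparing the plans is a quick way to check the IN versus NOT IN point on your own data. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in. In SQL Server you would inspect the actual execution plan instead. The table, index name and data are hypothetical. With an index on col, the IN query can be answered by index lookups (SEARCH), while NOT IN has to visit every row (SCAN). The exact detail text varies between SQLite versions (for example "SEARCH t" versus "SEARCH TABLE t").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT NOT NULL, payload TEXT)")
conn.execute("CREATE INDEX idx_t_col ON t (col)")
conn.executemany(
    "INSERT INTO t (col, payload) VALUES (?, ?)",
    [(f"C{i % 100:02d}", "x") for i in range(10_000)],
)

codes = ["C01", "C02", "C03"]
placeholders = ", ".join("?" for _ in codes)


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


in_plan = plan(f"SELECT * FROM t WHERE col IN ({placeholders})", codes)
not_in_plan = plan(f"SELECT * FROM t WHERE col NOT IN ({placeholders})", codes)
print(in_plan)
print(not_in_plan)
```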

Speaking of passing a variable-length list to the server, there are many discussions here on SO and elsewhere. Generally, your options are:

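- table-valued parameters (MSSQL 2008+ only);
- dynamically constructed SQL (error-prone and/or unsafe);
- a temporary table (good for long lists; for short ones the overhead in code and execution time may be too much);
- a delimited string (good for short lists of well-behaved values, like a handful of integers);
- an XML parameter (somewhat complex, but works well if you use a good XML library and don't compose complex XML text "by hand").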

Here is an article with a good overview of these techniques and a few more.
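
As one concrete instance of the temporary-table option, here is a minimal sketch in Python's sqlite3, with SQLite standing in for SQL Server. In SQL Server you would typically use a #temp table or, on 2008+, a table-valued parameter. The table names t and excluded and the helper rows_not_in are hypothetical. The query text stays the same no matter how many values are excluded, so one cached plan serves every list length. Note that NOT EXISTS keeps rows where col is NULL, whereas col NOT IN (...) drops them.

```python
import sqlite3


def rows_not_in(conn, excluded):
    """Load the exclusion list into a temp table, then anti-join with NOT EXISTS."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS excluded (code TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM excluded")  # reuse the table across calls
    conn.executemany(
        "INSERT OR IGNORE INTO excluded (code) VALUES (?)",
        [(v,) for v in excluded],
    )
    sql = """
        SELECT t.col FROM t
        WHERE NOT EXISTS (SELECT 1 FROM excluded e WHERE e.code = t.col)
        ORDER BY t.col
    """
    return [r[0] for r in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("ABD",), ("RDF",), ("XYZ",), ("QQQ",), (None,)],
)
print(rows_not_in(conn, ["ABD", "RDF", "TRM"]))
```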