What are some resources I can use to learn profiling/optimizing?

2023-09-02 21:51:32 Author: 不爱就是不爱*

I just inherited a C# project that runs way too slow, and I will have to start optimizing it. What I want to do first is learn a little more about profiling and optimizing, since I haven't had to do it before. So the question is: where do I start, and what books, blogs, or articles can I read?

I do know of .NET profilers such as ANTS Profiler, but I have no idea how to use them efficiently. I have not really used one; I just let it run on a few sample apps to play around with the output.

Recommended answer

There are two steps to optimizing code.

First, you need to find out what's slow. That's profiling, and, as you might guess, a profiler is commonly used for this. Most profilers are generally straightforward to use. You run your application through a profiler, and when it terminates, the profiler shows you how much time was spent in each function, both exclusive (time spent in the function itself, not counting the functions it calls) and inclusive (time spent in the function, including child function calls).
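As a rough illustration of exclusive versus inclusive time, here is a minimal sketch using Python's built-in cProfile module (Python is used only because it keeps the example short; the idea carries over to any profiler). The functions `parse_record` and `load_records` are hypothetical stand-ins for real application code. In the report, the `tottime` column is exclusive time and `cumtime` is inclusive time.

```python
import cProfile
import io
import pstats


def parse_record(i):
    # Hypothetical "leaf" work: it calls nothing else, so its
    # exclusive and inclusive times are essentially the same.
    total = 0
    for j in range(2000):
        total += j * j
    return total + i


def load_records(n):
    # Does little work itself (small exclusive time), but its
    # inclusive time contains every parse_record call it makes.
    records = []
    for i in range(n):
        records.append(parse_record(i))
    return records


profiler = cProfile.Profile()
profiler.enable()
records = load_records(300)
profiler.disable()

# Sort by cumulative (inclusive) time and print the top entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In a report like this, `load_records` should show a large inclusive time but a small exclusive time, which tells you the real cost sits in what it calls.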

In other words, you get a big call tree, and you just have to hunt down the big numbers. Usually, you have very few functions consuming more than 10% of the execution time. So locate these and you know what to optimize.

Note that a profiler is neither necessary nor, necessarily, the best approach. A remarkably simple, but effective, approach is to just run the program in a debugger, and, at a few quasi-random times, pause execution and look at the call stack. Do this just a couple of times, and you have a very good idea of where your execution time is being spent. @Mike Dunlavey, who commented under this answer, has described this approach in depth elsewhere.
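The pause-and-look approach can also be imitated in code. The following is only a sketch of the idea, not the method described by the commenter: a background thread wakes up at intervals and records which functions are on the main thread's call stack. It relies on `sys._current_frames()`, which is a CPython implementation detail, and `slow_part`, `fast_part`, and `workload` are hypothetical stand-ins.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(thread_id, samples, interval, counts):
    # Every `interval` seconds, "pause" and note which functions
    # are currently on the target thread's call stack.
    for _ in range(samples):
        time.sleep(interval)
        frame = sys._current_frames().get(thread_id)
        if frame is None:
            continue
        for entry in traceback.extract_stack(frame):
            counts[entry.name] += 1


def slow_part():
    # Busy loop standing in for the real hot spot.
    deadline = time.perf_counter() + 0.5
    while time.perf_counter() < deadline:
        pass


def fast_part():
    return sum(range(100))


def workload():
    fast_part()
    slow_part()


counts = collections.Counter()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), 20, 0.01, counts),
)
sampler.start()
workload()
sampler.join()

# Functions that appear in most samples are where the time goes.
print(counts.most_common(5))
```

Even with only 20 samples, a function that dominates the run time shows up in nearly all of them, which is why a handful of manual pauses in a debugger is often enough.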

But once you know where the execution time is being spent, the tricky part comes: how to optimize the code.

Of course, the most effective approach is often the high-level one. Does the problem have to be solved in this way? Does it have to be solved at all? Could it have been solved in advance and the result cached so it could be delivered instantly when the rest of the app needed it? Are there more efficient algorithms for solving the problem?
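One of the high-level questions above, whether a result could be cached instead of recomputed, can be sketched with Python's `functools.lru_cache`. The `shipping_cost` function and its arithmetic are made up for illustration; the counter exists only to show how often the real computation runs.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def shipping_cost(region):
    # Hypothetical expensive computation. The global counter only
    # records how many times the body actually executes.
    global call_count
    call_count += 1
    return sum(ord(ch) for ch in region) % 50


orders = ["north", "south", "north", "north", "south"]
costs = [shipping_cost(region) for region in orders]

# Five lookups, but the body ran only once per distinct region.
print(costs, "computed", call_count, "times")
```

Caching like this only helps when the same inputs recur and the cached result stays valid, so it is worth confirming both before relying on it.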

If you can apply such high-level optimizations, do that, see whether it improves performance sufficiently, and if not, profile again.

Sooner or later, you may have to dive into more low-level optimizations. This is tricky territory, though. Today's computers are pretty complex, and the performance you get from them is not straightforward. The cost of a branch or a function call can vary widely depending on the context. Adding two numbers together may take anywhere from 0 to 100 clock cycles depending on whether both values were already in the CPU's registers, what else is being executed at the time, and a number of other factors. So optimization at this level requires (1) a good understanding of how the CPU works, and (2) lots of experimentation and measurements. You can easily make a change that you think will be faster, but you need to be sure, so measure the performance before and after the change.
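Measuring before and after a change can be as simple as the sketch below, which uses Python's `timeit` module. The two string-joining functions are hypothetical examples of an "old" and a "new" version; the point is the habit of checking that both give the same result and then timing both, not which one wins on your machine.

```python
import timeit


def join_with_concat(words):
    # "Before" version: builds the string piece by piece.
    result = ""
    for word in words:
        result += word
    return result


def join_with_join(words):
    # "After" version: the change we believe is faster.
    return "".join(words)


words = ["abc"] * 2000

# First make sure the change did not alter behavior.
assert join_with_concat(words) == join_with_join(words)

# Repeat each measurement and keep the best run to reduce noise.
before = min(timeit.repeat(lambda: join_with_concat(words), number=50, repeat=5))
after = min(timeit.repeat(lambda: join_with_join(words), number=50, repeat=5))
print(f"before: {before:.4f}s  after: {after:.4f}s")
```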

There are a few general rules of thumb that can often help guide optimizations:

I/O is expensive. CPU instructions are measured in fractions of a nanosecond. RAM access is on the order of tens to hundreds of nanoseconds. A hard drive access may take tens of *milli*seconds. So often, I/O will be what's slowing down your application. Does your application perform a few large I/O reads (read a 20MB file in one big chunk), or countless small ones (read bytes 2,052 to 2,073 from one file, then read a couple of bytes from another file)? Fewer large reads can speed your I/O up by a factor of several thousand.
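The difference between one large read and many small ones can be tried out with a sketch like the following. It writes a 1 MB temporary file and reads it back both ways with buffering disabled, so every `read` call goes to the operating system. Note the assumption: the file will almost certainly be served from the OS cache, so this measures per-call overhead rather than real disk latency, and the gap on a cold disk would be much larger.

```python
import os
import tempfile
import time

# Create a 1 MB scratch file to read back.
payload = os.urandom(1_000_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name


def read_in_one_call(path):
    # One large read: a single request for the whole file.
    with open(path, "rb", buffering=0) as f:
        return f.read()


def read_in_small_pieces(path, piece=16):
    # Many tiny reads: one request per 16 bytes.
    chunks = []
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(piece)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


start = time.perf_counter()
big = read_in_one_call(path)
big_seconds = time.perf_counter() - start

start = time.perf_counter()
small = read_in_small_pieces(path)
small_seconds = time.perf_counter() - start

os.remove(path)
print(f"one read: {big_seconds:.4f}s  16-byte reads: {small_seconds:.4f}s")
```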

Page faults involve hard drive accesses too. In-memory pages have to be pushed to the pagefile, and paged-out ones have to be read back into memory. If this happens a lot, it's going to be slow. Can you improve the locality of your data so fewer pages will be needed at the same time? Can you simply buy more RAM for the host computer to avoid having to page data out? (As a general rule, hardware is cheap. Upgrading the computer is a perfectly valid optimization - but make sure the upgrade will make a difference. Disk reads won't be a lot faster by buying a faster computer. And if everything fits into RAM on your old system, there's no point in buying one with 8 times as much RAM.)

Your database relies on hard drive accesses too. So can you get away with caching more data in RAM, and only occasionally writing it out to the database? (Of course there's a risk there. What happens if the application crashes?)
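A minimal sketch of that idea, keeping updates in RAM and writing them out only occasionally, might look like this. The `WriteBehindCounter` class, the `hits` table, and the `flush_every` threshold are all hypothetical, and an in-memory SQLite database stands in for a real one. The comments mark where the crash risk mentioned above comes from.

```python
import sqlite3


class WriteBehindCounter:
    """Counts page hits in RAM and writes them to the database in batches."""

    def __init__(self, conn, flush_every=100):
        self.conn = conn
        self.flush_every = flush_every
        self.pending = {}   # updates not yet written; lost if the app crashes
        self.unflushed = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hits (page TEXT PRIMARY KEY, n INTEGER)"
        )

    def record(self, page):
        # Cheap in-memory update; no database access on most calls.
        self.pending[page] = self.pending.get(page, 0) + 1
        self.unflushed += 1
        if self.unflushed >= self.flush_every:
            self.flush()

    def flush(self):
        # Write all pending updates in a single transaction.
        with self.conn:
            for page, n in self.pending.items():
                cur = self.conn.execute(
                    "UPDATE hits SET n = n + ? WHERE page = ?", (n, page)
                )
                if cur.rowcount == 0:
                    self.conn.execute(
                        "INSERT INTO hits (page, n) VALUES (?, ?)", (page, n)
                    )
        self.pending.clear()
        self.unflushed = 0


conn = sqlite3.connect(":memory:")
counter = WriteBehindCounter(conn, flush_every=3)
for page in ["home", "home", "about", "home"]:
    counter.record(page)

# The first three hits were flushed; the fourth is still only in RAM.
stored = dict(conn.execute("SELECT page, n FROM hits"))
print("in database:", stored, "pending:", counter.pending)
```

The smaller `flush_every` is, the less data a crash can lose and the more often the database is hit, so the threshold is a trade-off you have to pick for your own application.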

And then there's everyone's favorite: threading. A modern CPU has anywhere from 2 to 16 CPU cores available. Are you using them all? Would you benefit from using them? Are there long-running operations that can be executed asynchronously? The application starts the operation in a separate thread, and is then able to resume normal operation instantly, rather than blocking until the operation is complete.
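Starting a long-running operation in a separate thread and carrying on immediately can be sketched as follows, here with Python's `concurrent.futures`. The `generate_report` function is a hypothetical stand-in that just sleeps; in a real application it would be the slow operation you found while profiling.

```python
import concurrent.futures
import time


def generate_report(seconds):
    # Stand-in for a long-running operation (hypothetical).
    time.sleep(seconds)
    return "report ready"


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    started = time.perf_counter()

    # Hand the slow operation to a worker thread; submit() returns at once.
    future = pool.submit(generate_report, 0.2)
    resumed_after = time.perf_counter() - started

    # The main thread is free to keep working in the meantime.
    other_work = sum(range(1000))

    # Block only at the point where the result is actually needed.
    result = future.result()
    total = time.perf_counter() - started

print(f"resumed after {resumed_after:.4f}s, result after {total:.4f}s: {result}")
```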

So basically, use the profiler to understand your application. How does it spend its execution time, and where is it being spent? Is memory consumption a problem? What are the I/O patterns (both hard drive and network accesses, as well as any other kind of I/O)? Is the CPU just churning away all the time, or is it idle waiting for some external events, such as I/O or timers?
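One quick way to tell whether code is churning the CPU or sitting idle is to compare wall-clock time with CPU time. The sketch below does this with Python's `time.perf_counter` and `time.process_time`; `crunch` and `wait` are hypothetical stand-ins for a compute-bound and a waiting workload.

```python
import time


def measure(fn):
    # Returns (wall-clock seconds, CPU seconds used by this process).
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def crunch():
    # CPU-bound: the processor is busy the whole time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def wait():
    # Stands in for waiting on I/O or a timer: almost no CPU is used.
    time.sleep(0.2)


crunch_wall, crunch_cpu = measure(crunch)
wait_wall, wait_cpu = measure(wait)

print(f"crunch: wall {crunch_wall:.3f}s, cpu {crunch_cpu:.3f}s")
print(f"wait:   wall {wait_wall:.3f}s, cpu {wait_cpu:.3f}s")
```

When CPU time is close to wall-clock time, faster code or more cores may help; when CPU time is far below it, the program is waiting on something, and the thing it waits on is what needs attention.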

And then understand as much as possible about the computer it's running on. Understand what resources it has available (CPU cache, multiple cores), and what each of them means for performance.

This is all pretty vague, because the tricks to optimizing a big database server are going to be very different from what you'd do to optimize some big number-crunching algorithm.