Algorithm for the area contained in an array in Python (graph). Tags: array, algorithm, Python

2023-09-11 23:12:03 Author: 待我发光闪瞎你

Visualize a large array of numbers where each number represents the height of a bar on a bar graph.

For example: [5,4,3,7,2,3,1,12]

       █
       █
       █
       █
       █
   █   █
   █   █
█  █   █
██ █   █
████ █ █
██████ █
████████

Analysis:

This is the bar graph of the numbers above. What I need to find is the area contained in the graph, measured in the number of open (unfilled) units.

To go about this, I wrote an algorithm that finds all the peaks in the array.

This returns [5,7,3,12], along with another list holding the index of each entry, [0,3,5,7].
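The question does not show the peak finder itself, so the following is only a minimal sketch of what such an O(N) pass could look like. The name `find_peaks` and the tie-handling rule (an entry counts as a peak when it is not smaller than either neighbour, and the two end entries are compared against their single neighbour) are assumptions, not the author's code.

```python
def find_peaks(values):
    """Return (peaks, indices) for entries not smaller than either neighbour.

    Assumption: the first and last entries are compared against their single
    neighbour only, and ties count as peaks.
    """
    peaks, indices = [], []
    last = len(values) - 1
    for i, v in enumerate(values):
        left_ok = i == 0 or v >= values[i - 1]
        right_ok = i == last or v >= values[i + 1]
        if left_ok and right_ok:
            peaks.append(v)
            indices.append(i)
    return peaks, indices


peaks, indices = find_peaks([5, 4, 3, 7, 2, 3, 1, 12])
print(peaks)    # [5, 7, 3, 12]
print(indices)  # [0, 3, 5, 7]
```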

For our purposes, only three peaks matter for finding the area: the 5, the 7, and the 12. We can then break it down like this.

The amount of open area between the 5 and the 7 is (general rule):

(([larger index] - [smaller index] - 1) * [smaller value]) - [sum of all values in between]

So the area of the first section would be (2*5) - (4+3), or 10 - 7, or 3. This makes sense because if you look at the graph you can see an empty L-shaped section into which you could fit 3 units of, say, water without it overflowing. If you repeat this for the second section you get its correct area as well.
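As a quick check of the rule, here is a small sketch that applies it to both sections of the example. The helper name `section_area` is hypothetical, and the rule only holds when every bar between the two peaks is no taller than the smaller peak.

```python
def section_area(values, i, j):
    """Open area between the peaks at indices i < j, following the rule above.

    Assumes no bar strictly between i and j is taller than the smaller peak.
    """
    width = j - i - 1                    # number of columns strictly between
    smaller = min(values[i], values[j])  # the lower peak sets the water level
    return width * smaller - sum(values[i + 1:j])


data = [5, 4, 3, 7, 2, 3, 1, 12]
print(section_area(data, 0, 3))  # 3   -> (2*5) - (4+3)
print(section_area(data, 3, 7))  # 15  -> (3*7) - (2+3+1)
```

The two sections together give 3 + 15 = 18 open units for the example array.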

In this case it is extremely easy to see how that could be done. You simply write an algorithm that finds that the 3 is smaller than both the 7 and the 12, removes it, and returns a refined version of the peaks.

However, it is not always that simple.

I have an array:

[5,4,3,7,2,3,1,12,9,10,5,3,6,8,5,6,4,7,6,9,4,11,11,4,1,2,1]

Running it through a basic custom O(N) peak-finding algorithm returns:

[5,7,3,12,10,8,6,7,9,11,11,4,2]

In this example we see the same problem as in the first part of this question. However, after the 12 in this peak list, a human can easily see that the next most important peaks to look at are the two 11s, the 4, and the 2. So I need a way to go from:

[5,7,3,12,10,8,6,7,9,11,11,4,2]

to:

[5,7,12,11,11,4,2]

The array above is the list of 'important' peaks needed to find the area. Again, picture the open blocks as if they contained water or something similar, so that each one is limited by the lowest adjacent peak before it overflows.
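One possible way to get from the full peak list to the 'important' one, offered as a sketch rather than a definitive answer: keep a peak only if it is a running maximum when scanning from the left or when scanning from the right. Any other peak has something at least as tall on both sides, so it never sets a water level. The function name `important_peaks` is hypothetical, and ties are kept (using `>=`) so that both 11s survive.

```python
def important_peaks(peaks):
    """Keep peaks that are a running maximum from the left or from the right."""
    n = len(peaks)
    keep = [False] * n

    running = float("-inf")
    for i in range(n):                # left-to-right pass
        if peaks[i] >= running:
            keep[i] = True
            running = peaks[i]

    running = float("-inf")
    for i in range(n - 1, -1, -1):    # right-to-left pass
        if peaks[i] >= running:
            keep[i] = True
            running = peaks[i]

    return [p for p, k in zip(peaks, keep) if k]


print(important_peaks([5, 7, 3, 12, 10, 8, 6, 7, 9, 11, 11, 4, 2]))
# [5, 7, 12, 11, 11, 4, 2]
```

On the first example, `important_peaks([5, 7, 3, 12])` drops the 3 and returns `[5, 7, 12]`.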

To better visualize this fuller, second example, I have a picture of the graph and all of its peaks and data points here.

Thank you.

Recommended answer

I think this handles all conditions, but all the maximum calculations will slow it down for large data sets. I used IPython Notebook to graph it. It is basically @Rémi's idea:

For any data point:

1. Take the maximum point to the left and the maximum point to the right.
   a. If at the ends, assume zero.
2. Take the minimum of the two maximum points.
3. If the data point is below that minimum, it is underwater: return the difference; otherwise return zero.

It could be optimized by computing the left maximum as it scans to the right, and by computing the right maximum for each position ahead of time in a single pass from right to left.

As is, the algorithm took about 4.1 seconds to process 10,000 data points on my system.

The unfilled area (yellow) will be sum(C):

%matplotlib inline
import matplotlib.pyplot as plt
import random

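def contribution(L,i):
    # Water held above cell i: the lower of the tallest bars on either side,
    # minus the cell's own height (zero if the cell rises above that level).
    # Each call rescans the list, so computing every cell is O(n^2) overall.
    max_left = 0 if i==0 else max(L[:i])
    max_right = 0 if i==len(L)-1 else max(L[i+1:])
    lower = min(max_left,max_right)
    return 0 if lower < L[i] else lower - L[i]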

N = [random.randint(0,12) for i in range(50)]
C = [contribution(N,i) for i in range(len(N))]

ind = list(range(len(N))) # the x locations for the groups
width = 1                 # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, N, width, color='r')
p2 = plt.bar(ind, C, width, color='y',bottom=N)

Here's a faster version that implements the optimization I mentioned above. It computes one million data points in 1.33 seconds, but uses a smaller data set for the graph below. I don't see how it could be done in one pass, given that a cell needs to know the maximum to its left and to its right, and there could be multiple points equal to the maximum in either direction.

%matplotlib inline
import matplotlib.pyplot as plt
import random

def right_maximums(L):
    '''Given list L, compute [max(L[i+1:]) for i in range(len(L)-1)]+[0] more efficiently.

    This gives the maximum cell to the right of the current cell.
    Example: [1,2,3,4,5,4,3,2,1] -> [5,5,5,5,4,3,2,1,0]
    '''
    N = [0]
    for i,v in enumerate(L[:0:-1]):
        N.append(max(N[i],v))
    return N[::-1]

def contribution(N):
    '''In a bar graph of data N, compute how much "water" each data point can hold,
    assuming water spills off the sides of the bar graph.
    '''
    rmaxs = right_maximums(N) # compute maximums to the right of a data point in advance.
    lmax = 0 # compute maximums to the left as we go.
    C = []
    for i,v in enumerate(N):
        # find the lower of the left and right maximum.
        lower = min(lmax,rmaxs[i])
        # if the data point is higher than the maximums, it won't hold water,
        # else it holds the difference between the lower maximum and its value.
        C.append(0 if lower < v else lower - v)
        lmax = max(lmax,v)
    return C

N = [random.randrange(0,50) for i in range(50)]
C = contribution(N)

ind = list(range(len(N))) # the x locations for the groups
width = 1                 # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, N, width, color='r')
p2 = plt.bar(ind, C, width, color='y',bottom=N)
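On the one-pass question: if only the total is needed rather than the per-cell list C, a commonly used two-pointer technique (not part of the answer above, shown here only as a sketch) walks inward from both ends and always advances the side with the lower bar. The name `total_unfilled` is hypothetical, and like the code above it assumes water spills off both sides of the graph.

```python
def total_unfilled(heights):
    """Total open units, using two indices that move inward in a single pass."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    total = 0
    while left < right:
        if heights[left] <= heights[right]:
            # A bar at least this tall exists to the right, so the tallest
            # bar seen on the left decides the water level for this cell.
            left_max = max(left_max, heights[left])
            total += left_max - heights[left]
            left += 1
        else:
            # Mirror case: the tallest bar seen on the right decides the level.
            right_max = max(right_max, heights[right])
            total += right_max - heights[right]
            right -= 1
    return total


print(total_unfilled([5, 4, 3, 7, 2, 3, 1, 12]))  # 18
```

For the question's first example this gives 18, which matches the 3 + 15 obtained from the two sections by hand. It does not produce the per-cell values needed for the yellow bars, so the two-pass version is still the one to use for plotting.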