Segmented least squares: I don't understand this dynamic programming concept

2023-09-11 05:37:58 Author: ぉ老街んоù巷

I've been trying to implement this algorithm in Python for a few days now. I keep coming back to it, then giving up and getting frustrated. I don't know what's going on. I don't have anyone to ask or anywhere to go for help, so I've come here.

PDF warning: http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec10.pdf

I don't think it's clearly explained; I sure don't understand it.

My understanding of what's happening is this:

We have a set of points (x1,y1), (x2,y2), ... and we want to find some lines that best fit this data. We can have multiple straight lines, and each line comes from the given least-squares formulas for a and b (y = ax + b).
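For reference, the "formulas for a and b" are presumably the standard closed-form least-squares ones. Here is a minimal sketch, assuming that is what the notes mean; `fit_line` is a made-up name and does not come from the notes.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs.

    Returns (a, b, squared_error). Hypothetical helper for illustration.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values are equal (or there is one point): y = a*x + b cannot
        # be determined, so fall back to a horizontal line through the mean y.
        a = 0.0
    else:
        a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    # Sum of squared vertical distances from the points to the fitted line.
    error = sum((y - a * x - b) ** 2 for x, y in points)
    return a, b, error
```

The squared error returned here is the quantity the notes call e(j,k) when the points passed in are points j through k.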

Now the algorithm starts at the end (?) and assumes that a point p(x_i, y_i) is part of the last line segment. Then the notes say that the optimal solution 'is optimal solution for {p1, . . . pi−1} plus (best) line through {pi , . . . pn}'. To me that just means that we go to the point p(x_i, y_i), go backwards, and find another line segment through the rest of the points. The optimal solution is then both of these line segments.

Then it takes a logical jump I can't follow, and says "Suppose the last point pn is part of a segment that starts at p_i. If Opt(j) denotes the cost of the first j points and e(j,k) the error of the best line through points j to k then Opt(n) = e(i, n) + C + Opt(i − 1)"

Then there is the pseudocode for the algorithm, which I don't understand. I understand that we want to iterate through the list of points and find the points which minimize the OPT(n) formula, but I just don't follow it. It's making me feel stupid.

I know this question is a pain in the ass and that it's not easy to answer but I'm just looking for some guidance towards understanding this algorithm. I apologize for the PDF but I don't have a neater way of getting the crucial information to the reader.

Thank you for your time and reading this, I appreciate it.

Recommended answer

Starting from point 1, the best solution up to a point j must include the new end-point j in its last line segment, so the problem becomes: where do I have to place the last split to minimize the cost of this last line segment?

Fortunately the cost is calculated in terms of subproblems of the same problem you are trying to solve, and fortunately you've already solved these smaller subproblems by the time you've moved to the next point j. So at the new point j you are trying to find an optimal point i, between points 1 and j, at which to split off a new line segment that includes j and minimizes the cost: optimal_cost_up_to(i) + cost_of_split + cost_of_lsq_fit(i+1,j). The confusing part is that at any point it might seem like you are finding just a single split, but in reality all the previous splits are determined by optimal_cost_up_to(i). Since you've already calculated the optimal cost for all the points leading up to j, you just need to memoize the answers so that the algorithm doesn't have to recalculate these costs each time it advances a point.
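Written in the notation quoted in the question, the idea above can be sketched as a recurrence. This assumes the lecture notes follow the usual formulation, with Opt(0) = 0 as the base case (no points, no cost), which the quoted text does not state explicitly:

```latex
\mathrm{Opt}(0) = 0, \qquad
\mathrm{Opt}(j) = \min_{1 \le i \le j} \bigl( e(i, j) + C + \mathrm{Opt}(i - 1) \bigr)
```

Here the i in the notes is the first point of the last segment, whereas the i in optimal_cost_up_to(i) above is the last point before the split, so the two differ by one index. The equation quoted in the question, Opt(n) = e(i, n) + C + Opt(i − 1), is this recurrence at j = n for whichever i achieves the minimum.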

In Python I'd probably use a dictionary to store the results, although for this dynamic programming algorithm an array might be better...

Anyway...

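    def optimalSolution(points, split_cost):
        # lsqFitCost(points, i, j) is assumed to be defined elsewhere and to return
        # the least-squares error of the best line through points[i..j] (inclusive).
        # solutions[j] holds the best cost and the split positions for points[0..j].
        solutions = {0: {'cost': 0, 'splits': []}}
        for j in range(1, len(points)):
            best_split = None
            # Baseline: no split at all, one line through points[0..j].
            best_cost = lsqFitCost(points, 0, j)
            for i in range(0, j):
                # Last segment covers points[i+1..j]; points[0..i] are already solved.
                cost = solutions[i]['cost'] + split_cost + lsqFitCost(points, i + 1, j)
                if cost < best_cost:
                    best_cost = cost
                    best_split = i
            if best_split is not None:
                solutions[j] = {'cost': best_cost,
                                'splits': solutions[best_split]['splits'] + [best_split]}
            else:
                solutions[j] = {'cost': best_cost, 'splits': []}
        return solutions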

It's not pretty, and I haven't checked it (there might be an error in the case where no split gives the best cost), but hopefully it'll help get you on the right path. Note that lsqFitCost has to do a lot of work on each iteration, but for a least-squares line fit like this you can reduce that cost considerably by maintaining the running sums used in the calculation. You should Google least squares line fitting for more info. This could make lsqFitCost constant-time, so the overall time would be O(N^2).
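The running-sums idea could look something like the sketch below. It is one possible approach, not a checked implementation: `make_lsq_fit_cost` is a made-up name, the indices are assumed to be 0-based and inclusive, and the tolerance used to detect a degenerate segment is an arbitrary choice.

```python
from itertools import accumulate


def make_lsq_fit_cost(points):
    """Precompute prefix sums in O(N) and return cost(i, j), which gives the
    squared error of the best line y = a*x + b through points[i..j] in O(1).
    Hypothetical helper; indices are 0-based and inclusive.
    """
    def prefix(values):
        # prefix(v)[k] is the sum of the first k values.
        return [0.0] + list(accumulate(values))

    sx = prefix(x for x, _ in points)
    sy = prefix(y for _, y in points)
    sxx = prefix(x * x for x, _ in points)
    sxy = prefix(x * y for x, y in points)
    syy = prefix(y * y for _, y in points)

    def cost(i, j):
        n = j - i + 1
        if n < 2:
            return 0.0  # a single point is fitted exactly
        x = sx[j + 1] - sx[i]
        y = sy[j + 1] - sy[i]
        xx = sxx[j + 1] - sxx[i]
        xy = sxy[j + 1] - sxy[i]
        yy = syy[j + 1] - syy[i]
        denom = n * xx - x * x
        if abs(denom) < 1e-12:  # assumed tolerance: all x values (nearly) equal
            a = 0.0
        else:
            a = (n * xy - x * y) / denom
        b = (y - a * x) / n
        # For the least-squares a and b, sum((y - a*x - b)^2) simplifies to
        # sum(y^2) - a*sum(x*y) - b*sum(y). Clamp tiny negative rounding error.
        return max(0.0, yy - a * xy - b * y)

    return cost
```

To plug it into the code above, build `cost = make_lsq_fit_cost(points)` once and define `lsqFitCost = lambda points, i, j: cost(i, j)`. Differences of large prefix sums can lose floating-point precision, so for big coordinates it may be worth shifting the data toward the origin first.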