Python: weighted median algorithm with pandas
Tags: algorithm, weighted median, Python, pandas

2023-09-11 02:32:07 Author: 眉间苦涩

I have a dataframe that looks like this:

Out[14]:
    impwealth  indweight
16     180000     34.200
21     384000     37.800
26     342000     39.715
30    1154000     44.375
31     421300     44.375
32    1210000     45.295
33    1062500     45.295
34    1878000     46.653
35     876000     46.653
36     925000     53.476

I want to calculate the weighted median of the column impwealth using the frequency weights in indweight. My pseudo code looks like this:

# Sort the rows by `impwealth` in ascending order
df.sort_values('impwealth', inplace=True)

# Find the 50th percentile weight, P (half of the total weight)
P = df['indweight'].sum() * 0.5

# Find the first row whose cumulative `indweight` reaches P
i = (df['indweight'].cumsum() >= P).idxmax()

# The value of `impwealth` associated with this index will be the weighted median
w_median = df.loc[i, 'impwealth']

This method seems clunky, and I'm not sure it's correct. I didn't find a built-in way to do this in the pandas reference. What is the best way to go about finding a weighted median?
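The steps described in the question can be sketched as a small self-contained function. This is only one possible definition of a weighted median (the smallest value whose cumulative weight reaches half of the total weight); the function name `weighted_median_lower` is hypothetical, and the dataframe is rebuilt from the sample shown above.

```python
import pandas as pd

def weighted_median_lower(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    Hypothetical helper for illustration; not part of pandas.
    """
    frame = pd.DataFrame({"value": values, "weight": weights}).sort_values("value")
    cumulative = frame["weight"].cumsum()
    half = frame["weight"].sum() / 2.0
    # Position of the first sorted row whose cumulative weight reaches the halfway point
    position = int((cumulative >= half).to_numpy().argmax())
    return frame["value"].iloc[position]

# Sample data copied from the question
df = pd.DataFrame(
    {
        "impwealth": [180000, 384000, 342000, 1154000, 421300,
                      1210000, 1062500, 1878000, 876000, 925000],
        "indweight": [34.200, 37.800, 39.715, 44.375, 44.375,
                      45.295, 45.295, 46.653, 46.653, 53.476],
    },
    index=[16, 21, 26, 30, 31, 32, 33, 34, 35, 36],
)

w_median = weighted_median_lower(df["impwealth"], df["indweight"])
print(w_median)
```

On the sample data this definition picks the row with `impwealth` 925000, because the cumulative weight first reaches half of the total (218.9185 out of 437.837) at that row. Other definitions, for example ones that interpolate between neighbouring values, can give a different number.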

Recommended answer

Have you tried the wquantiles package? I had never used it before, but it has a weighted median function that seems to give at least a reasonable answer (you'll probably want to double check that it's using the approach you expect).

In [12]: import weighted

In [13]: weighted.median(df['impwealth'], df['indweight'])
Out[13]: 914662.0859091772
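The result above is not one of the observed `impwealth` values, which suggests some form of interpolation. The sketch below is one way to check this by hand with NumPy: each sorted observation is placed at the midpoint of its weight interval and the requested quantile is linearly interpolated between those positions. That the package works this way is an assumption, not something taken from its documentation, and the name `weighted_quantile_interp` is hypothetical.

```python
import numpy as np

def weighted_quantile_interp(values, weights, q):
    """Interpolated weighted quantile (illustrative sketch, assumes positive weights).

    Each sorted value sits at the midpoint of its weight interval,
    expressed as a fraction of the total weight.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cumulative = np.cumsum(w)
    # Midpoint position of each observation on the [0, 1] cumulative-weight axis
    positions = (cumulative - 0.5 * w) / cumulative[-1]
    return float(np.interp(q, positions, v))

# Sample data copied from the question
impwealth = [180000, 384000, 342000, 1154000, 421300,
             1210000, 1062500, 1878000, 876000, 925000]
indweight = [34.200, 37.800, 39.715, 44.375, 44.375,
             45.295, 45.295, 46.653, 46.653, 53.476]

interp_median = weighted_quantile_interp(impwealth, indweight, 0.5)
print(interp_median)
```

On the sample data this gives roughly 914662.09, which agrees with `Out[13]` and falls between the neighbouring values 876000 and 925000. If you need the median to be an actual observation, a cumulative-weight threshold without interpolation is the simpler choice.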