Finding mean bin values using histogram2d in Python [duplicate]
Question:
This question already has answers here:
Binning data in python with scipy/numpy (6 answers)
Closed 4 years ago.
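How can I calculate the mean value of each bin of a 2D histogram in Python? I have temperature ranges on the x and y axes, and I am trying to plot the probability of lightning using bins of the respective temperatures. I am reading the data from a CSV file, and my code looks like this: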
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

filename = 'Random_Events_All_Sorted_85GHz.csv'
df = pd.read_csv(filename)
min37 = df.min37
min85 = df.min85
verification = df.five_min_1
#Numbers
x = min85
y = min37
H = verification
#Estimate the 2D histogram
nbins = 4
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)
#Rotate and flip H
H = np.rot90(H)
H = np.flipud(H)
#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)
#Plot 2D histogram using pcolor
fig1 = plt.figure()
plt.pcolormesh(xedges,yedges,Hmasked)
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of Lightning (%)')
plt.show()
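This produces a nice-looking plot, but the data plotted is the count, i.e. the number of samples falling into each bin. The verification variable is an array of 1s and 0s, where 1 indicates lightning and 0 indicates no lightning. I want the plotted data to be the probability of lightning for a given bin, based on the verification variable, so I need bin_mean * 100 to get this percentage.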
I tried an approach similar to the one shown here (Binning data in python with scipy/numpy), but I am having trouble getting it to work for a 2D histogram.
Answer:
This can be done, at the very least, with the following method:
# xedges, yedges as returned by 'histogram2d'
# H as in the question, i.e. already rotated and flipped, so it is indexed [y, x]
# create an array for the output quantities
avgarr = np.zeros((nbins, nbins))
# determine the X and Y bins each sample coordinate belongs to
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])
# calculate the bin sums (note: if you have very many samples, 'bincount'
# is more efficient, but it requires some index arithmetic)
for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v
# replace 0s in H by NaNs (avoids divide-by-zero complaints)
# if you have no further use for H after plotting, the
# copy operation is unnecessary, and this will then also take care
# of the masking (NaNs are plotted transparent)
divisor = H.copy()
divisor[divisor==0.0] = np.nan
# calculate the average
avgarr /= divisor
# now 'avgarr' contains the averages (NaNs for no-sample bins)
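
The comment above mentions that 'bincount' is faster for many samples but needs some index arithmetic. A minimal sketch of that variant follows. The function name bin_means_bincount and the synthetic data are made up for illustration and are not part of the original answer. It assumes the edges come from histogram2d on the same data, so that no sample falls outside them.

```python
import numpy as np

def bin_means_bincount(x, y, values, xedges, yedges):
    """Mean of `values` per 2D bin, laid out as [y_bin, x_bin] (hypothetical helper)."""
    nx, ny = len(xedges) - 1, len(yedges) - 1
    # Digitize against the interior edges only, so indices run 0..n-1
    # and a sample sitting exactly on the last edge lands in the last bin.
    xb = np.digitize(x, xedges[1:-1])
    yb = np.digitize(y, yedges[1:-1])
    # Index arithmetic: fold the (y_bin, x_bin) pair into one flat bin index.
    flat = yb * nx + xb
    sums = np.bincount(flat, weights=values, minlength=nx * ny)
    counts = np.bincount(flat, minlength=nx * ny)
    # Empty bins give 0/0, which becomes NaN; silence the warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means.reshape(ny, nx), counts.reshape(ny, nx)

# Synthetic stand-in for the CSV columns (assumed values, not real data).
rng = np.random.default_rng(0)
x = rng.uniform(150.0, 300.0, 1000)
y = rng.uniform(150.0, 300.0, 1000)
verification = rng.integers(0, 2, 1000)

H, xedges, yedges = np.histogram2d(x, y, bins=4)
means, counts = bin_means_bincount(x, y, verification, xedges, yedges)
percent = 100.0 * means  # probability of lightning in percent
```

Here counts should match the transposed output of histogram2d, which is a convenient sanity check that the bin indices line up.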
If you know the bin edges in advance, you can do the histogram part in the same way by adding just one line.
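
As a sketch of that known-edges case: the loop below accumulates the counts alongside the sums, so histogram2d is not needed at all. The function name, the edges and the sample values are hypothetical and chosen only for illustration. Note that with this digitize call, samples outside the edges are counted in the outermost bins rather than dropped.

```python
import numpy as np

def bin_means_known_edges(x, y, values, xedges, yedges):
    """Per-bin mean of `values`, indexed [y_bin, x_bin] (hypothetical helper)."""
    nx, ny = len(xedges) - 1, len(yedges) - 1
    sums = np.zeros((ny, nx))
    counts = np.zeros((ny, nx))
    xbins = np.digitize(x, xedges[1:-1])
    ybins = np.digitize(y, yedges[1:-1])
    for xb, yb, v in zip(xbins, ybins, values):
        sums[yb, xb] += v
        counts[yb, xb] += 1  # the one extra line: this is the histogram itself
    counts[counts == 0] = np.nan  # empty bins become NaN instead of dividing by zero
    return sums / counts

# Made-up edges and samples, small enough to check by hand.
xedges = np.array([0.0, 1.0, 2.0])
yedges = np.array([0.0, 1.0, 2.0])
x = np.array([0.5, 0.5, 1.5, 1.5])
y = np.array([0.5, 0.5, 0.5, 1.5])
values = np.array([1, 0, 1, 1])

avg = bin_means_known_edges(x, y, values, xedges, yedges)
```

In this toy input the bin at [0, 0] holds one 1 and one 0, so its mean is 0.5, and the bin at [1, 0] receives no samples and comes out as NaN.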