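I am new to pandas and am experimenting with the Titanic dataset. After using the groupby function, applying mean() works fine, but when I compute std() I get NaN. My understanding is that missing values are excluded by default (not taken into account), but that std takes them into account. I tried setting ddof=1 and skipna=True, but they don't seem to work with groupby. Please help. The Title in my code comes from feature engineering on the Name column. I am trying to predict the missing Age values from the mean and std of the ages of all passengers in a particular group (for example, the mean age of Master). So a missing age for a Master would be computed only from the Master group's mean and std, obtained with groupby.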
std = train_df.groupby(['Title'])['Age'].std()
print(std)
mean = train_df.groupby(['Title'])['Age'].mean()
print(mean)
Output for sd
Title
Capt NaN
Col 2.828427
Don NaN
Dr 19.295941
Jonkheer NaN
Lady NaN
Major 4.949747
Master 3.792621
Miss 14.525089
Mlle 0.000000
Mme NaN
Mr 17.604569
Mrs 16.292678
Ms NaN
Rev 13.136463
Sir NaN
the Countess NaN
Output for mean
Title
Capt 70.000000
Col 58.000000
Don 40.000000
Dr 36.000000
Jonkheer 38.000000
Lady 48.000000
Major 48.500000
Master 4.025000
Miss 17.450549
Mlle 24.000000
Mme 24.000000
Mr 24.903288
Mrs 31.016000
Ms 28.000000
Rev 43.166667
Sir 49.000000
the Countess 33.000000
DATAFRAME
Survived Pclass Name Sex Age SibSp Parch Ticket Fare Embarked Title
0 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 S Mr
1 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 0 PC 17599 71.2833 C Mrs
2 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 S Miss
3 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 S Mrs
4 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 S Mr
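A minimal sketch of the imputation described above: fill each missing Age by drawing from a normal distribution with the mean and std of the passenger's Title group. The helper name impute_age_by_title, the toy data, the normal-distribution sampling, and the fallback of treating a NaN group std as 0 (so the group mean is used) are all assumptions for illustration, not the asker's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy train_df; real data would come from the Titanic CSV
train_df = pd.DataFrame({
    "Title": ["Master", "Master", "Master", "Capt", "Mr", "Mr"],
    "Age": [2.0, 6.0, np.nan, 70.0, 30.0, np.nan],
})

def impute_age_by_title(df, seed=0):
    """Hypothetical helper: fill missing Age with draws from
    Normal(group mean, group std), grouped by Title."""
    rng = np.random.default_rng(seed)
    out = df.copy()  # leave the input DataFrame untouched
    grouped = out.groupby("Title")["Age"]
    # Both statistics skip NaN and are broadcast back to every row
    mean = grouped.transform("mean")
    # std (ddof=1) is NaN when a group has only one known age;
    # assumption: fall back to 0 so the draw equals the group mean
    std = grouped.transform("std").fillna(0.0)
    missing = out["Age"].isna()
    draws = rng.normal(mean[missing], std[missing])
    # Ages cannot be negative, so clip the random draws at 0
    out.loc[missing, "Age"] = np.clip(draws, 0, None)
    return out

print(impute_age_by_title(train_df))
```

Here the "Mr" group has only one known age, so its std is NaN and the missing Mr age is filled with exactly 30.0. A group with no known ages at all would have a NaN mean and would stay NaN.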
Posted on 2019-09-05 11:55:00
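The standard deviation is computed as sqrt(sum((x - x.mean())**2) / (n - ddof)), where n is the number of non-missing values in the series and ddof is the delta degrees of freedom. From the docs:
In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of an infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.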
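To make the formula concrete, here is a small sketch that evaluates it by hand and compares it with pandas and numpy. The ages and the helper name manual_std are hypothetical, chosen only so the numbers come out round.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical ages for one Title group (illustrative values, not the dataset)
ages = pd.Series([20.0, 24.0, 28.0])

def manual_std(x, ddof):
    """Hypothetical helper: sqrt(sum((x - mean)**2) / (n - ddof))."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (n - ddof))

# pandas Series.std defaults to ddof=1
print(manual_std(ages, ddof=1), ages.std())                 # 4.0 4.0
# numpy.std defaults to ddof=0
print(manual_std(ages, ddof=0), np.std(ages.to_numpy()))
```

With a mean of 24 the squared deviations sum to 32, so ddof=1 gives sqrt(32 / 2) = 4.0 while ddof=0 gives the slightly smaller sqrt(32 / 3).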
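Note that the default value of ddof is 1 in pandas but 0 in numpy. For groups with only one entry we get NaN, because n - ddof = 1 - 1 = 0 and the formula divides by 0: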
df.groupby(['Title'])['Age'].agg(['count','std'])
count std
Title
Capt 1 NaN
Col 2 2.828427
Don 1 NaN
Dr 6 12.016655
Jonkheer 1 NaN
Lady 1 NaN
Major 2 4.949747
Master 36 3.619872
Miss 146 12.990292
Mlle 2 0.000000
Mme 1 NaN
Mr 398 12.708793
Mrs 108 11.433628
Ms 1 NaN
Rev 6 13.136463
Sir 1 NaN
the Countess 1 NaN
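The same pattern can be reproduced on a tiny hypothetical DataFrame (the values are made up, not taken from the Titanic data). It suggests that missing ages are not the cause: the NaN in the "Mr" group is simply skipped (count is 2), and only the single-member "Capt" group gets a NaN std.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: "Capt" has one passenger, "Mr" has one missing Age
df = pd.DataFrame({
    "Title": ["Capt", "Mr", "Mr", "Mr"],
    "Age": [70.0, 20.0, np.nan, 30.0],
})

# count and std both ignore the NaN age; std uses ddof=1 by default
summary = df.groupby("Title")["Age"].agg(["count", "std"])
print(summary)
```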
If you want std with a ddof of 0, you can either pass ddof=0 to the pandas function or use the numpy function, whose default ddof is 0:
import numpy as np  # pd.np was deprecated in pandas 1.0 and removed in 2.0
df.groupby(['Title'])['Age'].std(ddof=0)
df.groupby(['Title'])['Age'].agg(lambda x: np.std(x))
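A small sketch, on hypothetical toy data, checking that the two ddof=0 options agree. With ddof=0 a single-member group gets 0.0 instead of NaN; whether a population std of 0 is a sensible spread for imputing ages is a modelling choice, not something pandas decides for you. The NaN values are dropped explicitly before calling numpy, since np.std on a raw ndarray containing NaN would return NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Titanic CSV)
df = pd.DataFrame({
    "Title": ["Capt", "Mr", "Mr", "Mr"],
    "Age": [70.0, 20.0, np.nan, 30.0],
})

# Option 1: pass ddof=0 to the pandas groupby std
pandas_ddof0 = df.groupby("Title")["Age"].std(ddof=0)

# Option 2: numpy's std, whose default is ddof=0; drop NaN first
numpy_ddof0 = df.groupby("Title")["Age"].agg(
    lambda x: np.std(x.dropna().to_numpy())
)

print(pandas_ddof0)  # Capt 0.0, Mr 5.0
print(numpy_ddof0)
```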
https://stackoverflow.com/questions/57806722