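I have a DataFrame (df) with two columns, student and values. Each row of the values column holds multiple comma-separated values. I want to count, for each row, how many times each unique value occurs in that column.
df looks like this:
I want to know how many times each value (0 and 1) occurs in the values column for each student.
For this example the result should look like this: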
student vector
0 (15,12)
1 (10,11)
2 (8,10)
3 (13,6)
4 (9,16)
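(15, 12) means that in the first row (student 0) the value 0 occurs 15 times and the value 1 occurs 12 times.
(10, 11) means that in the second row (student 1) the value 0 occurs 10 times and the value 1 occurs 11 times, and so on.
Note: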
df.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 student 5 non-null int64
1 values 5 non-null object
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes
Posted on 2022-10-29 21:38:53
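Because the values column holds arrays rather than strings, you can use Counter together with dict.get, which returns 0 when a value does not occur in a row: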
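from collections import Counter

def f(x):
    # Count the occurrences of each value in one row's array
    d = Counter(x)
    # Return (count of 0, count of 1); a missing value counts as 0
    return (d.get(0, 0), d.get(1, 0))

df['vector'] = df['values'].apply(f)
print(df)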
student values vector
0 0 [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0] (9, 0)
1 1 [1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0] (3, 6)
2 2 [0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0] (7, 2)
3 3 [1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0] (3, 6)
4 4 [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0] (9, 0)
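The question describes the cells as comma-separated values, while the answer above assumes they are already arrays. If the cells turn out to be strings, a variant along the following lines might work. This is only a sketch: the function name count_zeros_ones and the sample frame are hypothetical and not taken from the original post, and it assumes each cell is either a plain comma-separated string such as "0,1,1" or a sequence of numbers.

```python
from collections import Counter

import pandas as pd


def count_zeros_ones(cell):
    """Return (count of 0, count of 1) for one cell of the 'values' column."""
    # Assumption: a cell is either a comma-separated string such as "0,1,1"
    # or an already-parsed sequence (list/array) of numbers.
    if isinstance(cell, str):
        # Skip empty tokens so that an empty string yields (0, 0)
        items = [float(tok) for tok in cell.split(",") if tok.strip()]
    else:
        items = [float(v) for v in cell]
    counts = Counter(items)
    return (counts.get(0.0, 0), counts.get(1.0, 0))


# Hypothetical sample data, not the asker's actual frame
sample = pd.DataFrame({
    "student": [0, 1, 2],
    "values": ["0,0,1", "1,1,1,0", [0.0, 1.0, 1.0]],
})
sample["vector"] = sample["values"].apply(count_zeros_ones)
print(sample)
```

Converting every item to float first means "0", "0.0" and 0 are all counted as the same value.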
https://stackoverflow.com/questions/74248991