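Delete rows based on values in other rows
Question:
I have been looking for a way to delete rows based on a condition that is checked against other rows.
Here is my DataFrame: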
product product_id account_status
prod-A 100 active
prod-A 100 cancelled
prod-A 300 active
prod-A 400 cancelled
If a product & product_id combination has a row with account_status = 'active', keep that row and delete the other rows for that combination.
The desired output is:
product product_id account_status
prod-A 100 active
prod-A 300 active
prod-A 400 cancelled
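Please advise.
Answer:
For a more general solution: if at least one account_status value in a (product, product_id) group is active, remove only the other, non-active values in that group. Groups without any active row are kept unchanged: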
print (df)
product product_id account_status
0 prod-A 100 active
1 prod-A 100 cancelled <- must be removed
2 prod-A 300 active
3 prod-A 400 cancelled
4 prod-A 500 active
5 prod-A 500 active
6 prod-A 600 cancelled
7 prod-A 600 cancelled
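# True where the row's status is 'active'
s = df['account_status'].eq('active')
# Group that boolean column (A) by the product/product_id pair
g = df.assign(A=s).groupby(['product','product_id'])['A']
# Keep groups with no active row, groups where every row is active, and any active row
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]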
print (df)
product product_id account_status
0 prod-A 100 active
2 prod-A 300 active
3 prod-A 400 cancelled
4 prod-A 500 active
5 prod-A 500 active
6 prod-A 600 cancelled
7 prod-A 600 cancelled
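A minimal self-contained sketch of the same idea, assuming pandas is installed. It rebuilds the example frame and groups the boolean Series `s` directly by the key columns instead of going through `assign`. It also drops the `g.transform('all')` term: when every row of a group is active, each of those rows already satisfies `s`, so that term does not change the mask.

```python
import pandas as pd

# Example data from the answer above: one product, several product_ids
df = pd.DataFrame({
    'product': ['prod-A'] * 8,
    'product_id': [100, 100, 300, 400, 500, 500, 600, 600],
    'account_status': ['active', 'cancelled', 'active', 'cancelled',
                       'active', 'active', 'cancelled', 'cancelled'],
})

# True for rows whose status is 'active'
s = df['account_status'].eq('active')

# Broadcast back to every row: does its (product, product_id) group contain an active row?
has_active = s.groupby([df['product'], df['product_id']]).transform('any')

# Keep active rows, plus every row of groups that have no active row at all
result = df[s | ~has_active]
print(result)
```

Only row 1 (product_id 100, cancelled) is dropped, matching the output shown above.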
It also works with more than two status categories:
print (df)
product product_id account_status
0 prod-A 100 active
1 prod-A 100 cancelled <- must be removed
2 prod-A 100 pending <- must be removed
3 prod-A 300 active
4 prod-A 300 pending <- must be removed
5 prod-A 400 cancelled
6 prod-A 500 active
7 prod-A 500 active
8 prod-A 600 pending
9 prod-A 600 cancelled
s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print (df)
product product_id account_status
0 prod-A 100 active
3 prod-A 300 active
5 prod-A 400 cancelled
6 prod-A 500 active
7 prod-A 500 active
8 prod-A 600 pending
9 prod-A 600 cancelled
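If the filter is needed for other key columns or another status value, it can be wrapped in a small function. `keep_active_or_all` and its parameters are hypothetical names chosen for this sketch, not part of pandas; the logic is the same boolean mask as above, applied here to the multi-category example.

```python
import pandas as pd


def keep_active_or_all(df, keys, status_col='account_status', keep_value='active'):
    """Hypothetical helper: within each group defined by `keys`, keep only the
    rows whose status equals `keep_value` if at least one such row exists;
    otherwise keep the whole group unchanged."""
    is_keep = df[status_col].eq(keep_value)
    # Per-row flag: does this row's group contain at least one keep_value row?
    group_has_keep = is_keep.groupby([df[k] for k in keys]).transform('any')
    return df[is_keep | ~group_has_keep]


# Multi-category example data from the answer above
df = pd.DataFrame({
    'product': ['prod-A'] * 10,
    'product_id': [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    'account_status': ['active', 'cancelled', 'pending', 'active', 'pending',
                       'cancelled', 'active', 'active', 'pending', 'cancelled'],
})

print(keep_active_or_all(df, ['product', 'product_id']))
```

Rows 1, 2 and 4 are dropped; the all-active group 500 and the group 600 without any active row are kept whole.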