Delete rows based on values in other rows


Question

I have been looking for a way to delete rows based on a condition that is checked against other rows.

This is my DataFrame:

product product_id  account_status
prod-A  100         active
prod-A  100         cancelled
prod-A  300         active
prod-A  400         cancelled

If a product and product_id combination has a row with account_status = 'active', keep that row and delete the other rows for that combination.

The desired output is:

product product_id  account_status
prod-A  100         active
prod-A  300         active
prod-A  400         cancelled

I saw the solution mentioned here, but I could not adapt it to string values.

Please advise.


Answer:

For a more general solution, remove the non-active account_status rows from a group only if that group contains at least one active value:

print(df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled

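# True for rows whose status is 'active'
s = df['account_status'].eq('active')
# group the boolean flag by product and product_id
g = df.assign(A=s).groupby(['product','product_id'])['A']
# keep rows in groups with no active row, in all-active groups, or that are active themselves
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print(df)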
  product  product_id account_status
0  prod-A         100         active
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled
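
To see why the mask keeps exactly these rows, it may help to look at each term separately. The following is a minimal, self-contained sketch that rebuilds the sample data shown above and prints the intermediate boolean columns. The `steps` frame and its column names are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the printed DataFrame above
df = pd.DataFrame({
    "product": ["prod-A"] * 8,
    "product_id": [100, 100, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "active", "cancelled",
                       "active", "active", "cancelled", "cancelled"],
})

# True where the row itself is active
s = df["account_status"].eq("active")
g = df.assign(A=s).groupby(["product", "product_id"])["A"]

# Illustrative helper frame: one column per term of the mask
steps = pd.DataFrame({
    "is_active": s,
    "group_has_active": g.transform("any"),
    "group_all_active": g.transform("all"),
})
steps["keep"] = (
    ~steps["group_has_active"] | steps["group_all_active"] | steps["is_active"]
)
print(steps)

result = df[steps["keep"]]
print(result)
```

Only row 1 fails every term: its group (product_id 100) contains an active row, the group is not entirely active, and the row itself is not active.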

It also works with more than two status categories:

print(df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         100        pending <- should be removed
3  prod-A         300         active
4  prod-A         300        pending <- should be removed
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled

# same mask as before: only the 'active' flag matters, so other statuses need no special handling
s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print(df)
  product  product_id account_status
0  prod-A         100         active
3  prod-A         300         active
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled
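
If the same filter is needed in several places, it could be wrapped in a small function. The sketch below is one possible way to do that, not the answer's own code: the name `keep_preferred_rows` and its parameters are hypothetical. It also drops the `g.transform('all')` term, which appears to be redundant because every row of an all-active group already satisfies `s`.

```python
import pandas as pd

def keep_preferred_rows(df, keys, status_col="account_status", preferred="active"):
    """Hypothetical helper: within each group defined by `keys`, keep only the
    rows whose status equals `preferred` when at least one such row exists;
    groups without a preferred row are left untouched."""
    is_preferred = df[status_col].eq(preferred)
    # Broadcast "does this group contain a preferred row?" back to every row
    group_has_preferred = is_preferred.groupby(
        [df[k] for k in keys]
    ).transform("any")
    return df[is_preferred | ~group_has_preferred]

# Sample data copied from the multi-category example above
df = pd.DataFrame({
    "product": ["prod-A"] * 10,
    "product_id": [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "pending", "active", "pending",
                       "cancelled", "active", "active", "pending", "cancelled"],
})

result = keep_preferred_rows(df, ["product", "product_id"])
print(result)
```

On this sample data the function selects the same rows as the three-term mask, and the original index labels are preserved; call `reset_index(drop=True)` on the result if a fresh index is wanted.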