An elegant way to reduce a list of dictionaries?


Question

I have a list of dictionaries, each of which contains exactly the same keys. I want to find the average value for each key, and I would like to know whether this can be done with reduce (or, if not, with something more elegant than nested for loops).

Here is the list:

[
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, ...
]

I would like to get back a dictionary like this:

{
  "accuracy": 0.81,
  "f_measure": 0.83,
  "precision": 0.84,
  "recall": 0.83
}

Here is what I have so far, but I don't like it:

folds = [ ... ]

keys = folds[0].keys()
results = dict.fromkeys(keys, 0)

for fold in folds:
    for k in keys:
        results[k] += fold[k] / len(folds)

print(results)

Answer:
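Since the question specifically asks about reduce, a plain-Python version could look like the sketch below (a minimal example, assuming every dictionary has exactly the same keys; the values are the question's data, rounded to four decimals for brevity):

from functools import reduce

# The question's data, rounded to four decimals for brevity
folds = [
    {"accuracy": 0.78, "f_measure": 0.8169, "precision": 0.8192, "recall": 0.8172},
    {"accuracy": 0.77, "f_measure": 0.8159, "precision": 0.8175, "recall": 0.8161},
    {"accuracy": 0.82, "f_measure": 0.8226, "precision": 0.8238, "recall": 0.8228},
]

keys = folds[0].keys()

# reduce() folds the list into a single dict of per-key sums;
# a final comprehension divides each sum by the number of folds.
totals = reduce(lambda acc, fold: {k: acc[k] + fold[k] for k in keys}, folds)
averages = {k: total / len(folds) for k, total in totals.items()}

print(averages)  # per-key averages, e.g. accuracy -> 0.79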

Alternatively, if you are going to be doing this kind of computation on your data, you may want to use pandas (overkill for a one-off, but it greatly simplifies tasks like this...)

import pandas as pd

data = [
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, # ...
]

# Build a DataFrame from the list of dicts, take the column-wise mean, and convert back to a dict
result = pd.DataFrame.from_records(data).mean().to_dict()

Which gives you:

{'accuracy': 0.79000000000000004,
 'f_measure': 0.8184953755563118,
 'precision': 0.82016728940624295,
 'recall': 0.81870370370370382}
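
If you want the two-decimal form shown in the question, a rounding pass over the result is enough (a small follow-up sketch):

rounded = {k: round(v, 2) for k, v in result.items()}
print(rounded)  # {'accuracy': 0.79, 'f_measure': 0.82, 'precision': 0.82, 'recall': 0.82}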