Add Random Forest predictions as a column to the test file

Question:

I am working with Python and pandas in a Jupyter notebook, where I built a Random Forest model for the Titanic dataset.
https://www.kaggle.com/c/titanic/data

I read in the test and train data, cleaned both, and added new columns (the same ones to each).

After fitting and refitting the model, trying boosting, and so on, I settled on one model:

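 from sklearn.ensemble import RandomForestClassifier
 from sklearn.model_selection import cross_val_score

 X2 = train_data[['Pclass','Sex','Age','richness']]
 rfc_model_3 = RandomForestClassifier(n_estimators=200)
 %time cross_val_score(rfc_model_3, X2, Y_target).mean()
 rfc_model_3.fit(X2, Y_target)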

Then I predict whether each passenger survived:

 X_test = test_data[['Pclass','Sex','Age','richness']]
 predictions = rfc_model_3.predict(X_test)
 preds = pd.DataFrame(predictions, columns=['Survived'])

Is there a way to add the predictions as a column to the test file?

Answer:

Since

rfc_model_3 = RandomForestClassifier(n_estimators=200)
rfc_model_3.predict(X_test)

returns y : array of shape = [n_samples] (see the docs), you should be able to add the model output directly to X_test without creating an intermediate DataFrame:

X_test['survived'] = rfc_model_3.predict(X_test)
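
Since the question asks about the test file rather than the feature subset, the same assignment can also target the full test frame. The following is a minimal sketch under stated assumptions: the two frames are tiny made-up stand-ins for the cleaned Titanic data, and the helper make_frame, the PassengerId column, and the names feature_cols and model are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_cols = ["Pclass", "Sex", "Age", "richness"]
rng = np.random.default_rng(0)

def make_frame(n):
    # Hypothetical helper: builds a small frame of made-up, already-numeric features.
    return pd.DataFrame({
        "PassengerId": np.arange(1, n + 1),
        "Pclass": rng.integers(1, 4, n),
        "Sex": rng.integers(0, 2, n),
        "Age": rng.integers(1, 80, n),
        "richness": rng.integers(0, 3, n),
    })

train_data = make_frame(40)
Y_target = rng.integers(0, 2, 40)  # made-up labels
test_data = make_frame(10)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(train_data[feature_cols], Y_target)

# predict returns one label per row, in row order, so it can be assigned
# straight to a new column of the frame the features were taken from.
X_test = test_data[feature_cols]
test_data["Survived"] = model.predict(X_test)

# With no path, to_csv returns the CSV text; pass a file path to write a file.
csv_text = test_data.to_csv(index=False)
```

Assigning to test_data keeps the feature frame X_test free of the prediction column, so X_test can be passed to predict again without an extra column getting in the way.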

If you still want the intermediate result, @EdChum's suggestion in the comments will work well.
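
For the intermediate DataFrame route, one point worth checking is index alignment. The sketch below is one possible way to combine the frames and is not necessarily what the comment proposed. It assumes a made-up test frame whose index is not 0..n-1 (as can happen after rows are dropped during cleaning) and uses a hard-coded array as a stand-in for the output of rfc_model_3.predict(X_test).

```python
import numpy as np
import pandas as pd

# Made-up test frame with a non-default index.
test_data = pd.DataFrame(
    {"Pclass": [3, 1, 2], "Age": [22, 38, 26]},
    index=[0, 2, 5],
)

# Stand-in for rfc_model_3.predict(X_test): one label per row, in row order.
predictions = np.array([0, 1, 1])

# Reuse the test frame's index so the rows line up when the frames are combined.
preds = pd.DataFrame(predictions, columns=["Survived"], index=test_data.index)

# join aligns on the index and returns a new frame; test_data itself is unchanged.
result = test_data.join(preds)
```

Without index=test_data.index, preds would get a fresh 0..n-1 index, and an index-based join could pair predictions with the wrong rows or leave missing values.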
