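WeChat official account: 尤而小屋 | Editor: Peter | Author: Peter
Hi everyone, I'm Peter~
Today I'd like to share a model fusion method from machine learning and data mining: Stacking.
Stacking, also known as stacked generalization, is an ensemble learning technique that combines the predictions of several models to improve overall predictive performance.
Specifically, the Stacking workflow is as follows: several base models are trained on the training data, their out-of-fold (cross-validated) predictions are used as input features for a second-level meta-model, and the meta-model produces the final prediction.
Image source: https://blog.csdn.net/weixin_40633696/article/details/108721395
The author explains in detail, with diagrams, how multiple machine learning models are stacked.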
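To make the workflow concrete, here is a minimal hand-rolled sketch of the idea. It is only an illustration under my own assumptions, not what `StackingClassifier` does line for line: the choice of two base models, the 5 folds, and the names `train_meta`, `test_meta` and `manual_stacking_acc` are all made up for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Level 0: base models (an illustrative choice, not a recommendation)
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    KNeighborsClassifier(),
]

# Step 1: out-of-fold predictions on the training set.
# Every training row is predicted by a model that did not see that row.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Step 2: refit each base model on the full training set and
# build the same kind of features for the test set.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Step 3: the level-1 (meta) model learns how to combine the base predictions.
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
manual_stacking_acc = meta_model.score(test_meta, y_test)

print(train_meta.shape, test_meta.shape)
print(manual_stacking_acc)
```

Using out-of-fold predictions in step 1 matters: if the meta-model were trained on predictions the base models made for rows they had already seen, it would learn from overly optimistic inputs.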
Now let's get hands-on with code and compare a Stacking ensemble against single models:
In 1:
# Models: one meta-model (logistic regression) and three base models
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
# Data, splitting and evaluation
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Silence warnings (for example convergence warnings) to keep the output clean
import warnings
warnings.filterwarnings("ignore")
Load the breast cancer dataset, separate X and y, and split into training and test sets:
In 2:
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Instantiate the model object, starting with a random forest as the first single-model baseline:
In 3:
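# Default hyperparameters; no random_state is set, so the score can vary slightly between runs
rf = RandomForestClassifier()
Train the model: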
In 4:
rf.fit(X_train,y_train)
Predict on the test set:
In 5:
y_pred = rf.predict(X_test)
Output the accuracy of the single model:
In 6:
acc1 = accuracy_score(y_test, y_pred)
acc1
Out6:
0.9649122807017544
In 7:
# Second single-model baseline: k-nearest neighbors with default settings
knc = KNeighborsClassifier()
knc.fit(X_train,y_train)
Predict on the test set:
In 8:
y_pred = knc.predict(X_test)
In 9:
acc2 = accuracy_score(y_test, y_pred)
acc2
Out9:
0.956140350877193
In 10:
# Third single-model baseline: support vector classifier with default settings
svc = SVC()
svc.fit(X_train,y_train)
Predict on the test set:
In 11:
y_pred = svc.predict(X_test)
In 12:
acc3 = accuracy_score(y_test, y_pred)
acc3
Out12:
0.9473684210526315
In 13:
# Level-0 (base) models: each entry is a (name, estimator) tuple
base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('knn', KNeighborsClassifier()),
    ('svc', SVC(probability=True, random_state=42))
]
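One thing worth checking with base models like these: KNN and SVC both depend on distances between samples, and the breast cancer features live on very different scales. The sketch below is my own optional variation, not part of the original experiment. It wraps those two models in a `StandardScaler` pipeline and compares raw versus scaled accuracy on the same split. The names `candidates` and `scaling_scores` are made up for this example, and whether scaling helps can depend on the model and the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Raw models next to the same models behind a StandardScaler.
# The scaler is fitted on the training data only, inside the pipeline.
candidates = {
    "knn_raw": KNeighborsClassifier(),
    "knn_scaled": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svc_raw": SVC(random_state=42),
    "svc_scaled": make_pipeline(StandardScaler(), SVC(random_state=42)),
}

scaling_scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}

for name, score in scaling_scores.items():
    print(f"{name}: {score:.4f}")
```

A pipeline is a regular estimator, so an entry such as `('knn', make_pipeline(StandardScaler(), KNeighborsClassifier()))` can be dropped into the `estimators` list of `StackingClassifier` as is.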
In 14:
# Level-1 (meta) model: learns how to combine the base models' predictions
meta_model = LogisticRegression()
In 15:
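# cv is left at its default (5-fold): the meta-model is trained on
# cross-validated predictions of the base models
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stacking_clf.fit(X_train, y_train)
Predict with the stacked model: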
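If you are curious what the meta-model actually sees, the following sketch spells out a few `StackingClassifier` parameters that the code above leaves at their defaults (`cv`, `stack_method`, `passthrough`) and inspects the fitted object. It is a small exploration under my own assumptions: unlike the code above, `SVC` is used here without `probability=True`, and `max_iter=1000` on the meta-model is my choice, to show how `stack_method="auto"` picks a method per base model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("knn", KNeighborsClassifier()),
        ("svc", SVC(random_state=42)),  # assumption: no probability=True here
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                 # folds used to build the meta-model's training features
    stack_method="auto",  # try predict_proba, then decision_function, then predict
    passthrough=False,    # meta-model sees only base predictions, not the raw features
)
stacking_clf.fit(X_train, y_train)

# Which method each base model contributes to the meta-features
methods = list(stacking_clf.stack_method_)
print(methods)

# The meta-features themselves: one row per sample
meta_features = stacking_clf.transform(X_test)
print(meta_features.shape)
```

For a binary problem each base model contributes a single column, so the meta-model here is a logistic regression on just three inputs. Setting `passthrough=True` would append the original features to those columns.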
In 16:
y_pred = stacking_clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
acc
Out16:
0.9736842105263158
Compare the stacked model with each single model:
In 17:
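print("Accuracy gain of Stacking over RandomForestClassifier: {:.2f} percentage points".format((acc - acc1) * 100))
print("Accuracy gain of Stacking over KNeighborsClassifier: {:.2f} percentage points".format((acc - acc2) * 100))
print("Accuracy gain of Stacking over SVC: {:.2f} percentage points".format((acc - acc3) * 100))
Accuracy gain of Stacking over RandomForestClassifier: 0.88 percentage points
Accuracy gain of Stacking over KNeighborsClassifier: 1.75 percentage points
Accuracy gain of Stacking over SVC: 2.63 percentage points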
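Final comparison: on this train/test split, the stacked model scores higher than each of the three single models. The test set has only 114 samples, so these gains correspond to one to three additional correct predictions.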
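Since a single split can be lucky or unlucky, a more cautious way to compare is cross-validation over the whole dataset. The sketch below is one possible setup, not the original experiment: the helper `make_base_models`, the 5 shuffled folds, `max_iter=1000` and `SVC` without `probability=True` are all my own assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)


def make_base_models():
    """Return fresh (name, estimator) pairs so that nothing is shared between candidates."""
    return [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("knn", KNeighborsClassifier()),
        ("svc", SVC(random_state=42)),
    ]


# The three single models plus a stacked model built from the same three
candidates = dict(make_base_models())
candidates["stacking"] = StackingClassifier(
    estimators=make_base_models(),
    final_estimator=LogisticRegression(max_iter=1000),
)

# Same folds for every candidate, so the scores are directly comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in candidates.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

If the stacked model's mean accuracy stays ahead of every single model across folds, and the gap is not small compared with the fold-to-fold spread, the conclusion above is on firmer ground.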
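Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.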