
Machine Learning Testing Notes (22): Comprehensive Scatter Plots

Author: 顾翔 | Published 2021-01-28 | Column: 啄木鸟软件测试

1. Import the libraries

from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, SVR, LinearSVC, LinearSVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, AdaBoostClassifier, AdaBoostRegressor, BaggingClassifier, BaggingRegressor, VotingClassifier
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.decomposition import PCA, NMF
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import statsmodels.api as sm
import warnings
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, Normalizer, MaxAbsScaler, QuantileTransformer, Binarizer
from PIL import Image
from sklearn.datasets import make_blobs
from matplotlib.colors import ListedColormap
from mttkinter import mtTkinter as tk
# The following imports are also required by the code in the later sections
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

2. Initialization

2.1 Define the class and its attributes

class Machine_Learn:
    def __init__(self):
        self.Liner_model_name = ["LinearRegression", "LogisticRegression", "Ridge", "Lasso", "ElasticNet"]
        self.Liner_title_name = ["Linear regression", "Logistic regression", "Ridge regression", "Lasso regression", "Elastic net"]
        self.Liner_prams = [[], [], ['{"alpha":"1,0.1,0.001"}'], ['{"alpha":"1,0.1,0.001"}'], ['{"alpha":"0.1,0.5,10","l1_ratio":"0.1,0.5,0.9"}']]

        self.RandomForest_model_name = ["RandomForestClassifier", "RandomForestRegressor"]
        self.RandomForest_title_name = ["Random forest classification", "Random forest regression"]
        self.RandomForest_prams = [['{"n_estimators":"4,5,6,7"}'], ['{"random_state":"2,3,4,5"}']]

        self.DecisionTree_model_name = ["DecisionTreeClassifier", "DecisionTreeRegressor"]
        self.DecisionTree_title_name = ["Decision tree classification", "Decision tree regression"]
        self.DecisionTree_prams = [['{"max_depth":"1,3,5"}'], ['{"max_depth":"1,3,5"}']]

        self.KNeighbors_model_name = ["KNeighborsClassifier", "KNeighborsRegressor"]
        self.KNeighbors_title_name = ["K-nearest neighbors classification", "K-nearest neighbors regression"]
        self.KNeighbors_prams = [[], []]

        self.Bayes_model_name = ["BernoulliNB", "GaussianNB", "MultinomialNB"]
        self.Bayes_title_name = ["Bernoulli naive Bayes", "Gaussian naive Bayes", "Multinomial naive Bayes"]
        self.Bayes_prams = [[], [], []]

        self.SVM_model_name = ["SVC", "SVR", "LinearSVC", "LinearSVR"]
        self.SVM_title_name = ["SVM classification", "SVM regression", "Linear SVM classification", "Linear SVM regression"]
        self.SVM_prams = [['{"kernel":"linear,rbf,sigmoid,poly","gamma":"0.1,1,10","C":"1.0,3.0,5.0"}'],
                          ['{"kernel":"linear,rbf,sigmoid,poly","gamma":"0.1,1,10","C":"1.0,3.0,5.0"}'],
                          ['{"C":"1.0,3.0,5.0"}'],
                          ['{"C":"1.0,3.0,5.0"}']]

        self.Cluster_model_name = ["KMeans", "AgglomerativeClustering", "DBSCAN"]
        self.Cluster_title_name = ["K-means clustering", "Agglomerative clustering", "DBSCAN clustering"]
        self.Cluster_prams = [[], [], ['{"min_samples":"2,3,5","eps":"1.0,2.0,3.0"}']]

        self.Neural_network_model_name = ["MLPClassifier", "MLPRegressor"]
        self.Neural_network_title_name = ["Neural network classification", "Neural network regression"]
        self.Neural_network_prams = [['{"activation":"relu,tanh,identity,logistic","alpha":"0.0001,0.001,0.01,1"}'],
                                     ['{"activation":"relu,tanh,identity,logistic","alpha":"0.0001,0.001,0.01,1"}']]

        self.Ensemble_model_name = ["AdaBoostClassifier", "AdaBoostRegressor", "BaggingClassifier", "BaggingRegressor", "VotingClassifier"]
        self.Ensemble_title_name = ["AdaBoost classification", "AdaBoost regression", "Bagging classification", "Bagging regression", "Voting classification"]
        self.Ensemble_prams = [[], [], ['{"base_estimator":"SVC(),SVR()"}'], ['{"base_estimator":"SVC(),SVR()"}'], ['{"voting":"hard,soft"}']]

        self.Decompositio_model_name = ["PCA", "NMF", "LinearDiscriminantAnalysis"]
        self.Decompositio_title_name = ["PCA dimensionality reduction", "Non-negative matrix factorization", "Linear discriminant analysis"]
        self.Decompositio_prams = [[], [], []]

        self.no_score_list = ["PCA", 'KMeans', 'AgglomerativeClustering', 'DBSCAN', 'NMF', "BaggingClassifier", "BaggingRegressor"]
        # self.no_bord_list = ['PCA', 'NMF', "BaggingClassifier", "BaggingRegressor"]
        self.fit_predict_list = ['AgglomerativeClustering', 'DBSCAN']
        self.no_bord_list = ["PCA", 'NMF', 'AgglomerativeClustering', 'DBSCAN', "BaggingClassifier", "BaggingRegressor"]

Note: drawing decision boundaries for 'AgglomerativeClustering' and 'DBSCAN' is very slow, so by default they are placed in self.no_bord_list and no boundary is drawn for them; take them out of the list when you do want the boundaries.
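
For example, a minimal sketch (assuming the Machine_Learn class above has already been defined) of temporarily re-enabling boundary drawing for DBSCAN:

ml = Machine_Learn()
if 'DBSCAN' in ml.no_bord_list:
    ml.no_bord_list.remove('DBSCAN')   # DBSCAN boundaries will now be drawn (slow)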

2.2 Get an algorithm object

    def get_model(self, model_name, key="", value=0):
        # Linear models
        if model_name == "LinearRegression":
            clf = LinearRegression()
        elif model_name == "LogisticRegression":
            clf = LogisticRegression()
        elif model_name == "Ridge":
            if key == "alpha":
                clf = Ridge(alpha=value)
            else:
                clf = Ridge()
        elif model_name == "Lasso":
            if key == "alpha":
                clf = Lasso(alpha=value)
            else:
                clf = Lasso()
        elif model_name == "ElasticNet":
            if key == "alpha":
                clf = ElasticNet(alpha=value)
            elif key == "l1_ratio":
                clf = ElasticNet(l1_ratio=value)
            else:
                clf = ElasticNet()
        # SVM
        elif model_name == "SVC":
            if key == "kernel":
                clf = SVC(kernel=value, max_iter=100000)
            elif key == "gamma":
                clf = SVC(gamma=value, max_iter=100000)
            elif key == "C":
                clf = SVC(C=value, max_iter=100000)
            else:
                clf = SVC(max_iter=10000)
        elif model_name == "SVR":
            if key == "kernel":
                clf = SVR(kernel=value, max_iter=100000)
            elif key == "gamma":
                clf = SVR(gamma=value, max_iter=100000)
            elif key == "C":
                clf = SVR(C=value, max_iter=100000)
            else:
                clf = SVR(max_iter=100000)
        elif model_name == "LinearSVC":
            if key == "C":
                clf = LinearSVC(C=value, max_iter=100000)
            else:
                clf = LinearSVC(max_iter=100000)
        elif model_name == "LinearSVR":
            if key == "C":
                clf = LinearSVR(C=value, max_iter=100000)
            else:
                clf = LinearSVR(max_iter=100000)
        # K-nearest neighbors
        elif model_name == "KNeighborsClassifier":
            clf = KNeighborsClassifier()
        elif model_name == "KNeighborsRegressor":
            clf = KNeighborsRegressor()
        # Decision trees
        elif model_name == "DecisionTreeClassifier":
            if key == "max_depth":
                clf = DecisionTreeClassifier(max_depth=value)
            else:
                clf = DecisionTreeClassifier()
        elif model_name == "DecisionTreeRegressor":
            if key == "max_depth":
                clf = DecisionTreeRegressor(max_depth=value)
            else:
                clf = DecisionTreeRegressor()
        # Clustering
        elif model_name == "KMeans":
            clf = KMeans()
        elif model_name == "AgglomerativeClustering":
            clf = AgglomerativeClustering()
        elif model_name == "DBSCAN":
            if key == "min_samples":
                clf = DBSCAN(min_samples=value)
            elif key == "eps":
                clf = DBSCAN(eps=value)
            else:
                clf = DBSCAN()
        # Neural networks
        elif model_name == "MLPClassifier":
            if key == "activation":
                clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[10, 10], activation=value, max_iter=100000)
            elif key == "alpha":
                clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[10, 10], alpha=value, max_iter=100000)
            else:
                clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[10, 10], max_iter=100000)
        elif model_name == "MLPRegressor":
            if key == "activation":
                clf = MLPRegressor(solver='lbfgs', hidden_layer_sizes=[10, 10], activation=value, max_iter=100000)
            elif key == "alpha":
                clf = MLPRegressor(solver='lbfgs', hidden_layer_sizes=[10, 10], alpha=value, max_iter=100000)
            else:
                clf = MLPRegressor(solver='lbfgs', hidden_layer_sizes=[10, 10], max_iter=100000)
        # Random forests
        elif model_name == "RandomForestClassifier":
            if key == "n_estimators":
                clf = RandomForestClassifier(n_estimators=value)
            elif key == "random_state":
                clf = RandomForestClassifier(random_state=value)
            else:
                clf = RandomForestClassifier()
        elif model_name == "RandomForestRegressor":
            if key == "n_estimators":
                clf = RandomForestRegressor(n_estimators=value)
            elif key == "random_state":
                clf = RandomForestRegressor(random_state=value)
            else:
                clf = RandomForestRegressor()
        # Ensemble learning
        elif model_name == "AdaBoostClassifier":
            clf = AdaBoostClassifier(n_estimators=50, random_state=11)
        elif model_name == "AdaBoostRegressor":
            clf = AdaBoostRegressor(n_estimators=50, random_state=11)
        elif model_name == "BaggingClassifier":
            if key == "base_estimator" and value == "SVC()":
                clf = BaggingClassifier(base_estimator=SVC(), n_estimators=10, random_state=4)
            elif key == "base_estimator" and value == "SVR()":
                clf = BaggingClassifier(base_estimator=SVR(), n_estimators=10, random_state=4)
            else:
                clf = BaggingClassifier(n_estimators=10, random_state=4)
        elif model_name == "BaggingRegressor":
            if key == "base_estimator" and value == "SVC()":
                clf = BaggingRegressor(base_estimator=SVC(), n_estimators=10, random_state=4)
            elif key == "base_estimator" and value == "SVR()":
                clf = BaggingRegressor(base_estimator=SVR(), n_estimators=10, random_state=4)
            else:
                clf = BaggingRegressor(n_estimators=10, random_state=4)
        elif model_name == "VotingClassifier":
            if key == "voting":
                clf = VotingClassifier(estimators=[('log_clf', LogisticRegression()), ('svm_clf', SVC(probability=True)), ('dt_clf', DecisionTreeClassifier(random_state=666))], voting=value)
            else:
                clf = VotingClassifier(estimators=[('log_clf', LogisticRegression()), ('svm_clf', SVC(probability=True)), ('dt_clf', DecisionTreeClassifier(random_state=666))])
        # Dimensionality reduction
        elif model_name == "PCA":
            clf = PCA(n_components=0.9, random_state=62)
        elif model_name == "NMF":
            clf = NMF(n_components=105, random_state=62, max_iter=10000)
        elif model_name == "LinearDiscriminantAnalysis":
            clf = LinearDiscriminantAnalysis(n_components=2)
        # Naive Bayes
        elif model_name == "BernoulliNB":
            clf = BernoulliNB()
        elif model_name == "GaussianNB":
            clf = GaussianNB()
        elif model_name == "MultinomialNB":
            clf = MultinomialNB()
        else:
            print("The model you entered does not exist")
        return clf
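
A quick usage sketch of get_model (the variable names and parameter values below are only for illustration):

ml = Machine_Learn()
clf = ml.get_model("SVC", key="gamma", value=0.1)   # SVC(gamma=0.1, max_iter=100000)
print(clf)
clf_default = ml.get_model("Ridge")                  # no key given, so Ridge() with its default alpha
print(clf_default)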

2.3 Other helpers

    # Get a dataset by type
    def get_data(self, data_type):
        if data_type == "iris":
            mydata = datasets.load_iris()
        elif data_type == "wine":
            mydata = datasets.load_wine()
        elif data_type == "breast_cancer":
            mydata = datasets.load_breast_cancer()
        elif data_type == "diabetes":
            mydata = datasets.load_diabetes()
        elif data_type == "boston":
            mydata = datasets.load_boston()   # note: removed in scikit-learn 1.2+
        elif data_type == "two_moon":
            # sklearn has no load_two_moon(); make_moons() returns (X, y) directly
            X, y = datasets.make_moons(noise=0.05)
            return X, y
        else:
            print("Invalid data type")
        X = mydata.data[:, :2]   # keep only the first two features so results can be plotted in 2D
        y = mydata.target
        return X, y

    # Build a clf according to the type of the key's value
    def judg_clf(self, key, mytype, model_name, valuedic):
        if key == 'kernel' or key == 'activation':
            clf = self.get_model(model_name, key, value=str(valuedic))
        elif mytype == 'float':
            clf = self.get_model(model_name, key, value=float(valuedic))
        elif mytype == 'int':
            clf = self.get_model(model_name, key, value=int(valuedic))
        elif mytype == 'str':
            clf = self.get_model(model_name, key, value=str(valuedic))
        else:
            print("Invalid clf or mytype")
        return clf

    # Print the scores
    def print_score(self, sign, model_name, clf, X_train, y_train, X_test, y_test):
        if model_name not in self.no_score_list:
            print('Training set score (' + sign + '): {:.2%}'.format(clf.score(X_train, y_train)))
            print('Test set score (' + sign + '): {:.2%}'.format(clf.score(X_test, y_test)))

    # Set the figure font (SimHei renders CJK characters) and show the figure
    def set_ply_font_info_and_show(self, title):
        plt.rcParams['font.sans-serif'] = ['SimHei']
        plt.rcParams['axes.unicode_minus'] = False
        plt.suptitle(title)
        plt.show()

    # Preparation for drawing several subplots
    def Combination_Diagram_Prepare(self, valuedic, key, mytype, X_train, y_train, i, maxj, m, model_name, X_test=[], y_test=[], sig=None):
        clf = self.judg_clf(key, mytype, model_name, valuedic)
        try:
            clf.fit(X_train, y_train)
        except:
            # classifiers need discrete labels, so cast the target to int and retry
            clf.fit(X_train, y_train.astype('int'))
        if sig == "print_score":
            sign = str(key) + "=" + str(valuedic)
            self.print_score(sign, model_name, clf, X_train, y_train, X_test, y_test)
        plt.subplot(i, maxj, m + 1)
        plt.title(key + "=" + valuedic)
        return clf

    # Return the number of subplot rows and columns
    def Get_line_and_Column(self, pramdic):
        i = 0
        maxj = 0
        for key, values in pramdic.items():
            valuedics = values.split(",")
            j = 0
            for valuedic in valuedics:
                j = j + 1
            maxj = max(maxj, j)
            i = i + 1
        figure, axes = plt.subplots(i, maxj, figsize=(100, 10))
        plt.subplots_adjust(hspace=0.95)
        return i, maxj

    # Return the parameters, model names and titles for an algorithm family
    def get_algorithm_type(self, scattertype):
        ML = Machine_Learn()
        if scattertype == "Liner":
            prams, model_name, title_name = ML.Liner_prams, ML.Liner_model_name, ML.Liner_title_name
        elif scattertype == "RandomForest":
            prams, model_name, title_name = ML.RandomForest_prams, ML.RandomForest_model_name, ML.RandomForest_title_name
        elif scattertype == "DecisionTree":
            prams, model_name, title_name = ML.DecisionTree_prams, ML.DecisionTree_model_name, ML.DecisionTree_title_name
        elif scattertype == "KNeighbors":
            prams, model_name, title_name = ML.KNeighbors_prams, ML.KNeighbors_model_name, ML.KNeighbors_title_name
        elif scattertype == "Bayes":
            prams, model_name, title_name = ML.Bayes_prams, ML.Bayes_model_name, ML.Bayes_title_name
        elif scattertype == "SVM":
            prams, model_name, title_name = ML.SVM_prams, ML.SVM_model_name, ML.SVM_title_name
        elif scattertype == "Cluster":
            prams, model_name, title_name = ML.Cluster_prams, ML.Cluster_model_name, ML.Cluster_title_name
        elif scattertype == "Neural_network":
            prams, model_name, title_name = ML.Neural_network_prams, ML.Neural_network_model_name, ML.Neural_network_title_name
        elif scattertype == "Ensemble":
            prams, model_name, title_name = ML.Ensemble_prams, ML.Ensemble_model_name, ML.Ensemble_title_name
        elif scattertype == "Decompositio":
            prams, model_name, title_name = ML.Decompositio_prams, ML.Decompositio_model_name, ML.Decompositio_title_name
        else:
            print("Unsupported type passed to Draw_algorithm_scatter")
        return prams, model_name, title_name

    # Get the parameter value type for a model
    def get_pram_type(self, model_name):
        int_lists = ["RandomForestClassifier", "RandomForestRegressor", "DecisionTreeClassifier", "DecisionTreeRegressor"]
        str_lists = ["AdaBoostClassifier", "AdaBoostRegressor", "BaggingClassifier", "BaggingRegressor", "VotingClassifier", "PCA", "NMF", "LinearDiscriminantAnalysis"]
        if model_name in int_lists:
            mytype = "int"
        elif model_name in str_lists:
            mytype = "str"
        else:
            mytype = "float"
        return mytype
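
A minimal sketch of how these helpers fit together (variable names are illustrative only):

ml = Machine_Learn()
X, y = ml.get_data("iris")                          # only the first two iris features
print(X.shape, y.shape)                             # expected: (150, 2) (150,)
print(ml.get_pram_type("DecisionTreeClassifier"))   # "int"   -> max_depth values are cast to int
print(ml.get_pram_type("SVC"))                      # "float" -> gamma/C values are cast to float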

3. Drawing scatter plots

3.1 Function and class call diagram

3.2 Code

class Scatter:
    def __init__(self, data):
        self.data = data

    # Draw the scatter plot for one algorithm
    def draw_scatter(self, X_train, X_test, y_train, y_test, clf, title, model_name):
        ML = Machine_Learn()
        cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
        cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
        # Use the two sample features as the x and y axes of the plot
        x_min, x_max = X_train[:, 0].min() - 0.5, X_train[:, 0].max() + 0.5
        y_min, y_max = X_train[:, 1].min() - 0.5, X_train[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, .02), np.arange(y_min, y_max, .02))
        # Color each region of the decision surface
        if model_name not in ML.no_bord_list:
            if model_name in ML.fit_predict_list:
                Z = clf.fit_predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
            else:
                Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
            plt.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
        # Show the samples as scatter points
        plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cmap_bold, s=20, edgecolors='k')
        plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cmap_bold, s=20, marker='*')
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())

    # Draw the scatter plots for one algorithm, possibly over several parameter values
    def learn_scatter(self, model_name, title, pram):
        ML = Machine_Learn()
        mytype = ML.get_pram_type(model_name)
        X, y = ML.get_data(self.data)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
        i = 0
        print(model_name)
        if len(pram) == 0:
            clf = ML.get_model(model_name)
            try:
                clf.fit(X_train, y_train)
            except:
                # if fitting fails (e.g. the model requires non-negative input), rescale to [0, 1] and retry
                scaler = MinMaxScaler()
                scaler.fit(X_train)
                X_train = scaler.transform(X_train)
                X_test = scaler.transform(X_test)
                clf.fit(X_train, y_train)
            ML.print_score(model_name, model_name, clf, X_train, y_train, X_test, y_test)
            self.draw_scatter(X_train, X_test, y_train, y_test, clf, title, model_name)
        else:
            pramdic = eval(pram[0])
            i = 0
            maxj = 0
            i, j = ML.Get_line_and_Column(pramdic)
            m = 0
            for key, values in pramdic.items():
                valuedics = values.split(",")
                for valuedic in valuedics:
                    clf = ML.Combination_Diagram_Prepare(valuedic, key, mytype, X_train, y_train, i, j, m, model_name, X_test, y_test, "print_score")
                    self.draw_scatter(X_train, X_test, y_train, y_test, clf, title, model_name)
                    m = m + 1
        ML.set_ply_font_info_and_show(title)

    # Prepare to draw the scatter plots for an algorithm family
    def Draw_algorithm_scatter(self, scattertype):
        ML = Machine_Learn()
        prams, model_name, title_name = ML.get_algorithm_type(scattertype)
        i = 0
        for pram in prams:
            self.learn_scatter(model_name[i], title_name[i], pram)
            i = i + 1

3.3 Invocation

if __name__ == "__main__":
    scatter = Scatter("iris")
    scatter = Scatter("breast_cancer")   # note: this reassignment replaces the iris Scatter created above
    scatter.Draw_algorithm_scatter("Liner")
    scatter.Draw_algorithm_scatter("RandomForest")
    scatter.Draw_algorithm_scatter("DecisionTree")
    scatter.Draw_algorithm_scatter("KNeighbors")
    scatter.Draw_algorithm_scatter("Bayes")
    scatter.Draw_algorithm_scatter("SVM")
    scatter.Draw_algorithm_scatter("Neural_network")
    scatter.Draw_algorithm_scatter("Ensemble")
    scatter.Draw_algorithm_scatter("Decompositio")

4. Results

The results in this section were produced from:

sklearn.datasets.load_iris()

4.1 Linear models

Output

LinearRegression
Training set score (LinearRegression): 72.99%
Test set score (LinearRegression): 71.12%
LogisticRegression
Training set score (LogisticRegression): 84.82%
Test set score (LogisticRegression): 68.42%
Ridge
Training set score (alpha=1): 72.97%
Test set score (alpha=1): 70.90%
Training set score (alpha=0.1): 72.97%
Test set score (alpha=0.1): 70.90%
Training set score (alpha=0.001): 72.97%
Test set score (alpha=0.001): 70.90%
Lasso
Training set score (alpha=1): 0.00% (underfitting)
Test set score (alpha=1): 0.00%
Training set score (alpha=0.1): 0.00%
Test set score (alpha=0.1): 0.00%
Training set score (alpha=0.001): 0.00%
Test set score (alpha=0.001): 0.00%
ElasticNet
Training set score (alpha=0.1): 3.24% (underfitting)
Test set score (alpha=0.1): 3.20%
Training set score (alpha=0.5): 3.24%
Test set score (alpha=0.5): 3.20%
Training set score (alpha=10): 3.24%
Test set score (alpha=10): 3.20%
Training set score (l1_ratio=0.1): 37.12% (underfitting)
Test set score (l1_ratio=0.1): 36.05%
Training set score (l1_ratio=0.5): 3.24% (underfitting)
Test set score (l1_ratio=0.5): 3.20%
Training set score (l1_ratio=0.9): 0.00%
Test set score (l1_ratio=0.9): 0.00%
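
The 0.00% scores are easy to explain: for regressors, score() returns R², and an R² of 0 means the model does no better than always predicting the mean of y, which happens when the L1 penalty shrinks every coefficient to zero. A minimal check under the same train/test split as above (a sketch, not part of the original tool):

from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X = X[:, :2]                          # the same two features used by get_data("iris")
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
lasso = Lasso(alpha=1).fit(X_train, y_train)
print(lasso.coef_)                    # expected: all coefficients are (near) zero
print(lasso.score(X_train, y_train))  # expected: close to 0, matching the listing above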

4.2 Random forests

Output

RandomForestClassifier
Training set score (n_estimators=4): 94.64%
Test set score (n_estimators=4): 63.16% (overfitting)
Training set score (n_estimators=5): 92.86%
Test set score (n_estimators=5): 68.42% (overfitting)
Training set score (n_estimators=6): 93.75%
Test set score (n_estimators=6): 68.42% (overfitting)
Training set score (n_estimators=7): 95.54%
Test set score (n_estimators=7): 63.16% (overfitting)
RandomForestRegressor
Training set score (random_state=2): 94.59%
Test set score (random_state=2): 58.03% (overfitting)
Training set score (random_state=3): 94.59%
Test set score (random_state=3): 59.06% (overfitting)
Training set score (random_state=4): 94.95%
Test set score (random_state=4): 59.21% (overfitting)
Training set score (random_state=5): 94.76%
Test set score (random_state=5): 58.69% (overfitting)

In every case here the test-set score is lower than the training-set score, which indicates overfitting.

4.3 Decision trees

Output

DecisionTreeClassifier
Training set score (max_depth=1): 63.39%
Test set score (max_depth=1): 65.79%
Training set score (max_depth=3): 86.61%
Test set score (max_depth=3): 71.05%
Training set score (max_depth=5): 88.39%
Test set score (max_depth=5): 68.42% (overfitting)
DecisionTreeRegressor
Training set score (max_depth=1): 58.97%
Test set score (max_depth=1): 59.45% (underfitting)
Training set score (max_depth=3): 84.20%
Test set score (max_depth=3): 69.89% (overfitting)
Training set score (max_depth=5): 88.19%
Test set score (max_depth=5): 62.75% (overfitting)

4.4 K-nearest neighbors

Output

KNeighborsClassifier
Training set score (KNeighborsClassifier): 87.50%
Test set score (KNeighborsClassifier): 68.42%
KNeighborsRegressor
Training set score (KNeighborsRegressor): 86.97%
Test set score (KNeighborsRegressor): 70.15%

4.5 Naive Bayes

Output

BernoulliNB
Training set score (BernoulliNB): 33.93%
Test set score (BernoulliNB): 31.58% (underfitting)
GaussianNB
Training set score (GaussianNB): 83.04%
Test set score (GaussianNB): 68.42%
MultinomialNB
Training set score (MultinomialNB): 66.07%
Test set score (MultinomialNB): 65.79% (underfitting)

4.6 SVM

Output

SVC
Training set score (kernel=linear): 83.04%
Test set score (kernel=linear): 71.05%
Training set score (kernel=rbf): 83.93%
Test set score (kernel=rbf): 71.05%
Training set score (kernel=sigmoid): 33.93%
Test set score (kernel=sigmoid): 31.58% (underfitting)
Training set score (kernel=poly): 85.71%
Test set score (kernel=poly): 71.05%
Training set score (gamma=0.1): 83.93%
Test set score (gamma=0.1): 71.05%
Training set score (gamma=1): 84.82%
Test set score (gamma=1): 71.05%
Training set score (gamma=10): 88.39%
Test set score (gamma=10): 68.42%
Training set score (C=1.0): 83.93%
Test set score (C=1.0): 71.05%
Training set score (C=3.0): 85.71%
Test set score (C=3.0): 68.42%
Training set score (C=5.0): 84.82%
Test set score (C=5.0): 71.05%
SVR
Training set score (kernel=linear): 72.24%
Test set score (kernel=linear): 70.81%
Training set score (kernel=rbf): 78.14%
Test set score (kernel=rbf): 67.68%
Training set score (kernel=sigmoid): -0.00%
Test set score (kernel=sigmoid): -0.00% (underfitting)
Training set score (kernel=poly): 74.86%
Test set score (kernel=poly): 63.64%
Training set score (gamma=0.1): 77.03%
Test set score (gamma=0.1): 70.56%
Training set score (gamma=1): 81.50%
Test set score (gamma=1): 64.05%
Training set score (gamma=10): 86.63%
Test set score (gamma=10): 61.78%
Training set score (C=1.0): 78.14%
Test set score (C=1.0): 67.68%
Training set score (C=3.0): 78.90%
Test set score (C=3.0): 66.10%
Training set score (C=5.0): 79.22%
Test set score (C=5.0): 65.32%
LinearSVC
Training set score (C=1.0): 83.93%
Test set score (C=1.0): 68.42%
Training set score (C=3.0): 84.82%
Test set score (C=3.0): 68.42%
Training set score (C=5.0): 83.93%
Test set score (C=5.0): 68.42%
LinearSVR
Training set score (C=1.0): 71.61%
Test set score (C=1.0): 70.75%
Training set score (C=3.0): 72.21%
Test set score (C=3.0): 71.15%
Training set score (C=5.0): 72.43%
Test set score (C=5.0): 71.37%

In SVC and SVR, the larger gamma is, the smaller the enclosing region around the samples becomes.
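
A minimal sketch of this effect, reusing the classes above (the gamma values are illustrative):

ml = Machine_Learn()
sc = Scatter("iris")
X, y = ml.get_data("iris")
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
for n, gamma in enumerate([0.1, 1, 10, 100]):
    clf = SVC(gamma=gamma, max_iter=100000).fit(X_train, y_train)
    plt.subplot(1, 4, n + 1)
    plt.title("gamma=" + str(gamma))
    sc.draw_scatter(X_train, X_test, y_train, y_test, clf, "SVC", "SVC")
plt.show()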

4.7 Clustering

4.8 Neural networks

Output

MLPClassifier
Training set score (activation=relu): 86.61%
Test set score (activation=relu): 71.05%
Training set score (activation=tanh): 87.50%
Test set score (activation=tanh): 71.05%
Training set score (activation=identity): 85.71%
Test set score (activation=identity): 68.42% (overfitting)
Training set score (activation=logistic): 91.07%
Test set score (activation=logistic): 68.42% (overfitting)
Training set score (alpha=0.0001): 85.71%
Test set score (alpha=0.0001): 71.05%
Training set score (alpha=0.001): 84.82%
Test set score (alpha=0.001): 71.05%
Training set score (alpha=0.01): 33.93%
Test set score (alpha=0.01): 31.58% (underfitting)
Training set score (alpha=1): 85.71%
Test set score (alpha=1): 71.05%
MLPRegressor
Training set score (activation=relu): 83.68%
Test set score (activation=relu): 70.17%
Training set score (activation=tanh): 84.25%
Test set score (activation=tanh): 69.02%
Training set score (activation=identity): 72.99%
Test set score (activation=identity): 71.12%
Training set score (activation=logistic): 85.25%
Test set score (activation=logistic): 67.18%
Training set score (alpha=0.0001): 83.99%
Test set score (alpha=0.0001): 70.05%
Training set score (alpha=0.001): 72.99%
Test set score (alpha=0.001): 71.12%
Training set score (alpha=0.01): 84.42%
Test set score (alpha=0.01): 68.65%
Training set score (alpha=1): 83.20%
Test set score (alpha=1): 71.39%

4.9 Ensemble learning

Output

AdaBoostClassifier
Training set score (AdaBoostClassifier): 50.89%
Test set score (AdaBoostClassifier): 36.84% (underfitting)
AdaBoostRegressor
Training set score (AdaBoostRegressor): 82.37%
Test set score (AdaBoostRegressor): 74.00%
BaggingClassifier
BaggingRegressor
VotingClassifier
Training set score (voting=hard): 84.82%
Test set score (voting=hard): 68.42%
Training set score (voting=soft): 92.86%
Test set score (voting=soft): 68.42% (overfitting)

4.10 Dimensionality reduction

Output

LinearDiscriminantAnalysis
Training set score (LinearDiscriminantAnalysis): 85.71%
Test set score (LinearDiscriminantAnalysis): 73.68%