Worked Examples of sklearn Machine Learning Algorithms in Python

Introduction

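This article walks through worked examples of machine learning algorithms from sklearn in Python: a general classification workflow, data preparation, and linear regression. The steps are covered in detail and can serve as a reference.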

Import the common modules

import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
import copy
import re
import math

I. A general machine learning workflow, using k-nearest neighbors as an example

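# Nearest-neighbor training is not well suited to high-dimensional data
from sklearn.model_selection import train_test_split  # split the data into training and test sets
from sklearn.neighbors import KNeighborsClassifier    # nearest-neighbor classifier

# 1. Read the data
data = pd.read_excel("data/样本数据.xlsx")

# 2. Standardize the data
from sklearn import preprocessing
for col in data.columns[2:]:
    # To leave discrete variables intact, standardize only the continuous
    # variables, taken here as columns with more than 10 distinct values
    if len(set(data[col])) > 10:
        data[col] = preprocessing.scale(data[col])

# 3. Build the features and the target, then split into training and test sets
X = data[['month_income', 'education_outcome', 'relationship_outcome',
          'entertainment_outcome', 'traffic_', 'express', 'express_distance',
          'satisfac', 'wifi_neghbor', 'wifi_relative', 'wifi_frend', 'internet']]
y = data['wifi']
# test_size=0.3 holds out 30% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# 4. Fit the model
model = KNeighborsClassifier()       # create the estimator
model.fit(X_train, y_train)          # train on the training set
y_predict = model.predict(X_test)    # predict on the test set
# The predictions can be changed by applying a different probability threshold
proba = model.predict_proba(X_test)  # per-sample probabilities of class 0 and class 1

# 5. Evaluate the model
# (1) Accuracy on the test set; the general call is model.score(X, y)
model.score(X_test, y_test)

# (2) Build the confusion matrix to judge how accurate the predictions are
"""
In the confusion matrix, rows are true values and columns are predicted values.
TN: actual 0, predicted 0    FP: actual 0, predicted 1
FN: actual 1, predicted 0    TP: actual 1, predicted 1

precision = TP / (TP + FP): share of samples predicted as 1 that are truly 1
recall    = TP / (TP + FN): share of samples that are truly 1 and predicted as 1
"""
from sklearn.metrics import confusion_matrix
cfm = confusion_matrix(y_test, y_predict)
plt.matshow(cfm, cmap=plt.cm.gray)  # cmap sets the color map; grayscale is used here
plt.show()

# (3) Precision and recall
from sklearn.metrics import precision_score, recall_score
precision_score(y_test, y_predict)  # precision
recall_score(y_test, y_predict)     # recall

# (4) Error-rate matrix
row_sums = np.sum(cfm, axis=1, keepdims=True)  # keepdims so each row is divided by its own total
err_matrix = cfm / row_sums
# Zero the diagonal: those cells are correct predictions and are not of interest
np.fill_diagonal(err_matrix, 0)
plt.matshow(err_matrix, cmap=plt.cm.gray)  # brighter cells mean a higher error rate
plt.show()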
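
The remark about changing the decision threshold can be made concrete. The following is a minimal sketch, assuming synthetic data from make_classification in place of the survey spreadsheet; the helper predict_with_threshold and the 0.7 cutoff are illustrative choices, not part of the original workflow. It also derives precision and recall directly from the four confusion-matrix counts so they can be compared with sklearn's own scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the survey data (assumption).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = KNeighborsClassifier().fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1.
proba_1 = model.predict_proba(X_test)[:, 1]


def predict_with_threshold(p, threshold):
    """Label a sample 1 when its class-1 probability reaches the threshold."""
    return (p >= threshold).astype(int)


# A stricter cutoff than the default: with 5 neighbors, at least 4 must agree.
y_strict = predict_with_threshold(proba_1, 0.7)

# Rows are true labels, columns are predictions, so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_strict).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("precision:", precision, "sklearn:", precision_score(y_test, y_strict))
print("recall:   ", recall, "sklearn:", recall_score(y_test, y_strict))
```

Raising the threshold can only remove positive predictions, so recall cannot increase; precision often rises in exchange, though that is not guaranteed.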

II. Data processing

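# 1. Construct a dataset
from sklearn import datasets  # dataset generators and loaders
# n_samples is the number of generated samples, n_features the number of
# features in X, and n_targets the number of targets in y. noise is the standard
# deviation of the Gaussian noise added to y; bias is the intercept of the
# underlying linear model.
X, y = datasets.make_regression(n_samples=100, n_features=1, n_targets=1,
                                noise=1, bias=0.5, tail_strength=0.1)
plt.figure(figsize=(12, 12))
plt.scatter(X, y)

# 2. Read the data
data = pd.read_excel("data/样本数据.xlsx")

# 3. Standardize the data with preprocessing.scale(data)
from sklearn import preprocessing
# To leave discrete variables intact, standardize only the continuous
# variables, taken here as columns with more than 10 distinct values
for col in data.columns[2:]:
    if len(set(data[col])) > 10:
        data[col] = preprocessing.scale(data[col])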
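
To see what the selective standardization step does, here is a small self-contained sketch. It assumes a made-up two-column frame: month_income borrows a column name from the article, while has_wifi and the generated values are hypothetical. Only the column with more than 10 distinct values is rescaled to zero mean and unit variance; the 0/1 column is left as it was.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

rng = np.random.default_rng(0)

# Hypothetical data: one discrete 0/1 flag and one continuous variable.
df = pd.DataFrame({
    "has_wifi": rng.integers(0, 2, size=50),
    "month_income": rng.normal(3000, 800, size=50),
})
raw = df.copy()  # keep the original values for comparison

# Same rule as in the article: scale only columns with more than 10 distinct values.
for col in df.columns:
    if df[col].nunique() > 10:
        df[col] = preprocessing.scale(df[col])

print(df.describe().loc[["mean", "std"]])
```

The distinct-value count is only a heuristic for telling continuous columns from discrete ones; an integer-coded categorical column with many levels would also be scaled, so it is worth checking the selected columns by hand.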

III. Regression

1. Ordinary least squares linear regression

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# '事情' and '运动' are kept exactly as they appear in the source
X = data[['work', 'work_time', 'work_salary',
          'work_address', 'worker_number', 'month_income', 'total_area',
          'own_area', 'rend_area', 'out_area',
          'agricultal_income', '事情', 'wifi', 'internet_fee', 'cloth_outcome',
          'education_outcome', 'medcine_outcome', 'person_medicne_outcome',
          'relationship_outcome', 'food_outcome', 'entertainment_outcome',
          'agriculta_outcome', 'other_outcome', 'owe', 'owe_total', 'debt',
          'debt_way', 'distance_debt', 'distance_market', 'traffic_', 'express',
          'express_distance', '运动', 'satisfac', 'wifi_neghbor',
          'wifi_relative', 'wifi_frend', 'internet', 'medical_insurance']]
y = data['total_income']

model = LinearRegression().fit(X, y)  # fit the model
model.score(X, y)                     # goodness of fit (R^2) on the data used for fitting
model.coef_                           # fitted coefficients
model.intercept_                      # fitted intercept
model.predict(X.iloc[[25]])           # predict for the row at position 25 (.ix is no longer available in pandas)
model.get_params()                    # parameters of the estimator
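
Because the spreadsheet is not available, the behavior of LinearRegression is easier to check on generated data where the true coefficients are known. The sketch below assumes data from make_regression with coef=True; the sample size, noise level and train/test split are illustrative choices. It scores on held-out data rather than on the data used for fitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# coef=True also returns the coefficients used to generate y.
X, y, true_coef = make_regression(
    n_samples=200, n_features=3, n_informative=3,
    noise=1.0, bias=0.5, coef=True, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

r2_test = model.score(X_test, y_test)  # R^2 on held-out samples
prediction = model.predict(X_test[:1])  # input must be 2-D: one row, three features

print("true coefficients:  ", true_coef)
print("fitted coefficients:", model.coef_)
print("fitted intercept:   ", model.intercept_)
print("test R^2:           ", r2_test)
```

Scoring on the training data, as in model.score(X, y) after fit(X, y), tends to give an optimistic R^2, especially with many features; a held-out score is usually the more informative figure.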