In this recipe, we're going to show how you can keep a model around for later use. For example, you might want to use a fitted model to predict an outcome and automatically make a decision.
Getting ready
In this recipe, we will perform the following tasks:
1. Fit the model that we will persist.
2. Import joblib and save the model.
How to do it...
To persist models with joblib, the following code can be used:
from sklearn import datasets, tree
X, y = datasets.make_classification()
dt = tree.DecisionTreeClassifier()
dt.fit(X, y)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=None, splitter='best')
import joblib
joblib.dump(dt, "dtree.clf")
['dtree.clf']
How it works...
The preceding code works by saving the state of the object to disk so that it can later be reloaded into a scikit-learn object. It's important to note that the state of a model varies in complexity depending on the model type.
For simplicity's sake, suppose all we needed to save was a way to predict the outcome for given inputs. For regression that would be easy: a little matrix algebra and we're done. For a model like a random forest, however, there may be many trees of varying complexity, so the state that has to be saved is much larger and harder to describe.
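To make the regression case concrete, here is a minimal sketch that assumes a plain LinearRegression fitted on synthetic data. The names X_reg, y_reg, lr and manual_pred, as well as the sample size and seed, are illustrative choices and not part of the original recipe. It shows that the whole prediction rule fits in the coef_ and intercept_ attributes:

```python
import numpy as np
from sklearn import datasets, linear_model

# Synthetic regression data; the sizes and seed are arbitrary assumptions.
X_reg, y_reg = datasets.make_regression(n_samples=100, n_features=5, random_state=0)

lr = linear_model.LinearRegression()
lr.fit(X_reg, y_reg)

# The entire prediction rule is one coefficient vector plus an intercept.
manual_pred = X_reg @ lr.coef_ + lr.intercept_

print(lr.coef_.shape)
print(np.allclose(manual_pred, lr.predict(X_reg)))
```

A tree ensemble has no such compact closed form, which is why its persisted state grows with the number and depth of its trees.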
There's more...
We can compare the size of a persisted decision tree with that of a random forest:
from sklearn import ensemble
rf = ensemble.RandomForestClassifier()
rf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
I'm going to omit the output, but in total, 52 files were written on my machine (older versions of joblib stored each NumPy array in its own file; recent versions write a single file by default):
joblib.dump(rf, "rf.clf")
['rf.clf']
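One way to make the size comparison measurable is to dump both models and look at the resulting files. The following is only a sketch: the temporary directory, the fixed random_state values, and the variable names dt_size and rf_size are assumptions added for illustration, and the exact byte counts will differ between machines and library versions.

```python
import os
import tempfile

import joblib
from sklearn import datasets, ensemble, tree

# Same kind of data as in the recipe; the seed is an arbitrary assumption.
X, y = datasets.make_classification(random_state=0)

dt = tree.DecisionTreeClassifier(random_state=0).fit(X, y)
rf = ensemble.RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Write both models to a throwaway directory and record the file sizes.
with tempfile.TemporaryDirectory() as tmp_dir:
    dt_path = os.path.join(tmp_dir, "dtree.clf")
    rf_path = os.path.join(tmp_dir, "rf.clf")
    joblib.dump(dt, dt_path)
    joblib.dump(rf, rf_path)
    dt_size = os.path.getsize(dt_path)
    rf_size = os.path.getsize(rf_path)

print(f"decision tree: {dt_size} bytes")
print(f"random forest: {rf_size} bytes")
```

The forest stores ten fitted trees, so its file should come out noticeably larger than the single tree's.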
To load the model again:
rf = joblib.load("rf.clf")
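As a quick sanity check, a reloaded model should give the same predictions as the one that was saved. This sketch assumes a temporary directory and a fixed random_state, neither of which appears in the original recipe; rf_loaded and same_predictions are illustrative names.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, ensemble

X, y = datasets.make_classification(random_state=0)
rf = ensemble.RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Round trip: dump the fitted forest, then load it back into a new object.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf.clf")
    joblib.dump(rf, path)
    rf_loaded = joblib.load(path)

# The reloaded forest should reproduce the original predictions exactly.
same_predictions = np.array_equal(rf.predict(X), rf_loaded.predict(X))
print(same_predictions)
```

Because joblib files are pickle-based, it is safest to load only files you trust and to reload them with the same scikit-learn version that produced them.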
This article is a translation of a foreign-language text.
In case of any infringement, please contact cloudcommunity@tencent.com to have it removed.