Persisting Models in Machine Learning

Day 24 of #100DaysOfCode

This post continues the previous one, which you can read here: ilkecandan.hashnode.dev/calculating-the-acc..

So far, we have done the following: 1- Imported our dataset. 2- Created a model. 3- Trained it. 4- Asked it to make predictions.

Training a model can be time-consuming, which is why model persistence matters. We train the model once and save it to a file. When we want to make predictions again, we simply load the model from the file and ask it to predict. There is no need to retrain, because the saved model has already been trained.

We should add this module to our code:

import joblib

The joblib module provides functions for saving and loading models: joblib.dump and joblib.load.
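To see the save and load steps in isolation, here is a minimal round-trip sketch. The toy data, the file name toy_model.joblib, and the temporary folder are assumptions made up for this illustration; they are not the music.csv dataset used in this post.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Toy data invented for this sketch (not the real music.csv)
X = [[20, 1], [25, 1], [30, 1], [20, 0], [25, 0], [30, 0]]
y = ["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Save into a temporary folder so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.joblib")
    written = joblib.dump(model, path)  # returns the list of file names written
    restored = joblib.load(path)        # rebuilds the trained model from the file

# The restored model should predict exactly like the original one
same_predictions = list(model.predict(X)) == list(restored.predict(X))
print(written, same_predictions)
```

The restored model is a separate object, but it carries the same trained tree, so no call to fit is needed after loading.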

So, our whole code should look like this:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import joblib

# Load the dataset and split it into input features (X) and the target (y)
music = pd.read_csv('music.csv')
X = music.drop(columns=['genre'])
y = music['genre']

# Train the model, then save it to a file
model = DecisionTreeClassifier()
model.fit(X, y)
joblib.dump(model, 'music_recommendation.joblib')

The output is a list containing the name of our file, because joblib.dump returns the file names it wrote.

You can now find the joblib file in your current working directory, which is usually the folder your notebook or script runs from. This is where our model is stored. It is simply a binary file.
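If the file is not where you expect it, keep in mind that a relative file name like the one above is resolved against the current working directory of the Python process. A small sketch, using only the standard library, to print the full path and check whether the file is there:

```python
from pathlib import Path

# A relative file name is resolved against the current working directory
model_path = Path.cwd() / "music_recommendation.joblib"

print(model_path)           # full path where the model file is expected
print(model_path.exists())  # True only if the model has already been saved there
```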

Now, let's try to load our model. We can delete most of the previous code and call joblib.load instead. Then we can ask the model to make predictions as we did before. The final code:

import joblib

# Load the trained model from the file instead of retraining it
model = joblib.load('music_recommendation.joblib')

# Ask the loaded model for a prediction for one input row
predictions = model.predict([[21, 0]])
predictions

The final result is an array containing the predicted genre for this input.
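The two steps can also be combined so that training only happens when no saved file exists yet. This is just one possible sketch: load_or_train is a hypothetical helper name, and the toy data and temporary folder are made up for the example rather than taken from music.csv.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier


def load_or_train(path, X, y):
    """Hypothetical helper: return (model, was_loaded_from_file)."""
    if os.path.exists(path):
        # A saved model exists, so skip training entirely
        return joblib.load(path), True
    # No saved model yet: train once and persist it for next time
    model = DecisionTreeClassifier()
    model.fit(X, y)
    joblib.dump(model, path)
    return model, False


# Toy data invented for this sketch (not the real music.csv)
X = [[20, 1], [30, 1], [20, 0], [30, 0]]
y = ["HipHop", "Jazz", "Dance", "Acoustic"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.joblib")
    _, first_loaded = load_or_train(path, X, y)       # trains and saves
    model, second_loaded = load_or_train(path, X, y)  # loads from the file
    prediction = model.predict([[20, 1]])[0]

print(first_loaded, second_loaded, prediction)
```

The first call trains and saves the model, while the second call finds the file and loads it without training again.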