How to Save a Machine Learning Model

When using the scikit-learn library for machine learning, it is often necessary to save and restore models so that they can be reused, compared with other models, or tested against new data. Saving an object is referred to as serialization, and restoring it is referred to as deserialization. Datasets also vary in type and size: some models train quickly on small datasets, but large datasets (more than 1 GB) can take a long time to train, even on a local computer with a GPU. To avoid wasting that time, save the trained model so that it can be reused in future projects.

Two Ways to Save a Model from scikit-learn:

1. Pickle string:

The pickle module implements a fundamental yet efficient protocol for serializing and deserializing Python object structures.

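The pickle module offers the following functions:

dump / dumps: For serializing an object hierarchy. dump() writes to an open file, while dumps() returns the result as a bytes object (the "pickle string").
load / loads: For deserializing a data stream. load() reads from an open file, while loads() reads from a bytes object.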
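
The difference between the string-based and file-based functions can be seen in a minimal sketch. It uses a plain dictionary as a stand-in for a trained model, and the file name params.pkl is a hypothetical choice made only for illustration.

```python
import os
import pickle
import tempfile

# A plain dictionary stands in for a trained model (illustrative assumption).
params = {"n_neighbors": 4, "weights": "uniform"}

# dumps()/loads(): serialize to and from an in-memory bytes object.
as_bytes = pickle.dumps(params)
restored_from_bytes = pickle.loads(as_bytes)

# dump()/load(): serialize to and from a file opened in binary mode.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(params, f)
    with open(path, "rb") as f:
        restored_from_file = pickle.load(f)

print(restored_from_bytes == params, restored_from_file == params)
```

Both round trips should give back an object equal to the original. The string form is convenient for keeping a model in memory or in a database field, while the file form keeps it on disk between sessions.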

Example: Let's apply K-Nearest Neighbors to the iris dataset, then save the model.

Code:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import pickle as pkl

# Loading the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2020)

# Creating the KNeighborsClassifier model
knn = KNeighborsClassifier(n_neighbors=4)

# Training the model
knn.fit(X_train, y_train)

# Saving the trained model as a pickle string
saved_model1 = pkl.dumps(knn)

# Loading the pickled model
knn_from_pkl = pkl.loads(saved_model1)

# Using the loaded pickled model for making predictions
predictions = knn_from_pkl.predict(X_test)
print(predictions)

Output:

[1 2 0 2 1 1 2 1 0 0 0 2 1 1 0 1 1 0 0 1 2 0 1 0 2 1 1 2 1 0 2 2 0 0 2 0 2 2 2 1 2 2 1 0 0 1 0 0 1 0 0 1 2 1 2 2 1 0 2 1 0 0 1 0 2 2 0 1 2 1 0 1 2 0 1 0 1 1 1 0 1 2 2 2 1 1 1 2 1 0 0 1 1 2 0 0 0 1 0 2 1 0 0 1 2 0 0 2 2 2 1 2 1 2 0 2 2 0 2 1 0 0 2 0 2 2 1 1 2 1 1 2 0 1 2 2 0 2 2 1 1 2 0 1 0 1 1 0 2 0 1 1 2 1 2 2 0 2 2 1 2 1 1 2 1 2 0 2 0 1 0 2 2 1 1 2 2 2 0 2 0 0 0 0 0 0 0 2 0 2 2 0 1 1 1 0 0 0 1 2 2 2 1 2 2 0 1 0 1 0 2 1 2 0 1 2 2 0 0 1 1 0 1 1 0 0 1 0 1 2 0 2 0 0 1 2 2 1 0 2 1]

Explanation:

In the program shown, the iris dataset is first loaded and then divided into training and test sets. A KNeighborsClassifier model is then created and trained on the training set. We use the pickle.dumps() function to store the trained model as a pickle string. Subsequently, we use pickle.loads() to restore the model from that string and generate predictions with it.
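
One way to confirm that nothing was lost in the round trip is to compare the restored model's predictions with those of the original. The following is a minimal sketch under the same setup as the example above; the variable names restored and same_predictions are illustrative, not part of any library.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data split and model settings as the example above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=2020
)
knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)

# Round-trip the fitted model through an in-memory pickle string.
restored = pickle.loads(pickle.dumps(knn))

# Compare predictions from the original and the restored model.
same_predictions = np.array_equal(knn.predict(X_test), restored.predict(X_test))
print(same_predictions)
```

If the comparison does not hold, the usual suspects are a different library version on the loading side or a model that was modified after it was pickled.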

2. Pickle Model as File using joblib:

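Joblib provides replacements for pickle's dump and load that are more efficient on objects carrying large NumPy arrays, as fitted scikit-learn estimators often do. Unlike pickle.dumps(), these functions write to disk rather than to a string; they accept either a filename or a file-like object.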

joblib offers the following functions for saving a model as a pickled file:

dump: This is used for serializing an object hierarchy to a file.
load: This is used for deserializing an object from a file.
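
As a minimal sketch of these two functions on array-heavy data, the snippet below saves and restores a dictionary holding a large NumPy array. The dictionary stands in for a fitted model, the file name payload.joblib is hypothetical, and compress=3 is just one reasonable setting of joblib's optional compression level.

```python
import os
import tempfile

import joblib
import numpy as np

# A dictionary holding a large array stands in for a fitted model (assumption).
payload = {"weights": np.arange(1_000_000, dtype=np.float64), "name": "demo"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "payload.joblib")  # hypothetical file name

    # dump() accepts a filename; compress=3 trades some speed for a smaller file.
    joblib.dump(payload, path, compress=3)
    size_on_disk = os.path.getsize(path)

    # load() reads the object back from the same filename.
    restored = joblib.load(path)

arrays_match = np.array_equal(restored["weights"], payload["weights"])
print(arrays_match, restored["name"], size_on_disk > 0)
```

Leaving compress at its default of 0 skips compression, which is usually faster to write and read but produces a larger file.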

Example: Using joblib to save the model to a pickled file

Code:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import joblib

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2020)

# Create the KNeighborsClassifier model
knn = KNeighborsClassifier(n_neighbors=4)

# Train the model
knn.fit(X_train, y_train)

# Save the model as a pickled file
joblib.dump(knn, 'model.pkl')

# Load the model from the file
knn_from_joblib = joblib.load('model.pkl')

# Use the loaded pickled model for making predictions
predictions = knn_from_joblib.predict(X_test)
print(predictions)

Output:

[1 2 0 2 1 1 2 1 0 0 0 2 1 1 0 1 1 0 0 1 2 0 1 0 2 1 1 2 1 0 2 2 0 0 2 0 2 2 2 1 2 2 1 0 0 1 0 0 1 0 0 1 2 1 2 2 1 0 2 1 0 0 1 0 2 2 0 1 2 1 0 1 2 0 1 0 1 1 1 0 1 2 2 2 1 1 1 2 1 0 0 1 1 2 0 0 0 1 0 2 1 0 0 1 2 0 0 2 2 2 1 2 1 2 0 2 2 0 2 1 0 0 2 0 2 2 1 1 2 1 1 2 0 1 2 2 0 2 2 1 1 2 0 1 0 1 1 0 2 0 1 1 2 1 2 2 0 2 2 1 2 1 1 2 1 2 0 2 0 1 0 2 2 1 1 2 2 2 0 2 0 0 0 0 0 0 0 2 0 2 2 0 1 1 1 0 0 0 1 2 2 2 1 2 2 0 1 0 1 0 2 1 2 0 1 2 2 0 0 1 1 0 1 1 0 0 1 0 1 2 0 2 0 0 1 2 2 1 0 2 1]

Explanation:

In this case, we follow the same steps as before, but use joblib instead of pickle to store and load the model. The joblib.dump() function stores the model by writing a pickled file to disk. We then use joblib.load() to load the model from that file and make predictions with it.

Conclusion:

By saving your machine learning models with pickle or joblib, you can reuse them later without having to retrain them, which saves time. For serializing objects in general, pickle is a flexible choice, whereas joblib excels when working with large NumPy arrays.
