How to Save a Machine Learning Model

When using the scikit-learn library for machine learning, it is often necessary to save and restore models so they can be reused, compared with other models, or tested against new data. Saving an object is referred to as serialization, and restoring it is referred to as deserialization. Training time also varies with the size of the data: small datasets can be trained quickly, but large datasets (more than 1GB) may take a long time to train, even on a local computer with a GPU. To avoid repeating that work, save the trained model so it can be reused in future projects.

Two Ways to Save a Model from scikit-learn:

1. Pickle string:

The pickle module implements a fundamental but powerful algorithm for serializing and deserializing Python object structures.

The pickle module offers the following functions:

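  • dumps: For serializing an object hierarchy to a bytes string, we can use the dumps() function (dump() writes to an open file instead).
  • loads: For deserializing a bytes string back into an object, we can use the loads() function (load() reads from an open file instead).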
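The difference between the string-based and file-based functions can be sketched with a plain dictionary standing in for a model. The dictionary and its keys are purely illustrative assumptions; any picklable object would behave the same way.

```python
import io
import pickle

# Hypothetical stand-in for a trained model; any picklable object works.
params = {"n_neighbors": 3, "weights": "uniform"}

# dumps()/loads() work on an in-memory bytes string.
as_bytes = pickle.dumps(params)
from_bytes = pickle.loads(as_bytes)

# dump()/load() work on a binary file object (here an in-memory buffer).
buffer = io.BytesIO()
pickle.dump(params, buffer)
buffer.seek(0)  # rewind before reading the pickled data back
from_file = pickle.load(buffer)

print(from_bytes == params, from_file == params)
```

In both cases the restored object is an equal copy of the original, not the same object in memory.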

Example: Let's apply K-Nearest Neighbors to the iris dataset, then save the model.

Code:
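A minimal sketch of this example follows. The split ratio, the random_state, and the n_neighbors value are assumptions chosen for illustration, so the predicted labels will not necessarily match the output listed in this section.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed values
)

# Train a K-Nearest Neighbors classifier on the training set.
knn = KNeighborsClassifier(n_neighbors=3)  # assumed value
knn.fit(X_train, y_train)

# Serialize the trained model to a bytes string.
saved_model = pickle.dumps(knn)

# Restore the model from the bytes string and predict with it.
knn_from_pickle = pickle.loads(saved_model)
predictions = knn_from_pickle.predict(X_test)
print(predictions)
```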

Output:

[1 2 0 2 1 1 2 1 0 0 0 2 1 1 0 1 1 0 0 1 2 0 1 0 2 1 1 2 1 0 2 2 0 0 2 0 2 2 2 1 2 2 1 0 0 1 0 0 1 0 0 1 2 1 2 2 1 0 2 1 0 0 1 0 2 2 0 1 2 1 0 1 2 0 1 0 1 1 1 0 1 2 2 2 1 1 1 2 1 0 0 1 1 2 0 0 0 1 0 2 1 0 0 1 2 0 0 2 2 2 1 2 1 2 0 2 2 0 2 1 0 0 2 0 2 2 1 1 2 1 1 2 0 1 2 2 0 2 2 1 1 2 0 1 0 1 1 0 2 0 1 1 2 1 2 2 0 2 2 1 2 1 1 2 1 2 0 2 0 1 0 2 2 1 1 2 2 2 0 2 0 0 0 0 0 0 0 2 0 2 2 0 1 1 1 0 0 0 1 2 2 2 1 2 2 0 1 0 1 0 2 1 2 0 1 2 2 0 0 1 1 0 1 1 0 0 1 0 1 2 0 2 0 0 1 2 2 1 0 2 1]

Explanation:

In this example, the iris dataset is first loaded and then divided into training and test sets. The KNeighborsClassifier class is then imported and trained on the training set. The pickle.dumps() function stores the trained model as a pickle string. Subsequently, pickle.loads() restores the model from that string, and the restored model is used to generate predictions.

2. Pickle Model as File using joblib:

joblib provides a replacement for pickle that is more efficient on objects carrying large numpy arrays, as fitted scikit-learn estimators often do. Its dump() and load() functions take a filename or a file object, so the model is written to a file rather than returned as a string.

joblib offers the following functions:

  • dump: This is used for serializing an object hierarchy.
  • load: This is used for deserializing a data stream.

Example: Using joblib to save the model to a pickled file.

Code:
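A minimal sketch of this example follows. The filename knn_model.pkl, the temporary directory, the split ratio, the random_state, and the n_neighbors value are all assumptions chosen for illustration, so the predicted labels will not necessarily match the output listed in this section.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed values
)

# Train a K-Nearest Neighbors classifier on the training set.
knn = KNeighborsClassifier(n_neighbors=3)  # assumed value
knn.fit(X_train, y_train)

# Save the trained model to a file (hypothetical filename in a temp directory).
model_path = os.path.join(tempfile.mkdtemp(), "knn_model.pkl")
joblib.dump(knn, model_path)

# Load the model back from the file and predict with it.
knn_from_joblib = joblib.load(model_path)
predictions = knn_from_joblib.predict(X_test)
print(predictions)
```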

Output:

[1 2 0 2 1 1 2 1 0 0 0 2 1 1 0 1 1 0 0 1 2 0 1 0 2 1 1 2 1 0 2 2 0 0 2 0 2 2 2 1 2 2 1 0 0 1 0 0 1 0 0 1 2 1 2 2 1 0 2 1 0 0 1 0 2 2 0 1 2 1 0 1 2 0 1 0 1 1 1 0 1 2 2 2 1 1 1 2 1 0 0 1 1 2 0 0 0 1 0 2 1 0 0 1 2 0 0 2 2 2 1 2 1 2 0 2 2 0 2 1 0 0 2 0 2 2 1 1 2 1 1 2 0 1 2 2 0 2 2 1 1 2 0 1 0 1 1 0 2 0 1 1 2 1 2 2 0 2 2 1 2 1 1 2 1 2 0 2 0 1 0 2 2 1 1 2 2 2 0 2 0 0 0 0 0 0 0 2 0 2 2 0 1 1 1 0 0 0 1 2 2 2 1 2 2 0 1 0 1 0 2 1 2 0 1 2 2 0 0 1 1 0 1 1 0 0 1 0 1 2 0 2 0 0 1 2 2 1 0 2 1]

Explanation:

In this case, we follow the same steps as before, but use joblib instead of pickle to store and load the model. The joblib.dump() function stores the model by creating a pickled file. Next, joblib.load() loads the model from that file, and the loaded model is used to predict outcomes.

Conclusion:

By saving machine learning models with pickle or joblib, you can reuse them later and save time instead of retraining them. pickle is a flexible, general-purpose option for serialising Python objects, whereas joblib excels when working with large numpy arrays.





