StandardScaler in Sklearn
When and How to Use StandardScaler?
StandardScaler is useful when the features of a dataset span very different ranges or are recorded in different units of measurement.
StandardScaler shifts each feature to a mean of 0 and scales it to a variance of 1. However, outliers in the data strongly influence the empirical mean and standard deviation, and this shrinks the range of the scaled feature values.
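The effect of an outlier can be seen in a small sketch. The values below are made up purely for illustration; only StandardScaler itself comes from the text.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature: four typical values and one extreme outlier.
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = StandardScaler().fit_transform(values)

# The outlier inflates the mean and standard deviation, so the four
# typical values end up squeezed into a narrow band of z-scores.
inlier_spread = scaled[:4].max() - scaled[:4].min()
print(scaled.ravel())
print("spread of the typical values after scaling:", inlier_spread)
```

In this sketch the four typical values, which originally differ by up to 3 units, end up less than 0.1 apart after scaling.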
Such differences in the raw features can cause problems for many machine learning algorithms. For distance-based algorithms, for instance, a feature whose values are much larger or span a much wider range than the others will dominate the distance calculation.
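To illustrate how one wide-ranging feature can dominate a distance, here is a minimal sketch using two hypothetical samples. The feature names (income in dollars, age in years) and the values are assumptions chosen for illustration only.

```python
import math

# Hypothetical samples: (annual income in dollars, age in years).
a = (50_000.0, 25.0)
b = (51_000.0, 65.0)

# Euclidean distance on the raw, unscaled values.
raw_distance = math.dist(a, b)

# Share of the squared distance contributed by each feature.
income_share = (a[0] - b[0]) ** 2 / raw_distance ** 2
age_share = (a[1] - b[1]) ** 2 / raw_distance ** 2

print(f"raw distance: {raw_distance:.2f}")
print(f"income share: {income_share:.4f}, age share: {age_share:.4f}")
```

Even though a 40-year age gap is large for an age and a 1,000-dollar gap is small for an income, the income difference accounts for almost all of the raw distance.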
The motivation for sklearn's StandardScaler class is that variables whose values lie in different ranges do not contribute equally to the model's fitted parameters and training objective, which may bias the predictions made with that model.
Therefore, before passing the features to such a machine learning model, we should standardize the data (µ = 0, σ = 1). Standardization is commonly employed in feature engineering to address this potential issue.
Standardizing using Sklearn
StandardScaler standardises features by removing the mean and scaling them to unit variance.
The standard score of a sample x is calculated as z = (x - u) / s, where u is the mean of the training samples (or zero if with_mean = False) and s is the standard deviation of the training samples (or one if with_std = False).
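As a quick check of the formula, the sketch below computes z by hand with NumPy and compares it with StandardScaler's result. The data is hypothetical, and the sketch assumes the population standard deviation (ddof=0), which is what scikit-learn uses for scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 4 samples, 2 features.
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

u = X_train.mean(axis=0)          # per-feature mean
s = X_train.std(axis=0, ddof=0)   # per-feature population standard deviation
z_manual = (X_train - u) / s      # z = (x - u) / s

z_sklearn = StandardScaler().fit_transform(X_train)

# Both approaches should agree up to floating-point error.
print(np.allclose(z_manual, z_sklearn))
```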
Centring and scaling are applied independently to each feature, using the relevant statistics computed on the samples in the training set. The fit() method stores the mean and standard deviation so that transform() can later apply them to new samples.
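The fit-then-transform pattern can be sketched as follows. The arrays X_train and X_new are hypothetical; mean_ and scale_ are the attributes in which a fitted StandardScaler keeps its per-feature statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: a training set and one sample that arrives later.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_new = np.array([[4.0, 400.0]])

scaler = StandardScaler()
scaler.fit(X_train)                       # learns statistics from X_train only

X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)    # reuses the stored statistics

print(scaler.mean_)    # per-feature mean of the training data
print(scaler.scale_)   # per-feature standard deviation of the training data
print(X_new_scaled)
```

Note that the later sample is scaled with the training statistics, not with statistics recomputed from the new data.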
Methods of the StandardScaler Class
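Among the commonly used methods are fit(), transform(), fit_transform(), and inverse_transform(). The sketch below, on made-up data, shows how they relate to one another; it is not an exhaustive tour of the class.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 3 samples, 2 features.
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])

scaler = StandardScaler()

# fit() learns the statistics; transform() applies them.
X_two_step = scaler.fit(X).transform(X)

# fit_transform() does both in a single call.
X_one_step = StandardScaler().fit_transform(X)

# inverse_transform() maps standardized values back to the original scale.
X_restored = scaler.inverse_transform(X_two_step)

print(np.allclose(X_two_step, X_one_step))
print(np.allclose(X_restored, X))
```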
Example of StandardScaler
First, we import the required libraries. StandardScaler is a class, and we import it from the sklearn.preprocessing module.
Then we load the iris dataset, which is available through the sklearn.datasets module.
We will create an object of the StandardScaler class.
Next, we separate the independent features from the target.
We will use the fit_transform() method to apply the transformation to the dataset.
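The steps above can be put together in one short script. This is a sketch of one way to write them; the variable names (std_scaler, X, y, X_scaled) are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the iris dataset.
iris = load_iris()

# Create the scaler object.
std_scaler = StandardScaler()

# Separate the independent features from the target.
X = iris.data
y = iris.target

# Fit the scaler to X and transform X in one step.
X_scaled = std_scaler.fit_transform(X)

print(X[:3, :])          # first three rows, original values
print(X_scaled[:3, :])   # first three rows, standardized values
print(std_scaler.mean_)  # per-feature means learned by the scaler
```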
We first created an instance of the StandardScaler class, following the syntax mentioned above. We then standardised the data by calling fit_transform() on that object.
Output:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]]
[[-0.90068117  1.01900435 -1.34022653 -1.3154443 ]
 [-1.14301691 -0.13197948 -1.34022653 -1.3154443 ]
 [-1.38535265  0.32841405 -1.39706395 -1.3154443 ]]
[5.84333333 3.05733333 3.758      1.19933333]