Hands-On Neural Networks

Feature scaling

A very important engineering technique that is still necessary with neural networks is feature scaling. The numerical inputs must be scaled so that all features are on the same scale; otherwise, the network will give more importance to features with larger numerical values.

A very simple transformation is rescaling the input between 0 and 1, also known as MinMax scaling. Other common operations are standardization and zero-mean translation, which make sure that the standard deviation of the input is 1 and the mean is 0; in the scikit-learn library, this is implemented in the scale function:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[-3., 1., 2.],
                    [ 2., 0., 0.],
                    [ 1., 2., 3.]])

# Standardize each feature (column) to zero mean and unit variance
X_scaled = preprocessing.scale(X_train)

The preceding command generates the following result:

Out[2]:
array([[-1.38873015,  0.        ,  0.26726124],
       [ 0.9258201 , -1.22474487, -1.33630621],
       [ 0.46291005,  1.22474487,  1.06904497]])
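
The MinMax rescaling mentioned previously is available in scikit-learn as the MinMaxScaler class. The following is a minimal sketch, reusing the same X_train array, that maps each feature to the [0, 1] range:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[-3., 1., 2.],
                    [ 2., 0., 0.],
                    [ 1., 2., 3.]])

# Learn each feature's minimum and maximum from the training data
# and rescale every column to the [0, 1] range
min_max_scaler = preprocessing.MinMaxScaler()
X_minmax = min_max_scaler.fit_transform(X_train)

Note that the scaler is fitted on the training data only; the same fitted scaler should then be applied to the validation and test sets with its transform method.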

Many other numerical transformations are already available in scikit-learn. Two other important transformations from its documentation are as follows:

  • PowerTransformer: This transformation applies a power transformation to each feature in order to make the data follow a Gaussian-like distribution. It finds the optimal scaling factor to stabilize the variance while minimizing skewness. By default, scikit-learn's PowerTransformer also standardizes the output, forcing the mean to 0 and the variance to 1 (see the sketch after this list).
  • QuantileTransformer: This transformation has an additional output_distribution parameter that allows us to force a Gaussian distribution on the features instead of a uniform one. It introduces saturation for extreme input values, as shown in the sketch after this list.
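
The following is a minimal sketch of how these two transformers can be applied to the same X_train array; the yeo-johnson method and the normal output distribution shown here are example settings, not requirements:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[-3., 1., 2.],
                    [ 2., 0., 0.],
                    [ 1., 2., 3.]])

# Power transformation towards a Gaussian-like distribution;
# the yeo-johnson method also accepts negative values, and the
# output is standardized to zero mean and unit variance by default
pt = preprocessing.PowerTransformer(method='yeo-johnson')
X_power = pt.fit_transform(X_train)

# Quantile transformation; output_distribution='normal' forces a
# Gaussian output instead of the default uniform distribution
# (n_quantiles is reduced here to match the tiny example dataset)
qt = preprocessing.QuantileTransformer(output_distribution='normal',
                                       n_quantiles=3)
X_quantile = qt.fit_transform(X_train)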