avatar

机器学习-特征预处理:归一化

一、定义

通过对原始数据进行变换把数据映射到(默认为[0,1])之间

二、公式

image.png

三、API

sklearn. preprocessing .MinMaxScaler (feature_range=(0, 1)…)

​ o MinMaxScalar .fit_ transform(X)

X: numpy array格式的数据[n_ samples, n_ features]

​ 返回值:转换后的形状相同的array

四、代码实例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from sklearn.preprocessing import MinMaxScaler
import jieba
import pandas as pd 

def minmax_demo():
    #归一化
    #1.获取数据
    data = pd.read_csv("dating.txt")
    data = data.iloc[:,:3]
    print("data:\n",data)
    #2.实例化一个转换器类
    transfer = MinMaxScaler(feature_range=[0,1])
    #3.调用fit_transform
    data_new = transfer.fit_transform(data)
    print("data_new:\n",data_new)
    return None

五、运行结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
data:
milage Liters Consumtime
0 40920 8.326976 0.953952
1 14488 7.153469 1.673904
2 26052 1.441871 0.895124
3 75136 13.147394 0.428964
4 38344 1.669788 0.134296
5 72993 18.141748 1.932955
6 35948 6.838792 1.213192
7 42666 13.276369 0.543888
8 67497 8.631577 0.749278
9 35483 12.273169 1.508953
10 50242 3.723498 0.831917
11 63275 8.385879 1.669485
12 5569 4.875435 0.728658
13 51052 4.688098 0.625224
14 77372 15.299570 0.331351
15 43673 1.889461 0.191283
16 61364 7.516754 1.269164
17 69673 14.239195 0.261333
18 15669 0.000000 1.259185
19 28488 10.528555 1.304844
20 6487 3.549265 0.822483
21 37708 2.991551 0.833920
22 22620 5.297865 0.638306
23 28782 6.593803 0.187108
24 19739 2.816760 1.686209
25 36788 12.458258 0.649617
26 5741 0.000000 1.656418
27 28567 9.968648 0.731232
28 6808 1.364838 0.640103
data_new:
[[0.49233319 0.45899524 0.45570394]
[0.12421487 0.3943098 0.85597548]
[0.28526663 0.07947806 0.42299736]
[0.96885924 0.72470382 0.1638265 ]
[0.45645725 0.09204119 0. ]
[0.93901369 1. 1. ]
[0.42308817 0.37696434 0.59983354]
[0.51664972 0.73181311 0.22772076]
[0.86247093 0.4757853 0.34191139]
[0.41661212 0.67651524 0.76426771]
[0.62216063 0.20524472 0.38785618]
[0.80367116 0.46224206 0.85351865]
[0. 0.26874119 0.33044729]
[0.6334415 0.2584149 0.27294112]
[1. 0.84333494 0.10955662]
[0.53067421 0.10414989 0.03168305]
[0.77705667 0.41433461 0.63095228]
[0.89277607 0.7848855 0.07062873]
[0.14066265 0. 0.62540426]
[0.31919279 0.58034953 0.65078928]
[0.01278498 0.19564074 0.38261116]
[0.44759968 0.16489872 0.38896978]
[0.23746919 0.29202616 0.28021432]
[0.32328733 0.36346018 0.02936187]
[0.19734551 0.15526398 0.86281669]
[0.43478685 0.68671762 0.28650289]
[0.00239544 0. 0.84625379]
[0.32029302 0.54948663 0.33187836]
[0.01725555 0.07523189 0.28121339]]

六、总结

最大值最小值是变化的,最大值与最小值非常容易受异常点影响

所以这种方法鲁棒性较差,只适合传统精确小数据场景


评论