"Python Data Analysis" Study Notes (Part 3)

Memo's Blog

2018-06-13

"Python Data Analysis" Study Notes

Preface

Using Packages

Import the numpy library and call its sin() function.

>>> import numpy
>>> numpy.sin(3)
0.1411200080598672

Give the imported numpy library the alias np.

>>> import numpy as np
>>> np.pi
3.141592653589793
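
The import styles can be compared side by side. The following is a minimal sketch; the `from numpy import sin` form and the comparison with the standard library's `math.sin` are additions for illustration and do not appear in the original notes.

```python
import math

import numpy            # whole library, referenced by its full name
import numpy as np      # whole library, referenced by the alias np
from numpy import sin   # only the sin() function, referenced directly

# All three names refer to the same function object.
print(numpy.sin is np.sin is sin)

# For a single number the result agrees with the standard library.
print(math.isclose(sin(3), math.sin(3)))
```

The alias form `np` is the one used in the rest of these notes.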

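Basic Packages for Data Analysis

The standard setup before starting a data analysis.

In Jupyter/IPython, %matplotlib inline displays plots directly in the notebook, so there is no need to call plt.show() afterwards.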

>>> %matplotlib inline
>>> import numpy as np
>>> import matplotlib.pyplot as plt

Plot a line chart.

>>> plt.plot([1, -5, 6, 19, 33])

Draw 10 random numbers from a distribution with mean 0 and standard deviation 1 (the standard normal distribution).

>>> np.random.randn(10)
array([ 1.23484066,  1.20427008,  0.68396149,  0.67672962, -1.10757575,
       -1.23647484,  0.05671657,  0.45058016, -0.85757227,  0.77991074])
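
The numbers drawn this way differ on every run. Below is a minimal sketch of making them reproducible, assuming NumPy's newer Generator API (`np.random.default_rng`, available since NumPy 1.17), which the original notes do not use. The seed 42 and the sample size are arbitrary choices for illustration.

```python
import numpy as np

# Assumption: the Generator API rather than np.random.randn.
rng = np.random.default_rng(42)        # arbitrary seed, for illustration only
sample = rng.standard_normal(10_000)   # draws from the standard normal distribution

# With many draws, the sample statistics approach the theoretical values.
print(sample.mean())   # close to 0
print(sample.std())    # close to 1

# A generator created with the same seed reproduces the same numbers.
again = np.random.default_rng(42).standard_normal(10_000)
print(np.array_equal(sample, again))
```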

Processing List Elements

Use append() inside a loop to put each processed element into a new list.

>>> p = [456, 615, 849]
>>> c = 30.1
>>> result = []
>>> for k in p:
...     twd = k * c
...     result.append(twd)
...
>>> result
[13725.6, 18511.5, 25554.9]

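In data analysis, processing elements one at a time in a Python loop is generally not a good approach: it is verbose and tends to be slow on large data.

Arrays

Use np.array(), provided by numpy, to convert a list into an array.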

>>> p = [456, 615, 849]
>>> prices = np.array(p)
>>> prices
array([456, 615, 849])
>>> prices * c
array([13725.6, 18511.5, 25554.9])

In data analysis, using arrays is a good approach: a single expression operates on every element at once.
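
The loop and array approaches can be checked against each other on a larger input. This is a rough sketch: the 100,000-element price list is hypothetical, and the timings depend on the machine, so no figures are quoted.

```python
import timeit

import numpy as np

c = 30.1                    # exchange rate from the example above
p = list(range(100_000))    # hypothetical larger price list
prices = np.array(p)

def with_loop():
    """Multiply element by element in a Python loop."""
    result = []
    for k in p:
        result.append(k * c)
    return result

def with_array():
    """Multiply the whole array in one expression."""
    return prices * c

# Both approaches produce the same numbers.
print(np.allclose(with_loop(), with_array()))

# Measure each approach over 10 runs (seconds; machine dependent).
print("loop :", timeit.timeit(with_loop, number=10))
print("array:", timeit.timeit(with_array, number=10))
```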

Matrix Operations

Multiply each element of an array by its own weight, then add the results together.

>>> grades = np.array([85, 76, 82])
>>> weights = np.array([0.3, 0.4, 0.3])
>>> g = grades * weights
>>> g
array([25.5, 30.4, 24.6])
>>> sum(g)
80.5

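Alternatively, use np.dot(), provided by numpy, directly. For two one-dimensional arrays it computes the inner product, which is exactly this weighted sum.

>>> np.dot(grades, weights)
80.5

np.dot() can also be applied to a two-dimensional array, in which case each row is weighted and summed.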

>>> grades = [[78, 69, 80],
...           [58, 99, 57],
...           [90, 80, 58],
...           [91, 96, 86]]
>>> weights = np.array([0.3, 0.4, 0.3])
>>> np.dot(grades, weights)
array([75. , 74.1, 76.4, 91.5])
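
To see what np.dot() does with a two-dimensional array, the same result can be reproduced in two other ways. This is a sketch for illustration; the `@` operator and the broadcast-and-sum version are not in the original notes.

```python
import numpy as np

grades = np.array([[78, 69, 80],
                   [58, 99, 57],
                   [90, 80, 58],
                   [91, 96, 86]])
weights = np.array([0.3, 0.4, 0.3])

by_dot = np.dot(grades, weights)            # as in the example above
by_operator = grades @ weights              # matrix-multiplication operator
by_hand = (grades * weights).sum(axis=1)    # weight every row, then sum each row

print(by_dot)
print(np.allclose(by_dot, by_operator))
print(np.allclose(by_dot, by_hand))
```

Each output element is the weighted total for one row of grades.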

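Defining Multidimensional Arrays

Use np.shape(), provided by numpy, to inspect an array's dimensions and size. Assigning to the array's shape attribute redefines them.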

>>> A = np.random.randn(12)
>>> A
array([-1.31021773,  0.90918643,  1.48955726, -0.14605302, -1.76752982,
        0.07526171,  0.92027213,  1.08025008,  0.73763194, -1.15403166,
       -0.53647627, -0.25006991])
>>> A = A * 10 + 50  # scale to standard deviation 10 and shift to mean 50
>>> A
array([36.89782265, 59.09186435, 64.89557259, 48.53946981, 32.32470176,
       50.75261707, 59.20272128, 60.80250077, 57.37631941, 38.45968336,
       44.63523726, 47.4993009 ])
>>> np.shape(A)
(12,)
>>> A.shape = (3, 2, 2)
>>> A
array([[[36.89782265, 59.09186435],
        [64.89557259, 48.53946981]],

       [[32.32470176, 50.75261707],
        [59.20272128, 60.80250077]],

       [[57.37631941, 38.45968336],
        [44.63523726, 47.4993009 ]]])
>>> np.shape(A)
(3, 2, 2)

Alternatively, use the reshape() method to achieve the same effect.

>>> A = A.reshape(6, 2)
>>> A
array([[36.89782265, 59.09186435],
       [64.89557259, 48.53946981],
       [32.32470176, 50.75261707],
       [59.20272128, 60.80250077],
       [57.37631941, 38.45968336],
       [44.63523726, 47.4993009 ]])

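reshape() does not change the shape of the original array. It returns a new array object, which is why the result is assigned back to A above.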
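
A small sketch of this behaviour, using `np.arange(12)` as stand-in data instead of the random values in the notes. The names `B` and `C` are introduced here for illustration only.

```python
import numpy as np

A = np.arange(12)        # stand-in data: 0, 1, ..., 11
B = A.reshape(6, 2)      # new array object with a different shape

print(A.shape)           # (12,)  the original keeps its shape
print(B.shape)           # (6, 2)

# One dimension may be given as -1 and is then inferred from the size.
C = A.reshape(3, -1)
print(C.shape)           # (3, 4)

# For a contiguous array like this one, reshape returns a view:
# B shares A's memory, so a write through B is visible in A.
print(np.shares_memory(A, B))
B[0, 0] = 99
print(A[0])              # 99
```

So the shape of the original is untouched, but the data is shared. Call `.copy()` on the result if an independent array is needed.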

Generating Arrays

Using for loops (here, inside a list comprehension).

>>> xy = [[x, y] for x in range(3) for y in range(3)]
>>> np.array(xy)
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [2, 0],
       [2, 1],
       [2, 2]])
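
The same grid of coordinate pairs can be built without writing the loops by hand. The two alternatives below, `itertools.product` and `np.indices`, are not from the original notes. They are shown only as a sketch of equivalent approaches.

```python
import itertools

import numpy as np

# List comprehension from the example above.
xy = np.array([[x, y] for x in range(3) for y in range(3)])

# Alternative 1: itertools.product yields the same pairs in the same order.
via_product = np.array(list(itertools.product(range(3), range(3))))

# Alternative 2: np.indices builds the row and column index grids,
# which are then flattened and paired up.
via_indices = np.indices((3, 3)).reshape(2, -1).T

print(np.array_equal(xy, via_product))
print(np.array_equal(xy, via_indices))
```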

Generate an array whose values are all 0.

>>> np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Generate an array whose values are all 1.

>>> np.ones(10)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Generate an identity matrix of order n.

>>> np.eye(5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
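
As a quick check of what these constructors produce, the sketch below multiplies an arbitrary example matrix `M` (made up for illustration) by the identity matrix, and passes a shape tuple to np.zeros() and np.ones() to get two-dimensional arrays.

```python
import numpy as np

I = np.eye(3)
M = np.array([[2., 1., 0.],
              [0., 3., 4.],
              [5., 0., 6.]])   # arbitrary example matrix

# Multiplying by the identity matrix leaves M unchanged, on either side.
print(np.array_equal(M @ I, M))
print(np.array_equal(I @ M, M))

# np.zeros and np.ones also accept a shape tuple.
print(np.zeros((2, 3)).shape)   # (2, 3)
print(np.ones((2, 3)).sum())    # 6.0
```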

Plotting with Arrays

Use np.linspace(), provided by numpy, to build the range of x values for a plot.

>>> x = np.linspace(0, 10, 10)
>>> plt.plot(x, np.sin(x))  # plot the sine function

The three arguments to np.linspace() are the start value, the end value, and the number of points. The end value is included by default.
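
A short sketch of how the three arguments determine the points. The values 5 and 200 are chosen here only for illustration.

```python
import numpy as np

x = np.linspace(0, 10, 5)
print(x)            # five values: 0, 2.5, 5, 7.5, 10

# Neighbouring points are (end - start) / (count - 1) apart.
print(np.diff(x))   # every gap is 2.5

# With few points a curve is drawn as a coarse polyline.
# More points give a smoother plot.
coarse = np.linspace(0, 10, 10)
fine = np.linspace(0, 10, 200)
print(len(coarse), len(fine))
```

Passing `fine` instead of `coarse` to plt.plot() draws the same curve with many more line segments.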

Filtering Arrays

Test which elements of an array satisfy a condition, then use the result to select them.

>>> L = np.array([3, -2, 57, 64, -33, 17, 96, -74])
>>> L < 0
array([False,  True, False, False,  True, False, False,  True])
>>> L[L < 0]
array([ -2, -33, -74])
>>> x = np.linspace(-5, 5, 500)
>>> y = np.sinc(x)
>>> plt.plot(x, y)
>>> plt.plot(x[y < 0], y[y < 0], 'r')  # draw only the part where y < 0
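
Boolean masks can also be combined, negated, counted, or used to replace values. The following sketch goes slightly beyond the original notes. The names `between`, `non_negative`, `clipped` and `y_negative_only` are introduced here for illustration.

```python
import numpy as np

L = np.array([3, -2, 57, 64, -33, 17, 96, -74])

# Combine conditions with & (and) or | (or); the parentheses are required.
between = L[(L > 0) & (L < 60)]
print(between)                  # elements 3, 57 and 17

# ~ negates a mask.
non_negative = L[~(L < 0)]

# True counts as 1, so summing a mask counts the matches.
print((L < 0).sum())            # 3

# np.where replaces values instead of dropping them.
clipped = np.where(L < 0, 0, L)

# For plotting, replacing the unwanted points with NaN keeps x and y
# the same length, and matplotlib leaves a gap at NaN values instead
# of joining separate negative stretches with a straight line.
x = np.linspace(-5, 5, 500)
y = np.sinc(x)
y_negative_only = np.where(y < 0, y, np.nan)
```

Passing `x` and `y_negative_only` to plt.plot() would then draw only the negative parts of the curve.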