Saturday, December 30, 2017

Gnuplot-3D


splot "data.txt" using 1:2:3 with lines






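#data.txt: separate each line with 2 blank lines so that the lines are drawn separately. If they are separated by only 1 blank line, the lines get connected to each other (gnuplot treats the rows as one grid and draws a mesh across them).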




#data.txt
# X   Y    Z
16    16   0
16   32   1
16   64   15
16    128  4
16    256  9


32    16   10
32   32   21
32    64   31
32    128  34
32    256  49


64    16   0
64   32   12
64   64   19
64    128  41
64    256  93


128   16   30
128   32   31
128   64   15
128   128  14
128   256  93


256   16   30
256   32   31
256   64   80
256   128  14
256   256  5
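If the data come from a script, the blank-line layout can be generated instead of typed by hand. Below is a minimal Python sketch. It assumes the values are already grouped per line, with one list of (x, y, z) tuples per X value. `format_splot_blocks` is a hypothetical helper name.

```python
def format_splot_blocks(blocks):
    """Format (x, y, z) rows for gnuplot's splot.

    blocks: a list of blocks, one per line to draw; each block is a list of
    (x, y, z) tuples. Blocks are separated by two blank lines so that gnuplot
    draws them as separate, unconnected lines.
    """
    parts = ["\n".join(f"{x} {y} {z}" for x, y, z in block) for block in blocks]
    # "\n\n\n" = end of the last row of a block + two blank lines
    return "# X Y Z\n" + "\n\n\n".join(parts) + "\n"
```

The result can be written to data.txt with `open("data.txt", "w").write(format_splot_blocks(blocks))`. Each double-blank-line block is also a separate data set for gnuplot's `index` keyword. For example, `splot "data.txt" index 0 using 1:2:3 with lines` draws only the first line.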

https://goo.gl/EcCcj7

https://goo.gl/uMDpM1

Thursday, December 7, 2017

GPU vs CPU




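When you train a CNN or RNN, getting the training accuracy up (that is, driving the training loss down) takes a certain number of parameter updates. With a large dataset and nonstop computation, running on a GPU is much, much faster!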

In actual tests on the two platforms below, the difference in speed was really large...



Platform 1: no GPU
CPU: model name   : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
# of Thread    : 12
RAM: 10G
OS: Ubuntu 16.04



Platform 2: with GPU
CPU model name    : Intel Core i7-7700K CPU @ 4.5GHz
# of Thread     : 6
GPU: GeForce GTX 1080
RAM:  32G
OS: Ubuntu 16.04


DeviceQuery Example



https://goo.gl/uMDpM1


Wednesday, November 22, 2017

Converting between .py and IPython notebook (.ipynb)?


1) How do you convert xxx.py to xxx.ipynb?

In a Jupyter notebook cell, run

%load relu.py

This loads the code of relu.py into the cell. Then save the notebook under the name relu.ipynb, and you have relu.ipynb!
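Outside Jupyter, you can get the same result (one cell holding the whole file) by writing the notebook JSON directly. Below is a minimal sketch that assumes the nbformat 4 layout. `py_to_ipynb` is a hypothetical helper and is not part of Jupyter. The `nbformat` package (`nbformat.v4.new_notebook`, `nbformat.v4.new_code_cell`) is the more robust route.

```python
import json


def py_to_ipynb(py_source: str) -> str:
    """Wrap the whole text of a .py file in a single code cell (nbformat 4 JSON)."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                # nbformat stores the source as a list of lines, newlines kept
                "source": py_source.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }
    return json.dumps(notebook, indent=1)
```

Usage: `open("relu.ipynb", "w").write(py_to_ipynb(open("relu.py").read()))`.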









2) How do you convert xxx.ipynb to xxx.py?

In a Jupyter cell, run the following command!

!jupyter nbconvert --to script hw3.ipynb
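Below is a rough sketch of what `--to script` does for a Python notebook, using only the standard library. It assumes nbformat 4 JSON, and `ipynb_to_py` is a hypothetical helper. It copies IPython magics (`%load`, `!jupyter ...`) unchanged, whereas nbconvert rewrites them into `get_ipython()` calls. For real conversions, use nbconvert.

```python
import json


def ipynb_to_py(ipynb_text: str) -> str:
    """Turn an nbformat 4 notebook into a plain script.

    Code cells are copied as-is; markdown cells become '#' comment lines.
    """
    notebook = json.loads(ipynb_text)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"
```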







https://goo.gl/EcCcj7


Friday, November 17, 2017

CNN using keras



In a traditional DNN, each layer is flat (a single vector of neurons).


In a CNN, each layer is a cube (a 3-D volume: width, height, depth).



A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).



An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in text below


The neurons from the Neural Network chapter remain unchanged: they still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.


 

Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).
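The spatial sizes in these examples follow the usual output-size formula floor((W − F + 2P) / S) + 1, for input width W, filter size F, padding P and stride S. Below is a minimal pure-Python sketch of that formula and of 2×2 max pooling on one depth slice. The function names are illustrative. Real pooling layers apply this to every depth slice independently.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


def max_pool_2d(x, size=2, stride=2):
    """Max pooling over one 2-D slice x (a list of equal-length rows)."""
    out_h = conv_output_size(len(x), size, stride)
    out_w = conv_output_size(len(x[0]), size, stride)
    return [
        [
            # maximum over the size x size window whose top-left corner is (i*stride, j*stride)
            max(x[i * stride + di][j * stride + dj] for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

For instance, `conv_output_size(224, 2, s=2)` gives the 112 in the caption above. For the Keras model below, a 5×5 'same' convolution (P = 2) keeps 32 → 32, a 3×3 valid convolution gives 32 → 30, and pooling 13 → 6 rounds down.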


How to count the parameters

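import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation

num_classes = 10  # CIFAR-10 has 10 classes

# The data, shuffled and split between train and test sets: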
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print('y_train shape:', y_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(2, (5, 5), padding='same', input_shape=x_train.shape[1:]))
print('input_shape:',x_train.shape[1:])
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# initiate RMSprop optimizer
# (Keras 2.x API of the time; newer Keras versions use learning_rate= and drop decay)
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
model.summary()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_46 (Conv2D)           (None, 32, 32, 2)         152  ((5*5*3)+1)*2=152
                                  Every other position shares these same 152 parameters, so a bigger image does not mean more parameters
                                  +1: the bias term in WX+b
______________________________________________________________
activation_67 (Activation)   (None, 32, 32, 2)         0         
_________________________________________________________________
conv2d_47 (Conv2D)           (None, 30, 30, 32)        608  ((3*3*2)+1)*32=608      
_________________________________________________________________
activation_68 (Activation)   (None, 30, 30, 32)        0         
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 15, 15, 32)        0         
_________________________________________________________________
dropout_34 (Dropout)         (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_48 (Conv2D)           (None, 15, 15, 64)        18496  ((3*3*32)+1)*64=18496
_________________________________________________________________
activation_69 (Activation)   (None, 15, 15, 64)        0         
_________________________________________________________________
conv2d_49 (Conv2D)           (None, 13, 13, 64)        36928  ((3*3*64)+1)*64=36928
_________________________________________________________________
activation_70 (Activation)   (None, 13, 13, 64)        0         
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 6, 6, 64)          0         
_________________________________________________________________
dropout_35 (Dropout)         (None, 6, 6, 64)          0         
_________________________________________________________________
flatten_12 (Flatten)         (None, 2304)              0         
_________________________________________________________________
dense_23 (Dense)             (None, 512)               1180160  (2304+1)*512=1180160
_________________________________________________________________
activation_71 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_36 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_24 (Dense)             (None, 10)                5130  (512+1)*10=5130
_________________________________________________________________
activation_72 (Activation)   (None, 10)                0         
=================================================================
Total params: 1,241,474
Trainable params: 1,241,474
Non-trainable params: 0
_________________________________________________________________
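The Param # column can be reproduced with two small formulas. Below is a minimal sketch with illustrative helper names. The layer sizes are taken from the model above. Activation, pooling, dropout and flatten layers have no parameters.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter: kernel_h * kernel_w * in_channels weights + 1 bias,
    # shared across every spatial position of the input
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, units):
    # one weight per input per unit, plus one bias per unit
    return (in_units + 1) * units


layer_params = {
    "conv2d_46": conv2d_params(5, 5, 3, 2),
    "conv2d_47": conv2d_params(3, 3, 2, 32),
    "conv2d_48": conv2d_params(3, 3, 32, 64),
    "conv2d_49": conv2d_params(3, 3, 64, 64),
    "dense_23": dense_params(6 * 6 * 64, 512),  # Flatten output: 6*6*64 = 2304
    "dense_24": dense_params(512, 10),
}
total_params = sum(layer_params.values())
```

The sum matches "Total params: 1,241,474". The fully connected dense_23 holds about 95% of them, while the convolutions stay small because of weight sharing.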

https://goo.gl/EcCcj7




Saturday, November 11, 2017

PCA



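In PCA, we are interested in finding the directions (components) that maximize the variance in our dataset. In a dataset, some features vary a lot from sample to sample (X), and these have a more noticeable effect on classifying Y. If a feature is about the same for all X, it is not a good choice for partitioning the data.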

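PCA(n_components=2) automatically finds the top 2 principal components. These are the directions of largest variance, and each one is a linear combination of the original features rather than one of the original features. The figure below is in 2-D and plots the classes (0, 1, 2) in that space.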
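Below is a minimal sketch of what PCA computes, assuming NumPy is available. It centers the data, takes the SVD, and keeps the top directions. scikit-learn's PCA also works on the centered data through an SVD and reports the same ratio as `explained_variance_ratio_`. The function `pca` and its return layout are illustrative and are not sklearn's API.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns (projected data, component directions, explained variance ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # rows: directions of max variance
    variance = S ** 2 / (X.shape[0] - 1)         # variance along each direction
    ratio = variance / variance.sum()
    return Xc @ components.T, components, ratio[:n_components]
```

Suppose one feature varies widely and another is almost constant. Then the first component lies almost along the first feature's axis and explains nearly all the variance, which matches the intuition above.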



==============


Residual variance?

First, compute the variance of the actual (outcome) scores: this is the total variance. Then fit a regression model, and use the regression equation to calculate a predicted score for each person.

Next, find the difference between each predicted score and the actual score, and calculate the variance of these differences (the residuals). That is the residual variance. The residual variance will be less than the total variance (or, if your predictors are completely useless, the two will be equal).

How much variance did you explain?
Explained variance = (total variance - residual variance)

The proportion of variance explained is therefore:

explained variance / total variance

If your predicted scores exactly match the outcome scores, you've perfectly predicted the scores, and you've explained all of the variance. The residuals are all zero.  

(Note: The calculations are done with sums of squares, variances will give a very slightly different answer as they are usually calculated as SS/(N-1). But if the sample is large, this difference is trivial).
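Below is a minimal pure-Python sketch of the recipe. It uses sums of squares, as the note suggests. `fit_line` and `proportion_explained` are illustrative names. For a least-squares line with an intercept, this proportion is the familiar R².

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def proportion_explained(actual, predicted):
    """(total SS - residual SS) / total SS."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return (ss_total - ss_resid) / ss_total
```

There are two edge cases. Predictions that exactly match give 1.0, because every residual is zero. Predicting the mean for everyone gives 0.0, because the residual SS then equals the total SS.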


Friday, November 10, 2017

k-fold cross-validation


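# Another way to do validation: if the validation data are taken from the training data, fewer samples are left for training.
# k-fold cross-validation solves this problem.
from sklearn.model_selection import cross_val_score
# dct: a classifier defined earlier (presumably a DecisionTreeClassifier); Xtrain, ytrain: the training data
scores = cross_val_score(dct, Xtrain, ytrain, cv=5)
print(scores.mean())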




However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.
A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k “folds”:
  • A model is trained using k-1 of the folds as training data;
  • the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary validation set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
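Below is a minimal pure-Python sketch of the procedure just described. It trains on k−1 folds, scores on the remaining fold, and averages the scores. The fold layout here is plain and consecutive with no shuffling. For classifiers, `cross_val_score(..., cv=5)` uses stratified folds instead. `fit_and_score` is a hypothetical callback that stands in for "train a model, return its accuracy".

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k consecutive folds (no shuffling).

    Returns a list of (train_indices, validation_indices) pairs; the first
    n_samples % k folds get one extra sample.
    """
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


def cross_val_mean(fit_and_score, X, y, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        scores.append(fit_and_score([X[i] for i in train], [y[i] for i in train],
                                    [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / len(scores)
```

Every sample is used for validation exactly once and for training k−1 times, which is why no data is "wasted".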



Thursday, November 9, 2017

Multi-layer Perceptron classifier



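sklearn.neural_network.MLPClassifier
This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

solver : which algorithm is used to solve the optimization problem, i.e. the weight optimization.

solver : {‘lbfgs’, ‘sgd’, ‘adam’}, default ‘adam’
   (For comparison, sklearn.linear_model.LogisticRegression takes solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default ‘liblinear’ in scikit-learn 0.19; ‘lbfgs’ since 0.22.)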

batch_size : int, optional, default ‘auto’
Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the classifier will not use minibatch. When set to “auto”, batch_size=min(200, n_samples)

max_iter : int, optional, default 200
Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
That is, total number of parameter updates = (number of mini-batches per epoch) × (number of epochs), where mini-batches per epoch = ceil(n_samples / batch_size).

tol : float, optional, default 1e-4
Tolerance for the optimization. When the loss or score is not improving by at least tol for two consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.
momentum : float, default 0.9
Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.
early_stopping : bool, default False
Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for two consecutive epochs. Only effective when solver=’sgd’ or ‘adam’
validation_fraction : float, optional, default 0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True
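The update count for the stochastic solvers can be checked with a little arithmetic. Below is a sketch that assumes the `'auto'` batch-size rule quoted above and assumes training runs all `max_iter` epochs. With `tol` or early stopping it may stop sooner. With `early_stopping=True`, only the remaining (1 − validation_fraction) of the samples are used for training. `total_updates` is an illustrative name.

```python
import math


def total_updates(n_samples, epochs, batch_size="auto"):
    """Upper bound on weight updates for sgd/adam: batches per epoch * epochs."""
    if batch_size == "auto":
        batch_size = min(200, n_samples)                   # the 'auto' rule from the docs
    batches_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be smaller
    return batches_per_epoch * epochs
```

For example, 1000 samples with the default batch size (200) and max_iter=200 give 5 × 200 = 1000 updates, not 200 × 200.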


fit(X, y): Fit the model to data matrix X and target(s) y.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict using the multi-layer perceptron classifier.
predict_log_proba(X): Return the log of probability estimates.
predict_proba(X): Probability estimates.
score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.

predict_proba(X) Probability estimates. The returned estimates for all classes are ordered by the label of classes.

Y=predict_proba(X) is a method of a (soft) classifier outputting the probability of the instance being in each of the classes.
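Y is an array of shape (n_samples, n_classes): for each sample in X, it lists the probability that the sample belongs to each class.
For binary classification, each row gives the probability of class 0 and of class 1.
For multi-class classification, each row gives the probability of every class (the probabilities sum to 1).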
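For a multi-class problem, MLPClassifier's output layer applies softmax. For a binary problem it uses a single logistic unit, reported as [1 − p, p]. The columns follow `classes_`, the sorted labels. Below is a minimal pure-Python sketch of how such probabilities arise and how `predict` relates to them. The helper names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_proba(proba, classes):
    """Pick the class with the largest probability; proba is ordered like classes."""
    best = max(range(len(proba)), key=lambda i: proba[i])
    return classes[best]
```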



Wednesday, November 1, 2017

The 4 types of data (variables) and measurement scales





Nominal

A name (label) is used to represent each category.

Let’s start with the easiest one to understand.  Nominal scales are used for labeling variables, without any quantitative value.  “Nominal” scales could simply be called “labels.”



Ordinal
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
Has an order: very satisfied, satisfied, neutral, dissatisfied, very dissatisfied
“Ordinal” is easy to remember because it sounds like “order,” and that’s the key to remember with “ordinal scales”: it is the order that matters, but that’s all you really get from these.
Interval ==> +, -
Like the others, you can remember the key points of an “interval scale” pretty easily.  “Interval” itself means “space in between,” which is the important thing to remember–interval scales not only tell us about order, but also about the value between each item.

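For example, temperature in °C: the difference between 10 °C and 20 °C is meaningful, but 20 °C is not "twice as hot" as 10 °C, because 0 °C is not a true zero.
Ratio   ==> +, -, ×, /
Like ordinary rational numbers (integers and anything that can be written as a fraction), ratio-scale values can be added, subtracted, multiplied and divided, because the scale has a true zero (e.g. height, weight).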
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. 

Summary
In summary, nominal variables are used to “name,” or label a series of values.  Ordinal scales provide good information about the order of choices, such as in a customer satisfaction survey.  Interval scales give us the order of values + the ability to quantify the difference between each one.  Finally, Ratio scales give us the ultimate–order, interval values, plus the ability to calculate ratios since a “true zero” can be defined.
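Temperature is a handy check on the interval/ratio distinction. Differences in °C are meaningful, but ratios are not, because 0 °C is an arbitrary zero. Kelvin has a true zero. Below is a small sketch whose values are chosen only for illustration.

```python
def c_to_k(celsius):
    """Celsius (interval scale, arbitrary zero) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


a_c, b_c = 20.0, 10.0
diff_c, diff_k = a_c - b_c, c_to_k(a_c) - c_to_k(b_c)    # differences agree: 10 degrees
ratio_c, ratio_k = a_c / b_c, c_to_k(a_c) / c_to_k(b_c)  # ratios disagree: 2.0 vs about 1.035
```

Because the difference survives the change of units but the ratio does not, °C supports only +/−, while Kelvin also supports ×/÷.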


Converting multi-class classification into binary classification problems





Multiclass option can be either ‘ovr’ or ‘multinomial’. If the option chosen is ‘ovr’, then a binary problem is fit for each label. Else the loss minimised is the multinomial loss fit across the entire probability distribution. Does not work for liblinear solver.
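The 'ovr' idea fits in a few lines. Build one 0/1 target vector per label, train one binary classifier on each, and predict the label whose classifier gives the highest score. Below is a pure-Python sketch of the bookkeeping only. The binary classifiers themselves (for example one LogisticRegression per label) are assumed and not shown. The helper names are illustrative.

```python
def ovr_targets(y):
    """One binary target vector per label: 1 where the sample has that label, else 0."""
    return {label: [1 if v == label else 0 for v in y] for label in sorted(set(y))}


def ovr_predict(scores_by_label):
    """Each label's binary classifier gives a score; predict the label with the highest one."""
    return max(scores_by_label, key=scores_by_label.get)
```

With 3 labels this trains 3 binary problems. 'multinomial' instead fits a single model over all classes at once.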




https://goo.gl/9gnVYe



Control Your Raspberry Pi using Line Bot





#lineBot #raspberry Pi #IoT

https://goo.gl/VMiYmq


Friday, October 20, 2017

Why normalize features (feature normalization)?



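Features are often normalized before clustering or classification, but why?
Searching Google for material on normalization mostly turns up formulas and pictures of the resulting distributions. What I still didn't understand is why we should normalize when comparing data.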

z-score:
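Standardization (normalization) removes the effect of the features' different units. It only changes the scale and the center (a shift); the shape of the distribution stays the same.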

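So after normalizing every feature, each feature has zero mean and unit variance, i.e. all features share the same scale. For example, height values are much larger than eyesight (visual acuity) values. After normalization, both are on the same scale.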

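When the features share the same scale, the gradients ∂Loss/∂w of their weights are comparable. Otherwise a feature with a larger scale has a much larger effect on the loss through its weight, because its gradient scales with the feature's values. This makes gradient descent converge more slowly.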


https://en.wikipedia.org/wiki/Feature_scaling

===================================
Normalization procedure:
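Below is the z-score procedure described above as a minimal pure-Python sketch. It uses the population standard deviation (dividing by N), as scikit-learn's StandardScaler does. The two sample features are made-up values for illustration.

```python
import math


def z_score(values):
    """Standardize one feature: subtract the mean, divide by the (population) std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:                         # constant feature: nothing to scale
        return [0.0] * n
    return [(v - mean) / std for v in values]


heights_cm = [150, 160, 170, 180, 190]   # illustrative values
eyesight = [0.5, 0.8, 1.0, 1.2, 1.5]     # illustrative values
```

After `z_score`, both lists have mean 0 and variance 1. Height no longer dominates just because its numbers are bigger, and the ordering within each feature is unchanged.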





Tuesday, October 17, 2017

Controlling S4A from Python through its communication protocol


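The S4A project is an interesting project that lets Scratch control an Arduino. For this to work, the Arduino must be flashed with the official S4A firmware.
The point here is that you don't actually need Scratch to drive a board running the S4A firmware. You only need to understand how the connection works. Here we control the S4A board with Python code.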

Communication protocol packet format

Once connected, the S4A board exchanges packets over its UART interface to communicate and be controlled. So we can analyze the UART packets with a logic analyzer:

The captures show that the S4A board sends a packet to Scratch every 16 ms, and Scratch sends a packet back to the S4A board every 85 ms.

First, let's zoom in on a packet the S4A board sends to Scratch to see how the values are transmitted:

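Each packet carries 16 bytes, and the S4A packet design uses two bytes, a high byte and a low byte, for each channel. So each packet the S4A board sends every 16 ms carries 8 channels. This matches the live channel values Scratch shows while connected (6 analog and 2 digital). In other words, the S4A board updates all of its input pins (8 in total) every 16 ms.
Now let's look at the packets Scratch sends to the S4A board.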

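Each packet carries 20 bytes, so each packet Scratch sends every 85 ms carries 10 channels. In other words, Scratch updates all of the output pins (10 in total) every 85 ms. The output each of these 10 channels represents is listed in the table below, which gives the definitions and packet format of all input and output channels. The outputs the motoduino uses to drive its wheels are highlighted.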

However, the table alone makes it hard to see how each 2-byte channel record is formatted, so we summarize it below to make it easier to follow.

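As shown in the figure above, each channel record consists of two values: the Channel ID and the Value.
The Channel ID identifies the input/output pin. For example, when the S4A board sends channel ID 2 to Scratch, its binary form "0010" goes into the 4 red cells in the figure above.
The Value is the pin's value: 1 or 0 for digital input/output, 0~255 for PWM output, and 0~1023 for analog input. In binary the value needs at most 10 bits. Split these 10 bits into the top 3 bits and the bottom 7 bits. The top 3 go into the 3 green cells on the right of the high byte, and the bottom 7 go into the 7 cells on the right of the low byte.
The leftmost cell of the high byte is always "1", and the leftmost cell of the low byte is always "0". This way, we can't mix up the high byte and the low byte when parsing packets.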


Encoding and decoding with Python

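Once we know these rules, we can write the encoding and decoding functions in a few lines of code:
def pack_to_data(data):
    # decode a 2-byte packet: high byte 1IIIIVVV, low byte 0VVVVVVV
    dev_id = (data[0] & 0b01111000) >> 3                              # 4-bit channel ID
    dev_val = ((data[0] & 0b00000111) << 7) | (data[1] & 0b01111111)  # 10-bit value
    return (dev_id, dev_val)

def data_to_pack(dev_id,dev_val):
    # encode (channel ID, value) as a 2-byte packet
    data = [0,0]
    data[0] = 0b10000000 | ((dev_id & 0b00001111) << 3) | ((dev_val >> 7) & 0b00000111)
    data[1] = dev_val & 0b01111111
    return bytes([data[0],data[1]])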
The first function decodes. Its argument data is a 2-byte S4A packet, and it returns dev_id (the channel ID) and dev_val (the value).
The second function encodes. Given dev_id (channel ID) and dev_val (value), it returns the packet in S4A format.
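The two functions can be tested without a board. Below is a minimal sketch that repeats them with the same bit layout. It adds a hypothetical `parse_stream` helper, which is an assumption and not part of the original. `parse_stream` decodes a byte stream and skips any byte that cannot start a (high, low) pair, in the same spirit as the resynchronization in `main_loop` below.

```python
def pack_to_data(data):
    """Decode a 2-byte S4A packet into (channel ID, value)."""
    dev_id = (data[0] & 0b01111000) >> 3
    dev_val = ((data[0] & 0b00000111) << 7) | (data[1] & 0b01111111)
    return (dev_id, dev_val)


def data_to_pack(dev_id, dev_val):
    """Encode (channel ID, value) as a 2-byte S4A packet."""
    high = 0b10000000 | ((dev_id & 0b00001111) << 3) | ((dev_val >> 7) & 0b00000111)
    low = dev_val & 0b01111111          # MSB of the low byte stays 0
    return bytes([high, low])


def parse_stream(buf):
    """Decode every (high, low) pair in buf; a byte that cannot start a pair is skipped."""
    pairs, i = [], 0
    while i + 1 < len(buf):
        if buf[i] & 0b10000000 and not buf[i + 1] & 0b10000000:
            pairs.append(pack_to_data(buf[i:i + 2]))
            i += 2
        else:
            i += 1                      # out of sync: drop one byte and retry
    return pairs


packet = data_to_pack(5, 200)           # channel 5 (PWM) = 200 -> bytes 0xA9 0x48
```

An aligned 16-byte frame from the board decodes this way into 8 (channel ID, value) pairs.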

Talking to the S4A board from Python

On Linux, after the Arduino is connected over USB, the device file "/dev/ttyUSB0" appears. This means the Arduino connects as a standard serial port, so we can set up the connection with the following Python code:
#!/usr/bin/env python3

import serial
from time import sleep
import sys
import threading


#ser = serial.Serial('/dev/ttyUSB0',9600,8,serial.PARITY_NONE,serial.STOPBITS_ONE)




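class s4a_slave(object):
    """Talks to a board running the S4A firmware over a serial port."""

    def __init__(self,port):
        # S4A firmware: 38400 baud, 8 data bits, no parity, 1 stop bit
        self.ser = serial.Serial(port,38400,8,'N',1)
        self.pin_outputs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] # output values by channel ID (4 ~ 13 used)
        self.pin_inputs = [0] * 16  # input values by channel ID (analog 0 ~ 5); 16 slots since the ID field is 4 bits
        self.count = 0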

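    def main_loop(self):
        while(True):
            sleep(0.001)
            data = self.ser.read(2)
            if data[0] & 0b10000000:
                # first byte is a high byte: decode the (channel ID, value) pair
                dev_id, dev_val = self.pack_to_data(data)
                self.pin_inputs[dev_id] = dev_val
            else:
                # out of sync (a low byte came first): drop one byte to realign
                data = self.ser.read()
            self.count += 1
            if( self.count >= 64):
                # every 64 reads, send the current values of output channels 4 ~ 13
                self.count = 0
                for i in range(4,14):
                    data = self.data_to_pack( i , self.pin_outputs[i] )
                    self.ser.write(data)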

    def pack_to_data(self,data):
        dev_id = (data[0] & 0b01111000) >> 3
        dev_val = ((data[0] & 0b00000111 ) << 7) | (data[1] & 0b01111111)
        return (dev_id, dev_val)

    def data_to_pack(self,dev_id,dev_val):
        data = [0,0]
        data[0] = 0b10000000 | ((dev_id & 0b00001111) << 3) | ( ( dev_val >> 7 ) & 0b00000111 )
        data[1] = ( 0b0001111111 & dev_val )
        return bytes([data[0],data[1]])

    def start(self):
        # daemon thread, so the program can exit (e.g. Ctrl+C) while the loop is running
        self.th = threading.Thread(target = self.main_loop , args=(), daemon=True)
        self.th.start()

    def set_dev(self,dev_id,dev_val):
        if((dev_val >= 0) and (dev_val < 1024)):
            self.pin_outputs[dev_id] = int(dev_val)



if __name__=="__main__":
    s4a = s4a_slave('/dev/ttyUSB0')
    s4a.start()
    while(True):
        # each input line: "<channel ID> <value>", e.g. "5 200"
        cmd = input().split()
        if len(cmd) == 2:
            s4a.set_dev(int(cmd[0]),int(cmd[1]))
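When the program runs, it waits for the user to type a command like this:
5 200 (press ENTER)
This sets the Value of channel ID 5 to 200 (S4A defines the pin for channel ID 5 as a PWM output). If the connection is working, pin 5 of the S4A board starts outputting a PWM signal with duty cycle 200/255.
At this point, we no longer need Scratch at all and can control the S4A board directly from Python.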