Sunday, November 26, 2017

The CAP Theorem and NoSQL




NoSQL is a general term for database management systems that differ from traditional relational databases.

The two differ in many significant ways; the most important are:


  1. NoSQL does not use SQL as its query language.
  2. Data can be stored without a fixed table schema.
  3. SQL JOIN operations are usually avoided, and NoSQL systems generally scale horizontally.

Unlike the ACID properties of an RDBMS, NoSQL systems usually provide only weak consistency guarantees, such as eventual consistency, or limit transactions to a single data item.




CAP theorem vs. NoSQL package comparison

The CAP theorem is described in the documents below.
CAP theorem: the following is excerpted from these two articles:
In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computing system to simultaneously provide all three of the following guarantees:
  • Consistency: all nodes see the same data at the same time
  • Availability: a guarantee that every request receives a response about whether it was successful or failed
  • Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system
According to the theorem, a distributed system can satisfy at most two of these three properties, never all three.
Because C, A, and P cannot all be achieved at once, current NoSQL packages satisfy two of them, with weaker support for the third. The article below collects today's NoSQL databases and maps them onto the CAP theorem.
BASE theory
BASE theory is what the CAP theorem becomes in practice. BASE stands for Basically Available, Soft-state, Eventually consistent. BASE is roughly the opposite of ACID: it sacrifices strong consistency to gain availability and reliability.
References:
https://blog.longwin.com.tw/2013/03/nosql-db-choose-cap-theorem-2013/


https://goo.gl/dcdPPA

Wednesday, November 22, 2017

Converting between .py and IPython notebook (.ipynb) files?


1) How do you convert xxx.py to xxx.ipynb?

In Jupyter, run

%load relu.py

This loads the source code of relu.py into the cell; save the notebook afterwards and it is automatically stored as relu.ipynb!
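
If you prefer to do the conversion programmatically rather than inside Jupyter, a minimal sketch using the nbformat library looks like the following (relu.py / relu.ipynb are just the example file names from above):

import nbformat

# Wrap the whole script in a single code cell of a new (v4) notebook.
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_code_cell(open('relu.py').read()))
nbformat.write(nb, 'relu.ipynb')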









2) How do you convert xxx.ipynb to xxx.py?

In a Jupyter cell, run the following command:

!jupyter nbconvert --to script hw3.ipynb








Friday, November 17, 2017

CNN using keras



In a traditional DNN, each layer is flat: a one-dimensional array of neurons.


In a CNN, each layer is a cube: a three-dimensional volume of neurons.



A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).



An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in text below


The neurons from the Neural Network chapter remain unchanged: They still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.
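
In other words, each neuron still computes w·x + b followed by a non-linearity; a tiny NumPy sketch (the 5x5x3 receptive-field size is just an assumption matching the figure above):

import numpy as np

relu = lambda z: np.maximum(z, 0)      # the non-linearity
x = np.random.rand(5 * 5 * 3)          # a local 5x5x3 patch of the input volume, flattened
w = np.random.rand(5 * 5 * 3)          # the neuron's weights (same size as its receptive field)
b = 0.1                                # bias
activation = relu(np.dot(w, x) + b)    # dot product + bias, then non-linearity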


 

Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).
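
A small NumPy sketch of that 2x2, stride-2 max pooling on a single depth slice (the 4x4 numbers are made up for illustration):

import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)

# Group the 4x4 slice into 2x2 blocks and take the max of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
# pooled == [[6., 8.],
#            [3., 4.]]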


How to compute the number of parameters

# Imports and num_classes are added here so the snippet runs stand-alone (assumed: Keras 2.x):
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

num_classes = 10  # CIFAR-10 has 10 classes

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print('y_train shape:', y_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(2, (5, 5), padding='same', input_shape=x_train.shape[1:]))
print('input_shape:',x_train.shape[1:])
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
model.summary()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_46 (Conv2D)           (None, 32, 32, 2)         152  ((5*5*3)+1)*2=152
                                  All spatial positions share the same 152 parameters, so a larger image does not mean more parameters.
                                  +1: the bias term in Wx+b
_________________________________________________________________
activation_67 (Activation)   (None, 32, 32, 2)         0         
_________________________________________________________________
conv2d_47 (Conv2D)           (None, 30, 30, 32)        608  ((3*3*2)+1)*32=608      
_________________________________________________________________
activation_68 (Activation)   (None, 30, 30, 32)        0         
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 15, 15, 32)        0         
_________________________________________________________________
dropout_34 (Dropout)         (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_48 (Conv2D)           (None, 15, 15, 64)        18496     
_________________________________________________________________
activation_69 (Activation)   (None, 15, 15, 64)        0         
_________________________________________________________________
conv2d_49 (Conv2D)           (None, 13, 13, 64)        36928     
_________________________________________________________________
activation_70 (Activation)   (None, 13, 13, 64)        0         
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 6, 6, 64)          0         
_________________________________________________________________
dropout_35 (Dropout)         (None, 6, 6, 64)          0         
_________________________________________________________________
flatten_12 (Flatten)         (None, 2304)              0         
_________________________________________________________________
dense_23 (Dense)             (None, 512)               1180160   
_________________________________________________________________
activation_71 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_36 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_24 (Dense)             (None, 10)                5130      
_________________________________________________________________
activation_72 (Activation)   (None, 10)                0         
=================================================================
Total params: 1,241,474
Trainable params: 1,241,474
Non-trainable params: 0
_________________________________________________________________
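
The same counting rule covers the remaining layers; a quick sanity check in Python (conv: (kernel_h*kernel_w*in_channels + 1) * out_channels, dense: (in_units + 1) * out_units):

print((3*3*32 + 1) * 64)     # conv2d_48: 18496
print((3*3*64 + 1) * 64)     # conv2d_49: 36928
print((6*6*64 + 1) * 512)    # dense_23: 1180160  (the flattened input has 6*6*64 = 2304 units)
print((512 + 1) * 10)        # dense_24: 5130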


Saturday, November 11, 2017

PCA



In PCA, we are interested to find the directions (components) that maximize the variance in our dataset. Features that vary strongly across the samples X have a clearer influence on how Y is classified; a feature that is roughly the same for every X is not a good choice for partitioning the data.

PCA(n_components=2) automatically finds the first two principal components. The figure below shows the data projected onto these two dimensions, coloured by class (0, 1, 2).
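
A minimal scikit-learn sketch of that projection (assuming the iris dataset, purely as an illustrative 3-class example):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)           # 4 features per sample, 3 classes
X2 = PCA(n_components=2).fit_transform(X)   # project onto the first 2 principal components
print(X2.shape)                             # (150, 2) -- ready for a 2-D scatter plot coloured by y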



==============


Residual variance ?

Then you fit a regression model. You use the regression equation to calculate a predicted score for each person.

Then you find the difference between each predicted score and the actual score, and calculate the variance of those differences. That is the residual variance. The residual variance will be less than the total variance (or, if your predictors are completely useless, the two will be equal).

How much variance did you explain?
Explained variance = (total variance - residual variance)

The proportion of variance explained is therefore:

explained variance / total variance

If your predicted scores exactly match the outcome scores, you've perfectly predicted the scores, and you've explained all of the variance. The residuals are all zero.  

(Note: The calculations are done with sums of squares; variances give a very slightly different answer because they are usually calculated as SS/(N-1). But if the sample is large, this difference is trivial.)
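
A tiny numeric sketch of the ratio (the scores below are made-up numbers, purely for illustration):

import numpy as np

y_actual    = np.array([3.0, 5.0, 7.0, 9.0])   # actual scores
y_predicted = np.array([2.5, 5.5, 6.5, 9.5])   # predicted scores from some fitted model

total_ss    = np.sum((y_actual - y_actual.mean()) ** 2)   # 20.0
residual_ss = np.sum((y_actual - y_predicted) ** 2)       # 1.0
print((total_ss - residual_ss) / total_ss)                # 0.95 = proportion of variance explained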

https://goo.gl/xVQ1r8

Friday, November 10, 2017

k-fold cross-validation


    # Another way to do cross-validation: if the validation data is carved out of the training data,
    # there are fewer samples left for training. k-fold cross-validation avoids this problem.
    from sklearn.model_selection import cross_val_score   # import added; dct is an estimator defined earlier

    scores = cross_val_score(dct, Xtrain, ytrain, cv=5)
    print(scores.mean())




However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.
A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k “folds”:
  • A model is trained using k-1 of the folds as training data;
  • the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary test set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
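
A self-contained sketch of the snippet above (assuming the iris dataset and a DecisionTreeClassifier standing in for dct):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
dct = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(dct, X, y, cv=5)   # 5 folds: train on 4, score on the held-out fold
print(scores)                               # one accuracy per fold
print(scores.mean())                        # the reported cross-validation score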

https://goo.gl/xVQ1r8

Thursday, November 9, 2017

Multi-layer Perceptron classifier



sklearn.neural_network.MLPClassifier
This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

solver : which algorithm to use for the optimization problem, i.e. for weight optimization.

solver : {‘lbfgs’, ‘sgd’, ‘adam’},
   default: ‘adam’ The solver for weight optimization.

batch_size : int, optional, default ‘auto’
Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the classifier will not use minibatch. When set to “auto”, batch_size=min(200, n_samples)

max_iter : int, optional, default 200
Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
That is, the total number of parameter updates = (# of batches per epoch) * (# of epochs).

tol : float, optional, default 1e-4
Tolerance for the optimization. When the loss or score is not improving by at least tol for two consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.
momentum : float, default 0.9
Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.
early_stopping : bool, default False
Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for two consecutive epochs. Only effective when solver=’sgd’ or ‘adam’
validation_fraction : float, optional, default 0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True


fit(X, y) : Fit the model to data matrix X and target(s) y.
get_params([deep]) : Get parameters for this estimator.
predict(X) : Predict using the multi-layer perceptron classifier.
predict_log_proba(X) : Return the log of probability estimates.
predict_proba(X) : Probability estimates.
score(X, y[, sample_weight]) : Returns the mean accuracy on the given test data and labels.
set_params(**params) : Set the parameters of this estimator.

predict_proba(X) Probability estimates. The returned estimates for all classes are ordered by the label of classes.

Y=predict_proba(X) is a method of a (soft) classifier outputting the probability of the instance being in each of the classes.
Y is a list giving, for each sample in X, the probability that it falls into each class.
For binary classification, Y holds the probabilities of class 0 and class 1.
For multi-class classification, Y holds one probability per class (the probabilities sum to 1).
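
A minimal end-to-end sketch of MLPClassifier and predict_proba (assuming the iris dataset; the parameter values are illustrative, not recommendations):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(solver='adam', max_iter=500, early_stopping=True, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))        # mean accuracy on the test set
print(clf.predict_proba(X_test[:3]))    # one row per sample; each row sums to 1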

http://www.ittraining.com.tw/ittraining/106AIoT/AIoT.html

Saturday, November 4, 2017

Machine Learning and Deep Learning


What is Machine Learning?
Machine learning is a branch of artificial intelligence. Put simply, it lets a machine acquire knowledge the way a person does, by reading large amounts of data and building rules from it: algorithms analyse the data and learn from it in order to judge or predict something about the real world (if an expert is really just a well-trained dog, machine learning has roughly that flavour). Reading a large amount of known data is the training process: with enough training data you build a model (a best-fitting function), and this model is the machine's "knowledge". Afterwards you can feed unseen data to the machine and it can make predictions or judgements. Machine learning does not rely on rules hand-coded in advance; instead, large amounts of data and algorithms are used to "train" the machine, which generates the rules itself and then judges or predicts automatically.


What is Deep Learning?

Deep learning is one method within machine learning. It tries to build the machine-learning model by imitating how the neural networks in the human brain operate. For example, when your eyes see a car about to hit you, you jump away immediately; in between, countless neurons in the brain compute and pass signals, and we still do not fully understand how the brain does it because the intermediate process is so complex, yet viewed end to end the behaviour is simple: see the car, dodge. A neural network follows the same idea: a set of samples X, each with several feature values (x1, x2, x3, x4), passes through layers whose weights W determine the final output Y (y1, y2, y3). There can be many layers in between; many layers make it a Deep Neural Network (DNN). How to build the network, including how many layers to use, how many neurons per layer, and how to learn the weights, is the technical core that deep learning studies. Building on the DNN idea, the CNN (Convolutional Neural Network) was developed; it is commonly used for image processing, extracting features from an image, applying convolution and then pooling, so that images can be classified. The RNN (Recurrent Neural Network) is mainly used for text and language, taking the surrounding context into account so the machine can understand the meaning of a sentence.



Where can machine learning or deep learning be applied?
Machine-learning algorithms are already widely used in data mining, computer vision, natural language processing, speech and handwriting recognition, biometrics, DNA sequencing, search engines, medical diagnosis, financial data analysis and stock-market analysis. Almost every industry, including finance, retail, manufacturing and healthcare, uses machine-learning techniques. Deep learning, as one machine-learning method, is mostly applied to image recognition and speech analysis: problems with little explicit cause-and-effect structure, whose reasoning even humans find hard to explain.
https://goo.gl/QTCwwo

===================================
The "Hello World" of deep learning: "MNIST" handwritten digit recognition

This is a brief walk-through of the basic Artificial Neural Network workflow. You can of course also implement it with deep learning's CNN (Convolutional Neural Network), which will give you a deeper understanding of CNNs.


Step: in the training data, convert each handwritten digit image into a pixel vector used as the input X; each pixel is one feature.




Build the network model (network parameters .....)


An illustration of how the handwritten digit "6" is represented at the input layer


After the model is trained, run predict on the testing data so that each handwritten digit is labelled with the corresponding 0-9, then measure the accuracy.
When the accuracy is not good enough, how to adjust the network parameters to improve it is exactly where the craft lies. On Kaggle there are currently over 60,000 handwritten digit samples, and the best accuracy reached is 100%......
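
A minimal Keras sketch of the workflow described above (the layer sizes and epoch count are illustrative assumptions, not the tuned settings the post refers to):

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense

# Each 28x28 image becomes a 784-pixel vector (each pixel is one feature).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test  = x_test.reshape(-1, 784).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, 10)
y_test  = keras.utils.to_categorical(y_test, 10)

# Build the network model.
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train, then evaluate the accuracy on the testing data.
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))   # [loss, accuracy] on the test set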

https://goo.gl/EcCcj7