official tutorial

https://www.tensorflow.org/tutorials/

Learning notebook of Udacity's deep learning course

https://www.udacity.com/course/deep-learning--ud730

Install TensorFlow: https://www.tensorflow.org/install/

my relevant notebooks

build tensorflow models

http://jishichao.com/deepinp2

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from six.moves import cPickle as pickle
from sklearn.metrics import accuracy_score, classification_report
%matplotlib inline
In [25]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)
Training set (480000, 28, 28) (480000,)
Validation set (40000, 28, 28) (40000,)
Test set (18500, 28, 28) (18500,)

explore data

In [30]:
plt.figure(figsize=(10,10))
print (train_labels[:12])
for i in range(12):
    plt.subplot(1,12,i+1)
    plt.imshow(train_dataset[i], cmap='gray')
    plt.axis('off')
[2 2 9 2 8 0 6 8 9 7 0 1]

the softmax cross-entropy op used below expects one-hot labels

Reformat into a shape that's more adapted to the models we're going to train:

  • data as a flat matrix,
  • labels as float 1-hot encodings.
In [3]:
valid_labels
Out[3]:
array([1, 5, 0, ..., 3, 9, 1])
In [31]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)

  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = pd.get_dummies(labels).values.astype(np.float32)
#   labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
Training set (480000, 784) (480000, 10)
Validation set (40000, 784) (40000, 10)
Test set (18500, 784) (18500, 10)

build model structure

In [8]:
plt.figure(figsize=(10,10))
plt.imshow(plt.imread('2-1.jpg'))
Out[8]:
<matplotlib.image.AxesImage at 0xe633710>

1. specify a graph

In [5]:
graph = tf.Graph()

Create nodes on a computation graph

2. define inputs, placeholders, constants, variables

  • With tf.Variable you have to provide an initial value when you declare it.
  • With tf.placeholder you don't provide an initial value;
    • you supply it at run time with the feed_dict argument of Session.run (see the sketch after this list).

Variables

  • These are the parameters that we are going to be training.
    • The weight matrix will be initialized using random values following a (truncated) normal distribution.
    • The biases get initialized to zero.
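
A minimal sketch comparing the two node types (TF 1.x assumed; the 784 x 10 shapes just mirror the ones used later in this notebook):

import tensorflow as tf
import numpy as np

demo_graph = tf.Graph()
with demo_graph.as_default():
    # placeholder: no initial value; fed at run time via feed_dict
    x = tf.placeholder(tf.float32, shape=(None, 784))
    # Variables: need initial values; updated by the optimizer during training
    w = tf.Variable(tf.truncated_normal([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, w) + b

with tf.Session(graph=demo_graph) as sess:
    sess.run(tf.global_variables_initializer())   # assigns initial values to Variables
    out = sess.run(y, feed_dict={x: np.zeros((2, 784), dtype=np.float32)})
    print(out.shape)  # (2, 10)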

3. write operations

  • Training computation.
    • We multiply the inputs with the weight matrix and add the biases.
    • We compute the softmax and cross-entropy
      • (it's one operation in TensorFlow, because it's very common and it can be optimized).
    • Loss function
      • the average of this cross-entropy across all training examples.

tensorflow functions

  • tf.matmul(t1, t2)
    • matrix multiply
  • tf.reduce_mean(t)
    • Computes the mean of elements across dimensions of a tensor.
  • tf.nn.softmax_cross_entropy_with_logits(labels = , logits = )
    • Computes softmax cross entropy between logits and labels (a small check of this follows the list).
  • tf.nn.softmax(logits)
    • Computes softmax activations.
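
A quick sketch (TF 1.x assumed; the toy logits and labels are made up) checking that the fused op matches a manual softmax followed by cross-entropy:

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    logits = tf.constant([[2.0, 1.0, 0.1]])
    labels = tf.constant([[1.0, 0.0, 0.0]])  # one-hot

    # fused, numerically stable op
    fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    # manual version: softmax, then cross-entropy
    probs = tf.nn.softmax(logits)
    manual = -tf.reduce_sum(labels * tf.log(probs), axis=1)

with tf.Session(graph=g) as sess:
    print(sess.run([fused, manual]))  # both roughly 0.417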

Optimizer

  • the target of running this graph
    • We are going to find the minimum of this loss using gradient descent.
      • tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
        • specify the learning rate
        • the target is to minimize the loss (a toy run is sketched after this list)
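
A tiny sketch of what minimize(loss) does, on a toy quadratic that is not part of the notebook's model: each session.run(optimizer) call takes one gradient-descent step.

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    x = tf.Variable(5.0)
    loss = tf.square(x - 2.0)  # minimum at x = 2
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(optimizer)            # one gradient step per call
    print(sess.run([x, loss]))         # x close to 2.0, loss close to 0.0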

graph.as_default() makes this Graph the default graph; ops defined inside the with block are added to it.

with graph.as_default():
    pass

add L2 regularization and dropout

  • dropout in the kernel function
  • L2 penalty in the loss function
In [6]:
batch_size = 128
In [97]:
beta = 0.001

with graph.as_default():
    
    # Input data.
    # For the training data, we use a placeholder that will be fed.
    tf_train_dataset = tf.placeholder(tf.float32, shape = (None, image_size * image_size))    
    tf_train_labels = tf.placeholder(tf.float32, shape = (None, num_labels))
    
    # valid dataset
    # Load the validation data into constants that are attached to the graph.
    tf_valid_dataset = tf.constant(valid_dataset)
    # Load the test data into constants that are attached to the graph.
    tf_test_dataset = tf.constant(test_dataset)
    
    # Variables.
    # These are the parameters that we are going to be training. 
    # The weight matrix will be initialized using random values following a (truncated) normal distribution. 
    # The biases get initialized to zero.
    
    weights = tf.Variable(tf.truncated_normal(shape = [image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros(shape = [num_labels]))
    
    
    # Training computation.
    # We multiply the inputs with the weight matrix, and add biases. 
    # We compute the softmax and cross-entropy 
    # (it's one operation in TensorFlow, because it's very common, and it can be optimized). 
    # We take the average of this cross-entropy across all training examples: that's our loss.
    
    # tf.matmul(t1, t2): matrix multiply
    
    # add dropout
    # Note: tf.nn.dropout with keep_prob = 1 keeps every element, so this is
    # effectively a no-op here; dropout is normally applied to a layer's
    # activations (see the sketch after this cell).
    def kernel(in_put):
#         global weights, biases
        out_put = tf.matmul(in_put, weights) + tf.nn.dropout(biases, 1)
        return out_put
    
    logits = kernel(tf_train_dataset)
    
    # tf.reduce_mean(t): Computes the mean of elements across dimensions of a tensor.
    # tf.nn.softmax_cross_entropy_with_logits(labels = , logits = ): Computes softmax cross entropy between logits and labels.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits( labels = tf_train_labels, logits = logits )
    + beta*(tf.nn.l2_loss(weights)+tf.nn.l2_loss(biases))
    )
    
    # Optimizer.
    # We are going to find the minimum of this loss using gradient descent.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss = loss)
    
    
    # Predictions for the training, validation, and test data.
    # These are not part of training, but merely here so that we can report accuracy figures as we train.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(kernel(tf_valid_dataset))
    test_prediction = tf.nn.softmax(kernel(tf_test_dataset))
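
As written above, dropout never actually drops anything: keep_prob is 1 and it is applied to the biases rather than to activations. A hedged sketch of the more common pattern, assuming one hidden layer of 1024 ReLU units (a width chosen for illustration, not taken from this notebook) and reusing the notebook's image_size, num_labels and beta, with dropout on the hidden activations and L2 on the weights:

hidden_units = 1024  # assumed width for this sketch

sketch_graph = tf.Graph()
with sketch_graph.as_default():
    x = tf.placeholder(tf.float32, shape=(None, image_size * image_size))
    y = tf.placeholder(tf.float32, shape=(None, num_labels))
    keep_prob = tf.placeholder(tf.float32)  # fed at run time

    w1 = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_units]))
    b1 = tf.Variable(tf.zeros([hidden_units]))
    w2 = tf.Variable(tf.truncated_normal([hidden_units, num_labels]))
    b2 = tf.Variable(tf.zeros([num_labels]))

    hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
    hidden = tf.nn.dropout(hidden, keep_prob)  # dropout on the hidden activations
    logits = tf.matmul(hidden, w2) + b2

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
    ) + beta * (tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2))  # L2 on the weight matrices
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

At run time you would feed keep_prob = 0.5 (or similar) for training minibatches and keep_prob = 1.0 when evaluating validation and test predictions.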

evaluation

accuracy

In [88]:
# def accuracy(predictions, labels):
    
#     return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
#           / predictions.shape[0])
# # accuracy_score?
In [89]:
def accuracy(labels, predictions):
    from sklearn.metrics import accuracy_score
    
    percent = accuracy_score(
        np.argmax(labels,1), np.argmax(predictions,1)
    )*100
    
    return percent
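
A quick usage check of this helper on tiny made-up one-hot arrays (the values are only for illustration):

toy_labels = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1]], dtype=np.float32)
toy_preds = np.array([[0.9, 0.05, 0.05],
                      [0.2, 0.5, 0.3],
                      [0.6, 0.3, 0.1]], dtype=np.float32)  # last row is wrong

print(accuracy(toy_labels, toy_preds))  # 66.7: 2 of 3 argmax classes match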

run the graph

  • Then you can run the operations on this graph as many times as you want by calling session.run(),

    • providing it outputs to fetch from the graph that get returned.

      • This runtime operation is all contained in the block below:

        • with tf.Session(graph=graph) as session:

          ...

stochastic gradient descent

  • create a Placeholder node which will be fed actual data at every call of session.run().

    tf.global_variables_initializer().run()

    • runs the initializer op in the default session, assigning every variable its initial value.

Prepare a dictionary telling the session where to feed the minibatch.

  • The key of the dictionary is the placeholder node of the graph to be fed,
    • and the value is the numpy array to feed to it.

session.run(fetches, feed_dict=None, options=None, run_metadata=None)

  • Docstring: Runs operations and evaluates tensors in fetches.
  • This method runs one "step" of TensorFlow computation,
    • by running the necessary graph fragment to execute every Operation
      • and evaluate every Tensor in fetches,
        • substituting the values in feed_dict for the corresponding input values.

The fetches argument

  • may be a single graph element,
  • or an arbitrarily nested list, tuple, namedtuple, dict, or OrderedDict containing graph elements at its leaves.
    • A graph element can be one of the following types:
      • An Operation: the corresponding fetched value will be None.
      • A Tensor: the corresponding fetched value will be a numpy ndarray containing the value of that tensor.
      • A SparseTensor: the corresponding fetched value will be a SparseTensorValue containing the value of that sparse tensor.
      • A get_tensor_handle op: the corresponding fetched value will be a numpy ndarray containing the handle of that tensor.
      • A string which is the name of a tensor or operation in the graph.
  • The value returned by run() has the same shape as the fetches argument,
    • where the leaves are replaced by the corresponding values returned by TensorFlow.
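
A small self-contained sketch of a few fetch forms and feed_dict (the names a, b, c, d are made up for this example, not part of the notebook's graph):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    a = tf.placeholder(tf.float32, shape=(), name='a')
    b = tf.constant(3.0)
    c = tf.add(a, b, name='c')
    d = tf.multiply(c, 2.0, name='d')

with tf.Session(graph=g) as sess:
    # single Tensor fetch -> a numpy value
    print(sess.run(c, feed_dict={a: 1.0}))                  # 4.0
    # list fetch -> list of values in the same order
    print(sess.run([c, d], feed_dict={a: 2.0}))             # [5.0, 10.0]
    # dict fetch -> dict with the same keys
    print(sess.run({'sum': c, 'double': d}, feed_dict={a: 3.0}))
    # fetch by tensor name string
    print(sess.run('c:0', feed_dict={a: 4.0}))              # 7.0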

make sure num_steps covers train_dataset at least once (one epoch): with 480,000 training examples and batch_size = 128, one epoch is 3,750 steps

In [90]:
train_dataset.shape[0]
Out[90]:
480000
In [91]:
batch_size = 128
num_steps = 10001

for i in range(num_steps):
    c = ( i * batch_size) % (train_labels.shape[0] - batch_size)
    if i%1000==0:
        print (c, end = ',')
        
print (c)
0,128000,256000,384000,32128,160128,288128,416128,64256,192256,320256,320256
In [101]:
batch_size = 128
num_steps = 40001
In [102]:
from time import time
start = time()

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    
    for step in range(num_steps):
        
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.    
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
    
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.    
        feed_dict = {tf_train_dataset : batch_data,
                    tf_train_labels : batch_labels}
        
        
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            
            
        if (step % 5000 == 0):
            
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(batch_labels, predictions))
            
            print("Validation accuracy: %.1f%%" % accuracy(valid_labels, valid_prediction.eval()))
            
    pred, real = test_prediction.eval(), test_labels
    # arguments are swapped relative to accuracy(labels, predictions),
    # but exact-match accuracy is symmetric, so the value is the same
    print("Test accuracy: %.1f%%" % accuracy(pred, real))

print ('\ntesting')
print('\ntotal time: %.2f s'%(time()-start))        
Initialized
Minibatch loss at step 0: 21.258343
Minibatch accuracy: 6.2%
Validation accuracy: 6.2%
Minibatch loss at step 5000: 1.387043
Minibatch accuracy: 88.3%
Validation accuracy: 78.5%
Minibatch loss at step 10000: 0.870092
Minibatch accuracy: 83.6%
Validation accuracy: 81.8%
Minibatch loss at step 15000: 0.753875
Minibatch accuracy: 80.5%
Validation accuracy: 82.6%
Minibatch loss at step 20000: 0.602650
Minibatch accuracy: 89.1%
Validation accuracy: 83.0%
Minibatch loss at step 25000: 0.780403
Minibatch accuracy: 77.3%
Validation accuracy: 83.1%
Minibatch loss at step 30000: 0.571501
Minibatch accuracy: 86.7%
Validation accuracy: 83.1%
Minibatch loss at step 35000: 0.618889
Minibatch accuracy: 83.6%
Validation accuracy: 83.1%
Minibatch loss at step 40000: 0.586011
Minibatch accuracy: 85.9%
Validation accuracy: 83.0%
Test accuracy: 89.7%

testing

total time: 85.21 s
A run with batch_size = 128 and num_steps = 8001 (for comparison):

  • Test accuracy: 88.0%

In [47]:
letters = list('ABCDEFGHIJK')
letters
Out[47]:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
In [49]:
letters[0]
Out[49]:
'A'
In [53]:
dic = {i: letters[i] for i in range(len(letters))}
print(dic)
{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J', 10: 'K'}
In [67]:
plt.figure(figsize=(10,10))
for i in range(12):
    plt.subplot(1,12,i+1)
    plt.imshow(test_dataset[i].reshape(28,28), cmap='gray')
    plt.axis('off')
In [75]:
for i in pd.Series(pred.argmax(1)).map(dic)[:12]:print (i, end=' ')
C D J J F G J E E A E G 
In [76]:
for i in pd.Series(real.argmax(1)).map(dic)[:12]:print (i, end=' ')
C D J J F G J E A A E D 
In [ ]: