Hello,
While studying the book "Deep Learning with JavaScript" by Shanqing Cai, Stanley Bileschi, Eric D. Nielsen, and François Chollet,
I was not entirely satisfied with the examples and results obtained using the TensorFlow.js library.
As a result, I decided to rewrite the examples from scratch using different strategies to see if better results could be achieved. I also replaced the CommonJS modules with modern ES6 modules and removed all unnecessary dependencies (like the one for the CSV format).
You need to install TensorFlow.js (I used the simple tfjs package) on your machine.
Using my package:

npm i

or directly:

npm i @tensorflow/tfjs

The data directory contains a local version of the Boston Housing dataset (in CSV format), which includes 12 features and 333 samples. I wrote the CSV parsing myself inside the loadData method.
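As an illustration only (not the exact code of the repository's loadData method, and assuming a header row and purely numeric columns), a hand-rolled CSV parse in Node.js can be as simple as:

import { readFileSync } from 'fs';

// Minimal sketch of a hand-written CSV reader: split lines, split columns,
// convert everything to numbers (no quoting or escaping handled).
function loadCsv( path ) {
    const lines = readFileSync( path, 'utf8' ).trim().split( '\n' );
    const header = lines[ 0 ].split( ',' );
    const rows = lines.slice( 1 ).map( line => line.split( ',' ).map( Number ) );
    return { header, rows };
}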
I chose to normalize the data using the formula: (value − min_value) / (max_value − min_value). I don't use the book's normalization function based on the mean. This is a general normalizer for any tensor2d object.
I added a second parameter, colValues, containing the max/min values for each column of the training data set, to be sure to use the same normalization space.
function normalizer( tensor2d, colValues ) {
    const shape = tensor2d.shape;
    const colCount = shape[ 1 ];
    const normalisees = [];
    const lastColValues = [];
    for ( let i = 0; i < colCount; i++ ) {
        // Extract column i as a [rows, 1] tensor.
        const col = tensor2d.slice( [ 0, i ], [ -1, 1 ] );
        // Reuse the training min/max when colValues is provided.
        const minValue = colValues ? colValues[ i ].minValue : col.min();
        const maxValue = colValues ? colValues[ i ].maxValue : col.max();
        const delta = maxValue.sub( minValue );
        // (value - min) / (max - min)
        const colNorm = ( col.sub( minValue ) ).div( delta );
        normalisees.push( colNorm );
        lastColValues.push( {
            maxValue,
            minValue
        } );
    }
    return {
        tensor: tf.concat( normalisees, 1 ),
        colValues: lastColValues
    };
}
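For example (the variable names trainTensor and testTensor are just for illustration), the training set defines the normalization space and the test set reuses it through colValues:

// Normalize the training data and keep its per-column min/max.
const train = normalizer( trainTensor );
// Normalize the test data in the same space as the training data.
const test = normalizer( testTensor, train.colValues );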
I train a non-linear model with 2 layers using various strategies.
The goal is to estimate the price of a house using 12 features (crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, lstat).
npm run ex1

or directly:

node src/example1.js

The final loss reported in the book is around 23, with 50 units in the first layer.
All my strategies (the hyperparameters) are configurable via this table:
const strategies = [
{ maxUnits : 1, maxEpochs : 100, loss : "meanAbsoluteError", activation: "relu", optimizer : "sgd" }, // Good
{ maxUnits : 5, maxEpochs : 100, loss : "meanAbsoluteError", activation: "relu", optimizer : "sgd" }, // Good
{ maxUnits : 5, maxEpochs : 100, loss : "meanSquaredError", activation: "relu", optimizer : "sgd" }, // Bad
{ maxUnits : 5, maxEpochs : 100, loss : "meanAbsoluteError", activation: "sigmoid", optimizer : "sgd" }, // Bad
{ maxUnits : 5, maxEpochs : 100, loss : "meanAbsoluteError", activation: "relu", optimizer : "adam" }, // Bad
];

The best result uses only 5 units and the basic "sgd" optimizer.
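To give an idea of how a strategy entry is turned into a model, here is a minimal sketch of a 2-layer regression network (an approximation, not necessarily the exact code of example1.js; trainX and trainY are assumed to be the normalized training tensors):

import tf from '@tensorflow/tfjs';

// Build and train a small regression model from one strategy entry.
async function trainWithStrategy( strategy, trainX, trainY ) {
    const model = tf.sequential();
    // First layer: configurable size and activation, 12 input features.
    model.add( tf.layers.dense( {
        inputShape: [ 12 ],
        units: strategy.maxUnits,
        activation: strategy.activation
    } ) );
    // Second layer: a single unit that outputs the estimated price.
    model.add( tf.layers.dense( { units: 1 } ) );
    model.compile( { loss: strategy.loss, optimizer: strategy.optimizer } );
    await model.fit( trainX, trainY, { epochs: strategy.maxEpochs } );
    return model;
}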
The results for this data are:
Loss Result with {"maxUnits":1,"maxEpochs":100,"loss":"meanAbsoluteError","activation":"relu","optimizer":"sgd"} .... : 3.983289031982422
Loss Result with {"maxUnits":5,"maxEpochs":100,"loss":"meanAbsoluteError","activation":"relu","optimizer":"sgd"} .... : 3.623922109603882
Loss Result with {"maxUnits":5,"maxEpochs":100,"loss":"meanSquaredError","activation":"relu","optimizer":"sgd"} .... : 19.283775329589844
Loss Result with {"maxUnits":5,"maxEpochs":100,"loss":"meanAbsoluteError","activation":"sigmoid","optimizer":"sgd"} .... : 6.683498382568359
Loss Result with {"maxUnits":5,"maxEpochs":100,"loss":"meanAbsoluteError","activation":"relu","optimizer":"adam"} .... : 9.521827697753906
The best loss achieved is 3.6, but it is possible to go below 2 by increasing the number of epochs.
The choices proposed in the book were definitely not optimal. Using a much more computationally expensive strategy, they achieved a loss of 23, which is worse than even the worst result of my runs.
It is possible that the normalization technique has a significant impact.
The data directory contains a local version of the Phishing Website dataset (in CSV format), which includes 30 features and about 5000 samples. I wrote the CSV parsing myself inside the loadData method.
We don't need to normalize the data, since the values are already 0 or 1.
I train a non-linear model with 2 layers using various strategies.
The goal is to classify whether a website is phishing or not, using 30 features. We use label 1 for "Yes, this is a phishing website" and label 0 for "No, this is not a phishing website".
npm run ex2

or directly:

node src/example2.js

const strategies = [
{ maxUnits : 10, maxEpochs : 100, loss : "binaryCrossentropy", activation: "sigmoid", optimizer : "adam", threshold:0.5 },
{ maxUnits : 10, maxEpochs : 100, loss : "binaryCrossentropy", activation: "sigmoid", optimizer : "adam", threshold:0.6 },
{ maxUnits : 10, maxEpochs : 100, loss : "binaryCrossentropy", activation: "sigmoid", optimizer : "adam", threshold:0.7 },
{ maxUnits : 10, maxEpochs : 100, loss : "binaryCrossentropy", activation: "sigmoid", optimizer : "adam", threshold:0.8 }
];

We use only the binaryCrossentropy loss, which scores each prediction depending on how far the predicted probability is from the true label.
When a predicted probability is greater than the threshold, the sample is classified as label 1; otherwise it is classified as label 0. So each time we increase the threshold, we improve the precision of the prediction, because we require a very high probability before predicting label 1.
The results are only for label 1 (phishing detection). "Good prediction" is the rate of correct predictions when the model predicts label 1. "Missed prediction" covers the cases where the model predicts label 0 for an actual label 1.
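As an illustration (hypothetical variable names, not necessarily the exact code of example2.js), the threshold is applied to the predicted probabilities and the two rates for label 1 can be computed like this:

// probs: predicted probabilities for the test set; labels: true 0/1 labels.
const probs = model.predict( testX ).dataSync();
const labels = testY.dataSync();
let truePositives = 0, predictedPositives = 0, actualPositives = 0;
for ( let i = 0; i < probs.length; i++ ) {
    const predicted = probs[ i ] > strategy.threshold ? 1 : 0;
    if ( predicted === 1 ) predictedPositives++;
    if ( labels[ i ] === 1 ) {
        actualPositives++;
        if ( predicted === 1 ) truePositives++;
    }
}
// "Good prediction": how often a predicted label 1 is really a phishing site.
const goodPrediction = 100 * truePositives / predictedPositives;
// "Missed prediction": how many real phishing sites were predicted as label 0.
const missedPrediction = 100 * ( actualPositives - truePositives ) / actualPositives;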
- Label 1 : Good prediction (97.64791025872988%) - Missed prediction (4.776551474579338%)
- Label 1 : Good prediction (97.90121223086665%) - Missed prediction (5.210783426813824%)
- Label 1 : Good prediction (98.91442011941378%) - Missed prediction (8.576081056631084%)
- Label 1 : Good prediction (100%) - Missed prediction (43.495567215487604%)
- Label 1 : Good prediction (99.6924190338339%) - Missed prediction (11.072914781979373%)
=> High Precision (100% at threshold 0.7):
The model only predicts "phishing" when it is almost certain. Pro: No false alarms (all predicted "phishing" sites are truly phishing). Con: Many actual phishing sites are missed (high missed prediction rate, e.g., 43%).
=> Low Missed Predictions (Lower Threshold):
The model predicts "phishing" more often, catching more actual phishing sites. Pro: Fewer missed phishing sites. Con: More false alarms (lower precision).
There's no universal solution! The choice depends on your priority.
The data directory contains a dataset for the iris flowers.
There are 4 features for classifying the following flowers:
Iris-setosa Iris-versicolor Iris-virginica
Each flower label is encoded as an array with 3 columns (one-hot encoding):
Iris-setosa : [1,0,0] Iris-versicolor : [0,1,0] Iris-virginica : [0,0,1]
Each column is a probability (so 1 means 100%).
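For illustration (the repository may build these arrays directly while parsing the CSV), tf.oneHot can produce this encoding from class indices:

const classNames = [ 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica' ];
// Two sample labels converted to class indices, then to one-hot rows.
const indices = [ 'Iris-setosa', 'Iris-virginica' ].map( name => classNames.indexOf( name ) );
const labels = tf.oneHot( tf.tensor1d( indices, 'int32' ), 3 );
labels.print(); // [[1, 0, 0], [0, 0, 1]]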
The goal is to detect the flower species from the 4 features.
npm run ex3

or directly:

node src/example3.js

const strategies = [
{ maxUnits : 100, maxEpochs : 500, loss : "categoricalCrossentropy", activation: "sigmoid", optimizer : "adam" },
{ maxUnits : 10, maxEpochs : 250, loss : "categoricalCrossentropy", activation: "sigmoid", optimizer : "adam" },
{ maxUnits : 100, maxEpochs : 250, loss : "categoricalCrossentropy", activation: "relu", optimizer : "adam" },
{ maxUnits : 10, maxEpochs : 500, loss : "categoricalCrossentropy", activation: "relu", optimizer : "adam" },
{ maxUnits : 10, maxEpochs : 500, loss : "categoricalCrossentropy", activation: "tanh", optimizer : "adam" }
];

The "categoricalCrossentropy" loss is required for a multi-class problem. Here we have 3 labels for 3 flowers.
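A minimal sketch of such a multi-class model (an approximation, not necessarily the exact code of example3.js; strategy is one entry of the table above): the output layer has 3 units with a softmax activation, so the three outputs sum to 1 and can be read as probabilities.

const model = tf.sequential();
// First layer: configurable size and activation, 4 input features.
model.add( tf.layers.dense( { inputShape: [ 4 ], units: strategy.maxUnits, activation: strategy.activation } ) );
// Output layer: one probability per flower class.
model.add( tf.layers.dense( { units: 3, activation: 'softmax' } ) );
model.compile( { loss: 'categoricalCrossentropy', optimizer: strategy.optimizer, metrics: [ 'accuracy' ] } );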
We display both the total accuracy and the per-flower accuracy for each strategy. You may run it several times to compare the results.
{"maxUnits":100,"maxEpochs":500,"loss":"categoricalCrossentropy","activation":"sigmoid","optimizer":"adam"}
Total Accuracy =98%
- Flower Iris-setosa Accuracy = 100 %
- Flower Iris-versicolor Accuracy = 100 %
- Flower Iris-virginica Accuracy = 96 %
--------------------------------------------
{"maxUnits":10,"maxEpochs":250,"loss":"categoricalCrossentropy","activation":"sigmoid","optimizer":"adam"}
Total Accuracy =68%
- Flower Iris-setosa Accuracy = 100 %
- Flower Iris-versicolor Accuracy = 100 %
- Flower Iris-virginica Accuracy = 65 %
--------------------------------------------
{"maxUnits":100,"maxEpochs":250,"loss":"categoricalCrossentropy","activation":"relu","optimizer":"adam"}
Total Accuracy =98%
- Flower Iris-setosa Accuracy = 100 %
- Flower Iris-versicolor Accuracy = 100 %
- Flower Iris-virginica Accuracy = 73 %
--------------------------------------------
{"maxUnits":10,"maxEpochs":500,"loss":"categoricalCrossentropy","activation":"relu","optimizer":"adam"}
Total Accuracy =97%
- Flower Iris-setosa Accuracy = 100 %
- Flower Iris-versicolor Accuracy = 100 %
- Flower Iris-virginica Accuracy = 77 %
{"maxUnits":10,"maxEpochs":500,"loss":"categoricalCrossentropy","activation":"tanh","optimizer":"adam"}
Total Accuracy =98%
- Flower Iris-setosa Accuracy = 100 %
- Flower Iris-versicolor Accuracy = 96 %
- Flower Iris-virginica Accuracy = 100 %

The relu activation for the first layer is good enough; the sigmoid used in the book is not necessary. 10 neurons are also enough for the first layer.
The MNIST dataset contains 60000 28x28 images for training and 10000 28x28 images for testing. Each image shows a digit between 0 and 9, and this digit is the label.
The database uses a specific ubyte format for storing each image and label.
The goal is to detect the digit drawn inside an image.
npm run ex4

or directly:

node src/example4.js

Important note: training on the MNIST database is very costly for the CPU. So it is recommended to switch from

import tf from '@tensorflow/tfjs';

to

import tf from '@tensorflow/tfjs-node';

or, better, if you have a GPU:

import tf from '@tensorflow/tfjs-node-gpu';

Note that on ARM (my CPU here), you can't switch to tfjs-node or tfjs-node-gpu.
I have added a limitSize parameter for when you use tfjs only. Otherwise you must use

const limitSize = Number.MAX_SAFE_INTEGER;

to run on the whole MNIST database.
{
kernelSize:2,
filters:8,
units:32
},
{
kernelSize:3,
filters:16,
units:64
},
{
kernelSize:3,
filters:32,
units:64
},
{
kernelSize:3,
filters:32,
units:128
},
{
kernelSize:3,
filters:64,
units:64
},
{
kernelSize:2,
filters:32,
units:64
},
{
kernelSize:4,
filters:32,
units:64
}

I have limited the model to one convolution layer for performance reasons.
The training/evaluation here uses only 1000 images, which impacts the accuracy rate. I also didn't increase the epochs parameter, for performance reasons.
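As a sketch of what one convolution layer followed by a dense layer looks like with these hyperparameters (an approximation, not necessarily the exact code of example4.js; the 'adam' optimizer here is an assumption):

import tf from '@tensorflow/tfjs';

// Build a small convnet from one { kernelSize, filters, units } entry.
function buildMnistModel( strategy ) {
    const model = tf.sequential();
    model.add( tf.layers.conv2d( {
        inputShape: [ 28, 28, 1 ],
        kernelSize: strategy.kernelSize,
        filters: strategy.filters,
        activation: 'relu'
    } ) );
    model.add( tf.layers.maxPooling2d( { poolSize: 2 } ) );
    model.add( tf.layers.flatten() );
    model.add( tf.layers.dense( { units: strategy.units, activation: 'relu' } ) );
    // One output per digit (0 to 9).
    model.add( tf.layers.dense( { units: 10, activation: 'softmax' } ) );
    // Assumed compile settings, not taken from the repository.
    model.compile( { loss: 'categoricalCrossentropy', optimizer: 'adam', metrics: [ 'accuracy' ] } );
    return model;
}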
{"kernelSize":2,"filters":8,"units":32}
Loss : 2.100555896759033 / Accuracy : 40.50%
Evaluating...
================================
{"kernelSize":3,"filters":16,"units":64}
Loss : 1.7298730611801147 / Accuracy : 62.10%
Evaluating...
================================
{"kernelSize":3,"filters":32,"units":64}
Loss : 1.7157704830169678 / Accuracy : 52.60%
Evaluating...
================================
{"kernelSize":3,"filters":32,"units":128}
Loss : 1.5706546306610107 / Accuracy : 56.00%
Evaluating...
================================
{"kernelSize":3,"filters":64,"units":64}
Loss : 1.6900482177734375 / Accuracy : 53.20%
Evaluating...
================================
{"kernelSize":2,"filters":32,"units":64}
Loss : 1.820560097694397 / Accuracy : 48.10%
Evaluating...
================================
{"kernelSize":4,"filters":32,"units":64}
Loss : 1.6055097579956055 / Accuracy : 56.80%

It seems there is no big impact once the filters are around 16 and the units of the dense layer are around 64.
