## Introduction to Automated Machine Learning

@donald_whyte

A bit about myself...

![bloomberg](images/bloomberg-logo.svg) Infrastructure Engineer

Hackathons Organiser / Mentor / Hacker

![bt](images/bt.svg) ![ibm](images/ibm.svg) ![helper.io](images/helper.svg)

Applied machine learning in:

* network security
* enterprise software management
* employment
## What is Machine Learning?
A mechanism for machines to **learn** behaviour with no human intervention

Programs that can **adapt** when exposed to new data

Based on **pattern recognition**
![education](images/application1.jpg) ![security](images/application2.jpg) ![robotics](images/application3.jpg) ![finance](images/application4.jpg) ![speech recognition](images/application5.jpg) ![advertising](images/application6.jpg)
### Huge Growth
### Why?

* cheaper computing power
* cheaper data storage
* more data than ever - everyone is online
* produces:
  - greater volume
  - greater variety
* because it's *cool*
### The Difference ![venn diagram](images/venn-diagram.svg)
## Supervised Learning

Use labeled historical data to predict future outcomes
Given some input data, predict the correct output

![shapes](images/shapes.svg)

What **features** of the input tell us about the output?

### Feature Space

* A feature is some property that describes raw input data
* An input can be represented as a vector in feature space
* 2 features = 2D vector = 2D space
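A minimal sketch of that representation, using made-up measurements: two properties per input become a 2D vector, and each input becomes one point in feature space.

```python
import numpy as np

# Two features per input (e.g. width and height of a shape; made-up values).
# Each row is one input, represented as a point in 2D feature space.
features = np.array([
    [2.0, 2.1],   # roughly square
    [5.0, 1.0],   # wide rectangle
    [1.1, 4.8],   # tall rectangle
])

print(features.shape)  # (3, 2): 3 inputs, 2 features each
```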
### Why Use Feature Space?

![feature-extractor](images/feature-extractor.svg)

* Could simply use raw binary data as input
* Raw inputs are complex and noisy
* Abstract the complexity away by using features
* Training data is used to produce a model
* e.g. a linear model: f(x̄) = mx̄ + c
* Model divides feature space into segments
* Each segment corresponds to one output class

Use trained model to classify new, unseen inputs
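A minimal sketch of those steps, using a linear classifier on made-up 2D data: the trained model splits the feature space into two segments, and unseen inputs are classified by which segment they land in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2D feature vectors for two well-separated classes (made-up data).
X = np.array([[1, 1], [1, 2], [2, 1],    # class 0
              [5, 5], [6, 5], [5, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Training fits a linear boundary that divides the feature space.
model = LogisticRegression().fit(X, y)

# Classify new, unseen inputs by checking which segment they fall in.
print(model.predict([[1.5, 1.5]]))  # class 0 side
print(model.predict([[5.5, 5.5]]))  # class 1 side
```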

### Choosing a Suitable Model

### Evaluation

* Many ways to evaluate
  - overall accuracy, number of false positives, etc.
* Depends on what problem you're trying to solve
  - are false positives acceptable?
* `k`-fold cross validation commonly used
![k fold cross validation](images/k-fold-cross-validation.svg)

* testing accuracy just using the training dataset is bad
* classifier should generalise to unseen data points
* split training data into two sets
  - one for training
  - one for testing
* perform this process *k* times using different splits
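The splitting process can be sketched with scikit-learn's `KFold` on toy data (note: modern scikit-learn keeps this in `model_selection` rather than the older `cross_validation` module):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 toy inputs, 2 features each
kf = KFold(n_splits=5, shuffle=True, random_state=0)

test_indices = []
for train_idx, test_idx in kf.split(X):
    # Each fold: train on 8 inputs, test on the 2 held out.
    test_indices.extend(test_idx)

# Across all 5 folds, every input is held out for testing exactly once.
print(sorted(test_indices))  # [0, 1, ..., 9]
```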
### Discrete or Continuous?

* Classification
  - output is from a finite, discrete set of *'classes'*
* Regression
  - output is a real number from a continuous range
## Process
[Jason Brownlee's Process](http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/):

1. define the problem
2. prepare data
3. spot check algorithms
4. tuning
5. present results
### 1. Define the Problem

Models are useless if you're solving the wrong problem

* what exactly is it you want to do?
* primary requirement: speed? correctness?
* where does the data come from?
* how will users/systems be affected by your model?
### Faces vs Vegetables

![face-veg-img](demo/data/face1.jpg) ![face-veg-img](demo/data/vegetable1.jpg) ![face-veg-img](demo/data/face5.jpg) ![face-veg-img](demo/data/vegetable10.jpg) ![face-veg-img](demo/data/vegetable19.jpg) ![face-veg-img](demo/data/vegetable7.jpg)

|             |                    |
| ----------- | ------------------ |
| **Input:**  | image (rgb pixels) |
| **Output:** | face or vegetable  |
### 2. Prepare the Data

Garbage in, garbage out

* What's the source of the data?
  - collected by your system(s)?
  - provided by third-parties?
* Which features to extract?
  - more features **≠** better accuracy
  - how long does it take to extract features?
### Data Source

* 40 images
  - 20 faces
  - 20 vegetables
* manually labelled by a friend
  - who we trust
### Features to Extract from Images

* average intensity of each colour channel across all pixels
  - red, green, blue
* average saturation across all pixels
  - bright and strong colours
![sobel edge detection](images/sobel-edge-detection.png)

* how about the complexity of the images?
* more complicated shapes have more edge pixels
* feature:
  - proportion of pixels that are 'edge pixels'
* **sobel edge detection** used to find edge pixels
1. mean red colour
2. mean green colour
3. mean blue colour
4. mean saturation
5. edge pixel ratio
   - \# edge pixels / \# total pixels
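A sketch of those five features on a synthetic image, using only NumPy (a real pipeline would load the JPEGs with an image library; here a simple gradient-magnitude threshold stands in for Sobel edge detection, and the threshold value is an arbitrary choice):

```python
import numpy as np

def extract_features(img):
    """5-element feature vector: mean R, G, B, mean saturation, edge ratio.
    img: H x W x 3 array of floats in [0, 1]."""
    mean_rgb = img.mean(axis=(0, 1))                          # features 1-3
    mx, mn = img.max(axis=2), img.min(axis=2)
    saturation = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-9), 0.0)
    grey = img.mean(axis=2)
    # Gradient magnitude as a stand-in for Sobel edge detection.
    gy, gx = np.gradient(grey)
    edges = np.hypot(gx, gy) > 0.1                            # arbitrary threshold
    return np.concatenate([mean_rgb, [saturation.mean(), edges.mean()]])

# Synthetic test image: left half pure red, right half black,
# which gives one strong vertical edge down the middle.
img = np.zeros((10, 10, 3))
img[:, :5, 0] = 1.0

print(extract_features(img))  # [mean R, mean G, mean B, mean sat, edge ratio]
```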
### 3. Spot Check Algorithms
![](images/machinelearningalgorithms.png) ![recursive data space](images/data-space2.png) ![decision tree](images/decision-tree.png)
![arbitrary data space](images/data-space3.png) ![neural network](images/neural-network.png)
Which algorithm to use?

* Humans can be biased
* Take the decision out of our hands entirely
* **Automate** the selection of algorithms
Spot check every algorithm you can!

* Run your dataset(s) across dozens of algorithms
* **10-fold cross-validation**
  - measure accuracy, false positives and false negatives
* compare results of each algorithm using statistical tests
  - say *"algorithm A is better than B"* with confidence
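A minimal sketch of that comparison on a standard dataset (Iris stands in for the face/vegetable data; the paired t-test via SciPy is one choice of statistical test, not necessarily the one the original talk used):

```python
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Score each algorithm on the same 10 folds so results are comparable.
scores = {
    'Gaussian Naive Bayes': cross_val_score(GaussianNB(), X, y, cv=10),
    'Decision Tree': cross_val_score(
        DecisionTreeClassifier(random_state=0), X, y, cv=10),
}
for name, s in scores.items():
    print(f'{name}: mean accuracy {s.mean():.3f}')

# Paired t-test on fold-by-fold accuracies: a small p-value lets us say
# "algorithm A is better than B" with confidence.
t, p = ttest_rel(scores['Gaussian Naive Bayes'], scores['Decision Tree'])
print(f'p-value: {p:.3f}')
```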
### Let's try it out!

![python](images/python.svg) ![sklearn](images/sklearn.svg)

* Python
* scikit-learn
* Could also use:
  - Pylearn2, MILK, Theano, ...
```python
from sklearn import svm
from sklearn import tree
from sklearn import naive_bayes

classifiers = {
    'SVM': svm.SVC(kernel='linear', C=1),
    'Decision Tree': tree.DecisionTreeClassifier(
        criterion='gini', splitter='best'),
    'Gaussian Naive Bayes': naive_bayes.GaussianNB(),
    ...
}
```
```python
from sklearn import preprocessing
from sklearn import cross_validation

featureVectors = preprocessing.normalize(featureVectors)
cross_validation.cross_val_score(
    classifier, featureVectors, labels, cv=kFolds)
```
| Classifier               | Mean Acc. | Confidence Interval | Lower Bound Acc |
| ------------------------ | --------- | ------------------- | --------------- |
| Gaussian Naive Bayes     | 0.975     | 0.15                | 0.825           |
| Decision Tree            | 0.95      | 0.2                 | 0.75            |
| Multi-Nomial Naive Bayes | 0.85      | 0.331662            | 0.518338        |
| SVM                      | 0.775     | 0.415331            | 0.359669        |
| Neural Network (Sigmoid) | 0.525     | 0.269258            | 0.255742        |
| Bernoulli Naive Bayes    | 0.5       | 0                   | 0.5             |
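The lower-bound column is the mean accuracy minus the confidence interval. A sketch of computing such a bound from one classifier's fold scores, assuming a mean ± 2 standard deviations interval (how the original table's intervals were derived is not stated, so this is an illustrative choice; the fold scores below are made up):

```python
import numpy as np

# 10-fold accuracy scores for one classifier (made-up example values).
scores = np.array([1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 0.95])

mean_acc = scores.mean()
interval = 2 * scores.std()      # ~95% interval, assuming roughly normal scores
lower_bound = mean_acc - interval

print(f'mean={mean_acc:.3f} interval={interval:.3f} lower bound={lower_bound:.3f}')
```

Comparing lower bounds (rather than just means) penalises classifiers whose accuracy varies wildly between folds.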
Decision tree had good performance

![face veg decision tree](images/face-veg-decision-tree.svg)
### 4. Tuning

* Pick top `x` algorithms from previous step
* Smaller set of algorithms to manually investigate
* Greater confidence that the chosen algorithms are naturally good at picking out the structure of the dataset / feature space
#### Squeeze out Remaining Performance

1. algorithm tuning
   - tune each algorithm for better accuracy
   - search hyperparameter space
2. ensembles
   - combine multiple 'okay' models into one, better model
3. feature refinement
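The ensemble step can be sketched with scikit-learn's `VotingClassifier` (Iris again stands in for the real dataset; the three member models are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Combine three 'okay' models into one model that takes a majority
# vote on every prediction.
ensemble = VotingClassifier([
    ('nb', GaussianNB()),
    ('tree', DecisionTreeClassifier(random_state=0)),
    ('logreg', LogisticRegression(max_iter=1000)),
])
score = cross_val_score(ensemble, X, y, cv=5).mean()
print(f'ensemble mean accuracy: {score:.3f}')
```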
#### Hyperparameter Optimisation

* Model **parameters** are learned during training
* Model's **hyperparameters** must be set before training
* Tune performance by searching hyperparameter space

![hyperparameter_optimisation](images/hyperparameter_optimisation.jpg)
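The simplest such search is an exhaustive grid, sketched here with scikit-learn's `GridSearchCV` (the SVM, the grid values and the Iris dataset are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination of hyperparameters in a small grid, scoring
# each one with 5-fold cross-validation.
search = GridSearchCV(SVC(), {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
}, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f'best CV accuracy: {search.best_score_:.3f}')
```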
[**auto_sklearn**](https://github.com/automl/auto-sklearn)

Library on top of `sklearn`

Automates the spot check and tuning stages
Automatically builds an ensemble of models

Learns the best model types and hyperparameters for you

It's as easy as:

```python
# Train classifier for up to two minutes.
# Longer training time, better accuracy (in general).
autoClassifier = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120)
autoClassifier.fit(trainingFeatures, trainingLabels)

# Output accuracy of classifier when run against test dataset.
print(autoClassifier.score(testFeatures, testLabels))
print(autoClassifier.show_models())
```
```
Score: 0.987452948557
[(0.940000, SimpleClassificationPipeline(configuration={
    'balancing:strategy': 'none',
    'classifier:__choice__': 'adaboost',
    'classifier:adaboost:algorithm': 'SAMME',
    'classifier:adaboost:learning_rate': 1.2306208006800998,
    'classifier:adaboost:max_depth': 6,
    'classifier:adaboost:n_estimators': 499,
    'imputation:strategy': 'median',
    'one_hot_encoding:use_minimum_fraction': 'False',
    'preprocessor:__choice__': 'extra_trees_preproc_for_classification',
    'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True',
    'preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy',
    'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None',
    'preprocessor:extra_trees_preproc_for_classification:max_features': 3.5347851525007146,
    'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 6,
    'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 8,
    'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0,
    'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100,
    'rescaling:__choice__': 'standardize'})),
 (0.020000, SimpleClassificationPipeline(configuration={
    'balancing:strategy': 'none',
    'classifier:__choice__': 'adaboost',
    'classifier:adaboost:algorithm': 'SAMME',
    'classifier:adaboost:learning_rate': 1.0081104516473922,
    'classifier:adaboost:max_depth': 6,
    'classifier:adaboost:n_estimators': 468,
    'imputation:strategy': 'mean',
    'one_hot_encoding:use_minimum_fraction': 'False',
    'preprocessor:__choice__': 'liblinear_svc_preprocessor',
    'preprocessor:liblinear_svc_preprocessor:C': 1.1828431725901418,
    'preprocessor:liblinear_svc_preprocessor:dual': 'False',
    'preprocessor:liblinear_svc_preprocessor:fit_intercept': 'True',
    'preprocessor:liblinear_svc_preprocessor:intercept_scaling': 1,
    'preprocessor:liblinear_svc_preprocessor:loss': 'squared_hinge',
    'preprocessor:liblinear_svc_preprocessor:multi_class': 'ovr',
    'preprocessor:liblinear_svc_preprocessor:penalty': 'l1',
    'preprocessor:liblinear_svc_preprocessor:tol': 0.0022792606924326923,
    'rescaling:__choice__': 'min/max'})),
 (0.020000, SimpleClassificationPipeline(configuration={
    'balancing:strategy': 'weighting',
    'classifier:__choice__': 'proj_logit',
    'classifier:proj_logit:max_epochs': 11,
    'imputation:strategy': 'median',
    'one_hot_encoding:minimum_fraction': 0.002883367159521145,
    'one_hot_encoding:use_minimum_fraction': 'True',
    'preprocessor:__choice__': 'gem',
    'preprocessor:gem:N': 16,
    'preprocessor:gem:precond': 0.30628439346357783,
    'rescaling:__choice__': 'standardize'})),
 (0.020000, SimpleClassificationPipeline(configuration={
    'balancing:strategy': 'weighting',
    'classifier:__choice__': 'passive_aggressive',
    'classifier:passive_aggressive:C': 0.0036975653885940544,
    'classifier:passive_aggressive:fit_intercept': 'True',
    'classifier:passive_aggressive:loss': 'hinge',
    'classifier:passive_aggressive:n_iter': 326,
    'imputation:strategy': 'mean',
    'one_hot_encoding:use_minimum_fraction': 'False',
    'preprocessor:__choice__': 'kitchen_sinks',
    'preprocessor:kitchen_sinks:gamma': 0.6227804363658538,
    'preprocessor:kitchen_sinks:n_components': 1821,
    'rescaling:__choice__': 'normalize'})),
]
```
### 5. Present Results

* Produce a document that explains:
  - problem
  - solution (final algorithm/features/datasets used)
  - accuracy / speed of solution
  - limitations of solution
* Be sure to list any other insights discovered along the way
![process](images/process.svg)
### Automation is Key

* Create a test harness that:
  - performs feature extraction
  - trains models using many algorithms
  - evaluates models in a rigorous way
* Ensure you can trust that harness
## Summary
Machine learning is all about **automation**.

It's about automatically finding patterns in data...

...and building models to fit that data.

Models that also *generalise*.
Follow the five step process:

1. define problem
2. prepare data
3. spot check algorithms
4. tuning
5. present results

Automate as much as you can, so you can focus on **feature engineering**.
Don't reinvent the wheel. Use the **hundreds of tools** already there.
### Tools ![r](images/r.svg) ![python](images/python.svg) ![java](images/java.svg) ![scala](images/scala.svg) ![sklearn](images/sklearn.svg) ![theano](images/theano.png) ![mdp](images/mdp.png) ![spark](images/spark.png)
### Useful Resources

[Data Mining: Practical Machine Learning Tools and Techniques](http://machinelearningmastery.com/6-practical-books-for-beginning-machine-learning/)

[A Tour of Machine Learning Algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/)

[Jason Brownlee's Process](http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/)

[Introduction to Machine Learning with sci-kit Learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)

[Efficient and Robust Automated Machine Learning](https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf)
### This Presentation

[donaldwhyte.github.io/intro-to-ml/automated](http://donaldwhyte.github.io/intro-to-ml/automated)

[github.com/DonaldWhyte/intro-to-ml/](http://github.com/DonaldWhyte/intro-to-ml)