- 1. Model evaluation and validation
- 1.1 STATISTICAL ANALYSIS
- 1.2 DATA MODEL
- 1.3 EVALUATION AND VALIDATION
- 1.4 MANAGING ERROR AND COMPLEXITY
- 1.5 PROJECT
- 1.3 EVALUATION AND VALIDATION
- 1.3.1 TRAINING AND TESTING
- 1.3.1.1 Benefit of testing
- 1.3.1.2 Train / Test Split in sklearn
- Useful concepts: train_test_split function
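A minimal sketch of the split; the iris data, the 25% test size, and the random_state value are placeholder choices, not from the notes. In current scikit-learn the function lives in sklearn.model_selection (older releases exposed it from sklearn.cross_validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in dataset for illustration only
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```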
- 1.3.2 EVALUATION METRICS
- 1.3.2.1 Metrics
- 1.3.2.2 Classification and Regression
- Useful concepts: Categorical data vs continuous data
- 1.3.2.3 Classification metrics
- Useful concepts: discrete predictions
- 1.3.2.4 Accuracy
- Useful concepts: accuracy = number of items classified or labeled correctly / total number of items; in sklearn, my_model.score(X_test, y_test). Shortcomings of accuracy: it is misleading when the data is skewed toward one class, or when you need to err on the side of innocence or guilt (e.g., the Enron dataset contains only a small number of guilty people, so always predicting "innocent" still scores well)
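A short sketch of computing accuracy in sklearn; the GaussianNB classifier and the iris data are placeholders, not the setup from the notes:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GaussianNB().fit(X_train, y_train)

# Both lines report the same number: the fraction of test items labeled correctly
print(clf.score(X_test, y_test))
print(accuracy_score(y_test, clf.predict(X_test)))
```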
- Picking the Most Suitable Metric
- Concept: information asymmetry
- Confusion Matrix
- Concept: if you care about asymmetry between error types, you may want to shift the decision boundary up or down to include or exclude certain results
- Decision Tree: confusion matrix
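A sketch of producing a confusion matrix for a decision tree; the breast-cancer dataset and default hyperparameters are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels; for a binary problem:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_test, y_pred))
```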
- Precision and Recall
- Equation for Precision
- Concept: precision = true positives / (true positives + false positives)
- Equation for Recall
- Concept: recall = true positives / (true positives + false negatives)
- Precision vs Recall
- F1 Score
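All three metrics are available as sklearn functions; a sketch, again with a placeholder decision tree on placeholder data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_pred = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict(X_test)

# precision = TP / (TP + FP): of everything flagged positive, how much really is positive
print(precision_score(y_test, y_pred))
# recall = TP / (TP + FN): of all true positives, how many were found
print(recall_score(y_test, y_pred))
# F1 = 2 * precision * recall / (precision + recall), the harmonic mean of the two
print(f1_score(y_test, y_pred))
```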
- Regression metrics
- Mean Absolute Error
- Mean Squared Error
- Regression Scoring Function
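A sketch of the regression metrics; the diabetes data and the linear model are placeholder choices. Note that reg.score(X_test, y_test) on an sklearn regressor returns R², which is the default regression scoring function:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

# MAE: average absolute difference between predictions and true values
print(mean_absolute_error(y_test, y_pred))
# MSE: average squared difference; penalizes large errors more heavily
print(mean_squared_error(y_test, y_pred))
# R^2: same value as reg.score(X_test, y_test)
print(r2_score(y_test, y_pred))
```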
- 1.4 MANAGING ERROR AND COMPLEXITY
- Cause of Error
- Error due to Bias
- Linear Learner, Quadratic Data (programming learning curve)
- Error due to Variance - Precision and Overfitting
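A sketch of the linear-learner-on-quadratic-data idea; the synthetic data and the polynomial degrees are made up for illustration. A degree-1 model lacks the representative power for quadratic data (error due to bias, underfitting), while a very high degree can chase the noise (error due to variance, overfitting):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = 1.5 * X.ravel() ** 2 - 2 * X.ravel() + rng.normal(scale=1.0, size=200)  # quadratic + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 2, 12):
    # degree 1 underfits (high bias); higher degrees add representative power
    # but eventually risk overfitting (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, mean_squared_error(y_test, model.predict(X_test)))
```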
- Representative Power of a Model
- 1.1 Curse of Dimensionality
- 1.2 Curse of Dimensionality Two
- Learning Curves and Model Complexity
- 1.1 Learning Curves
- 1.2 Learning Curves II
- 1.3 Ideal Learning Curves
- 1.4 Model Complexity
- 1.5 Learning Curves and Model Complexity
- 1.6 Practical Use of Model Complexity
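A sketch of computing a learning curve with sklearn's learning_curve utility; the digits data, the decision tree, and its max_depth are assumptions. Comparing training and cross-validation scores as the training set grows is what distinguishes high-bias from high-variance behaviour:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Train on growing subsets of the data, cross-validating at each size
train_sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

for n, tr, val in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{int(n):5d} training examples: train={tr:.2f}, cv={val:.2f}")
```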
- Section Syllabus
- Supervised Learning
- Regression
- Continuous supervised learning