
Saturday, March 25, 2017

Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera

I recommend this great, crystal-clear 13-minute video by Andrew Ng on Coursera. It explains the differences between batch gradient descent (also called gradient descent or normal gradient descent) and stochastic gradient descent: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent It is clear, simple and easy to understand, with no prerequisites. Andrew Ng shows you how the formulas differ and how the step-by-step training strategy differs. He also visualizes the trajectory toward the global minimum, which is the center of all the ellipses in his graph.


  • Summary
    • Gradient Descent may have issues when the scale of the data is large
      • If the number of training samples is large
      • Each batch gradient descent step requires summing over all m training examples
        • e.g. US census data for a population of 300 million (300MM)
    • Stochastic Gradient Descent is a modification of gradient descent
      • In other words, the cost is defined on a single training example instead of being averaged over the whole training set
    • Each stochastic gradient descent iteration is faster, because it looks at only one example
    • Steps: randomly shuffle the dataset, then optimize on one training example at a time. This improves the parameters early, one example at a time, instead of looking at all the examples together as a batch
    • Weakness: it generally moves toward the global minimum but doesn't always get there. It ends up wandering in the general vicinity of the global minimum and does not converge as cleanly as batch gradient descent.
    • In practice, in real-world data science, once the parameters get close to the global minimum they are good enough. In real life, it works out.
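The difference between the two training strategies can be sketched in NumPy for linear regression with a squared-error cost. The function names, learning rates, epoch counts and toy data are assumptions chosen for illustration; this is not code from the lecture. Batch gradient descent makes one update per full pass over all m examples. Stochastic gradient descent shuffles the data and updates after every single example.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, epochs=500):
    """Each update sums the gradient over all m training examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        errors = X @ theta - y                # predictions minus targets, shape (m,)
        theta -= alpha * (X.T @ errors) / m   # one step per full pass over the data
    return theta

def stochastic_gradient_descent(X, y, alpha=0.05, epochs=50, seed=0):
    """Each update uses a single training example."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):          # randomly shuffle the dataset each epoch
            error = X[i] @ theta - y[i]       # error on one example only
            theta -= alpha * error * X[i]     # update immediately
    return theta

# Hypothetical toy data: y = 3 + 2x plus a little noise
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = 3 + 2 * x + rng.normal(0, 0.1, size=200)
X = np.column_stack([np.ones_like(x), x])     # intercept column plus feature

print(batch_gradient_descent(X, y))
print(stochastic_gradient_descent(X, y))
```

Both results should land near intercept 3 and slope 2. The stochastic version keeps jittering around the minimum rather than settling exactly, which matches the weakness described in the summary.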

K Means Clustering Unsupervised Learning - Udacity Machine Learning Nanodegree Flash Card


  • With two clusters, draw a line connecting the two centroids and use its perpendicular bisector (the halfway line) as the boundary dividing the space into two halves. Each point belongs to the nearer centroid. Results can vary greatly.

  • The initial positions of the centroids can strongly influence the result. Different initial positions can give completely different results.

  • Analogy: "Rubber Band"

  • The center of a cluster is called a centroid.

  • The number of centroids chosen at initialization can heavily influence the result.

  • Great for ... PROS:

  • Bad for ... CONS ... limitations:
    • It is a hill-climbing algorithm.
    • The result depends on initialization.
    • If the initial centroids are close to a local optimum, the algorithm may get stuck there and never move away, missing the global optimum. Bad initial centroids exist.
    • The more potential clusters there are, the more local optima there are. Run the algorithm many times with different initializations to avoid getting stuck.
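The hill-climbing behavior and the "run it many times" remedy can be sketched as Lloyd's k-means in NumPy with random restarts. The helper names, the restart count and the blob data are assumptions for illustration. Each run can stop in a local optimum, so the sketch keeps the run with the lowest within-cluster sum of squared errors (SSE). scikit-learn's `KMeans` applies the same idea through its `n_init` parameter.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """One run of Lloyd's k-means; returns (centroids, labels, within-cluster SSE)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # random initial centroids
    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged, possibly only to a local optimum
        centroids = new_centroids
    labels = assign_clusters(X, centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, sse

def kmeans_restarts(X, k, n_restarts=10):
    """Run k-means from several random initializations and keep the lowest-SSE run."""
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Hypothetical toy data: three Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(center, 0.3, size=(50, 2)) for center in ([0, 0], [5, 5], [0, 5])])
centroids, labels, sse = kmeans_restarts(X, k=3)
print(np.round(centroids, 2), round(sse, 2))
```

Comparing the SSE of individual seeds shows how strongly the starting centroids matter on the same data.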

Thursday, March 23, 2017

Follow my new website - Zero Budget Growth Hacking for Small Businesses

Dear entrepreneurs, small business owners and startup techies, how do you go from zero to one with no marketing budget? I will show you how in my new blog. Here's my background highlighted in the first post http://www.matterr.co/2017/03/about-me.html

What makes me a special growth hacker? I don't just advertise: I code, I hack, and I have actually taken multiple stores, a YouTube channel and content from zero to one.

My Biography
TL;DR Dilys is a social media growth hacker. Dilys' background is at the intersection of business, technology and startups. She has experience working with giant corporations and top Y Combinator startups. She has contributed to USA TODAY, Fast Company, VentureBeat and the Crunchies by TechCrunch, and was invited to Google social media studies and tech conferences. She ran campaigns to kickstart e-commerce stores: Chinese Alibaba Taobao from 0 to Level 6, eBay from 0 to PowerSeller, and Shopify from 0 to Shopify & Uber partner. She recently took an experimental YouTube partner channel from 0 to 400,000 minutes watched, 0 to 300,000 views and 0 to 900 subscribers in just one month (February 2017, the shortest month too!).

Can't wait to share all my unique experiences as a seller, growth hacker, startup growth person with you. FREE. Just content and some Google ads. That's it. No subscription needed. Follow my blog now.

Wednesday, March 22, 2017

Udacity Digital Marketing Nanodegree Reviews (updating in progress)

This review is updated continuously throughout the program. Yay, I just joined the Udacity Nanodegree for Digital Marketing! I am such a Udacity and learning junkie LOL. Three things grabbed my attention: the line-up of partners, the real-world projects and Avinash Kaushik's presence. I wonder what the oracle of Google Analytics is doing promoting this course.


  • First impression: clean, beautiful videos, unlike some of the Georgia Tech programming videos on Udacity
  • The partners really do show up early in the syllabus, and it seems like they will participate
  • Though jobs are not guaranteed, there are mentions of hiring partners
  • Classmates are young and energetic marketing veterans, and already very active on Slack
  • Meet the students: use the hashtag #ImInDMND on Instagram
  • Real-world, non-trivial business cases and owner/user statements
  • What are the projects like? Udacity lets you use Udacity itself as the subject of a real-world marketing project.
  • Amazing speakers, including famous authors such as the author of Crossing the Chasm and Avinash Kaushik, the Google Analytics evangelist
  • Mentorship: mentorship is available, but my mentor has been unresponsive and unhelpful so far. I do not recommend it.
  • Mini interviews with industry giants
  • The Facebook Ad project is extremely useful. The real-world project experience can have tangible results, and it is resume-worthy. I got a sizable number of views and conversions, which I am comfortable talking about in future interviews.

Tuesday, March 21, 2017

Udacity Machine Learning Nanodegree - Projects Step by Step Walkthrough High Level Cheat Sheet

High level steps to solve Udacity Machine Learning Nanodegree projects:

  • Import dependencies: numpy, pandas, sklearn, matplotlib
  • Data cleaning:
    • Convert all data to numeric values, such as binary 0 and 1, or scale values to between -1 and 1 or between 0 and 1 (normalization).
    • Replace yes/no binary answers with 1/0
    • Replace categorical data A, B, C with dummy columns |A|B|C|: use 1 if true, 0 if false
  • Split data into features and target aka label
  • Perform initial exploration: load the CSV into a pandas.DataFrame
    • Compute summary stats: mean, counts, etc.
  • from sklearn.<module> import <Model>  # e.g. from sklearn.tree import DecisionTreeClassifier
  • clf = <Model>()  # specify the classifier
  • clf.fit(X_train, y_train)  # fit the model with the training features and labels
  • predictions = clf.predict(X_test)  # make predictions
  • Metrics:
    • R^2 (R squared): great for linear regression; usually 0 to 1, with 1 being the best
  • Errors:
  • This list is under construction
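The high-level steps above can be strung together as follows. The toy DataFrame (standing in for `pd.read_csv(...)`), its column names, the choice of `DecisionTreeClassifier` and the accuracy metric are all hypothetical; swap in the model and metric your project asks for.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical toy data standing in for a project CSV
data = pd.DataFrame({
    "age":     [22, 35, 58, 41, 19, 63, 30, 47, 52, 26],
    "smoker":  ["yes", "no", "yes", "no", "no", "yes", "no", "yes", "no", "no"],
    "plan":    ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "claimed": [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

# Initial exploration: summary stats of the numeric columns
print(data.describe())

# Data cleaning: yes/no -> 1/0, categories -> dummy columns, scale age to 0..1
data["smoker"] = data["smoker"].map({"yes": 1, "no": 0})
data = pd.get_dummies(data, columns=["plan"], dtype=int)
data["age"] = (data["age"] - data["age"].min()) / (data["age"].max() - data["age"].min())

# Split into features and target (label), then into train and test sets
features = data.drop(columns="claimed")
target = data["claimed"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)  # specify the classifier
clf.fit(X_train, y_train)                     # fit on training features and labels
predictions = clf.predict(X_test)             # make predictions
print(accuracy_score(y_test, predictions))    # a classification metric
```

For a regression project, the same skeleton applies with a regressor and `r2_score` instead of `accuracy_score`.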

Sklearn machine learning model cheat sheet
What are the best algorithms to use for each machine learning problem?
Classification versus regression
Supervised versus unsupervised

Saturday, March 18, 2017

Commonly seen python error messages - Learn to code Python for Beginners


  • Python KeyError: 'if dict[key]:' raises a KeyError when the key is missing; change it to 'if key in dict:'
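A small demonstration of the KeyError above and two common fixes. The dictionary contents and the helper `has_positive_count` are hypothetical, made up for illustration. `dict.get` is a standard alternative to the membership test.

```python
counts = {"apple": 3}

# Buggy: indexing a missing key inside the condition raises KeyError
try:
    if counts["banana"]:
        print("have bananas")
except KeyError as err:
    print("KeyError:", err)

# Fix 1: test membership first
if "banana" in counts:
    print("have bananas")

# Fix 2: dict.get returns a default instead of raising
print(counts.get("banana", 0))

def has_positive_count(counts, key):
    """True only if key exists and its value is truthy; never raises KeyError."""
    return key in counts and bool(counts[key])
```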

Pandas Sample Code - Udacity Machine Learning


  • .groupby()
  • .count()
  • pandas.DataFrame.count
  • .sum()
  • df[df["class"]==1].count()["value"]
  • countOfRows = myDataFrame[myDataFrame["conditionColumn"] == "myCondValue"].count()["conditionColumn"]  # get the row count where a column equals a value
  • pandas.Series.map
  • df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
  • pandas.DataFrame.sum
  • df.groupby('a').count()
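  • df.groupby('a').first()  # first row of each group; DataFrame.first() on its own needs a date offset and only works on time-indexed data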
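The snippets above can be exercised on one small DataFrame. The column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "class": [1, 0, 1, 1, 0],
    "value": [10, 20, 30, 40, 50],
    "A": [1, -1, 2, 3, 1],
    "B": [1, 2, -2, 4, 5],
    "C": [1, 3, 3, -1, 6],
})

count_class_1 = df[df["class"] == 1].count()["value"]       # rows where class == 1
value_sum = df["value"].sum()                                # column total
all_positive = df[(df["A"] > 0) & (df["B"] > 0) & (df["C"] > 0)]  # combine conditions with &
per_class_counts = df.groupby("class").count()              # non-null counts per column, per group
labels = df["class"].map({0: "neg", 1: "pos"})               # Series.map with a dict
first_per_class = df.groupby("class").first()               # first row of each group

print(count_class_1, value_sum, len(all_positive))
print(per_class_counts)
print(first_per_class)
```

Note the parentheses around each condition: `&` binds more tightly than `>`, so omitting them raises an error.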

Tuesday, March 14, 2017

Startup small business tax part 4 - miscellaneous calendar dates

Startup or Small Business Tax Deadlines

  • March 15th (+/- 5 days): taxes due for partnership LLCs
  • April 18th: deadlines for corporations
    • Annual Delaware Franchise Tax (if startup is incorporated in Delaware)
    • Annual California Franchise Tax (if startup is incorporated in Delaware and doing business as a foreign entity in California)
    • Statement of Information - California  (if startup is incorporated in Delaware and doing business as a foreign entity in California)


Personal Tax Deadlines

  • Jan 31st (+/- 5 days): W-2 and 1099 forms
    • 1099-DIV, 1099-INT: stock dividends, bank interest. Examples include Scottrade, eTrade, Vanguard
  • April 18th: deadline for personal taxes

Disclaimer: no post on this blog should be considered legal or professional advice. Only CPAs and other professionals, such as certified financial advisors, can provide legal or professional advice. All information is for my personal use and for entertainment purposes only.

Sunday, March 12, 2017

Udacity Machine Learning Nanodegree Bayes Rule Bayesian Analysis Walkthrough

Quiz
Training data: pairs <x_i, d_i>, where d_i = f(x_i) + error (noise).

x  | d | h(x) = x mod 9 | h(x) = x/3 | h(x) = 2
1  | 1 | 1              | 1/3        | 2
3  | 0 | 3              | 1          | 2
6  | 5 | 6              | 2          | 2
10 | 2 | 1              | 10/3       | 2
11 | 1 | 2              | 11/3       | 2
13 | 4 | 4              | 13/3       | 2

Sum of squared errors for each hypothesis (calculated in Excel):


h(x) = x mod 9
sum of squared errors = 12


h(x) = x/3
sum of squared errors = 19.44

h(x) = 2
sum of squared errors = 19

Pick the hypothesis with the smallest sum of squared errors: h(x) = x mod 9.
Or, a better way: write a Python script.
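A minimal Python script replacing the Excel calculation. It uses the quiz data from the table and scores each hypothesis by its sum of squared errors. The dictionary keys and function name are my own labels. When the noise is assumed to be Gaussian, the hypothesis with the smallest squared error is also the maximum likelihood hypothesis.

```python
# Training pairs <x_i, d_i> from the quiz
xs = [1, 3, 6, 10, 11, 13]
ds = [1, 0, 5, 2, 1, 4]

hypotheses = {
    "x mod 9": lambda x: x % 9,
    "x/3": lambda x: x / 3,
    "2": lambda x: 2,
}

def sum_squared_errors(h, xs, ds):
    """Sum over i of (d_i - h(x_i))^2."""
    return sum((d - h(x)) ** 2 for x, d in zip(xs, ds))

sse = {name: sum_squared_errors(h, xs, ds) for name, h in hypotheses.items()}
best = min(sse, key=sse.get)  # hypothesis with the smallest SSE

for name, value in sse.items():
    print(f"h(x) = {name}: SSE = {value:.2f}")  # 12.00, 19.44, 19.00
print("best:", best)                             # best: x mod 9
```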

Saturday, March 11, 2017

R Squared Coefficient of Determination - Machine Learning Concept

Definition of the coefficient of determination: http://stattrek.com/statistics/dictionary.aspx?definition=coefficient_of_determination

R^2 (R squared)

The coefficient of determination is a useful statistic for regression analysis.
It measures how well the model makes predictions.


R^2 usually ranges from 0 to 1.
It can be negative: a model can be arbitrarily worse than simply predicting the mean.
It captures the percentage of squared correlation between the predicted and actual values of the target variable.

It indicates what percentage of the variance in the target variable can be explained by the **features**, using this model.


In scikit-learn: r2_score from sklearn.metrics
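R^2 can be computed by hand with the standard definition R^2 = 1 - SS_res / SS_tot and checked against `r2_score`. The helper name `r_squared` and the sample numbers are made up for illustration. The last two lines show that always predicting the mean scores 0, while a worse model goes negative.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0, 4.2]
y_pred = [2.5, 0.0, 2.1, 7.8, 5.3]
print(r_squared(y_true, y_pred), r2_score(y_true, y_pred))  # the two values agree

print(r_squared(y_true, [np.mean(y_true)] * 5))  # predicting the mean everywhere: 0.0
print(r_squared(y_true, [10] * 5))               # worse than the mean: negative
```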

Wednesday, March 8, 2017

Pandas Numpy Data Analysis Tool Kit - Udacity Machine Learning Nanodegree 01

NumPy is perfect for statistical analysis and matrix manipulation. Learn to Code Notes.
Numpy Documentation
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

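Code pattern 01: use NumPy's array(...).T to get a matrix transpose
Example:
import numpy as np
X = [[1, 2, 3]]            # 2-D: one row, three columns
XT = np.array(X).T         # shape (3, 1); .T on a 1-D array like np.array([1, 2, 3]) returns it unchanged

np.dot(series1, series2)   # dot product of two equal-length arrays or Series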
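A quick check of how `.T` behaves on 1-D versus 2-D arrays, and what `np.dot` returns for vectors and matrices. The array values are arbitrary examples.

```python
import numpy as np

row = np.array([[1, 2, 3]])    # 2-D, shape (1, 3)
col = row.T                    # shape (3, 1)
flat = np.array([1, 2, 3])     # 1-D, shape (3,)

print(row.shape, col.shape, flat.T.shape)  # (1, 3) (3, 1) (3,)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))            # 1*4 + 2*5 + 3*6 = 32

A = np.array([[1, 2], [3, 4]])
print(np.dot(A, A.T))          # matrix product A times its transpose: [[5, 11], [11, 25]]
```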

Pandas Numpy Data Analysis Tool Kit - Udacity Machine Learning Nanodegree 00

SERIES & DATAFRAME

The basic data structures of pandas, used for data analysis in Python

They let users store a large amount of information and perform data analysis on it

Dataframe documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/dsintro.html#dataframe

A DataFrame accepts many kinds of input:
  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • Series
  • Another DataFrame


Sample Code: 

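from pandas import Series, DataFrame
d = {'key_name': Series([1, 2, 3], index=['a', 'b', 'c'])}
df = DataFrame(d)  # one column 'key_name', row labels a, b, c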

Analogy: an Excel spreadsheet
Printing a large DataFrame also reports its number of rows and columns

pandas.Series()
pandas.Series([], index=[])


----

More sample code (Python 3):
import pandas as pd
my_data = pd.DataFrame(data)  # data: a dict or list of records loaded earlier
print(my_data.dtypes)
print()
print(my_data.describe())
print()
print(my_data.head())
print()
print(my_data.tail())


# Retrieve columns
df[['col_name', 'col2_name']]
# Retrieve a row by index label
df.loc['a']

# Filter rows by a condition
df[df['col_name'] >= 30]

Get row and column counts of a pandas DataFrame:
df.shape        # (number of rows, number of columns)
len(df.index)   # number of rows
df.count()      # number of non-null values in each column
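The Series and DataFrame notes above can be tied together in one small example. The column names, index labels and values are made up for illustration. A DataFrame is built from a dict of Series, then columns and rows are selected, rows are filtered, and the table is counted.

```python
import pandas as pd

# Dict of Series: keys become column names, Series indexes become row labels
d = {
    "age": pd.Series([25, 32, 47], index=["a", "b", "c"]),
    "score": pd.Series([88.0, None, 70.5], index=["a", "b", "c"]),
}
df = pd.DataFrame(d)

print(df.dtypes)
print(df.describe())

cols = df[["age", "score"]]   # retrieve columns
row_a = df.loc["a"]           # retrieve a row by label
older = df[df["age"] >= 30]   # filter rows by condition

print(df.shape)               # (3, 2)
print(len(df.index))          # 3
print(df.count())             # non-null values per column: age 3, score 2
```

Note that `df.count()` skips missing values, so it can differ from the row count in `df.shape`.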

Wednesday, March 1, 2017

Startup Tax: How to Get a TurboTax Discount?

Seven easy ways to get a TurboTax discount. A good reason to buy TurboTax? The IRS assumes people make 20% more mistakes when preparing their own taxes. Using TurboTax can potentially reduce audit risk (note: these are just life-hack tips, not professional advice; please consult your tax and legal professionals).

01 Google "turbotax discount" literally

Did you know that you can find discounts by literally googling for them? If you don't ask, you won't receive. The search can take you to a landing page, or to a bulk discount site that gives customers more favorable deals. Get $20 off.

02 Use American Express Discount

Did you know that American Express Offers include a TurboTax discount? It applies only to personal filing, though. Save 5% to 10%.

03 Use Partnership Discounts - Fidelity TurboTax Discount

Some companies offer joint discounts! Fidelity offers a TurboTax discount to its customers. Save $20.

04 Use a Membership Business Toolbox Discount - FounderCard

Memberships like FounderCard are geared toward startup founders. They give discounts on all kinds of products and services, including TurboTax and Moo business cards. Save 10%.

05 Buy TurboTax on Amazon

It's painfully obvious: Amazon offers the steepest discount. TurboTax Business 2016, perfect for C corp startups incorporated in Delaware and doing business in California, is $50 off! Insane. State filings cost extra, though. Don't buy the Delaware version; you can't file Delaware via TurboTax anyway.

Use my Amazon referral for TurboTax for the steepest discount. http://amzn.to/2mNA7TR

06 Buy TurboTax Disc - Hard Copy

Buying online? You are in a hurry, so no discount. Buying a disc? You are probably a real budgeting, accounting person who is price sensitive. Buying a TurboTax disc instead of a digital copy can sometimes save you money. Just keep in mind that you may have to pay more for special and specific filings.

07 Online Merchant Account Discount for TurboTax - eBay

eCommerce platforms like eBay and Etsy have special discount codes for online shops and merchants. Use your subscriber discount for TurboTax and QuickBooks.

Rules of Sudoku for Algorithm Exercises

Need to code a Sudoku solver? Here are three rules of Sudoku: A 9x9 grid, Each row ... Each column ... Each of the 9 3x3 grids (examp...