Wednesday, April 12, 2017

K mean clustering sklearn best practice - Udacity Machine Learning Nanodegree Unsupervised Learning

There are three key k means clustering parameters in sklearn that you will need to pay attention to:

  • Number of centroids, aka center of clusters, initialized
  • Max number of iterations, used to optimize the algorithm. Best practice recommended by Udacity is 300
  • Number of different iterations, with initialization of centroids

Saturday, March 25, 2017

Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera

Recommend this great 13 minutes crystal clear video by Andrew Ng on Coursera explaining the differences between batch gradient descent aka gradient descent aka normal gradient descent versus Stochastic Gradient Descent. It's clear simple and easy to understand without prerequisite. Andrew Ng shows you how the formula differs, how the step by step train strategy differs and a visualization of the trajectory to find global minimum (the center of all the ellipse in his graph).

  • Summary
    • Gradient Descent may have issues when the scale of the data is large
      • If the number of training samples is large
      • Gradient Descent algorithm requires summing over all of m
        • e.g. US population census population of 300MM
    • Stochastic Gradient Descent is a modification of gradient descent
      • In other words, the cost functions are different
    • Stochastic every iteration is faster
    • Steps: randomly shuffle dataset, optimize one training data at a time, improve parameters early one at a time, instead of looking at the examples together as a batch
    • Weakness: generally moves towards global minimum, but doesn't always go there, can reach the general vicinity of the global minimum. Does not converge as nicely as gradient descent. 
    • In reality, practical data science, once it gets close to the global minimum its parameters are good enough. In real life, it works out.

K Means Clustering Unsupervised Learning - Udacity Machine Learning Nanodegree Flash Card

  • Draw a line connecting two centroids and use the half way line as a division line for two hyperplanes (if two clusters). Results vary greatly.
  • Initial positions of centroid can strongly influence result. Different initial positions give completely different results.
  • Analogy "Rubber Band"
  • Center of the cluster is called a centroid
  • Number of centroids at initiation can heavily influence the result. 
  • Great for ... PROS:
  • Bad for ... CONS ... limitations:
    • Hill climbing algorithm.
    • Result depends on initiation
    • If initiation is close to local optima, may be sticky. Never move away. Ignore global optima. Bad initial centroids exist
    • If there are more potential clusters, there are more local optima. Run iterate the algorithm many times to avoid being stuck. 

Thursday, March 23, 2017

Follow my new website - Zero Budget Growth Hacking for Small Businesses

Dear entrepreneurs, small business owners and startup techies, how do you go from zero to one with no marketing budget? I will show you how in my new blog. Here's my background highlighted in the first post

What makes me a special growth hacker? I don't just advertise, I code, hacked and actually took multiple stores, youtube channel, and contents from zero to one.

My Biography
TL;DR Dilys is a social media growth hacker. Dilys' background is the intercept of business, technology, and startup. She has experience working with giant corporations and top YCombinator startups. She contributed to USATODAY, Fast Company, VentureBeat, Crunchies by TechCrunch and was invited to Google social media studies, tech conferences. She ran campaigns to kickstart e-Commerce stores: Chinese Alibaba Taobao 0 to Level 6, eBay 0 to PowerSeller, Shopify 0 to Shopify & Uber partner. She recently took an experimental Youtube partner channel from 0 to 400,000 minutes watched, 0 to 300,000 views, 0 to 900 subscribers in just one month (February 2017 the shortest month too!).

Can't wait to share all my unique experiences as a seller, growth hacker, startup growth person with you. FREE. Just content and some Google ads. That's it. No subscription needed. Follow my blog now.

Wednesday, March 22, 2017

Udacity Digital Marketing Nanodegree Reviews (updating in progress)

This review is updated continuously throughout the program. Yay I just joined the Udacity Nanodegree for Digital Marketing! I am such an Udacity and learning junkie LOL. What grabbed my attention was the line-up of partners, the real world projects and also Avinash Kaushik's presence. I wonder what's the oracle of Google Analytics doing promoting this course.

  • First impression, clean beautiful videos, unlike some of the programming Georgia Tech videos Udacity has
  • The partners really do show up early in the syllabus and seems like they will participate
  • Though jobs are not guaranteed, there are mentions of hiring partners
  • Classmates are young and energetic marketing veterans. Already very active on slack
  • Meet the students use hashtag #ImInDMND on instagram
  • Realworld like non-trivial business cases and owner / user statements
  • What are the projects like? Udacity allows you to use Udacity as a real-world marketing project.
  • Amazing speakers, famous authors and speakers including author of crossing the chasm, avinash kaushik Google Analytics evangelist
  • Mentorship - mentorship is available. My mentor has been unresponsive and unhelpful so far. I do not recommend.
  • Mini interviews with industry giants

K mean clustering sklearn best practice - Udacity Machine Learning Nanodegree Unsupervised Learning

There are three key k means clustering parameters in sklearn that you will need to pay attention to: Number of centroids, aka center of c...