Wednesday, April 12, 2017

K-means clustering sklearn best practice - Udacity Machine Learning Nanodegree Unsupervised Learning

There are three key k-means clustering parameters in sklearn that you will need to pay attention to:

  • Number of clusters (n_clusters): how many centroids, aka cluster centers, are initialized
  • Max number of iterations (max_iter) allowed for a single run of the algorithm. Best practice recommended by Udacity is 300, which is also the sklearn default
  • Number of initializations (n_init): how many times the algorithm is run with different initial centroids, keeping the best result
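As a rough sketch, the three parameters correspond to the n_clusters, max_iter and n_init arguments of sklearn's KMeans. The toy points and the specific values below are my own assumptions for illustration, not something from the course.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up): two obvious groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

kmeans = KMeans(
    n_clusters=2,     # number of centroids to initialize
    max_iter=300,     # max iterations for a single run
    n_init=10,        # number of runs with different initial centroids
    random_state=0,   # fixed seed so the run is repeatable
)
labels = kmeans.fit_predict(X)       # cluster index for every point
centers = kmeans.cluster_centers_    # final centroid coordinates
```

The label numbers themselves (0 or 1) are arbitrary; what matters is which points end up sharing a label.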

Saturday, March 25, 2017

Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera

I recommend this great, crystal-clear 13-minute video by Andrew Ng on Coursera explaining the differences between batch gradient descent (aka gradient descent, aka normal gradient descent) and Stochastic Gradient Descent: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent It's clear, simple and easy to understand without prerequisites. Andrew Ng shows how the formulas differ, how the step-by-step training strategies differ, and a visualization of the trajectory towards the global minimum (the center of all the ellipses in his graph).


  • Summary
    • Gradient Descent may have issues when the scale of the data is large
      • If the number of training samples m is large
      • Gradient Descent requires summing over all m examples for every single update
        • e.g. US census data with a population of 300MM
    • Stochastic Gradient Descent is a modification of gradient descent
      • In other words, the cost is computed on one training example at a time instead of over the whole training set
    • Each iteration of Stochastic Gradient Descent is faster
    • Steps: randomly shuffle the dataset, then optimize on one training example at a time, so the parameters start improving early, one example at a time, instead of looking at all the examples together as a batch
    • Weakness: it generally moves towards the global minimum but doesn't always get there; it may only reach the general vicinity of the global minimum. It does not converge as nicely as gradient descent.
    • In practical data science, once the parameters get close to the global minimum they are good enough. In real life, it works out.
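To make the difference concrete, here is a minimal sketch in plain Python, assuming a one-parameter model y = theta * x and made-up data. The function names, learning rate and epoch counts are my own choices, not from the lecture.

```python
import random

# Toy data (made up): y = 2 * x exactly, so the best theta is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def batch_gradient_descent(xs, ys, lr=0.01, epochs=200):
    theta = 0.0
    m = len(xs)
    for _ in range(epochs):
        # One update requires summing over all m examples
        grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / m
        theta -= lr * grad
    return theta

def stochastic_gradient_descent(xs, ys, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)            # randomly shuffle the dataset
        for x, y in data:            # update after every single example
            grad = (theta * x - y) * x
            theta -= lr * grad
    return theta

theta_batch = batch_gradient_descent(xs, ys)
theta_sgd = stochastic_gradient_descent(xs, ys)
```

Because this toy data has no noise, both versions settle very close to 2.0. With noisy data the stochastic version would be expected to wander around the minimum rather than land on it exactly.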

K Means Clustering Unsupervised Learning - Udacity Machine Learning Nanodegree Flash Card


  • Draw a line connecting two centroids and use the line halfway between them (the perpendicular bisector) as the dividing line between the two clusters (if two clusters). Results vary greatly.
  • Initial positions of the centroids can strongly influence the result. Different initial positions can give completely different results.
  • Analogy "Rubber Band"
  • Center of the cluster is called a centroid
  • Number of centroids at initialization can heavily influence the result.
  • Great for ... PROS:
  • Bad for ... CONS ... limitations:
    • Hill climbing algorithm.
    • Result depends on initialization
    • If the initialization is close to a local optimum, the algorithm may get stuck there, never move away, and miss the global optimum. Bad initial centroids exist
    • If there are more potential clusters, there are more local optima. Run the algorithm many times with different initializations to avoid being stuck.
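The dependence on the starting centroids can be seen with a tiny hand-rolled k-means on 1-D points. This is only a sketch under my own assumptions (made-up points, hand-picked starting centroids, a helper named kmeans_1d), not the sklearn implementation.

```python
def kmeans_1d(points, centroids, max_iter=300):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:   # nothing moved: converged or stuck
            break
        centroids = new_centroids
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

points = [0, 1, 10, 11, 20, 21]
bad_centroids, bad_sse = kmeans_1d(points, [0, 1, 15])      # unlucky start
good_centroids, good_sse = kmeans_1d(points, [0, 10, 20])   # better start

# Rerun from several starts and keep the run with the lowest error
starts = [[0, 1, 15], [0, 10, 20], [10, 11, 21]]
best_centroids, best_sse = min((kmeans_1d(points, s) for s in starts),
                               key=lambda result: result[1])
```

The unlucky start gets stuck with two centroids sharing the leftmost pair of points, while the other starts find the three natural groups. Keeping the best of several runs is the same idea as running the algorithm many times.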

Thursday, March 23, 2017

Follow my new website - Zero Budget Growth Hacking for Small Businesses

Dear entrepreneurs, small business owners and startup techies, how do you go from zero to one with no marketing budget? I will show you how in my new blog. Here's my background highlighted in the first post http://www.matterr.co/2017/03/about-me.html

What makes me a special growth hacker? I don't just advertise. I code, I hack, and I actually took multiple stores, a YouTube channel, and content from zero to one.

My Biography
TL;DR Dilys is a social media growth hacker. Dilys' background is at the intersection of business, technology, and startups. She has experience working with giant corporations and top YCombinator startups. She contributed to USATODAY, Fast Company, VentureBeat, and the Crunchies by TechCrunch, and was invited to Google social media studies and tech conferences. She ran campaigns to kickstart e-Commerce stores: Chinese Alibaba Taobao 0 to Level 6, eBay 0 to PowerSeller, Shopify 0 to Shopify & Uber partner. She recently took an experimental YouTube partner channel from 0 to 400,000 minutes watched, 0 to 300,000 views, and 0 to 900 subscribers in just one month (February 2017, the shortest month too!).

Can't wait to share all my unique experiences as a seller, growth hacker, startup growth person with you. FREE. Just content and some Google ads. That's it. No subscription needed. Follow my blog now.

Wednesday, March 22, 2017

Udacity Digital Marketing Nanodegree Reviews (updating in progress)

This review is updated continuously throughout the program. Yay, I just joined the Udacity Nanodegree for Digital Marketing! I am such a Udacity and learning junkie LOL. What grabbed my attention was the line-up of partners, the real-world projects and also Avinash Kaushik's presence. I wonder what the oracle of Google Analytics is doing promoting this course.


  • First impression: clean, beautiful videos, unlike some of the Georgia Tech programming videos Udacity has
  • The partners really do show up early in the syllabus and seem like they will participate
  • Though jobs are not guaranteed, there are mentions of hiring partners
  • Classmates are young and energetic marketing veterans. Already very active on Slack
  • Meet the students: use hashtag #ImInDMND on Instagram
  • Real-world-like, non-trivial business cases and owner / user statements
  • What are the projects like? Udacity allows you to use Udacity as a real-world marketing project.
  • Amazing speakers and famous authors, including the author of Crossing the Chasm and Avinash Kaushik, Google Analytics evangelist
  • Mentorship - mentorship is available. My mentor has been unresponsive and unhelpful so far. I do not recommend it.
  • Mini interviews with industry giants

Tuesday, March 21, 2017

Udacity Machine Learning Nanodegree - Projects Step by Step Walkthrough High Level Cheat Sheet

High level steps to solve Udacity Machine Learning Nanodegree projects:

  • Import dependencies: numpy, pandas, sklearn, matplotlib
  • Data cleaning:
    • Replace all data with numeric values, such as binaries 0 and 1, or scale values down to between -1 and 1, or 0 and 1 (normalization).
    • Replace yes/no binary answers with 1,0
    • Replace categorical data A, B, C with dummy columns |A|B|C| use 1 if true, 0 if false
  • Split data into features and target aka label
  • Perform initial exploration, turn the CSV data into a pandas.DataFrame
    • Compute summary stats: mean, counts etc.
  • from sklearn import tree #import the module that holds your model, e.g. tree
  • clf = tree.DecisionTreeClassifier() #specify the classifier
  • clf.fit( ... ) #fit the model with the training data
  • clf.predict() #make predictions
  • Metrics:
    • R^2 (R squared) - great for linear regression; typically 0 to 1, 1 being the best
  • Errors:
  • This list is under construction
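The steps above can be strung together roughly like this. It is only a sketch: the DataFrame stands in for a project CSV, and the column names and the choice of LinearRegression are my own assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical toy data standing in for a project CSV
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "tutor": ["yes", "no", "yes", "no", "yes", "no"],
    "group": ["A", "B", "C", "A", "B", "C"],
    "score": [12, 20, 32, 40, 52, 60],
})

# Data cleaning: yes/no -> 1/0, categories -> dummy columns of 1s and 0s
df["tutor"] = df["tutor"].map({"yes": 1, "no": 0})
df = pd.get_dummies(df, columns=["group"], dtype=int)

# Split into features and target (label)
features = df.drop(columns=["score"])
target = df["score"]

print(df.describe())                   # summary stats

clf = LinearRegression()               # specify the model
clf.fit(features, target)              # fit on features and labels
predictions = clf.predict(features)    # make predictions
r2 = r2_score(target, predictions)     # R^2 metric
```

In a real project you would hold out a test set (for example with train_test_split) instead of scoring on the training data.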

Sklearn machine learning model cheat sheet
What are the best algorithms to use for each machine learning problem?
Classification versus regression
Supervised versus unsupervised

Saturday, March 18, 2017

Commonly seen python error messages - Learn to code Python for Beginners


  • Python KeyError: writing if dict[key]: raises a KeyError when the key is missing. Change it to if key in dict:
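A small sketch of the safe patterns, using a made-up dictionary:

```python
ages = {"alice": 30}

# This would raise KeyError because "bob" is not a key:
# if ages["bob"]: ...

# Safe membership test
if "bob" in ages:
    message = "bob is " + str(ages["bob"])
else:
    message = "bob not found"

# dict.get returns a default instead of raising
bob_age = ages.get("bob", 0)
```

Use the membership test when a missing key needs its own branch, and dict.get when a default value is enough.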

Pandas Sample Code - Udacity Machine Learning


  • .groupby()
  • .count()
  • pandas.DataFrame.count
  • .sum()
  • df[df["class"]==1].count()["value"]
  • countOfColumn = myDataFrame[myDataFrame["conditionColumn"]=="myCondValue"].count()["conditionColumn"] gets the row count by column condition and value
  • pandas.Series.map
  • df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
  • pandas.DataFrame.sum
  • df.groupby('a').count()
  • df.first()
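A few of the snippets above, run against a small made-up DataFrame (the column names and values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "class": [1, 1, 0, 1],
    "value": [10, 20, 30, 40],
    "a":     ["x", "y", "x", "y"],
})

# Row count for a condition on one column
rows_in_class_1 = df[df["class"] == 1].count()["value"]

# Combine conditions with & and parentheses
both_conditions = df[(df["class"] > 0) & (df["value"] > 15)]

# Aggregate per group
sum_by_a = df.groupby("a")["value"].sum()

# Map values of a Series through a dictionary
labels = df["class"].map({1: "yes", 0: "no"})
```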

Tuesday, March 14, 2017

Startup small business tax part 4 - miscellaneous calendar dates

Startup or Small Business Tax Deadlines

  • March 15th +/- 5 days tax due for partnership LLCs
  • April 18th deadlines for corporations
    • Annual Delaware Franchise Tax (if startup is incorporated in Delaware)
    • Annual California Franchise Tax (if startup is incorporated in Delaware and doing business as a foreign entity in California)
    • Statement of Information - California  (if startup is incorporated in Delaware and doing business as a foreign entity in California)


Personal Tax Deadlines

  • Jan 31st +/- 5 days  W2 and 1099
  • April 18th deadlines for personal tax

Disclaimer: no post on this blog should be considered legal or professional advice. Only CPAs, professionals, and certified financial advisors can provide legal or professional advice. All information is for my personal use, and for entertainment purposes only.

Sunday, March 12, 2017

Udacity Machine Learning Nanodegree Bayes Rule Bayesian Analysis Walkthrough

quiz
<xi, di>
di = f(xi) + err
x    d    h(x) = x mod 9    h(x) = x/3    h(x) = 2
1    1    1%9 = 1           1/3           2
3    0    3%9 = 3           1             2
6    5    6%9 = 6           2             2
10   2    10%9 = 1          10/3          2
11   1    11%9 = 2          11/3          2
13   4    13%9 = 4          13/3          2

sum of squared errors for each (excel calc)


h(x) = x mod 9
sum of squared errors = 12


h(x) = x/3
sum of squared errors = 19.44

h(x) = 2
sum of squared errors = 19

Use the hypothesis with the smallest sum of squared errors, here h(x) = x mod 9
Or better way: write a python script
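A minimal version of that script, using the (x, d) pairs from the quiz table. The variable and function names are my own.

```python
# (x, d) pairs from the quiz table
data = [(1, 1), (3, 0), (6, 5), (10, 2), (11, 1), (13, 4)]

# The three candidate hypotheses
hypotheses = {
    "x mod 9": lambda x: x % 9,
    "x / 3": lambda x: x / 3,
    "2": lambda x: 2,
}

def sum_squared_errors(h):
    """Sum of (d - h(x))^2 over all data points."""
    return sum((d - h(x)) ** 2 for x, d in data)

errors = {name: sum_squared_errors(h) for name, h in hypotheses.items()}
best = min(errors, key=errors.get)

for name, sse in errors.items():
    print(name, round(sse, 2))
print("best:", best)
```

This reproduces the spreadsheet numbers: 12 for x mod 9, about 19.44 for x/3 and 19 for the constant 2.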

Saturday, March 11, 2017

R Squared Coefficient of Determination - Machine Learning Concept

coefficient of determination: http://stattrek.com/statistics/dictionary.aspx?definition=coefficient_of_determination

R^2

coefficient of determination
a useful statistic for regression analysis
measures how well the model makes predictions.


R^2 usually ranges from 0 to 1
it can be negative, because a model can be arbitrarily worse than always predicting the mean
for linear regression it equals the squared correlation between the predicted and actual values of the target variable

indicates what percentage of the variance in the target variable can be explained by the features, using this model.


from sklearn.metrics import r2_score
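To see where the numbers come from, here is a hand-rolled sketch of the usual definition, R^2 = 1 - SS_res / SS_tot, on made-up values. The function name r_squared is my own; in practice r2_score does this for you.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(actual, [1.0, 2.0, 3.0, 4.0])    # every prediction exact
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])      # predictions reversed
```

Perfect predictions score 1, always predicting the mean scores 0, and the reversed predictions score below 0, which is how R^2 can go negative.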

Wednesday, March 8, 2017

Pandas Numpy Data Analysis Tool Kit - Udacity Machine Learning Nanodegree 01

NumPy is perfect for statistical analysis and matrix manipulation. Learn to Code Notes.
Numpy Documentation
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

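Code pattern 01: use numpy array().T to get the matrix transpose. Note that .T has no effect on a 1-D array, so use a 2-D array (a nested list).
Example:
import numpy as np
X = [[1,2,3]]        # 1 x 3 matrix
XT = np.array(X).T   # 3 x 1 matrix

numpy.dot(series1, series2) computes the dot product of the two series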
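A quick sketch showing the shapes involved, with made-up numbers. It also shows that transposing a flat 1-D array changes nothing, which is an easy trap.

```python
import numpy as np

row = np.array([[1, 2, 3]])    # 2-D, shape (1, 3)
col = row.T                    # shape (3, 1)

flat = np.array([1, 2, 3])     # 1-D, shape (3,)
flat_t = flat.T                # still shape (3,): .T is a no-op on 1-D arrays

# Dot product of two 1-D arrays: 1*4 + 2*5 + 3*6
dot = np.dot(flat, np.array([4, 5, 6]))
```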

Pandas Numpy Data Analysis Tool Kit - Udacity Machine Learning Nanodegree 00

SERIES & DATAFRAME

The basic data structures of pandas, for data analysis using Python

Allows users to store a large amount of information and perform data analysis

Dataframe documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/dsintro.html#dataframe

A DataFrame can be built from many kinds of input, most commonly a dictionary:
  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • Series
  • Another DataFrame


Sample Code: 

from pandas import Series
d = {'key_name':Series([1,2,3], index=['a','b','c'])}

Analogy : Excel Spreadsheet
Will also return number of rows and columns

pandas.Series()
pandas.Series([],index=[])


----

More sample code:
    import pandas as pd
    my_data = pd.DataFrame(data)  # data: a dict or list of records loaded earlier
    print(my_data.dtypes)
    print()
    print(my_data.describe())
    print()
    print(my_data.head())
    print()
    print(my_data.tail())


# Retrieve columns
df[['col_name','col2_name']]
# Retrieve rows
df.loc['a']


df[df['col_name'] >= 30]

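Get row and column counts of a pandas DataFrame
.shape returns (number of rows, number of columns)
len(DataFrame.index) returns the number of rows
.count() counts the non-null values in each column of the entire table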
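Putting the Series, DataFrame, selection and counting notes together in one small sketch. The column names and values are made up for illustration.

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([10.0, 20.0], index=["a", "b"]),   # "c" is missing on purpose
}
df = pd.DataFrame(d)

n_rows, n_cols = df.shape       # rows and columns
n_rows_again = len(df.index)    # rows only
non_null = df.count()           # non-missing values per column

row_a = df.loc["a"]             # select a row by its index label
big = df[df["one"] >= 2]        # keep rows where a column meets a condition
```

Note that .count() skips the missing value in column "two", so it can be smaller than the row count.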

Wednesday, March 1, 2017

Startup Tax How to get Turbotax Discount?

7 easy steps to get a TurboTax discount. A good reason to buy TurboTax? The IRS assumes people make 20% more mistakes when preparing their own taxes. Using TurboTax can potentially reduce audit risk (note: just lifehack tips, not professional advice; please consult your tax and legal professionals).

01 Google "turbotax discount" literally

Did you know that you can find discounts by literally googling for them? If you don't ask, it won't be given. The search can take you to a landing page, or a bulk discount site that gives customers more favorable deals. Get $20 off.

02 Use American Express Discount

Did you know that American Express Offer has a TurboTax discount? It is only for personal filing, though. Save 5% to 10%.

03 Use Partnership Discounts - Fidelity TurboTax Discount

Some companies offer joint discounts! Fidelity offers a TurboTax discount to its customers. Save $20.

04 Use a Membership Business Toolbox Discount - FounderCard

Memberships like FounderCard are geared towards startup founders and users. They give discounts on all kinds of products and services, including TurboTax and Moo Business Cards. Save 10%.

05 Buy TurboTax on Amazon

It's painfully obvious that Amazon offers the steepest discount. TurboTax Business 2016, perfect for C corp startups incorporated in Delaware and doing business in California, is $50 off! Insane. State filings cost extra, though. Don't buy the Delaware one. You can't file via TurboTax anyway.

Use my Amazon referral for TurboTax for the steepest discount. http://amzn.to/2mNA7TR

06 Buy TurboTax Disc - Hard Copy

Buying online? You are in a hurry. No discount. Buying a disc? You are probably a real budgeting, accounting person who is price sensitive. Buying a TurboTax disc instead of a digital copy can sometimes save you money. Just keep in mind, you may have to pay more for special and specific filings.

07 Online Merchant Account Discount for TurboTax - eBay

eCommerce platforms like eBay and Etsy have special discount codes for online shops and merchants. Use your subscriber discount for TurboTax and QuickBooks.

Saturday, February 11, 2017

Udacity Machine Learning Udacity Connect Lesson 01 Syllabus

In-person Udacity Connect Machine Learning Nanodegree syllabus
  • Practice running python from within a Jupyter Notebook (FKA IPython Notebook).
  • Become familiar with importing useful modules and packages, e.g. pandas, numpy, matplotlib.pyplot.
  • Learn about the pandas data structures, including the Series and DataFrame objects.
  • Create a DataFrame object from data in a comma-separated values (csv) file using pandas.read_csv
  • Index and select data from Series and DataFrame objects using loc and iloc
  • Compute descriptive statistics on a Series or DataFrame, including the mean, the median, and the min & max
  • Explore a public data set found on Kaggle
  • Conduct some exploratory data analysis, and visualize trends in data using matplotlib

Pre Lesson Activities
  • Student Handbook
  • Class schedule and holidays
    • It's an aggressive schedule
  • Logistics
  • Github repo
Lesson 1 in-person activity
  • Meet classmates
  • Meet and greet
  • First lunch is provided. No free lunches in future sessions


Friday, February 10, 2017

Seven Extraordinary Startup Founder Stories in Tweetable Sizes

Seven Power Hustle Stories about Early Startups

Andrew Chen wrote in his recent essay that the key to the future of growth is to execute (growth tactics) thoughtfully and iteratively. Ingenious moves of growth hacking could come from luck, a sparkle, a clever thought or simply persistence. You can call it hustling, bootstrapping, hacking. There are some crazy stories. Paul Graham calls it doing things that don't scale. Here are seven founders who did the unthinkable at the dawn of their startups in byte-size stories:
  1. Ben S. #founder of @pinterest used to sneak into Palo Alto Apple Store & change safari homepages to Pinterest until he’s kicked out. #hustle
  2. Adora C. brushed teeth at McDonald’s to save $. Learned how to clean like a maid to found billion dollar HomeJoy on-demand cleaning service.
  3. @stripe #founder brothers would personally implement Stripe on #YC batchmate’s computers to get other developers to try the product
  4. @Codecademy never stopped pivoting at #YC its interactive #javascript tutorial wasn’t live until the night before #startup #demoday
  5. #Reddit #founder once created fake accounts to boost user number in the early days of reddit
  6. Founder of Muse #millennial #career site was flagged as a spammer on Gmail after sending out many cold emails in an effort to up user count
  7. @Airbnb founders turned their loft apartment into 1st Airbnb listing to fund rent & the #startup. Ditched Craigslist for being “impersonal”
Comment and favorite if you want to read more byte-size hustle stories like these. Psst, all the stories are perfectly tweetable.
Originally published on Medium

Wednesday, February 8, 2017

Machine Learning K Nearest Neighbors KNN Algorithm


KNN: the k nearest neighbors to a given point
k: the number of neighbors
n: the number of data points; data is sorted to speed up the algorithm
distance: Euclidean distance (the shortest line connecting the points) OR a custom function that defines the distance
visualize: scatter plot, each point is a circle with a radius that includes a certain number of neighbors
KNN: a query intensive algorithm, not learning intensive
performance: big O notation; log(n) is binary search, 1 is constant, n is linear
intuition: KNN stores all the data, then performs a binary search on the data when querying. Linear regression only stores the model y = mx+b. Key concepts: LEARN vs QUERY

Algorithm            Phase       Running Time    Space
1 NN                 learning    1               n
1 NN                 query       log(n)          1
K NN                 learning    1               n
K NN                 query       log(n) + k      1
linear regression    learning    n               1
linear regression    query       1               1

1 NN: 1-nearest neighbor on a 1 dimensional list e.g. [1 2 4 7 8 9]. KNN learning sends all data to storage without learning, so running time is 1, which means constant in Big O notation, and storage space is n for the number of data points.
1 NN query: log(n) binary search to find one point.
K NN query: log(n) + k, a binary search of log(n) to find one point plus the k items next to it in a sorted list.
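The LEARN vs QUERY idea for a sorted 1 dimensional list can be sketched with Python's bisect module. This is my own illustration of the log(n) + k query (the helper name k_nearest_1d is hypothetical), and it assumes the data is already sorted.

```python
import bisect

def k_nearest_1d(sorted_data, query, k):
    """Return the k values in sorted_data closest to query."""
    # Binary search: O(log n) to find where the query would sit
    right = bisect.bisect_left(sorted_data, query)
    left = right - 1
    neighbors = []
    # Walk outward from that position, taking the closer side each time: O(k)
    while len(neighbors) < k and (left >= 0 or right < len(sorted_data)):
        take_left = right >= len(sorted_data) or (
            left >= 0
            and query - sorted_data[left] <= sorted_data[right] - query
        )
        if take_left:
            neighbors.append(sorted_data[left])
            left -= 1
        else:
            neighbors.append(sorted_data[right])
            right += 1
    return neighbors

data = [1, 2, 4, 7, 8, 9]              # "learning" is just storing the data
one_nn = k_nearest_1d(data, 5, 1)      # nearest single neighbor of 5
three_nn = k_nearest_1d(data, 5, 3)    # three nearest neighbors of 5
```

All the work happens at query time, which is why KNN is described as query intensive rather than learning intensive.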

Monday, February 6, 2017

11 Things You Can Do at the Crunchies 2017

Hot off the press, use your super powers and unleash your inner extrovert at the Crunchies award ceremony today in San Francisco! adapted from Medium

  1. Meet famous founders, tech investors and journalists such as Mark Zuckerberg, Dave McClure and Ron Conway. Kungfu hustle for your budding tech product in after party open bar
  2. Take fabulous runway pictures at this tech oscars on the green carpet
  3. Get advice and mentorship from fellow founders, YCombinator and 500 Startups alumni and tech workers
  4. Find inspiration and actionables for your startup
  5. Trend spotting and policy crunching.
  6. Indulge in the sparkles and glamours of tech giants like Kevin of Instagram, Pinterest, Mark Zuckerberg of Facebook, and famous investors like Ron Conway. It’s hard to resist a selfie even if it is mildly impressing.
  7. Take a selfie. Take lots of selfies.
  8. Chuckle at occasional tech nerd humor squeezed into the ceremony by guest artists, hosts, comedians and even opera singers. Past guest artists include the Daily Show actors.
  9. Drink and hustle. There is a bar with flowing champagnes and cocktails that can give you a boost and bring out your networking inner extrovert. Prep dinero, it ain’t no free ride.
  10. This year’s nominees include SpaceX and Pokemon GO! Expect to see Elon Musk and the Niantic studio in the crowd?
  11. Network with reporters from VentureBeat, Mashable, TechCrunch … for your baby startup?

What is the Crunchies award ceremony?

Here’s a whimsical opera intro to Crunchies! Enjoy



In short, it is the Oscars for the tech community, where all the startups, tech giants and inner circles meet. Like HBO’s Silicon Valley show? You will likely really enjoy this award ceremony. But if you are not hustling in tech, this event may be nothing more than a selfie opportunity.

Crunchies’ Past

In the past Crunchies, people arrived at the events in style in their fabulous outfits and with their tech gadgets. Travis of Uber had his pampered lap dog with him. People showed off their Google Glasses and Tesla in the past. This year, expect Snap to make a splash with their unique personality and Snap Spectacle glasses before their big IPO.
Past guest speakers included Mayor Ed Lee and Ron Conway “the Godfather of Silicon Valley”. They provide great insights for the future of the Silicon Valley.

My Crunchies’ Past

Fun fact, I shared the stage with Mark Zuckerberg, Marissa Mayer and Kevin Systrom of Instagram at the 2012 Crunchies award ceremony. I was at 2 YC startups. Dave McClure of 500 Startups and I follow each other on Twitter :P

Saturday, February 4, 2017

Growth Hacker Tool - Google Mobile Friendliness Online Tester

Building a website? Launching a startup? Make sure Google can find you and wants to find you. One key search metric is mobile responsiveness and friendliness. Use this official Google tool to test your website.
https://search.google.com/search-console/mobile-friendly

Friday, February 3, 2017

Machine Learning 101 Resources, Lessons and Tips

Thinking about getting started with Machine Learning? Silicon Vanity is your go-to resource on the Learn-to-Code movement #learntocode, tutorials and the hottest job trends in the Silicon Valley.  Educational tech is dear to my heart. Here are some resources to get you started.


  • Udacity Machine Learning Nanodegree.
  • Udacity Machine Learning Nanodegree Udacity Connect Intensive. An in-person, intensive bootcamp version of the full Nanodegree. Github workbooks.
  • Udacity Deep Learning Nanodegree. Created in collaboration with YCombinator startup founder / member Siraj, who has his own Machine Learning Youtube show. 
  • Coursera Machine Learning
  • Stanford Machine Learning course on Youtube
  • Khan Academy Machine Learning
  • Machine Learning competition on Kaggle. Kaggle is a great place for datasets and competitions. Talking about competitions, Alibaba sponsored a customer flow competition on its Koubei product.
  • Udacity Machine Learning notes, slides, forum, online one-on-one mentoring 
  • Books Machine Learning for Dummies

The above resources are more academic than practical. Udacity has tried to marry practicality, industry requirement with academic coursework. The course is still in its early stage of becoming beginner friendly.

Is there a resource that you would recommend? Please share with me. 

Sunday, January 29, 2017

12 Silicon Valley Tech Startup Job Search Tips


  1. Indicate on Linkedin that you are an available candidate, a service offered by their premium plan
  2. Consider having a portfolio, works and arts to show rather than just a resume
  3. Previous startup experience is a huge plus. Startup founders and teams look for like minded people who can hustle and deal with the startup crunch
  4. Check the job board for alumni offered by your university. 
  5. Google and read information about the company on the internet. Any info on the internet is fair game in the interview process.
  6. Prepare for phone screening interviews with recruiters. Sometimes these calls are scheduled quickly after resume submission when a position needs to be filled.
  7. Be ready for coding and technical interviews over the phone. Silicon Valley is tech savvy; even recruiters know how to ask a technical question or two.
  8. Brush up on software and web based technical skills. Silicon Valley is very software heavy except for Apple and a few other places.
  9. If you are applying for a data analyst job, be prepared to be interviewed like a junior data scientist. Even the business roles can be technical in nature.
  10. Use new job sites such as White Truffle, Hired, Muse ... in addition to traditional sites like Linkedin, job boards etc.
  11. Take an advanced online course to brush up your skills or learn cutting edge technology such as self driving car on Udacity.
  12. Research the technical stack used by the startup

Machine Learning Resources: Stanford Youtube Machine Learning by Andrew Ng

All of Stanford's machine learning course by Andrew Ng (not the Coursera version) is posted on YouTube. Here's the course material site: http://cs229.stanford.edu/ and here's the video playlist: https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599 This lecture series is a full-length academic course on Machine Learning with full Stanford University rigor. Its machine learning slides can be found here: http://cs229.stanford.edu/materials.html

Hugo Barra Googler Xiaomi VP Joins Facebook VR - Startup News

Hugo Barra, once a senior Googler and Xiaomi global VP, will join Facebook and lead all Facebook VR efforts.

After spending years at Google, and rumored to have been ousted due to personal and leadership differences with the Google founders, Hugo Barra spent many years building Xiaomi under founder Lei Jun in China. Xiaomi has since developed many hardware products and viral phone strategies, and expanded to India.

Hugo will lead and shape the vision of Oculus VR at Facebook. He may not be as famous as Steve Jobs, but he is essentially one of the most experienced tech product managers and leaders in Silicon Valley and the world. Facebook will need his experience scaling giant tech companies as well as dealing with China, a market that Facebook's founder Mark Zuckerberg has been courting.

iOS prototyping design tool: prototyping notepad

Google UX UI prototyping design team and many silicon valley startups use paper prototyping tools such as this iOS screen real estate user flow notepad.

Saturday, January 28, 2017

Machine Learning Stanford on Youtube Lecture 02 Notes



  • Agenda: linear regression, gradient descent and normal equations
  • Machine learning notations and conventions
Note this is not the Coursera course. This is the long YouTube version of the Stanford course.

Seth Godin Marketing Wisdom quote on story telling

Seth Godin's marketing class on Skillshare, by the bestselling author in business and marketing. Growth hacking marketing tips and quotes by Seth Godin.
"
Marketing and advertising were the same thing, but going forward that's not what marketing is. 

Marketing is the act of telling a story to people who want to hear it, making that story so vivid so true, that people who hear it tell other people.
"

Reaching Level 30 on Pokemon GO: get lots of items


Items you can receive upon reaching level 30. 30 ultra balls, 20 max potion, 20 max revive, 20 razz berry, 3 incense, 3 lucky egg, 3 egg incubator, 3 lure module

Monetize WordPress How to add adsense to free WordPress site


How to add a Google AdSense ad to a free WordPress site, meaning a website that is not on advanced hosted WordPress or a custom domain. It used to be easy: simply generate an ad unit in AdSense, then copy and paste the code into a text widget in the WordPress layout sidebar. That no longer works. To monetize your WordPress site, you will now have to subscribe to the $8 premium plan to use WordAds. Keep in mind that you want to reach a small critical mass of audience on your blog first; only then does it make monetary sense to monetize your site.

Tuesday, January 24, 2017

It's real google will pay you to do what you love - monetize your blog and YouTube channel


Monetize your blog and YouTube channel

It's true it's real google will actually pay you to do what you love the most. I have been blogging about learning to code, japan, and Pokémon GO and I just got paid by google - my first ad revenue is real and in the bank. How did I do it?


  • Write about what you love. Your niche may be small but the authenticity of your writing, opinions and details makes a huge difference. My blog audience is small, in the hundreds and thousands, but because my posts are relevant, users end up enjoying the ads they are served (Udacity, Amazon Web Services). Despite not being able to earn much from CPM (based on 1000s of impressions), my blog's click-through rate is high. CPC is a valuable earning 
  • Be detail oriented. I am no internet sensation, I am not viral. How do I deliver value to my readers? By being thorough and detailed in my presentation of observations and the follow up analyses.
  • Be compliant, be 100% compliant. When Google AdSense first came out, I was young and had an account that I played with. I tested my site, clicked on the ads, and was blocked forever. I probably could contact them to get unblocked because it has been 10 years, but the reality is grim: Google is hard to reach and bans are permanent to deter opportunists. They really mean it. If you are not compliant, display fraudulent info, violate copyright on YouTube, or click on your own ads, Google's algorithm will find you, terminate your entire account and stop monetization on all channels. Google's algorithm has proven to be intelligent (machine learning), complex (deep learning) and capable (big data). You will be caught.
  • Follow a schedule. Smartly timed intervals of updates is desirable. No one wants to be overwhelmed by spam or visit your site or channel and find nothing new 3 times in a row
  • Have a brand or a style. All successful viral influencers have a brand or a style that is extremely distinctive. Think Justin Bieber, he has a very specific hair cut and look, and he only sings certain cheesy songs. But that makes him extremely recognizable. Bread girl on Instagram has a repeatable viral machine : putting her face in different types of bread. While the safety and the usefulness of such an act is questionable, its success is very repeatable. It's a machine! We may not have that money making machine but we can think and design our brand message. For me, this blog is about Silicon Valley tech lifestyle so it has gadgets, Silicon Valley jobs, startup tax and logistics. I cannot post food recipes here unless it is about how to make a Star Wars cake. Clam chowder is irrelevant here and will make my readers question my blog. It's 
  • Analyze what works. My Pokémon GO posts on Blogger generate thousands of views but no one watches my YouTube Pokémon GO videos. So I have to post more Pokémon on the blog and focus more on learn to code tutorials on my YouTube channel. Find what works, optimize, then repeat the success and make it better

Friday, January 20, 2017

Machine Learning code pattern 01

Code pattern 01: use numpy array().T to get the matrix transpose (.T has no effect on a 1-D array, so use a 2-D array)

Example:

import numpy as np
X = [[1,2,3]]
XT = np.array(X).T


Machine Learning Concepts 01

 There are three main machine learning styles:


  1.  Supervised learning 
  2.  Unsupervised learning
  3.  Reinforcement learning

Wednesday, January 18, 2017

Udacity Machine learning Nanodegree Syllabus and Summary

Udacity nanodegree - becoming a machine learning engineer. These are my personal notes summarizing what I learned from the section; consider them my personal study notes. The part I labeled syllabus is the actual outline of the course (e.g. I try to use the section title as the syllabus section title)

Section supervised learning
Sub section artificial neural networks
Sub section neural networks
     How does a neuron work (illustrated)
     How an artificial neural network works (illustrated): the perceptron
     A group of inputs, each with a weight, is processed by the network, which outputs 1 if and only if a preconfigured threshold is met
     Artificial neural networks can be tuned

  • Think of inputs as signals with different strengths, and weights as sensitivity to those strengths. Weights can be tuned and adjusted to compute a variety of tasks. Hence a collection of perceptron computing units is powerful
  • The weighted sum of all the inputs is called the activation
  • When the activation meets or exceeds the threshold theta, the perceptron outputs 1
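A perceptron unit can be sketched in a few lines of plain Python. The weights and thresholds below are my own picks to make AND and OR gates, and I am assuming that meeting the threshold exactly counts as firing.

```python
def perceptron(inputs, weights, theta):
    """Output 1 if the weighted sum of the inputs reaches theta, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= theta else 0

cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Same weights, different thresholds: tuning changes the task computed
and_outputs = [perceptron(c, [0.5, 0.5], theta=0.75) for c in cases]
or_outputs = [perceptron(c, [0.5, 0.5], theta=0.25) for c in cases]
```

With the higher threshold both inputs must be on before the unit fires (AND); with the lower threshold one input is enough (OR).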

Friday, January 13, 2017

Data Science for Business by Provost Fawcett data science book review


  • Interesting business approach to understand and apply data science
  • Uses verbose text when explaining simple concepts
  • Introduces advanced concepts like Information Gain and Entropy early in the book pg51. Less accessible for beginners. Good for people preparing for interviews and people who already got an introduction to machine learning
  • Writes the full formula out and avoids sigma notation to make it more accessible for business readers
  • Useful graph of entropy pg 52
  • Helpful graphical illustration pg54-pg55 two trees with different information gain
  • The learning curve is not a slowly ascending one. The topics jump around in this book
  • Variance measures impurity pg56

Thursday, January 12, 2017

Udacity Machine Learning Nanodegree Linear Regression sample code

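from sklearn import linear_model

# Toy data for illustration only: y = 2x + 1
X = [[0], [1], [2]]
y = [1, 3, 5]

reg = linear_model.LinearRegression()
reg.fit(X, y)          # coef_ and intercept_ only exist after fitting
print(reg.coef_)       # learned slope(s)
print(reg.intercept_)  # learned intercept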

Codecademy SQL Table Transformation Subqueries Walkthrough


  • 1. Table Transformation
    • SELECT * FROM flights WHERE origin IN (SELECT code FROM airports WHERE elevation < 2000); 
    • Nested subqueries 
    • The inner query first gets all the codes from the airports table where the elevation is below 2000
    • The outer query uses those codes as a filter on the flight's origin and returns all columns from the flights table
    • Don't forget the ; semicolon in the end
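The subquery-as-filter idea above can be tried locally with Python's built-in `sqlite3` module. This is a rough sketch, not the Codecademy environment: the table layouts, the `origin` column on `flights`, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory database with two tiny, made-up tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE airports (code TEXT, elevation INTEGER)")
cur.execute("CREATE TABLE flights (id INTEGER, origin TEXT)")
cur.executemany("INSERT INTO airports VALUES (?, ?)", [("AAA", 500), ("BBB", 5000)])
cur.executemany("INSERT INTO flights VALUES (?, ?)", [(1, "AAA"), (2, "BBB"), (3, "AAA")])

# The inner query returns the low-elevation airport codes;
# the outer query keeps only flights whose origin is in that set.
low_origin_flights = cur.execute(
    "SELECT * FROM flights "
    "WHERE origin IN (SELECT code FROM airports WHERE elevation < 2000) "
    "ORDER BY id;"
).fetchall()
print(low_origin_flights)  # [(1, 'AAA'), (3, 'AAA')]
conn.close()
```

Only the flights leaving from the airport below 2000 are returned; the flight from the high-elevation airport is filtered out.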

Friday, January 6, 2017

Udacity Year in Review

Udacity highlighted its successes and milestones in 2016. 
  • Udacity currently offers 159 courses
  • Udacity's site-wide busiest time for learning is the month of October
  • Students watched the Developing Android Apps video by Google the most
  • Read the full article here Udacity 2016 Year in Review

Udacity Machine Learning Engineer Nanodegree - skills you will learn

  • Sklearn python machine learning library
  • Jupyter Notebook
  • pandas

The course does not spend much time on the following, so please review them before starting:
  • Linear algebra 
  • Probability
  • Statistics

Udacity Intensive Connect Data Analyst Machine Learning Nanodegrees

Udacity is making two Nanodegrees available for Udacity Intensive Connect: Data Analyst and Machine Learning Engineer. Its in-person meetup lessons use the online material but offer a bootcamp-like, part-time learning opportunity alongside classmates who are industry professionals. Here are some PROs and CONs of Udacity Intensive Connect

  • PROs
    • Affordable price tag
      • The intensive class unlocks the corresponding online material for months, so it is a much better deal than Udacity's expensive monthly Nanodegree price, which is well north of $100
    • Udacity markets its program as: bootcamp-level intensity, in-person collaboration, accountability, part-time, no need to leave or quit your current job
    • Fast paced, stringent project timeline
    • Attend classes with diverse and experienced industry professionals
    • In-person lectures that are more targeted and easily adjusted for the needs of the class. High quality, in-person lectures, Q&A opportunities, one-on-one help. 
  • CONs
    • It is still expensive: the price tag will be near $1000 or more
    • The physical location may be an hour away from your current location. I had to commute from San Francisco to San Jose. Ultimately it didn't work out. 
    • It still relies heavily on the online videos. If those videos didn't speak to you, the in-person interactions will not be able to shift your learning retention.  
    • Fast paced, stringent project timeline
    • Attend classes with diverse and experienced industry professionals. Not always beginner friendly
    • It is a significant time commitment
    • A lot of additional studying on the side and looking up extra materials is required. The online videos will not provide all the information needed to complete the projects.
    • Significant in-person time commitment. If you skip sessions you will miss a lot of material, because it is not available online, and you will fall behind schedule. I had to miss sessions because of business trips and it was hard to catch up.
Conclusion: for me personally, the Udacity Intensive Connect sessions did not work out. The No.1 reason was that the online materials play a significant role but are not cohesive and do not stand alone. I cannot rely on the online materials to learn machine learning. But this program did get me to start thinking about Machine Learning. I realized that I had a strong interest in ML, and a lot of materials are available online for free once you get past that initial block. Once you know what machine learning is, it is easy to learn it online. A lot of time will be spent playing with datasets hands-on anyway. Without that practice, it is not possible to take on a real job as a machine learning engineer.

The holy grail question: can you become a machine learning engineer after completing the nanodegree?
No, your knowledge and experience will not be sufficient. More practice and knowledge acquisition are needed. You will become a much better data analyst, putting you closer to a data scientist role than an analyst role. The Nanodegree will not be sufficient to get you a job at Google. However, if you are already experienced and already in the industry, and just need some technical skills to climb over a hill, this course can really help you make the transition internally. 

Udacity Machine Learning Nanodegree Instructors Review


  • Georgia Tech Udacity online master degree instructors
    • This Nanodegree uses video clips from the Georgia Tech Udacity computer science and engineering classes. The instructors are obviously highly qualified, technical and academic, but they sometimes tell nerdy jokes that are less engaging and relevant, and try to force a sense of humor into the learning material. It didn't work out so well. Their explanations are professional and academic but less accessible to beginners. MINUS 
  • Sebastian Thrun and Katie Malone
    • Sebastian Thrun was a professor, successful entrepreneur, founder of Udacity, and the lead for many important Google businesses such as the self driving car. He's a really good teacher and gives valuable information on how machine learning is directly used in the industry. He is a god-like teacher in machine learning. PLUS!
    • Katie Malone was a student and a researcher and now a creator of several Udacity Machine Learning courses. She is great at explaining difficult concepts to beginners and advanced learners. She uses real life research examples, data sets from Kaggle, and simplifies the problems into workable problem sets for students. PLUS!
  • In-person lead Udacity Connect Intensive
    • If you join the Udacity Connect Intensive, you may get an in-person lead. He or she is usually a very qualified tutor and instructor. The lead may not have the experience that Sebastian has, but is perhaps more practical and accessible for beginners. My session lead was once a Caltech lecturer, so he could be both beginner friendly and expert friendly. 
  • Conclusion: Udacity instructors are industry experts, academics, and highly experienced professionals in machine learning. However, although each clip is high quality, the Udacity Machine Learning Nanodegree curriculum is patched together and not cohesive. This curriculum will pose significant difficulty for people starting from scratch. Experienced professionals and people who have had exposure to machine learning will have an easier time. 

Tuesday, January 3, 2017

Python crowned as the language of choice of data scientists

Kaggle, a popular dataset, data science and machine learning competition site, revealed in its recent Year In Review 2016 newsletter that Python has surpassed R as the language of choice for data scientists. This trend has been continuing for a few years. Kaggle's kernel language is now overwhelmingly Python, even though R was still popular and fresh in 2015. Why do you think that's the case? Kaggle has not given an explanation yet. Could it be because of the increasing popularity of machine learning and specifically deep learning? Python works so well with ML.

Monday, January 2, 2017

Codecademy Walkthrough SQL Table Transformation 01


Preview a table using SELECT * FROM tablename LIMIT 10; which returns at most 10 rows.


How to be viral on Imgur and Reddit?

It is hard to go viral on Reddit and Imgur, the latter being one of the most popular image-sharing and storytelling sites on the internet. It is not the best place to sell a product, but it surely is the best place to spread ideas and causes. It is also a great place to stay on top of current news and learn lifehacks. Like all forums, especially Product Hunt and Hacker News, Imgur has its own algorithm to decide whether a post should stay on its front page, which essentially features the post and makes it viral. Below is a screenshot that made it to the front page; it also happens to explain how Imgur evaluates and weighs posts. Be aware, this is likely NOT the actual algorithm, but the actual model will look very similar to this. 

