Friday, June 12, 2020

Regularization in Machine Learning, Deep Learning

Regularization can prevent overfitting and can sometimes help an algorithm converge faster and perform better. It is useful in deep learning tasks with neural networks. Regularization acts on the loss function (cost function) by adding an extra penalty term. The penalty term depends on the regularization method and penalizes the weight parameters, so it is a function of w.

Two common regularization methods:
  • Lasso 
    • Uses L1-norm
  • Ridge
    • Uses L2-norm
A trick to remember the norm is that letter L comes before letter R, so Lasso is L1 norm and Ridge is L2 norm. 

One of them is more likely to produce sparse solutions, turning one or more coefficients to zero. Which one do you think it is? 

Quiz: which formula is Lasso? Which one is ridge?

  • Regularization penalizes overly complex models
  • Large weights usually make the penalty term higher, so smaller effective weights are preferred
    • Larger weights cost more
  • Regularized loss = regular_loss_function + extra_penalty_term(lambda, weights)
    • The extra penalty term depends on the weight parameters and the lambda rate parameter
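
The bullets above can be sketched in plain Python. This is a minimal sketch, not a library implementation: the function names, the toy weights and the lambda value of 0.1 are all assumptions made up for illustration.

```python
# Sketch: regularized loss = regular loss + penalty(lambda, weights).
# All names and numbers here are illustrative assumptions.

def l1_penalty(weights, lam):
    """L1 penalty: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(regular_loss, weights, lam, penalty):
    """Add the chosen penalty term to the unregularized loss."""
    return regular_loss + penalty(weights, lam)

weights = [0.5, -2.0, 0.0]
loss_with_l1 = regularized_loss(1.0, weights, 0.1, l1_penalty)
loss_with_l2 = regularized_loss(1.0, weights, 0.1, l2_penalty)
print(loss_with_l1, loss_with_l2)
```

Doubling every weight raises both penalties, which is the "larger weights cost more" idea in code form.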

Thursday, June 11, 2020

WeChat Basics

  • WeChat requires approval
  • Important to decouple the app from the data API so we can change either one later
  • Where to host WeChat app? Best hosting best API?
  • 微信开发者工具 (WeChat Developer Tools)
    • Made up of three parts: a simulator, an editor and a debugger
  • Developers also need a WeChat account to administer and manage WeChat apps and to log in to the developer tools. It is used constantly for login, verification and testing, so it's very important. It also works like two-factor authentication: it is often used to verify your identity before logging in. 
  • index.js is the main page to write your code
  • WeChat Games
    • Amazing example viral game 跳一跳
  • Performance
    • Compressed images
    • Stored locally in geo locations
  • WeChat Mini Program with AR
    • Example Armani cosmetic app allows users to try makeup
    • Other use cases for WeChat AR including gaming, real estate house previewing, open house, hotel room previewing, house shopping
    • Limited API availability, only available for some brands and developers for AR
  • WeChat voice interface
  • WeChat is a platform, not just a messaging app. It also includes e-commerce, games, web browsing, search and content publishing
  • Features
  • Can create a wechat test app, previously known as sandbox
  • To test your app, click on preview on iphone to use your iphone to scan a QR code and be able to test it on your local phone
Write HelloWorld in WeChat Mini Program
When a page of the WeChat mini program loads, its onLoad function is invoked. In index.js this code will console.log hello world. As you can see, the syntax is very similar to JavaScript and is stored in a .js script file. 


Page({
  onLoad() {
    console.log("Hello World!")
  }
})

onLoad is a lifecycle callback function

Making WeChat Stickers - Sticker Developer
It's very important to follow community guideline and developer policies. Here's the content requirement for WeChat

Linear Algebra Review

A matrix is a grid of numbers

Points and vectors

||w|| length of w, also known as the magnitude of w, also the L2 norm.

U(U^T): the "square" of a matrix U is the matrix U multiplied by the transpose of U
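
A quick way to check both definitions above is to compute them by hand. The sketch below uses plain Python lists; the helper names and the sample vector and matrix are assumptions for illustration only.

```python
import math

def l2_norm(w):
    """Length (magnitude) of vector w: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in w))

def transpose(m):
    """Swap the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

w = [3.0, 4.0]
U = [[1, 2], [3, 4]]
norm_w = l2_norm(w)                 # 5.0
U_Ut = matmul(U, transpose(U))      # [[5, 11], [11, 25]]
print(norm_w, U_Ut)
```

Note that U(U^T) always comes out symmetric, as the result here shows.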

Sudoku - Technical Interview

Each row can be represented using letters. They can be stored using rows = 'ABCDEFGHI'; note there are 9 letters. Iteration: for r in rows: # do something

Each column can be represented using numbers. They can be stored using cols = '123456789', also 9 digits. 

Use dot . as placeholder.   
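
Putting the three notes together, one possible way to label and load a board looks like this. It is only a sketch: the helper names cross and grid_values, and the mostly empty sample puzzle, are assumptions for illustration.

```python
rows = 'ABCDEFGHI'   # 9 row labels
cols = '123456789'   # 9 column labels

def cross(a, b):
    """Every pairing of a character from a with a character from b."""
    return [r + c for r in a for c in b]

boxes = cross(rows, cols)   # 'A1', 'A2', ..., 'I9'

def grid_values(grid):
    """Map each box label to its character in an 81-character puzzle string."""
    if len(grid) != 81:
        raise ValueError("a Sudoku grid has 9 x 9 = 81 boxes")
    return dict(zip(boxes, grid))

# Hypothetical puzzle: the dot marks an empty box.
puzzle = '53..7....' + '.' * 72
values = grid_values(puzzle)
print(values['A1'], values['A3'], values['A5'])
```

Storing the board as a dictionary keyed by labels like 'A1' makes it easy to talk about a specific box during an interview.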

Saturday, May 30, 2020

React + React Native Basics in 2020

I am writing new blog posts for technologies every year because they change, they evolve. JavaScript today is nothing like the JavaScript 10 years ago. Today's topic is React and React Native. This is a narrative style cheat sheet of React Basics. It is my hope to organize those concepts in a cohesive way. It will tell you how the concepts are connected, but to find out more, to dive deeper, it is important to seek out external tutorials and resources. 

Post in progress, in construction. Updated daily.

Key Concepts in React:
Declarative programming : the opposite of imperative programming, where we specify step-by-step instructions and implement behaviors in detail. In declarative programming we tell React what we want back, like a component. HTML is declarative, because we don't need to implement every detail; we just tell the browser to render a <div>.

JSX : 
React uses JSX to write HTML-like markup inside JavaScript.
const my_html = <h1>Hello World</h1>
It is a combination of JavaScript and HTML and won't validate in vanilla JavaScript. 

ReactDOM : insert components into DOM
Babel : compatibility, helps convert JSX to JavaScript (Browsers can understand Js code, not JSX)
React Native : Build mobile apps using just JavaScript. Write once, deploy any where. Supports iOS and Android. Dependency is React. JavaScript is bundled, transpiled from ES7 ES6 ESNext down to ES5 code. Also minified (Source: CS50 Harvard). Multiple JavaScript files compiled into a one big JavaScript bundle. Separate threads for UI, layout, JavaScript (which is Single Thread and can get locked up). 

Link to React, ReactDOM, and Babel using script tags in the headers.

The basic unit of React is the component. A class component inherits from React.Component and usually contains a render function. 

class CappedComponentName extends React.Component {
  render() { return <h1>Some HTML Code</h1> }
}

Props : objects that are passed to elements. They look similar to JSON, but while JSON is limited to strings, numbers, booleans, null, arrays and plain objects, props can carry any JavaScript value, including functions. 

Arrow function: 
The benefit of using an arrow function is that it handles the event variable and binds this correctly. The this object can get kind of funky in JS. 

Best practice:

A workaround best practice to ensure compatibility is to compile JSX to JavaScript before deployment. 

What is the difference between props and state? State is something the component may want to track and modify. Props are more like initialization or configuration: like state, but passed in from outside and changed less often.


Sunday, May 10, 2020

Intro to Data Visualization

REPOST from Medium with permission

Data Visualization in Machine Learning — Beyond the Basics

This is not a tutorial. These are my notes from various Machine Learning articles and tutorials. My personal cheatsheet for interviews and reviews. Any feedback and corrections are welcome. If you’d like to read more, please let me know as well. These notes are more applicable for python users. Does not include ggplot, great for R.

Prerequisites and Dependencies

This tutorial and overview is python based so we use matplotlib.pyplot. These commands can be run in command line and in Python Notebook with just a bit of modifications. Any reference to plt means the function is from the matplotlib library.
import matplotlib.pyplot as plt
# will get object does not have bar, scatter.... function_name error # if not imported

Plot a Bar Chart

Bar chart, bin chart: useful for frequency analysis, distributions and counts.
labels = ['A','B','C','D','E','F','G']
nums = [13,24,5,8,7,10,11]
xs = range(len(nums))  # [0, 1, 2, 3, 4, 5, 6]
# xs is a conventional variable name for the x axis
plt.bar(xs, nums)
plt.xticks(xs, labels)  # show the letter labels under each bar
plt.ylabel("Customize y label")
plt.title("Customize graph label")
plt.show()  # display the plot

Don’t be deceived by its simple look. Frequency analysis is very powerful in data EDA, stats and machine learning.

Plot a Histogram

Histogram will automatically divide data into bins.
import matplotlib.pyplot as plt
import pandas as pd
nums = [99, 1, 3, 5, 7, 33, 23, 684, 13, 3, 0, 4]
pd.Series(nums).hist()  # assumed call: the plotting line is missing from the original
# <matplotlib.axes._subplots.AxesSubplot object at 0x10d340d90>
# returns object in memory
plt.show()

Also useful for visualizing distribution and outliers.

Scatter Plot

How is scatter plot beyond the basics? Scatter plot is extremely intuitive yet powerful. Just plot the vertical coordinate and horizontal coordinate of each data point in the sample to get its scatter plot. If the relationship is non-linear, or there may be the presence of an outlier, these targets will be clearly visible in the scatter plot. In the case of many features i.e. dimensions, a scatterplot matrix can be used.
Below is a screenshot of pandas scatterplot matrix in the official documentation.

Clearly the relationship is not linear. The diagonal is the variable vs itself, so it’s showing a distribution graph instead of scatter plot. Neat, looks like the variable is normally distributed.
Scatterplot is a great first visual. Too many features? Try sampling or generating data subsets before visualizing.
Use pandas.DataFrame.describe() to summarize and describe datasets that are simply too big. This function will generate summary stats.
Scatterplots are useful for pairwise comparison of features.
Scatterplots can go beyond two dimensions. We can use marker size and color to illustrate a 3rd dimension, even a 4th dimension, as in the famous TED talk on economic inequality. The presenter even used a timeline (animation) as the 5th dimension.

Visualizing Error

YouTube deep learning star Siraj shows a 3D visual of the error function while altering the y-intercept (aka bias) and the slope for linear regression. The global optimum, i.e. the global minimum in this case, is the goal of the gradient descent algorithm.
Error functions have shapes and can be visualized. Local optima, which prevent your model from improving, can potentially be visualized.

The gradient can be visualized as directional arrows along the shape of the 3D plot; gradient descent follows the negative gradient downhill toward a minimum. It can also be visualized as a field of arrows in a matrix.
Each residual (y_i - y_hat_i) can be visualized as a vertical line connecting the data point with the fitted line in linear regression.
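
To make the residual idea concrete, here is a small sketch that computes each residual and the mean squared error for a fitted line. The data points, the slope and the bias are made-up values for illustration, not results from any real model.

```python
# Hypothetical data points and a hypothetical fitted line y_hat = slope * x + bias.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.5]
slope, bias = 2.0, 0.0

def predict(x):
    """Value of the fitted line at x."""
    return slope * x + bias

# Each residual is the vertical distance between a data point and the line.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

# Mean squared error: one common way to collapse the residuals into one number.
mse = sum(r * r for r in residuals) / len(residuals)
print(residuals, mse)
```

Changing slope and bias and recomputing mse is exactly what traces out the error surface described above.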

Data Scientists Love Box Plots

Why? It displays essential stats about a distribution in a concise visual form. It resembles the candlestick chart that is popular in finance.
Max, 3rd Quartile, Median, 1st Quartile, Min.
This is also known as the box and whisker graph. It’s popular among statisticians. Used to visualize range. It can be drawn horizontally.
What’s between Q3 and Q1? The interquartile range (IQR), which is used in analyzing outliers. Values below Q1 - 1.5*IQR are too low; values above Q3 + 1.5*IQR are too high.
A box and whisker plot displays each outlier as a dot!
Check out Boston University’s Blood Pressure dataset box whisker plot with outliers.
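
The 1.5*IQR rule above can be sketched with the standard library. The function name and the sample readings are assumptions for illustration, and quartile conventions differ between tools, so the exact fences may not match what a given plotting library draws.

```python
import statistics

def iqr_outliers(data):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # 'inclusive' treats the data as the whole population; other
    # quartile conventions can give slightly different cut points.
    q1, _median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical sample
outliers = iqr_outliers(readings)
print(outliers)
```

These are the points a box and whisker plot would draw as separate dots.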


Did you say heat map? Heat map has been in and out of favor. Web analytics still use heat map to track events and clicks on a webpage to identify key screen real estates. Why should we use heat map for machine learning?
It turns out that generating a heat map of all the feature variables — feature variables as row headers and column headers, and the variable vs itself on the diagonal— is extremely powerful way to visualize relationships between variables in high dimensional space.
For example, a correlation matrix with heat map coloring. A covariance matrix with heat map coloring. Even a massive confusion matrix with coloring.
Think less about the traditional use of heat map, and more about color as another dimension that can visually summarize the underlying data.
Correlation Matrix Heat Maps are frequently seen on Kaggle, for exploratory data analysis (EDA).

More Data Visualization Magic

Did you know that you can visualize decision trees using graphviz? It may output a very large PNG file. Remember that the splits of a decision tree are not always stable, i.e. consistent over time. Take it with a grain of salt. The benefit of visualizing a decision tree is to understand where and how the machine made decision splits. Decision tree boundaries can be visualized too, see screenshot below from Sklearn documentation.

Visualizing models, decision boundaries and prediction results may give hints as to whether the model is a good fit or a poor fit for the data. For example, using a straight line to fit a circular scatter of dots ignores the nature of our data: that is high bias.
Researchers even visualized different optimizers to see their descent as they minimize loss.
Did you know you can create interactive plots using Plotly right in Jupyter Notebook? Interactive plots allow you to visualize complex data, toggle and change parameters. For example you can slide to change values of your hyperparameters and visualize how the model performance changes in grid search and other systematic searches of the space.

Wednesday, April 29, 2020

JavaScript Basic 2020

Learn what's new with JavaScript in 2020. It has changed a lot from the JavaScript you know.
  • JavaScript is interpreted as opposed to compiled
    • C is compiled
    • No need to declare variable types
    • Allows dynamic typing : a variable has no type associated with it until it is filled with a value, and the type can change later. Some languages are not dynamically typed
  • ES6, full name ECMAScript 6 (ECMAScript 2015), is the major modern revision of JavaScript; newer editions have been released every year since
    • Symbol
  • In modern JavaScript the semicolon ; is usually optional
  • JavaScript can check equality using double equal sign ==  , or triple equal sign  ===
    • == coerces the type
    • === requires to be exact, doesn't coerce the type 
  • Node command line runtime for JavaScript is built on V8 engine
  • typeof null --> returns object - one of those strange behaviors of JavaScript
  • JavaScript development is guided by the ECMAScript standard. ECMA is pronounced Ehk-MA. The E stands for European
  • You can think : the spec for JavaScript is written by ECMA
  • Each browser can have its own JavaScript engine, for example Chrome uses V8
  • Event Listener
    • Listening or subscribing events such as keydown 
    • aka the event handler
  • Modern JavaScript variations include TypeScript, frequently used in Angular 
  • npm is the popular package manager for JavaScript
    • Has joined GitHub
    • In other words, both GitHub and npm are now owned by Microsoft
  • Define a constant : const CONSTANT = 0.5
  • Enclose strings in double quotes or single quotes
  • Arrays can contain values as well as functions
    • const arr = ["value1",5, function(){console.log("Hello World")}]
      • Run the function arr[2]()
    • Can contain different types
    • Can access using indexing, starting from position zero
    • use array with for loop
for (let i = 0; i < arr.length; i++) { /* do something with arr[i] */ }

  • JavaScript allows trailing comma
  • Types
    • primitives: 
      • no methods attached?  immutable
        • boolean, string, null, number, symbol, undefined
        • Number includes both float and integer, there is no separate type
  • Subtle differences between undefined and null
  • JavaScript string
    • concatenation with a non-string is implicit coercion or type casting; if we use String(variable) then that's explicit coercion
  • checking types of input using typeof, e.g. typeof undefined // --> undefined, typeof 5 // --> number
  • Try out JavaScript interactively using Chrome browser inspect element mode, or install node and call interactive JavaScript prompt. Use those two as a JavaScript interpreter
  • In general, undefined is returned if nothing specific is returned
  • JavaScript documentation by Mozilla

Friday, April 3, 2020

Bootstrap Basics

  • Bootstrap is a front end framework used to quickly design, organize and beautify a modern website. It generates css fast for common front end patterns and UI elements
  • Horizontal containers are called rows
  • Vertical containers are called cols, short for columns, and can be used in designing a grid system
  • Bootstrap allows you to focus more on the html file rather than CSS file, write a bit less CSS
  • And no need to reinvent the wheel : writing common UIs and interactions from scratch
  • Concept : bootstrap requires using specific class names to generate desired design 
  • Pro tip: use margin utilities to organize layout. For example, margin left auto pushes things all the way to the right, and mt-3 means margin top 3
  • Pro tip : use chrome inspector on Bootstrap sample and tutorial page to see what class, and configurations are used.

Grid system

Bootstrap organizes html contents into grids. Each row of the grid is called a row, each column a column. Each row has 12 columns. 

Make the website responsive

Use media query to query screen size and type. Specify the content parameter to change html content. 

Viewport is the visible area. Be sure to use the actual size of the phone, so that the website is not rendered as the desktop version on a mobile device. 

<meta name="viewport" content="width=device-width, initial-scale=1.0">

Bootstrap can detect screen size and label it as lg for large, sm for small. If we use the lg and sm parameter in the class of the html element, we can specify how much space a grid column will take if the screen is small versus large (based on screen size).

<div class="row">
  <div class="col-lg-3 col-sm-6">This is a section.</div>
  <div class="col-lg-3 col-sm-6">This is another section.</div>
  <div class="col-lg-3 col-sm-6">This is a third section.</div>
  <div class="col-lg-3 col-sm-6">This is a fourth section.</div>
</div>

CSS review

Pseudo class : 

selector:pseudo-class {
  property: value;
}

a:link {
  color: #FF0000;
}

More important in modern CSS are the two pseudo-elements ::before and ::after:

p::after { 
  content: " - add a foot note";
}

Friday, March 27, 2020

My Little Green Book of Machine Learning and Deep Learning, Artificial Intelligence

Data pre-processing

Turn Complex Data into Numbers

Turn data into features. Turn data into feature vectors. Machine Learning models can only take numeric data. All input data must be represented numerically. For example, words need to be converted to word embeddings in some Natural Language Processing tasks. 

Training vs Inference Models

There are two major tasks in machine learning: 1. build and train a model, 2. deploy the model for inference. Part 1 takes known data and uses it to tune parameters of the model such as weights. Part 2 takes in unseen data, real world data or test data, and calls a .predict() method on the new data. 

Normalization, Scaling Data

Normalizing data means scaling it into a bounded range. For example in Machine Learning, an error term can be arbitrarily large because the model can be arbitrarily bad, so the error term for f(x) = wx+b has essentially no upper bound. Scaling the features' numeric values keeps the error term in a manageable range, which can make the result easier to compute and make the search space easier for gradient descent.
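
One common way to bound a feature is min-max scaling, which squeezes values into the range 0 to 1. The sketch below is one possible version in plain Python; the function name and the sample incomes are assumptions for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20000, 50000, 120000]   # hypothetical raw feature values
scaled = min_max_scale(incomes)
print(scaled)
```

After scaling, a feature measured in tens of thousands no longer dominates a feature measured in single digits.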

Bias Variance Tradeoff

High bias may refer to underfitting, where the model is too simple, not complex enough to make accurate predictions. It can also mean when the model is practically ignoring the data.

High variance may refer to overfitting. That's when the model fits the training data too closely, and hence cannot generalize well to future data. 

Tuesday, March 24, 2020

Natural Language Processing (NLP) 2020

It is year 2020 and vision 20/20. It is time to do another survey article of Natural Language Processing (NLP) field. What is there to learn / know? What is new?

Getting started with NLP

Great sources for NLP

Social media : Twitter, Facebook, comments, posts, forums
Transcripts : events, conferences, speech, call transcripts, zoom transcripts
Smart voice assistants : Alexa, Siri, Google Home

Libraries for NLP

spaCy: supports up to 45 languages, including Japanese, Chinese and English. Source 4
Scikit-Learn sklearn TFIDF vectorizer

Advanced NLP

Machine Translation
Transfer learning in NLP


Latent Dirichlet Allocation (LDA) topic modeling


Sentiment analysis
Fake news classifier
Trump tweet maker

Monday, March 23, 2020

Growth Marketing - Technical Marketing

  • Hiring people can be a pitch for your app, startup 
  • How to use freemium monetization on content. Blur out important infographics images in newsletter, create a digital tease to motivate user to leave newsletter and head to content page 
  • Growth hack vanity address
  • Embed survey in emails. 
  • Use AMP for interactive emails. It is easy to fix typos and swap content if the email is generated dynamically. 
  • When working with a Growth Manager, get to know her style and her vision. 
  • Turn super users into community managers, staff, employees
  • Unfortunately on YouTube drama drives clicks. Dramatic YouTubers seem to grab attention. There was one joke: an Uber hits a pothole, and the YouTuber describes it as "Gosh, I almost died today."
  • Startups are all about growth. Funded startups are definitely about growth. Even bootstrapped startups need to think about growth, large scale growth asap. 
  • Youtube
    • Very important to have attention grabbing thumbnails

Advanced Python

  • Installation
    • import numpy as np
  • Request is deprecated
  • Python 2.x is deprecated: it reached end of life on January 1, 2020 (Python 2.7 is usually pre-installed on Macs that were 3-7 years old in 2020). 
  • BeautifulSoup
  • Pyecharts
  • Scrapy
  • Pycharm
  • mylist = [1,2,3,4]
  • np_array= np.array(mylist)
  • Slicing
    • Entire list mylist[:]
    • Slicing the 0th and 1st elements: mylist[0:2]
    • The second position is exclusive
  • Function signature
    • What type of input is expected? Example CSV
    • What type of output is expected?
    • What is the functionality?
    • Each function does one task
  • Using an IDE
    • Anaconda Spyder (Scientific Python Development Environment)
    • PyCharm
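
The slicing bullets above, as a short runnable check. Plain lists are used here so no installation is needed; the variable names other than mylist are assumptions for illustration.

```python
mylist = [1, 2, 3, 4]

whole_copy = mylist[:]    # entire list, as a new list object
first_two = mylist[0:2]   # 0th and 1st elements; index 2 is excluded

print(whole_copy)            # [1, 2, 3, 4]
print(first_two)             # [1, 2]
print(whole_copy is mylist)  # False: the slice is a copy, not the same object
```

Because the second position is exclusive, len(mylist[a:b]) is simply b - a when both indexes are in range.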

Sunday, March 22, 2020

Technical Interview Tips

  • Time complexity
  • Data structure & algorithms
  • Test case: ZeroDivisionError
  • Implement function from scratch
  • Tree algorithms
  • Use two pointers: lo = 0th index, hi = len(input)-1, then while lo < hi: do xyz
  • Language: Java, Python slightly preferred
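
As one example of the lo / hi pointer pattern from the list above, here is a palindrome check. The choice of problem and the function name are assumptions for illustration; the same loop shape applies to many array questions.

```python
def is_palindrome(text):
    """Check whether text reads the same forwards and backwards."""
    lo, hi = 0, len(text) - 1
    while lo < hi:
        if text[lo] != text[hi]:
            return False
        lo += 1   # move the left pointer right
        hi -= 1   # move the right pointer left
    return True

print(is_palindrome("racecar"), is_palindrome("python"))
```

Each character is visited at most once, so the time complexity is O(n) with O(1) extra space.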

Firebase APIs Basics

  • A project is a container for resources on Google Cloud
  • Install Firebase tools first to use command line utilities
  • The defer attribute on a script tag in HTML means the script runs only after the page has finished parsing
  • firebase serve hosts the app locally on port 5000

Google Cloud Basics

  • .yaml extension of configuration file
  • Export python packages and environment dependencies as requirements.txt
  • python code
  • Google Cloud Function
    • By default cloud function is authenticated into other Google APIs
    • Serverless, fully managed, event triggered, considered a micro-service in the cloud
  • Google Cloud OAuth
  • Use Stackdriver (which can also monitor AWS resources) to track performance
  • Role management using IAM
  • Google Cloud Discount
    • Google Cloud discount for students available
  • Cloud DNS is an available API
    • Add an A record; find the IPv4 address
    • Copy the external IP address
    • The A record is linked to the external IP
    • CNAME www is connected to domain name .com
    • Now go to the domain provider and do the same
    • Add the Google domain one to the name server
    • Need to generate an SSL certificate to enable HTTPS
  • Integrates with Firebase
    • Firebase APIs
  • Resources - Conference: Google Cloud Next Conference usually in April
  • Static versus ephemeral IP address
  • Market place: Google Cloud wordpress
  • Google Cloud partner: top partners include Accenture
  • Tutorial : Qwiklabs
  • Concept : serverless on the cloud runs code in the cloud; you don’t need to know or manage what machine infrastructure it is operating on. No need to worry about automatic scaling, load balancing, security or patching.

Friday, March 20, 2020

Learn SQL — It’s on every job listing — Part 1

SQL is not obsolete. You can now build Machine Learning models with SQL, and query real time or big data with SQL. If you look around you will find plenty of job postings with SQL as a desired skill, even from FANG companies (Facebook, Amazon, Netflix, Google). Uniqtech writes technical tutorials for coding bootcamp graduates, freelancers, self-study and MOOC students who are in the realm of data science, software engineering, machine learning and deep learning. Read our disclaimer here. This disclaimer applies to our entire site. Please take our words with a grain of salt. They are not considered professional advice nor are they considered professional opinions. Repost from Uniqtech Medium with permission. 

Microsoft Excel has workbooks that contain worksheets, just like a database contains tables.
Each table can be queried separately. To query tables jointly, we will need to use join statements and keys to look up the corresponding data.
Each table row should have a unique ID, known as the primary key (PK). It can also have a foreign key (FK), which associates the row, aka record, with the unique primary key of another table.
For example each e-commerce transaction has a unique ID, which can be generated with the timestamp of when the transaction happened. Each transaction ID can have a FK such as customer ID, which uniquely identifies the customer that made the transaction. His or her full information resides in the customers table.
That is the perfect segue to talk about the philosophy and convention behind table names. You can think of table names as natural divisions of the data we want to model, in the form of plural nouns: transactions, customers, products etc. Each row in the transactions table is a transaction (singular). Each row in the customers table is a customer. Each column represents a customer attribute, such as gender, age etc.
When designing the database, an architect or Database Admin (DBA) will construct a digital blueprint stating how the tables are connected with each other or whether they stand alone in the database. This diagram and the relations it specifies are called the database schema.
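
The transactions and customers example above can be tried out with Python's built-in sqlite3 module and an in-memory database. This is a sketch under assumptions: the column names, the sample rows and the choice of SQLite are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway database, nothing is saved
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        gender TEXT
    );
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- the foreign key
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'F'), (2, 'Bob', 'M');
    INSERT INTO transactions VALUES (100, 1, 9.5), (101, 2, 20.0), (102, 1, 5.0);
""")

# Join the two tables by matching the foreign key to the primary key.
joined = conn.execute("""
    SELECT t.id, c.name, t.amount
    FROM transactions AS t
    JOIN customers AS c ON t.customer_id = c.id
    ORDER BY t.id
""").fetchall()
print(joined)
conn.close()
```

Each transaction row only stores the customer ID; the join looks up the customer's name from the customers table.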

What is SQL

SQL is a database query language. It doesn’t matter what relational database you use, SQL concepts are helpful. Pandas analytics library uses similar joins, query methods. Google BigQuery allows SQL like syntax.
Newer databases such as NoSQL and graph databases use different query languages. Sample code from the Google Cloud Datastore NoSQL database:
// List Google companies with fewer than 400 employees.
var companies = query.filter('name =', 'Google').filter('size <', 400);

Important SQL Keywords


SELECT

The one select statement to select them all uses the wildcard.
SELECT * FROM table_name
It is important to slow down and read the statement. It reads: select all from table_name. * means all columns.
Nested Select Statements
AS specifies the alias. When column names are not reader friendly or long, alias is your friend.
It selects the column of data.


FROM

The FROM keyword is usually followed by a table name, for example FROM database.CUSTOMERS. It can also be followed by a nested query.
It specifies the table to operate on.


WHERE

The WHERE clause narrows down the query results by specifying conditions such as WHERE TABLE_NAME.gender = 'Female'. It filters the rows of data.

Putting it all together SELECT FROM WHERE

query = """
    SELECT my_column
    FROM my_table AS m
    WHERE m.gender = 'F'
"""

WITH

“The SQL WITH clause allows you to give a sub-query block a name (a process also called sub-query refactoring), which can be referenced in several places within the main SQL query.” (source: GeeksforGeeks)


ORDER BY

Sort the query result by columns, ascending or descending.


LIMIT 1000
Show the first xx number of rows of records in a table.
Usually at the end of the query. The last line in SQL query.
Note: in big data, where managing cost and resource use is important, LIMIT does not mean that less of the database is queried; the full table may still be scanned.
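
Here is a sketch that runs SELECT, AS, FROM, WHERE, ORDER BY and LIMIT together, using Python's built-in sqlite3 module. The age column, the alias customer_age and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE my_table (my_column TEXT, gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", "F", 30), ("b", "M", 25), ("c", "F", 22), ("d", "F", 41)],
)

query = """
    SELECT m.my_column, m.age AS customer_age   -- AS gives the column an alias
    FROM my_table AS m
    WHERE m.gender = 'F'                        -- keep only matching rows
    ORDER BY m.age DESC                         -- oldest first
    LIMIT 2                                     -- show only the first 2 rows
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
conn.close()
```

Reading the query top to bottom follows the same order as the keyword sections above.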

Data Structure and Algorithms

Time Efficiency and Space Efficiency Both Matter

How to write effective test cases

Wednesday, March 18, 2020

Google Colab Basics

In Google's own words: Colab is zero configuration, free GPU, and easy to share. I honestly have to agree with that. Google Colab is the easiest environment to get started on machine learning with scikit learn, or deep learning with Tensorflow and Pytorch. Seriously, zero installation is awesome! Back in the days, when Ruby on Rails was hot, we had installation parties all the time. Because that's what took the most time, for every one. Now Colab even has access to free TPU!

Google Colab is to Jupyter Notebook what Google Docs is to Microsoft Office. It basically lets you create, edit and host Jupyter Notebooks in the cloud.

Google Colab for Training Models

Training is essentially free and easy on Google Colab. You have access to both GPU and TPU. However, the free version can lose temporary variables and files in the home directory, because the runtime resets every 12 hours or less. There are ways to save and download the files to avoid such a catastrophe. 

Use Google Colab for Demo Purpose

It is easy to build an example in Colab and share it with audience, give it away instead of Github source code, instead of slides.

Tuesday, March 17, 2020

Flask basics 2020

Flask is known as lightweight yet feature rich. It is also known as a micro framework; Django is the heavyweight one. Flask is used to dynamically code and update web applications, including Single Page Applications (SPA). It is a web development framework, not a library. Frameworks have a certain philosophy and strategy as well as code patterns (for example, you must follow a certain folder structure because the framework will automatically look for assets in those folders, as well as file and resource naming conventions). Frameworks help us do routine tasks, which we have to do over and over again in web programming, faster and more efficiently. They use tried and true methods and code patterns to implement common web app components, so there is no need to reinvent the wheel every time. 

Model-View Controller framework

MVC framework, popularized by Ruby on Rails. MVC is a separation of concerns, a way of organizing and designing code projects. 

Controller: uses decorators with a URL-like structure to define how each URL will be handled. These are called routes. url_for() is a shorthand that links to a function's URL instead of having to type a long URL.
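
To see how decorator-based routes can work, here is a toy router in plain Python. It is not Flask and not how Flask is implemented; in a real Flask app the decorator is @app.route. The names routes, route, handle and this simplified url_for are assumptions for illustration.

```python
# Toy router, not Flask: shows how a decorator can map URL paths to functions.
routes = {}

def route(path):
    """Return a decorator that registers a function under the given path."""
    def register(view_func):
        routes[path] = view_func
        return view_func
    return register

@route("/")
def index():
    return "Hello, World!"

@route("/about")
def about():
    return "About page"

def url_for(function_name):
    """Find the path registered for a function name (the idea behind url_for)."""
    for path, view_func in routes.items():
        if view_func.__name__ == function_name:
            return path
    raise KeyError(function_name)

def handle(path):
    """Dispatch a path to its registered function, or report a 404."""
    view_func = routes.get(path)
    return view_func() if view_func else "404 Not Found"

print(handle("/"), url_for("about"))
```

The point of url_for is that templates refer to the function name, so a path can change in one place without breaking links.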


Manage Flask ORM resources and records using

Launching Flask app via local server
$ flask run
Use above to launch a local website

Additional Flask concepts

Hello world, first script is commonly called

Jinja mixes Python and HTML together.
Conditional HTML: the Jinja templating language, along with Flask, allows us to write if-else statements in HTML

How to host flask applications: one example is that you can easily deploy flask applications on Heroku as well as separately on Google Cloud app engine.

__name__ : the __name__ variable can determine whether a script is the main process being run or is being imported by another script. When the script is run directly, __name__ == "__main__"; when it is imported, __name__ is the module's name. Passing __name__ to Flask also tells Flask where to look for templates and static files.
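
A small sketch of the __name__ check in plain Python, with no Flask required. The function describe_run_mode and the module name "helpers" are hypothetical, made up for illustration.

```python
def describe_run_mode(module_name):
    """Explain how a script is being used, based on its __name__ value."""
    if module_name == "__main__":
        return "run directly as the main script"
    return f"imported as module {module_name!r}"

# In a real script, this guard protects code that should only run when
# the file is executed directly, not when another script imports it.
if __name__ == "__main__":
    print(describe_run_mode(__name__))
```

In Flask apps the same guard usually wraps the call that starts the development server.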

Flask has debugging mode, which should never be used in production but is handy when developing.

To make a Flask app available online, use ngrok

Web Programming Web Development Basics 2020

Model view controller framework (MVC)
"a separation of concern"

  • Controller contains the code logics, routes
  • View contains the presentation code, such as CSS

  • Block versus inline
  • Can use CSS layout
  • A built-in tool called flexbox can lay out elements automatically
  • Flexbox is great for making a bunch of cards that rearrange themselves as we shrink the page
    • 4 on each row
    • 3 on each row
    • ..
    • 1 on each row stacking.
HTTP protocol
  • Text protocol
  • HTTP requests contain a header
  • Key : value pairs
  • Http status codes 1xx 2xx 3xx 4xx 5xx
  • Http response


  • Using a web development framework allows us to bypass a lot of hard work and write web apps quickly without knowing all the functions to call and libraries to import
Web hooks


  • D3 js visualize dataflow chart, org chart

Monday, March 16, 2020

API Design 2020

Designing API

  • What kind of resources are needed
  • What kind of actions will be taken
  • What kind of endpoints should be designed

Testing API

Testing API using curl

GET Method
curl 'https://[URL]/[resource].json'

curl -X PUT -d '{"key":{"nested_key":"value"}}' \

curl -X PATCH -d '{"key":"value"}' \

curl -X POST -d '{"key":{"nested_key":"value"}}' \

curl -X DELETE \

Use curl to check the documentation to see what the API does
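
The same five HTTP methods can be tried from Python with the standard library. This is a rough sketch: the URL is a hypothetical placeholder and build_request is a made-up helper. The requests are only constructed, not sent, so nothing here depends on a real API.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.com/items.json"


def build_request(method, payload=None):
    """Build (but do not send) a request, similar to `curl -X METHOD -d ...`."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


requests_to_try = [
    build_request("GET"),
    build_request("PUT", {"key": {"nested_key": "value"}}),
    build_request("PATCH", {"key": "value"}),
    build_request("POST", {"key": {"nested_key": "value"}}),
    build_request("DELETE"),
]

for request in requests_to_try:
    print(request.get_method(), request.full_url, request.data)

# To actually send one, pass it to urllib.request.urlopen(request).
```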

Shopify Partner Basics

  • In 2019, Shopify started to version its APIs. You can now refer to APIs by their version number.
  • Career opportunities: Shopify store owner turned developer, turned partner
  • Shopify Buy Button is available for WordPress blogs
  • Utilize the Shopify partner blog, a great resource
  • Shopify Ping chat and Kit CRM robo assistant
  • Can use Apple Business Chat with Shopify channels
  • Can get approved to run product tagging and ads on Instagram
  • Shopify Lounge provides co-working opportunities and light box photo shoot sessions
  • Point DNS on Shopify custom domain

GraphQL Basics on Shopify

GraphQL eliminates the need to define a potentially unlimited number of endpoints for developers to interact with an API; endpoints no longer have to be predefined for every interaction. With a traditional API, many calls may be needed to get complex results back. "Multiple API calls from different schema hard for developers and slow for users." - Shopify forum discussion. GraphQL can give back all the requested information in a single nested JSON response. Use GraphQL admin to manage the API. A REST API needs an endpoint designed and written for each resource, whereas GraphQL technically needs just one endpoint, in Shopify POST https://{shop} for example. Shopify has a GraphQL app ready for installation. Shopify Developers and Shopify Partners can potentially use this to build reports quickly. A traditional API call looks like this: HTTP request GET /api/user?id=1, HTTP response {"id":1, "name":"xyz"}
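
To make the single-endpoint idea concrete, here is a rough sketch of the JSON body a client would POST for a GraphQL query. The schema (user, name, orders, total) and the sample response are invented for illustration and are not Shopify's actual schema.

```python
import json

# Hypothetical schema: `user`, `name`, `orders` and `total` are illustrative fields.
QUERY = """
query ($id: ID!) {
  user(id: $id) {
    name
    orders {
      total
    }
  }
}
"""


def build_graphql_body(query, variables):
    """Serialize a GraphQL request body: one POST to one endpoint."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_body(QUERY, {"id": "1"})
print(body)

# An illustrative nested response of the kind a GraphQL server returns.
sample_response = json.loads(
    '{"data": {"user": {"name": "xyz", "orders": [{"total": 10}]}}}'
)
print(sample_response["data"]["user"]["orders"][0]["total"])
```

With REST, the user and the orders would typically come from two separate calls; here the client asks for both in one nested query.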

Wednesday, March 4, 2020

Evaluating Classification Tasks in Machine Learning and Deep Learning

Confusion Matrix

Keywords: recall (sensitivity), specificity

ROC curve, ROC AUC (area under the ROC curve)

Use real example: doctors, medicine, cancer example
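
A small worked sketch of these metrics in plain Python, using made-up cancer screening results (1 = has cancer, 0 = healthy). The labels and scores are invented, and the AUC here is computed by comparing every positive/negative pair, which is one common way to define it.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def recall(tp, fn):
    """Recall (sensitivity): share of actual positives that were caught."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Specificity: share of actual negatives correctly cleared."""
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up data for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.35, 0.6, 0.75]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("recall:", recall(tp, fn))
print("specificity:", specificity(tn, fp))
print("ROC AUC:", roc_auc(y_true, scores))
```

In the cancer setting, a missed patient (false negative) is usually costlier than a false alarm, which is why recall is watched so closely.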

Technical presentation components

  • Story telling
  • Workflow flow charts, where does it fit in the big picture
  • Code snippets
  • Take aways, action items after the talk
  • Link to slides
  • Link to codebase
  • Visualizations
  • Tricks to memorize, remember

Women in Data Science Conference (WiDS) summary, transcripts, notes from my personal experience

WiDS started small but is now a global movement with many regional events and branches. It is 5 years old in 2020. This year it is hosted at Stanford University.

Volunteer opportunities with WiDS: ambassadors, regional events and branches, 500+ ambassadors worldwide

Understand the history and evolution of Tensorflow by revisiting Tensorflow 1.0 Part 1

Tensorflow 2.0 was released last year, and it is a completely different universe from its predecessor Tensorflow 1.0. But even in 2020 it is important to understand the history and evolution of the TF library: how it got here, and why it chose Keras as its high-level API. It is also important to understand what a compute graph is, as it is a super useful concept in deep learning, and you can still visualize and inspect it in TensorBoard.

Let's go back in time and talk about Tensorflow 1.0: its data flow graph, with everything executed in a Session object, its C/C++ backend, and how it handled parallel computing. It offered both Python and C++ APIs. Though it had a steep learning curve at the time of its release, it was production ready and powerful, and had already been used internally at Google before being open sourced. It supported CPU, GPU and distributed processing in clusters. Its focus is on deep learning and neural networks, whereas Scikit-learn focuses on traditional machine learning algorithms.

What is a data flow graph? It is a very important computer science concept. The nodes represent math operations, and the edges are multi-dimensional arrays called tensors, which flow through the graph, hence the name Tensorflow! See, the history is important. Using the graph we can easily visualize neural networks. Numpy and Scikit-learn would not give that result.

The frustrating part is that the graph needs to be built first before running it in a Session. This is where the learning curve got a bit hard: it was hard to prototype and iterate, and it required a bit more of a math architect's skills than just engineering and coding.

A quick note on tensors, which are also a concept in math and relativity. In this case, a tensor just means a more complex multi-dimensional array of numbers, usually more than 2D (a matrix is 2D), which supports automatic gradient computation and can be moved to the CPU or GPU to parallelize vector computation where possible. Technically even a vector (1D) or a number (0D) is a tensor.

In deep learning, we usually have to convert data such as texts or images into numbers, and we usually represent them using tensors. Each image, for example, is a 3-dimensional tensor with red, green and blue channels. Each channel has a matrix corresponding to the width and height of the image, with each element representing the pixel brightness at each w, h coordinate. This is called a feature matrix, aka a feature tensor, though no one calls it a tensor in this case.
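
A tiny illustration of tensor dimensions using plain nested lists. The shape helper and the pixel values are made up, and the channel-first layout (one matrix per colour) is an assumption chosen to match the description above; libraries differ on this.

```python
def shape(tensor):
    """Return the dimensions of a regular nested list, e.g. (3, 2, 2)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 7                      # 0D tensor
vector = [1, 2, 3]              # 1D tensor
matrix = [[1, 2], [3, 4]]       # 2D tensor

# A made-up 2x2 image: one height x width matrix of brightness per channel.
image = [
    [[255, 0], [0, 255]],   # red channel
    [[0, 128], [128, 0]],   # green channel
    [[0, 0], [64, 64]],     # blue channel
]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image", image)]:
    print(name, shape(value))
```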

One cumbersome pattern in Tensorflow 1.0 was the need to define a placeholder with tf.placeholder(), with its type specified, before filling it with actual numbers. It has the benefit of contiguous memory, but it can take time to get used to, especially when TF is trying to court Python users who are used to dynamic typing. Another benefit is being able to construct the graph without knowing or filling in specific numeric values. One minus is that it is harder to test, prototype and iterate.

tf.Variable() allows initializing and filling in data that can be changed later. Each node is a unit of computation. Each edge is either an input or an output of an operation.
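
The build-first, run-later pattern can be imitated in a few lines of plain Python. This is a toy model and not TensorFlow code: Node, placeholder, variable and run are invented names that only mimic the roles of tf.placeholder(), tf.Variable() and a Session.

```python
class Node:
    """One unit of computation; its inputs are the incoming edges."""

    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op          # "placeholder", "variable", "add" or "mul"
        self.inputs = inputs
        self.name = name
        self.value = value    # only used by variables


def placeholder(name):
    return Node("placeholder", name=name)


def variable(value):
    return Node("variable", value=value)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node, feed):
    """Evaluate a node: a toy stand-in for running the graph in a Session."""
    if node.op == "placeholder":
        return feed[node.name]  # value is supplied only at run time
    if node.op == "variable":
        return node.value
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right


# Step 1: build the graph. No numbers flow yet.
x = placeholder("x")
w = variable(3.0)
b = variable(1.0)
y = add(mul(w, x), b)

# Step 2: run it, feeding a value for the placeholder.
print(run(y, {"x": 2.0}))
w.value = 5.0  # a variable can be changed later
print(run(y, {"x": 2.0}))
```

Nothing is computed while y is being defined, which is exactly what made quick prototyping awkward in this style.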

Tensorflow 1.0, like Tensorflow 2.0, has a Pythonic front-end and API, and can be deployed on many containers and devices such as CPU, GPU, Android and other mobile OSes such as iOS, and JavaScript (tensorflow 1.x+) in the browser. It has always been quite production ready, hence it was popular before Pytorch 1.0 came along.

Tensorflow and Pytorch both focus on deep learning and are optimized for it.

Additional feature - Automatic differentiation, which is important for gradient-based deep learning algorithms. Additional feature - Optimizers for fine-tuning weights efficiently.
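
To show what automatic differentiation plus an optimizer buys us, here is a small pure-Python sketch. It uses forward-mode differentiation with dual numbers, which is simpler than the reverse-mode approach deep learning libraries generally use, and the toy loss and learning rate are arbitrary choices.

```python
class Dual:
    """A number that carries its value and its derivative together."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __sub__(self, other):
        other = self._wrap(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = self._wrap(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def loss(w):
    """Toy loss (w - 3)^2, minimised at w = 3."""
    diff = w - 3.0
    return diff * diff


def gradient(f, w):
    """Differentiate f at w automatically by seeding dw/dw = 1."""
    return f(Dual(w, 1.0)).deriv


w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(loss, w)  # plain gradient descent step
print(round(w, 4))
```

The loss function is written as ordinary arithmetic; the derivative falls out of the overloaded operators without anyone deriving it by hand.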

Sunday, February 23, 2020

SpaCy for Natural Language Processing (NLP)

Documents are tokenized into sentences, then into words. Additional or readily available features can be built from these documents to work on a task. One SpaCy task could be to identify whether a tweet is positive or negative, and whether a specific product is mentioned in its text.


Install SpaCy like any other Python package using pip, then import it. It works on Google Colab too, which is perhaps the easiest and fastest way to install, configure and get started.

Step 1. Need to import a language model before proceeding. SpaCy supports many language models.

Step 2. Load the English model:


Supports other models too, include en_core_web_sm

Update the model (optional)

Step 3. Init the model, wrap it in an nlp object
import spacy
nlp = spacy.load("en_core_web_sm")  # assumption: the en_core_web_sm model mentioned above is installed
doc = nlp("document sentence here")

Print out items from the SpaCy nlp model
- print out tokens
# Iterate over tokens in a Doc
for token in doc:
    print(token.text)

- print out entities
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

This will print out each entity as well as its beginning and ending character index and its label.

Can also query the doc using slicing, for example doc[1:4]
Can access the .text attribute of the token

Additional functions



Disable pipeline for custom training
nlp.disable_pTraining src 13

source -13 :
Why disable the pipeline? Say you are using SpaCy for just one task such as NER; you can disable the other pipeline components to skip the tasks you don't need.
Check the list of pipeline component names with nlp.pipe_names

Save the trained model

Bloom embedding, a type of optimized word embedding
1D CNN 1D convolutional neural network

NLTK, which is not a part of SpaCy, is the entry-level toolkit for NLP. It can do basic part-of-speech tagging, but it does not have advanced functionality like SpaCy.

Spacy Deep Learning

Training src 13
source -13 :

SpaCy word embedding

To print out the word embedding, access the token's .vector attribute, which holds its word vector.

SpaCy for Biomedical Research

scispaCy, a python package that provides SpaCy models for biomedical, clinical texts and scientific literature. Pre-processing. 

Why Natural Language Processing is hard?

Limitation of pre-trained models

" A model trained on Wikipedia, where sentences in the first person are extremely rare, will likely perform badly on Twitter. Similarly, a model trained on romantic novels will likely perform badly on legal text." - Spacy documentation

Friday, February 21, 2020

OpenCV cheat sheet

  • import cv2
  • cv2.imread()
  • cv2.resize()
  • .transpose() on arrays
  • .reshape() on arrays

Google Cloud AutoML

  • Functionality provided by AutoML (source: direct quote from the AutoML documentation):
    • Single Label Classification: Predict the ONE correct label that you want to assign to a document.
    • Multi-label Classification: Predict ALL the correct labels that you want to assign to a document.
    • Entity Extraction: Identify entities within your text items.
    • Sentiment Analysis: Understand the overall sentiment expressed in a block of text.

Thursday, February 20, 2020

My experience with TripleByte technical interview and quiz

I read a few really good posts on TripleByte experience. They were helpful so I am also posting my two cents here.

First of all, TripleByte is legit. It went through Y Combinator and it is being actively promoted by YC.

Amazing selection of quizzes:
I am so happy that they have full stack, data science as well as Machine Learning quizzes as of Feb 2020! The Data Science and Machine Learning Quizzes both have a NEW sign.

It is about 2 minutes per question.

I really like the FastTrack feature. It is a quick validation. It is encouraging, and it quickly moves candidates to the next step: actually doing or practicing technical interviews. Honestly, this part is not avoidable.

I haven't figured out a way to take other quizzes when passing one with FastTrack.

It is not very hard for me to get FastTrack or do well, but doing exceptionally well is rarer and more meaningful, and there may even be an opportunity to be matched with top companies. I don't think Exceptionally Well is exactly trivial to obtain. TripleByte visualizes your skill set with sub-categories that either have a rating on a scale of 1-5 or a radar map with a similar scale. One does not want to score a 3 in any of the sub-categories, as visually it makes the radar map look weak.

With a little bit of review and brief study, the quizzes should be easily passable. If you don't pass the quiz, maybe it is time to learn more and get more experience, because it is not that hard to pass.

From most of what I gathered online in forums, the technical interview portion is difficult. There is quite a bit of requirement in coding exercises and setting up the coding environment in the console. Because I come from a non-traditional background, I don't know C++ ... yet. I plan to learn it. Some of the exercises, quizzes and interview questions can be in C++. And that's a problem for me. The quiz C++ is easy to figure out even if you don't know the language. But the coding exercise in C++ cannot be figured out without prior knowledge.

Apparently you will be sent an interview guide if you do schedule a technical interview.

One trick to do well in a technical interview is to have practiced the problem. Then you will know the caveats, won't stress about understanding the problem (comprehension), and will potentially know roughly what the optimal solution looks like.

During the interview, it'd be good to think of a similar problem that you resolved and recall how you resolved it. Being able to discuss the problem in a real world setting is always helpful for finding the optimal solution and for showcasing your understanding of the technical problem.

How does TripleByte compare to HackerRank and Leetcode

TripleByte is more developer-friendly and better for candidates than HackerRank and Leetcode, because first of all it tests knowledge more than trivia. As long as you understand the problem, you can likely resolve the question fast, within 2 minutes (the requirement). It focuses on one or two missing lines, or the final returned result. This means you won't have to spend 45 minutes conjuring each solution. I like that a lot. I can demonstrate that I understand the problem and its edge cases without having to get every detail right.

Leetcode is more detailed, and there is a lot of competition on time performance, so even a good solution may not be enough. HackerRank has a nice trajectory to level up, and is interesting, but like Leetcode it also requires the candidate to write a lot of code every time. Eventually, though, you should probably still use HackerRank or Leetcode to prepare for the first-round screen-shared interview.

HackerRank supports a few choices of languages. TripleByte lets you choose the category of your quiz, but there is no explicit language choice. Leetcode supports many languages.

Wednesday, February 19, 2020

Unit Testing

Unit testing usually focuses on a single function or a small chunk of code.

Writing and fixing unit test code early can have the effect of managing bugs, errors and failing logic before they get out of hand.

Writing dedicated unit testing code is better than simply testing code interactively, which is a manual not repeatable process.

The Test Driven Development (TDD) process will likely require developers to write the testing code first, expect it to fail, and then fix the failing unit test before progressing.

Test automation is important because the number of unit tests can grow fast with new added features and functionalities.

It is different from regression testing and system integration testing, which both test the entire system more extensively, including changed and unchanged parts.

It is also possible to be asked unit testing and testing questions in general in interviews.

The pytest convention is to name the test file after the code file, with a test_ prefix or a _test suffix (test_*.py or *_test.py).
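
A minimal sketch of a unit test for a single small function, written with the standard library's unittest module so it runs without installing anything. The slugify function is a made-up example; with pytest the same checks would typically live in a separate test file as plain functions with assert statements.

```python
import unittest


def slugify(title):
    """Function under test: turn a post title into a URL slug."""
    return "-".join(title.lower().split())


class TestSlugify(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Unit Testing Basics"), "unit-testing-basics")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Unit   Testing "), "unit-testing")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Because the checks are code, they can be rerun automatically after every change, unlike testing by hand in an interactive session.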

Tuesday, February 18, 2020

Neo4j Graph Database Basics 2020

Neo4j models graphs: nodes and the relationships (edges) between them. This is in contrast with a relational database, the traditional tabular database. SQL joins are expensive, costly, and hard to learn (confusing for analysts).

UI - Neo4j Browser

Neo4j Browser is the query workbench, using the JavaScript Bolt driver under the hood. Bolt is a fast binary protocol that Neo4j uses for connections.

Data Modeling

First step of getting started with graph database is to model the data in a graph. It is important to model and store data as a graph. 

Neo4j uses a property graph model. 

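Neo4j stores each relationship with one direction but can query it in either direction. There are four main elements: nodes, relationships, labels and properties. A node can be labeled (a label is similar to a SQL table name) and can have properties, which are key-value pairs. Properties can go on nodes or relationships.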
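
The property graph model can be sketched with a few Python data structures. This is a toy in-memory model for illustration, not the Neo4j driver or its storage format; the class, the node ids and the KNOWS/USES relationship types are all made up.

```python
class PropertyGraph:
    """A toy property graph: labeled nodes and directed, typed relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> {"labels": set, "props": dict}
        self.relationships = []  # (start id, type, end id, props)

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = {"labels": set(labels), "props": props}

    def add_relationship(self, start, rel_type, end, **props):
        # Stored once, with a single direction: start -> end.
        self.relationships.append((start, rel_type, end, props))

    def neighbors(self, node_id, rel_type, direction="both"):
        """Follow relationships outgoing, incoming, or in both directions."""
        found = []
        for start, kind, end, _props in self.relationships:
            if kind != rel_type:
                continue
            if start == node_id and direction in ("out", "both"):
                found.append(end)
            if end == node_id and direction in ("in", "both"):
                found.append(start)
        return found


graph = PropertyGraph()
graph.add_node("alice", ["Person"], name="Alice")
graph.add_node("bob", ["Person"], name="Bob")
graph.add_node("neo4j", ["Database"], name="Neo4j")
graph.add_relationship("alice", "KNOWS", "bob", since=2019)
graph.add_relationship("alice", "USES", "neo4j")

print(graph.neighbors("alice", "KNOWS", "out"))
print(graph.neighbors("bob", "KNOWS", "in"))
print(graph.neighbors("bob", "KNOWS", "out"))
```

The KNOWS relationship is stored only once, from alice to bob, yet it can be found starting from either end by choosing the query direction.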


Data modeling can be done with a white board or using apjones Arrows App 


A Neo4j Cypher query is executed in a session and gets back a cursor of records.

Cypher is an open-sourced graph database query language, a part of the openCypher project. Other graph databases use Cypher too, not just Neo4j.

Cypher versus SQL Comparison


View Schema

Call apoc to view schema

Call - cypher keyword to call functions and procedures

Read more about user defined procedures source 11

You can try writing and customizing your own procedure
Cypher styling and query guide source 12

Graph database can also be queried and modeled using ORMs. 


"Just in time for GraphConnect, Michael released version of the popular APOC library. This release has support for defining custom procedures and functions implemented in plain Cypher and then calling them like regular ones, as well as a new procedure for scraping web pages" - May 2020 

Use cases for Neo4j
Salmon researchers, salmon hatchling in northern atmosphere, knowledge graph
Neo4j for journalists
The Panama Papers are available as a sandbox dataset
Investigative journalism: Panama Papers, Paradise Papers
Relationship model can be super insightful in data analysis and for relationship modeling, Neo4j is great.

Can even use graph for chemicals drug discovery

Use for recommendation collaborative filtering

Fraud prevention

Use graph when context matters. How did the data result happen?

Graphing interaction data is also very useful. Relationship data is important.

Neo4j lead data scientist Alicia Frame, PhD, works on graph algorithms and gives talks about them

Learn Neo4j (Neo4j Tutorials)

Neo4j Sandbox Feature : Neo4j tutorials can be run in Neo4j Sandboxes

Use :play to launch Neo4j tutorials

Using the :play feature in Neo4j sandbox and sandbox datasets, you can give your cypher skill a try and get started with Neo4j. It is insightful, easy and a lot of fun.

You can create your own Neo4j Browser Guide (tutorial). 

Graph Academy

Neo4j 4.0

Advanced Neo4j Experts

You can become a Neo4j Ninja, Neo4j expert, and join the Neo4j Speaker Program. Neo4j investigative journalism program

Certification available

Neo4j Community and Neo4j Universe

"GraphXR is a browser-based visual analytics platform that delivers unprecedented speed, power, and fluidity to anyone working with connected, high-dimensional, and big data." GraphXR in its own words.

Become a super user
Become a Neo4j Ninja
Load CSV

GRANDSTACK - Hosting Neo4j Website

Neo4j Desktop : allows managing multiple projects databases
Neo4j Graph Apps: graph apps are applications that interact with Neo4j database through the desktop app. Graph apps are single page applications (SPAs) that are built with vanilla JavaScript or front end web development frameworks. 

Advanced Algorithms with Neo4j  | Advanced Graph Algorithms

Use call to launch Neo4j helper functions, stored procs, and algorithms

Page Rank
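
As a rough idea of what PageRank computes, here is a pure-Python power-iteration sketch. It is not Neo4j's implementation; the damping factor of 0.85, the iteration count and the four-page graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of outgoing links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over every node.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Made-up graph: every other page links to "home".
links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "contact": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Pages that receive links from many (or from important) pages end up with a higher score, so "home" ranks first in this example.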

Neo4j NLP library

Graph Academy | Getting Certified
Certification exam 
Duration 1 hour 80 questions
Introduction to Neo4j Online Course and Tutorial
Duration 1 day
Course Outline
Introduction to Graph Databases
Introduction to Neo4j
Setting up your Development Environment Tutorial
Introduction to Cypher
Getting More out of Queries
Creating Nodes and Relationships
Getting More out of Neo4j

Friday, February 7, 2020

Develop a smart Twilio app 2020 - for SMS phone voice or fax

Key Twilio Concepts

  • TwiML pronounced tweemle and ML does not stand for machine learning. "TwiML (the Twilio Markup Language) is a set of instructions you can use to tell Twilio what to do when you receive an incoming call, SMS, or fax." - official documentation
    • "When someone makes a call to one of your Twilio numbers, Twilio looks up the URL associated with that phone number and sends it a request. Twilio then reads the TwiML instructions hosted at that URL to determine what to do, whether it's recording the call, playing a message for the caller, or prompting the caller to press digits on their keypad."
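
Since TwiML is XML, a response can be assembled with Python's standard library. This is a minimal sketch: build_twiml and the spoken messages are made up, and in practice the Twilio helper library or a web framework such as Flask would serve this document at the URL configured for the phone number.

```python
import xml.etree.ElementTree as ET


def build_twiml(message, prompt=None):
    """Build a TwiML document that speaks a message and optionally gathers one digit."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = message
    if prompt is not None:
        # Gather waits for keypad input; here it is limited to a single digit.
        gather = ET.SubElement(response, "Gather", numDigits="1")
        ET.SubElement(gather, "Say").text = prompt
    return ET.tostring(response, encoding="unicode")


twiml = build_twiml("Thanks for calling.", prompt="Press 1 to leave a message.")
print(twiml)
```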

Twilio Customers and Use Cases

Lyft and Uber both use Twilio for SMS and customer support. See more on the customers page and use case landing page. 

Cool products by Twilio

  • Twilio Studio for drag and drop app building released Jan 2018
  • Twilio Flex source 20
  • Twilio Function (beta as of Feb 2020) 
Tutorial Twilio + Flask + Spotify API
The Spotify API is deprecated but this tutorial still shows you how to use Twilio with Flask.
To make the Flask app available online and accessible by Twilio, use ngrok


20 -
2- Getting start with Twilio examples, code snippet :

Wednesday, February 5, 2020

Getting started with Alexa 2020 for Developers

It is not too late to get started coding for Alexa - making Alexa skills. Let's get started!


  • Workflow: Build Test Launch and Measure
  • Use the Alexa Developer Console
  • Start with a custom skill or Flash Briefing or Smart Home or Video skill
  • Invocation name that is unique for your Alexa app
  • Build an interaction model, the voice user interface (VUI), which is specific to voice apps
  • Need to provide examples, sample utterances
The wake word wakes up Alexa. For example, in "Alexa, remind me to ...", Alexa is the wake word.


  • How does the user interact with Alexa? How does it work? Source 1 (also see above image from source 5)
  • Invocation name is used often; succinct and unique is good
  • Intent is what your skill app can do
  • Utterances are the phrases users say to your skill to express the right intent
  • Slots accept inputs from users
  • Connect to other APIs using interfaces
  • JSON editor. You can use the visual console to enter intents and utterances or you can write JSON code.
  • Specify end point where the code will live. End point receives request. "where the custom logic will live". Not the interaction model. 
Source 6

User Alexa Interactions

  • The user says the wake word, Alexa.
  • Alexa hears the wake word and listens.
  • The Alexa service uses the interaction model to figure where to route the request.
  • A JSON request is sent to the skill's lambda function.
  • The lambda function inspects the JSON request.
  • The lambda function determines how to respond.
  • The lambda function sends a JSON response to the Alexa service.
  • The Alexa service receives the JSON response and converts the output text to an audio file.
  • The Alexa-enabled device receives and plays the audio.
Direct quote from Source 5
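
The middle steps of that flow (the lambda function inspects the JSON request, decides how to respond and sends a JSON response) can be sketched as a plain Python handler. This is a simplified illustration without the Alexa Skills Kit SDK: GetTipIntent and the spoken texts are hypothetical, and real requests carry more fields than shown.

```python
def lambda_handler(event, context):
    """Inspect the JSON request from the Alexa service and build a JSON response."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        text = "Welcome. Ask me for a tip."
        end_session = False
    elif (request["type"] == "IntentRequest"
          and request["intent"]["name"] == "GetTipIntent"):  # hypothetical intent
        text = "Read your dialog out loud to test it."
        end_session = True
    else:
        text = "Sorry, I did not understand that."
        end_session = True
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


# A trimmed-down sample request for illustration.
sample_event = {
    "request": {"type": "IntentRequest", "intent": {"name": "GetTipIntent"}}
}
print(lambda_handler(sample_event, None))
```

The Alexa service, not this function, turns the output text into audio, which is why the handler only returns text.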

Free Alexa Developer Training Course

Use this full end-to-end training course from Amazon Source 5

Best practice

  • Welcome prompt is important. Provide important information.
  • Avoid jargons - difficult or professional words that are hard for users to recall, use or understand. See what I did there? Avoided jargons. 
  • Use conversations that are natural. Ironically, you should even test if your conversation models are natural. Does it understand common utterances - things users say or use? Is it what the users expect?
  • One breath test: dialogs should be finished in one breath
  • Read out loud to test the dialog scripts
  • Can use Alexa simulator to test. 
  • Recommend testing and get user feedback before submitting for certification
  • Take pauses 
text here <break time="600ms"/> more text here

More best practices source 7

Create Alexa Skills with Blueprint

Example Flash Briefing Blueprint Source 12

Create Alexa Skills with APIs

Some available APIs.

Create Alexa without code

Alexa for Gaming

Read the document or join a hackathon to learn more.


    • The backend is usually a lambda function and is interacted via a lambda endpoint
    • A simple storage service (S3) is provisioned if hosted on AWS

    Other resources

    • Udacity Natural Language Processing nanodegree (NLP) for deep learning and machine learning teaches Alexa and IBM Watson skills.
    • Twilio Autopilot can build smart language interaction models based on sample utterances, and outputs JSON files that can integrate with Alexa.
    • Shark Tank Mark Cuban talks about how he uses Alexa and why developers should develop for Alexa. Source 8
    • Amazon Pay in Alexa example Blu ai Source 9
    • Alexa Skills Kit Developer Console Source 11
    • Alexa for Business allows businesses to host private skills within the organization Source 13
    • You can write Alexa backend in either Node.js or Python
    • Alexa can speak other languages such as English, Japanese, Spanish, Italian etc. Unfortunately it cannot speak Chinese right now. Source 14
    • Alexa is on many IoT devices including driving speakers, and Facebook Portal photo frame and video call. 


    Can brand with a sound or a familiar tune. 


    7 -
    8 -
    9 -
    10 - Best practices
    11 -
    12 - Flash Briefing
    13 -
    14 -
