
Friday, March 27, 2020

My Little Green Book of Machine Learning and Deep Learning, Artificial Intelligence

Data pre-processing

Turn Complex Data into Numbers

Turn data into features, then into feature vectors. Machine learning models can only take numeric data, so all input data must be represented numerically. For example, words need to be converted to word embeddings in some Natural Language Processing tasks.
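As a minimal sketch of turning words into numbers (the vocabulary, sentence, and function name here are invented for illustration, not from any particular library), a bag-of-words count vector is one of the simplest numeric representations:

```python
def bag_of_words(text, vocabulary):
    """Turn a sentence into a numeric feature vector: one count per vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocab = ["machine", "learning", "data"]
vector = bag_of_words("Machine learning turns data into data insights", vocab)
# vector is [1, 1, 2]
```

Word embeddings go further by mapping each word to a dense vector of real numbers, but the principle is the same: text in, numbers out.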

Training vs Inference Models

There are two major tasks in machine learning: 1. build and train a model, 2. deploy the model for inference. Part 1 takes known data and uses it to tune the model's parameters, such as its weights. Part 2 takes unknown data (real-world or test data) and calls the model's .predict() method on it.
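To make the two phases concrete, here is a pure-Python sketch (the model, data, and class name are invented for illustration): phase 1 fits a weight w for f(x) = w * x from known data; phase 2 calls .predict() on new, unseen data.

```python
class TinyLinearModel:
    """Fits f(x) = w * x by least squares: w = sum(x*y) / sum(x*x)."""
    def __init__(self):
        self.w = None

    def fit(self, xs, ys):   # phase 1: training tunes the parameter (weight)
        self.w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):   # phase 2: inference on new data
        return [self.w * x for x in xs]

model = TinyLinearModel().fit([1, 2, 3], [2, 4, 6])   # known data
predictions = model.predict([10])                     # unseen data
# predictions is [20.0]
```

Real libraries such as scikit-learn follow this same fit/predict pattern.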

Normalization, Scaling Data

Normalize the data: scale it to bound its range. In machine learning an error term can be arbitrarily large, because the model can be arbitrarily bad, so the error term for f(x) = wx + b is essentially unbounded. Bounding the error term by scaling the numeric feature values can make the result easier to compute and make the search space easier for gradient descent.
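Min-max scaling is one common way to bound feature values to [0, 1]; a minimal sketch (function name and data are illustrative):

```python
def min_max_scale(values):
    """Scale a list of numbers into the bounded range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 50])
# scaled is [0.0, 0.25, 1.0]
```

Standardization (subtracting the mean and dividing by the standard deviation) is the other common choice; both keep the gradient descent search space well behaved.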

Bias Variance Tradeoff

High bias usually refers to underfitting: the model is too simple, not complex enough to make accurate predictions. It can also mean the model is practically ignoring the data.

High variance usually refers to overfitting: the model fits the training data too closely and hence cannot generalize to future data well.

Tuesday, March 24, 2020

Natural Language Processing (NLP) 2020

It is the year 2020, vision 20/20. Time to do another survey article of the Natural Language Processing (NLP) field. What is there to learn and know? What is new?

Getting started with NLP

Great sources for NLP

Social media : Twitter, Facebook, comments, posts, forums
Transcripts : events, conferences, speech, call transcripts, zoom transcripts
Smart voice assistants : Alexa, Siri, Google Home

Libraries for NLP

SpaCy: supports up to 45 languages, including Japanese, Chinese and English
Scikit-Learn sklearn TFIDF vectorizer
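As a rough sketch of the TF-IDF idea behind sklearn's TfidfVectorizer (this simplified formula, idf = log(N / df), differs from sklearn's exact smoothed variant; documents here are invented):

```python
import math

def tf_idf(term, doc, docs):
    """Term frequency in `doc`, weighted down if the term appears in many docs."""
    tf = doc.count(term) / len(doc)                 # how often in this document
    df = sum(1 for d in docs if term in d)          # how many documents contain it
    idf = math.log(len(docs) / df)                  # rare terms get a higher weight
    return tf * idf

docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
rare_score = tf_idf("dog", docs[2], docs)    # rare term -> higher weight
common_score = tf_idf("cat", docs[0], docs)  # common term -> lower weight
```

The intuition: a word that appears everywhere (like "the") says little about any one document, so TF-IDF discounts it.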

Advanced NLP

Machine Translation
Transfer learning in NLP

Algorithms

Latent Dirichlet Allocation (LDA) topic modeling

Projects

Sentiment analysis
Fake news classifier
Trump tweet maker

You can combine sentiment analysis with tweet analysis. Great Natural Language Processing (NLP) projects for a hackathon: retrieve keywords from tweets (entity recognition for hashtags, brand names), pipe the result into a sentiment analysis model, and predict sentiment from negative to positive on a zero-to-five-star scale.
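A toy version of that hackathon pipeline, with an invented keyword lexicon (a real project would use an NER model for entity extraction and a trained sentiment classifier):

```python
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "awful", "broken"}

def extract_hashtags(tweet):
    """Crude entity recognition: pull out hashtags as keywords."""
    return [w for w in tweet.split() if w.startswith("#")]

def sentiment_stars(tweet):
    """Map negative-to-positive sentiment onto a 0-to-5 star scale."""
    words = set(tweet.lower().replace("#", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return max(0, min(5, 3 + score))  # 3 stars is neutral

tweet = "I love the new #Pixel camera, awesome shots"
tags = extract_hashtags(tweet)   # ['#Pixel']
stars = sentiment_stars(tweet)   # 5
```

Each stage (keyword extraction, then sentiment scoring) could be swapped for a real model without changing the pipeline shape.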

Monday, March 23, 2020

Growth Marketing - Technical Marketing


  • Hiring people can be a pitch for your app or startup 
  • How to use freemium monetization on content: blur out important infographic images in the newsletter, creating a digital tease that motivates the user to leave the newsletter and head to the content page 
  • Growth hack: vanity address
  • Embed surveys in emails 
  • Use AMP for interactive emails. It is easy to fix typos and swap content if the email is generated dynamically 
  • When working with a Growth Manager, get to know her style and her vision 
  • Turn super users into community managers, staff, employees
  • Unfortunately, on Youtube drama drives clicks. Dramatic Youtubers seem to grab attention. There was one joke: an Uber hits a pothole, and the Youtuber describes it as "Gosh, I almost died today."
  • Startups are all about growth. Funded startups are definitely about growth. Even bootstrapped startups need to think about growth, large scale growth, ASAP 
  • Youtube
    • Very important to have attention grabbing thumbnails

Advanced Python


  • Installation
    • pip install numpy, then import numpy as np
  • Request is deprecated
  • Python 2.x is deprecated (Python 2.7 is usually pre-installed on Macs 3-7 years from 2020). 
  • BeautifulSoup
  • Pyecharts
  • Scrapy
  • Pycharm
  • mylist = [1,2,3,4]
  • np_array= np.array(mylist)
  • Slicing
    • Entire list: mylist[:]
    • Slicing the 0th and 1st elements: mylist[0:2]
    • The second index is exclusive
  • Function signature
    • What type of input is expected? Example: CSV
    • What type of output is expected?
    • What is the functionality?
    • Each function does one task
  • Using an IDE
    • Anaconda Spyder (Scientific Python Development Environment)
    • PyCharm
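The slicing notes above, as runnable examples (list values are the ones from the post):

```python
mylist = [1, 2, 3, 4]

whole = mylist[:]        # entire list: a shallow copy, [1, 2, 3, 4]
first_two = mylist[0:2]  # 0th and 1st elements: [1, 2]
# the second index (2) is exclusive, so the element at index 2 is not included
```

Note that `mylist[:]` returns a new list object, which is why it is a common idiom for copying.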

Sunday, March 22, 2020

Technical Interview Tips - Technical Interview with Python cheat sheet


  • Time complexity
    • Big O Exponential 2**n
    • Worst case, Big O
    • Best scenario
    • Average 
  • Data structure & algorithms
  • Test case: ZeroDivisionError
  • Implement function from scratch
  • Tree algorithms
  • Two pointers: while lo < hi, do xyz, with lo starting at the 0th index and hi at len(input) - 1
  • Language: Java, Python slightly preferred
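The `while lo < hi` pointer pattern above, sketched as a classic two-pointer search over a sorted list (the pair-sum task is an invented example):

```python
def pair_with_sum(sorted_nums, target):
    """Two-pointer scan: lo starts at the 0th index, hi at len(input) - 1."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        total = sorted_nums[lo] + sorted_nums[hi]
        if total == target:
            return (lo, hi)
        if total < target:
            lo += 1   # need a bigger sum: move the low pointer right
        else:
            hi -= 1   # need a smaller sum: move the high pointer left
    return None       # no pair adds up to target

pair_with_sum([1, 3, 5, 8], 9)   # (0, 3) because 1 + 8 == 9
```

This runs in O(n) time and O(1) space, a common interview follow-up to the brute-force O(n**2) nested loop.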

Firebase APIs Basics


  • A project is a container for resources on Google Cloud
  • Install Firebase tools first to use command line utilities
  • The defer attribute on a script tag in HTML means: don't execute the script until the page finishes parsing
  • firebase serve runs a local server on port 5000

Google Cloud Basics


  • .yaml extension of configuration file
  • Export python packages and environment dependencies as requirements.txt
  • name.py python code
  • Google Cloud Function
    • By default cloud function is authenticated into other Google APIs
    • Serverless, fully managed, event triggered, considered a micro-service in the cloud
  • Google Cloud OAuth
  • Use Stackdriver (originally built for monitoring AWS) to track performance
  • Role management using IAM
  • Google Cloud Discount
    • Google Cloud discount for students available
  • Cloud DNS is an available API
    • Add an A record pointing to the IPv4 address
    • Copy the external IP address
    • The A record is linked to the external IP
    • A CNAME record connects www to the domain name (.com)
    • Now go to the domain provider and do the same
    • Point the name servers at the Google ones
    • Generate an SSL certificate to enable HTTPS
  • Integrates with Firebase
    • Firebase APIs
  • Resources - Conference: Google Cloud Next Conference usually in April
  • Static versus ephemeral IP address
  • Market place: Google Cloud wordpress
  • Google Cloud partner: top partners include Accenture
  • Tutorial : Qwiklab
  • Concept: serverless on the cloud runs code without needing to know or manage the machine infrastructure it operates on, and without worrying about scaling, load balancing, security or patches, which are handled automatically.

Friday, March 20, 2020

Learn SQL — It’s on every job listing — Part 1

SQL is not obsolete. You can now build machine learning models with SQL, and query real-time or big data with SQL. If you look around you will find plenty of job postings with SQL as a desired skill, even from FANG companies (Facebook, Apple, Netflix, Google). Uniqtech writes technical tutorials for coding bootcamp graduates, freelancers, self-study and MOOC students in the realms of data science, software engineering, machine learning and deep learning. Read our disclaimer here; it applies to our entire site. Please take our words with a grain of salt: they are not professional advice, nor are they professional opinions. Repost from Uniqtech Medium with permission. 

Microsoft Excel is a workbook that contains worksheets, just like a database contains tables.
Each table can be queried separately. To query tables jointly, we need join statements and keys to look up the corresponding data.
Each table row should have a unique ID, known as the primary key. It can also have a foreign key (FK), which associates the row, aka record, with the unique primary key of another table.
For example, each e-commerce transaction has a unique ID, which can be generated from the timestamp of when the transaction happened. Each transaction can have a FK such as customer ID, which uniquely identifies the customer who made the transaction. His or her full information resides in the customers table.
That is the perfect segue to talk about the philosophy and convention behind table names. Think of table names as natural divisions of the data we want to model, in the form of plural nouns: transactions, customers, products, etc. Each row in the transactions table is a transaction (singular). Each row in the customers table is a customer. Each column represents a customer attribute, such as gender, age, etc.
When designing the database, an architect or Database Admin (DBA) constructs a digital blueprint stating how the tables connect with each other, or stand alone, in the database. This diagram and the relations it specifies is called the database schema.

What is SQL

SQL is a database query language. It doesn't matter which relational database you use; SQL concepts are helpful. The Pandas analytics library uses similar joins and query methods, and Google BigQuery allows SQL-like syntax.
Newer databases such as NoSQL and graph databases use different query languages. Sample code from the Google Cloud Datastore NoSQL database:
// List Google companies with fewer than 400 employees.
var companies = query.filter('name =', 'Google').filter('size <', 400);

Important SQL Keywords

SELECT

The one select statement to select them all uses the wildcard.
SELECT * FROM table_name
It is important to slow down and read the statement. It reads: select all from table_name. The * means all columns.
Nested Select Statements
SELECT * FROM (SELECT "A" AS A, "B" AS B);
AS specifies the alias. When column names are not reader friendly or long, alias is your friend.
It selects the column of data.

FROM

The FROM keyword is usually followed by a tablename. FROM database.CUSTOMERS . It can also be followed by a nested query.
It specifies the table to operate on.

WHERE

The WHERE clause narrows down the query results by specifying conditions, such as WHERE table_name.gender = 'Female' (note SQL uses a single =, not ==). It filters the rows of data.

Putting it all together SELECT FROM WHERE

query = """
    SELECT my_column
    FROM my_table AS m
    WHERE m.gender = 'F'
 """

WITH

“The SQL WITH clause allows you to give a sub-query block a name (a process also called sub-query refactoring), which can be referenced in several places within the main SQL query.” — GeeksforGeeks

ORDER BY

Sort the query result by one or more columns, ascending or descending.
ORDER BY column_name ASC
ORDER BY column_name DESC

LIMIT

LIMIT 1000
LIMIT 25
Show the first N rows of records in a table.
Usually at the end of the query: the last line in a SQL query.
Note that in big data settings, where managing cost and resource use is important, LIMIT does not mean the entire table is not scanned; the query may still read all the data before limiting the output.

Data Structure and Algorithms

Time Efficiency and Space Efficiency Both Matter

How to write effective tests cases

Wednesday, March 18, 2020

Google Colab Basics

In Google's own words: Colab is zero configuration, free GPU, and easy to share. I honestly have to agree with that. Google Colab is the easiest environment to get started on machine learning with scikit-learn, or deep learning with Tensorflow and Pytorch. Seriously, zero installation is awesome! Back in the day, when Ruby on Rails was hot, we had installation parties all the time, because installation was what took the most time for everyone. Now Colab even has access to free TPUs!

Google Colab is to Jupyter Notebook roughly what Google Docs is to Microsoft Office: it lets you create, edit and host Jupyter notebooks in the cloud.

Google Colab for Training Models

Training is essentially free and easy on Google Colab. You have access to both GPUs and TPUs. However, the free version can lose temporary variables and files in the home directory, because the runtime resets every 12 hours or less. There are ways to save and download the files to avoid such a catastrophe. 

Use Google Colab for Demo Purpose

It is easy to build an example in Colab and share it with the audience: give it away instead of GitHub source code or slides.

Tuesday, March 17, 2020

Flask basics 2020

Flask is known as lightweight and feature rich; it is also known as a micro framework (Django is the heavyweight one). It is used to dynamically code and update web applications, including Single Page Applications (SPA). It is a web development framework, not a library. Frameworks have a certain philosophy, strategy and code patterns (for example, you must follow a certain folder structure, because the framework automatically looks for assets in those folders, along with file and resource naming conventions). Frameworks help us do the routine tasks we would otherwise repeat over and over in web programming, way faster and more efficiently. They use tried and true methods and code patterns to implement common web app components, so there is no need to reinvent the wheel every time. 

Model-View Controller framework

MVC framework, popularized by Ruby on Rails. MVC is a separation of concerns, a way of organizing and designing code projects. 

Controller: use a decorator with a URL-like pattern to define behaviors, i.e. how each URL will be handled; these are called routes. url_for() is a shorthand that links a URL to a function, instead of having to type out the long URL.
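The decorator-plus-URL idea can be sketched without Flask itself; here is a minimal, hypothetical router showing how a framework maps URLs to handler functions (Flask's @app.route works along these lines, though its real implementation is richer):

```python
routes = {}

def route(url):
    """Decorator that registers a handler function for a URL, Flask-style."""
    def register(handler):
        routes[url] = handler
        return handler
    return register

@route("/")
def index():
    return "Hello, world"

@route("/about")
def about():
    return "About page"

def handle(url):
    """The controller dispatch: look up the handler for a URL and call it."""
    return routes[url]() if url in routes else "404 Not Found"

handle("/")         # 'Hello, world'
handle("/missing")  # '404 Not Found'
```

The framework's job is exactly this bookkeeping: you write handlers, it wires URLs to them.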

Database

SQL
Manage Flask ORM resources and records using
SQLAlchemy 

Launching Flask app via local server
$ flask run
Use above to launch a local website

Additional Flask concepts

Hello world: the first script is commonly called app.py
$ python app.py

Jinja mixes Python and HTML together.
Conditional HTML: the Jinja templating language, along with Flask, allows us to write if/else statements in HTML

How to host flask applications: one example is that you can easily deploy flask applications on Heroku as well as separately on Google Cloud app engine.

__name__: this name variable can determine whether a script is the main process being run, or whether it is being called upon (imported) by another script. If it is imported, __name__ is set to the module's name instead of "__main__". Flask also uses the module's __name__ to figure out where to look for static files and templates.
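A minimal sketch of the __name__ check (the script structure and function name are invented for illustration):

```python
def main():
    """Entry point: invoked only when this file is executed directly."""
    return "running as the main script"

if __name__ == "__main__":
    # True when run as `python app.py`; False when this file is imported
    main()
```

When another script does `import app`, the body of main() never runs automatically, which is exactly why this guard is the standard place to start a Flask dev server.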

Flask has debugging mode, which should never be used in production but is handy when developing.

To make a Flask app available online, use ngrok

Web Programming Web Development Basics 2020



Model view controller framework (MVC)
"a separation of concern"

  • Controller contains the code logics, routes
  • View contains the aesthetics codes CSS


CSS
  • Block versus inline
  • Can use CSS layout
  • CSS has a built-in tool called flexbox which can lay out elements automatically
  • flexbox is great for making a bunch of cards that rearrange themselves as we shrink the page
    • 4 on each row
    • 3 on each row
    • ..
    • 1 on each row stacking.
HTTP protocol
  • Text protocol
  • HTTP requests contains a header
  • Key : value pairs
  • Http status codes 1xx 2xx 3xx 4xx 5xx
  • Http response
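The 1xx-5xx status code classes above can be summarized in a small lookup (a simplification of the categories in the HTTP specification; the function name is illustrative):

```python
def status_class(code):
    """Classify an HTTP status code by its leading digit."""
    classes = {
        1: "informational",  # 1xx
        2: "success",        # 2xx, e.g. 200 OK
        3: "redirection",    # 3xx, e.g. 301 Moved Permanently
        4: "client error",   # 4xx, e.g. 404 Not Found
        5: "server error",   # 5xx, e.g. 500 Internal Server Error
    }
    return classes.get(code // 100, "unknown")

status_class(404)  # 'client error'
status_class(200)  # 'success'
```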
Cookies

Frameworks

  • Using a web development framework allows us to bypass a lot of hard work and write web apps quickly, without knowing all the functions to call and libraries to import
Web hooks

Data

  • D3 js visualize dataflow chart, org chart

Monday, March 16, 2020

API Design 2020

Designing API

  • What kind of resources are needed
  • What kind of actions will be taken
  • What kind of endpoints should be designed

Testing API

Testing API using curl

GET Method
curl 'https://[URL]/[resource].json'

curl -X PUT -d '{"key":{"nested_key":"value"}}' \
  'https://[PROJECT_ID].firebaseio.com/users/tom.json'

curl -X PATCH -d '{"key":"value"}' \
 'https://[URL]/[resource].json'

curl -X POST -d '{"key":{"nested_key":"value"}}' \
  'https://[PROJECT_ID].firebaseio.com/users/tom.json'

curl -X DELETE \
'https://[PROJECT_ID].firebaseio.com/users/tom.json'

Use curl together with the documentation to check what the API does

Shopify Partner Basics

  • In 2019 Shopify started to version its APIs. You can now refer to APIs by their version number. 
  • Career opportunities: Shopify store owner turned developer, turned partner
  • Shopify Buy Button is available for WordPress blogs
  • Utilize the Shopify partner blog, a great resource
  • Shopify Ping chat and Kit CRM robo assistant
  • Can use Apple Business Chat with Shopify channels
  • Can get approved to run product tagging and ads on instagram
  • Shopify Lounge provides co-working opportunities, photoshooting light box sessions
  • Point DNS on Shopify custom domain

GraphQL Basics on Shopify

GraphQL eliminates the need to define a potentially infinite number of endpoints for developers to interact with APIs: there is no more need to predefine every endpoint needed to interact with the API. With traditional APIs, many calls may be needed to get complex results back. "Multiple API calls from different schema hard for developers and slow for users." - Shopify forum discussion. GraphQL can give back all the information in nested JSON format. Use the GraphQL admin to manage the API. A REST API needs multiple endpoints, one per resource, to be designed and written; GraphQL technically needs just one endpoint. In Shopify, for example: POST https://{shop}.myshopify.com/admin/api/2019-04/graphql.json. Shopify has a GraphQL app ready for installation. Shopify Developers and Shopify Partners can potentially use this to design reporting fast. Compare a traditional API call: HTTP request GET /api/user?id=1, HTTP response {"id": 1, "name": "xyz"}.

Wednesday, March 4, 2020

Evaluating Classification Tasks in Machine Learning and Deep Learning

Confusion Matrix

Keywords: recall sensitivity, specificity

ROC curve, ROC AUC (curve)

Use real example: doctors, medicine, cancer example

Technical presentation components


  • Story telling
  • Workflow flow charts, where does it fit in the big picture
  • Code snippets
  • Take aways, action items after the talk
  • Link to slides
  • Link to codebase
  • Visualizations
  • Tricks to memorize, remember

Women in Data Science Conference (WiDS) summary, transcripts, notes from my personal experience

WiDS started small but is now a global movement with many regional events and branches. It is 5 years old in 2020. This year it is hosted at Stanford University.

Volunteer opportunities with WiDS: ambassadors, region events and branches, 500+ ambassadors world wide

Understand the history and evolution of Tensorflow by revisiting Tensorflow 1.0 Part 1

Tensorflow 2.0 has been in beta since last year, and it is a completely different universe from its predecessor Tensorflow 1.0. But even in 2020 it is important to understand the history and evolution of the TF library: how it got here, and why it chose Keras as its high-level API. It is also important to understand what a compute graph is, as it is a super useful concept in Deep Learning, and you can still visualize and inspect it in TensorBoard.

Let's go back in time and talk about Tensorflow 1.0: its data flow graph, how everything executed in a Session object backed by C/C++, and how it handled parallel computing. It offered both Python and C++ APIs. Though it had a big learning curve at the time of its release, it was production ready and powerful, and had already been used internally at Google before being open sourced. It supported CPU, GPU and distributed processing in clusters. Its focus is deep learning neural networks, whereas Scikit-Learn focuses on traditional machine learning algorithms.

What is a data flow graph? It is a very important computer science concept. Nodes represent math operations, and edges are multi-dimensional arrays called tensors, which flow through the graph, hence the name Tensorflow! See, the history is important. Using the graph we can easily visualize the neural network. Numpy and Scikit-learn would not give that result.

The frustrating part is that the graph needs to be built first before running it in a Session. This is where the learning curve got a bit steep: it was hard to prototype and iterate, and it required a bit of math-architect skill beyond just engineering and coding.

A quick note on tensors, which are also a concept in math and relativity. Here it just means a multi-dimensional array of numbers, usually more than 2D (a matrix), with automatic gradient computation and the capability to move to CPU or GPU and parallelize vector computation where possible. Technically even a vector (1D) or a number (0D) is a tensor.

In deep learning we usually have to convert data such as text or images into numbers, and we usually represent them using tensors. Each image, for example, is a 3-dimensional tensor with red, green and blue channels. Each channel holds a matrix corresponding to the width and height of the image, with each element representing the pixel brightness at each (w, h) coordinate. This is called a feature matrix, aka a feature tensor, though few people call it a tensor in this case.

One cumbersome pattern in Tensorflow 1.0 was the need to define a placeholder with tf.placeholder(), with its type specified, before filling or initializing it with actual numbers. This has the benefit of contiguous memory, but takes getting used to, especially when TF is trying to court dynamically typed Python users. Another benefit is being able to construct the graph without knowing or filling in specific numeric values. One minus is that it is hard to test, prototype and iterate.

tf.Variable() allows initializing and filling in data that can later be changed. Each node is a unit of computation. Each edge is either an input to or an output of an operation.
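The placeholder/Session pattern can be illustrated in plain Python (this is a toy imitation of the TF 1.x style, not TensorFlow code; all class and function names are invented): build the graph first with empty placeholders, then run it with concrete values later.

```python
class Placeholder:
    """Stands in for input data, like tf.placeholder(): no value until run time."""
    pass

class Multiply:
    """A graph node: an operation whose edges (inputs) are tensors or other nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b

def run(node, feed_dict):
    """Like session.run(): walk the graph and evaluate it with concrete values."""
    if isinstance(node, Placeholder):
        return feed_dict[node]          # values are fed in at run time
    if isinstance(node, Multiply):
        return run(node.a, feed_dict) * run(node.b, feed_dict)
    return node                         # a plain constant

x = Placeholder()
graph = Multiply(x, 3)        # build the graph first, no numbers yet
result = run(graph, {x: 7})   # then "run the session" with real data
# result is 21
```

This separation of graph construction from execution is exactly what made TF 1.x powerful for optimization but awkward for prototyping.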

Tensorflow 1.0, like Tensorflow 2.0, has a Pythonic front-end API and can be deployed in many containers and on many devices: CPU, GPU, Android and other mobile OSes such as iOS, and JavaScript in the browser (Tensorflow 1.x+). It has always been quite production ready, hence its popularity before Pytorch 1.0 came along.

Tensorflow and Pytorch both focus on deep learning and are optimized for it.

Additional feature - Auto differentiation is important for gradient based deep learning algorithms. Additional feature - Optimizer for fine tuning weights efficiently.

AutoML Automatic Machine Learning landscape

Features: here are some features that may be offered Automatic visualization: Automatically generate visualizations Machine learning interpr...