
Friday, March 27, 2020

My Little Green Book of Machine Learning and Deep Learning, Artificial Intelligence

Data pre-processing

Turn Complex Data into Numbers

Turn data into features, then into feature vectors. Machine learning models can only take numeric data, so all input data must be represented numerically. For example, words need to be converted to word embeddings in some Natural Language Processing tasks.
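As a minimal sketch of turning words into numbers (the vocabulary, sentence, and function name here are invented for illustration, not from any particular library), a bag-of-words count vector is one of the simplest numeric representations:

```python
def bag_of_words(text, vocabulary):
    """Turn a sentence into a numeric feature vector: one count per vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocab = ["machine", "learning", "data"]
vector = bag_of_words("Machine learning turns data into data insights", vocab)
# vector is [1, 1, 2]
```

Word embeddings go further by mapping each word to a dense vector of real numbers, but the principle is the same: text in, numbers out.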

Training vs Inference Models

There are two major tasks in machine learning: 1. build and train a model, 2. deploy the model for inference. Part 1 takes known data and uses it to tune the model's parameters, such as its weights. Part 2 takes unknown data (real-world or test data) and calls the model's .predict() method on it.
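To make the two phases concrete, here is a pure-Python sketch (the model, data, and class name are invented for illustration): phase 1 fits a weight w for f(x) = w * x from known data; phase 2 calls .predict() on new, unseen data.

```python
class TinyLinearModel:
    """Fits f(x) = w * x by least squares: w = sum(x*y) / sum(x*x)."""
    def __init__(self):
        self.w = None

    def fit(self, xs, ys):   # phase 1: training tunes the parameter (weight)
        self.w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):   # phase 2: inference on new data
        return [self.w * x for x in xs]

model = TinyLinearModel().fit([1, 2, 3], [2, 4, 6])   # known data
predictions = model.predict([10])                     # unseen data
# predictions is [20.0]
```

Real libraries such as scikit-learn follow this same fit/predict pattern.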

Normalization, Scaling Data

Normalize the data: scale it to bound its range. In machine learning an error term can be arbitrarily large, because the model can be arbitrarily bad, so the error term for f(x) = wx + b is essentially unbounded. Bounding the error term by scaling the numeric feature values can make the result easier to compute and make the search space easier for gradient descent.
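Min-max scaling is one common way to bound feature values to [0, 1]; a minimal sketch (function name and data are illustrative):

```python
def min_max_scale(values):
    """Scale a list of numbers into the bounded range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 50])
# scaled is [0.0, 0.25, 1.0]
```

Standardization (subtracting the mean and dividing by the standard deviation) is the other common choice; both keep the gradient descent search space well behaved.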

Bias Variance Tradeoff

High bias usually refers to underfitting: the model is too simple, not complex enough to make accurate predictions. It can also mean the model is practically ignoring the data.

High variance usually refers to overfitting: the model fits the training data too closely and hence cannot generalize to future data well.

Tuesday, March 24, 2020

Natural Language Processing (NLP) 2020

It is the year 2020, vision 20/20. Time to do another survey article of the Natural Language Processing (NLP) field. What is there to learn and know? What is new?

Getting started with NLP

Great sources for NLP

Social media : Twitter, Facebook, comments, posts, forums
Transcripts : events, conferences, speech, call transcripts, zoom transcripts
Smart voice assistants : Alexa, Siri, Google Home

Libraries for NLP

SpaCy: supports up to 45 languages, including Japanese, Chinese and English
Scikit-Learn sklearn TFIDF vectorizer
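As a rough sketch of the TF-IDF idea behind sklearn's TfidfVectorizer (this simplified formula, idf = log(N / df), differs from sklearn's exact smoothed variant; documents here are invented):

```python
import math

def tf_idf(term, doc, docs):
    """Term frequency in `doc`, weighted down if the term appears in many docs."""
    tf = doc.count(term) / len(doc)                 # how often in this document
    df = sum(1 for d in docs if term in d)          # how many documents contain it
    idf = math.log(len(docs) / df)                  # rare terms get a higher weight
    return tf * idf

docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
rare_score = tf_idf("dog", docs[2], docs)    # rare term -> higher weight
common_score = tf_idf("cat", docs[0], docs)  # common term -> lower weight
```

The intuition: a word that appears everywhere (like "the") says little about any one document, so TF-IDF discounts it.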

Advanced NLP

Machine Translation
Transfer learning in NLP

Algorithms

Latent Dirichlet Allocation (LDA) topic modeling

Projects

Sentiment analysis
Fake news classifier
Trump tweet maker

You can combine sentiment analysis with tweet analysis. Great Natural Language Processing (NLP) projects for a hackathon: retrieve keywords from tweets (entity recognition for hashtags, brand names), pipe the result into a sentiment analysis model, and predict sentiment from negative to positive on a zero-to-five-star scale.
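A toy version of that hackathon pipeline, with an invented keyword lexicon (a real project would use an NER model for entity extraction and a trained sentiment classifier):

```python
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "awful", "broken"}

def extract_hashtags(tweet):
    """Crude entity recognition: pull out hashtags as keywords."""
    return [w for w in tweet.split() if w.startswith("#")]

def sentiment_stars(tweet):
    """Map negative-to-positive sentiment onto a 0-to-5 star scale."""
    words = set(tweet.lower().replace("#", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return max(0, min(5, 3 + score))  # 3 stars is neutral

tweet = "I love the new #Pixel camera, awesome shots"
tags = extract_hashtags(tweet)   # ['#Pixel']
stars = sentiment_stars(tweet)   # 5
```

Each stage (keyword extraction, then sentiment scoring) could be swapped for a real model without changing the pipeline shape.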

Monday, March 23, 2020

Growth Marketing - Technical Marketing


  • Hiring people can be a pitch for your app or startup 
  • How to use freemium monetization on content: blur out important infographic images in the newsletter, creating a digital tease that motivates the user to leave the newsletter and head to the content page 
  • Growth hack: vanity address
  • Embed surveys in emails 
  • Use AMP for interactive emails. It is easy to fix typos and swap content if the email is generated dynamically 
  • When working with a Growth Manager, get to know her style and her vision 
  • Turn super users into community managers, staff, employees
  • Unfortunately, on Youtube drama drives clicks. Dramatic Youtubers seem to grab attention. There was one joke: an Uber hits a pothole, and the Youtuber describes it as "Gosh, I almost died today."
  • Startups are all about growth. Funded startups are definitely about growth. Even bootstrapped startups need to think about growth, large scale growth, ASAP 
  • Youtube
    • Very important to have attention grabbing thumbnails

Advanced Python


  • Installation
    • pip install numpy, then import numpy as np
  • Request is deprecated
  • Python 2.x is deprecated (Python 2.7 is usually pre-installed on Macs 3-7 years from 2020). 
  • BeautifulSoup
  • Pyecharts
  • Scrapy
  • Pycharm
  • mylist = [1,2,3,4]
  • np_array= np.array(mylist)
  • Slicing
    • Entire list: mylist[:]
    • Slicing the 0th and 1st elements: mylist[0:2]
    • The second index is exclusive
  • Function signature
    • What type of input is expected? Example: CSV
    • What type of output is expected?
    • What is the functionality?
    • Each function does one task
  • Using an IDE
    • Anaconda Spyder (Scientific Python Development Environment)
    • PyCharm
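The slicing notes above, as runnable examples (list values are the ones from the post):

```python
mylist = [1, 2, 3, 4]

whole = mylist[:]        # entire list: a shallow copy, [1, 2, 3, 4]
first_two = mylist[0:2]  # 0th and 1st elements: [1, 2]
# the second index (2) is exclusive, so the element at index 2 is not included
```

Note that `mylist[:]` returns a new list object, which is why it is a common idiom for copying.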

Sunday, March 22, 2020

Technical Interview Tips - Technical Interview with Python cheat sheet


  • Time complexity
    • Big O Exponential 2**n
    • Worst case, Big O
    • Best scenario
    • Average 
  • Data structure & algorithms
  • Test case: ZeroDivisionError
  • Implement function from scratch
  • Tree algorithms
  • Two pointers: while lo < hi, do xyz, with lo starting at the 0th index and hi at len(input) - 1
  • Language: Java, Python slightly preferred
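The `while lo < hi` pointer pattern above, sketched as a classic two-pointer search over a sorted list (the pair-sum task is an invented example):

```python
def pair_with_sum(sorted_nums, target):
    """Two-pointer scan: lo starts at the 0th index, hi at len(input) - 1."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        total = sorted_nums[lo] + sorted_nums[hi]
        if total == target:
            return (lo, hi)
        if total < target:
            lo += 1   # need a bigger sum: move the low pointer right
        else:
            hi -= 1   # need a smaller sum: move the high pointer left
    return None       # no pair adds up to target

pair_with_sum([1, 3, 5, 8], 9)   # (0, 3) because 1 + 8 == 9
```

This runs in O(n) time and O(1) space, a common interview follow-up to the brute-force O(n**2) nested loop.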

Firebase APIs Basics


  • A project is a container for resources on Google Cloud
  • Install Firebase tools first to use command line utilities
  • The defer attribute on a script tag in HTML means: don't execute the script until the page finishes parsing
  • firebase serve runs a local server on port 5000

Google Cloud Basics


  • .yaml extension of configuration file
  • Export python packages and environment dependencies as requirements.txt
  • name.py python code
  • Google Cloud Function
    • By default cloud function is authenticated into other Google APIs
    • Serverless, fully managed, event triggered, considered a micro-service in the cloud
  • Google Cloud OAuth
  • Use Stackdriver (originally built for monitoring AWS) to track performance
  • Role management using IAM
  • Google Cloud Discount
    • Google Cloud discount for students available
  • Cloud DNS is an available API
    • Add an A record pointing to the IPv4 address
    • Copy the external IP address
    • The A record is linked to the external IP
    • A CNAME record connects www to the domain name (.com)
    • Now go to the domain provider and do the same
    • Point the name servers at the Google ones
    • Generate an SSL certificate to enable HTTPS
  • Integrates with Firebase
    • Firebase APIs
  • Resources - Conference: Google Cloud Next Conference usually in April
  • Static versus ephemeral IP address
  • Market place: Google Cloud wordpress
  • Google Cloud partner: top partners include Accenture
  • Tutorial : Qwiklab
  • Concept: serverless on the cloud runs code without needing to know or manage the machine infrastructure it operates on, and without worrying about scaling, load balancing, security or patches, which are handled automatically.

Friday, March 20, 2020

Learn SQL — It’s on every job listing — Part 1

SQL is not obsolete. You can now build machine learning models with SQL, and query real-time or big data with SQL. If you look around you will find plenty of job postings with SQL as a desired skill, even from FANG companies (Facebook, Apple, Netflix, Google). Uniqtech writes technical tutorials for coding bootcamp graduates, freelancers, self-study and MOOC students in the realms of data science, software engineering, machine learning and deep learning. Read our disclaimer here; it applies to our entire site. Please take our words with a grain of salt: they are not professional advice, nor are they professional opinions. Repost from Uniqtech Medium with permission. 

Microsoft Excel is a workbook that contains worksheets, just like a database contains tables.
Each table can be queried separately. To query tables jointly, we need join statements and keys to look up the corresponding data.
Each table row should have a unique ID, known as the primary key. It can also have a foreign key (FK), which associates the row, aka record, with the unique primary key of another table.
For example, each e-commerce transaction has a unique ID, which can be generated from the timestamp of when the transaction happened. Each transaction can have a FK such as customer ID, which uniquely identifies the customer who made the transaction. His or her full information resides in the customers table.
That is the perfect segue to talk about the philosophy and convention behind table names. Think of table names as natural divisions of the data we want to model, in the form of plural nouns: transactions, customers, products, etc. Each row in the transactions table is a transaction (singular). Each row in the customers table is a customer. Each column represents a customer attribute, such as gender, age, etc.
When designing the database, an architect or Database Admin (DBA) constructs a digital blueprint stating how the tables connect with each other, or stand alone, in the database. This diagram and the relations it specifies is called the database schema.

What is SQL

SQL is a database query language. It doesn't matter which relational database you use; SQL concepts are helpful. The Pandas analytics library uses similar joins and query methods, and Google BigQuery allows SQL-like syntax.
Newer databases such as NoSQL and graph databases use different query languages. Sample code from the Google Cloud Datastore NoSQL database:
// List Google companies with fewer than 400 employees.
var companies = query.filter('name =', 'Google').filter('size <', 400);

Important SQL Keywords

SELECT

The one select statement to select them all uses the wildcard.
SELECT * FROM table_name
It is important to slow down and read the statement. It reads: select all from table_name. The * means all columns.
Nested Select Statements
SELECT * FROM (SELECT "A" AS A, "B" AS B);
AS specifies the alias. When column names are not reader friendly or long, alias is your friend.
It selects the column of data.

FROM

The FROM keyword is usually followed by a tablename. FROM database.CUSTOMERS . It can also be followed by a nested query.
It specifies the table to operate on.

WHERE

The WHERE clause narrows down the query results by specifying conditions, such as WHERE table_name.gender = 'Female' (note SQL uses a single =, not ==). It filters the rows of data.

Putting it all together SELECT FROM WHERE

query = """
    SELECT my_column
    FROM my_table AS m
    WHERE m.gender = 'F'
 """

WITH

“The SQL WITH clause allows you to give a sub-query block a name (a process also called sub-query refactoring), which can be referenced in several places within the main SQL query.” — GeeksforGeeks

ORDER BY

Sort the query result by one or more columns, ascending or descending.
ORDER BY column_name ASC
ORDER BY column_name DESC

LIMIT

LIMIT 1000
LIMIT 25
Show the first N rows of records in a table.
Usually at the end of the query: the last line in a SQL query.
Note that in big data settings, where managing cost and resource use is important, LIMIT does not mean the entire table is not scanned; the query may still read all the data before limiting the output.

Data Structure and Algorithms

Time Efficiency and Space Efficiency Both Matter

How to write effective tests cases

Wednesday, March 18, 2020

Google Colab Basics

In Google's own words: Colab is zero configuration, free GPU, and easy to share. I honestly have to agree with that. Google Colab is the easiest environment to get started on machine learning with scikit-learn, or deep learning with Tensorflow and Pytorch. Seriously, zero installation is awesome! Back in the day, when Ruby on Rails was hot, we had installation parties all the time, because installation was what took the most time for everyone. Now Colab even has access to free TPUs!

Google Colab is to Jupyter Notebook roughly what Google Docs is to Microsoft Office: it lets you create, edit and host Jupyter notebooks in the cloud.

Google Colab for Training Models

Training is essentially free and easy on Google Colab. You have access to both GPUs and TPUs. However, the free version can lose temporary variables and files in the home directory, because the runtime resets every 12 hours or less. There are ways to save and download the files to avoid such a catastrophe. 

Use Google Colab for Demo Purpose

It is easy to build an example in Colab and share it with the audience: give it away instead of GitHub source code or slides.

Tuesday, March 17, 2020

Flask basics 2020

Flask is known as lightweight and feature rich; it is also known as a micro framework (Django is the heavyweight one). It is used to dynamically code and update web applications, including Single Page Applications (SPA). It is a web development framework, not a library. Frameworks have a certain philosophy, strategy and code patterns (for example, you must follow a certain folder structure, because the framework automatically looks for assets in those folders, along with file and resource naming conventions). Frameworks help us do the routine tasks we would otherwise repeat over and over in web programming, way faster and more efficiently. They use tried and true methods and code patterns to implement common web app components, so there is no need to reinvent the wheel every time. 

Model-View Controller framework

MVC framework, popularized by Ruby on Rails. MVC is a separation of concerns, a way of organizing and designing code projects. 

Controller: use a decorator with a URL-like pattern to define behaviors, i.e. how each URL will be handled; these are called routes. url_for() is a shorthand that links a URL to a function, instead of having to type out the long URL.
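The decorator-plus-URL idea can be sketched without Flask itself; here is a minimal, hypothetical router showing how a framework maps URLs to handler functions (Flask's @app.route works along these lines, though its real implementation is richer):

```python
routes = {}

def route(url):
    """Decorator that registers a handler function for a URL, Flask-style."""
    def register(handler):
        routes[url] = handler
        return handler
    return register

@route("/")
def index():
    return "Hello, world"

@route("/about")
def about():
    return "About page"

def handle(url):
    """The controller dispatch: look up the handler for a URL and call it."""
    return routes[url]() if url in routes else "404 Not Found"

handle("/")         # 'Hello, world'
handle("/missing")  # '404 Not Found'
```

The framework's job is exactly this bookkeeping: you write handlers, it wires URLs to them.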

Database

SQL
Manage Flask ORM resources and records using
SQLAlchemy 

Launching Flask app via local server
$ flask run
Use above to launch a local website

Additional Flask concepts

Hello world: the first script is commonly called app.py
$ python app.py

Jinja mixes Python and HTML together.
Conditional HTML: the Jinja templating language, along with Flask, allows us to write if/else statements in HTML

How to host flask applications: one example is that you can easily deploy flask applications on Heroku as well as separately on Google Cloud app engine.

__name__: this name variable can determine whether a script is the main process being run, or whether it is being called upon (imported) by another script. If it is imported, __name__ is set to the module's name instead of "__main__". Flask also uses the module's __name__ to figure out where to look for static files and templates.
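A minimal sketch of the __name__ check (the script structure and function name are invented for illustration):

```python
def main():
    """Entry point: invoked only when this file is executed directly."""
    return "running as the main script"

if __name__ == "__main__":
    # True when run as `python app.py`; False when this file is imported
    main()
```

When another script does `import app`, the body of main() never runs automatically, which is exactly why this guard is the standard place to start a Flask dev server.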

Flask has debugging mode, which should never be used in production but is handy when developing.

To make a Flask app available online, use ngrok

Web Programming Web Development Basics 2020



Model view controller framework (MVC)
"a separation of concern"

  • Controller contains the code logics, routes
  • View contains the aesthetics codes CSS


CSS
  • Block versus inline
  • Can use CSS layout
  • CSS has a built-in tool called flexbox which can lay out elements automatically
  • flexbox is great for making a bunch of cards that rearrange themselves as we shrink the page
    • 4 on each row
    • 3 on each row
    • ..
    • 1 on each row stacking.
HTTP protocol
  • Text protocol
  • HTTP requests contains a header
  • Key : value pairs
  • Http status codes 1xx 2xx 3xx 4xx 5xx
  • Http response
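The 1xx-5xx status code classes above can be summarized in a small lookup (a simplification of the categories in the HTTP specification; the function name is illustrative):

```python
def status_class(code):
    """Classify an HTTP status code by its leading digit."""
    classes = {
        1: "informational",  # 1xx
        2: "success",        # 2xx, e.g. 200 OK
        3: "redirection",    # 3xx, e.g. 301 Moved Permanently
        4: "client error",   # 4xx, e.g. 404 Not Found
        5: "server error",   # 5xx, e.g. 500 Internal Server Error
    }
    return classes.get(code // 100, "unknown")

status_class(404)  # 'client error'
status_class(200)  # 'success'
```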
Cookies

Frameworks

  • Using a web development framework allows us to bypass a lot of hard work and write web apps quickly, without knowing all the functions to call and libraries to import
Web hooks

Data

  • D3 js visualize dataflow chart, org chart

Monday, March 16, 2020

API Design 2020

Designing API

  • What kind of resources are needed
  • What kind of actions will be taken
  • What kind of endpoints should be designed

Testing API

Testing API using curl

GET Method
curl 'https://[URL]/[resource].json'

curl -X PUT -d '{"key":{"nested_key":"value"}}' \
  'https://[PROJECT_ID].firebaseio.com/users/tom.json'

curl -X PATCH -d '{"key":"value"}' \
 'https://[URL]/[resource].json'

curl -X POST -d '{"key":{"nested_key":"value"}}' \
  'https://[PROJECT_ID].firebaseio.com/users/tom.json'

curl -X DELETE \
'https://[PROJECT_ID].firebaseio.com/users/tom.json'

Use curl together with the documentation to check what the API does

Shopify Partner Basics

  • In 2019 Shopify started to version its APIs. You can now refer to APIs by their version number. 
  • Career opportunities: Shopify store owner turned developer, turned partner
  • Shopify Buy Button is available for WordPress blogs
  • Utilize the Shopify partner blog, a great resource
  • Shopify Ping chat and Kit CRM robo assistant
  • Can use Apple Business Chat with Shopify channels
  • Can get approved to run product tagging and ads on instagram
  • Shopify Lounge provides co-working opportunities, photoshooting light box sessions
  • Point DNS on Shopify custom domain

GraphQL Basics on Shopify

GraphQL eliminates the need to define a potentially infinite number of endpoints for developers to interact with APIs: there is no more need to predefine every endpoint needed to interact with the API. With traditional APIs, many calls may be needed to get complex results back. "Multiple API calls from different schema hard for developers and slow for users." - Shopify forum discussion. GraphQL can give back all the information in nested JSON format. Use the GraphQL admin to manage the API. A REST API needs multiple endpoints, one per resource, to be designed and written; GraphQL technically needs just one endpoint. In Shopify, for example: POST https://{shop}.myshopify.com/admin/api/2019-04/graphql.json. Shopify has a GraphQL app ready for installation. Shopify Developers and Shopify Partners can potentially use this to design reporting fast. Compare a traditional API call: HTTP request GET /api/user?id=1, HTTP response {"id": 1, "name": "xyz"}.

Wednesday, March 4, 2020

Evaluating Classification Tasks in Machine Learning and Deep Learning

Confusion Matrix

Keywords: recall sensitivity, specificity

ROC curve, ROC AUC (curve)

Use real example: doctors, medicine, cancer example

Technical presentation components


  • Story telling
  • Workflow flow charts, where does it fit in the big picture
  • Code snippets
  • Take aways, action items after the talk
  • Link to slides
  • Link to codebase
  • Visualizations
  • Tricks to memorize, remember

Women in Data Science Conference (WiDS) summary, transcripts, notes from my personal experience

WiDS started small but is now a global movement with many regional events and branches. It is 5 years old in 2020. This year it is hosted at Stanford University.

Volunteer opportunities with WiDS: ambassadors, region events and branches, 500+ ambassadors world wide

Understand the history and evolution of Tensorflow by revisiting Tensorflow 1.0 Part 1

Tensorflow 2.0 has been in beta since last year, and it is a completely different universe from its predecessor Tensorflow 1.0. But even in 2020 it is important to understand the history and evolution of the TF library: how it got here, and why it chose Keras as its high-level API. It is also important to understand what a compute graph is, as it is a super useful concept in Deep Learning, and you can still visualize and inspect it in TensorBoard.

Let's go back in time and talk about Tensorflow 1.0: its data flow graph, how everything executed in a Session object backed by C/C++, and how it handled parallel computing. It offered both Python and C++ APIs. Though it had a big learning curve at the time of its release, it was production ready and powerful, and had already been used internally at Google before being open sourced. It supported CPU, GPU and distributed processing in clusters. Its focus is deep learning neural networks, whereas Scikit-Learn focuses on traditional machine learning algorithms.

What is a data flow graph? It is a very important computer science concept. Nodes represent math operations, and edges are multi-dimensional arrays called tensors, which flow through the graph, hence the name Tensorflow! See, the history is important. Using the graph we can easily visualize the neural network. Numpy and Scikit-learn would not give that result.

The frustrating part is that the graph needs to be built first before running it in a Session. This is where the learning curve got a bit steep: it was hard to prototype and iterate, and it required a bit of math-architect skill beyond just engineering and coding.

A quick note on tensors, which are also a concept in math and relativity. Here it just means a multi-dimensional array of numbers, usually more than 2D (a matrix), with automatic gradient computation and the capability to move to CPU or GPU and parallelize vector computation where possible. Technically even a vector (1D) or a number (0D) is a tensor.

In deep learning we usually have to convert data such as text or images into numbers, and we usually represent them using tensors. Each image, for example, is a 3-dimensional tensor with red, green and blue channels. Each channel holds a matrix corresponding to the width and height of the image, with each element representing the pixel brightness at each (w, h) coordinate. This is called a feature matrix, aka a feature tensor, though few people call it a tensor in this case.

One cumbersome pattern in Tensorflow 1.0 was the need to define a placeholder with tf.placeholder(), with its type specified, before filling or initializing it with actual numbers. This has the benefit of contiguous memory, but takes getting used to, especially when TF is trying to court dynamically typed Python users. Another benefit is being able to construct the graph without knowing or filling in specific numeric values. One minus is that it is hard to test, prototype and iterate.

tf.Variable() allows initializing and filling in data that can later be changed. Each node is a unit of computation. Each edge is either an input to or an output of an operation.
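The placeholder/Session pattern can be illustrated in plain Python (this is a toy imitation of the TF 1.x style, not TensorFlow code; all class and function names are invented): build the graph first with empty placeholders, then run it with concrete values later.

```python
class Placeholder:
    """Stands in for input data, like tf.placeholder(): no value until run time."""
    pass

class Multiply:
    """A graph node: an operation whose edges (inputs) are tensors or other nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b

def run(node, feed_dict):
    """Like session.run(): walk the graph and evaluate it with concrete values."""
    if isinstance(node, Placeholder):
        return feed_dict[node]          # values are fed in at run time
    if isinstance(node, Multiply):
        return run(node.a, feed_dict) * run(node.b, feed_dict)
    return node                         # a plain constant

x = Placeholder()
graph = Multiply(x, 3)        # build the graph first, no numbers yet
result = run(graph, {x: 7})   # then "run the session" with real data
# result is 21
```

This separation of graph construction from execution is exactly what made TF 1.x powerful for optimization but awkward for prototyping.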

Tensorflow 1.0, like Tensorflow 2.0, has a Pythonic front-end API and can be deployed in many containers and on many devices: CPU, GPU, Android and other mobile OSes such as iOS, and JavaScript in the browser (Tensorflow 1.x+). It has always been quite production ready, hence its popularity before Pytorch 1.0 came along.

Tensorflow and Pytorch both focus on deep learning and are optimized for it.

Additional feature - Auto differentiation is important for gradient based deep learning algorithms. Additional feature - Optimizer for fine tuning weights efficiently.

AutoML Automatic Machine Learning landscape

Features: here are some features that may be offered Automatic visualization: Automatically generate visualizations Machine learning interpr...