Sunday, February 23, 2020

SpaCy for Natural Language Processing (NLP)

Documents are tokenized into sentences, then into words. Additional features can be derived from these documents, or readily available ones used, to work on a task. One SpaCy task could be to identify whether a tweet is positive or negative and whether its text mentions a specific product.


Install SpaCy like any other Python package, using pip, and then import it. It also works on Google Colab, which is the easiest way to install and configure it and perhaps the fastest way to get started.

Step 1. You need to import a language model before proceeding. SpaCy supports many language models.

Step 2. Load the English model:


SpaCy supports other models too, including en_core_web_sm.

Update the model (optional)

Step 3. Initialize the model as an nlp object, then pass it text to get a Doc
doc = nlp("document sentence here")

Print out items from the SpaCy nlp model
- print out tokens
# Iterate over tokens in a Doc
for token in doc:
    print(token.text)

- print out entities
# Iterate over named entities in a Doc
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

This prints each entity's text, its start and end character offsets, and its label.

You can also query the doc using slicing, for example doc[1:4].
You can access the .text attribute of the token.
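
A minimal sketch of the token loop and slicing described above. To avoid a model download it assumes a blank English pipeline from spacy.blank("en"), which only tokenizes; with a loaded model such as en_core_web_sm, doc.ents would be populated as well. The sentence is a made-up example.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no model download needed).
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a startup")

# Iterate over tokens in a Doc and collect their text
token_texts = [token.text for token in doc]
print(token_texts)

# Slicing a Doc returns a Span; .text gives its string form
span = doc[1:4]
print(span.text)
```

The slice uses token positions, not character positions, so doc[1:4] covers the second through fourth tokens.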

Additional functions



Disable pipeline for custom training
nlp.disable_pTraining src 13

source -13 :
Why disable the pipeline? If you are using SpaCy for just one task, such as NER, you can disable the other pipeline components to skip work you don't need.
Check the list of pipeline component names with nlp.pipe_names (an attribute, not a method call).

Save the trained model

Bloom embedding, a type of optimized word embedding
1D CNN 1D convolutional neural network

Not a part of SpaCy, there is also an entry-level toolkit for NLP. It can do basic part-of-speech tagging, but it does not have advanced functionality like SpaCy.

Spacy Deep Learning

Training src 13
source -13 :

SpaCy word embedding

To print out the word embedding, access the .vector attribute (an attribute, not a method) and print it.

SpaCy for Biomedical Research

scispaCy is a Python package that provides SpaCy models for pre-processing biomedical and clinical texts and scientific literature.

Why is Natural Language Processing hard?

Limitation of pre-trained models

" A model trained on Wikipedia, where sentences in the first person are extremely rare, will likely perform badly on Twitter. Similarly, a model trained on romantic novels will likely perform badly on legal text." - Spacy documentation

Friday, February 21, 2020

OpenCV cheat sheet

  • import cv2
  • cv2.imread()
  • cv2.resize()
  • .transpose() on arrays
  • .reshape() on arrays

Google Cloud AutoML

  • Functionality provided by AutoML (source: direct quote from the AutoML documentation):
    • Single Label Classification: predict the ONE correct label that you want to assign to a document.
    • Multi-label Classification: predict ALL the correct labels that you want to assign to a document.
    • Entity Extraction: identify entities within your text items.
    • Sentiment Analysis: understand the overall sentiment expressed in a block of text.

Thursday, February 20, 2020

My experience with TripleByte technical interview and quiz

I read a few really good posts on TripleByte experience. They were helpful so I am also posting my two cents here.

First of all, TripleByte is legit. It went through Y Combinator and it is being actively promoted by YC.

Amazing selection of quizzes:
I am so happy that they have full stack, data science as well as Machine Learning quizzes as of Feb 2020! The Data Science and Machine Learning Quizzes both have a NEW sign.

It is about 2 minutes per question.

I really like the FastTrack feature. It is a quick validation. It is encouraging, and it quickly moves candidates to the next step: actually doing or practicing technical interviews. Honestly, this part is unavoidable.

I haven't figured out a way to take other quizzes after passing one with FastTrack.

It is not very hard for me to get FastTrack or to do well, but scoring Exceptionally Well is rarer and more meaningful, and there may even be an opportunity to be matched with top companies. I don't think Exceptionally Well is trivial to obtain. TripleByte visualizes your skill set with subcategories, using either a 1-5 rating scale or a radar map with a similar scale. You do not want to score a 3 in any of the subcategories, because visually it makes the radar map look weak.

With a little bit of review and brief study, the quizzes should be easily passable. If you don't pass the quiz, maybe it is time to learn more and get more experience, because it is not that hard to pass.

From most of what I gathered online in forums, the technical interview portion is difficult. It demands quite a bit of coding and setting up the coding environment in the console. Because I come from a non-traditional background, I don't know C++ ... yet. I plan to learn it. Some of the exercises, quizzes and interview questions can be in C++, and that's a problem for me. The C++ in the quiz is easy to figure out even if you don't know the language, but the C++ coding exercises cannot be figured out without prior knowledge.

Apparently you will be sent an interview guide if you do schedule a technical interview.

One trick to doing well in a technical interview is to have practiced the problem beforehand. Then you will know the caveats, won't stress over understanding the problem (comprehension), and may know roughly what the optimal solution looks like.

During the interview, it'd be good to think of a similar problem that you solved and recall how you solved it. Being able to discuss the problem in a real-world setting always helps in finding the optimal solution and also showcases your understanding of the technical problem.

How does TripleByte compare to HackerRank and Leetcode

TripleByte is more developer-friendly and better for candidates than HackerRank and Leetcode, first of all because it tests knowledge more than trivia. As long as you understand the problem, you can likely answer the question fast, within the required 2 minutes. Questions focus on one or two missing lines, or on the final returned result. This means you won't have to spend 45 minutes conjuring each solution. I like that a lot. I can demonstrate that I understand the problem and its edge cases without having to get every detail right.

Leetcode is more detailed, and there is a lot of competition on run-time performance, so even a good solution may not be enough. HackerRank has a nice trajectory for leveling up and is interesting, but like Leetcode it requires the candidate to write a lot of code every time. Eventually, though, you should probably still use HackerRank or Leetcode to prepare for the first-round screen-shared interview.

HackerRank supports a few choices of languages. TripleByte lets you choose the category of your quiz, but there is no explicit language choice. Leetcode supports many languages.

Wednesday, February 19, 2020

Unit Testing

Unit testing usually focuses on a single function or a small chunk of code.

Writing and fixing unit tests early helps catch bugs, errors and failing logic before they get out of hand.

Writing dedicated unit testing code is better than simply testing code interactively, which is a manual, non-repeatable process.

A Test Driven Development (TDD) process typically requires developers to write the test code first, expect it to fail, and then write or fix the code until the failing unit test passes before progressing.

Test automation is important because the number of unit tests can grow fast with new added features and functionalities.

Unit testing is different from regression testing and system integration testing, which both test the entire system more extensively, including changed and unchanged parts.

It is also possible to be asked about unit testing, and testing in general, in interviews.

The pytest convention is to name the test file after the code file with a test prefix or suffix; pytest discovers files named test_*.py or *_test.py by default.
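
A small sketch of what pytest-style unit tests look like. The function slugify and the test names are hypothetical, made up for illustration; in a real project the function would live in slugify.py and the tests in test_slugify.py, but they are shown together here so the example is self-contained.

```python
def slugify(title):
    """Hypothetical function under test: lowercase the title and join words with hyphens."""
    return "-".join(title.lower().split())


# pytest collects functions whose names start with "test_"
def test_lowercases_and_joins_words():
    assert slugify("Unit Testing Basics") == "unit-testing-basics"


def test_collapses_extra_whitespace():
    assert slugify("  Hello   World ") == "hello-world"


def test_empty_string_gives_empty_slug():
    assert slugify("") == ""
```

Each test focuses on one behavior of a single function, so a failure points directly at the broken case. Running pytest in the project folder would execute all three automatically.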

Tuesday, February 18, 2020

Neo4j Graph Database Basics 2021-2022

Neo4j models graphs: nodes and the relationships (edges) between them. This contrasts with a relational database, the traditional tabular database, where SQL joins can be expensive and hard to learn (confusing for analysts).

Graph: a collection of nodes (vertices) and relationships (edges) that connect them. The math notation for it is G(V, E). 

Graph data is everywhere in real life. It is intuitive to model real world data using graph models.

UI - Neo4j Browser

Neo4j Browser is the query workbench; it uses the JavaScript Bolt driver under the hood. Bolt is a fast binary protocol that Neo4j uses for connections. 

Data Modeling

The first step in getting started with a graph database is to model the data as a graph. This matters because the data is then stored as a graph too.

Neo4j uses a property graph model. 

Neo4j stores each relationship with a single direction, but it can be queried in either direction. There are four main elements: nodes, relationships, labels, and properties. A node can be labeled (a label is similar to a SQL table name) and can have key-value property pairs. Properties can go on nodes or relationships. 


Data modeling can be done with a white board or using apjones Arrows App 

A relationship (edge) can also have properties associated with it. 
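
To make the property graph idea concrete, here is a small in-memory sketch in Python. It is not Neo4j's storage format or API; the class names, the sample nodes, and the ACTED_IN relationship are all hypothetical. It only illustrates labeled nodes, properties on both nodes and relationships, and a relationship that is stored one way but can be followed either way.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int      # id of the node the relationship points from
    end: int        # id of the node the relationship points to
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def add_relationship(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, dict(properties)))

    def neighbors(self, node_id, direction="both"):
        """Return ids of connected nodes; direction is 'out', 'in', or 'both'."""
        found = []
        for rel in self.relationships:
            if direction in ("out", "both") and rel.start == node_id:
                found.append(rel.end)
            if direction in ("in", "both") and rel.end == node_id:
                found.append(rel.start)
        return found


graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Movie"], title="Example Film")
graph.add_relationship(1, 2, "ACTED_IN", roles=["Lead"])

print(graph.neighbors(1, "out"))  # follow the stored direction
print(graph.neighbors(2, "in"))   # follow the same relationship backwards
```

The relationship is stored once, from node 1 to node 2, yet both lookups find it, which mirrors the "store one way, query either way" behavior.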


Cypher is the language developers use to query a graph and retrieve information from it. It also controls what results are returned. Developers also use Cypher to create and modify nodes and relationships in a graph. 
A Cypher query executed in a Neo4j session returns a cursor of records.

Cypher is an open-source graph database query language, part of the openCypher project. Other graph databases use Cypher too, not just Neo4j.

Cypher versus SQL Comparison


View Schema

Call apoc to view schema

The WHERE clause allows developers to filter nodes and relationships. 

CRUD with Cypher Neo4j

SELECT all nodes
MATCH (n) RETURN n

SELECT all nodes with a specific label
MATCH (n:Label) RETURN n

Match all relationships in neo4j graph
MATCH ()-[r]->() RETURN r

Neo4j create statement
CREATE(nodefirst:Label {propertname:"Property Name", propertyagain:99999})
CREATE (nodeagain)-[:REL_TO { roleproperty: ["List Item"]}]->(nodefirst)

MERGE operation: roughly a get-or-create (an upsert in SQL terms). It first looks up whether the node exists and creates it only if it does not, so no duplicate record is created when one already exists. 
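
The get-or-create behavior can be sketched in plain Python. This is only an analogy on an in-memory dictionary, not how Neo4j implements MERGE; the function merge_node and the sample data are hypothetical.

```python
def merge_node(store, label, key, value, **on_create):
    """Look up a node by label and key property; create it only if it is missing.

    Returns a (node, created) pair, where created is True for a new node.
    """
    identity = (label, key, value)
    if identity in store:
        return store[identity], False
    node = {"label": label, key: value, **on_create}
    store[identity] = node
    return node, True


store = {}
first, created_first = merge_node(store, "Person", "name", "Alice", age=30)
second, created_second = merge_node(store, "Person", "name", "Alice", age=99)

print(created_first, created_second)
print(len(store), second["age"])
```

The second call finds the existing node and returns it unchanged, so the extra properties passed to it are ignored and no duplicate is stored.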

Call - cypher keyword to call functions and procedures

Read more about user defined procedures source 11

You can try writing and customizing your own procedure
Cypher styling and query guide source 12

Graph database can also be queried and modeled using ORMs. 


Neo4j procedures (apoc) are community driven code modules. There are high quality NLP apoc procedures for Neo4j. 

"Just in time for GraphConnect, Michael released version of the popular APOC library. This release has support for defining custom procedures and functions implemented in plain Cypher and then calling them like regular ones, as well as a new procedure for scraping web pages" - May 2020 

Use cases for Neo4j:
Salmon researchers, salmon hatchling in northern atmosphere, knowledge graph, information management.
Graphs, in general, are great for highly connected data. 
Neo4j for journalists
The Panama Papers are available as a sandbox dataset
Investigative journalism: Panama Papers, Paradise Papers
A relationship model can be very insightful in data analysis, and Neo4j is great for relationship modeling.

Can even use graph for chemicals drug discovery

Use for recommendation collaborative filtering

Fraud prevention

Use graph when context matters. How did the data result happen?

Graphing interaction data is also very useful. Relationship data is important.

Neo4j lead data scientist Alicia Frame, PhD, talks about work on graph algorithms

Learn Neo4j (Neo4j Tutorials)

Neo4j Sandbox Feature : Neo4j tutorials can be run in Neo4j Sandboxes

Use :play to launch Neo4j tutorials

Using the :play feature in Neo4j sandbox and sandbox datasets, you can give your cypher skill a try and get started with Neo4j. It is insightful, easy and a lot of fun.

You can create your own Neo4j Browser Guide (tutorial). 

Graph Academy

Neo4j 4.0

Advanced Neo4j Experts

You can become a Neo4j Ninja, Neo4j expert, and join the Neo4j Speaker Program. Neo4j investigative journalism program

Certification available

Neo4j Community and Neo4j Universe

"GraphXR is a browser-based visual analytics platform that delivers unprecedented speed, power, and fluidity to anyone working with connected, high-dimensional, and big data." GraphXR in its own words.

Become a super user
Become a Neo4j Ninja
Load CSV

GRANDSTACK - Hosting Neo4j Website

Neo4j Desktop : allows managing multiple projects databases
Neo4j Graph Apps: graph apps are applications that interact with Neo4j database through the desktop app. Graph apps are single page applications (SPAs) that are built with vanilla JavaScript or front end web development frameworks. 

Advanced Algorithms with Neo4j  | Advanced Graph Algorithms
The study of graph theory and graph algorithms is said to have been pioneered by Euler. The famous Seven Bridges of Königsberg problem is solved using graph concepts. 
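
As a worked example of that graph reasoning, the sketch below models the four land masses as nodes and the seven bridges as edges, then applies Euler's degree argument: in a connected graph, a walk that crosses every edge exactly once exists only if zero or two nodes have an odd number of edges. The letters A to D are labels chosen here for illustration, with A as the central island.

```python
from collections import Counter

# Four land masses and the seven bridges between them (A is the central island).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def has_euler_walk(edges):
    """Degree test for a connected multigraph: 0 or 2 odd-degree nodes are required."""
    odd = [node for node, degree in degrees(edges).items() if degree % 2 == 1]
    return len(odd) in (0, 2)


deg = degrees(BRIDGES)
print(dict(deg))
print(has_euler_walk(BRIDGES))
```

All four land masses have an odd number of bridges, so the test fails and no walk crossing each bridge exactly once exists. The function assumes the graph is connected and does not check that.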

Use call to launch Neo4j helper functions, stored procs, and algorithms

Page Rank

Neo4j NLP library

Launch Neo4j on google cloud

Can host a Neo4j graph database and deploy using GKE. 

Launching Neo4j on Google Kubernetes Market Place
Neo4j prefers SSDs.
A strong password is automatically chosen. Not neo4j's typical default.
Retrieve it in cloud shell. 
 $ kubectl get secrets my-graph-neo4j-secrets -o yaml | grep neo4j-password: | sed 's/.*neo4j-password: *//' | base64 --decode
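
Kubernetes stores secret values base64-encoded, which is why the command above ends with base64 --decode. The Python sketch below reproduces only that last step; the password value is made up for illustration and the function name is hypothetical.

```python
import base64

# Hypothetical value, encoded the way it would appear after "neo4j-password:" in the YAML.
encoded_password = base64.b64encode(b"example-password").decode("ascii")


def decode_secret(value):
    """Decode a base64-encoded secret value into text."""
    return base64.b64decode(value).decode("utf-8")


print(encoded_password)
print(decode_secret(encoded_password))
```

Base64 is an encoding, not encryption, so anyone who can read the secret object can recover the password this way.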

Graph Academy | Getting Certified
Certification exam 
Duration 1 hour 80 questions
Introduction to Neo4j Online Course and Tutorial
Duration 1 day
Course Outline
Introduction to Graph Databases
Introduction to Neo4j
Setting up your Development Environment Tutorial
Introduction to Cypher
Getting More out of Queries
Creating Nodes and Relationships
Getting More out of Neo4j

graphaware has plugin for tokenization
3rd party natural language processing platform for neo4j graphs

Using Python with Neo4j py2neo

Friday, February 7, 2020

Develop a smart Twilio app 2020 - for SMS phone voice or fax

Key Twilio Concepts

  • TwiML is pronounced "tweemle", and the ML does not stand for machine learning. "TwiML (the Twilio Markup Language) is a set of instructions you can use to tell Twilio what to do when you receive an incoming call, SMS, or fax." - official documentation
    • "When someone makes a call to one of your Twilio numbers, Twilio looks up the URL associated with that phone number and sends it a request. Twilio then reads the TwiML instructions hosted at that URL to determine what to do, whether it's recording the call, playing a message for the caller, or prompting the caller to press digits on their keypad."
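
To give a feel for what those TwiML instructions look like, here is a minimal sketch that builds a TwiML document with Python's standard XML module. It uses the Response root element and the Say verb; the message text and the function name build_twiml are made up for illustration. Twilio's own helper libraries can generate TwiML as well, so treat this as one possible approach.

```python
import xml.etree.ElementTree as ET


def build_twiml(message):
    """Build a TwiML document that reads a message to the caller."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message
    return ET.tostring(response, encoding="unicode")


twiml = build_twiml("Thanks for calling!")
print(twiml)
```

A web app hosted at the URL configured for the phone number would return this XML as the body of its HTTP response.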

Twilio Customers and Use Cases

Lyft and Uber both use Twilio for SMS and customer support. See more on the customers page and use case landing page. 

Cool products by Twilio

  • Twilio Studio for drag and drop app building released Jan 2018
  • Twilio Flex source 20
  • Twilio Functions (beta as of Feb 2020) 
Tutorial Twilio + Flask + Spotify API
The Spotify API is deprecated, but this tutorial still shows you how to use Twilio with Flask.
To make the Flask app accessible online by Twilio, use ngrok.


20 -
2- Getting start with Twilio examples, code snippet :

Wednesday, February 5, 2020

Getting started with Alexa 2021 for Developers

It is not too late to get started coding for Alexa and making Alexa skills. Let's get started! Updated December 2021. 


Alexa design : Why Voice Design Matters: We Don’t Speak the Way We Write (2018)
  • Workflow: Build Test Launch and Measure
  • Use the Alexa Developer Console
  • Start with a custom skill or Flash Briefing or Smart Home or Video skill
  • Invocation name that is unique for your Alexa app
  • Build an interaction model, the voice user interface (VUI), which is specific to voice apps
  • Need to provide examples (sample utterances)
The wake word wakes up Alexa, for example "Alexa, remind me to ..." Here Alexa is the wake word.


  • How does a user interact with Alexa? How does it work? Source 1 (also see the image from Source 5)
  • The invocation name is used often, so succinct and unique is good
  • An intent is what your skill app can do
  • Utterances are the phrases users say to your skill to express the right intent
  • Slots accept inputs from users
  • Connect to other APIs using interfaces
  • JSON editor. You can use the visual console to enter intents and utterances or you can write JSON code.
  • Specify the endpoint where the code will live. The endpoint receives requests; it is "where the custom logic will live", not the interaction model. 
Source 6
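
A rough sketch of how the pieces above fit together in the JSON editor, written as a Python dictionary and printed as JSON. The invocation name, the intent AddReminderIntent, the slot task, and the sample utterances are all hypothetical; only the overall shape (invocation name, intents, slots, samples) is the point.

```python
import json

interaction_model = {
    "interactionModel": {
        "languageModel": {
            # What users say to open the skill
            "invocationName": "daily planner",
            "intents": [
                {
                    # Something the skill can do
                    "name": "AddReminderIntent",
                    # Slots accept input from the user
                    "slots": [{"name": "task", "type": "AMAZON.SearchQuery"}],
                    # Sample utterances that map to this intent
                    "samples": [
                        "remind me to {task}",
                        "add a reminder to {task}",
                    ],
                },
                {"name": "AMAZON.HelpIntent", "samples": []},
            ],
        }
    }
}

print(json.dumps(interaction_model, indent=2))
```

The curly braces in the sample utterances mark where the slot value appears in what the user says.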

User Alexa Interactions

  • The user says the wake word, Alexa.
  • Alexa hears the wake word and listens.
  • The Alexa service uses the interaction model to figure where to route the request.
  • A JSON request is sent to the skill's lambda function.
  • The lambda function inspects the JSON request.
  • The lambda function determines how to respond.
  • The lambda function sends a JSON response to the Alexa service.
  • The Alexa service receives the JSON response and converts the output text to an audio file.
  • The Alexa-enabled device receives and plays the audio.
Direct quote from Source 5
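
The Lambda steps in that list (inspect the JSON request, decide how to respond, send a JSON response) can be sketched without any SDK. This is a simplified illustration, not Amazon's sample code: HelloIntent and the spoken texts are hypothetical, and a real skill would more likely use the Alexa Skills Kit SDK.

```python
def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON structure the Alexa service expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Inspect the incoming JSON request and decide how to respond."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("Welcome. What would you like to do?", end_session=False)
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name == "HelloIntent":  # hypothetical intent name
            return build_response("Hello from the skill.")
    return build_response("Sorry, I did not understand that.")


# A trimmed-down request, as the Alexa service might send for HelloIntent
sample_event = {
    "version": "1.0",
    "request": {"type": "IntentRequest", "intent": {"name": "HelloIntent"}},
}
response = lambda_handler(sample_event, None)
print(response["response"]["outputSpeech"]["text"])
```

The Alexa service turns the text in outputSpeech into audio, so the handler itself only ever deals with JSON.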

Free Alexa Developer Training Course

Use this full end-to-end training course from Amazon Source 5

Best practice

  • Welcome prompt is important. Provide important information.
  • Avoid jargon - difficult or professional words that are hard for users to recall, use or understand. See what I did there? Avoided jargon. 
  • Use conversations that are natural. Ironically, you should even test if your conversation models are natural. Does it understand common utterances - things users say or use? Is it what the users expect?
  • One breath test: dialogs should be finished in one breath
  • Read out loud to test the dialog scripts
  • Can use Alexa simulator to test. 
  • Test and get user feedback before submitting for certification
  • Take pauses 
text here <break time="600ms"/> more text here
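
The pause markup above is SSML. A small sketch of how a skill backend might build such a string and return it as speech; the helper names are hypothetical, and the outputSpeech shape assumes the SSML type with the markup wrapped in a speak element.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(before, after, pause_ms=600):
    """Join two pieces of text with an SSML pause between them."""
    return f'<speak>{escape(before)} <break time="{pause_ms}ms"/> {escape(after)}</speak>'


def ssml_output_speech(ssml):
    """Build the outputSpeech part of a response for SSML instead of plain text."""
    return {"type": "SSML", "ssml": ssml}


ssml = ssml_with_pause("text here", "more text here")
print(ssml)
print(ssml_output_speech(ssml))
```

Escaping the text keeps characters such as & or < from breaking the markup.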

More best practices source 7

Create Alexa Skills with Blueprint

Blueprint is the no-code way to get started building with Alexa. Minimal coding skills required! It's possible for non-engineers to quickly build an Alexa app that follows a common pattern using Alexa Blueprint!

Example Flash Briefing Blueprint Source 12

Create Alexa Skills with APIs

Some available APIs.

Create Alexa without code

Alexa for Gaming

Read the document or join a hackathon to learn more.


    • The backend is usually a Lambda function and is reached through a Lambda endpoint
    • A simple storage service (S3) is provisioned if hosted on AWS

    Other resources

    • Amazon often hosts Build Your First Alexa Skill in 1 Hour Webinars
    • Udacity Natural Language Processing nanodegree (NLP) for deep learning and machine learning teaches Alexa and IBM Watson skills.
    • Twilio Autopilot can build smart language interaction models based on sample utterances, and outputs JSON files that can integrate with Alexa.
    • Shark Tank Mark Cuban talks about how he uses Alexa and why developers should develop for Alexa. Source 8
    • Amazon Pay in Alexa example Blu ai Source 9
    • Alexa Skills Kit Developer Console Source 11
    • Alexa for Business allows businesses to host private skills within the organization Source 13
    • You can write Alexa backend in either Node.js or Python
    • Alexa can speak other languages such as English, Japanese, Spanish, Italian etc. Unfortunately it cannot speak Chinese right now. Source 14
    • Alexa is on many IoT devices including driving speakers, and Facebook Portal photo frame and video call. 


    Can brand with a sound or a familiar tune. 


    7 -
    8 -
    9 -
    10 - Best practices
    11 -
    12 - Flash Briefing
    13 -
    14 -

    React UI, UI UX, Reactstrap React Bootstrap

    React UI MATERIAL  Install yarn add @material-ui/icons Reactstrap FORMS. Controlled Forms. Uncontrolled Forms.  Columns, grid