Ad

Friday, March 20, 2020

Learn SQL — It’s on every job listing — Part 1

SQL is not obsolete. You can now build Machine Learning models with SQL, query real time or big data with SQL. It is true if you look around you will find plenty of job postings with SQL as a desired skill, even from FANG companies (Facebook, Apple, Netflix, Google). Uniqtech writes technical tutorials for coding bootcamp graduates, free lancers, self-study, MOOC students who are in the realm of data science, software engineering, machine learning and deep learning. Read our disclaimer here. This disclaimer applies to our entire site. Please take our words with a grain of salt. They are not considered professional advice nor are they considered professional opinions. Repost from Uniqtech Medium with permission. 

Microsoft Excel is a workbook that contain work sheets just like database contains tables.
Each table can be queried separately. To query tables, jointly, we will need to use join statements and keys to look up the corresponding data.
Each table row should have a unique ID, known as the primary ID. It can also have a foreign key (FK) which associates the row, aka record, with a unique primary ID of another table.
For example each e-commerce transaction has an unique ID, which can be generated with the timestamp of when the transaction happened. Each transaction ID can have a FK such as customer ID, which uniquely identifies the customer that made the transaction. His or her full information resides in the customers table.
That is the perfect sequel to talk about the philosophy and convention behind table names. You can think of table names are natural division of the data we want to model in forms of nouns, and in noun plural form: transactions, customers, products etc. Each row in the transactions table is a transaction (singular). Each row in the customers table is a customer. Each column represents a customer attribute, such as gender, age etc.
When designing the database, an architect or Database Admin (DBA) will construct a digital blue print stating how the tables are connected with each other or they are stand alone in the database. This diagram and the relations it specify is called the database schema.

What is SQL

SQL is a database query language. It doesn’t matter what relational database you use, SQL concepts are helpful. Pandas analytics library uses similar joins, query methods. Google BigQuery allows SQL like syntax.
Newer database such as NoSQL and graph databases use different query languages. Sample code from Google Cloud Datastore nosql database
1. // List Google companies with fewer than 400 employees.
2. var companies = query.filter(‘name =’, ‘Google’).filter(‘size <’, 400);

Important SQL Keywords

SELECT

The one select statement to select them all is using the wildcard.
SELECT * FROM table_name
It is important to slow down and read the statement. It reads: select all from table_name. * means all columns.
Nested Select Statements
SELECT * FROM (SELECT "A" AS A, "B" AS B);
AS specifies the alias. When column names are not reader friendly or long, alias is your friend.
It selects the column of data.

FROM

The FROM keyword is usually followed by a tablename. FROM database.CUSTOMERS . It can also be followed by a nested query.
It specifies the table to operate on.

WHERE

Where clause narrows down the query results by specifying conditions such as where TABLE_NAME.gender == 'Female' . It works on filtering the rows of data.

Putting it all together SELECT FROM WHERE

query = """
    SELECT my_column
    FROM my_table AS m
    WHERE m.gender = ‘F’
 """

WITH

“The SQL WITH clause allows you to give a sub-query block a name (a process also called sub-query refactoring), which can be referenced in several places within the main SQL query.” — Geek for geeks

ORDER BY

Sort the query result by columns ascending or descending.
ORDER BY ASC
ORDER BY DESC

LIMIT

LIMIT 1000
LIMIT 25
Show the first xx number of rows of records in a table.
Usually at the end of the query. The last line in SQL query.
Note in big data, where managing cost and resource use is important, LIMIT does not mean the entire database is not queried.

No comments:

Post a Comment

AutoML Automatic Machine Learning landscape

Features: here are some features that may be offered Automatic visualization: Automatically generate visualizations Machine learning interpr...