- Intermediate Concepts. Source: Kaggle Live Coding
- Bloom filter (a data structure) looking at overlapping in data. Checking if there's any overlap or cross over between train and test data. Test if element is an element of a set.
- Use in NLP, in n-grams, 8-grams arbitrary, 20-gram typical because sentences are 20ish words. 7-grams, human memory span around seven words. Average spoken language may be 7-grams. Can do both to see the amount of overlaps. Look at all sets of n grams. Pair wise comparison: what number of n-grams already exist in the set. Empty bloom filter is a bit set of m bits, all set to 0 (wikipedia). k hash functions look at the input, each map or hashes some element to m bits. k is much smaller than m.
- Kaggle competition with Google Cloud New York Taxi Fare Competition https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
- Playground competition in partnership with Google Cloud, Coursera and Kaggle
Saturday, March 9, 2019
Kaggle Intermediate Cheat Sheet
Using Kaggle on Google Colab
Install Kaggle, and also install catboost
!pip install kaggle
# Google Colab file access feature
# allows Colab to import data directly into colab
from google.colab import files
# retrieve uploaded file
uploaded = files.upload()
# move kaggle.json into thfolder where APIs expects to finds the json file
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/kaggle/kaggle.json
#we will upload the kaggle.json file here so that colab knows our kaggle authentication
#Go to my account create new API token, which will be downloaded as a JSON file
now we can access the kaggle competition list
!kaggle competition list
Data cleaning Missing data Outlier Others: duplicates, typos, special characters Strategy for missing data: imputation, mean, median...
This review is updated continuously throughout the program. Yay I just joined the Udacity Nanodegree for Digital Marketing! I am such an Uda...
In this downtown startup work space design the designers used fat boy bean bags and an extra wide step tiered staircase to create work space...
In this exercise 8/12, Codecademy requires you to do some basic arithmetics using Java. Use the plus symbol + for addition, the minus symbol...