
Saturday, March 9, 2019

Kaggle Intermediate Cheat Sheet


  • Intermediate concepts (source: Kaggle Live Coding)
    • Bloom filter: a probabilistic data structure that tests whether an element is a member of a set. Useful for checking whether there is any overlap (cross-over) between the train and test data. It can return false positives ("probably in the set") but never false negatives ("definitely not in the set").
    • Use in NLP with n-grams. The choice of n is somewhat arbitrary (e.g. 8-grams). 20-grams are typical because sentences run about 20 words; 7-grams match the human memory span of roughly seven words, and average spoken language may be closer to 7-grams. You can try both and compare the amount of overlap. Look at all sets of n-grams and do a pairwise comparison: how many of the n-grams already exist in the set? An empty Bloom filter is a bit array of m bits, all set to 0 (Wikipedia). Each of the k hash functions maps an input element to one of the m bit positions, and k is much smaller than m. Adding an element sets its k bits to 1; a query reports "present" only if all k bits are 1.
  • Kaggle competition with Google Cloud: New York City Taxi Fare Prediction https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
  • A Playground competition run in partnership by Google Cloud, Coursera and Kaggle
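
A minimal Python sketch of the Bloom filter notes above, applied to measuring n-gram overlap between train and test text. Assumptions not taken from the Kaggle Live Coding session: the k hash functions are simulated by salting SHA-256 with the hash index, bits are stored one per byte in a bytearray for readability, and the names BloomFilter, ngrams and ngram_overlap, as well as the default n, m and k values, are hypothetical illustrations.

```python
import hashlib

# Hypothetical helper names; the salted SHA-256 hashing scheme is an assumption.


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of bits
        self.k = k                # number of hash functions (k << m)
        self.bits = bytearray(m)  # empty filter: all m bits are 0 (one byte per bit for clarity)

    def _positions(self, item: str):
        # Simulate k hash functions by salting SHA-256 with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Adding an element sets each of its k bit positions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "probably in the set".
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(text: str, n: int):
    """Yield word n-grams as space-joined strings."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def ngram_overlap(train_docs, test_docs, n: int = 7, m: int = 1 << 20, k: int = 5) -> float:
    """Fraction of test n-grams that (probably) already occur in the training docs."""
    bf = BloomFilter(m, k)
    for doc in train_docs:
        for gram in ngrams(doc, n):
            bf.add(gram)
    test_grams = [g for doc in test_docs for g in ngrams(doc, n)]
    if not test_grams:
        return 0.0
    hits = sum(1 for g in test_grams if g in bf)
    return hits / len(test_grams)


train = ["the quick brown fox jumps over the lazy dog"]
test = ["a quick brown fox jumps over the lazy cat"]
# 5 of the 7 test trigrams also occur in the training sentence.
print(round(ngram_overlap(train, test, n=3), 3))
```

Because a Bloom filter can give false positives but never false negatives, this overlap figure can only over-estimate the true overlap; keeping m large relative to the number of inserted n-grams keeps that error small.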

Using Kaggle on Google Colab
Install the Kaggle API client, and also install catboost:
!pip install kaggle
!pip install catboost


# Google Colab's file access feature
# lets us upload files from our machine directly into Colab.
# On Kaggle, go to My Account and create a new API token;
# it is downloaded as a kaggle.json file.
# Upload kaggle.json here so that Colab has our Kaggle authentication.
from google.colab import files
# upload and retrieve the file (opens a file picker)
uploaded = files.upload()
# move kaggle.json into the folder where the Kaggle API expects to find it,
# and make it readable only by the owner
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
# now we can access the Kaggle competition list
!kaggle competitions list
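
If you prefer to do the credential step in plain Python rather than with shell commands (for example outside a notebook), the sketch below mirrors the mkdir -p, mv and chmod 600 line above. The function name install_kaggle_token is hypothetical; it assumes kaggle.json has already been uploaded to the current directory and that the Kaggle API reads its token from ~/.kaggle/kaggle.json.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src: str, kaggle_dir: str = "~/.kaggle") -> Path:
    """Hypothetical helper: move kaggle.json where the Kaggle API looks for it."""
    target_dir = Path(kaggle_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/.kaggle/`
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))             # like `mv kaggle.json ~/.kaggle/`
    os.chmod(target, 0o600)                        # like `chmod 600 ~/.kaggle/kaggle.json`
    return target
```

In Colab you would call install_kaggle_token("kaggle.json") right after files.upload(), then list competitions as before.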
