Saturday, March 9, 2019

Kaggle Intermediate Cheat Sheet

  • Intermediate Concepts. Source: Kaggle Live Coding
    • Bloom filter: a probabilistic data structure that tests whether an element is a member of a set. A lookup can return a false positive but never a false negative. Useful for checking whether there is any overlap (crossover) between train and test data.
    • Use in NLP with n-grams. The choice of n is arbitrary (for example 8-grams). 20-grams are typical because sentences are roughly 20 words long; 7-grams match the human memory span of around seven words, and average spoken language may be close to 7-grams. You can run both to compare the amount of overlap. Look at all sets of n-grams and do a pairwise comparison: how many of the n-grams already exist in the set? An empty Bloom filter is a bit array of m bits, all set to 0 (Wikipedia). Each of the k hash functions maps an element to one of the m bit positions. k is much smaller than m.
  • Kaggle competition with Google Cloud: the New York Taxi Fare competition
  • It is a playground competition run in partnership between Google Cloud, Coursera and Kaggle
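The Bloom filter notes above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the live-coding session: the class name `BloomFilter`, the helpers `ngrams` and `overlap_fraction`, the default sizes for m and k, and the use of salted SHA-256 as the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions (illustrative only)."""

    def __init__(self, m=100_000, k=4):
        self.m = m
        self.k = k
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(m)

    def _positions(self, item):
        # Assumption: derive k hash functions by salting SHA-256 with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(text, n):
    """Return the word-level n-grams of a text as strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def overlap_fraction(train_texts, test_texts, n=7, m=100_000, k=4):
    """Fraction of test n-grams that (probably) also occur in train."""
    seen = BloomFilter(m, k)
    for text in train_texts:
        for gram in ngrams(text, n):
            seen.add(gram)
    test_grams = [g for text in test_texts for g in ngrams(text, n)]
    if not test_grams:
        return 0.0
    return sum(g in seen for g in test_grams) / len(test_grams)
```

Calling `overlap_fraction(train_texts, test_texts, n=7)` and again with `n=20` gives the two overlap estimates mentioned in the notes. Because of false positives the result is an upper-bound estimate, and it gets tighter as m grows relative to the number of n-grams stored.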

Using Kaggle on Google Colab
Install the Kaggle package, and also install catboost:
!pip install kaggle
!pip install catboost

# Google Colab file access feature:
# lets Colab import files directly from your machine
from google.colab import files
# In your Kaggle account, create a new API token; it downloads as kaggle.json.
# Upload kaggle.json here so that Colab has our Kaggle credentials.
uploaded = files.upload()
# Move kaggle.json into the folder where the Kaggle API expects to find it,
# then restrict its permissions so only the owner can read it
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
Now we can access the Kaggle competition list:
!kaggle competitions list
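The credential step above (placing kaggle.json in the ~/.kaggle folder and restricting its permissions) can also be sketched in plain Python, which may be handy outside a notebook. The function name `install_kaggle_token` and its `config_dir` parameter are hypothetical, introduced only for this example. Note that it copies the token rather than moving it.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy a Kaggle API token into the config folder with 600 permissions.

    Hypothetical helper: config_dir defaults to ~/.kaggle, the folder used
    in the shell command above.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)  # same as mkdir -p
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    os.chmod(target, 0o600)  # owner read/write only, same as chmod 600
    return target
```

In Colab this would be called as `install_kaggle_token("kaggle.json")` right after the upload step.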
