- Intermediate Concepts. Source: Kaggle Live Coding
- Bloom filter (a data structure) looking at overlapping in data. Checking if there's any overlap or cross over between train and test data. Test if element is an element of a set.
- Use in NLP, in n-grams, 8-grams arbitrary, 20-gram typical because sentences are 20ish words. 7-grams, human memory span around seven words. Average spoken language may be 7-grams. Can do both to see the amount of overlaps. Look at all sets of n grams. Pair wise comparison: what number of n-grams already exist in the set. Empty bloom filter is a bit set of m bits, all set to 0 (wikipedia). k hash functions look at the input, each map or hashes some element to m bits. k is much smaller than m.
- Kaggle competition with Google Cloud New York Taxi Fare Competition https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
- Playground competition in partnership with Google Cloud, Coursera and Kaggle
Using Kaggle on Google Colab
Install Kaggle, and also install catboost
!pip install kaggle
# Google Colab file access feature
# allows Colab to import data directly into colab
from google.colab import files
# retrieve uploaded file
uploaded = files.upload()
# move kaggle.json into thfolder where APIs expects to finds the json file
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/kaggle/kaggle.json
#we will upload the kaggle.json file here so that colab knows our kaggle authentication
#Go to my account create new API token, which will be downloaded as a JSON file
now we can access the kaggle competition list
!kaggle competition list
No comments:
Post a Comment