Wednesday, October 11, 2017

Famous Machine Learning Datasets - Machine Learning Wiki

  • MNIST dataset, a collection of 70,000 labeled handwritten digit images (60,000 for training, 10,000 for testing), a common starting point for machine learning practice
    • Beginner-friendly machine learning data
    • Each image is 28 by 28 pixels, so 784 pixel values per image
    • Often used in Google TensorFlow demos
    • scikit-learn can fetch this dataset too (for example via fetch_openml)
  • Inception-v3: a pre-trained model that recognizes general objects across 1,000 classes, like "Zebra", "Dalmatian", and "Dishwasher", with accuracy that was state of the art when it was released
  • VGG-19, a network trained on ImageNet image data
  • What is VGG-16?

    "Since 2010, ImageNet has hosted an annual challenge where research teams present solutions to image classification and other tasks by training on the ImageNet dataset. ImageNet currently has millions of labeled images; it’s one of the largest high-quality image datasets in the world. The Visual Geometry group at the University of Oxford did really well in 2014 with two network architectures: VGG-16, a 16-layer convolutional Neural Network, and VGG-19, a 19-layer Convolutional Neural Network."
  • ImageNet challenge data: 1,000 object classes in about 1.3 million high-resolution training images
  • Cornell Movie-Dialogs Corpus https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
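
The MNIST note above says each 28 by 28 image gives 784 values per image. Here is a minimal sketch of that flattening step in plain Python. The image is synthetic (a blank grid with one bright pixel), and the helper name flatten_image is made up for this example; it is not part of TensorFlow or scikit-learn.

```python
# Sketch: flatten a 28x28 grayscale image into a 784-long feature vector.
# The image is synthetic (zeros plus one bright pixel), not real MNIST data.

ROWS, COLS = 28, 28


def flatten_image(image):
    """Return the pixels of a 2D image as one flat list, row by row."""
    return [pixel for row in image for pixel in row]


# Build a blank 28x28 image and light up the pixel at row 3, column 5.
image = [[0] * COLS for _ in range(ROWS)]
image[3][5] = 255

features = flatten_image(image)
print(len(features))        # 784
print(features.index(255))  # 3 * 28 + 5 = 89
```

Row-major order means the pixel at (row, column) lands at index row * 28 + column, which is why the bright pixel shows up at position 89.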


Additional datasets, some famous, some lesser known
  • MovieLens 100K movie ratings https://grouplens.org/datasets/movielens/100k/ 100K ratings from about 1,000 users on about 1,700 movies
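
As a rough sketch of working with ratings data like this, the snippet below averages ratings per movie. It assumes the tab-separated layout used by the MovieLens 100K ratings file (user id, item id, rating, timestamp). The sample rows are made up for illustration, and mean_rating_per_movie is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the assumed layout: user id, item id, rating, timestamp.
# These values are invented for the example, not copied from the dataset.
SAMPLE = "1\t10\t4\t880000000\n1\t20\t2\t880000100\n2\t10\t5\t880000200\n"


def mean_rating_per_movie(lines):
    """Average the ratings for each item id in tab-separated rating rows."""
    totals = defaultdict(lambda: [0, 0])  # item id -> [sum of ratings, count]
    for _user_id, item_id, rating, _timestamp in csv.reader(lines, delimiter="\t"):
        totals[int(item_id)][0] += int(rating)
        totals[int(item_id)][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}


means = mean_rating_per_movie(io.StringIO(SAMPLE))
print(means)  # {10: 4.5, 20: 2.0}
```

With the downloaded data, the same function should work on an open file handle instead of the in-memory sample, assuming the ratings file keeps this four-column layout.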
