Wednesday, October 11, 2017

Famous Machine Learning Datasets - Machine Learning Wiki

  • MNIST dataset, a collection of 70,000 labeled handwritten digits and a common starting point for machine learning practice
    • Beginner machine learning data
    • Each image is 28 by 28 pixels, so 784 data points per image
    • Grayscale pixel values from 0 to 255: zero means black, 255 means white (completely lit)
    • Often used in Google TensorFlow demos
    • sklearn provides this dataset too
    • Small images of digits handwritten by students, teachers, and government workers
  • Inception-v3: the pre-trained Inception-v3 model achieves state-of-the-art accuracy for recognizing general objects with 1000 classes, like "Zebra", "Dalmatian", and "Dishwasher"
  • VGG-19 image data
  • What is VGG-16?

    "Since 2010, ImageNet has hosted an annual challenge where research teams present solutions to image classification and other tasks by training on the ImageNet dataset. ImageNet currently has millions of labeled images; it’s one of the largest high-quality image datasets in the world. The Visual Geometry group at the University of Oxford did really well in 2014 with two network architectures: VGG-16, a 16-layer convolutional Neural Network, and VGG-19, a 19-layer Convolutional Neural Network."
  • Models trained on ImageNet can output 1000+ classes. If we don't need that many, consider transfer learning: keep the pre-trained layers up to the bottleneck and replace the final layer with a new classifier for only the 1-10 classes needed.
  • YouTube-8M video data on Kaggle https://www.kaggle.com/c/youtube8m
  • ImageNet challenge data: 1000+ different object classes in 1.3 million high-resolution training images
  • Cornell Movie-Dialogs Corpus https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
  • More famous datasets on GitHub: the awesome-public-datasets list https://github.com/caesar0301/awesome-public-datasets
  • “Twenty Newsgroups” The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. To the best of our knowledge, it was originally collected by Ken Lang, probably for his paper “Newsweeder: Learning to filter netnews,” though he does not explicitly mention this collection. The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.
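The MNIST figures above (28 by 28 pixels, 784 values per image, intensities from 0 to 255) can be checked with a few lines of NumPy. This is only a sketch: the image below is random synthetic data standing in for a real digit, and scaling to the 0-1 range is a common preprocessing choice, not something the dataset itself requires.

```python
import numpy as np

# Stand-in for one MNIST digit: a 28x28 grid of uint8 pixel intensities.
# This is synthetic random data, not a real MNIST sample.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten the 2D grid into one feature vector of 28 * 28 = 784 values.
flat = image.reshape(-1)

# Scale the 0-255 intensities to 0.0-1.0 (0 stays black, 255 becomes 1.0).
scaled = flat.astype(np.float32) / 255.0

print(flat.shape)
```

Many classifiers expect exactly this flattened, scaled vector as input, which is why the "784 data points per image" figure comes up so often.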

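The transfer learning idea mentioned in the list above (reuse a pre-trained network and train only a small new output layer) can be sketched in plain NumPy. Everything here is an assumption made for illustration: the "bottleneck features" are random numbers standing in for the output of a frozen pre-trained model, and `NUM_FEATURES`, `NUM_CLASSES`, and the learning rate are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 64  # hypothetical bottleneck size
NUM_CLASSES = 3    # the handful of classes we actually need
NUM_SAMPLES = 30

# Stand-in for bottleneck features from a frozen, pre-trained network.
features = rng.normal(size=(NUM_SAMPLES, NUM_FEATURES))
labels = rng.integers(0, NUM_CLASSES, size=NUM_SAMPLES)
one_hot = np.eye(NUM_CLASSES)[labels]

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs):
    picked = probs[np.arange(NUM_SAMPLES), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

# New classification head: the only parameters that get trained.
weights = np.zeros((NUM_FEATURES, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)

initial_loss = cross_entropy(softmax(features @ weights + bias))
for _ in range(200):
    probs = softmax(features @ weights + bias)
    grad_logits = (probs - one_hot) / NUM_SAMPLES
    weights -= 0.1 * (features.T @ grad_logits)
    bias -= 0.1 * grad_logits.sum(axis=0)
final_loss = cross_entropy(softmax(features @ weights + bias))

print(initial_loss, final_loss)
```

In a real setup the features would come from a pre-trained model such as Inception-v3 or VGG-16 with its original 1000-class layer removed; only the small head above would need training.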

Additional datasets, some famous, some lesser known
  • MovieLens 100K movie ratings https://grouplens.org/datasets/movielens/100k/ with 100,000 ratings from 1,000 users on 1,700 movies
  • Datasets on Keras
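
As a rough illustration of working with the MovieLens 100K ratings, the sketch below parses tab-separated rows in the layout of the dataset's `u.data` file (user id, item id, rating, timestamp) and averages the ratings per movie. The sample rows are made up for the example, and `mean_rating_per_item` is a hypothetical helper name, not part of any library.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the u.data layout: user id, item id, rating, timestamp.
SAMPLE = (
    "1\t10\t5\t880000000\n"
    "1\t20\t3\t880000100\n"
    "2\t10\t4\t880000200\n"
)

def mean_rating_per_item(handle):
    """Return {item_id: mean rating} from tab-separated rating rows."""
    totals = defaultdict(lambda: [0, 0])  # item_id -> [rating sum, count]
    for _user_id, item_id, rating, _timestamp in csv.reader(handle, delimiter="\t"):
        entry = totals[int(item_id)]
        entry[0] += int(rating)
        entry[1] += 1
    return {item: total / count for item, (total, count) in totals.items()}

means = mean_rating_per_item(io.StringIO(SAMPLE))
print(means)
```

To run it on the real data, pass an open file handle for the downloaded ratings file in place of the `io.StringIO` sample.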
