Ad

Wednesday, March 8, 2017

Pandas Numpy Data Analysis Tool Kit - Udacity Machine Learning Nanodegree 00

SERIES & DATAFRAME

Basic units data structures of Pandas, data analysis using Python

Allows users to store a large amount of information and perform data analysis

Dataframe documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/dsintro.html#dataframe

A dictionary
  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • Series
  • Another DataFrame


Sample Code: 

d = {'key_name':Series([1,2,3], index=['a','b','c'])}

Analogy : Excel Spreadsheet
Will also return number of rows and columns

Pandas.Series()
Pandas.Series([],index=[])


----

More sample code:
   my_data = pd.DataFrame(data)
    print my_data.dtypes
    print ""
    print my_data.describe()
    print ""
    print my_data.head()
    print ""

    print my_data.tail()


# Retrieve columns
df[['col_name','col2_name']]
# Retrieve rows
df.loc['a']


df[df['col_name'] >= 30]

get row column counts of Pandas Dataframe
.shape
len(DataFrame.index)
.count() count each column of the entire table

No comments:

Post a Comment

K mean clustering sklearn best practice - Udacity Machine Learning Nanodegree Unsupervised Learning

There are three key k means clustering parameters in sklearn that you will need to pay attention to: Number of centroids, aka center of c...