- Specify the TensorFlow version in Google Colab with `%tensorflow_version 2.x`. Colab advises against using `pip install` for this: "We recommend against using pip install to specify a particular TensorFlow version for GPU backends. Colab builds TensorFlow from source to ensure compatibility with our fleet of GPUs. Versions of TensorFlow fetched from PyPI by pip may suffer from performance problems or may not work at all."
- Check the TensorFlow version after installing: `import tensorflow`, then `print(tensorflow.__version__)`
- TPU support for TensorFlow 2.0 is not yet available: "TPUs are not fully supported in Tensorflow 2.0. We expect they will be supported in Tensorflow 2.1. Follow along on GitHub."
- `tf.layers` from TensorFlow 1.x is going away in TensorFlow 2.0; use `tf.keras.layers` instead
- Previously, Keras was installed separately: `pip install keras`
- In TensorFlow 2.0, Keras is built in: `from tensorflow import keras`
Sunday, December 29, 2019
Tuesday, December 10, 2019
- `pandas.DataFrame.shape` --> `(row_count, col_count)`
- `df.shape[0]` --> number of records (rows), i.e. the number of samples in the dataset
- `my_dataframe['my_series_name'].unique()` --> returns the unique values of a column (the "radio button choices")
- `dataframe.describe()` --> returns summary statistics (count, mean, standard deviation, min, quartiles, max) for numeric columns
- `len(my_dataframe['my_series_name'].unique())` --> number of unique values
- `import os`; `os.listdir('.')` --> lists the entries (files and subdirectories) in the current directory `'.'`; pass a directory name instead, e.g. `os.listdir('name_of_directory')`, to list a specific directory
- `len(os.listdir('.'))` --> number of entries in the current directory
- `my_dataframe.groupby(['col_1', 'col_2'])` --> groups by `col_1` first, then by `col_2` within each `col_1` group
- Converting a pandas GroupBy output from Series to DataFrame: `.groupby()` returns a GroupBy object, and aggregating it over several keys produces a result indexed by a MultiIndex (also known as a hierarchical index) rather than a single index. To get a flat DataFrame, rename the columns and reset the index, e.g. `my_df.groupby(group).count().add_suffix('_Count').reset_index()`, or call `.size().reset_index()`. Note that `.size()` is called on the GroupBy object, not on the DataFrame: `pandas.core.groupby.GroupBy.size` returns a Series with the number of rows in each group.
- `group = ['col_1', 'col_2']; my_df.groupby(group).size().reset_index(name="column_name")`
- `df = df[(df.col_name < 1) & (df.col_name_2 < 1)]` --> filter a DataFrame with a compound condition; wrap each condition in parentheses and combine them with `&` (and) or `|` (or)
- `df = df.query('col_name != "my_value"')` --> keep only the rows where `col_name` is not `"my_value"`
- `.value_counts()`: `df.column.value_counts()` --> count of each unique value in a column
- pandas cheatsheet
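- Another way to get unique values: `pd.unique(df.col_name)`
- `df.fillna(0)` --> fill missing values (NaN) with zero across the entire table
- `df.reset_index(drop=True, inplace=True)` --> renumber the index from 0 and discard the old index instead of keeping it as a column
- Remove the target column (or any column): `data.drop(['target'], axis=1, inplace=True)`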
- NYU Technology Management
- Information theory, information management
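The pandas one-liners above can be tried together on a small hypothetical DataFrame. This is a sketch that assumes a made-up dataset with columns `col_1`, `col_2`, `score`, and `target`. The names were chosen to match the placeholders in the notes, and the data does not come from any real dataset.

```python
import os

import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names mirror the placeholders in the notes.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": ["x", "y", "x", "x", "y"],
    "score": [0.5, np.nan, 0.2, 1.5, 0.8],
    "target": [1, 0, 1, 0, 1],
})

n_rows, n_cols = df.shape                  # (row_count, col_count)
summary = df.describe()                    # stats for numeric columns
unique_col_1 = df["col_1"].unique()        # distinct values, in order of appearance
n_unique = len(unique_col_1)               # number of distinct values
also_unique = pd.unique(df["col_1"])       # same result via the top-level function
counts = df["col_1"].value_counts()        # occurrences of each value

# Group by col_1, then by col_2. .size() is called on the GroupBy object and
# returns a Series with a MultiIndex; reset_index(name=...) flattens it.
group = ["col_1", "col_2"]
group_sizes = df.groupby(group).size().reset_index(name="count")

# Alternative: count non-null values per column, suffix the names, flatten.
suffixed = df.groupby(group).count().add_suffix("_Count").reset_index()

# Compound filter (parentheses around each condition), then renumber rows.
small = df[(df.score < 1) & (df.target == 1)].reset_index(drop=True)

# Query-string filter; assign the result to a DataFrame name, not to pd,
# which would shadow the pandas module.
not_b = df.query('col_1 != "b"')

filled = df.fillna(0)                      # NaN -> 0 everywhere
features = df.drop(["target"], axis=1)     # drop the label column

n_entries = len(os.listdir("."))           # files + subdirectories here
```

Note that `group_sizes` counts every row in a group. `suffixed` counts only the non-missing values in each column, so the `(a, y)` group shows `score_Count` of 0 because its score is NaN.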
Sunday, December 8, 2019
How does machine learning differ from procedural programming, aka traditional programming? In traditional programming, we must write step-by-step, line-by-line code, including control flow and logic. Generally we need to tell the program exactly what to do. In machine learning, we choose an appropriate algorithm and supply training data to train and tune it, turning it into a model that can be used for prediction. Often, more data points are better.
Another way to put it: in traditional programming, we tell the computer exactly what the formula or function is and how it calculates the output. In machine learning, we give the algorithm many examples so that it can approximate the formula or function.
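Here is a minimal plain-Python sketch of the contrast, using Celsius-to-Fahrenheit conversion as a made-up example. The traditional version hard-codes the formula. The "learning" version sees only example pairs and recovers the formula's coefficients by ordinary least squares. Least squares is a stand-in chosen for simplicity and does not represent any particular ML library.

```python
# Traditional programming: we write the formula ourselves.
def fahrenheit_rule(celsius):
    return celsius * 1.8 + 32


# "Learning" (a minimal stand-in): fit y = w * x + b to examples by
# ordinary least squares and let the data determine w and b.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var                  # slope
    b = mean_y - w * mean_x        # intercept
    return w, b


# Labeled examples (input, correct output); values are made up for illustration.
examples_c = [-40, 0, 10, 25, 37, 100]
examples_f = [fahrenheit_rule(c) for c in examples_c]
w, b = fit_line(examples_c, examples_f)
print(w, b)  # approximately 1.8 and 32
```

In a real project the examples would come from measurements rather than from the rule itself. Generating them here just makes it easy to check that the fitted `w` and `b` land on 1.8 and 32.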
Loss functions: A loss function measures how well (or badly) our model predicts on input data. There are many viable loss functions, each with strengths and weaknesses; like everything else in machine learning, the choice is often a trade-off.
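The sketch below illustrates the trade-off by comparing two common loss functions, mean squared error (MSE) and mean absolute error (MAE), on made-up predictions. Squaring makes MSE punish one large error much more heavily than MAE does. That can be desirable when large mistakes are costly, or undesirable when outliers end up dominating training.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
y_pred_outlier = [2.5, 5.0, 7.5, 19.0]  # one prediction far off

print(mse(y_true, y_pred), mae(y_true, y_pred))                   # 0.125 0.25
print(mse(y_true, y_pred_outlier), mae(y_true, y_pred_outlier))   # 25.125 2.75
```

A single bad prediction multiplies MSE by about 200 but MAE by only 11.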
Gradient Descent: many machine learning models are trained with gradient descent. The gradient of the loss points in the direction of steepest increase, so the weights and parameters are updated by a small step in the opposite direction to decrease the loss.
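Here is a minimal sketch of gradient descent in plain Python. It assumes a one-feature linear model `y = w * x + b` trained with mean squared error on made-up data. The learning rate `lr` and the step count are illustrative choices and have not been tuned.

```python
def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Fit y = w * x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = mean(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, i.e. downhill on the loss surface.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up data following y = 2x + 1
w, b = gradient_descent(xs, ys)
print(w, b)  # converges close to 2.0 and 1.0
```

If `lr` is too large the updates overshoot and diverge. If it is too small, many more steps are needed. This is one reason the learning rate is treated as a hyperparameter.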
Some data is readily available, as mentioned above. Image data, for example, is easy to obtain: an estimated 95 million photos are shared on Instagram each day. Other data, such as financial and health data, is expensive and hard to collect.
Labeled Data vs Unlabeled Data
Supervised vs Unsupervised Learning
One question to ask is: is the data labeled or not? Supervised learning requires labeled data, for example images labeled "cat" or "dog" for classification, where there should be no overlap among the categories. Supervised learning can also be regression, where the label is a continuous value. Unsupervised learning finds natural groupings among data points that do not have labels. In centroid-based clustering such as k-means, the number of centers (aka centroids) is a hyperparameter that needs to be chosen and tuned.
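The notes mention centroids but don't name a clustering algorithm, so the sketch below uses plain-Python k-means (Lloyd's algorithm) on made-up 1-D data. The deterministic initialization is an assumption chosen for reproducibility. The example shows how the hyperparameter `k` changes the groups that are found.

```python
def kmeans_1d(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 1-D data; returns (centroids, clusters)."""
    pts = sorted(points)
    # Assumed deterministic initialization: k evenly spaced points from the sorted data.
    if k == 1:
        centroids = [pts[len(pts) // 2]]
    else:
        centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (ties go to the first).
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up unlabeled data with three visible groups.
data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 20.0, 21.0]
print(kmeans_1d(data, 2)[0])  # [6.0, 20.5]
print(kmeans_1d(data, 3)[0])  # [1.5, 10.5, 20.5]
```

With `k=2` the two lower groups are merged. With `k=3` each natural group gets its own centroid. Neither answer is "wrong", which is why `k` has to be chosen and tuned.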
GPU: Invented by NVIDIA, the GPU has massive parallel processing power, in contrast with a CPU, which has only a few cores (if a GPU is a multi-lane highway, a CPU has only two or four lanes). According to NVIDIA's David Shapiro, the fastest state-of-the-art GPU can have up to 5000 lanes of compute "highway traffic" running simultaneously.
Sunday, November 10, 2019
- Data cleaning
- Missing data
- Others: duplicates, typos, special characters
- Feature engineering
- Strategy for missing data: imputation (mean, median), or mark as np.nan or "unknown"
- Outliers: visualize them, demo of how a linear regression fit changes with an outlier, IQR (interquartile range)
- Curse of dimensionality: count of columns (aka features) vs count of rows
- Data transformation:
- Categorical data: one-hot encoding to make it machine readable; ordinal (ordered) versus independent (unordered) categories
- Skewed data
- Class imbalance
- Feature engineering
- Rank transformation
- One-hot encoding: a categorical column with three potential values (married, single, divorced) becomes three separate columns of 1s and 0s
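Several of the steps above can be sketched with pandas on a tiny hypothetical table. The sketch covers median imputation for a missing value, IQR-based outlier flagging, a rank transformation, and one-hot encoding of the married/single/divorced example with `pd.get_dummies`. The outlier check uses the common 1.5 * IQR rule, which is an assumed threshold. Column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are made up.
df = pd.DataFrame({
    "income": [40.0, 42.0, np.nan, 45.0, 41.0, 300.0],
    "marital_status": ["married", "single", "divorced", "single", "married", "single"],
})

# Missing data: impute with the median (less sensitive to outliers than the mean).
median_income = df["income"].median()            # NaN is skipped
df["income_imputed"] = df["income"].fillna(median_income)

# Outliers via IQR: flag values more than 1.5 * IQR beyond the quartiles.
q1 = df["income_imputed"].quantile(0.25)
q3 = df["income_imputed"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = (df["income_imputed"] < lower) | (df["income_imputed"] > upper)

# Rank transformation: replace values by their rank (ties get the average rank),
# which removes the pull of extreme values in skewed data.
df["income_rank"] = df["income_imputed"].rank()

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df["marital_status"], prefix="status", dtype=int)
df = pd.concat([df, one_hot], axis=1)

print(df)
```

Here only the 300.0 income is flagged as an outlier. After the rank transformation it becomes 6.0, just one step above the next value.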
Core Data Structures
- PyTorch tensors
- TensorFlow tensors
- NumPy ndarray
- pandas DataFrame and Series
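Here is a short sketch, using made-up values, of how the NumPy and pandas structures relate. It shows a 2-D ndarray, the DataFrame that wraps it with column labels, the Series you get by selecting one column, and the conversion back. PyTorch and TensorFlow are left out so the example runs with only NumPy and pandas installed.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy ndarray: 3 samples x 2 features (made-up values).
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Wrap it in a pandas DataFrame to get labeled columns and an index.
df = pd.DataFrame(arr, columns=["feature_a", "feature_b"])

# Selecting one column gives a pandas Series (1-D, labeled).
series = df["feature_a"]

# Back to a plain ndarray, e.g. to hand the data to a deep learning framework.
back = df.to_numpy()

print(type(arr).__name__, type(df).__name__, type(series).__name__)
# ndarray DataFrame Series
```

If those frameworks are installed, `torch.from_numpy(arr)` and `tf.convert_to_tensor(arr)` turn the same ndarray into a PyTorch or TensorFlow tensor.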