Wednesday, January 22, 2020
After transitioning to machine learning, I learned the hard way that it is now very competitive and difficult to get into graduate school. Engineering departments are crammed. Here are some difficult lessons I have learned. I am not an admissions officer and I don't know what works best, but I know what didn't work:
- Not having strong recommendation letters
- Without strong recommendation letters, nothing works. It's important to build that network over the long run: peers and supervisors in a work or academic setting are especially important.
- Not having a strong personal statement
- I made the mistake of focusing on my technical skills. There are always people who are better than me. I should have had a narrative, a bigger picture, beyond raw performance.
- Even when writing papers, you are asked to explain why the research project matters to you personally.
How AWS Describes SageMaker:
"Amazon SageMaker provides a fully managed service for data science and machine learning workflows. One of the most important capabilities of Amazon SageMaker is its ability to run fully managed training jobs to train machine learning models." Source 1
The Estimator Object
AWS SageMaker instance types
Note: AWS SageMaker instance types are now separate from EC2 instance types, and availability can differ by region. There are accelerated computing options, more commonly known as GPU instances, such as ml.p2.xlarge.
See the full list of AWS SageMaker instance types here (Source 2).
It includes a comprehensive table of instance type, vCPU count, GPUs, memory (GiB), GPU memory (GiB), and a brief description of network performance.
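The Estimator and instance-type notes above can be sketched as a minimal training-job configuration. This is a hedged sketch, not a definitive recipe: the image URI, IAM role ARN, and S3 paths are placeholders, and the parameter names follow SageMaker Python SDK v2 (older versions of the SDK used `train_instance_type` / `train_instance_count` instead).

```python
# Sketch: configuring a SageMaker training job on a GPU instance.
# Role ARN, image URI, and bucket paths are placeholders, not real resources.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<your-training-image-uri>",                # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=1,
    instance_type="ml.p2.xlarge",   # accelerated (GPU) type from the table above
    output_path="s3://<your-bucket>/output",              # placeholder
)

# estimator.fit({"train": "s3://<your-bucket>/train"})  # would launch the managed training job
```

The `fit()` call is what starts the fully managed training job described in the AWS quote above; SageMaker provisions the instance, runs the container, and tears it down when training finishes.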
Optimization: Bring your data to AWS
Previously, all files had to be stored in S3; now you can use Amazon's distributed file systems.
"Training machine learning models requires providing the training datasets to the training job. Until now, when using Amazon S3 as the training datasource in File input mode, all training data had to be downloaded from Amazon S3 to the EBS volumes attached to the training instances at the start of the training job. A distributed file system such as Amazon FSx for Lustre or EFS can speed up machine learning training by eliminating the need for this download step."
Amazon FSx for Lustre or Amazon Elastic File System (EFS) Source 1
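As a sketch of how the file-system input described above might be wired up: the SageMaker Python SDK has a `FileSystemInput` class for pointing a training channel at FSx for Lustre or EFS instead of S3. The file system ID, directory path, and estimator below are placeholders for illustration.

```python
# Sketch: reading training data directly from FSx for Lustre (no S3 download step).
# The file system id and directory path are placeholders.
from sagemaker.inputs import FileSystemInput

train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",  # placeholder FSx file system id
    file_system_type="FSxLustre",           # or "EFS" for Elastic File System
    directory_path="/fsx/train",            # placeholder path inside the file system
    file_system_access_mode="ro",           # read-only is sufficient for training
)

# estimator.fit({"train": train_input})  # training reads from the file system directly
```

Because the data is mounted from the distributed file system, the download-to-EBS step quoted above is skipped, which is where the startup-time savings come from.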
Sunday, December 29, 2019
- Specify the TensorFlow version in Google Colab with `%tensorflow_version 2.x`. Using pip install in Colab is not recommended: "We recommend against using pip install to specify a particular TensorFlow version for GPU backends. Colab builds TensorFlow from source to ensure compatibility with our fleet of GPUs. Versions of TensorFlow fetched from PyPI by pip may suffer from performance problems or may not work at all."
- Check the TensorFlow version after importing: `import tensorflow; print(tensorflow.__version__)`
- TPU support for TensorFlow 2.0 is not yet available: "TPUs are not fully supported in Tensorflow 2.0. We expect they will be supported in Tensorflow 2.1. Follow along on GitHub."
Tuesday, December 10, 2019
- pandas.DataFrame.shape --> (row_count, col_count): the number of records (samples) and the number of columns in the dataset
- my_dataframe['my_series_name'].unique() --> returns the unique values of a column, like a set of "radio button choices"
- len(my_dataframe['my_series_name'].unique()) --> number of unique values
- import os; os.listdir('.') --> list the files in the current directory '.', or pass a specific directory name
- len(os.listdir('.')) --> the number of files in the current directory
- my_dataframe.groupby(['col_1', 'col_2']) --> group by column 1 first, then by column 2
- Converting a pandas GroupBy output from Series to DataFrame: .groupby() returns a GroupBy object with a MultiIndex (also known as a hierarchical index) instead of a DataFrame with a single index. You will need to rename columns and reset the index with my_groupby.add_suffix('_Count').reset_index(), or call .size().reset_index(). Important: .size() is called on the GroupBy object, not the usual DataFrame. pandas.core.groupby.GroupBy.size returns a Series with the number of rows in each group.
- group = ['col_1', 'col_2']; my_df.groupby(group).size().reset_index(name="column_name")
- df = df[(df.col_name < 1) & (df.col_name_2 < 1)] --> complex condition query / filter on a DataFrame
- df.column.value_counts() --> counts of each unique value in a column (note: value_counts, not value_count)
- pandas cheatsheet
- df.fillna(0) --> fill all NaN values in the entire DataFrame with zero
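The pandas notes above can be sketched on a tiny DataFrame. The column names and values here are made up purely for illustration:

```python
import pandas as pd

# Tiny illustrative DataFrame; column names are invented for this example.
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b", "b"],
    "col_2": ["x", "y", "x", "x", "y"],
})

# .shape --> (row_count, col_count)
print(df.shape)  # (5, 2)

# Unique values of a column, and how many there are
print(df["col_1"].unique())       # ['a' 'b']
print(len(df["col_1"].unique()))  # 2

# GroupBy.size() counts rows per group; reset_index() flattens the
# MultiIndex back into ordinary columns, naming the count column.
group = ["col_1", "col_2"]
counts = df.groupby(group).size().reset_index(name="count")
print(counts)
#   col_1 col_2  count
# 0     a     x      1
# 1     a     y      1
# 2     b     x      2
# 3     b     y      1

# value_counts() tallies one column directly
print(df["col_2"].value_counts())  # x appears 3 times, y appears 2 times
```

Note how `.size()` is called on the GroupBy object, while `.value_counts()` is called on a single column; both return a Series of counts, but only the groupby version handles multiple grouping columns.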
- NYU Technology Management
- Information theory, information management