Tuesday, August 13, 2019

Differential Privacy and Federated Learning 101

A glossary of vocabulary for differential privacy, secure AI, and federated learning on sensitive datasets. All concepts are theoretical and for discussion purposes only; they are NOT intended for production or any professional usage and should NOT be used. This is a very new, experimental concept in AI.

Before you read our article, please read our disclaimer page. It is very important to note that this article and all articles and content on our website and affiliated websites are for discussion / entertainment purposes only. They should NOT be considered professional advice. Content on this site, our affiliated sites, and social media is NOT intended for commercial purposes, NOT for production purposes, and NOT for professional usage.

When is differential privacy useful? When a researcher wants to analyze a sensitive dataset, such as one containing patient data, and/or wants to build a model that learns sensitive features or makes a sensitive prediction, e.g. whether a patient has a health condition such as HIV.

Differential Privacy

"mathematical definition of privacy. In the simplest setting, consider an algorithm that analyzes a dataset and computes statistics about it (such as the data's mean, variance, median, mode, etc.). Such an algorithm is said to be differentially private if by looking at the output, one cannot tell whether any individual's data was included in the original dataset or not. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset" - Harvard Differential Privacy Group
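To make "hardly changes" concrete, here is a minimal sketch of the Laplace mechanism, one standard way to build a differentially private counting query. It assumes a toy database of 0/1 values (1 = has the condition); the names laplace_noise and private_count are illustrative, not from any library. Because adding or removing one person changes a count by at most 1, adding Laplace noise with scale 1/epsilon makes each output almost as likely with or without that person.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with rate 1/scale
    # is Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(db, epsilon, rng):
    """epsilon-DP count of 1s in a list of 0/1 values (Laplace mechanism).

    Adding or removing one person changes the true count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    return sum(db) + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
db = [rng.randint(0, 1) for _ in range(100)]  # hypothetical: 1 = has the condition
neighbor = db[1:]                             # same database without person 0

# The two noisy answers are hard to tell apart, so the output says little
# about whether person 0 was in the data.
print(round(private_count(db, epsilon=0.5, rng=rng), 2))
print(round(private_count(neighbor, epsilon=0.5, rng=rng), 2))
```

Smaller epsilon means more noise and stronger privacy; larger epsilon means more accurate answers and weaker privacy.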

Query:
See the quote above.
Examples: mean, variance, median, mode.
Advanced: machine learning and deep learning models.
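A quick way to see why some queries are riskier than others is to measure how much each answer moves when a single person is removed, i.e. over the "neighboring" databases. The sketch below (Python 3.8+, standard library only) computes the four queries above on made-up ages and the largest such change; query_results and empirical_sensitivity are hypothetical helpers. It only probes the neighbors of this one database, so the result is a lower bound on the true sensitivity, not a proof of it.

```python
import statistics

def query_results(db):
    """The four summary-statistic queries listed above."""
    return {
        "mean": statistics.mean(db),
        "variance": statistics.variance(db),  # sample variance
        "median": statistics.median(db),
        "mode": statistics.mode(db),          # Python 3.8+: first mode on ties
    }

def empirical_sensitivity(db, query):
    """Largest change in query(db) when any one person is removed."""
    full = query(db)
    neighbors = (db[:i] + db[i + 1:] for i in range(len(db)))
    return max(abs(full - query(n)) for n in neighbors)

ages = [23, 35, 35, 41, 52, 67, 89]  # hypothetical patient ages
print(query_results(ages))
for name in ("mean", "variance", "median", "mode"):
    change = empirical_sensitivity(ages, lambda d: query_results(d)[name])
    print(name, round(change, 2))
```

On this toy data the mode jumps a lot when one of the two 35s is removed, which is why queries like mode need more noise (or a different mechanism) than a simple count.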

Differentially private tools

Sensitive database

Anonymization: the old approach, removing sensitive personally identifiable information, has been shown to sometimes fail. It offers no guarantee.

Example where data thought to be anonymous fails to protect privacy:

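Latanya Sweeney showed that gender, date of birth, and ZIP code together are enough to uniquely identify many Americans. This is also the governor's medical record example: she re-identified the Massachusetts governor's record in supposedly anonymous hospital data by linking it with a public voter list. This is known as a linkage attack.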
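A toy version of a linkage attack, assuming two made-up tables: "anonymized" medical records with names removed but gender, date of birth, and ZIP code kept, and a public voter list with names plus the same three fields. Joining on those quasi-identifiers re-identifies every patient whose combination is unique. All records and the key() and link() helpers are hypothetical.

```python
# Hypothetical "anonymized" hospital records: names removed, quasi-identifiers kept.
medical = [
    {"gender": "M", "dob": "1950-01-15", "zip": "02138", "diagnosis": "hypertension"},
    {"gender": "F", "dob": "1962-03-14", "zip": "02139", "diagnosis": "asthma"},
]

# Hypothetical public voter list: names plus the same quasi-identifiers.
voters = [
    {"name": "Alice Smith", "gender": "F", "dob": "1962-03-14", "zip": "02139"},
    {"name": "Bob Jones", "gender": "M", "dob": "1950-01-15", "zip": "02138"},
    {"name": "Carol White", "gender": "F", "dob": "1980-06-01", "zip": "02138"},
]

QUASI_IDENTIFIERS = ("gender", "dob", "zip")

def key(record):
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def link(medical, voters):
    """Join the two tables on the quasi-identifiers (a linkage attack)."""
    names_by_key = {}
    for voter in voters:
        names_by_key.setdefault(key(voter), []).append(voter["name"])
    reidentified = []
    for record in medical:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match pins the diagnosis to one person
            reidentified.append((names[0], record["diagnosis"]))
    return reidentified

for name, diagnosis in link(medical, voters):
    print(name, "->", diagnosis)
```

No name ever appears in the medical table, yet the join attaches a name to each diagnosis, which is why removing identifiers alone gives no guarantee.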

Linkage Attack:
See above

"too many innocuous (even completely random) queries about a database inherently violates the privacy of its individual contributors. ... tradeoff between statistical utility and privacy." - Harvard Privacy Group
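The sketch below illustrates the quote on made-up data in two ways. First, two innocuous exact counting queries ("how many people?" and "how many, excluding person 0?") differ by exactly person 0's value, a so-called differencing attack. Second, even noisy Laplace answers leak if the same query is repeated: averaging many answers cancels the noise, and under basic sequential composition the privacy cost of k epsilon-DP answers adds up to k times epsilon. The database, epsilon, and k are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

rng = random.Random(7)
db = [rng.randint(0, 1) for _ in range(1000)]  # hypothetical sensitive bits
true_count = sum(db)

# 1) Differencing attack with exact answers: two innocuous counts
#    ("everyone" vs. "everyone except person 0") reveal person 0's bit.
leaked = true_count - sum(db[1:])
print("leaked:", leaked, "actual:", db[0])

# 2) Repeating a noisy (epsilon-DP) count and averaging cancels the noise.
#    Under basic sequential composition, k answers cost k * epsilon in total.
epsilon, k = 0.1, 10_000
answers = [true_count + laplace_noise(1.0 / epsilon, rng) for _ in range(k)]
estimate = sum(answers) / k
print("true:", true_count, "estimate:", round(estimate, 1), "total epsilon:", epsilon * k)
```

This is the utility/privacy tradeoff in miniature: every extra answer adds utility for the analyst and spends more of the privacy budget.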

Owner: the initial virtual owner is "me", the local worker.
Pointer API: send() and get() move data among virtual workers.
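The owner / pointer idea can be shown with a toy, pure-Python sketch. This is NOT PySyft's implementation or API: the class names echo PySyft's, but VirtualWorker, Pointer, and send() here are hypothetical stand-ins loosely modeled on the description above. Sending an object stores it on a remote worker and hands the owner back only a pointer; get() brings the object back and deletes it from the remote worker.

```python
import itertools

class VirtualWorker:
    """Hypothetical toy worker: a named, in-memory object store."""
    def __init__(self, worker_id):
        self.id = worker_id
        self.objects = {}

class Pointer:
    """Hypothetical handle held by `owner` for an object stored on `location`."""
    def __init__(self, owner, location, obj_id):
        self.owner = owner
        self.location = location
        self.obj_id = obj_id

    def get(self):
        # Move the object back to the owner and delete it from the remote store.
        return self.location.objects.pop(self.obj_id)

    def __repr__(self):
        return f"Pointer(owner={self.owner.id}, location={self.location.id}, id={self.obj_id})"

_next_id = itertools.count()

def send(obj, worker, owner):
    # Store the object on the remote worker; the owner keeps only a pointer.
    obj_id = next(_next_id)
    worker.objects[obj_id] = obj
    return Pointer(owner, worker, obj_id)

me = VirtualWorker("me")    # the local, initial owner
bob = VirtualWorker("bob")  # a remote virtual worker

ptr = send([1, 2, 3, 4, 5], bob, owner=me)
print(ptr)                  # the owner holds a pointer, not the data
print(bob.objects)          # the data now lives on bob
data = ptr.get()
print(data, bob.objects)    # data is back with the owner; bob's store is empty
```

Real frameworks add serialization, networking, and remote execution of operations on pointers; the toy keeps only the ownership bookkeeping.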

PySyft: https://github.com/OpenMined/PySyft

Deep learning using PySyft

