AI News, GDPR Compliance and its Impact on Machine Learning Systems

GDPR Compliance and its Impact on Machine Learning Systems

Under GDPR, processing is “any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.” Then there are the concepts of the “Controller”, which determines the purpose of processing of personal data, and the “Processor”, which processes personal data on behalf of the Controller.

The first right of the data subject is the “Non-discrimination Right”.  GDPR is quite explicit when it comes to processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.

Regardless of opt-ins and the temptation to enrich data to further improve model accuracy, there are clear lines that shouldn’t be crossed, as expressed in the bestseller by Cathy O’Neil, “Weapons of Math Destruction”. This book contains great examples of how human bias can be hidden in your data and get reinforced in the predictive models built on top of it if you are not careful. For example, even zip codes can sometimes result in racial discrimination.

As for the latter, you can easily see, filter and add new fields to your datasets or plot the correlations between various fields by using the dynamic scatterplot dataset visualization capability if you suspect certain fields may be proxies for more troublesome variables you’d rather stay away from during your modeling.  On the other hand, building an Association model can yield interesting statistically significant (association) rules that can point to built-in biases in your dataset.
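
The proxy check described above can also be approximated outside any particular tool. The following is a minimal sketch, not BigML's API: it assumes the dataset is a list of dictionaries with numeric fields, and the field names (`zip_area`, `income`, `protected_group`), the toy values and the 0.7 threshold are all hypothetical. It flags fields whose Pearson correlation with a sensitive attribute is strong enough to deserve a closer look.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field carries no linear signal
    return cov / sqrt(var_x * var_y)


def flag_proxies(rows, sensitive_field, candidate_fields, threshold=0.7):
    """Return {field: correlation} for candidates strongly tied to the sensitive field."""
    sensitive = [row[sensitive_field] for row in rows]
    flagged = {}
    for name in candidate_fields:
        r = pearson([row[name] for row in rows], sensitive)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Hypothetical toy data: zip_area tracks the protected group exactly, income does not.
rows = [
    {"zip_area": 1, "income": 30, "protected_group": 1},
    {"zip_area": 1, "income": 55, "protected_group": 1},
    {"zip_area": 2, "income": 42, "protected_group": 0},
    {"zip_area": 2, "income": 38, "protected_group": 0},
    {"zip_area": 1, "income": 47, "protected_group": 1},
    {"zip_area": 2, "income": 51, "protected_group": 0},
]
flagged = flag_proxies(rows, "protected_group", ["zip_area", "income"])
print(flagged)
```

A linear correlation only catches the simplest kind of proxy; combinations of fields or non-linear relationships would need other checks, such as the association rules mentioned above.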

By design, the BigML platform supports multiple capabilities that come to the rescue here. From a global model perspective, each supervised learning model has a model summary report explaining which data fields had more or less impact on the model as a whole. In addition, the visualizations for each ML resource allow the practitioner to better introspect and share the insights from her models.
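
To make the idea of a per-field impact report concrete, here is a small sketch of permutation importance, a generic technique that is not necessarily what the BigML summary report computes. It assumes a model exposed as a plain `predict(row)` function. The toy model, the field names and the data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, fields, seed=0):
    """Drop in accuracy when each field's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for name in fields:
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)
        # Copy each row with only this one field replaced by a shuffled value.
        permuted = [dict(row, **{name: value}) for row, value in zip(rows, shuffled)]
        importance[name] = baseline - accuracy(predict, permuted, labels)
    return importance


def toy_model(row):
    """Hypothetical model: approves when income is at least 50, ignores shoe_size."""
    return row["income"] >= 50


rows = [
    {"income": 20, "shoe_size": 42},
    {"income": 35, "shoe_size": 38},
    {"income": 50, "shoe_size": 44},
    {"income": 65, "shoe_size": 40},
    {"income": 80, "shoe_size": 41},
    {"income": 95, "shoe_size": 39},
]
labels = [toy_model(r) for r in rows]  # labels the toy model predicts perfectly
importance = permutation_importance(toy_model, rows, labels, ["income", "shoe_size"])
print(importance)
```

A field the model never uses, such as `shoe_size` here, gets an importance of exactly zero, while shuffling a field the model relies on can only lower accuracy in this setup.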

We have only scratched the surface of the importance and potential impact of GDPR on the world of Machine Learning. We hope this gives you some new thoughts on how your organization can best navigate this new regulatory climate while still being able to reach your goals. As Machine Learning practitioners collectively experience the ups and downs of GDPR and learn the ropes, let BigML and its built-in qualities such as traceability, repeatability and interpretable models be an integral part of your action plan.

Transparency of machine-learning algorithms is a double-edged sword

The European Union’s General Data Protection Regulation (GDPR), which will come into force on May 25, 2018, redefines how organizations are required to handle the collection and use of EU citizens’

Legal details aside, the GDPR mandates that citizens are entitled to be given sufficient information about the automated systems used for processing their personal data in order to be able to make an informed decision as to whether to opt out from such data processing.

Yes, other citizens’ rights introduced or expanded by the GDPR, like the right to object to profiling, the right to obtain a copy of personal data gathered, or the right to be forgotten, can all be costly to comply with.

Based on data about previous loans – including their outcome, labeled as “good” or “bad” – the system learns on its own how to predict whether a new application would end up being a “good” or “bad” prospect for a loan.

The reasoning for the prediction – based on which a determination is made as to whether the applicant will or will not be able to afford to own a house, for example – lies with how a complex web of thousands of simulated neurons processes the data.

They must prevent their customers from opting out from automated processing of their personal data (to save costs and keep the business running) while preserving the illusion that the company is really respecting the customer’s right to have a standard explanation, plus the right to have a human review should there be a contested result (so that the company can avoid those huge fines the GDPR imposes for non-compliance).

To be able to explain the reasoning behind their automated decision-making processes, and thus grant the right to explanation to their customers, companies must wait for radical improvements in our understanding of how machines learn.

Data Science Under GDPR: What Will Change?

The European Union (EU) is about to introduce new obligations for companies handling and analysing data about EU citizens.

The regulation aims to reinforce the data privacy rights of consumers, and to simplify the flow of information between EU countries by standardizing the data protection framework.

According to Articles 21 and 22 of GDPR, any profiling activity that “significantly affect[s] or has a legal effect on him or her” falls under these strict regulations. GDPR officialises the activities that fall under these agreements, expanding beyond industries like social media, mobile, and credit cards that already use similar forms.

For example, pricing optimisation in insurance or retail, personalised marketing and telecom network optimisation will require explicit consent from the customer and must give him or her the ability to opt out.

Articles 13 to 15 of GDPR stipulate that the data subject possesses the right to access “meaningful information about the logic involved, as well as the significance and the envisaged consequences” of automated decision-making systems, such as profiling.

The non-binding Recitals of GDPR, however, provide more insight into the regulation: profiling activities “should be subject to suitable safeguards, which should include [...] the right to obtain an explanation of the decision reached after such assessment” (Recital 71).

“Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.”

For automated profiling systems, Recital 71 again gives more context. However, discrimination and model bias cannot be entirely eliminated by excluding sensitive data, since other related factors might be present in the data.

Moreover, if your organisation uses multiple profiling models under GDPR, the various modelling teams are likely to need a standardised way to document model logic in order to be GDPR-ready for their clients.

Based on a technique called partial dependence analysis, Model X-Ray provides a visual display of the effect of a factor on the outcome of the model, after accounting for the effects of all other factors.
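
As a rough illustration of partial dependence analysis in general, and not of how Model X-Ray itself is implemented, the sketch below forces one feature to each value on a grid for every row, leaves the other features as observed, and averages the model's predictions. The `toy_model` function, its coefficients and the data are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature under study
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical scoring model: income raises the score, debts lower it."""
    return 0.5 * row["income"] - 2.0 * row["debts"]


rows = [
    {"income": 40, "debts": 5},
    {"income": 60, "debts": 10},
    {"income": 80, "debts": 0},
]
curve = partial_dependence(toy_model, rows, "income", [20, 40, 60])
for value, average in curve:
    print(value, average)
```

Plotting the resulting pairs gives the kind of curve such a display is built on: how the average prediction moves as one factor changes while the rest of the data stays fixed.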

DataRobot uses an agnostic approach to test dozens of predictive algorithms and minimizes errors by applying guardrails that are based on data science best practices.

In summary, if your organisation processes personal data from EU individuals, you will need to make an inventory of all the business processes and automated decision systems based on this data, and then develop a strategy for GDPR compliance.

GDPR and its impacts on machine learning applications

According to Goodman et al., many of the regulations in the GDPR are “clearly aimed at perceived gaps and inconsistencies in the EU’s current approach to data protection.” This includes, for example, a clear specification of the right to be forgotten, and regulations on the collection of data from EU citizens by foreign companies.

There are three major differences between the GDPR and the previous DPD.

For the rest of the article we will focus on Article 22 of the GDPR regarding automated individual decision making.

GDPR states in Article 22 Paragraph 4 that decisions “which produces legal effects concerning him or her” or of similar importance shall not be based on the categories of personal data specified in Article 9 Paragraph 1. Under a minimal interpretation, using these categories of sensitive data directly in algorithms is prohibited.

GDPR Article 22 Paragraph 3 states that a data controller “shall implement suitable measures to safeguard…at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision”, otherwise a person has “the right not to be subject to a decision based solely on automated processing” (Paragraph 1).
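
In engineering terms, one way to read the safeguards in Paragraph 3 is that every automated decision needs a path to a human reviewer. The sketch below is only an illustration under that assumption. The `Decision` and `ReviewQueue` names, the status values and the workflow are hypothetical and are not prescribed by the regulation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    subject_id: str
    outcome: str                             # current outcome, e.g. "rejected"
    status: str = "automated"                # "automated", "under_review" or "human_reviewed"
    subject_statement: Optional[str] = None  # the data subject's point of view
    reviewer: Optional[str] = None


class ReviewQueue:
    """Holds contested decisions until a human reviewer resolves them."""

    def __init__(self) -> None:
        self.pending: List[Decision] = []

    def contest(self, decision: Decision, statement: str) -> None:
        """Record the subject's point of view and queue the decision for review."""
        decision.status = "under_review"
        decision.subject_statement = statement
        self.pending.append(decision)

    def resolve(self, decision: Decision, reviewer: str, outcome: str) -> None:
        """A human confirms or overturns the automated outcome."""
        if decision.status != "under_review":
            raise ValueError("decision has not been contested")
        decision.outcome = outcome
        decision.reviewer = reviewer
        decision.status = "human_reviewed"
        self.pending = [d for d in self.pending if d is not decision]


queue = ReviewQueue()
loan = Decision(subject_id="subject-1", outcome="rejected")
queue.contest(loan, "My income was recorded incorrectly.")
queue.resolve(loan, reviewer="analyst-7", outcome="approved")
print(loan)
```

Keeping the subject's statement and the reviewer's identity on the decision record also leaves an audit trail showing that the review actually took place.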

On the other hand, Goodman et al. believe that Articles 13 to 15 give a person the right to access the data that has been collected and the right to know the purpose of collecting it, which includes the right to receive “meaningful information about the logic (algorithm) and possible impact.” For example, Article 13 Paragraph 2 (f) states what data controllers must tell the user before collecting data. It is therefore worth asking to what extent one can ask for an explanation of an algorithm.

How GDPR Affects Data Science

GDPR defines profiling as: Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular, to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.

In general, organizations may process personal data when they can demonstrate a legitimate business purpose (such as a customer or employment relationship) that does not conflict with the consumer’s rights and freedoms.

GDPR grants consumers the right “not to be subject to a decision…which is based solely on automated processing and which produces legal effects (on the subject).” Experts characterize this rule as a “right to an explanation.” GDPR does not precisely define the scope of decisions covered by this section.

When organizations use automated decision-making, they must prevent discriminatory effects based on racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or that result in measures having such an effect.

The new rules allow organizations to process personal data for specific business purposes, to fulfill contractual commitments, and to comply with national laws.

Consumers may not opt out of processing and profiling performed under these “safe harbors.” However, organizations may not use personal data for a purpose other than the original intent without securing additional permission from the consumer.

The principles of data protection should therefore not apply to … personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

The clear implication is that organizations subject to GDPR must build robust anonymization into data engineering and data science processes.
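
What such a check might look like in a data pipeline can be sketched with k-anonymity, a technique chosen here for illustration and not named in the regulation. The sketch assumes that age and postcode are the quasi-identifiers. The generalisation rules and the toy records are hypothetical. It coarsens those fields and reports the size of the smallest group of records that become indistinguishable.

```python
from collections import Counter


def generalise(record):
    """Coarsen quasi-identifiers: age to a 10-year band, postcode to its first 3 characters."""
    decade = (record["age"] // 10) * 10
    return (f"{decade}-{decade + 9}", record["postcode"][:3])


def k_anonymity(records):
    """Smallest group size among records sharing the same generalised quasi-identifiers."""
    groups = Counter(generalise(r) for r in records)
    return min(groups.values())


records = [
    {"age": 34, "postcode": "75011", "spend": 120},
    {"age": 37, "postcode": "75012", "spend": 80},
    {"age": 31, "postcode": "75019", "spend": 95},
    {"age": 52, "postcode": "69003", "spend": 60},
    {"age": 58, "postcode": "69007", "spend": 150},
]
k = k_anonymity(records)
print(k)
```

A low value of k means some individuals remain easy to single out. A high value is still only one signal, since whether a dataset counts as anonymous also depends on what other data it could be linked with.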

One expert in EU law argues that the requirement may force data scientists to stop using opaque techniques (such as deep learning), which can be hard to explain and interpret.

For example, regulations governing credit decisions in the United Kingdom are similar to those in the United States, where issuers must provide an explanation for adverse credit decisions based on credit bureau information.

Financial services giant Capital One considers them to be a potent weapon against hidden bias (discussed below). But one should not conclude that GDPR will force data scientists to limit the techniques they use to train predictive models.

This rule places an extra burden of due diligence on data scientists who build predictive models, and on the procedures organizations use to approve predictive models for production.

The mandate against discriminatory outcomes means data scientists must also take steps to prevent indirect bias from proxy variables, multicollinearity or other causes.
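
One simple outcome test for indirect bias is to compare approval rates across groups after the model has made its decisions. The following is a minimal sketch under assumptions: the group labels and decisions are hypothetical, the group attribute is used only for auditing and not as a model input, and the ratio is a screening signal and not a legal test defined by GDPR.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest approval rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so the rates are trivially equal
    return min(rates.values()) / highest


# Hypothetical audit sample: group A is approved three times as often as group B.
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = selection_rates(decisions)
ratio = disparity_ratio(rates)
print(rates, ratio)
```

A ratio well below 1.0 does not prove discrimination on its own, but it tells the data scientist where to look for proxy variables before the model is approved for production.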

Can the EU Lead in AI After the Arrival of the GDPR?

Although the EU is investing heavily in research in artificial intelligence (AI), the EU's General Data Protection Regulation (GDPR), which will go into effect in ...

"The Algorithm for Precision Medicine"

Presented by Matt Might, Director of the Hugh Kaul Personalized Medicine Institute, University of Alabama at Birmingham Talk Description: Powered by the ...

Introduction to Data Ethics - Brent Mittelstadt

Dr. Brent Mittelstadt is a Research Fellow at the Alan Turing Institute and University College London. His research addresses the ethics of algorithms, machine ...


Organised by Datatilsynet, the Norwegian DPA Chair: Frederik Zuiderveen Borgesius, IViR-UvA (NL) Moderator: Christian D'Cunha, EDPS (EU) Panel: Augustin ...

Transparency in algorithmic decision-making: Karen Yeung, Birmingham Law School

Karen Yeung is Interdisciplinary Professorial Fellow in Law, Ethics and Informatics at the University of Birmingham in the School of Law and the School of ...

Data Science and AI in Pharma and Healthcare (CXOTalk #275)

Data, artificial intelligence and machine learning are having a profound influence on healthcare, drug discovery, and personalized medicine. On this episode ...

How data brokers sold my identity | Madhumita Murgia | TEDxExeter

How much of our personal data is held by companies to be traded for profit? Madhumita Murgia decided to investigate and was shocked by what she found out.

GDPR Compliance - What To Know - The Sweary Guide

General Data Protection Regulation - GDPR - is arriving throughout Europe on May 25th 2018. But what ...

ETL 2.0 - Data Engineering for developers : Build 2018

In this session we will demonstrate how a Data Engineer can develop and orchestrate the processes of data ingestion, data transformation and data loading at ...