Data Privacy and Protection in the Age of AI

Introduction

Data has become central to the rapid advancement of artificial intelligence (AI); it is the very foundation upon which AI systems are built. As AI systems continue to expand across various sectors, the demand for data among industry, developers and other creators is only intensifying. [1]

This data-driven approach poses significant risks to both individual and societal privacy. While Generative AI offers exciting potential, it also raises concerns about the ethical sourcing, usage and protection of data. To take a basic example, the widespread, now normalized practice of data scraping highlights the potential for privacy violations, as personal information can be collected without consent and used to train AI models. [2]

While Predictive AI systems have always been data-dependent for their training and development, Generative AI systems have massively increased the volume of data required for model training, most of which cannot be opted out of. [3]

Given this rapid technological advance, there is debate on whether there are sufficient data protections regarding these AI systems. Existing privacy regulations, such as the Fair Information Practices (FIPs) [4], while essential, seem to have fallen short in adequately addressing the challenges posed by AI. Primarily focused on individual data rights, they often operate reactively, addressing privacy violations after data collection has occurred. This approach, known as "privacy self-management," places a significant burden on individuals to navigate complex data ecosystems and protect their own privacy.  Moreover, the FIPs do not adequately consider societal-level privacy risks, which can arise from widespread data collection and the potential for misuse by both private companies and governments. The need for a more proactive and robust approach to data protection is evident, particularly as AI systems increasingly influence critical areas like employment, healthcare, humanitarian aid and law enforcement. 

The specific domain of data protection and privacy within the practice of AI governance exists to tackle these urgent questions. Such work requires a combination of principles, policies, an understanding of current standards and laws, and industry best practices for the design, development and use of AI tools.

In this research piece, we will selectively examine:

  1. What Data Privacy and Data Protection are

  2. A brief look at the regulatory landscape from two perspectives

  3. Risks that Generative AI poses to data protection and privacy in the humanitarian sector

  4. A brief look at Signpost AI efforts

Data Privacy and Data Protection

Data Privacy and Data Protection are often used interchangeably; while they are related and have some overlap, they differ in significant ways.

Data Privacy:

Data privacy revolves around the question of who has authorized access to handle one’s information (collection, processing, sharing, etc.) and the extent to which one can control this access (e.g. opting out of data collection). The term refers not just to personal data but to any kind of data that, if accessed by others, would be seen as a violation of one’s personal autonomy. [5]

Privacy has often been understood as control over one’s own information, but the scale and magnitude of the loss of control faced by many today challenges this notion. Current frameworks and privacy regulations, however, still appear to operate on this principle of personal control.

The contextually contingent (e.g. sharing one’s location data with friends might be okay, but that same data being collected by a company for advertising violates privacy) and relational (e.g. data is social and can appear in shared social media posts) nature of data further challenges the idea of privacy as personal control. [6]


Data Protection:

Data Protection involves safeguarding personal information using procedural rights. It requires that data be handled equitably, for specific purposes, and collected on sound legal bases. Consent is the strictest basis and allows people to withdraw it after the fact, whereas legitimate interest provides the most flexibility, allowing entities to justify data processing on the grounds that the data is necessary for their activities. [7]

Entities processing data must respect individuals’ fundamental protection rights (e.g. providing notice upon collection of data, providing access to the collected data, and offering the means to modify, delete or correct it). There is, however, a tendency to assume acceptance by default.

The EU formally distinguishes between privacy and data protection in its Charter of Fundamental Rights [8]. Still, the two concepts often overlap and complement each other:

For example, when data is not considered personal and thus falls outside data protection rules (e.g. anonymized body scan information), privacy rights still come into play, since such information could still affect the individual. Conversely, data protection regulations restrict and limit the processing and handling of personal information even in situations where privacy does not seem to be compromised.

Two Regulatory Frameworks: FIPs and GDPR

Fair Information Practices (FIPs)

The Fair Information Practices (FIPs) are a more than 50-year-old set of principles that provide a framework for giving individuals due process over their personal information. [9] The FIPs, as part of the US federal code, introduced five safeguard requirements regarding personal privacy as a means of ensuring “informational due process.” [10] The FIPs give individuals the ability to know about, stop alternative uses of, and correct information about themselves.

The FIPs do not frame privacy as a fundamental human right, as the United Nations Universal Declaration of Human Rights [11] and the European Charter of Fundamental Rights do. [12] Instead, they outline rules and obligations between individuals and data processors. This framing rests on the assumption that the modern state requires record keeping (and data collection) for its administration and functioning.

This initial framing was modernized through the OECD in 1980 and amended in 2013, expanding into eight principles [13]. The principles are:

  • Personal data collection should have limits; it must be gathered through lawful and fair methods and, when applicable, with the knowledge or consent of the individual concerned

  • Personal data should be pertinent to its intended purposes, and it must be accurate, complete, and regularly updated

  • The purposes for which personal data is collected should be specified at the time of data collection and the subsequent use limited to the fulfillment of those purposes

  • Personal data should not be disclosed, made available, or used for purposes other than those specified under the Purpose Specification principle, except in the following cases: (a) with the consent of the data subject; or (b) as permitted by law

  • Personal data should be safeguarded by appropriate security measures to protect against risks such as loss, unauthorized access, destruction, use, modification, or disclosure

  • There should be a general policy of transparency regarding developments, practices, and policies related to personal data, including the primary purposes for its use and the identity of the data controller

  • An individual should have the right: (a) to obtain from a data controller the data the controller holds about them; and (b) to challenge data relating to them and, if the challenge is successful, to have the data erased, rectified, completed or amended

  • A data controller should be accountable for complying with measures which give effect to the principles stated above [14]

Despite being conceived long before the digital and information age, key components like collection limitation and purpose specification continue to affect today’s AI systems by limiting how broadly companies can repurpose data collected for one purpose to train or develop new AI systems.

The EU’s General Data Protection Regulation (GDPR), which we will look at next, relies heavily on these principles.


General Data Protection Regulation (GDPR)

GDPR is a comprehensive data privacy law which updated the 1995 Data Protection Directive and consolidated national data privacy regimes across EU member states. It was passed in 2016, and became enforceable in May 2018. It grants individuals or “data subjects” rights regarding the processing of their personal data, such as the right to be informed and a limited right to be forgotten, and guides how businesses can process personal information.  [15]

It contains provisions which apply directly to AI systems even if the term artificial intelligence is not used. For example, Article 22 provides protections to individuals against decisions “based solely on automated processing” of personal data without human intervention also called automated decision making (ADM). [16] This enshrines individual rights not to be subjected to ADM where these decisions could produce adverse legal or significant effects on individuals. 

GDPR’s articles on “Data Minimization”, “Purpose Limitation” and “Consent” are especially relevant to data protection principles in the case of AI systems.

  • Data Minimization: Article 5 of the GDPR states that collected data must be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.” [17] This principle prescribes proportionality: entities should not blindly collect as much data as they want, particularly outside the context they have specified for collection.

  • Purpose Limitation: data must be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.” [18] This underlines the significance of context and sets rules for re-using collected data in a different context.

  • Consent: defined in Article 7 and Recital 32, consent must be “given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement.” [19] Where consent is relied upon, it must be given for all of the purposes of the processing, and “consent should not be regarded as freely given if the data subject has no genuine or free choice or is unable to refuse or withdraw consent without detriment.” [20]

The GDPR also imposes transparency obligations: notices must be given to individuals when their personal information is processed, and it establishes rules granting individuals the right to access their own information and to ensure that the processing of their data is accurate. These principles are meant to curb the unfettered data mining and processing that is commonplace in data-intensive AI systems. [21]
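
To make these three principles more concrete, here is a minimal, hypothetical sketch in Python (all names, fields and purposes are purely illustrative; this is not any organization’s actual implementation) of how data minimization, purpose limitation and consent withdrawal checks might be expressed in application code:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ConsentRecord:
    """Hypothetical record of a data subject's consent (illustrative only)."""
    subject_id: str
    purposes: set            # purposes the subject explicitly agreed to
    granted_at: datetime
    withdrawn: bool = False  # consent can be withdrawn after the fact

# Data minimization: only fields strictly needed for the stated service are kept
ALLOWED_FIELDS = {"language", "location_region"}

def minimize(raw_profile: dict) -> dict:
    """Drop any field not required for the declared purpose."""
    return {k: v for k, v in raw_profile.items() if k in ALLOWED_FIELDS}

def can_process(record: ConsentRecord, purpose: str) -> bool:
    """Purpose limitation: processing is allowed only for purposes covered by valid consent."""
    return (not record.withdrawn) and (purpose in record.purposes)

# Example: re-using chat data for a new purpose is rejected without fresh consent
consent = ConsentRecord("user-123", {"service_personalization"}, datetime.now())
profile = minimize({"language": "ar", "location_region": "MENA", "phone": "+00 000 0000"})
assert can_process(consent, "service_personalization")
assert not can_process(consent, "model_training")
```

The point of the sketch is simply that purpose checks and field filtering happen before any processing occurs, mirroring the proportionality and context rules described above.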

Data Privacy and Protection Risks Related to AI in Humanitarian Contexts

The harms to individual autonomy, including the inability to make informed choices, the difficulty in correcting data, and a general lack of control over the collection and usage of personal information, are significant concerns. These issues are pertinent to AI-based systems, much like they were to the technological advancements witnessed during the past three decades of internet expansion.

Although these harms existed prior to the integration of AI into the consumer sector, commercial AI systems are likely to perpetuate and exacerbate them, while also introducing new forms of harm, such as identity-based risks, data aggregation and inference risks, personality and emotional state inferences, exposure of previously unavailable or redacted sensitive personal information, misidentification and defamation. [22]

Moreover, the privacy risks and harms associated with AI systems extend beyond the individual level; they pose threats to groups and society as a whole in ways that cannot be effectively addressed through the exercise of individual data rights. A simple example is relational inference, where even people whose data is not included in training datasets are impacted. [23] [24]

These larger issues apply to the humanitarian context in a more specific manner. There are attempts to identify and recognize use cases where AI can be deployed in the humanitarian sector. However, it is important to address existing algorithmic bias and data privacy-related risks as a priority before any such deployments.

There are sector-specific concerns relating to AI, including surveillance humanitarianism [25], an over-eager techno-solutionism [26] and the potential rise of techno-colonialism. [27] 

AI technologies have the potential to support humanitarian missions in three main dimensions: preparedness, response and recovery (refer to [28] for case study examples). Yet AI comes with risks such as the following:

  • Data Quality: Input data may be of poor quality (e.g. “Garbage In, Garbage Out” [29]), may be difficult to procure in constrained or low-resourced digital environments [30], and datasets may be historically biased [31]

  • Algorithmic Bias: AI systems may have human assumptions and biases baked into them [32]; biases may also arise from poorly representative datasets used for training [33]

  • Security Risks: Generative AI systems create known and unknown security risks related to data security and theft. The use of such systems elevates the risk of data breaches, identity theft, data leaks and inadvertent disclosure of sensitive information about clients as well as staff [34]

Another risk here is the risk to data privacy. As shown, international and regional human rights law instruments recognize the right to privacy and provide frameworks meant to regulate it. In the humanitarian context, consent, for example, may not be completely unambiguous and freely given due to the inherent imbalance of power between humanitarian organizations and those who access their assistance and services. Refusal to consent to the collection and processing of personal data will essentially result in the denial of humanitarian assistance. Additionally, humanitarian actors may face challenges in ensuring that recipients of such assistance fully comprehend the implications of consent, given linguistic barriers and the complexities associated with administrative and institutional frameworks. [35]

Furthermore, fully informed, specific and unambiguous consent may also be challenging in the case of Generative AI, as such data goes into the black box of Large Language Models (LLMs). Even if anonymized, such a growing collection of data may inadvertently lead to the surveillance humanitarianism mentioned above and increase the vulnerability of those in need. [36]

Finally, data protection becomes crucial in collaborations between humanitarian organizations and technology companies, where hasty and under-planned deployments might lead to neglecting the needs and experiences of their users.

Accordingly, it is essential that humanitarian organizations create clear guidelines for implementing AI in the humanitarian context, specifically in the realm of data privacy and protection. It is also critical for humanitarians to develop and operationalize data protection and privacy strategies and principles. This will require combining, in data protection guardrails, what is already required by applicable laws and regulatory frameworks with anticipatory actions against AI risks that safeguard fundamental humanitarian principles.

A Brief Look at Signpost Efforts

Signpost AI is in the process of developing strategies and guardrails which uphold the highest standards of data privacy, security and protection as it works on its Signpost AI chatbot  [37]. This chatbot needs data in order to personalize responses to users and enhance service provision to them. 

As we develop data privacy and protection guardrails and policies, Signpost AI is thinking deeply not only about its approach to Data Minimization, Purpose Limitation, Consent and Transparency as they relate to AI chatbot development, but also about how to build upon them. As we work on this effort, we can offer a high-level snapshot of our work in progress:

  • We have outlined in our documentation what data is collected and how it is secured (data minimization), the rationale and purpose for data collection and our retention policies (purpose limitation), as well as how requests for data deletion (the right to be forgotten) are processed

  • We specify the purpose for this data collection: to develop an accurate and more efficient AI chatbot which can minimize user request overload, especially during crises and disasters

  • We have policies on data retention for the cloud storage and customer service platforms that we use. Additionally, access to private data on all platforms is controlled based on user roles, ensuring that access is permission-based and restricted to those who are authorized (a simplified sketch of this pattern appears after this list)

  • We prioritize transparency by having clear privacy policies and cookie notices on all our program websites; this ensures that users understand the terms and conditions of interacting with Signpost AI services. Data exchanged during the course of interactions remains secure and is not shared with third parties unless the service operates on a third-party platform. In such cases, Signpost has strict data sharing agreements in place to safeguard user privacy

  • Third-party products within the program environment undergo due diligence to prioritize user well-being

  • Signpost AI prioritizes transparency throughout our data processing practices

  • Finally, Signpost AI’s Red and Quality Teams work together to safeguard user privacy:

    • The Red Team, for example, identifies and mitigates security vulnerabilities and the potential for discrimination in AI interactions [38]

    • The Quality Team prioritizes user well-being by ensuring that AI interactions are trauma-informed, client-centered, safe and expectations-managed. [39]
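
As referenced in the retention and access-control bullet above, the sketch below shows, in hedged form, how role-based access and a retention window might be enforced programmatically. The role names, permissions and 90-day window are assumptions for illustration only, not Signpost’s actual configuration:

```python
from datetime import datetime, timedelta

# Hypothetical role-to-permission mapping (illustrative, not Signpost's real roles)
ROLE_PERMISSIONS = {
    "moderator":    {"read_conversations"},
    "quality_team": {"read_conversations", "read_feedback"},
    "admin":        {"read_conversations", "read_feedback", "delete_user_data"},
}

RETENTION_PERIOD = timedelta(days=90)  # assumed retention window for the example

def is_authorized(role: str, action: str) -> bool:
    """Permission-based access: a role may only perform actions explicitly granted to it."""
    return action in ROLE_PERMISSIONS.get(role, set())

def is_expired(collected_at: datetime, now: datetime) -> bool:
    """Retention check: records older than the window are due for deletion or anonymization."""
    return now - collected_at > RETENTION_PERIOD

# Example usage
assert is_authorized("admin", "delete_user_data")
assert not is_authorized("moderator", "delete_user_data")
assert is_expired(datetime(2024, 1, 1), datetime(2024, 6, 1))
```

The design choice illustrated here is deny-by-default: any role or action not explicitly listed is refused, and any record past its retention window is flagged rather than silently kept.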

Taking a multifaceted approach to AI privacy issues and attempting to expand Data Privacy and Protection principles so that they take technological realities into consideration, we are also implementing the privacy by design principles [40] (briefly illustrated after the list below):

  1. Proactive not Reactive; Preventative not Remedial

  2. Privacy as the Default Setting

  3. Privacy Embedded into Design

  4. Full Functionality – Positive-Sum, not Zero-Sum

  5. End-to-End Security – Full Lifecycle Protection

  6. Visibility and Transparency – Keep it Open

  7. Respect for User Privacy – Keep it User-Centric
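
As a brief illustration of principle 2, “Privacy as the Default Setting”: a user who takes no action should still receive the most protective configuration. The following is a minimal, purely hypothetical sketch (field names are illustrative, not our actual settings):

```python
from dataclasses import dataclass

@dataclass
class CollectionSettings:
    """Hypothetical per-user data collection settings; every optional flag defaults to off."""
    store_chat_history: bool = False   # nothing retained unless the user opts in
    share_with_partners: bool = False  # no third-party sharing by default
    analytics_opt_in: bool = False     # analytics only with explicit consent

# A brand-new user gets the most protective configuration without doing anything
new_user = CollectionSettings()
assert not (new_user.store_chat_history or new_user.share_with_partners or new_user.analytics_opt_in)
```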

This is a partial view of our efforts. A more detailed look at our Data Privacy and Protection actions and guardrails and how we are implementing Privacy by Design is forthcoming. 

Combining adherence to these principles with what we are already doing will empower our work in creating a humanitarian AI chatbot that prioritizes and centers data privacy and protection, from conception to deployment and beyond.







REFERENCES

  1. Rethinking Privacy in the AI Era

  2. Metz, Cade, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson, and Nico Grant. 2024. “How Tech Giants Cut Corners to Harvest Data for A.I.” The New York Times, April 6.

  3. AI Is Probably Using Your Images and It's Not Easy to Opt Out

  4. https://www.fpc.gov/resources/fipps/

  5. Rethinking Privacy in the AI Era

  6. Ibid.

  7. Raphael Gellert and Serge Gutwirth, “The legal construction of privacy and data protection,” Computer Law & Security Review 29(5), October 2013, https://doi.org/10.1016/j.clsr.2013.07.005, 522-530.

  8. EU Charter of Fundamental Rights - European Commission

  9. Rethinking Privacy in the AI Era

  10. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2466418

  11. Universal Declaration of Human Rights | United Nations

  12. EU Charter of Fundamental Rights - European Commission

  13. Fair Information Practice Principles (FIPPS) Factsheet

  14. Ibid.

  15. What is GDPR, the EU’s new data protection law?

  16. GDPR

  17. Ibid.

  18. Ibid.

  19. Ibid.

  20. Ibid.

  21. White Paper Rethinking Privacy in the AI Era

  22. [2310.07879] Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks

  23. White Paper Rethinking Privacy in the AI Era

  24. "A Taxonomy of Privacy" by Daniel J. Solove

  25. Keren Weitzberg, Margie Cheesman, Aaron Martin and Emrys Schoemaker, “Between Surveillance and Recognition: Rethinking Digital Identity in Aid”, Big Data & Society, Vol. 8, No. 1, 2021.

  26. Duffield, M. (2016). The resilience of the ruins: towards a critique of digital  humanitarianism. Resilience, 4(3), 147–165. https://doi-org.libproxy.newschool.edu/10.1080/21693293.2016.1153772

  27. Technocolonialism: Digital Innovation and Data Practices in the Humanitarian Response to Refugee Crises

  28. Harnessing the potential of artificial intelligence for humanitarian action: Opportunities and risks | International Review of the Red Cross

  29. What is garbage in, garbage out (GIGO) ? | Definition from TechTarget

  30. Christopher Kuner and Massimo Marelli, Handbook on Data Protection in Humanitarian Action, 2nd ed., ICRC, Geneva, 2020, p. 39; OCHA, above note 15, p. 10; ICRC, The Engine Room and Block Party, above note 52, p. 32.

  31. Andrew Ferguson, The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement, New York University Press, New York, 2017

  32. James Zou and Londa Schiebinger, “AI Can Be Sexist and Racist — It's Time to Make It Fair”, Nature, Vol. 559, 2018

  33. Ibid.

  34. Generative AI for Humanitarians

  35. Harnessing the potential of artificial intelligence for humanitarian action: Opportunities and risks | International Review of the Red Cross

  36. Cashless cash: financial inclusion or surveillance humanitarianism? - Humanitarian Law & Policy Blog

  37. Signpostai

  38. Signpost AI Red Team: Metrics, Scope, and Workflows — signpostai

  39. https://www.signpostai.org/blog/xy08iss5ax1ipjohs4u04s79joxk6d

  40. https://www.datagrail.io/blog/data-privacy/privacy-by-design/




