What is Data De‑identification?

Introduction to data de‑identification

While sharing research data supports transparency and reproducibility, researchers must take careful steps to ensure data is managed ethically, responsibly, and securely. Data de‑identification involves removing or modifying identifying information to reduce the risk that individuals, communities, or animals can be re‑identified, helping protect privacy and confidentiality while complying with ethical and legal obligations. If sensitive information is not properly de‑identified and becomes exposed, it may lead to significant harm, including the unintended disclosure of identities and potential negative impacts on those represented in the research.
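As a rough illustration of "removing or modifying identifying information", the sketch below drops direct identifiers from a record and coarsens two quasi-identifiers. The records, the field names, the list of direct identifiers, and the helper functions are all hypothetical and chosen only for this example. Which fields count as identifying depends on the dataset and its context.

```python
# Hypothetical records: every field name and value is invented for illustration.
records = [
    {"name": "A. Example", "email": "a@example.org", "age": 34,
     "postal_code": "V6T 1Z4", "response": "yes"},
    {"name": "B. Sample", "email": "b@example.org", "age": 58,
     "postal_code": "V5K 0A1", "response": "no"},
]

# Assumed set of direct identifiers; a real project must decide this per dataset.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Return a copy with direct identifiers removed and quasi-identifiers coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age"] = age_band(out["age"])            # modify: exact age -> age band
    out["postal_code"] = out["postal_code"][:3]  # modify: keep first three characters
    return out


deidentified = [deidentify(r) for r in records]
print(deidentified)
```

Removing and coarsening fields in this way lowers re-identification risk but does not by itself guarantee that individuals cannot be re-identified, for example when the remaining values are rare in combination.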

Terminology

UBC terminology (Information Technology Standard U1) classifies sensitive data as medium risk, high risk, or very high risk. In this workshop, however, we use the single term “sensitive data” to align with the terminology in Sensitive Data: Practical and Theoretical Considerations (Rod & Thompson, 2023, pp. 251-273), and we treat it as equivalent to the UBC risk classifications.

Governance of sensitive data

Who governs sensitive data? The laws, policies, and regulations that apply to sensitive research data are most often set at the provincial, territorial, and/or institutional level. In some cases, the federal government is also involved.

Indigenous research data

There are specific considerations and protocols for collecting, using, and sharing Indigenous research data. Engage with the relevant Indigenous communities, collectives, and/or organizations so that your project aligns with Indigenous data sovereignty.

Three interdependent workshops on data de-identification and anonymization

In these three workshops, we will introduce the fundamentals of data de-identification. The first session covers key concepts and practical definitions, the second focuses on manual techniques for de-identifying data, and the third explores software-based approaches to data de-identification. Please note that all workshops are introductory; for more advanced guidance, refer to the links provided at the end of each session.


Need help?

Please reach out to research.data@ubc.ca for assistance with any of your research data questions.

