Ethics and Legal Compliance
Managing and sharing research data involves various legal, ethical, and intellectual property issues that need to be addressed and respected. These issues may include:
- the protection of personal information,
- the confidentiality of sensitive data,
- the consent of data subjects,
- the ownership of data,
- the attribution of data sources,
- the compliance with relevant laws and regulations, and more.
Researchers who manage and share their data should be aware of these issues and explain how they will follow the appropriate guidelines and policies for their field and region.
They should also ensure that they comply with any applicable privacy legislation, as well as any requirements imposed by their funders or institutions. These may require researchers to obtain ethical approval, inform data subjects, anonymize or encrypt data, secure data storage and transfer, respect data licenses and agreements, and report any breaches or incidents.
Sensitive Data
Sensitive data is data that should not be shared in the public domain without additional consideration. This might include trade secrets, medical information, commercial information, preliminary analysis, third-party data, and some geospatially linked data. Our colleagues at UBC Advanced Research Computing have an excellent guide to sensitive data. Sensitive research data requires careful handling and protection and is often not suitable for open sharing. However, there may be ways to share it legally and ethically, such as anonymizing, aggregating, or restricting access to the data.
In order to ensure you are handling data in an ethical manner, you should:
- evaluate the anonymity of your data
- obtain a confidential review (from a data repository admin)
- comply with institutional regulations (e.g. those of your institution’s research ethics board)
- comply with other regulations (e.g. HIPAA, BREB)
- have informed consent for data sharing
- restrict use of confidential data
De-identification
Sensitive data contains information that could reveal the identity or harm the interests of the people or entities involved in the research. To protect the privacy and confidentiality of the research subjects, researchers can use de-identification techniques.
De-identification is the process of removing or modifying any information that could be used to identify someone or something in a dataset. By doing this, researchers can share their data without disclosing sensitive information. However, de-identification is not a simple or foolproof solution. There is always a possibility that someone could re-identify the data by using other sources of information or advanced technology. Therefore, researchers need to be aware of the risks and challenges of de-identification and manage them accordingly.
There are different methods of de-identification, each with its own advantages and disadvantages.
Method of de-identification | Description | Pros | Cons |
---|---|---|---|
Anonymization | the most strict form where all identifying information is removed from the dataset and cannot be restored. | ensures a high level of privacy protection | may reduce the usefulness and quality of the data |
Pseudonymization | identifying information is replaced with artificial identifiers, such as codes or numbers | allows the data to be linked across different sources/datasets or over time | increases the risk of re-identification if the codes are exposed or cracked |
Aggregation | individual data points are grouped together into categories or ranges | preserves some statistical properties and patterns | reduces the level of detail and variability in the data |
Masking | identifying information is hidden or obscured by using techniques such as encryption, hashing, blurring, or noise addition | makes the data harder to read or interpret | introduces errors or distortions in the data |
Generalization | identifying information is replaced with more general or vague terms. For example, dates can be replaced with years, addresses can be replaced with regions, or names can be replaced with initials | preserves some semantic meaning and context | makes the data less specific and more ambiguous |
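To make two of the methods in the table more concrete, here is a minimal Python sketch of generalization and aggregation. The records, field names, and the helper functions `generalize` and `salary_band` are hypothetical and exist only for illustration; the band width of 25,000 is likewise an arbitrary assumption, not a recommended value.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
records = [
    {"postal_code": "V5V 1P2", "year_of_birth": 1970, "salary": 90000},
    {"postal_code": "V8A 1A5", "year_of_birth": 1982, "salary": 65000},
    {"postal_code": "V5V 3C4", "year_of_birth": 1975, "salary": 72000},
]


def generalize(record):
    """Generalization: replace precise values with coarser ones."""
    decade = record["year_of_birth"] // 10 * 10
    return {
        # Keep only the first three characters of the postal code
        # (the forward sortation area) instead of the full code.
        "postal_area": record["postal_code"][:3],
        # Replace the exact year of birth with a decade.
        "birth_decade": f"{decade}s",
        "salary": record["salary"],
    }


def salary_band(salary, width=25000):
    """Aggregation helper: map an exact salary to a range label."""
    low = salary // width * width
    return f"{low}-{low + width - 1}"


generalized = [generalize(r) for r in records]

# Aggregation: report how many people fall in each salary band
# rather than publishing individual salaries.
band_counts = Counter(salary_band(r["salary"]) for r in records)

for row in generalized:
    print(row)
print(dict(band_counts))
```

Both steps trade detail for privacy: the coarser the postal area, decade, or salary band, the harder it is to single out one person, and the less precise any later analysis can be.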
Risk of Re-identification
Whichever de-identification method you choose, some risk of re-identification remains, for example when de-identified records are combined with other sources of information or analyzed with advanced technology. Researchers need to assess this risk before sharing data and manage it accordingly.
Example: Anonymization
Consider this dataset that contains some identifiers:
Name | Address | Postal code | Year of birth | Gender | Occupation | Salary |
---|---|---|---|---|---|---|
Sally Xi | 123 City Roadway, Vancouver, BC | V5V 1P2 | 1970 | Female | Manager | 90,000 |
Sam Cooper | 4576 Town Way, Smalltown, BC | V8A 1A5 | 1982 | Male | Machinist | 65,000 |
An anonymized version of that dataset might look like this:
Postal code | Year of birth | Gender | Occupation | Salary |
---|---|---|---|---|
V5V 1P2 | 1970 | Female | Manager | 90,000 |
V8A 1A5 | 1982 | Male | Machinist | 65,000 |
In some cases, removing direct identifiers such as names and addresses might be enough to prevent re-identification. In this case, however, the remaining fields could still identify individuals. For example, if there are not many machinists in the V8A 1A5 postal code, there is a strong risk of re-identification for the data related to Sam Cooper.
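One rough way to spot this kind of risk is to count how many records share the same combination of indirectly identifying fields; this idea is often discussed under the name k-anonymity. The sketch below is a minimal illustration, not a full risk assessment. The choice of `postal_code` and `occupation` as the fields to check, the threshold `k=2`, the helper names, and the third row are all assumptions made for this example.

```python
from collections import Counter

# Fields assumed (for this sketch) to be indirectly identifying.
QUASI_IDENTIFIERS = ("postal_code", "occupation")

rows = [
    # The first two rows come from the anonymized example above.
    {"postal_code": "V5V 1P2", "year_of_birth": 1970, "gender": "Female",
     "occupation": "Manager", "salary": 90000},
    {"postal_code": "V8A 1A5", "year_of_birth": 1982, "gender": "Male",
     "occupation": "Machinist", "salary": 65000},
    # Hypothetical extra row, added only to show a non-unique combination.
    {"postal_code": "V5V 1P2", "year_of_birth": 1985, "gender": "Male",
     "occupation": "Manager", "salary": 82000},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def risky_rows(rows, keys, k=2):
    """Return rows whose combination of fields is shared by fewer than k rows."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[key] for key in keys)] < k]


risky = risky_rows(rows, QUASI_IDENTIFIERS)
for row in risky:
    print("Potentially re-identifiable:", row["postal_code"], row["occupation"])
```

Here only the machinist in V8A 1A5 is flagged, because no other row shares that combination. A check like this only covers the fields you list and the data you hold; it says nothing about what an outside party could learn by combining the dataset with other sources.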
Reflection
What method(s) would you use to protect the sensitive information of the individuals?
Example: Pseudonymization
Data pseudonymization can preserve the linkability and utility of the data. Linkability means that the data can be connected to the same individual or entity across different datasets or over time. This can make the data more valuable for analysis and research, but it can also increase the risk of re-identification. Therefore, researchers need to assess the risk of re-identification and balance it with the benefit of data linkage.
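On the other hand, data anonymization removes any information that can directly or indirectly identify an individual or an entity in a dataset. This means that the data cannot be linked to the original source or to other datasets. The table below contrasts the two approaches: anonymization gives every record the same placeholder, while pseudonymization gives each individual a consistent code that repeats wherever that individual appears.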
Name | Anonymized | Pseudonymized |
---|---|---|
Sally Xi | ANON | P12L25 |
Sam Cooper | ANON | P38Q27 |
Sunil Gupta | ANON | P59M16 |
Sam Cooper | ANON | P38Q27 |
Sally Xi | ANON | P12L25 |
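The table above can be reproduced in spirit with a short Python sketch. It assumes one possible way of generating pseudonyms, a keyed hash (HMAC-SHA256) of the name, which is not necessarily how the codes in the table were produced. The key value, the `pseudonymize` helper, and the 8-character code length are hypothetical choices for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be randomly generated
# and stored separately from the shared dataset.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize(name, key=SECRET_KEY, length=8):
    """Map a name to a stable code: the same name always gives the same code."""
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # Truncating keeps codes short for display; longer codes lower the
    # chance that two different names end up with the same code.
    return "P" + digest[:length].upper()


names = ["Sally Xi", "Sam Cooper", "Sunil Gupta", "Sam Cooper", "Sally Xi"]

# Anonymization: every record gets the same placeholder, so rows can no
# longer be linked to each other.
anonymized = ["ANON" for _ in names]

# Pseudonymization: repeated names get repeated codes, so rows stay linkable.
pseudonyms = [pseudonymize(n) for n in names]

for name, anon, pseudo in zip(names, anonymized, pseudonyms):
    print(f"{name} | {anon} | {pseudo}")
```

Using a key means that someone who sees only the codes cannot regenerate them from a list of candidate names. If the key is exposed, the codes can be recomputed and the data re-identified, which is the risk noted for pseudonymization in the methods table.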
De-identification Tools
Researchers are increasingly using algorithm-based tools to help anonymize their data and manage the risk of re-identifying their anonymized data. An example of an anonymization tool would be:
FIPPA
British Columbia's Freedom of Information and Protection of Privacy Act (FIPPA) is provincial legislation that aims to:
- make public bodies more open and accountable by providing the public with a right of access to records;
- protect personal information from unauthorized collection, use, or disclosure by public bodies.
When it comes to choosing storage resources, researchers at UBC have a range of FIPPA-compliant options:
Eligibility | FIPPA-Compliant Storage Resources |
---|---|
Faculty & Staff | OneDrive, Home Drives, TeamShare, Chinook, EduCloud |
Students | OneDrive |
TCPS 2 (2022)
The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans – TCPS 2 (2022) is a policy document that provides ethical guidelines and principles for conducting research involving human participants in Canada.
It was developed by the three federal research funding agencies: the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC). The policy applies to all research involving human participants that is funded by these agencies or conducted under the auspices of institutions that receive agency funding.
Our colleagues at the UBC Office of Research Ethics oversee the Behavioural and Clinical Research Ethics Boards, and are associated with the Ethics Board at UBC’s Okanagan campus, as well as UBC’s affiliated teaching hospitals. They provide outstanding support for UBC researchers and students and would be delighted to answer your specific questions.
Reflection
Does your data comply with FIPPA requirements?
Does your data contain any sensitive or confidential information?
Does your data include any personal identifiers?
Reach out to research.data@ubc.ca or arc.support@ubc.ca to discuss sensitive data sharing.
Need help?
Please reach out to research.data@ubc.ca for assistance with any of your research data questions.