Ethics and Legal Compliance

Managing and sharing research data involves various legal, ethical, and intellectual property issues that need to be addressed and respected. These issues may include:

the protection of personal information,
the confidentiality of sensitive data,
the consent of data subjects,
the ownership of data,
the attribution of data sources,
compliance with relevant laws and regulations.

Researchers who manage and share their data should be aware of these issues and explain how they will follow the appropriate guidelines and policies for their field and region.

They should also comply with applicable privacy legislation, as well as any privacy requirements set by their funders or institutions. These may require researchers to obtain ethical approval, inform data subjects, anonymize or encrypt data, store and transfer data securely, respect data licenses and agreements, and report any breaches or incidents.

Table of Contents

Sensitive Data
De-identification
- Risk of Re-identification
  - Example: Anonymization
  - Example: Pseudonymization
- De-identification Tools
FIPPA
TCPS 2 (2022)

Sensitive Data

Sensitive data is data that should not be shared in the public domain without additional consideration. This might include trade secrets, medical information, commercial information, preliminary analyses, third-party data, and some geospatially linked data. Our colleagues at UBC Advanced Research Computing (ARC) have an excellent guide to sensitive data. Sensitive research data requires careful handling and protection and is often not suitable for open sharing. However, it may still be possible to share it legally and ethically, for example by anonymizing or aggregating it, or by restricting access to it.

In order to ensure you are handling data in an ethical manner, you should:

evaluate the anonymity of your data
request a confidential review (e.g. from a data repository administrator)
comply with institutional requirements (e.g. those of your institution’s research ethics board, such as UBC’s Behavioural Research Ethics Board (BREB))
comply with other applicable regulations (e.g. HIPAA for health data in the United States)
obtain informed consent for data sharing
restrict access to and use of confidential data

De-identification

Sensitive data contain information that could reveal the identity or harm the interests of the people or entities involved in the research. To protect the privacy and confidentiality of the research subjects, researchers can use de-identification techniques.

De-identification is the process of removing or modifying any information that could be used to identify someone or something in a dataset, so that researchers can share their data without disclosing sensitive information. However, de-identification is not foolproof: someone may still re-identify individuals by linking the data with other sources of information, so researchers need to understand this risk and manage it.

There are different methods of de-identification, each with its own advantages and disadvantages.

Method of de-identification	Description	Pros	Cons
Anonymization	the strictest form: all identifying information is removed from the dataset and cannot be restored	ensures a high level of privacy protection	may reduce the usefulness and quality of the data
Pseudonymization	identifying information is replaced with artificial identifiers, such as codes or numbers	allows records to be linked across datasets or over time	data can be re-identified if the key linking codes to identities is exposed or the codes are cracked
Aggregation	individual data points are grouped together into categories or ranges	preserves some statistical properties and patterns	reduces the level of detail and variability in the data
Masking	identifying information is hidden or obscured using techniques such as encryption, hashing, blurring, or noise addition	makes the data harder to read or interpret	some techniques (e.g. blurring, noise addition) introduce errors or distortions in the data
Generalization	identifying information is replaced with more general terms; for example, dates can be replaced with years, addresses with regions, or names with initials	preserves some semantic meaning and context	makes the data less specific and more ambiguous
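The generalization, aggregation, and masking rows of the table can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a vetted de-identification tool: the function names, the 25,000 salary band width, and the noise scale are hypothetical choices made for illustration, and the values are loosely based on the example dataset further down this page.

```python
import random
from collections import Counter


def generalize_postal_code(postal_code: str) -> str:
    """Generalization: keep only the first three characters of a Canadian
    postal code (the forward sortation area), e.g. 'V8A 1A5' -> 'V8A'."""
    return postal_code.strip()[:3]


def generalize_year(year: int) -> str:
    """Generalization: replace an exact year with its decade, e.g. 1982 -> '1980s'."""
    return f"{year // 10 * 10}s"


def salary_band(salary: int, width: int = 25_000) -> str:
    """Map a salary to a fixed-width range label, e.g. 65000 -> '50,000-74,999'."""
    low = salary // width * width
    return f"{low:,}-{low + width - 1:,}"


def aggregate_salaries(salaries: list[int], width: int = 25_000) -> dict[str, int]:
    """Aggregation: report only how many people fall in each salary range,
    instead of the individual salaries."""
    return dict(Counter(salary_band(s, width) for s in salaries))


def mask_with_noise(value: float, scale: float, rng: random.Random) -> float:
    """Masking by noise addition: add zero-mean Gaussian noise with standard
    deviation `scale`. Larger scales hide more but distort the data more."""
    return value + rng.gauss(0.0, scale)


if __name__ == "__main__":
    print(generalize_postal_code("V8A 1A5"))             # V8A
    print(generalize_year(1982))                         # 1980s
    print(aggregate_salaries([90_000, 65_000, 70_000]))  # {'75,000-99,999': 1, '50,000-74,999': 2}
    rng = random.Random(42)  # fixed seed so the example is repeatable
    print(round(mask_with_noise(65_000, 2_000, rng)))
```

Each function trades detail for privacy in a different way. Generalization and aggregation keep the data truthful but coarser, while noise addition keeps the original granularity but makes individual values inaccurate.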

Risk of Re-identification

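No matter which de-identification methods you use, some risk of re-identification remains. Attributes that do not identify anyone on their own, such as postal code, year of birth, gender, and occupation (often called quasi-identifiers), can single out a person when combined, especially when they can be matched against other available data sources. Researchers need to assess this risk before sharing and manage it accordingly.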
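One widely used way to quantify this risk is k-anonymity: a dataset is k-anonymous if every record shares its quasi-identifier values with at least k - 1 other records. The sketch below assumes the records are Python dictionaries and that you have already decided which columns count as quasi-identifiers; it computes k and lists the records that are unique. The function names and the rows are hypothetical, chosen only for illustration.

```python
from collections import Counter


def class_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Return k, the size of the smallest group of rows with identical
    quasi-identifier values (0 for an empty dataset)."""
    sizes = class_sizes(rows, quasi_identifiers)
    return min(sizes.values(), default=0)


def unique_rows(rows, quasi_identifiers):
    """Return the rows that no other row matches on the quasi-identifiers."""
    sizes = class_sizes(rows, quasi_identifiers)
    return [row for row in rows
            if sizes[tuple(row[col] for col in quasi_identifiers)] == 1]


# Hypothetical, already-generalized records used only for illustration.
rows = [
    {"postal_area": "V5V", "birth_decade": "1970s", "gender": "Female", "occupation": "Manager"},
    {"postal_area": "V5V", "birth_decade": "1970s", "gender": "Female", "occupation": "Teacher"},
    {"postal_area": "V8A", "birth_decade": "1980s", "gender": "Male", "occupation": "Machinist"},
    {"postal_area": "V8A", "birth_decade": "1980s", "gender": "Male", "occupation": "Machinist"},
]

if __name__ == "__main__":
    base = ["postal_area", "birth_decade", "gender"]
    print(k_anonymity(rows, base))                   # 2
    print(k_anonymity(rows, base + ["occupation"]))  # 1
    print(unique_rows(rows, base + ["occupation"]))
```

Adding a single column (occupation here) drops k from 2 to 1, which is why the choice of quasi-identifiers matters as much as the de-identification method. A k of 1 does not prove a record will be re-identified, and a higher k does not rule it out; it is one indicator among several.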

Example: Anonymization

Consider this dataset that contains some identifiers:

Name	Address	Postal code	Year of birth	Gender	Occupation	Salary
Sally Xi	123 City Roadway, Vancouver, BC	V5V 1P2	1970	Female	Manager	90,000
Sam Cooper	4576 Town Way, Smalltown, BC	V8A 1A5	1982	Male	Machinist	65,000

An anonymized version of that dataset might look like this:

Postal code	Year of birth	Gender	Occupation	Salary
V5V 1P2	1970	Female	Manager	90,000
V8A 1A5	1982	Male	Machinist	65,000

In some cases, removing names and addresses is enough to prevent re-identification. Here, however, the remaining fields could still identify individuals. For example, if there are not many machinists in the V8A 1A5 postal code, there is a strong risk that the record for Sam Cooper could be re-identified.
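The step from the first table to the second, removing the direct identifiers (name and address), can be sketched in Python as follows. The column names mirror the example tables; `DIRECT_IDENTIFIERS` and `drop_direct_identifiers` are hypothetical names for this illustration. The last lines count how often each combination of postal code and occupation appears, to show that every remaining record is still unique on those two fields alone.

```python
from collections import Counter

# Columns that identify a person on their own (a hypothetical choice for this example).
DIRECT_IDENTIFIERS = {"Name", "Address"}

dataset = [
    {"Name": "Sally Xi", "Address": "123 City Roadway, Vancouver, BC",
     "Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90_000},
    {"Name": "Sam Cooper", "Address": "4576 Town Way, Smalltown, BC",
     "Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65_000},
]


def drop_direct_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows with the direct-identifier columns removed;
    the original rows are left unchanged."""
    return [{col: val for col, val in row.items() if col not in identifiers}
            for row in rows]


anonymized = drop_direct_identifiers(dataset)

# How many records share each (postal code, occupation) pair?
# A count of 1 means that record is unique on these fields alone.
combo_counts = Counter((row["Postal code"], row["Occupation"]) for row in anonymized)

if __name__ == "__main__":
    for row in anonymized:
        print(row)
    print(combo_counts)
```

In a real dataset with many rows, a count of 1 for a combination like this is the situation described above for Sam Cooper.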

Reflection

What method(s) would you use to protect the sensitive information of the individuals? 

Example: Pseudonymization

Data pseudonymization can preserve the linkability and utility of the data. Linkability means that the data can be connected to the same individual or entity across different datasets or over time. This can make the data more valuable for analysis and research, but it can also increase the risk of re-identification. Therefore, researchers need to assess the risk of re-identification and balance it with the benefit of data linkage.

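Data anonymization, on the other hand, removes any information that can directly or indirectly identify an individual or an entity in a dataset, so anonymized records cannot be linked back to the original source or to other datasets. In the table below, anonymization replaces every name with the same placeholder, while pseudonymization gives each person a consistent code, so the repeated records for Sally Xi and Sam Cooper can still be linked to each other.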

Name	Anonymized	Pseudonymized
Sally Xi	ANON	P12L25
Sam Cooper	ANON	P38Q27
Sunil Gupta	ANON	P59M16
Sam Cooper	ANON	P38Q27
Sally Xi	ANON	P12L25
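The codes in the table above look like arbitrarily assigned identifiers. A common alternative, sketched below in Python, derives each pseudonym with a keyed hash (HMAC-SHA256 from the standard library), so the same name always maps to the same code for a given secret key. This is a swapped-in technique chosen for illustration, not the method used to produce the table; the function names, the "P" prefix, and the 8-character code length are hypothetical. The key must be stored separately from the data: anyone who obtains it can re-create the mapping, and without a key, a plain hash of a name could be reversed simply by hashing a list of likely names.

```python
import hashlib
import hmac
import secrets


def anonymize(name: str) -> str:
    """Anonymization: every name becomes the same placeholder, so records
    can no longer be linked to each other or to the person."""
    return "ANON"


def pseudonymize(name: str, key: bytes, length: int = 8) -> str:
    """Pseudonymization: derive a stable code from the name with HMAC-SHA256.
    The same name and key always give the same code; without the key, the
    code cannot be recomputed from a guessed name."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:length].upper()


if __name__ == "__main__":
    # Generate a random 256-bit key; in practice, store it securely and
    # separately from the pseudonymized data.
    key = secrets.token_bytes(32)
    names = ["Sally Xi", "Sam Cooper", "Sunil Gupta", "Sam Cooper", "Sally Xi"]
    for name in names:
        print(f"{name:12} {anonymize(name):5} {pseudonymize(name, key)}")
```

Truncating the digest to 8 hexadecimal characters keeps the codes short but makes collisions possible in very large datasets; a longer `length` reduces that risk.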

De-identification Tools

Researchers are increasingly using algorithm-based tools to help anonymize their data and manage the risk of re-identification. One example is:

ARX, open-source data anonymization software

FIPPA

The British Columbia Freedom of Information and Protection of Privacy Act (FIPPA) is provincial legislation whose purposes are to:

make public bodies more open and accountable by providing the public with a right of access to records;
protect personal information from unauthorized collection, use, or disclosure by public bodies.

UBC researchers have a range of FIPPA-compliant storage options:

Eligibility	FIPPA-Compliant Storage Resources
Faculty & Staff	OneDrive, Home Drives, TeamShare, Chinook, EduCloud
Students	OneDrive

TCPS 2 (2022)

The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans – TCPS 2 (2022) is a policy document that provides ethical guidelines and principles for conducting research involving human participants in Canada.

It was developed by the three federal research funding agencies: the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC). The policy applies to all research involving human participants that is funded by these agencies or conducted under the auspices of institutions that receive agency funding.

Our colleagues at the UBC Office of Research Ethics oversee the Behavioural and Clinical Research Ethics Boards and are associated with the Ethics Board at UBC’s Okanagan campus and with UBC’s affiliated teaching hospitals. They provide outstanding support for UBC researchers and students and would be happy to answer your specific questions.

Reflection

Does your data comply with FIPPA requirements? 
Does your data contain any sensitive or confidential information? 
Does your data include any personal identifiers? 

Reach out to research.data@ubc.ca or arc.support@ubc.ca to discuss sensitive data sharing.

Need help?

Please reach out to research.data@ubc.ca for assistance with any of your research data questions.