How is health data kept safe?

It is essential that patient data is kept safe and secure, to protect your confidential information. There are four main ways that privacy is protected:

  • Removing details that identify a person and taking further steps to de-identify information.
  • Using an independent review process to make sure the reason for using patient data is appropriate.
  • Ensuring strict legal contracts are in place before data is transferred or accessed.
  • Implementing robust IT security.

Data protection is a balancing act

Do the benefits of using patient data outweigh the risks? Could something go wrong, and what would be the impact? Sharing patient data will never be totally risk-free, but there must be appropriate measures in place to make sure any risk is as low as reasonably possible. Data is de-identified wherever possible. There are audit processes to check who is accessing data, and robust penalties can be issued where data is misused.

Can I be identified from the data? 

Personally identifiable data can only be used if you give your permission or where it is required by law, and even then only with robust safeguards. It cannot be used for insurance or marketing purposes without your consent. Some data will be used to produce statistics that are published monthly by your health care authority, for example hospital emergency waiting times or vaccination coverage. The information can only be openly published if the data is anonymised, so it is not possible to identify any individual.

Spectrum of identifiability 

In practical terms there is a wide spectrum of identifiability. This ranges from fully identifiable personal data to data that has been through a robust anonymisation process. The bar for data to be considered 'anonymous' under the GDPR is very high, which means that data used for many purposes still counts as personal data. The identifiability of data depends both on the features of the dataset and on the environment where it is held and used.

For example, data that is not identifiable on its own may become so if it is combined with other data. Some environments used to store data therefore include technical controls on what the data can be linked to and limitations on who can access it. The controls used to protect the data are just as important as the qualities of the data itself.

More about...

The key issue regarding health data use for purposes beyond your care is the balance that needs to be struck between maximising the potential benefits and protecting against possible harms. Generally, the utility of health data is highest when few safeguards are imposed, but this also increases the potential risks with regard to privacy protection and the security of the data.

There are two types of health data that are currently treated differently: identifiable and anonymised data.  

Anonymised data

When data is rendered completely anonymous, it is no longer considered personal data and therefore falls outside the scope of the General Data Protection Regulation (GDPR), which only applies to identifiable data. The anonymisation process itself protects individuals from potentially harmful outcomes. Anonymisation is a continuum: there are different techniques that offer different levels of protection, such as those described below.

Data aggregation

Data aggregation refers to the pooling of data, so that individuals can no longer be identified, such as in the example provided below.


Individual data: A is fully vaccinated against COVID-19, B is fully vaccinated against COVID-19 etc.

...and aggregate data: In this population, 80% of people are fully vaccinated against COVID-19.
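
As a rough illustration of the step from individual to aggregate data, the sketch below pools a handful of made-up records into a single percentage. The record layout, the field names and the function name `vaccination_coverage` are assumptions chosen for this example, not part of any real health system.

```python
# Hypothetical individual-level records (invented data, for illustration only).
records = [
    {"person": "A", "fully_vaccinated": True},
    {"person": "B", "fully_vaccinated": True},
    {"person": "C", "fully_vaccinated": True},
    {"person": "D", "fully_vaccinated": True},
    {"person": "E", "fully_vaccinated": False},
]


def vaccination_coverage(rows):
    """Return the percentage of fully vaccinated people in the dataset.

    Only the pooled figure is returned; no names leave this function.
    """
    if not rows:
        raise ValueError("cannot aggregate an empty dataset")
    vaccinated = sum(1 for row in rows if row["fully_vaccinated"])
    return 100.0 * vaccinated / len(rows)


coverage = vaccination_coverage(records)
print(f"In this population, {coverage:.0f}% of people are fully vaccinated.")
```

Note that aggregation only protects individuals when the group is large enough: a percentage computed over a very small group can still reveal something about its members.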


Data swapping

Data swapping means that certain characteristics are exchanged between individual records. Researchers can still analyse the dataset as a whole, but comparisons at the level of the individual become meaningless, as in the following example.

Original dataset: 


Same dataset, swapped:  
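
Separately from the example tables, the idea of swapping can be sketched in a few lines of code. This is a minimal sketch with invented data: the column names, the helper `swap_column` and the choice to shuffle one column at random are assumptions made for illustration, and real swapping methods are usually more selective about which records they exchange.

```python
import random


def swap_column(rows, column, seed=0):
    """Return a copy of rows in which the values of one column are
    shuffled between records. All other columns stay where they were."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical dataset (invented values).
original = [
    {"id": 1, "age_group": "20-29", "region": "North"},
    {"id": 2, "age_group": "30-39", "region": "South"},
    {"id": 3, "age_group": "40-49", "region": "East"},
    {"id": 4, "age_group": "50-59", "region": "West"},
]

swapped = swap_column(original, "region")

# Totals per region are unchanged, so dataset-level analysis still works,
# but a given row no longer reliably describes one real person.
for row in swapped:
    print(row)
```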


Small cell risk analysis

A small cell risk analysis is a statistical analysis used to measure the risk of re-identification when only a small group of people is concerned, for example in the case of rare diseases or when many variables are combined. 
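
One simple way to picture such an analysis is to count how many people share each combination of characteristics and flag the combinations with very few people. The sketch below does only that; the threshold of 5, the column names and the function name `small_cells` are assumptions for illustration, and a real risk analysis would involve more than a simple count.

```python
from collections import Counter


def small_cells(rows, keys, threshold=5):
    """Return the combinations of values (cells) shared by fewer than
    `threshold` people, together with the number of people in each."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {cell: n for cell, n in counts.items() if n < threshold}


# Hypothetical dataset: six people with a common condition and one
# person with a rare condition, all in the same age group.
rows = [{"age_group": "40-49", "condition": "asthma"} for _ in range(6)]
rows.append({"age_group": "40-49", "condition": "rare disease"})

risky = small_cells(rows, keys=["age_group", "condition"])
print(risky)
```

Here the single person with the rare disease forms a cell of size one, which is the kind of group that carries a higher risk of re-identification.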

Advantages and disadvantages related to anonymous data

  • On the one hand, for researchers, using anonymous data is easier because there are few or no restrictions on the use of anonymised data. For example, there is no need to ask for participants’ consent because no one can be identified. For citizens, anonymous data is meant to ensure that no direct, personal harm, such as a privacy violation, discrimination or unintentional commercialisation, can occur when data about them is used. However, it is important to note that several publications have shown that it is sometimes possible to re-identify individuals in anonymised datasets.
  • On the other hand, anonymous data has less utility. Some research questions cannot be answered without identifying information. For example, it is more difficult to compare across characteristics: if you want to know how a COVID-19 infection affects people with asthma, while correcting for age, high-risk jobs and other factors, you need access to all of these variables, which together may allow individuals to be identified. For citizens, it is impossible to benefit directly from health data reuse if the data is completely anonymised, because data quality is reduced and research participants cannot be recontacted.

Identifiable data

Identifiable data is any type of data that can be traced back to the individual person behind it. This includes both:

  • personally identifiable data, such as name, address, ID number, social security number etc.
  • de-identified or pseudonymised data, where a person’s name and other directly identifying information are removed. It is used very often in health research. There are different techniques of depersonalisation that offer different levels of protection. For example, data can be pseudonymised by replacing some identifiable characteristics or by using a specific encryption (for example, changing names into numbers). A trusted third party can regulate communications between two partners to make sure that neither partner holds all the keys needed to reverse the encryption. However, it may still be possible to re-identify the person if the data is combined with different sources. This would be like adding more pixels to a photograph or joining together different pieces of a puzzle.
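
To make the idea of changing names into numbers more concrete, the sketch below replaces a name with a code derived from a secret key. It uses a keyed hash (HMAC-SHA256) from Python's standard library, which is a swapped-in technique chosen for brevity rather than the reversible encryption mentioned above. The key value, the function name `pseudonymise` and the sample record are assumptions for illustration; in practice the key would be held by a trusted party and never stored with the data.

```python
import hashlib
import hmac


def pseudonymise(name, secret_key):
    """Replace a name with a stable code that cannot be linked back to
    the name without the secret key."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


# Hypothetical key; assumed to be kept by a trusted third party.
SECRET_KEY = b"example-key-held-by-trusted-third-party"

record = {"name": "Jane Example", "diagnosis": "asthma"}

# Drop the name and keep only the pseudonym alongside the health data.
pseudonymised = {
    "pseudonym": pseudonymise(record["name"], SECRET_KEY),
    "diagnosis": record["diagnosis"],
}
print(pseudonymised)
```

The same name always produces the same code, so records about one person can still be linked for research. The remaining fields, such as the diagnosis, are untouched, which is why pseudonymised data can still be re-identified when combined with other sources.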

All data that is considered to be identifiable is protected by the GDPR. This data can only be used if organisations can show they have a lawful reason for using it, known as a 'legal basis'. In the EU and the UK, organisations will usually use your data either with your ‘consent’, or without your consent if the use of your data can be considered ‘a task in the public interest’ or a ‘legitimate interest’.

A task in the public interest

Examples of tasks in the public interest include surveillance of illness and disease, archiving for historical or scientific purposes, and supporting an institution while it performs a task previously defined by a regulation. Identifiable data can be reused for any task in the public interest when proper oversight is guaranteed (see below).

Informed consent

Informed consent can be used as a legal basis to collect data in a clinical or research context. Asking for consent respects individual autonomy, but it can also be burdensome for researchers and citizens alike, who need to spend time, resources and energy confirming their preferences every time. Additionally, in some circumstances it may be nearly impossible to obtain consent from some of the individuals whose data is needed for a specific project, for example when data from years ago needs to be accessed to determine trends and developments.

Advantages and disadvantages related to identifiable data 

  • On the one hand, identifiable data offers high utility: it can be used to answer many questions, including those that require comparing certain variables, such as smoking and the effectiveness of a lung cancer treatment. Researchers then need to know who is a smoker and who received a certain treatment. When the use of certain data involves risks to individuals’ privacy or other harms, de-identification techniques can be applied to manage these risks (see above).
  • On the other hand, it is often a difficult balancing act to decide what the appropriate level of protection is for a certain type of data. Depersonalisation techniques can be time-consuming and expensive. For citizens, the use of identifiable data can lead to direct personal benefits, but the risk always remains that individuals can be re-identified when they don’t want to be.


Different types of bodies can ensure that the uses of health data for purposes beyond individual care comply with regulation and respect citizens' privacy.

  • Data protection authorities: data protection authorities are national governance bodies that can oversee structural links that may be created between large databases, or decide whether something qualifies as ‘a task in the public interest’.
  • Data Protection Officer (DPO): some institutions or companies processing personal data under the GDPR need to have a DPO, who supports the data processors to make sure that they adhere to the rules described in the GDPR.
  • Ethical Committees: when new research projects are launched in hospitals or other public institutions, they are submitted to a multidisciplinary ethical committee, which decides whether the research protocol is ethical and the proposed methods are proportionate to the research goals.
  • Data Access Committees: some databases set up data access committees, which decide who gets access to the data and under which conditions.
