

What Is Data Integrity?



Data integrity definition

Data integrity is vital for the success of any business, laboratory or institution that deals with the generation, manipulation or storage of data. The integrity of data refers to many aspects of data use: the completeness, consistency, accuracy and validity of the data in question. The central dogma of data integrity is that when data is recorded, it is recorded exactly as the user intends, and when it is retrieved, it is in exactly the same state as when it was recorded.

Data generation has increased steadily over the last several decades, mirroring the advent of big data analytics. Healthcare providers, biotech and pharmaceutical companies and medical device developers use this data to assess markets, predict consumer behavior or mitigate potential risks they may face. The sheer scale, complexity, and amount of data needed to develop actionable responses and meaningful results have increased exponentially, leading many scientists and analysts to question the integrity of this data.

Why is data integrity important?

Aside from the accuracy and validity of data, data integrity also deals with the safety and security of data wherever it could potentially be used for malicious intent, such as patient data in healthcare or experimental and clinical data in the pharmaceutical or life sciences fields. If the data generated in these fields cannot be relied upon for accuracy or security, any actions taken (such as treatments for patients) or conclusions drawn (such as novel therapeutics developed) using the data are compromised. Compromised data is of little use to science, and the loss of sensitive data can result in lost work at best and serious harm (such as stolen intellectual property or irreparable financial damage) at worst. That is why the accuracy of recording and consistency of data over its lifetime – its integrity – is paramount.

Maintaining data integrity also improves efficiency throughout the lifetime of the data, in the form of increased:
  • Recoverability
  • Searchability
  • Traceability (to origin)
  • Connectivity

Keeping data accurate and valid also increases its stability and performance and ensures that it can be retrieved with high fidelity.

Ensuring that your data maintains its integrity will help keep it free from outside influence and potential malicious intent. Your data will remain accurate, reliable, and complete regardless of the length of time it is stored or the frequency of access.

ALCOA+ principles and data integrity

One of the best ways to ensure that data integrity is properly maintained is to apply a rigid structure that keeps your data safe. The ALCOA principles originated at the FDA and were later extended to ALCOA+ (listed below) to provide your data with such a structure.


ALCOA Data Principles. Adapted from Labguru

  • Attributable – There should be an easily traceable lineage to the creator of the data and to anyone who modifies or alters it. It should also demonstrate who or what the data was about.
  • Legible – Data should be simple to understand and read either visually or electronically. It should also be indelible with original entries preserved.
  • Contemporaneous – Data should be recorded at the time the experiment or activity is observed.
  • Original – Sources or documents (such as raw databases or laboratory notebooks) relating to the data are preserved and accessible in their original form.  
  • Accurate – Data should be free of errors with edits or amendments documented.


  • Complete – Data must include all experimental results – this includes the results from any analyses, repeated results and metadata. To ensure complete data and prove nothing is lost or deleted, it is important to develop an audit trail by keeping your metadata linked and in context with all other data. Tools such as electronic lab notebooks (ELNs) can help with this process.
  • Consistent – To ensure consistency of data, it must maintain the sequence in which it occurred. The data must be traceable with a date and time stamp, and it should be created in a manner that is repeatable – uploaded, processed and maintained by the same methods. Using automated workflows instead of manual data entry can reduce human error and increase consistency.
  • Enduring – Data should endure throughout its lifetime and be recorded on an acceptable medium that is equally enduring (such as paper or electronic storage).
  • Available – Data should be easily accessible when needed for review or audit. Keeping your data on a single platform can help to ensure that it is accessible to authorized personnel in the lab.
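
As a rough illustration of how several of these principles can translate into a record structure, the sketch below keeps an append-only audit trail in which every entry names its author (Attributable), is timestamped when it is made (Contemporaneous), leaves earlier entries untouched (Original) and is chained to the previous entry by a hash so that missing or altered entries can be detected (Complete, Consistent). The class and field names are hypothetical, and this is not a description of how any particular ELN or regulated system works.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    user: str        # Attributable: who made the entry
    action: str      # for example "create" or "amend"
    value: str       # the recorded value; earlier entries are never overwritten
    timestamp: str   # Contemporaneous: set at the moment the entry is appended
    prev_hash: str   # hash of the previous entry, linking the sequence together
    entry_hash: str  # hash of this entry's own fields


def _hash_entry(user, action, value, timestamp, prev_hash) -> str:
    payload = json.dumps([user, action, value, timestamp, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditTrail:
    def __init__(self):
        self._entries = []

    def append(self, user: str, action: str, value: str) -> AuditEntry:
        """Add a new entry; amendments are new entries, not edits to old ones."""
        timestamp = datetime.now(timezone.utc).isoformat()
        prev_hash = self._entries[-1].entry_hash if self._entries else ""
        entry = AuditEntry(user, action, value, timestamp, prev_hash,
                           _hash_entry(user, action, value, timestamp, prev_hash))
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = ""
        for e in self._entries:
            if e.prev_hash != prev_hash:
                return False
            if e.entry_hash != _hash_entry(e.user, e.action, e.value,
                                           e.timestamp, e.prev_hash):
                return False
            prev_hash = e.entry_hash
        return True
```

In this design a correction is recorded as a second entry with the action "amend", so the original value and the identity of the person who changed it both remain visible.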

Understanding these ALCOA+ principles is a great frontline defense that will help protect your laboratory or institution from irregularities that could potentially result in action from regulatory authorities (such as inspection or warning letters). No lab is perfect, and even if regulatory actions occur, it is important to be able to demonstrate compliance in other areas to authorities. Automated data management systems can greatly decrease the chances of regulatory breaches while paving the way for future studies and bolstering the reproducibility of experiments.

How to ensure data integrity

A laboratory or system workflow that is generating regulated data will typically have ample opportunity for data to be compromised – as shown below. Since compromise can occur easily at any point during data generation, it is of paramount importance that there are protocols and practices in place to ensure data integrity.

Data compromise can occur even outside of the data workflow for a number of reasons:

  • Human actions, whether malicious or unintentional.
  • Errors in data transfer from one device to another.
  • Malicious cyber threats such as hacking or virus attacks.
  • Hardware compromises such as physical destruction of device or storage drive crashes.

The graphic below illustrates some simple practices to keep in mind that can help ensure data integrity and avoid the compromises highlighted above.


Practices to preserve data integrity. Adapted from Varonis

In the end, the responsibility for ensuring data integrity rests solely with the lab or institution generating the data. Most instances of data compromise can be avoided as long as there are adequate measures or protocols in place that follow the regulatory guidelines designed to maintain data integrity – such as the ALCOA+ principles. If budget or manpower is a concern, remember that the procedures and protocols put in place should reflect the actual value of the data. At a minimum, labs, businesses and data centers should employ data backup and duplication protocols along with industry best practices to ensure the continued integrity of their data.

Data integrity and compliance

The security, safety, and integrity of data are maintained through tight restrictions that are enforced through compliance with regulations such as the General Data Protection Regulation (GDPR) of the European Union. While the US does not have an equivalent to GDPR, there are more industry-specific regulations, which include Current Good Manufacturing Practice (cGMP) (Title 21 CFR 211.22) for pharmaceutical applications and CAP and CLIA regulations in the clinical healthcare industries. These restrictions are typically self-enacted and created during the design of the business or institution that will be generating data. Organizations must conform to these standards or they risk fines, sanctions or, in egregious cases, a full shutdown of activities.

Laboratories that use chromatography data systems (CDS) often generate large amounts of regulated data, which can make CDS labs susceptible to regulatory breaches and citations. To avoid these non-compliance citations, CDS labs and other institutions that generate regulated data should follow the steps below, which can help keep the laboratory and its data in compliance with regulatory authorities and improve the quality of the data.
  • Give each individual unique user credentials with access privileges appropriate to their role. These credentials should be stored, curated, and maintained outside of the laboratory by an independent organization.
  • Data must be backed up effectively, with regular system checks. Backups should be performed by an independent organization (typically an IT department).
  • Available data should be complete, with all files free from manipulation or deletion, and access to the data should be restricted.
  • Unofficial testing and quality control should be prohibited. All quality control actions should go through approved procedures.
  • Technical support should be carried out through an independent organization (such as an IT department) and not the laboratory generating the data.
  • A clearly identifiable audit trail should be in place to help identify deviations from regulations such as cGMP, CAP, or CLIA.
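
The first and last of these steps, unique credentials with role-appropriate privileges and an identifiable audit trail, can be sketched together in a few lines. The example below is illustrative only: the role names, permissions, user IDs and the attempt function are all assumptions made for this sketch, and a real CDS would manage accounts through its own administration tools rather than a hard-coded table.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping; note that no role is allowed to delete data.
ROLE_PERMISSIONS = {
    "analyst": {"read", "create"},
    "reviewer": {"read", "approve"},
    "it_admin": {"backup", "manage_users"},
}

# Hypothetical user directory: one unique account per person, each with a single role.
USERS = {"asmith": "analyst", "bjones": "reviewer", "it-ops": "it_admin"}

audit_log = []


def attempt(user_id: str, action: str) -> bool:
    """Check an action against the user's role and log the attempt either way."""
    role = USERS.get(user_id)
    allowed = role is not None and action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "action": action,
        "allowed": allowed,
    })
    return allowed
```

Logging refused attempts as well as permitted ones is a deliberate choice here, since a denied request to approve or delete data is often exactly the kind of deviation a reviewer would want to see.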

Strict adherence to the regulations will not only ensure that you avoid citations and other sanctions, but will also protect your data from compromise. Structures such as the ALCOA+ principles mentioned above are designed to be easily followed to keep your data compliant and your business safe from regulatory action.

Data integrity and data security

The terms “data integrity” and “data security” are easily mistaken for one another, and they are closely linked – one cannot be had without the other. However, data integrity is typically a desired result of data security.

Data security is the protection of data from unauthorized access and corruption (either malicious or accidental) that can potentially compromise data integrity. Data integrity, however, refers only to the accuracy and validity of the data, not the act of protecting or securing the data.

Data security is an important tool when it comes to maintaining data integrity. It can help reduce the risk of leaking sensitive information such as intellectual property, experimental data, healthcare data and emails. Some tactics employed to maintain data security include:
  • Permissions management (individual credentials and authorized user access)
  • Data classification
  • Threat detection
  • Security analytics

No solution is perfect and sensitive data can still be compromised – even with the best security – so it is important to take other measures as well. Data security must be combined with measures such as data backup and duplication (both on-site and in the cloud) to properly maintain data integrity.
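
A backup only protects integrity if the copy is known to match the original. The sketch below shows one simple way to check that, by copying a file to several locations and comparing SHA-256 digests. It uses only Python's standard library; the function names are hypothetical, and the local directories stand in for whatever on-site and cloud storage an organization actually uses.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(source: Path, backup_dirs: list[Path]) -> str:
    """Copy source into each backup directory and confirm every copy matches it."""
    expected = sha256_of(source)
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, directory))  # copy2 also keeps file timestamps
        if sha256_of(copy) != expected:
            raise IOError(f"backup at {copy} does not match {source}")
    return expected
```

Storing the returned digest alongside the backups would also allow the same comparison to be repeated later as part of a regular system check.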

The steady increase in data generation has made data integrity and security paramount in keeping expensive and important intellectual property safe and secure. Compliance with regulations will help to ensure that data remains complete, unmanipulated and accurate for its entire lifetime.