Using Data Lineage and Traceability to Optimize Publishing Potential
Want to get a head start when it comes to publishing in high-impact journals? Frequent, high-impact publication is one of the most powerful ways a researcher can showcase their work. It can consolidate a researcher’s reputation and open doors to collaboration and grant award success. Hence, strengthening the chances of high-impact publication can help both early-career and established researchers succeed in their chosen field.
Accurate and accessible data is key to upholding publication standards in scientific journals. A number of recent papers have been retracted after poor data reporting raised suspicions of misconduct.[1] Whilst retraction does not always indicate research fraud, the stigma associated with it can cast doubt on a researcher’s reputation. Establishing best-practice data recording workflows helps avoid this outcome, optimizes the publishing potential of original research and supports career success.
This article outlines the key principles of data lineage and traceability – two important factors in establishing good data recording practices. It explores how the publication potential of original research can be optimized using electronic lab notebooks for data management and sharing.
What is data lineage?
In scientific terms, lineage refers to the sequence of descent from a common ancestor. Similarly, data lineage describes the evolution of a dataset from collection to publication. It typically covers the analysis, visualization and interpretation steps applied to the data, including how the data was transformed along the way. Recording data lineage helps to assess data quality and allows the original sources to be traced and identified.[2]
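To make the idea concrete, data lineage can be modeled as a chain of dataset versions, each recording the transformation that produced it and links to the datasets it was derived from. The Python sketch below is a minimal illustration under assumed names: the `DatasetVersion` class, its fields and the example datasets are hypothetical and do not represent SciNote’s data model or that of any other tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetVersion:
    """One state of a dataset and the step that produced it (hypothetical schema)."""
    name: str
    transformation: str  # how this version was derived, in plain language
    parents: List["DatasetVersion"] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Return every step leading to this version, oldest first."""
        steps: List[str] = []
        seen = set()

        def visit(node: "DatasetVersion") -> None:
            if id(node) in seen:  # a shared ancestor is listed only once
                return
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)  # record ancestors before their descendants
            steps.append(f"{node.name}: {node.transformation}")

        visit(self)
        return steps

    def sources(self) -> List[str]:
        """Return the original datasets (those with no parents), without duplicates."""
        if not self.parents:
            return [self.name]
        found: List[str] = []
        for parent in self.parents:
            for src in parent.sources():
                if src not in found:
                    found.append(src)
        return found


# Hypothetical example: a plotted result derived from two raw exports
raw_reads = DatasetVersion("raw_plate_reads", "exported from plate reader")
controls = DatasetVersion("control_reads", "exported from plate reader")
normalized = DatasetVersion(
    "normalized", "background-subtracted using controls", [raw_reads, controls]
)
summary = DatasetVersion("summary_plot", "replicate means plotted", [normalized])

print(summary.sources())  # ['raw_plate_reads', 'control_reads']
for step in summary.lineage():
    print(step)
```

Walking the parent links backwards answers the two questions lineage is meant to answer: which raw sources a result came from, and which steps were applied along the way.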
Data lineage vs traceability
Whilst lineage describes the evolution of a dataset, traceability ensures that the information trail is both logical and robust. Like a chain of custody for evidence in a police investigation, data traceability ensures that information can be tracked through each stage of research. Practices that support good traceability include clear approval processes, electronic signatures on data entries and time stamps on data input. Such practices ensure that information is used ethically, effectively and in compliance with government guidelines or industry standards.[3]
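Time stamps and sign-offs are most useful when past entries cannot be altered without anyone noticing. One common way to make a log tamper-evident, shown here as an assumption rather than as what any specific ELN or regulation requires, is to chain entries by hash: each entry stores a SHA-256 hash of the previous one, so editing an earlier record breaks the chain. In this hypothetical `AuditTrail` sketch, a “signature” is only a logged action attributed to a named user, not a cryptographic electronic signature.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


class AuditTrail:
    """Append-only log in which each entry stores the hash of the previous one.

    Hypothetical sketch: editing any past entry changes its hash and breaks
    the chain, which verify() detects.
    """

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def record(self, user: str, action: str, timestamp: Optional[str] = None) -> Dict[str, str]:
        entry = {
            "user": user,
            "action": action,
            # UTC time stamp for when the entry was made (overridable for testing)
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            # Link to the previous entry; the first entry links to nothing
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry: Dict[str, str]) -> str:
        # Hash every field except the stored hash itself, in a stable key order
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """Return True only if every entry is unmodified and correctly linked."""
        prev = ""
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.record("a.researcher", "uploaded raw_plate_reads.csv")
trail.record("b.supervisor", "signed off normalization protocol v2")
print(trail.verify())  # True

trail.entries[0]["action"] = "uploaded edited_reads.csv"  # simulate a silent edit
print(trail.verify())  # False
```

Hash chaining only makes tampering detectable; in practice the log would also need to be stored where ordinary users cannot rewrite it, since anyone able to recompute every hash could rewrite the whole chain.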
Together, data lineage and traceability ensure the integrity and reproducibility of publishable results (Figure 1). Not only do they clarify how experimental results were obtained, but they also support the validation of findings and increase a researcher’s academic credibility. Two papers recently published in high-impact journals were retracted after issues arose regarding access to the original data.[1] Both papers relied on the same data source, yet none of the authors had access to the raw files. Because the data lineage could not be traced, suspicions of fraud were reinforced, and the authors’ careers were significantly damaged.[4]
Scientific misconduct doesn’t always mean data fraud – it can also include the failure to report important experimental details.[1] Without adequate data lineage and traceability, valuable information can be lost when details only become relevant later on. Major funding bodies in both North America and Europe have independently released new policies that reflect this, focusing on good data management and sharing practices.[5] Conscientious data lineage and traceability can help researchers meet these guidelines, secure funding and maximize research efficiency by reducing the need for unnecessary repeat experiments.
The importance of keeping FAIR data
The FAIR guiding principles state that data and metadata related to scientific publications should be findable, accessible, interoperable and reusable. Achieving FAIR data is a prerequisite for proper data management, and ensuring that a research project’s data is protected, usable and trusted is key to achieving it. Various tools can improve data management by centralizing research data in one place, where it is annotated, traceable, searchable, and easy to visualize and understand.[6]
Tools to visualize experimental processes
Visualizing experimental processes is one of the easiest ways to ensure that appropriate integrity measures are in place. By mapping research tasks and processes in a visual format, scientists can find answers quickly and establish important validation steps (e.g., the number of replicates, controls and quality control checks). Results associated with each step can be captured, ensuring that critical information is not missed. Tools that support visualization can therefore help researchers make the right choices for their data and document why key research decisions were made.
Electronic lab notebooks (ELNs) can facilitate the visualization of experimental steps as well as lineage and traceability pipelines. For this reason, ELNs present an exciting opportunity for researchers looking to optimize their publication potential, prepare for collaborations, and increase grant award success.
What is an ELN?
ELNs are software packages designed to replace traditional paper lab notebooks. Like paper notebooks, ELNs are used to record protocol information, raw data and experimental observations. However, many packages also integrate additional features such as lab management, protocol templates, inventory management and e-signatures.
ELNs vs LIMS
Laboratory information management systems (LIMS) also offer ways to record data traceability information; however, LIMS are typically designed to record structured data. Because ELNs allow users to record unstructured data, such as observations and analyses, they offer significant benefits over LIMS for many academic labs.[7] Some ELNs and LIMS include audit trails, e-signatures and time stamps, features that support robust traceability and data integrity. Table 1 outlines potential laboratory needs to consider when choosing an ELN or LIMS. Note that this comparison is generalized; individual ELN and LIMS systems may offer additional functionality not included in the table.
Table 1: Key differences between an ELN and a LIMS
Build data tracing workflows with collaboration in mind
The SciNote ELN offers an affordable data documentation solution for researchers looking to collaborate at any stage in their careers. The unique project/experiment/task structure and data management capabilities of SciNote can support labs as they grow. Whether data is being prepared for presentation, publication or technology transfer, SciNote allows researchers to design workflows with knowledge exchange in mind.
With SciNote, data is automatically backed up in the cloud, eliminating the risk of physical damage and ensuring that information is always available to share. Whilst fire and water damage are thankfully rare, they do pose a significant risk in the lab environment.[9] SciNote keeps data safe from the kind of physical damage or misplacement that could ruin paper or locally stored electronic records. Additionally, because data is stored centrally online and protected by access controls, information is not lost when a lab member leaves. This is particularly important in research, where staff and student turnover is frequent and postdocs are often scarce.[10] SciNote can be used to assign projects and experiments to group members, so important tasks can be allocated according to expertise as soon as new members join the group. This functionality helps research continue smoothly and offers a layer of protection against the rapidly changing academic environment.
SciNote is now also integrated with protocols.io – a secure collaborative research platform designed for protocol sharing and method optimization. Scientists can access protocols.io directly through SciNote without needing a protocols.io account. This feature makes searching for and importing external protocols quick and easy. External protocols can be saved, with appropriate attribution, directly in a researcher’s own ELN, saving time and ensuring traceability.
Data lineage, traceability and good practice guidelines
What is good laboratory practice?
Good practice guidelines lay out the processes and procedures which, if followed, are known to produce the best and most ethical outcomes.
Good laboratory practice (GLP) refers to the standards for non-clinical safety and regulatory studies carried out in the development of products for human health. These principles ensure that any data generated, handled and reported during non-clinical safety studies is of high quality and integrity.[11]
How can ELNs support GLP compliance?
In the United States, Title 21 of the Code of Federal Regulations (CFR) Part 11 defines the criteria under which electronic records and electronic signatures are considered reliable and trustworthy; among other requirements, access to electronic data must be limited to authorized personnel.[12,13] Food and Drug Administration (FDA) requirements are also shifting towards electronic data submission to streamline clinical data review.
Policies on good data management and sharing are always subject to change. For example, the National Institutes of Health (NIH) recently issued a new data management and sharing policy to update the one currently in place.[14] Using ELNs to organize data lineage and traceability allows researchers to respond quickly to changes in traceability guidelines and data sharing policies while safeguarding data integrity. SciNote offers electronic signatures, audit trails, time stamps, and user roles and permissions to help labs meet the requirements of 21 CFR Part 11 as part of GLP compliance, while keeping the system flexible and easy to use.
Increase productivity and save time
Using SciNote can significantly improve productivity: according to SciNote’s return-on-investment estimates, individuals save approximately nine hours per week while achieving the same amount of work (Figure 2).[15] Project reports can be generated automatically, making the process 90% faster than manual alternatives. Additionally, using templates to visualize projects and their dependencies can make planning and scheduling up to 80% faster.[16] SciNote also offers other lab management tools, such as inventory management, that can further reduce the hours spent on routine tasks and improve productivity.
Building robust data lineage and traceability workflows into scientific research saves time and increases productivity, while supporting GLP compliance. SciNote ensures easy data access and visualization to minimize wasted time on costly repeat experiments that delay publication. By supporting best practices for data sharing and management, SciNote facilitates high-impact publication, collaboration and grant award success.
Learn more about how an ELN can support data management for publication and grant writing.
References
1. Boetto E, Golinelli D, Carullo G, et al. Frauds in scientific research and how to possibly overcome them. J Med Ethics. doi:10.1136/medethics-2020-106639
2. Data provenance & lineage: technical guidance on the tracing of data - Part 1. Support Centre for Data Sharing. https://eudatasharing.eu/technical-aspects/data-provenance-part-1. Accessed November 3, 2022.
3. Guidance on good manufacturing practice and good distribution practice: questions and answers. European Medicines Agency. https://www.ema.europa.eu/en/human-regulatory/research-development/compliance/good-manufacturing-practice/guidance-good-manufacturing-practice-good-distribution-practice-questions-answers. Accessed November 3, 2022.
4. Researcher involved in retracted Lancet study has faculty appointment terminated, as details in scandal emerge. STAT News. https://www.statnews.com/2020/06/07/researcher-involved-in-retracted-lancet-study-has-faculty-appointment-terminated-as-details-in-scandal-emerge/. Published June 7, 2020. Accessed November 3, 2022.
5. Research data management guide: NIH 2023 and European Commission policies. SciNote. https://www.scinote.net/nih-2023-policies-whitepaper-overview-235524/. Published January 19, 2022. Accessed November 4, 2022.
6. FAIR principles – key take away messages for researchers. SciNote. https://www.scinote.net/blog/fair-principles/. Published July 6, 2022. Accessed November 3, 2022.
7. ELN, LIMS, CDS, LES: What’s the difference? Technology Networks. https://www.technologynetworks.com/informatics/articles/eln-lims-cds-les-whats-the-difference-313834. Published January 11, 2019. Accessed November 3, 2022.
8. Scandura A, Iammarino S. Academic engagement with industry: the role of research quality and experience. J Technol Transf. 2022;47:1000–1036. doi:10.1007/s10961-021-09867-0
9. Preventing science laboratory fires. National Science Teaching Association. https://www.nsta.org/blog/preventing-science-laboratory-fires. Published February 27, 2019. Accessed November 3, 2022.
10. Woolston C. Lab leaders wrestle with paucity of postdocs. Nature. 2022;10. doi:10.1038/d41586-022-02781-x
11. Good laboratory practice (GLP). Organisation for Economic Co-operation and Development. https://www.oecd.org/chemicalsafety/testing/good-laboratory-practiceglp.htm. Accessed November 3, 2022.
12. Part 11, electronic records; electronic signatures - scope and application. U.S. Food and Drug Administration. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application. Published August 24, 2018. Accessed November 3, 2022.
13. Regulations: good clinical practice and clinical trials. U.S. Food and Drug Administration. https://www.fda.gov/science-research/clinical-trials-and-human-subject-protection/regulations-good-clinical-practice-and-clinical-trials. Published January 1, 2020. Accessed November 3, 2022.
14. The new NIH data sharing policy 2023 – why sharing is caring. SciNote. https://www.scinote.net/blog/the-2023-nih-data-sharing-policy/. Published August 10, 2022. Accessed November 3, 2022.
15. Return on investment when implementing an electronic lab notebook. SciNote. https://www.scinote.net/blog/return-on-investment-when-implementing-an-eln-in-your-lab/. Accessed November 4, 2022.
16. Managing work for scientific laboratories. SciNote. https://www.scinote.net/solutions-for-labs/managing-work/#creating-templates. Published October 6, 2022. Accessed November 4, 2022.