Proteomics is entering a new era where reproducibility, scalability, and data reuse define scientific impact. As datasets grow in both size and complexity, researchers face mounting challenges in tracking data provenance, standardizing workflows, and ensuring long-term accessibility.
In this interview, Dr. Yasset Perez-Riverol, team coordinator of proteomics services at EMBL-EBI, explores how building reproducible proteomics pipelines with modern workflow engines such as Nextflow, together with software packaging and distribution tools such as Bioconda and BioContainers, enables data robustness and reusability without sacrificing flexibility.
Watch this video to discover:
- How workflow engines and containers drive true reproducibility in proteomics
- Practical strategies for ensuring FAIR proteomics data through data provenance tracking
- Why open-source ecosystems accelerate large-scale data reuse
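To make the first point concrete: in a workflow engine such as Nextflow, each analysis step can declare the exact container image it runs in, so the software environment travels with the pipeline definition. The sketch below is illustrative only and is not taken from the video; the process name, file names, and container tag are hypothetical placeholders, though the pattern of pinning a BioContainers image per process is the standard Nextflow idiom.

```nextflow
// Minimal sketch: a Nextflow process pinned to a BioContainers image.
// Anyone re-running the pipeline pulls the same tool version, which is
// what makes the analysis reproducible across machines and years.
// NOTE: process name, paths, and the container tag are illustrative.

process extractSpectrumInfo {
    // Pin an exact, versioned image from the BioContainers registry
    // (replace the tag with the real one for your tool/version).
    container 'quay.io/biocontainers/openms:3.1.0--example_tag'

    input:
    path mzml_file

    output:
    path 'file_info.txt'

    script:
    """
    FileInfo -in ${mzml_file} > file_info.txt
    """
}
```

Because the container reference lives inside the workflow itself, the pipeline and its software environment are versioned together, which is the core of the reproducibility argument made in the interview.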
Further resources:
- Paper: Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines
- Paper: OpenMS 3 Enables Reproducible Analysis of Large-Scale Mass Spectrometry Data
- Paper: quantms: a Cloud-Based Pipeline for Quantitative Proteomics Enables the Reanalysis of Public Proteomics Data
- Presentation: quantms: Nextflow Proteomics in the Cloud
- Workflow Guide: quantms: A Cloud-Based Workflow for Peptide and Protein Quantification