

LipidFinder: An Open-Source Python Workflow for Novel Lipid Discovery


Obtaining precise, high-quality lipidomic (or metabolomic) datasets comes with its challenges. One that I am sure comes to mind is minimizing, or better still eliminating, the large numbers of artefacts that could otherwise hinder your mass spectrometry data analysis and compromise accurate interpretation.

There are many existing computational workflows available to help researchers handle metabolomics datasets; however, these tend to be aimed at the investigation of known lipids, or have been commercially developed, meaning customisation and user modification are not necessarily possible [1].

Researchers within Prof. Valerie O’Donnell's group at Cardiff University, UK have developed a Python-based computational workflow that is tailored to the analysis of large volumes of data and specifically aimed at the identification of novel lipids. In this interview, Prof. O’Donnell explains the reasoning behind LipidFinder and the impact this tool has had on her group’s research.

LM: One of your recent publications focuses on the development of the open-source Python-based workflow, LipidFinder. Could you tell us a bit about this new tool?

A key fundamental research question, to which there is still no clear answer, is how many lipids a mammalian cell contains. Our view is that mining the entire lipidome of cells, particularly under conditions relevant to disease, will provide opportunities for discovering new lipids that are biologically relevant, and ultimately these could be used as biomarkers or therapeutic targets for inflammation.

LipidFinder was written by Chris Brasher, currently a PhD student in my laboratory, as a Python programme that aims to effectively clean up large files of high-resolution mass spectrometry (MS) data. When lipid extracts from cells or plasma are analyzed using long chromatography runs along with high-resolution MS, they contain up to 60K signals, of which only about 3–5K are likely to represent actual lipids. The rest of the data is background noise, or junk, that needs to be removed. This is a mammoth task that requires bespoke informatics tools. There are open access tools available, for example the widely used and excellent XCMS, which provide some clean-up functionality; however, these are better suited to looking for known lipids than for the vast number of unknowns. Many investigators don’t need to worry about these artefacts since they ignore them in their analysis, but for us they are a significant problem.

Our programme can be used to follow on from XCMS, to remove many more artefacts and give greater confidence that one is looking at real lipids. One innovation of LipidFinder is the optimiser workflow, which applies machine-learning methods to the operator’s own data to optimize the various parameters required for effective data analysis. The workflow is already on GitHub and is open access; however, it is currently being adapted to an interface on LIPID MAPS and should be available in that format in the next few months. Funded by an ERC grant, Dr Jorge Alvarez-Jarretta is also adding further modules to LipidFinder that will improve its clean-up capacity even more, and we will publish this soon as a second release.
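To make the idea of tuning clean-up parameters on the operator's own data more concrete, here is a minimal Python sketch. It is not LipidFinder's code or API: the feature table, the two thresholds (`min_intensity`, `min_blank_ratio`) and the helper names are all hypothetical, and a plain grid search scored by F1 stands in for the machine-learning optimiser described in the interview.

```python
from itertools import product

# Hypothetical toy feature table (invented values, for illustration only):
# (m/z, retention time in min, sample intensity, blank intensity, is_real_lipid)
FEATURES = [
    (782.569, 12.4, 9.0e5, 1.0e3, True),
    (760.585, 11.9, 4.0e5, 2.0e3, True),
    (303.233, 5.2, 6.0e4, 5.0e2, True),
    (391.284, 3.1, 8.0e5, 7.0e5, False),  # contaminant also seen in the blank
    (149.023, 1.0, 2.0e5, 1.9e5, False),  # contaminant also seen in the blank
    (512.100, 8.8, 3.0e3, 1.0e2, False),  # low-intensity noise
    (645.300, 9.7, 2.0e3, 5.0e1, False),  # low-intensity noise
]


def keep(feature, min_intensity, min_blank_ratio):
    """Keep a signal only if it is intense enough and well above the blank."""
    _, _, sample, blank, _ = feature
    return sample >= min_intensity and sample / max(blank, 1.0) >= min_blank_ratio


def f1_score(features, min_intensity, min_blank_ratio):
    """Score one parameter pair against the labelled signals."""
    tp = fp = fn = 0
    for feature in features:
        kept = keep(feature, min_intensity, min_blank_ratio)
        real = feature[4]
        if kept and real:
            tp += 1
        elif kept and not real:
            fp += 1
        elif real:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(features, intensities, ratios):
    """Try every threshold combination and return the best-scoring pair."""
    best = max(product(intensities, ratios),
               key=lambda params: f1_score(features, *params))
    return best, f1_score(features, *best)


best_params, best_f1 = grid_search(FEATURES, [1e3, 1e4, 1e5], [1.0, 3.0, 10.0])
cleaned = [f for f in FEATURES if keep(f, *best_params)]
print(best_params, best_f1, len(cleaned))
```

On this toy table the search settles on thresholds that discard the blank contaminants and the low-intensity noise while keeping the three labelled lipids. Real data would need far more signals and a validated reference set, in line with the point that such output is hypothesis-generating.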

LM: What was the main reasoning behind the development of LipidFinder? What key research successes have resulted from its implementation?

The first version of LipidFinder was originally written as a Microsoft Excel programme by Dr David Slatter, while he was a postdoctoral fellow in my group. At that time, we were interested in mapping the platelet lipidome and understanding how aspirin regulated platelet lipids, but we had no tools available to do this work. David had a long-standing interest in programming and wrote this early version. Using this, we published a paper in Cell Metabolism in 2016, which showed the approximate size and diversity of the platelet lipidome in three unrelated individuals (including myself!). This also allowed us to map >100 new lipids made by platelets, in particular oxidized phospholipids and fatty acids.

Obviously, this was a very slow and inefficient approach, and during that time we started to collaborate with computer scientists at Cardiff, eventually resulting in the Python version of this workflow. We see this approach as analogous to a gene array: the data is unvalidated and results in hypothesis generation, so findings made using this tool need rigorous validation, e.g. using targeted approaches and MS/MS experiments. Following on from the platelet study, we have been characterising lipids in human cardiovascular disease cohorts and furthering our studies on platelet lipidomic diversity, but these studies have not yet been published. Key questions we are keen to address include: how stable is the lipidome over time in the same people, and what is the influence of genetics versus environment in the control of platelet and cellular global lipidome composition?

LM: Is there a specific topic within the field that you may not have had the chance to explore thus far or would like to explore further?

Over the last 5 years, we have changed our way of working significantly, since the study of high-resolution datasets and mining lipidomes requires a team approach, including interdisciplinary skills such as informatics and statistics as well as lipid biology. Developing this type of approach has transformed how our research group works, but also presents significant challenges. We have now generated several large datasets that need time and thought applied to them, to effectively analyze and understand them. At this time, my priority is to finish these studies and publish our findings, and while new questions arise all the time, I am keen to make sure we don’t lose focus or overlook the data we currently have to hand.

Having said that, there are many topics and questions we are keen to follow up on. While my main interests have always resided in the lipids of circulating vascular cells, we are increasingly interested in how the same lipids regulate tissue biology, particularly the skin and brain. It is likely that our future direction will include studies addressing the formation and characterization of bioactive lipids in normal wound healing and in neurotransmission, but that’s for the future.

More information on the clinical development of a lipid, whose inflammatory properties were discovered by Prof. O’Donnell and colleagues, can be found here.


1. O’Connor, A., Brasher, C. J., Slatter, D. A., Meckelmann, S. W., Hawksworth, J. I., Allen, S. M., & O’Donnell, V. B. (2017). LipidFinder: A computational workflow for discovery of lipids identifies eicosanoid-phosphoinositides in platelets. JCI Insight, 2(7). doi:10.1172/jci.insight.91634