Founded by Professor Henry Markram in May 2005, the École Polytechnique Fédérale de Lausanne’s (EPFL) Blue Brain Project is attempting to reverse-engineer the rodent brain (and ultimately, the human brain) and recreate it at the cellular level inside a computer simulation. The goal of the project is to gain a complete understanding of the brain and, in the long term, to enable better and faster development of treatments for brain diseases. Scientists at the project have adopted an innovative approach. First, brain imaging, electrophysiology and neuron morphology data are acquired in the lab. Next, to extract the maximum possible information from these data, the project has established a set of workflows supporting the acquisition, curation, databasing, post-processing and mining of its data. This second step is what sets Blue Brain apart; the team calls it data-driven simulation. By creating supercomputer-based reconstructions and simulations from experimental data, they believe they can fast-track our understanding of the brain. By exploiting the interdependencies in the experimental data they collect, they aim to obtain dense maps of the brain without measuring every detail of its multiple levels of organisation.
To facilitate this approach, Blue Brain has a large neuroinformatics capability: a team responsible for building the platforms and systems that enable scientists at the project to store, analyse and process their data in a unified infrastructure.
To find out more about the key role informatics plays in the project we spoke to Samuel Kerrien, Section Manager of Neuroinformatics Software Engineering and Data and Knowledge Engineering, within the Computing Division, at Blue Brain.
Jack Rudd (JR): To start us off, what is neuroinformatics and what role does it play at Blue Brain?
Samuel Kerrien (SK): First, we should step back and look at what neuroinformatics is generally considered to be. In broad terms, it’s really concerned with organising neuroscience data and providing tools to facilitate the analysis of this data. At Blue Brain, our purpose is to support efforts to model the brain and neural activity. Across the project we have adopted a data-driven simulation approach, where we acquire data from the lab and then process it to generate insightful simulations. This data comes in several forms. For instance, we record the electrical activity of individual neurons to see how they react to stimulation. We record the morphologies of neurons so that we can visualise them in 3D space. And we acquire brain imagery that allows us to map the brain in 3D, which we then use to realign our data and position it in our brain simulations.
We have already developed some single-neuron mathematical models that are able to represent and simulate the electrical activity we’ve previously recorded. Through doing this, we have proven that we can learn a lot more from the data we’ve recorded in the lab. Based on these individual models, whole circuits of neurons can be recreated. Using these single neurons and circuits, we can try to position the neurons in virtual space to create realistic models of parts of the brain. These models show how the neurons connect to each other through synapses, so we can trace how they communicate. Next, we embed our mathematical models into these neurons to run a simulation of the whole circuit, observe how the neurons interact with each other and record all the activity. The process runs from gathering data in the lab, to generating very simple models of single neurons, to building full assemblies of neurons, running them on the supercomputer and then collecting the data coming out of the simulations.
JR: What do you do once you’ve produced and collected all this data?
SK: The next phase is to validate the output of these simulations to see if it’s as realistic as we hoped it would be. To carry out this validation step, we go back to the lab to collect more experimental data, but this time we make it as closely related as possible to the simulation we are trying to validate. By comparing the new data with the model, we can figure out whether the model is as good as we were hoping or whether we need to bolster specific areas. If so, we search for additional data to strengthen the weak aspects of the model. Once we’ve made some changes, the whole process starts again: this is what we call the data-driven simulation loop.
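The comparison step of this loop can be illustrated with a minimal sketch. It assumes, purely for illustration, that simulated and experimental results have been reduced to named scalar features and that a feature counts as a "weak aspect" when its relative error exceeds a tolerance. The feature names, values and the 10% tolerance are hypothetical and are not Blue Brain’s actual validation criteria.

```python
def relative_errors(simulated: dict, experimental: dict) -> dict:
    """Relative error of each simulated feature against its experimental value."""
    errors = {}
    for name, target in experimental.items():
        if name not in simulated:
            continue  # the simulation did not produce this feature; nothing to compare
        if target == 0:
            raise ValueError(f"experimental value for {name!r} is zero")
        errors[name] = abs(simulated[name] - target) / abs(target)
    return errors


def weak_features(simulated: dict, experimental: dict, tolerance: float = 0.10) -> list:
    """Features whose relative error exceeds the (hypothetical) tolerance: candidates for new lab data."""
    errors = relative_errors(simulated, experimental)
    return sorted(name for name, err in errors.items() if err > tolerance)


# Hypothetical summary features; names and numbers are invented for illustration.
experimental = {"firing_rate_hz": 8.0, "spike_amplitude_mv": 80.0, "input_resistance_mohm": 150.0}
simulated = {"firing_rate_hz": 8.4, "spike_amplitude_mv": 62.0, "input_resistance_mohm": 152.0}

print(weak_features(simulated, experimental))  # ['spike_amplitude_mv']
```

In a real loop, the flagged features would guide which new experiments to run before the model is refined and re-simulated.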
Neuroinformatics is at the heart of this approach, as we need to organise every single step. The data must be stored in a system where it can easily be looked up later and fetched using very specific criteria. An individual may only be interested in electrophysiology data from a very specific kind of neuron, so the search system must be very precise. Everything is organised in a system we call the ‘Knowledge Graph’, where all the files generated during each step of the process, and the precisely detailed metadata describing them, are recorded and made available. This allows our users to build scientific queries, ask questions of the system and search for relevant data whenever they need to.
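What "fetched using very specific criteria" means can be sketched with a toy in-memory index, assuming each file is registered with a flat metadata record. The field names (`data_type`, `species`, `cell_type`) and file names are hypothetical; the actual Knowledge Graph stores richer, linked metadata and is queried through its own services.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A registered data file plus the metadata describing it (hypothetical schema)."""
    file: str
    metadata: dict = field(default_factory=dict)


def search(records, **criteria):
    """Return the records whose metadata matches every key/value criterion exactly."""
    return [r for r in records if all(r.metadata.get(k) == v for k, v in criteria.items())]


# Hypothetical registered files and metadata.
records = [
    Record("trace_001.nwb", {"data_type": "electrophysiology", "species": "rat", "cell_type": "L5 pyramidal"}),
    Record("trace_002.nwb", {"data_type": "electrophysiology", "species": "rat", "cell_type": "basket cell"}),
    Record("morph_001.swc", {"data_type": "morphology", "species": "rat", "cell_type": "L5 pyramidal"}),
]

hits = search(records, data_type="electrophysiology", cell_type="L5 pyramidal")
print([r.file for r in hits])  # ['trace_001.nwb']
```

Adding criteria narrows the result set, which is the precision the interview describes for someone interested in one kind of neuron.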
JR: How does the data integration platform support your data-driven simulation approach?
SK: Our data integration platform, Blue Brain Nexus, which focuses on the integration of neuroscience data, can be broken down into a few key components. The first is what we call the ‘Data Space’, which is essentially a federation of data storage. The data are very diverse and come from a variety of laboratories and research groups across the project. They are scattered across several storage facilities, most of which are geographically distributed. Blue Brain Nexus is set up to keep track of where these data are stored and improve their digital accessibility without having to physically move them all into one place.
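The idea of a storage federation, knowing where data live without moving them, can be sketched as a catalogue that maps dataset identifiers to storage locations. The identifiers, URLs and helper names below are hypothetical and do not reflect how Blue Brain Nexus actually implements its Data Space.

```python
from collections import defaultdict
from urllib.parse import urlparse


def register(catalogue: dict, dataset_id: str, location: str) -> None:
    """Record where a dataset lives; nothing is copied or moved."""
    catalogue[dataset_id] = location


def locate(catalogue: dict, dataset_id: str) -> str:
    """Resolve a dataset identifier to its storage location."""
    return catalogue[dataset_id]


def sites(catalogue: dict) -> dict:
    """Group dataset identifiers by the storage system (scheme + host) that holds them."""
    grouped = defaultdict(list)
    for dataset_id, location in catalogue.items():
        parsed = urlparse(location)
        grouped[f"{parsed.scheme}://{parsed.netloc}"].append(dataset_id)
    return dict(grouped)


# Hypothetical datasets held at three different sites.
catalogue = {}
register(catalogue, "dataset/ephys-2017-03", "s3://lab-a-bucket/ephys/2017-03/")
register(catalogue, "dataset/imaging-v2", "file:///gpfs/site-b/imaging/v2/")
register(catalogue, "dataset/morphologies", "https://repo.site-c.example/morph/")

print(locate(catalogue, "dataset/imaging-v2"))
print(sites(catalogue))
```

Only pointers are centralised; each dataset stays in the facility that produced it.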
The next component is a very important one, and one we have already mentioned: the Knowledge Graph. This is how you define the domain you’re working in. In my case, I work in neuroscience, so I want to deal with specimens like mice, rats or humans. You define a domain through a specific set of entities that can be assigned to the things you are dealing with: an animal, a brain slice, a neuron, a dataset. The Knowledge Graph is the infrastructure that organises these entities and gives your science structure and order. It also allows you to register all the data and metadata you need to properly record provenance, a crucial element in data-driven science. We must accurately record where an entity comes from, who generated it and using what protocols.
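Recording provenance ("where an entity is coming from, who generated it and using what protocols") can be sketched as entities linked by "derived from" edges, loosely in the spirit of the W3C PROV model. The field names, lab and protocol labels are hypothetical; this is not the Knowledge Graph’s actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One node in a toy provenance graph (hypothetical fields, not a real schema)."""
    id: str
    kind: str                 # e.g. "animal", "brain slice", "neuron", "dataset"
    generated_by: str         # who produced it (hypothetical agent name)
    protocol: str             # how it was produced (hypothetical protocol label)
    derived_from: tuple = ()  # ids of the entities it was derived from


def lineage(graph: dict, entity_id: str) -> list:
    """Follow derived_from links back to the source specimen, visiting each id once."""
    seen, order, stack = set(), [], [entity_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(graph[current].derived_from)
    return order


entities = [
    Entity("rat-42", "animal", "Lab A", "husbandry protocol P1"),
    Entity("slice-7", "brain slice", "Lab A", "slicing protocol P2", ("rat-42",)),
    Entity("neuron-3", "neuron", "Lab A", "patch-clamp protocol P3", ("slice-7",)),
    Entity("trace-1", "dataset", "Lab A", "recording protocol P4", ("neuron-3",)),
]
graph = {e.id: e for e in entities}

for entity_id in lineage(graph, "trace-1"):
    e = graph[entity_id]
    print(f"{e.id} ({e.kind}) generated by {e.generated_by} using {e.protocol}")
```

Starting from a dataset, the walk recovers the neuron, slice and animal it came from, together with who produced each and how.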
We’ve also built what is known as the ‘Atlas Space’ component. This part of the platform allows us to take brain imagery data from the lab and piece together the 3D spatial organisation of the brain. You could take a specific individual, say a particular rat, image its brain and produce a very accurate 3D volume, but all brains are subtly different. So, how do you get all of the scientists in the field to agree on a common coordinate space in which to locate each individual neuron? This is what we are trying to achieve with Atlas Templates: we are encouraging scientists to come together to build reference coordinate spaces for specific species. The Allen Institute for Brain Science, for instance, produces its Common Coordinate Framework Atlas for the mouse. This is an Atlas Template that we can then use at Blue Brain to organise our data. If we record our data within the constraints of these templates, other scientists can do the same and more readily compare their data with ours.
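Placing data from an individual brain into a shared template means applying a spatial transform to its coordinates. As a deliberately simplified sketch, assume the mapping for one animal is a single affine transform, p_template = A·p + b; real atlas registration usually also involves non-linear warps. The matrix, offset and soma position below are made-up values in micrometres.

```python
def to_template(point, matrix, offset):
    """Map an (x, y, z) point from an individual brain into template space: A @ p + b."""
    return tuple(
        sum(matrix[row][col] * point[col] for col in range(3)) + offset[row]
        for row in range(3)
    )


# Hypothetical transform for one animal: slight per-axis scaling plus a shift.
A = [[1.05, 0.0, 0.0],
     [0.0, 0.98, 0.0],
     [0.0, 0.0, 1.02]]
b = [-120.0, 35.0, 10.0]

neuron_soma = (4000.0, 2500.0, 3000.0)  # hypothetical position in the animal's own image space
print(to_template(neuron_soma, A, b))
```

Once every animal has its own transform into the same template, neurons recorded in different brains can be compared at shared coordinates.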
There are two main approaches to aligning data and creating references. You can do it simply through coordinate space, an XYZ approach that tells you where your data is in space. The other way is through ontologies. When a group of scientists builds a template, they can also delineate all the regions of a given brain, a process we call parcellation. Brain parcels are identified, delineated and tagged with descriptive information, all of which is structured into an ontology. This makes it possible to determine each region’s relationship to its parent regions and gives the data a hierarchical structure. It’s not as accurate as pinpointing a specific coordinate, but it is still extremely informative.
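The value of a hierarchical parcellation is that a query for a broad region can also match data annotated with any of its sub-regions. The sketch below assumes a child-to-parent mapping; the region names are a simplified, abridged hierarchy chosen for illustration and are not an actual atlas ontology, and the dataset annotations are hypothetical.

```python
# Simplified, abridged region hierarchy (child -> parent); illustrative only.
PARENT = {
    "Cerebrum": "Brain",
    "Isocortex": "Cerebrum",
    "Somatosensory areas": "Isocortex",
    "Primary somatosensory area": "Somatosensory areas",
    "Visual areas": "Isocortex",
    "Hippocampal formation": "Cerebrum",
}


def ancestors(region: str) -> list:
    """All parent regions of a region, nearest first."""
    chain = []
    while region in PARENT:
        region = PARENT[region]
        chain.append(region)
    return chain


def within(region: str, area: str) -> bool:
    """True if a region is the given area or one of its descendants."""
    return region == area or area in ancestors(region)


# Hypothetical datasets annotated with the finest region they were recorded in.
datasets = {
    "trace_001": "Primary somatosensory area",
    "trace_002": "Visual areas",
    "trace_003": "Hippocampal formation",
}

print(sorted(d for d, r in datasets.items() if within(r, "Isocortex")))  # ['trace_001', 'trace_002']
```

A dataset tagged only with "Primary somatosensory area" is still found by a search for "Isocortex", which is what makes ontology-based alignment informative even without exact coordinates.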
The fourth component used in Blue Brain Nexus is the community encyclopaedia ‘Knowledge Space’, to which we also contribute. This is where neuroscience concepts are linked to ontologies, and it is not limited to brain regions: it can be just about anything, from the classification of cell types, such as a neuron type or morphological type, to anything else that scientists record in their ontologies. The Knowledge Space organises these concepts and links them to the data available around the world. For instance, it links to literature that specifically mentions particular brain regions or datasets, or to data repositories such as neuromorpho.org or neuroelectro.org. Scientists looking for relevant data to drive their science forward can refer to these well-known repositories.
The final component is the one that really ties everything together. APIs, or Application Programming Interfaces, are what enable everyone to write software that gives direct access to all these components. Rather than opening a browser, clicking and analysing things, you can write intelligent tools that connect to the Knowledge Graph, ask a question, get back some data, do some processing and register some information back to the Knowledge Graph. APIs are what make all this possible.
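The query, process and register-back cycle described here can be sketched against a stand-in client. `InMemoryKnowledgeGraph`, its `register` and `query` methods and the property names are all hypothetical and are not the Blue Brain Nexus API; a real tool would call the platform’s own API instead.

```python
from statistics import mean


class InMemoryKnowledgeGraph:
    """Hypothetical stand-in for a knowledge-graph API client (not the Blue Brain Nexus API)."""

    def __init__(self):
        self._resources = {}

    def register(self, resource_id, **properties):
        """Store a resource and its properties under an identifier."""
        self._resources[resource_id] = dict(properties)
        return resource_id

    def query(self, **criteria):
        """Return resources whose properties match every criterion exactly."""
        return {rid: props for rid, props in self._resources.items()
                if all(props.get(k) == v for k, v in criteria.items())}


kg = InMemoryKnowledgeGraph()
kg.register("trace-1", type="Trace", cell_type="basket cell", firing_rate_hz=22.0)
kg.register("trace-2", type="Trace", cell_type="basket cell", firing_rate_hz=18.0)
kg.register("trace-3", type="Trace", cell_type="L5 pyramidal", firing_rate_hz=6.0)

# 1. Ask a question: which traces come from basket cells?
traces = kg.query(type="Trace", cell_type="basket cell")
# 2. Do some processing on the returned data.
summary = mean(props["firing_rate_hz"] for props in traces.values())
# 3. Register the result back, recording which resources it was derived from.
kg.register("summary-1", type="Summary", cell_type="basket cell",
            mean_firing_rate_hz=summary, derived_from=sorted(traces))

print(kg.query(type="Summary"))
```

Registering the derived result with a link to its inputs keeps the provenance chain intact for whoever queries it next.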
Samuel Kerrien was speaking to Jack Rudd, Senior Editor for Technology Networks.