Embracing Data-Driven Modeling Approaches Into Biopharmaceutical Processing



Predicting cell productivity, product quality attributes and process deviations is critical in bioprocessing, and considerable effort is going into building models that support exploration and prediction during bioprocess operation. Here we explore new workflows that use data-driven modeling to improve culture media, and new control algorithms that allow better modulation within the bioprocess design space.

What are data-driven models?

In bioprocessing, data-driven models describe the statistical relationship between the input and output parameters of a bioprocess and use that relationship to predict one from the other. They rely entirely on the data used to build them. This is different from mechanistic modeling, where the focus is on improving process understanding by applying a first-principles approach to track dynamic changes during bioprocess operation. The two can also be combined in hybrid modeling, in which data-driven models substitute for unknown equations to fill gaps in understanding.

Producing a data-driven model usually starts with design of experiments (DoE): a statistical design is chosen and then carried out through planned experimentation to produce a dataset. The dataset is then analyzed to evaluate the main effects and interactions between the input and output variables. Typically, this is done using linear regression, response surface models, multivariate analysis or machine learning.
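As a minimal sketch of the DoE-then-regression workflow described above, the example below fits a linear model with one interaction term to a two-level full factorial design. The factor names (`temperature`, `feed_rate`) and the titer values are invented for illustration and do not come from any study cited here. Because the design is orthogonal, each least-squares coefficient reduces to a simple contrast average.

```python
from itertools import product

# Hypothetical two-level full factorial design in coded units
# (-1 = low setting, +1 = high setting) for two illustrative inputs.
factors = ["temperature", "feed_rate"]
design = list(product([-1, 1], repeat=len(factors)))

# Made-up titer responses (g/L), one per run, for illustration only.
titer = [1.0, 2.0, 3.0, 6.0]


def coefficient(contrast, response):
    """Least-squares coefficient for one column of an orthogonal two-level design."""
    return sum(c * y for c, y in zip(contrast, response)) / len(response)


# Intercept is the grand mean of the responses.
intercept = sum(titer) / len(titer)

# One coefficient per input factor (half of the classical "main effect").
coefficients = {
    name: coefficient([run[i] for run in design], titer)
    for i, name in enumerate(factors)
}

# Coefficient of the two-factor interaction term.
interaction = coefficient([run[0] * run[1] for run in design], titer)


def predict(temperature, feed_rate):
    """Predicted titer at a coded operating point from the fitted model."""
    return (
        intercept
        + coefficients["temperature"] * temperature
        + coefficients["feed_rate"] * feed_rate
        + interaction * temperature * feed_rate
    )
```

With four runs and four coefficients the model is saturated, so `predict` reproduces the training data exactly. A real study would add replicates or center points so that the fit can be checked against experimental noise.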

Data-driven model-based framework to rationally improve cell culture media  

Cell culture medium constitutes the highest cost in bioprocessing. Commercially available media are used to increase productivity and product quality, which is why considerable resources go into medium screening and development for cell lines.1

Typical strategies for culture medium development rely largely on empirical and statistical approaches such as DoE and multivariate data analysis (MVDA). Nutrients are blended to produce formulations, the formulations are evaluated for performance, statistical analysis identifies the critical medium components, and the result is validated experimentally using DoE. MVDA can then be used to capture hidden correlations among the process variables. However, this framework is a black-box approach and does not consider how media components may influence cellular behavior. To overcome this, an in silico model-guided approach using flux balance analysis (FBA) of genome-scale metabolic models (GEMs) can be exploited to describe mechanistically the metabolic behaviors that are related to culture conditions.
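For readers unfamiliar with FBA, its standard textbook formulation is a linear program. The symbols below are conventional notation rather than anything taken from the work discussed here: S is the stoichiometric matrix of the metabolic model, v the vector of reaction fluxes, c a weight vector that selects the objective (often the biomass reaction), and v^lb and v^ub the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

The constraint S v = 0 expresses a steady-state mass balance on every intracellular metabolite. Culture conditions typically enter through the bounds on the uptake (exchange) reactions, which is how a medium composition can be connected to a predicted flux distribution.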

A framework for integrating cell culture and metabolite data with a model to build a rich data-driven system was published in Metabolic Engineering recently by Dr. Dong-Yup Lee, an associate professor of chemical engineering at Sungkyunkwan University in South Korea.2 Here, Dr. Lee explores a systematic media design framework for the knowledge-based identification of target media components.

The framework comprises six steps:

1. Cell culture and metabolite data collection, in which the cultures are run, and the necessary culture profiles are collected

2. Data processing and elemental balancing, where the data is processed into cell-specific rates, and the inputs and outputs are also checked

3. Multivariate statistical analysis to interpret relationships between process conditions and performance

4. Pathway enrichment analysis, where relevant metabolic pathways are suggested for the model-guided analysis

5. In silico modeling and flux prediction, in which relevant metabolic states can be described based on flux distribution

6. Identification of media component targets from highly coupled or decoupled reactions.
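Step 2 above converts measured culture profiles into cell-specific rates. The sketch below shows one common way to do this for a simple batch interval: divide the concentration change by the integral of viable cell density over time. This is an assumed convention, not necessarily the calculation used in the published framework; the function names and sample values are hypothetical, and feeds, bleeds and volume changes are ignored.

```python
def integral_viable_cells(times, viable_cells):
    """Trapezoidal integral of viable cell density over time."""
    return sum(
        0.5 * (viable_cells[i] + viable_cells[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def specific_rate(times, viable_cells, concentrations):
    """Average cell-specific rate over the sampled interval.

    Negative values mean net consumption, positive values net production.
    Assumes a batch interval with no feeding, dilution or volume change.
    """
    ivc = integral_viable_cells(times, viable_cells)
    return (concentrations[-1] - concentrations[0]) / ivc


# Made-up example: time in days, viable cells in 1e6 cells/mL, glucose in mM.
days = [0.0, 1.0, 2.0]
vcd = [1.0, 2.0, 3.0]
glucose = [30.0, 26.0, 20.0]

# With these units the result is in pmol per cell per day.
q_glucose = specific_rate(days, vcd, glucose)
```

Here the integral of viable cells is 4.0 and the glucose concentration falls by 10 mM, so `q_glucose` is -2.5 pmol per cell per day. Fed-batch or perfusion data would need extra terms for feed additions and dilution.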

To showcase the framework as a proof of concept, Dr. Lee’s team took two monoclonal antibody (mAb)-producing CHO cell lines and cultured each in two different basal media, giving four combinations to compare and evaluate. During the culture, they monitored the process and analyzed residual metabolites, including amino acids, nucleotides and intermediates of the TCA cycle, glycolysis and the pentose phosphate pathway.

“Using the framework, we were able to statistically narrow the focus down to the TCA cycle as a media-dependent enrichment pathway and rationally identify some downregulated reactions leading to insufficient energy synthesis. We suggested the addition of Q10 replenishment to debottleneck the system,” explains Lee. Coenzyme Q10 is an essential component of the mitochondrial electron transport chain, and supplementation has been shown in the literature to enhance energy metabolism, improving cell growth and production in CHO cell culture.3,4 To validate this, the team ran cell cultures with varying concentrations of added Q10 and found that, under conditions where cofactor regeneration was insufficient, Q10 did indeed help debottleneck the system and led to an increase in viable cell densities.

“This systematic framework can be very useful for the industry in terms of research and development. There is a need today for companies to better understand the underlying mechanisms and to rationally engineer them to be more optimal producers,” says Lee. This can be done today by incorporating data into models such as this one to better elucidate how the culture medium influences the cell line, as this work demonstrates.

A process-aware data-driven model allows for bioreactor predictive control

Bioreactors are an essential part of the biopharmaceutical industry as they are a critical platform for the mass production of biotherapeutics, such as mAbs. Their scalability allows biologics to be produced at small scale, a few hundred milliliters for R&D purposes, up to thousands of liters for mass production. To produce these biotherapeutics, it is vital to control the process; this is done with control algorithms that relate the known inputs and outputs of a bioreactor to ensure that the process runs as intended. Within a bioreactor, the inputs can be the feeding rate, dissolved oxygen (DO), pH and sparging rate, for example, while the outputs include cell density, viability and mAb product.

Further complexity is added when a bioreactor runs in a continuous mode, such as perfusion. This allows for better uniformity and productivity, as the product is harvested continually, in contrast to a traditional fed-batch process, in which the product is only harvested at the end of a campaign. However, a perfusion set-up requires stricter process control because the system needs to be held in a semi-steady state, which can be challenging to optimize.

The traditional control strategy in industrial processing is proportional-integral (PI) control. More sophisticated approaches, such as model predictive control (MPC), allow one to calculate an optimal input trajectory to meet desired output values.5 However, MPC is not commonly used because of the sensitive nature of cells and the reliance on set batch recipes. Closing this gap is something that the Mhaskar group is interested in.
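To make the idea of computing an optimal input trajectory concrete, here is a toy receding-horizon controller. Everything in it is an assumption made for illustration: the first-order model and its coefficients `A` and `B`, the setpoint, the grid of allowed inputs and the horizon length. A brute-force search over input sequences stands in for the numerical optimizer that a practical MPC implementation would use.

```python
from itertools import product

# Hypothetical first-order process model: y[k+1] = A * y[k] + B * u[k]
A, B = 0.9, 0.1
SETPOINT = 1.0
CANDIDATE_INPUTS = [0.0, 0.5, 1.0, 1.5, 2.0]  # allowed input levels (input constraint)
HORIZON = 3  # number of future steps considered at each decision


def predict(y, inputs):
    """Roll the model forward over a candidate input sequence."""
    outputs = []
    for u in inputs:
        y = A * y + B * u
        outputs.append(y)
    return outputs


def mpc_step(y):
    """Return the first move of the input sequence with the lowest tracking cost."""

    def cost(inputs):
        return sum((y_pred - SETPOINT) ** 2 for y_pred in predict(y, inputs))

    best_sequence = min(product(CANDIDATE_INPUTS, repeat=HORIZON), key=cost)
    return best_sequence[0]


def simulate(steps=30, y0=0.0):
    """Closed-loop run in which the plant is assumed to match the model exactly."""
    y, inputs, outputs = y0, [], []
    for _ in range(steps):
        u = mpc_step(y)      # re-optimize at every step (receding horizon)
        y = A * y + B * u    # apply only the first move to the plant
        inputs.append(u)
        outputs.append(y)
    return inputs, outputs


inputs, outputs = simulate()
```

In this sketch the controller first applies the largest allowed input to move quickly toward the setpoint, then backs off as the output approaches it, while never leaving the allowed input range. A PI controller, by contrast, reacts only to the current and accumulated error and has no built-in notion of input limits or future behavior.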

Dr. Prashant Mhaskar is a professor in the Department of Chemical Engineering at McMaster University in Ontario, Canada. The Mhaskar group works in two fields: non-linear process control, and data-driven modeling and control. His group recently published work on process-aware data-driven modeling and control of a mAb perfusion bioprocess. The work does two things: (1) it builds a data-driven model and (2) it formulates a model predictive controller to demonstrate the use of this system versus traditional PI control.6

“A process-aware model looks to incorporate information, whether quantitative or qualitative, into a model, instead of having what we refer to as a black box. For example, if a relationship is known between glucose feeding and product titer to have a positive correlation, we want to make sure you build a model that represents that constraint,” says Mhaskar.

“In the model, we can define the gains, which are the relationships between input and output variables within the system. If a relationship is known, the gain can be constrained between those two variables to create process knowledge or process awareness. This creates a better outcome for the control system and a more robust system,” he adds. In other words, the model is built with constraints that reflect known bioprocess behavior, which keeps the controller from straying toward harsh input values that may cause problems for the cells’ metabolism or the quality of the biologic being produced. This is achieved by taking experimental data with known outcomes, training the model on these inputs and outputs, and using validation data to confirm the model’s predictive ability before applying it for control.
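The idea of constraining a gain can be illustrated with a deliberately simplified sketch. It fits a single steady-state gain between changes in an input and changes in an output by least squares, then keeps that gain inside bounds that encode prior process knowledge, such as "more glucose feed should not lower titer". The function name, the bounds and the data are hypothetical, and the published work uses a more elaborate dynamic model than this one-parameter fit.

```python
def fit_gain(input_changes, output_changes, lower, upper):
    """Least-squares gain g in output_change ~ g * input_change, kept within [lower, upper]."""
    numerator = sum(u * y for u, y in zip(input_changes, output_changes))
    denominator = sum(u * u for u in input_changes)
    unconstrained = numerator / denominator
    # For a single parameter, clipping the unconstrained estimate gives the
    # exact solution of the bounded least-squares problem.
    return min(max(unconstrained, lower), upper)


# Made-up noisy data: changes in glucose feed and the observed changes in titer.
feed_changes = [1.0, -1.0, 2.0]
titer_changes = [-0.5, 0.2, 0.1]

# Process knowledge (assumed): the gain from feed to titer is non-negative.
gain = fit_gain(feed_changes, titer_changes, lower=0.0, upper=10.0)
```

With this noisy data an unconstrained fit would give a slightly negative gain, which contradicts the assumed process knowledge; the constrained fit returns 0.0 instead. With more than one gain, clipping each estimate separately is no longer exact and a bounded least-squares solver would be needed.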

“One main tangible finding we had was the ability to run our process more efficiently than the traditional PI control system, which also allowed us higher antibody productivity. This method is quite versatile in its application and can also be used for other cell lines,” explains Mhaskar. While higher productivity is one tangible output, the real significance lies in the technique that was developed. Once optimized for a particular process, this type of technology can be applied to different cell types and will allow for better bioprocess control, not only at small scale but also at industrial scale, provided the issues that arise with scale-up are kept in mind.

Looking ahead, the Mhaskar group aims to incorporate not only the outputs of the process but also product quality into its model design, which should enable therapeutics with more homogeneous post-translational modifications, a hot topic in continuous bioprocessing.

Outlook for bioprocess modeling

Unsurprisingly, there is still a great need to optimize current manufacturing practices. This is true for all aspects, from cell line development to medium development to upstream and downstream bioprocessing. However, solutions are being developed and enabled through process systems engineering. While some models are quite sophisticated in explaining the underlying complex biological interactions, and others can simulate and predict certain aspects of a bioprocess, a comprehensive framework that incorporates both product and process design is still lacking. The use of models will help enable a mechanistic, first-principles understanding of the bioprocess and of how cells produce these biologics, and will help elucidate better manufacturing strategies along the bioprocess lifecycle.


1. Ritacco FV, Wu Y, Khetan A. Cell culture media for recombinant protein expression in Chinese hamster ovary (CHO) cells: History, key components, and optimization strategies. Biotechnol Prog. 2018;34(6):1407-1426. doi: 10.1002/btpr.2706

2. Hong JK, Choi DH, Park SY, et al. Data-driven and model-guided systematic framework for media development in CHO cell culture. Metab Eng. 2022;73:114-123. doi: 10.1016/j.ymben.2022.07.003

3. Konno Y, Aoki M, Takagishi M, et al. Enhancement of antibody production by the addition of Coenzyme-Q 10. Cytotechnology. 2011;63(2):163-170. doi: 10.1007/s10616-010-9330-9

4. Noh YH, Kim KY, Shim MS, et al. Inhibition of oxidative stress by coenzyme Q10 increases mitochondrial mass and improves bioenergetic function in optic nerve head astrocytes. Cell Death Dis. 2013;4(10):1-12. doi: 10.1038/cddis.2013.341

5. Rathore AS, Mishra S, Nikita S, Priyanka P. Bioprocess control: Current progress and future perspectives. Life. 2021;11(6). doi: 10.3390/life11060557

6. Sarna S, Patel N, Corbett B, McCready C, Mhaskar P. Process‐aware data‐driven modelling and model predictive control of bioreactor for the production of monoclonal antibodies. Can J Chem Eng. 2022. doi: 10.1002/cjce.24752