Harnessing the Power of AI and Machine Learning to Revolutionize API Production

The Teva API site in Santhià, Italy, embarked on an innovative project to use artificial intelligence (AI) and machine learning (ML) to revolutionize API production.

The aim was to optimize and standardize the production process to reduce variability, raw material usage, and energy consumption, while enhancing productivity and API purity. It also sought to minimize downtime by predicting equipment failures and enabling rapid response to alarms.


Background and challenge


In the API and pharmaceutical industry, data is generated and collected by many different tools: MRP, LIMS, DCS, planning software, analytical instruments, and so on. In most cases, the data is not easily accessible from a single source, comes in different formats, and is hard to organize and connect to a single event.

To fully understand a specific process phase, we need to identify the equipment where it is performed and connect all the process data and trends, the in-process control analytical results, the traceability of the raw materials used in the lot, the maintenance and calibration history of the equipment, the quality attributes, any deviations that occurred before, during, or after the phase, and the efficiency of the utilities during execution.

It is always difficult to have the full picture of a process phase, and even harder to have it in real time.


The solution


For this reason, we developed an automated tool in Python to make these connections for us. The tool can recognize the lots and the process phase that we want to study or investigate, retrieve all the necessary data from the different systems, and recognize and extract the data for the same phase from lots produced in the past.

The multi-batch data (training set) is cleaned and aligned, then stored in a 3D matrix whose three dimensions represent batch number, variable, and time. The data is analyzed with principal component analysis (PCA) and partial least squares (PLS) methods to understand the correlations between variables and reduce the dimensionality of the system.
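
As an illustration of the data layout described above, the following is a minimal sketch (not the tool used at the site) of how a batch × variable × time array might be unfolded into a two-dimensional table and analyzed with PCA. It assumes NumPy is available and uses random placeholder data; the array sizes and the names `X3d`, `scores`, and `loadings` are hypothetical. Because the data is random, the explained variance it prints says nothing about real process data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 31 batches, 5 variables, 50 aligned time points.
n_batches, n_vars, n_time = 31, 5, 50
X3d = rng.normal(size=(n_batches, n_vars, n_time))

# Batch-wise unfolding: one row per batch, one column per (variable, time) pair.
X = X3d.reshape(n_batches, n_vars * n_time)

# Autoscale each column so that variables with different units are comparable.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
std[std == 0] = 1.0  # guard against constant columns
Xs = (X - mean) / std

# PCA via singular value decomposition of the scaled matrix.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)

n_components = 2
scores = U[:, :n_components] * S[:n_components]  # coordinates of each batch
loadings = Vt[:n_components].T                   # weight of each (variable, time) column

print("fraction of variance explained by 2 components:", explained[:n_components].sum())
```

Plotting the two columns of `scores` against each other gives one point per batch, which is a common way to spot batches that behave differently from the rest.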

At this stage, we are ready to build a model that can explain most of the variability between batches (yield, quality attributes, etc.) with few variables (typically, 2- or 3-dimensional models can explain 70-90% of the variability).

The model is then validated by testing its predictive ability on lots that were not included in the training set (test set). The lots belonging to the test set can then be added to the training set and, at each cycle, the machine can learn something new.
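
The validation cycle described above can be sketched as follows. This is an illustrative example under stated assumptions, not the site's implementation: it assumes NumPy, uses synthetic data in which yield depends on two hidden factors, and fits a single-response PLS model with the NIPALS algorithm (the article does not say which PLS variant was used). The helper names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)   # weight vector
        t = Xk @ w                  # batch scores for this component
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        qk = (yk @ t) / tt          # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - qk * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

# Synthetic data: 33 lots, 40 unfolded variables driven by 2 hidden factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(33, 2))
X = latent @ rng.normal(size=(2, 40)) + 0.05 * rng.normal(size=(33, 40))
y = 80 + 3 * latent[:, 0] - 2 * latent[:, 1] + 0.1 * rng.normal(size=33)

# Hold out the last 2 lots as the test set.
X_train, y_train = X[:31], y[:31]
X_test, y_test = X[31:], y[31:]

model = pls1_fit(X_train, y_train, n_components=2)
fitted = pls1_predict(model, X_train)
r2_train = 1 - np.sum((y_train - fitted) ** 2) / np.sum((y_train - y_train.mean()) ** 2)
test_error = np.abs(pls1_predict(model, X_test) - y_test)
print("training R^2:", r2_train, "| test errors:", test_error)

# Next cycle: fold the validated lots into the training set and refit.
X_train = np.vstack([X_train, X_test])
y_train = np.concatenate([y_train, y_test])
model = pls1_fit(X_train, y_train, n_components=2)
```

In practice, whether a test lot is folded back into the training set would depend on how well it was predicted and on whether it represents normal operation; the sketch adds the lots unconditionally for brevity.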


Practical uses


How to use this model is up to you. It can be applied to different environments, including:

  • Reproducing golden batches, understanding why they were ‘different’ from the others, or avoiding the reproduction of ‘inadequate’ batches.
  • Reducing the areas of the process to study in the lab, concentrating on the phases that are most closely connected with the variability of the result.
  • Recognizing trends to understand if any equipment is getting close to a failure, improving safety and reducing downtime.

Using AI and machine learning to improve process performance, sustainability, and safety in the API and pharmaceutical industries is an innovative and flexible approach that can be applied in different contexts, such as IT, Engineering, Quality, EHS, Operations, and R&D.


Real-life examples


  1. Alert generation from asset sensors

Forty AI algorithms were developed to continuously monitor data from utilities and facility areas. The algorithms can recognize patterns and trends that could lead to future alarms. The system generates immediate automatic alerts (emails, reports, potential root cause, etc.), enabling personnel to check promptly or take corrective action.

This tool significantly reduced the time to resolution (in most cases avoiding alarm generation altogether). In the long term, the increased monitoring of the assets also reduced the number of events.
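
One simple way to anticipate an alarm from a sensor trend is to fit a straight line to the most recent readings and project when it would cross the alarm limit. The sketch below is a hypothetical illustration of that idea using only the Python standard library; it is not one of the 40 algorithms mentioned above, and the function names, window size, and alarm limit are assumptions chosen for the example.

```python
from statistics import mean

def slope_and_intercept(values):
    """Least-squares line through equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def samples_until_alarm(values, alarm_limit, window=20):
    """Projected number of samples until the limit is crossed.

    Returns 0.0 if the fitted value is already at or above the limit,
    and None if the recent trend is not heading toward it.
    """
    recent = values[-window:]
    if len(recent) < 2:
        return None
    slope, intercept = slope_and_intercept(recent)
    current = intercept + slope * (len(recent) - 1)  # fitted value at the latest sample
    if current >= alarm_limit:
        return 0.0
    if slope <= 0:
        return None
    return (alarm_limit - current) / slope

# Hypothetical readings: a temperature drifting upward by 0.5 units per sample.
readings = [60 + 0.5 * i for i in range(30)]
eta = samples_until_alarm(readings, alarm_limit=80.0)

# Raise an early warning if the limit is projected to be reached soon.
if eta is not None and eta <= 15:
    print(f"Early warning: limit projected to be reached in about {eta:.0f} samples")
```

A linear projection like this only handles upper limits and steady drifts; noisy or cyclic signals would need smoothing or a different model.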


  2. Reproduce best repeatable batches (Yield)

Process parameters from 31 batches (training set) were automatically collected. Each batch generated approximately 10,000 variables (process, analytical, utilities, etc.). Using PCA and PLS tools, the dimensional space was reduced to 2 principal components. The 2-component model was able to explain 70% of the process yield variability. This dimensionality reduction simplified the comparison of batches and clearly identified the most relevant parameters for yield optimization.

The predictive model was successfully validated on 2 new batches. The test data set was then added to the training set to improve the model's predictive performance in subsequent iterations.

This successful project demonstrates how it is possible to benefit from data that is often collected but rarely fully exploited. We believe this is just the beginning of a long journey of modernization that creates opportunities for both short- and long-term improvements in data usage and production.