Ready for AI to Transform CMC? Here’s How It Could Happen

July 18, 2024

AI solutions are already reshaping drug development—with one big exception.

At this point in the hype cycle, we hardly need to say it: AI is poised to deliver potentially transformative benefits for the pharmaceutical industry. In fact, many drug development disciplines have already moved past discussing the many possibilities of AI and are busy putting AI-powered solutions into practice.

For CMC programs, though, the story is… well, a little different. Like its cousins in discovery and clinical development, CMC should be at the gates of a transformational new age of AI-enabled efficiency and predictability – were it not for persistent data challenges that keep that golden apple out of reach. The industry is ramping up its efforts to address that issue, inspired by the value waiting to be tapped by successful AI solutions. But we’ve got work to do.

In the meantime, though, I’ll borrow a phrase: “you mustn’t be afraid to dream a little bigger, darling.” Let’s take a peek at what an AI-powered future could look like for CMC programs – once they make the right investment in their data.

AI in drug development: Where it’s already making an impact

As you’ve probably noticed, the application of AI technology is already well underway in many aspects of drug development.

In drug discovery, AI platforms are being used to both identify and engineer molecules with promising clinical potential. In the clinical development enterprise, a host of AI-powered solutions are already transforming trial design, site selection, recruitment, endpoint analysis, and more.

Of course, there’s still a long way to go when it comes to realizing the full potential of AI in any of these areas. AI solutions for clinical trials have likely only scratched the surface of what’s truly possible. And while AI has shown it can accelerate discovery of promising molecules, the first generation of those drug candidates has delivered middling results at best.

But still, it’s a game-changing start for these areas of the drug development process. Proofs of concept are stacking up quickly, and drug developers are rapidly figuring out how to expand and capitalize on them.

And then there’s CMC. Sigh.

CMC is falling behind, and the culprit is clear.

While AI may already be making a big impact on discovery, clinical development, and even commercial initiatives, CMC is still waiting for its moment in the sun. And if you follow our blog, it’s no surprise why: It’s all about the data.

Not the lack of it, of course. CMC programs continue to generate immense amounts of data. But most of it puddles in SharePoints and inboxes full of PDFs, spreadsheets, PPTs, and Word docs – a galaxy of unstructured formats completely unsuited to powering AI.

Whether it’s ML, NLP, LLMs, or any other model or modality, AI needs a vast amount of high-quality, structured data to deliver its much-hyped benefits. That’s exactly what most CMC programs don’t have enough of, which puts AI enablement on a far more distant horizon for CMC than for other drug development disciplines.
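To make “structured” a little more concrete, here is a minimal Python sketch contrasting a sentence as it might sit in a PDF with the same facts stored as typed records. The `ProcessParameter` schema, its field names, and the example values are all hypothetical, chosen for illustration rather than taken from any real system or standard.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration only; field names are not from any real system.
@dataclass(frozen=True)
class ProcessParameter:
    product: str
    unit_operation: str
    name: str
    target: float
    low: float
    high: float
    unit: str
    criticality: str  # e.g. "critical" or "non-critical"

# The kind of sentence that sits in a PDF or slide deck: clear to a person,
# but a model must parse free text to recover each value.
unstructured = "Granulation: impeller speed held at 200-300 rpm (target 250), critical."

# The same facts as structured records, where every attribute is an explicit, typed field.
records = [
    ProcessParameter("Product A", "Granulation", "Impeller speed",
                     250.0, 200.0, 300.0, "rpm", "critical"),
    ProcessParameter("Product A", "Drying", "Inlet air temperature",
                     60.0, 55.0, 65.0, "°C", "non-critical"),
]

def in_range(param: ProcessParameter, value: float) -> bool:
    """Return True if a measured value falls within the parameter's acceptable range."""
    return param.low <= value <= param.high

# Questions that would otherwise mean searching documents become simple queries.
critical_names = [p.name for p in records if p.criticality == "critical"]
print(critical_names)               # ['Impeller speed']
print(in_range(records[0], 275.0))  # True
print(in_range(records[0], 310.0))  # False
```

The point of the sketch is the shape of the data, not the code: once every value has a named, typed home, it can be queried, compared across products, and fed to a model without first being dug out of a document.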

I’m excited to say that the data picture in CMC is beginning to change. Many top drug developers are now committing to structuring their data, connecting their CMC data ecosystems, and purposefully laying the groundwork for AI solutions. And as that effort gathers momentum, we’re starting to see what AI-powered CMC could truly look like.

Here are a few ways it could come to life for CMC programs that join the shift to data-centric drug development.

Use case 1: Synthetic process design

Drug manufacturing processes are complex, multidimensional workflows with thousands of inputs, variables, parameters, and risk factors. Like all such complex entities, developing them takes a healthy amount of experimentation, exploration, and “what-if” scenarios.

Necessary as that may be, however, it takes time and resources that drug developers have less and less of every year.

Luckily, as clinical trial innovators have already shown, today’s data models and algorithms are more than up to the task of simulating incredibly complex processes and scenarios. Data-centric CMC programs have a scintillating opportunity to take a page from that playbook and use AI to design and run simulations of their own sophisticated processes or systems.

One compelling version of this application is already beginning to take shape: creating “digital twins” of a manufacturing process based on historical data of similar products. With enough of that (high-quality, structured) data, a CMC program could potentially design, simulate, and pressure-test processes 100% in silico, saving time, reducing risk, and accelerating process development (PD) timelines.

In fact, this approach might make a host of questions more efficient to answer: what kinds of materials are best to use, which attributes of the product will be critical to its quality outcomes, how changing any single variable or parameter might affect the other components, and more. And crucially, it could be done virtually, using a fraction of the time and resources that drug developers currently expend experimenting, analyzing, and answering similar questions.

Of course, the caveat remains: The success of the digital model—and the accuracy and reliability of the simulations it generates—all depends on the quality of the data that it’s trained on. But for drug developers who can harness the right amount of high-quality CMC data, the payoff could be significant.

Use case 2: Streamlined studies & experiments

Does digitally modeling entire processes feel a bit distant for your program? You might be able to start much smaller: some AI tools could potentially be used to streamline the generation of key data on stability, solubility, toxicity, and much more.

In today’s CMC programs, the studies and experiments used to produce this data typically lay the groundwork for much of the manufacturing process that follows. But in their current form, they’re undeniably time-consuming and expensive. That’s where AI solutions can offer a potential shortcut.

As software engineers and data scientists have already discovered, generative AI tools can be highly effective at extrapolating new code, datasets, and analyses from past data – all you need is the right prompt. Well-trained LLMs may hold similar potential for CMC programs: feed them enough historical data on similar products and prompt them based on your study design, and it’s feasible that a robust model could produce largely accurate testing data or, at the very least, data directional enough to refine the study design.

Whether it provides usable results or “just” valuable guidance, this capability could cut down the time and cost required to conduct this kind of iterative testing today.

Use case 3: Generative regulatory submissions

The regulatory submission process is another critical but time-consuming step in the CMC process – and one that’s primed for AI optimization.

Many of the industry’s hallmark regulatory documents – including modules of the NDA, IND, and eCTD – are heavily templated, making them ideal for automation. If you’ve tried the smart content authoring tool in QbDVision, you’ve already gotten a taste of how much more efficient automated reporting and document generation can be. Generative AI tools have the potential to vastly amplify the power of solutions like ours.
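Why do heavily templated documents and structured data pair so well? A minimal Python sketch using the standard library’s `string.Template` shows the basic idea: once the underlying values live in structured records, filling a fixed template is mechanical. The template wording and record fields below are hypothetical, and this is plain deterministic templating, not a description of how QbDVision or any generative tool actually authors content.

```python
from string import Template

# Hypothetical boilerplate for a process-description passage; real submission
# templates are far longer and follow agency guidance.
SECTION_TEMPLATE = Template(
    "$unit_operation: $name is controlled at $target $unit "
    "(acceptable range $low-$high $unit) and is classified as $criticality."
)

# Hypothetical structured records supplying every placeholder in the template.
records = [
    {"unit_operation": "Granulation", "name": "Impeller speed", "target": 250,
     "low": 200, "high": 300, "unit": "rpm", "criticality": "critical"},
    {"unit_operation": "Drying", "name": "Inlet air temperature", "target": 60,
     "low": 55, "high": 65, "unit": "°C", "criticality": "non-critical"},
]

def render_section(rows: list[dict]) -> str:
    """Fill the template once per record; substitute() raises KeyError if a field is missing."""
    return "\n".join(SECTION_TEMPLATE.substitute(row) for row in rows)

print(render_section(records))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a missing value fails loudly instead of leaving a stray placeholder in a submission draft.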

Trained on enough structured submissions – the kind regulators are actively moving toward – a specialized LLM could conceivably be used to generate net-new submissions with impressive accuracy and consistency. With that kind of AI tool, document-generating tasks that typically take weeks could potentially be reduced to mere minutes – freeing up time that’s usually spent chasing data, consolidating documents, and MacGyvering modules.

Use case 4: Accelerated risk analyses

Along with regulatory submissions, analyzing and reporting on process risks is often one of the more laborious, time-consuming tasks in today’s CMC programs.

All too frequently, that’s because risk-related data is scattered across departments and even facilities – even for established products and platforms. So analysts often have to start new analyses from scratch (or nearly so) each time.

But what if you could train an ML algorithm on all the existing, structured risk analyses from that product or platform’s previous CMC programs? If you had that kind of training data, you could potentially create a tool with deep awareness of past risks and how they evolved, and the ability to predictively identify how those risks would evolve with modified product parameters.

QbDVision functionality is already moving in this direction, starting with a clone-forward feature that enables users to build new risk analyses on a foundation of historical structured data. But as with regulatory documentation, an LLM trained on enough structured risk analyses could potentially have an even bigger impact, enabling CMC programs to generate brand new analyses based on specified attributes, parameters, and materials.

And of course, one of the built-in advantages of AI technology is that it can become more precise and reliable over time. The more high-quality data you feed it, and the more often it’s retrained on that data, the better the output. For CMC programs, that could translate to significant time savings and increasingly accurate risk analyses.
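One simple way to picture a tool that “predictively identifies how risks would evolve with modified product parameters” is nearest-neighbor regression: score a proposed parameter set by averaging the risk scores of the most similar historical cases. The Python sketch below assumes a hypothetical history of past risk assessments, each with two parameters scaled to 0 to 1 and a 1-to-10 risk score. A real tool would need far more data, careful feature engineering, and validation, and might well use a different model entirely.

```python
import math

# Hypothetical historical risk assessments: ((impeller speed, granulation time), risk score).
# Both parameters are assumed to be pre-scaled to the 0-1 range so distances are comparable.
HISTORY = [
    ((0.20, 0.30), 2.0),
    ((0.25, 0.35), 3.0),
    ((0.50, 0.50), 5.0),
    ((0.80, 0.70), 8.0),
    ((0.85, 0.90), 9.0),
]

def predict_risk(params: tuple[float, float], k: int = 3) -> float:
    """k-nearest-neighbor regression: mean risk score of the k closest past cases."""
    nearest = sorted(HISTORY, key=lambda case: math.dist(params, case[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)

# A proposed change that moves the process toward the historically high-risk region.
print(predict_risk((0.78, 0.75)))  # about 7.33: the nearest cases scored 8, 9, and 5
# A proposed change that stays near the historically low-risk region.
print(predict_risk((0.22, 0.32)))  # about 3.33: the nearest cases scored 2, 3, and 5
```

Nearest-neighbor methods are a reasonable mental model because their output is easy to trace back to specific past analyses, which matters when scientists need to understand why a risk was flagged.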

Use case 5: Predictive risk control

As all these potential use cases demonstrate, AI technology and automation capabilities come with two huge perks that CMC programs could profit from: time and cost savings.

But those aren’t the only long-term benefits that CMC workflows stand to gain from investing in the data infrastructure needed to develop AI solutions. AI also has the potential to help CMC programs better understand, adapt to, or proactively control risks throughout the drug manufacturing process.

For example, with enough high-quality, structured risk data, it’s feasible that AI solutions could predictively extrapolate future risks and proactively recommend process adaptations, parameter changes, causality changes, and more. Imagine being able to map risks and model control strategies throughout an entire process, all using AI tools trained on historical analyses from analogous products. Today’s ML algorithms are just waiting for the data.

Yes, that kind of predictive ability and risk control would also likely yield time savings of its own in the long run. But it could also aid decision-making, alert scientists to risks that may not be on their radar, and contribute to a more robust risk management strategy, as well as to manufacturing processes that are more dynamic, adaptable, and proactive than ever before.

A closer look at LLMs: It takes a lot to make these silver bullets fly.

While there are many good reasons to get excited about these use cases, we’ll happily be the ones to inject a cold dose of reality into the picture. Here it is: AI, ironically, takes a ton of work.

Perhaps the best example is LLM-powered generative AI: By far the shiniest AI object lighting up the industry right now.

Ask Claude to run a risk analysis for you, and you’ll quickly see why adapting LLMs for specialized purposes requires intensive additional training with substantial amounts of data – tens of thousands of examples, according to one recent publication. That’s because most off-the-shelf, generalist foundation models are trained on vast amounts of general information. Reliable, business-specific use cases typically require additional training on large quantities of robustly labeled, domain-specific data.

Creating the data to even get started is no mean feat. As this recent experiment from Cleanlab demonstrated, fine-tuning an application-specific LLM can require multiple rounds of both manual and automated data curation to achieve “acceptable” rates of accuracy (~75%…). Now imagine multiplying that effort by tens of thousands of CMC datasets and… there’s a chair right over there if you need to sit down for a moment.

Ponder that scenario for a moment, though, and you’ll quickly see why some AI thought leaders are beginning to ask whether generative AI may be creating more work than it saves. These models require so much high-quality data, and the labor required to curate it can be so immense, that – paradoxically – there may ultimately be domains where it’s more efficient not to use AI.

We’re a long way from making that determination for CMC, of course. But as our industry begins to explore the potential of AI for process development, it’s worth reminding ourselves of what that investment will entail. Does AI have the potential to unlock a transformative new level of performance for CMC programs? Absolutely. But will it be as simple as Sam Altman says?

We’ll let you take a look at your SharePoint.

GET IN TOUCH

Let’s get your CMC program on the AI onramp.

Reach out to our team to learn how QbDVision can help you lay the groundwork for automated and generative solutions.

Patrick Riordan

Content Marketing Manager, QbDVision
