BenchSci Blog

Understanding the Limitations of Off-the-Shelf Foundation Models in Preclinical R&D

Written by Liran Belenzon (he/him) | Jun 13, 2024 6:17:47 PM

In recent years, artificial intelligence (AI) has reshaped numerous industries, enhancing innovation and efficiency through task automation and data analysis. This transformation is fueled, in part, by the rise of off-the-shelf foundation models, such as large language models (LLMs) like OpenAI's GPT series, BigScience’s BLOOM, and Google's LaMDA. These powerful, pre-trained models, readily available through open-source or commercial platforms, have revolutionized natural language processing and understanding, providing a foundation for developing specialized AI applications and accelerating development cycles. Sectors like customer service, law, and marketing have readily benefited from these models’ advanced capabilities. However, sectors characterized by extreme complexity, such as drug discovery, aerospace, and finance and risk assessment, pose unique challenges that general-purpose models struggle to address effectively in their current form. Despite the progress of AI in drug discovery, the intricacies of biological systems and disease mechanisms require specialized approaches beyond the capabilities of off-the-shelf models.

This has led to a new era in AI, where generalized models are giving way to a more nuanced approach: vertical AI solutions. This shift mirrors the rise of vertical SaaS platforms like Veeva Systems in the life sciences, driven by the realization that one-size-fits-all solutions often fall short when faced with complex, industry-specific challenges. In this blog, we’ll explore the limitations of current AI models in preclinical R&D and chart a path toward AI solutions that effectively address the unique needs of this complex field.

Plug-and-play doesn’t work for everyone

Off-the-shelf models’ plug-and-play nature offers a quick solution for scenarios where the problems are well-defined and the data is of high quality, with straightforward structures that don’t compromise the model's accuracy. This is particularly true in the legal, marketing, and customer success fields, where we’ve witnessed LLMs improve efficiency and drive tangible outcomes. In such instances, using these models directly or building a simple retrieval-augmented generation (RAG) architecture for data sourcing can work well. However, meeting the diverse needs of user bases and effectively navigating the complexity and messiness of the data in areas like drug discovery necessitate solutions beyond simple RAG. Let’s explore why this approach has been successful in less complex fields.

Off-the-shelf foundation models have proven to be highly beneficial in certain professional domains due to a combination of factors. First, these fields often rely on established knowledge bases, standardized terminology, and structured communication formats. LLMs excel at processing and generating text within these well-defined boundaries, streamlining tasks such as basic research, analysis, and document creation. This allows professionals to focus on more complex tasks, such as problem-solving, strategic planning, and innovation.

Second, the success of off-the-shelf LLMs is often predicated on the availability of large volumes of publicly available, accurate data. As with any data-driven system, the output quality is fundamentally limited by the quality of the input; the model's performance will only be as good as the underlying data it learns from. When this data is both abundant and accurate, it serves as rich training material for these models, allowing them to develop a deep understanding of domain-specific language and concepts. The structured nature of this data further facilitates the model’s ability to perform tasks like information retrieval and text generation accurately. 

Finally, the adoption of LLMs in professional settings is shaped by diverse user needs. Factors like the variety of user personas, use cases, expectations around accuracy and explainability, and the ease of understanding user intent play a significant role. Integration is easier in fields with shared knowledge, but understanding specific user requirements is crucial for maximizing the effectiveness and adoption of these tools.

Off-the-shelf LLMs excel in fields with established knowledge bases, standardized terminology, structured data, and relatively homogeneous user needs, but their limitations become evident in more complex domains like preclinical R&D. Consider a drug discovery team attempting to use a generic LLM to unravel a complex disease pathway; the results would likely be disappointing. Complexities in preclinical R&D, including intricate biological systems, varied experimental variables, and messy, inconsistent data, coupled with a diverse user base, require more customizable and specialized AI solutions than current off-the-shelf LLMs can provide.

Off-the-shelf foundation models: a gap in preclinical R&D expertise

Off-the-shelf foundation models, while powerful tools for general-purpose tasks, are not inherently equipped to handle the unique challenges of preclinical R&D. Their limitations in this domain stem from a fundamental mismatch between their design and the specialized knowledge required for biomedical research. This gap in expertise manifests in three key areas that hinder their effectiveness: (1) the quality and nature of their training data, (2) their capacity for reasoning, accuracy, and providing transparency to the source data, and (3) their ability to understand user intent beyond simple chat interactions and within a scientific context.

The limitations of underlying data in preclinical R&D

The data used to train off-the-shelf models is often general and lacks the specialized knowledge required for preclinical R&D. These models are trained on vast amounts of text and code, but this data rarely encompasses the intricate details of molecular structures, pathways, and regulatory mechanisms relevant to drug discovery. As a result, they struggle to provide accurate and nuanced insights into the complexities of preclinical research. For example, these models may fail to distinguish between different protein isoforms or understand the subtle interactions between drug molecules and their target receptors. This lack of specialized knowledge limits their ability to provide valuable insights for drug discovery efforts.

Current foundation model-based solutions are also limited in their ability to effectively link knowledge, a challenge that is amplified by the messy, unstructured, and often inaccurate nature of the vast data that exists today. A lack of grounding in established ontologies hinders their ability to accurately connect summaries to specific paragraphs in public or internal data. This is especially problematic given the large amount of biological information available online, much of which is inaccurate or unreliable. Connecting these existing sources of data, like internal primary research and external ontologies, through RAG is crucial but challenging: simply feeding raw data into a foundation model, or building a RAG architecture without careful curation and refinement, will not yield reliable results. Simply put, it is garbage in, garbage out.
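To make the grounding step described above concrete, here is a minimal sketch of mapping free-text entity mentions to canonical ontology identifiers, so that a generated summary can be linked back to specific source passages. The synonym table and identifiers below are illustrative stand-ins for a real curated ontology (such as the Gene Ontology or UniProt), not actual data.

```python
import re

# Hypothetical synonym -> canonical ID table. Real systems would load
# these mappings from curated ontologies rather than hard-code them.
SYNONYMS = {
    "p53": "GENE:0001",
    "tp53": "GENE:0001",
    "tumor protein 53": "GENE:0001",
    "egfr": "GENE:0002",
    "epidermal growth factor receptor": "GENE:0002",
}

def ground_mentions(text: str) -> dict[str, str]:
    """Find known synonyms in the text and return mention -> canonical ID."""
    found = {}
    lowered = text.lower()
    # Check longer synonyms first so multi-word names are matched whole;
    # \b word boundaries prevent "p53" from matching inside "tp53".
    for synonym in sorted(SYNONYMS, key=len, reverse=True):
        if re.search(rf"\b{re.escape(synonym)}\b", lowered):
            found[synonym] = SYNONYMS[synonym]
    return found

paragraph = "TP53 mutations alter EGFR signaling in several tumor types."
print(ground_mentions(paragraph))
```

Once mentions resolve to stable identifiers, a summary sentence can carry those IDs and be traced back to every source paragraph that mentions the same entity, which is the linking behavior the curation work is meant to enable.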

Before this data can be successfully leveraged for conversational AI purposes, it must be refined, synchronized, and correlated effectively, creating a roadmap for researchers to navigate and extract relevant information. This shortfall in linking knowledge not only slows down the research process but also poses challenges in synthesizing and analyzing information effectively. Researchers rely on accurate and precise connections between summaries and underlying data to draw meaningful insights and make informed decisions. If off-the-shelf models are not capable of doing this, it diminishes the effectiveness of knowledge extraction in preclinical research.

Accessing valuable internal data stored within pharmaceutical and biotechnology companies is another significant challenge for off-the-shelf models. This data often contains proprietary knowledge accumulated through years of research and experimentation, which could enhance the effectiveness of AI solutions. However, without access, off-the-shelf foundation models are limited in their ability to ingest, extract, and analyze complex scientific data essential for preclinical research. This impacts the development and application of LLM-based solutions in pharma, representing a missed opportunity for these companies to leverage their internal data for innovation and discovery. 

Reasoning, hallucinations, and the need for retrieval-augmented generation

Off-the-shelf models are susceptible to generating inaccurate information, commonly known as hallucinations. These hallucinations arise from the models' inability to reason and comprehend the nuances of biomedical data. While these models may appear to provide plausible answers, they often lack scientific validity upon closer inspection. Without the ability to discern between genuine insights and false correlations, researchers risk investing time, resources, and effort into pursuing leads that ultimately prove fruitless or detrimental to developing safe and effective therapies. This lack of scientific rigor erodes trust in AI-driven solutions, creating a barrier to adoption. Moreover, the lack of explainability in these models makes it difficult to validate results and ensure future compliance with drug submission regulations, further complicating their use in preclinical R&D.

Retrieval-augmented generation (RAG) can address some of these challenges by integrating external knowledge sources to provide more accurate and grounded responses. This approach helps to reduce hallucinations by anchoring the model's responses in real-world data. However, even with RAG, specialized domain expertise and curated data remain essential for effective use in preclinical R&D.
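A minimal sketch of the RAG pattern described above, assuming a toy corpus and a word-overlap retriever in place of a real embedding model and LLM call: retrieve the most relevant passages for a query, then build a prompt that anchors the model's answer in those passages. The corpus, scoring, and prompt template are illustrative only.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase bag-of-words; stands in for a real embedding model."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query and keep the top k."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc_id: len(q & tokenize(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str]) -> str:
    """Assemble a prompt that grounds the answer in retrieved passages."""
    doc_ids = retrieve(query, corpus)
    context = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return (
        "Answer using ONLY the sources below, citing their IDs.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical mini-corpus of curated passages.
corpus = {
    "doc1": "BACE1 is a protease studied as an Alzheimer's drug target.",
    "doc2": "Tau aggregation is implicated in Alzheimer's disease progression.",
    "doc3": "CRISPR screens identify essential genes in cancer cell lines.",
}

prompt = build_prompt("What are drug targets for Alzheimer's disease?", corpus)
print(prompt)
```

The point of the pattern is visible even in this sketch: the unrelated passage never reaches the model, and the instruction to cite source IDs gives researchers a path back to the underlying data. What the sketch cannot supply is the curation step; if the corpus itself is messy or wrong, the grounded answer will be too.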

The lack of explainability in off-the-shelf models further compounds their limitations in preclinical R&D. While RAG can improve access to relevant information, it doesn't guarantee accurate connections to the right data for generating meaningful insights. These models' inability to provide clear reasoning or references for their responses hinders researchers' understanding and undermines the reproducibility and transparency of scientific results, which are critical aspects of preclinical research.

Understanding the user intent and experience in preclinical research

Off-the-shelf LLMs struggle to understand the specific intent and context of scientific queries in preclinical R&D. They lack the ability to tailor responses to individual scientists' needs, overlooking user interests and expertise. This results in a generic approach that fails to create a personalized experience conducive to efficient interaction and insight extraction for researchers. To effectively contribute to preclinical research, these models need to understand the scientific context of a question, the user's expertise level, and the specific goals of the research project. This requires a level of complex prompting that goes beyond simple keyword searches.

Consider a scientist asking, "What are the potential drug targets for Alzheimer's disease?" This seemingly simple question masks a complex informational landscape. Scientists are not a single persona; their needs are as diverse as the research they conduct. Factors such as their specific role, therapeutic focus, and stage of drug development all shape their information requirements. An off-the-shelf model, unable to grasp these nuances, might offer a generic response. However, a sophisticated AI-driven system could recognize these diverse personas and tailor its response accordingly.
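One hypothetical way to act on those personas is to enrich the raw question with the researcher's role and project stage before it reaches retrieval and generation. The personas and framing rules below are illustrative assumptions, not a description of any real product's logic.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """Illustrative researcher profile; real systems would capture more."""
    role: str   # e.g. "target biologist", "medicinal chemist"
    stage: str  # e.g. "target identification", "lead optimization"

def tailor_query(question: str, persona: Persona) -> str:
    """Wrap a raw question with persona context so downstream retrieval
    and generation can prioritize the right kind of evidence."""
    return (
        f"As a {persona.role} working in {persona.stage}, "
        f"answer: {question} "
        f"Emphasize evidence relevant to {persona.stage}."
    )

question = "What are the potential drug targets for Alzheimer's disease?"
biologist = Persona(role="target biologist", stage="target identification")
chemist = Persona(role="medicinal chemist", stage="lead optimization")

# The same question yields two differently framed prompts.
print(tailor_query(question, biologist))
print(tailor_query(question, chemist))
```

Even this crude framing makes the same question resolve differently for the two personas, which is the behavior a generic, persona-blind model cannot provide.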

Furthermore, off-the-shelf models face limitations in handling visual elements and linking knowledge. Unlike fields where simple text responses suffice, preclinical research necessitates a deeper understanding of scientific context and the ability to interpret visual elements like graphs and molecular structures. These tasks require visual analysis and spatial reasoning, which are beyond the capabilities of most foundation models today. This lack of visual integration, coupled with the need for extensive prompt engineering, impacts the user experience and makes it challenging for scientists to effectively use these tools in preclinical R&D. 

The future of AI in preclinical R&D 

AI can revolutionize preclinical drug discovery, but maximizing its potential requires collaboration. Off-the-shelf AI models are a good start, but pharmaceutical companies need specialized AI software providers to overcome current limitations, developing AI models that understand biological systems, predict off-target effects, and simulate complex interactions. These partnerships can accelerate timelines and reduce waste in preclinical R&D. 

Drug discovery is complex, and realizing AI's full transformative power demands a collective effort from AI companies, LLM providers, pharmaceutical companies, and regulators to drive innovation and research and to establish clear guidelines. This will lead to faster, more sustainable drug discovery and ultimately bring new medicines to patients sooner.