Disease biology remains a black box—not because we lack data, but because the evidence needed to understand it is scattered across different scientific worlds: genomics, proteomics, metabolomics, clinical trials, and more. This incomplete understanding of biology is the leading cause of clinical trial failures, costing the industry billions annually and impacting millions of patients. To unravel this complexity and make evidence-based predictions about what will work, we need to connect these disparate pieces into a unified reasoning system.
Neuro-symbolic AI is the key to unraveling this black box. But it requires a foundation: a Biological Evidence Knowledge Graph (BEKG) that unifies evidence-backed biomedical knowledge across disciplines and acts as a living map of all disease biology. Identifying, extracting, understanding, and connecting reliable, evidence-backed data from tens of millions of papers is a massive challenge for both manual and AI-powered efforts. That is our goal at BenchSci: to turn the world’s scientific information into evidence-based scientific truth.
The current state of knowledge extraction from scientific literature is fundamentally fragmented. Today's approaches rely on multiple specialized systems—one to extract genes, another for diseases, yet another for drug interactions—each capturing only isolated entities without understanding their relationships. Attempting to stitch these disparate extractions together leads to poor accuracy, missing connections, and a lack of the evidence chains needed to understand how these entities actually interact. This is the first and most critical problem that must be solved to build a comprehensive evidence map of disease biology.
We're proud to say we’ve achieved what others thought was impossible. Through a breakthrough discovery—that all scientific papers share a universal scientific grammar regardless of field—we've built LENS (Literature Extraction and Network Semantics), the first system capable of extracting complete, evidence-based insights from any biomedical paper with over 90% completeness and less than a 1% hallucination rate. This stands in stark contrast to general-purpose LLMs, which, according to independent academic benchmarks, capture only 15-35% of critical biomedical concepts when used in conversational interfaces. LENS delivers this accuracy and recall across all biomedical literature, at a pace measured in thousands of papers per day rather than the papers per week of manual curation.
LENS fundamentally changes how we innovate. Instead of starting from educated guesswork, teams can now begin with a comprehensive BEKG, grounded in traceable evidence. What makes LENS so valuable isn’t just the breadth of what it captures—it’s the depth. Unlike systems that capture authors’ claims and conclusions, which are often unreliable, LENS is designed to extract and understand the underlying scientific evidence. By extracting the full experimental context around every finding, LENS allows you to understand what was discovered, how, and under what conditions, helping address the reproducibility crisis. With neuro-symbolic AI, you can confidently explore and unravel the unknown, discover hidden connections, and unlock insights that give your organization a lasting competitive advantage.
The Crisis Your Research Organization Faces Today
Drug discovery is one of the highest-stakes endeavors in science. Every decision can cost millions, every delay impacts patients, and more than 90%¹ of clinical trials still end in failure. Despite an explosion of scientific data, the pace of innovation has slowed. Why? Because the leading cause of failure is not a lack of data—it’s the difficulty of turning that data into an evidence-based map that unlocks the understanding of disease biology.
The most important breakthroughs in disease research remain buried in fragmented, complex data, leaving R&D teams with blind spots, irreproducible results, and costly failures that delay life-saving therapies. The true power of AI lies in going beyond human capacity, but it can only do so once the data problem is solved. By unifying and contextualizing evidence—from genetic modifications to disease pathophysiology—AI can finally reveal patterns invisible to humans, accelerating discovery and transforming patient outcomes.
This is the “black box” of disease biology—and it’s the ultimate frontier for AI.
AI offers an opportunity to illuminate this hidden landscape. By connecting evidence across biology, chemistry, and medicine with full traceability, AI can surface patterns and relationships that humans alone cannot see and suggest new hypotheses. Done right, it has the potential to reduce costly failures, accelerate R&D timelines, and usher in a new era of faster, more effective therapies for patients.
Neuro-symbolic AI to Unravel Disease Biology
Imagine AI that doesn't just process text but truly understands the mechanisms of disease and can use that understanding to infer causal relationships and make evidence-based hypotheses. Not only pattern matching or statistical correlation, but genuine comprehension of how biological systems interact, break down, and can be repaired.
For years, we’ve heard about the promise of neuro-symbolic AI. Now, that potential is coming to fruition. By combining the pattern recognition of neural networks with the logical reasoning of symbolic systems, neuro-symbolic AI can trace pathways from molecular interactions to clinical outcomes, infer likely treatment responses, and predict which drug combinations will work synergistically—all with explainability.
But neuro-symbolic AI cannot exist without a foundation. To reason across domains reliably, it requires a comprehensive, evidence-based map of biomedical knowledge where every connection is quantified, contextualized, and traceable to its source. That foundation is the Biological Evidence Knowledge Graph (BEKG).
The Missing Foundation: BEKG—Evidence-Based Maps Across Science
To break through this impasse, biomedical research needs a unifying foundation: a Biological Evidence Knowledge Graph (BEKG).
The BEKG is not just another database. It is a living, evidence-backed map of disease biology. Unlike traditional knowledge graphs that record “Drug X affects Protein Y,” the BEKG preserves the full experimental context. Its power lies in connecting evidence in a way that hasn't been done before, and in bringing together different omics data—like genomics and proteomics—to create a vertical, end-to-end view of science. All of this is traceable back to the original source, revealing precisely what was studied, how it was measured, under what conditions, and with what results.
This evidence-first approach turns information into scientific truth. Every connection in the BEKG is backed by primary research, making it reliable, reproducible, and trustworthy. By linking findings across disciplines, the BEKG reveals hidden relationships—connecting chemistry insights, biology mechanisms, and clinical outcomes into one coherent picture. It transforms AI from a pattern-matching tool into a reasoning system—one that can generate and validate hypotheses, identify causal relationships, and predict outcomes based on mechanistic understanding.
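To make "evidence-backed" concrete, an edge in such a graph carries not just the relationship but the experimental context around it. Below is a minimal sketch of that idea; all field names and values are illustrative assumptions, not the actual BEKG schema:

```python
from dataclasses import dataclass

@dataclass
class EvidenceEdge:
    """One evidence-backed relationship in a BEKG-style graph.
    Field names here are illustrative, not the real BEKG schema."""
    subject: str        # e.g. a drug
    predicate: str      # e.g. "inhibits"
    obj: str            # e.g. a protein
    assay: str          # how the relationship was measured
    model_system: str   # experimental conditions (cell line, organism, etc.)
    result: str         # the quantitative outcome observed
    source_doi: str     # traceability back to the primary paper

# A traditional graph stores only "Drug X affects Protein Y";
# an evidence graph keeps the context needed to judge the claim.
edge = EvidenceEdge(
    subject="Drug X",
    predicate="inhibits",
    obj="Protein Y",
    assay="kinase activity assay",
    model_system="HEK293 cells",
    result="IC50 = 12 nM",
    source_doi="10.xxxx/example",
)
```

Because every edge retains its assay, conditions, and source, downstream reasoning can weigh conflicting findings rather than treating all assertions as equal.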
Until now, building the BEKG at scale with the required accuracy has been impossible. The bottleneck is extracting precise, contextual information from tens of millions of papers. Current approaches fail at either accuracy or scale.
The challenge lies in converting unstructured text into structured, reliable knowledge. When tasked with extracting the complete experimental details from a research paper, general-purpose LLMs like GPT-4 struggle.
- Independent academic benchmarks show that in naive, zero-shot settings, these models can fail to capture 65-85% of critical biomedical concepts (F1 scores of 0.15-0.35).
- Our own internal testing confirms this gap, revealing that conversational queries to these models often yield answers with less than 55% completeness, systematically omitting the technical details required for reproducible research. This is not a flaw in the models themselves, but a reflection of using the wrong tool for a highly specialized job.
- Domain-specific AI systems can reach 80–90% accuracy but only in narrow fields, requiring expensive retraining for each new use case.
- Manual curation delivers quality but at a pace measured in papers per week, not thousands per day.
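For context on the bracketed numbers above: F1 is the harmonic mean of precision and recall, so scores of 0.15-0.35 indicate that most relevant concepts are missed even when the concepts that are extracted tend to be correct. A minimal computation (the precision and recall values below are hypothetical, chosen only to show the arithmetic):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Even with decent precision, low recall drags F1 down:
score = f1(precision=0.60, recall=0.25)  # ≈ 0.35
```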
Without solving this extraction challenge, the vision of neuro-symbolic AI remains just that—a vision.
Our Breakthrough Discovery: Universal Scientific Grammar
We discovered something others missed: beneath the specialized vocabulary of different fields, all scientific papers share the same fundamental communication structure. Whether describing oncology mechanisms or cardiovascular interventions, every paper follows the same logical flow—research question, experimental methods, measured results, and evidence-based conclusions.
This Universal Scientific Grammar is why a single system can extract information across all biomedical domains without retraining. Instead of teaching our system the vocabulary of each field, we taught it to understand the structure of scientific reasoning itself.
LENS: The Foundational Innovation That Makes the BEKG Possible
Literature Extraction and Network Semantics (LENS) represents a fundamental shift in how we approach scientific knowledge extraction. Rather than building separate systems for each research domain, LENS extracts complete, structured scientific knowledge from research papers in any biomedical field. Unlike systems that extract authors’ claims and conclusions—which can often be overstated or incorrect—LENS focuses exclusively on evidence-based insights, identifying what the data itself reveals: the experiments conducted, the measurements obtained, and the statistical outcomes observed, all while maintaining full traceability to the source. From oncology to cardiovascular disease, from pre-clinical to clinical medicine, LENS works without requiring domain-specific training or months of customization. This evidence-first approach is what makes LENS uniquely reliable for high-stakes decisions.
LENS operationalizes this breakthrough through a two-phase architecture that mirrors both the structure of scientific papers and how expert scientists read them:
Phase 1: Deep Reading transforms human-readable papers into machine-readable structured data. Following Universal Scientific Grammar, LENS systematically extracts the research question being asked, the methods used to test it, the results obtained with their measurements, and the conclusions drawn from the evidence.
Phase 2: Knowledge Synthesis builds inferential networks from this machine-readable data. LENS connects the extracted elements—linking methods to results, results to conclusions, and conclusions to broader biological understanding. For example, it recognizes that "PCNA" and "proliferating cell nuclear antigen" refer to the same protein, preserves all pathway connections mentioned in the source, and maintains complete evidence chains showing how each conclusion follows from its supporting experiments.
This two-phase approach is crucial: without high-quality machine-readable extraction in Phase 1, you can't build reliable knowledge graphs in Phase 2. Without reliable knowledge graphs, neuro-symbolic AI has no foundation for reasoning.
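The two phases can be sketched, in heavily simplified form, as a pair of functions: structured extraction first, then synthesis steps such as entity normalization. The data shapes, synonym table, and stubbed logic below are illustrative assumptions, not the actual LENS implementation:

```python
# Phase 1 (Deep Reading): paper text -> structured fields.
# A real system uses trained models; this stub only shows the output shape.
def deep_read(paper_text: str) -> dict:
    return {
        "question": "Does treatment T alter proliferation?",
        "methods": ["western blot"],
        "results": [{"entity": "proliferating cell nuclear antigen",
                     "measurement": "2.1-fold increase"}],
        "conclusions": ["Treatment T promotes proliferation."],
    }

# Phase 2 (Knowledge Synthesis): link and normalize the extracted elements.
SYNONYMS = {"proliferating cell nuclear antigen": "PCNA"}  # illustrative table

def synthesize(record: dict) -> list[dict]:
    edges = []
    for result in record["results"]:
        # Unify synonymous entity names onto one canonical identifier
        canonical = SYNONYMS.get(result["entity"], result["entity"])
        edges.append({
            "entity": canonical,                # "PCNA" and its long form merge
            "evidence": result["measurement"],  # measurement stays attached
            "supports": record["conclusions"],  # evidence chain to conclusions
        })
    return edges

edges = synthesize(deep_read("..."))
```

The point of the sketch is the dependency: Phase 2 can only link and normalize what Phase 1 has already rendered machine-readable.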
The Right Tool for the Job
Many researchers now turn to general-purpose LLMs like GPT-4 to summarize or find information in scientific papers. While incredibly useful, it's important to recognize that this interaction is fundamentally different from what a specialized extraction system does. The key is understanding the distinction between Conversational Q&A and Systematic Extraction.
- Conversational Q&A (General LLMs): When you ask an LLM, "What are the primary biomarkers in this study?", its goal is to provide a helpful, human-readable summary. It is optimized for conversation and general understanding, not for capturing every single data point with perfect fidelity.
- Systematic Extraction (LENS): In contrast, LENS is designed for a systematic, structured task. It processes a document with a single, pre-defined goal: to extract specific, pre-defined types of information (e.g., all assays, all biomarker-result pairs, all subject characteristics) and organize them into a complete, reliable, and machine-readable format.
Fundamentally, it comes down to using the right tool for a highly specialized job. For building a comprehensive, evidence-backed knowledge graph for scientific research, a conversational summary is insufficient; a systematic, complete extraction is essential.
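The distinction can be made concrete with a sketch: systematic extraction targets a fixed schema, so completeness becomes checkable, whereas a conversational answer is free text with no such guarantee. The schema and records below are hypothetical examples, not LENS's actual output format:

```python
# A conversational answer: free text, with no guarantee of completeness.
chat_answer = "The study's primary biomarkers appear to be PD-L1 and CA-125."

# Systematic extraction: every record must satisfy a pre-defined schema,
# so missing fields are detectable rather than silently omitted.
REQUIRED_FIELDS = {"biomarker", "assay", "result", "source"}

def validate(records: list[dict]) -> list[dict]:
    """Keep only records with every required field; a real pipeline
    would flag the incomplete ones for review instead of dropping them."""
    return [r for r in records if REQUIRED_FIELDS <= r.keys()]

extracted = [
    {"biomarker": "PD-L1", "assay": "IHC",
     "result": "positive in 62% of samples", "source": "Table 2"},
    {"biomarker": "CA-125", "assay": "ELISA"},  # incomplete: caught, not lost
]
complete = validate(extracted)
```

The summary above names both biomarkers, but only the schema-checked records reveal that one extraction is missing its result and source.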
Proof at Scale: The Numbers That Matter
LENS is a proven system operating at a massive scale. Based on rigorous evaluation, including ~50 scientific domain experts reviewing research papers and extensive LLM-based assessment of thousands of papers, LENS demonstrates over 90% accuracy across all biomedical domains—from oncology to immunology to neuroscience. Unlike domain-specific systems that work only within their trained fields and tasks, LENS maintains reliable performance across all biomedical disciplines without domain-specific training, while preserving complete evidence traceability and operating with a hallucination rate below 1%.
When used for Conversational Q&A about research papers, general-purpose LLMs like GPT-4 achieved only 32.5% completeness in capturing critical information, while Claude reached 35%. On the same papers, LENS maintains over 90% completeness in structured extraction, enabling scientists to validate claims, trace evidence to its source, and uncover cross-domain connections. Combined with its ability to continuously process new literature and integrate findings into knowledge graphs, LENS gives research organizations a foundational technology that is both scientifically rigorous and transformative for drug discovery.
The Strategic Impact: What This Means for Your Organization
The strategic advantage is clear: LENS works in any new research area, enabling true interdisciplinary research through one unified system for extracting and understanding scientific data. When every claim links to its experimental source, scientists and organizations can have high confidence in AI-supported insights for high-stakes decisions.
- See Connections Others Miss: Cross-domain insights that typically take years to discover become visible immediately. For example, quickly surfacing evidence that a known cancer drug could be studied as a treatment for a rare neurological disease.
- Make Evidence-Based Decisions: Every hypothesis, every decision point backed by traceable evidence from primary sources. No more betting millions on AI-generated assumptions.
- Accelerate Discovery Timelines: Literature reviews that took weeks can now be completed in hours. Systematic understanding of entire disease areas in days, not months.
- Build Sustainable Competitive Advantage: While competitors struggle with fragmented AI tools and information silos, you operate with unified, comprehensive intelligence across all therapeutic areas.
The Future This Enables
With a BEKG as the foundation, neuro-symbolic AI moves from promise to reality, enabling research organizations to unlock new capabilities. Scientists and research organizations can identify novel drug targets by analyzing disease mechanisms at a systems level, infer likely clinical trial outcomes by synthesizing the full spectrum of preclinical evidence, and uncover opportunities to repurpose existing research and therapeutic modalities by recognizing mechanistic similarities across diseases. They can even design more effective combination therapies through better analysis of pathway interactions and synergistic effects.
This transformation isn't just about processing information faster—it's about deepening our understanding of disease biology in ways that were previously impossible. By uniting structured biomedical knowledge with the reasoning power of AI, BEKG-powered systems elevate discovery from isolated insights to integrated understanding, equipping organizations with the tools to anticipate outcomes, accelerate innovation, and ultimately bring better therapies to patients.
The organizations that will define the future of therapeutic discovery are the ones building evidence-based foundations today. Those who adopt a BEKG first will see connections others miss, accelerate discovery timelines, and secure a lasting competitive edge. LENS makes this possible now, at scale, and with the accuracy required for the highest-stakes decisions.
The future of drug discovery isn't about having more AI systems—it's about having the right foundation for AI that truly understands disease biology. With a BEKG and LENS, that future is here.
References
1. “Why 90% of clinical drug development fails and how to improve it?” Acta Pharm Sin B, February 2022, https://pmc.ncbi.nlm.nih.gov/articles/PMC9293739/