BenchSci Blog

How Comprehensive is BenchSci’s Data and Where Do We Get It?

Written by Claudia Hung (she/her) | Sep 8, 2021 2:04:53 PM

Science is nothing without data. But sorting through the world’s history of experimental data to find information relevant to a specific experimental context can be challenging. BenchSci is a powerful technology that scientists can leverage to quickly find the data they’re looking for, helping them accelerate their research and get novel medicine to patients faster. We have the world’s most comprehensive dataset of experiments, reagents, and model systems, compiled by proprietary machine learning models trained to understand experiments like a Ph.D. scientist. BenchSci empowers scientists with unique insights right at their fingertips to design and run more successful experiments. If you’re curious, here’s a quick breakdown of our data and where we get it.

Scientific Publications: >11 million

Our machine learning models have analyzed over 11 million open- and closed-access publications, and we’re adding new ones monthly. This includes papers published within the last 15 years on PubMed Central, where NIH-funded research is made publicly available within 12 months of publication. We’ve also partnered with several major publishers.

Biomedical experiments: >29 million

Within this publication database, our technology has identified more than 29 million individual experiments that utilize:

  • Antibodies: >22 million experiments, including western blotting, immunofluorescence, immunohistochemistry, ELISA, flow cytometry, and assay development 
  • RNAi: >1 million experiments, including knockdown studies
  • Proteins: >3.2 million experiments, including functional assays, activity assays, mass spectrometry, HPLC, and assay development
  • CRISPR: >49,000 experiments, including knockout and knockin studies
  • Cell products: >2.7 million experiments, including in-vitro and tissue culture 
  • Animal models: >389,000 experiments, including in-vitro, drug treatment, injection, breeding, survival, xenograft, and phenotyping 

Total figures: >7.8 million

Figure data is among the most valuable things our platform provides to scientists. Our proprietary image recognition technology has extracted comprehensive data from over 7.8 million figures, including:

  • >3.2 million figures from peer-reviewed publications
  • >3.4 million vendor-provided figures

In addition, we have >148,900 figures from third-party validators, which include:

  • The Human Protein Atlas (HPA): A Sweden-based program started in 2003 to map human proteins, tissues, and organs, which has contributed to several thousand publications in the field of human biology and disease
  • Encyclopedia of DNA Elements (ENCODE): An international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI) to build a comprehensive parts list of functional elements in the human genome
  • ProteinSimple: Part of the Protein Platforms division of Bio-Techne, which helps thousands of scientists worldwide resolve protein analysis problems to reveal new insight into proteins and their role in disease
  • Lund-Johansen Lab of Oslo University: Has a central role in developing and supporting biomedical research—including basic, translational, and clinical research—within the South-Eastern Regional Health Authority and also pursues international research collaborations
  • European Monoclonal Antibodies Network (EuroMabNet): A European network of laboratories linked to academic institutions for sharing knowledge within European and global communities to identify and produce validated 'fit for purpose' antibodies
  • University of Washington Histology and Imaging Core (HIC): Provides efficient, high-quality solutions for researching immunohistochemistry, image analysis, quantitative microscopy, histology, Luminex cell-based multiplex assay, and comparative pathology consultation
  • The Children's Hospital of Philadelphia Research Institute: Home to one of the most extensive pediatric research programs in the country, fostering medical discoveries and innovations that improve pediatric healthcare and save countless children’s lives

Reagent and model system products: >33 million

We link all these data to over 33 million antibodies, RNAi, proteins, CRISPR, animal models, and cell products from 386 vendors.

Our vendor partners work with us to keep their catalogs up to date on the BenchSci platform so that scientists continue to get the most relevant and accurate information for their specific experimental context. We also match data with our own metadata to empower scientists with unique insights into trends in reagent and model system usage across the literature.

We’d love to hear how you use the comprehensive data on the BenchSci platform to advance your research. Let us know in the comments below, and subscribe to our blog for all the latest BenchSci news.