Leveraging Bioinformatics for Antibody Search

Written By: Mohamed Helmy, Ph.D.

Posted On: Nov 16, 2017 | Last Updated: Jun 21, 2024

Bioinformatics is the intersection of biomedical sciences, computer sciences, and information technology, where scientists can drive discovery by harnessing the power of interdisciplinary research.

Bioinformatics has a wide range of applications, including the development of new software tools, new analytical methods for biological data, and online databases for biomedical research. At BenchSci, we use bioinformatics in combination with machine learning to interpret the massive amounts of data accumulating from the fields of genomics, transcriptomics, proteomics, and metabolomics to support antibody search.

Before selecting antibodies for a particular experiment, researchers first need an adequate understanding of their target protein in order to define the most suitable antibody characteristics. In this article, I will explore several bioinformatics resources that can help in this respect, in addition to existing antibody search engines.

1. Protein/Gene Names and IDs

Most proteins are referred to by several names or "aliases," and it is critical to first correctly identify the protein target prior to your antibody selection.

Gene Names (HGNC) and GeneCards are the two main resources for up-to-date protein/gene name information if you work with human proteins. If you work with other organisms, use UniProt or an organism-specific database such as TAIR for Arabidopsis, WormBase for C. elegans, or FlyBase for the fruit fly.

To convert between names and IDs, the major databases and protein research web servers provide several tools, such as the UniProt ID Mapping tool and the g:Convert tool provided by g:Profiler.
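To illustrate why alias resolution matters, here is a minimal Python sketch that maps a name found in a paper or catalogue back to an official gene symbol. The alias table is a small hand-written illustration, and the function names are hypothetical; in practice you would build the table from an export of one of the resources above rather than typing it in.

```python
# Illustrative alias table (hand-written for this example).
# A real table would come from a database export, not from code.
ALIASES = {
    "TP53": ["p53", "LFS1"],
    "ERBB2": ["HER2", "NEU", "CD340"],
}


def build_alias_index(alias_table):
    """Map every symbol and alias (case-insensitive) to its official symbol(s)."""
    index = {}
    for symbol, aliases in alias_table.items():
        for name in [symbol, *aliases]:
            # A set is used because one alias can point to several genes.
            index.setdefault(name.upper(), set()).add(symbol)
    return index


def resolve(name, index):
    """Return the sorted list of official symbols that a name could refer to."""
    return sorted(index.get(name.strip().upper(), set()))


index = build_alias_index(ALIASES)
print(resolve("her2", index))
print(resolve("p53", index))
print(resolve("unknown", index))
```

The lookup returns a list rather than a single symbol on purpose: an empty list means the name is unknown, and more than one entry means the alias is ambiguous and needs manual checking before you search for antibodies.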

2. Protein Sequence

Once your protein has been correctly identified, it is important to learn more about its characteristics. For instance, some proteins have truncated isoforms in which the epitope for some antibodies is missing. To investigate the protein characteristics, obtain the amino acid sequence of the protein and all its isoforms from the UniProt or Ensembl databases.
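Once the isoform sequences are downloaded in FASTA format, checking whether an antibody's epitope survives in each isoform can be sketched in a few lines of Python. The sequences, isoform names, and epitope below are made up for illustration, and the check assumes a linear (contiguous) epitope whose sequence the antibody vendor has disclosed; conformational epitopes cannot be tested this way.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join wrapped sequence lines so an epitope spanning a line break is found.
    return {h: "".join(parts) for h, parts in records.items()}


def isoforms_with_epitope(records, epitope):
    """Report, per isoform, whether the linear epitope occurs in its sequence."""
    return {header: epitope in seq for header, seq in records.items()}


# Hypothetical sequences: isoform_2 lacks the C-terminal half of isoform_1.
FASTA = """>isoform_1
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQAPILSRVGDGT
>isoform_2
MKTAYIAKQRQISFVKSHFSRQ
"""
EPITOPE = "APILSRVGD"  # hypothetical linear epitope

result = isoforms_with_epitope(parse_fasta(FASTA), EPITOPE)
for name, present in result.items():
    print(name, "epitope present" if present else "epitope missing")
```

An isoform reported as missing the epitope would not be detected by that antibody, which is worth knowing before interpreting an absent band or signal.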

3. Protein/Gene Function and Features

Several databases can help you further characterize the protein target: UniProt for its functions, Pfam or PANTHER for its protein family, InterPro for its protein domains, and the Protein Data Bank for its 3D structure.

4. Subcellular Localization

For information on the subcellular localization of the protein target, UniProt is a great place to start. Furthermore, there are several specialized databases for protein subcellular localization such as LOCATE, PSORTdb, COMPARTMENTS, and LoCDB.

The subcellular localization of the protein target is particularly important for post-staining analysis, as it can differ depending on tissue, cell type, drug treatment, activation status, and disease condition.

5. Gene Ontology (GO) Annotation

The functional and localization information for proteins in UniProt, Ensembl, or any other major database is not always complete, and many proteins/genes are still unannotated or poorly annotated. In fact, these proteins often have the most potential as novel therapeutic targets. For proteins missing proper annotation (and even for those with the best annotation), Gene Ontology (GO) annotation can provide good information on the protein's:

Biological process: A biological process is an operation or event that is related to the functioning of integrated living units (cells, tissues, organs, and organisms) and can be defined by a beginning and an end.
Molecular function: A molecular function is an elemental activity of a gene product at the molecular level, such as binding or catalysis.
Cellular component: A cellular component is the part of a cell or its extracellular environment where the protein is located.

These sets of annotations provide a foundation for poorly annotated proteins (such as newly discovered ones) and enrich the well-annotated ones. GO also provides a powerful bioinformatics server with several tools for browsing, searching, and querying the GO annotations.
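As a rough sketch of how the three GO aspects can be put to work, the Python snippet below groups a protein's annotations by aspect. It assumes the annotations have already been exported as simple records; the one-letter aspect codes (P, F, C) follow the convention used in GO annotation files, the GO terms are real, but the gene names and their assignments are invented for illustration.

```python
from collections import defaultdict

# One-letter aspect codes as used in GO annotation files.
ASPECT_NAMES = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}

# Illustrative records: (gene symbol, GO ID, term name, aspect code).
# GENE1 and GENE2 are placeholders, not real annotations.
ANNOTATIONS = [
    ("GENE1", "GO:0006915", "apoptotic process", "P"),
    ("GENE1", "GO:0003677", "DNA binding", "F"),
    ("GENE1", "GO:0005634", "nucleus", "C"),
    ("GENE2", "GO:0005634", "nucleus", "C"),
]


def group_by_aspect(annotations, gene):
    """Collect one gene's GO terms under the three GO aspects."""
    grouped = defaultdict(list)
    for symbol, go_id, term, aspect in annotations:
        if symbol == gene:
            grouped[ASPECT_NAMES[aspect]].append((go_id, term))
    return dict(grouped)


for aspect, terms in group_by_aspect(ANNOTATIONS, "GENE1").items():
    print(aspect, terms)
```

For antibody selection, the cellular component group is often the most immediately useful, since it suggests where staining should appear and which applications and sample preparations are appropriate.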

In my next article, I will introduce bioinformatic resources that are useful for protein research.

Until then, try using BenchSci to filter antibodies by the desired characteristics defined by your bioinformatic analyses, and review published data to support your selection.