In a startup, hype can be a blessing and a curse. Hype for a company, market, or technology opens doors. But it can also trigger overoptimism among true believers, skepticism among doubters, and confusion within a misinformed public (often due to poorly fact-checked, bandwagon-jumping mainstream media coverage).
So it is with artificial intelligence in healthcare. After my last post about startups using artificial intelligence in drug discovery, I spoke with several colleagues and industry scientists about the perception of such startups, and artificial intelligence in general, within pharmaceutical and biotechnology companies. This includes perceptions of BenchSci when we introduce it to industry researchers.
From these conversations, it became clear there are at least three common myths we need to dispel. Do you believe any of these? If so, read on.
It’s Easy to Build
People often suggest that it’s easy to apply artificial intelligence to drug discovery—that we’re just using widely available technology and data. After all, there are many open-source machine learning projects and libraries to apply; powerful cloud servers to use at low cost; and publicly available, nonproprietary biological data sources to leverage for training.
This is, in fact, all true. But the devil is in the details. With the current state of machine learning, you need:
- A lot of meticulously labeled data to do anything useful;
- Feature engineering to determine which features in that data are important for your models;
- A process to build, continuously update and clean your datasets;
- A platform through which end-users can access results;
- And continuous quality assurance to ensure those results are accurate.
Much of this takes significant domain expertise, not just off-the-shelf technology. At BenchSci, for example, our science team is about the size of our engineering team. And while we regularly explore new server options and machine learning libraries, our biggest debates are over data comprehensiveness, accuracy, and utility, and the minutiae of user interface elements. Just the other day, our CTO, David Q. Chen, an accomplished academic and machine learning expert, was debating background colors with our head of user experience in a design meeting. As Steve Jobs said: “Details matter.”
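To make some of these points concrete, here’s a minimal, purely illustrative sketch in Python. The toy sentences, labels, and scikit-learn pipeline below are my own assumptions for demonstration, not BenchSci’s actual data or code. Even at this scale, you can see labeled data, feature engineering, and quality measurement all come into play:

```python
# Hypothetical sketch: even a toy text classifier needs labeled data,
# feature engineering, and evaluation. Illustrative only.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 1. Meticulously labeled data: sentences labeled 1 if they describe
#    an antibody being used in an experiment, 0 otherwise (toy examples).
sentences = [
    "Cells were stained with anti-CD3 antibody (clone UCHT1).",
    "Western blotting was performed using a rabbit anti-GAPDH antibody.",
    "Mice were housed under standard laboratory conditions.",
    "Statistical analysis was performed with a two-tailed t-test.",
]
labels = [1, 1, 0, 0]

# 2. Feature engineering: turn free text into numeric features
#    (here, simple TF-IDF over unigrams and bigrams).
# 3. A model that learns from those features.
model = Pipeline([
    ("features", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# 4. Quality assurance: measure accuracy before trusting the results.
scores = cross_val_score(model, sentences, labels, cv=2)
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```

A production system differs in degree, not kind: millions of labeled examples instead of four, domain-specific features designed by scientists instead of generic TF-IDF, and continuous pipelines to rebuild and re-validate datasets as new papers are published.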
It Does All the Work for You
Another myth we regularly encounter—likely triggered by enthusiasm about exponential improvement in machine learning over the past few years—is that systems leveraging artificial intelligence can completely automate drug discovery tasks.
While many professions may be automated or significantly disrupted by artificial intelligence, the reality in drug discovery today is that artificial intelligence augments human labor and increases productivity; it doesn’t replace people. I have yet to see any system autonomously develop a drug without a human in the loop. (If you know of one, please tell me in the comments. I’d love to see it.)
With BenchSci, for example, we use machine learning to read, extract and organize unstructured information about antibodies from scientific papers. But scientists still need to search and filter our results and apply their experience and judgement to select an antibody for their work. Just as human-computer chess teams (so-called “centaurs”) can outcompete either humans or computers, so too can artificial intelligence empower research scientists to do better, faster, cheaper, and more reproducible work.
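Here’s a hypothetical sketch of what that augmentation can look like in practice. The data structures and names below are illustrative assumptions, not our actual system: the model extracts and ranks candidates, and the scientist filters the shortlist and makes the final call.

```python
# Hypothetical sketch of "augmentation, not automation": the model extracts
# and organizes candidates; the scientist narrows them down and decides.
from dataclasses import dataclass

@dataclass
class AntibodyUse:
    antibody: str
    target: str
    technique: str
    confidence: float  # model's extraction confidence, 0 to 1

# Imagine these were extracted from papers by a machine learning model.
extracted = [
    AntibodyUse("ab-123", "GAPDH", "Western blot", 0.97),
    AntibodyUse("ab-456", "GAPDH", "Immunohistochemistry", 0.88),
    AntibodyUse("ab-789", "CD3", "Flow cytometry", 0.93),
]

def search(results, target, technique):
    """Filter to the scientist's question and rank by model confidence."""
    return sorted(
        (r for r in results if r.target == target and r.technique == technique),
        key=lambda r: r.confidence,
        reverse=True,
    )

# The scientist still applies their own judgement to the shortlist.
for candidate in search(extracted, target="GAPDH", technique="Western blot"):
    print(candidate)
```

The model narrows millions of possibilities to a handful worth a scientist’s attention; the judgement call at the end is still human.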
It’s Not Accurate
The third myth we hear regularly is that artificial intelligence is unreliable because its predictions are inaccurate. This myth likely derives from the fact that no artificial intelligence achieves 100% success on any sufficiently complex task, so people may perceive it as consistently underperforming because it’s imperfect. Accordingly, people conflate “80% accurate” with “80% effective” or “80% reliable,” when in fact these are not the same.
To explain, I will first note that humans are imperfect, inaccurate, and often irrational. Yet we’ve done okay. Artificial intelligence researchers therefore typically benchmark their systems’ performance not against perfection, but against a human baseline, striving (initially) to achieve human-level performance. With transcribing audio, for example, humans have a roughly 5.9% error rate, or about 1 wrong word for every 17 spoken. We typically forgive people for this, but are much harder on machines like Siri. Machines have now matched human transcription error rates, at least in some contexts, but when people hear a system is 94.9% accurate (which it is), they worry about the remaining 5.1%.
At BenchSci, we hear this often. Our machine learning technology can now extract information about antibodies from scientific papers with greater than 90% accuracy, which is actually significantly better than human-level performance. This allows us to organize information from millions of papers and make it far more useful with less work.
Reaching our goal of 100% accuracy is a continuous process. But while our machine learning technology helps with information extraction and labeling, we have other systems, processes, and people in place to identify and correct any errors. This means the effectiveness and reliability of our data are greater than the accuracy of our machine learning engine, which itself, as mentioned, is already greater than human-level.
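A bit of illustrative arithmetic shows why those layered checks matter. The numbers below are assumed for the example, not our actual figures:

```python
# Illustrative arithmetic (assumed numbers, not actual figures):
# if the model is 90% accurate and downstream QA catches and corrects
# 70% of the remaining errors, the delivered data is ~97% reliable.
model_accuracy = 0.90
qa_catch_rate = 0.70   # fraction of model errors found and fixed downstream

residual_error = (1 - model_accuracy) * (1 - qa_catch_rate)
effective_reliability = 1 - residual_error
print(f"Effective reliability: {effective_reliability:.0%}")  # 97%
```

The point is that raw model accuracy sets a floor, not a ceiling, on the reliability of what users ultimately see.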
So, there you have it. Three common myths, and some hopefully useful information to address them. If you’ve heard any other common myths, or don’t feel I’ve sufficiently addressed these, please post a comment to let me know.