From Why To Which One
For eight years, the hardest part of every customer conversation was justifying the category.
"Why does my scientist need AI at all?"
That was the question. Every meeting, every RFP, every budget discussion with a pharma R&D head. We spent years making the case for why AI belonged in the lab — why a biologist who'd spent 15 years mastering their craft should trust a model with their experimental design. It was a real objection, and it was fair.
Six months ago, that question stopped. I haven't heard it since.
The question now is: "Which one should I get them?"
That's a better problem to have. But it comes with its own trap, and I've watched a lot of smart leaders walk straight into it.
Two Camps
I tell people the landscape breaks into two camps.
The first: vertical solutions built specifically for drug discovery. Causaly, Phylo, FutureHouse (commercialized through their spinout Edison Scientific), BenchSci, Owkin. Every one of these companies was built from the bench up. They know the domain — the experimental workflows, the literature density, the specific ways a medicinal chemist or oncology researcher actually thinks when they're designing an experiment or evaluating a target.
The second: frontier models with life sciences variants. GPT-Rosalind from OpenAI. Claude for Life Sciences from Anthropic. Gemini from Google. These are backed by the world's most powerful AI labs. Every few months, one of them publishes a benchmark showing near-superhuman performance on some biology task. They're working with Amgen, Moderna, and Novo Nordisk. The press releases are impressive.
I don't believe they'll win the lab.
The Specialist's Tool
Here's the argument.
Frontier models are trained to be excellent at everything. That breadth is their product — and the exact reason they're the wrong primary tool for a specialized R&D scientist.
Think about how specialized work operates in any field. Cardiac surgeons don't use the same tools as GPs. Patent litigators don't use the same research platforms as corporate generalists. Formula 1 mechanics don't fit the same parts to every car on the grid. The more specialized the work, the more purpose-built the tool. This isn't a controversial take. It's how every mature industry works.
R&D scientists are among the most specialized professionals on the planet. A medicinal chemist working on a kinase inhibitor doesn't need a tool that can write legal briefs and summarize board memos. They need something that understands their protein class, their failure modes, what it means when an assay result is off by twofold, and how to surface the three 2019 papers that nobody read but are directly relevant to what they're seeing today.
Our team was at a pharma R&D leadership forum recently. The conversation had completely shifted, not from our side but from theirs. The question wasn't "Can AI help?" It was "What conditions make it reliable at scale?" The executives I spoke with from AstraZeneca, Novartis, GSK, and AbbVie were clear: the moat isn't the most capable underlying model. It's the most curated, domain-specific knowledge embedded in the product. General agentic AI plus a good prompt doesn't get you there.
Frontier models will be everywhere: in email, in internal wiki tools, in executive dashboards. They'll be the background layer of the enterprise. But what tool does a scientist open at 9 a.m. when they sit down to design an experiment? That's going to be something purpose-built for that work, for the same reason specialized employees have always had specialized tools.
What I Actually Tell Customers
Here's where I diverge from how most vendors answer the "which vertical solution?" question.
When a pharma CIO or R&D head asks me, "Should we go with Causaly, Phylo, FutureHouse, Owkin, or BenchSci?", the temptation is to go feature by feature. And every company on that list will give you a compelling slide. The capabilities start sounding the same. The differentiation blurs.
I don't do that anymore.
What I tell them: pick two or three tools. Put them in front of your scientists. Let them use them for a month. Don't decide this in the procurement room.
I say this knowing full well that BenchSci is on that shortlist, competing against the others. I still say it, because it's the right answer.
Specialized professionals immediately know the difference between their tools. A Formula 1 driver doesn't need a spec sheet to evaluate a car. They feel it in the first corner — the chassis response, the brake bias, whether the car trusts them back. Two laps and they know. You cannot simulate that evaluation from a feature comparison any more than you can hire a surgeon by reading their CV.
The same thing happens when a scientist sits down with a new AI platform. They know within hours whether it actually "gets" their work — whether the evidence it surfaces is genuinely relevant, whether the reasoning maps to how they think, whether it reduces their cognitive load or adds to it.
The tool they keep opening every morning is the one you should buy.
This is how we win at BenchSci. Not by out-sliding the competition. We win when scientists use the product and feel the difference. Every feature on a comparison matrix looks the same until a scientist is sitting in the car.
The difference, for them, is between a failed experiment and a successful one. Between six months of follow-up work and a clear direction. Between a wasted lead and a drug candidate worth pursuing.
That's not a feature. That's the whole thing.
Put the tools in front of your scientists. Let them choose.
The answer is already there. You just have to trust the driver.