
OpenAI vs Anthropic for healthcare: what each company's roadmap is for the next 12 months

In my work building infrastructure for desktop computer-use agents at Nen, I work closely with customers in healthcare. My personal interest in healthcare predates that — I've been on the sharp end of the US system as a patient through several knee surgeries, plus a $30k+ medical bill that took a year of back-and-forth with insurance and the hospital to resolve. Before Nen, I worked in technology in the Singapore government. So when two frontier labs shipped healthcare lines three days apart in January, I read both carefully.

OpenAI launched ChatGPT Health and ChatGPT for Healthcare on January 7–8, 2026. Anthropic launched Claude for Healthcare at JPM26 on January 11, 2026. Three days separated the announcements, and most of the press treated them as the same move: the two frontier labs finally getting serious about healthcare.

They aren't the same move. I spent the last few months reading each company's public posts, mapping the hires, pulling the customer lists, and running the April releases through a three-segment frame of the healthcare AI market. The products are aimed at different buyers, sold through different motions, and backed by hires that pre-committed each company to a strategy months before JPM26. This post walks the frame, then the evidence in three places — releases, partnerships, hires — and closes with what I'd like to see Anthropic build.

Contents

  1. A primer on the healthcare AI market
  2. TL;DR
  3. What the public releases point to
  4. What the partnerships point to
  5. What the hires point to
  6. What I'd like to see Anthropic build

1. A primer on the healthcare AI market

Three buyer segments matter for frontier-model healthcare products over the next 12 months. They're distinct enough that treating them as one is the mistake most external coverage makes:

  1. Practitioners — clinicians. Bought either directly by an individual physician or institutionally through a health system.
  2. Consumers — patients and members. Bundled with a consumer subscription today (ChatGPT Free, Go, Plus, Pro), possibly unbundled later.
  3. Payer/provider admin — claims, prior authorization, utilization management, eligibility, denials, appeals. Institutional procurement only.

A clarification on the third, because the word "administrative" does two jobs in healthcare and they point in opposite directions. Provider-side admin is a hospital ops team or a clinician drafting a PA letter, a discharge summary, an appeal — work that belongs to the care-delivery side of a transaction. Payer-side admin is adjudicating that PA, running the eligibility check, issuing the denial, or processing the claim — work that belongs to the insurance side. Same transaction, opposite sides of the wire. The two are sold to different buyers, have different compliance postures, and respond to different regulatory levers. Anthropic's Phase 1 public framing is payer-side admin; OpenAI's "prior authorization support" inside ChatGPT for Healthcare is provider-side drafting. Same word, different products.

Put together, the segments I'll use for the rest of the post are: practitioner, consumer, provider-side admin, payer-side admin. OpenAI's January products hit practitioner and consumer. Anthropic's hit provider-side admin and (partnership-stage) consumer, with payer-side admin telegraphed through hiring but not yet through product.

2. TL;DR

OpenAI is building for clinicians and consumers. Anthropic is building the platform for the payer-provider workflow layer. The two launched healthcare lines three days apart in January, but they're not running the same play.

The headline indicator is which benchmark each company invests in. OpenAI owns HealthBench — clinical reasoning against natural-language medical questions, where GPT-5 leads (0.46 on HealthBench Hard in "thinking" mode versus 0.26 in non-thinking; o3 beat Claude 3.7 Sonnet there in March 2025) — and shipped HealthBench Professional in April. Anthropic's release notes have cited MedAgentBench, where Opus 4.5 already scores 92.3% — approaching saturation, with little headroom left to drive product decisions. Even setting saturation aside, MedAgentBench measures clinical workflows inside a single EHR (lab ordering, note review, retrieval-style QA), which doesn't capture the multi-system coordination real provider admin requires. HealthAdminBench (Kinetic Systems + Stanford) is the current best example of the right shape for that work: end-to-end agent task success across real payer portals, EHRs, and fax, where Claude Opus 4.6 leads at 36.3% versus 26.7% for GPT-5.4 CUA. Caveat: HealthBench is OpenAI's benchmark — they tune against it — but the lead is real. Different things measured, different products built.

The split tracks each company's underlying DNA. OpenAI wants to be the next great consumer platform — ChatGPT is one of the most-used consumer products on the internet, and healthcare is the highest-frequency consumer use case after coding and general assistance. Anthropic's orientation is work — Claude Code is its breakout enterprise product; Cowork extends the same agent-does-work-alongside-you model to non-engineering roles. In healthcare, that shows up as provider admin and payer admin: high-volume workflow categories where the agent's job is to complete a task against a real system of record, not answer a clinical question.

The rest of this post walks the evidence in three places: the public releases, the partnerships, and the hires. Each makes the same point from a different angle.

3. What the public releases point to

I pulled every public product announcement from each company between January 1 and April 24 and categorized them by segment.

| Date | Company | Product | Segment |
| --- | --- | --- | --- |
| Jan 7, 2026 | OpenAI | ChatGPT Health (waitlist, US-only; b.well + Apple Health + Function + MyFitnessPal + Peloton + AllTrails + Instacart + Weight Watchers; >230M weekly health questions cited; built with 260 physicians across 60 countries) | Consumer |
| Jan 8, 2026 | OpenAI | ChatGPT for Healthcare (GPT-5.2; SAML SSO, SCIM, BAA, CMK; AdventHealth, Baylor Scott & White, Boston Children's, Cedars-Sinai, HCA, MSK, Stanford Medicine Children's, UCSF; discharge summaries, patient instructions, clinical letters, PA-support drafting; SharePoint integration) | Clinician |
| Jan 11, 2026 | Anthropic | Claude for Healthcare (connectors: CMS Coverage Database, ICD-10, NPI Registry, PubMed; Skills: FHIR, PA review; consumer partnerships: HealthEx, Function, Apple Health, Android Health Connect) | Payer-provider admin |
| Jan 11, 2026 | Anthropic | Claude in Microsoft Foundry (healthcare + life sciences) | Payer-provider admin |
| Apr 22, 2026 | OpenAI | ChatGPT for Clinicians (free; NPI-verified via Persona; GPT-5.4; Skills for referral letters, PA support, patient instructions; deep research across medical journals; CME credits as byproduct) | Clinician |
| Apr 22, 2026 | OpenAI | HealthBench Professional (6,924 physician-reviewed conversations; 99.6% rated safe and accurate; reported to outperform human physicians across care consult, writing/documentation, medical research) | Clinician |
| Apr 2026 | Anthropic | Cowork GA (enterprise productivity surface with plugins, private plugin marketplaces, enterprise controls; PwC announced healthcare-plugin development on Cowork) | Payer-provider admin |

OpenAI's column is three products each matched to a persona — consumer (ChatGPT Health), institutional clinician (ChatGPT for Healthcare), individual clinician (ChatGPT for Clinicians) — plus a benchmark (HealthBench Professional). Anthropic's column is a set of primitives (connectors, Skills, enterprise-platform distribution, plugin surface) that show up inside other people's products.

The individual-clinician path is worth walking through. ChatGPT for Clinicians' signup requires NPI ID entry, cross-reference against the NPPES registry, then Persona identity verification before access is granted. Brendan Keeler's April 24 Health API Guy post notes this is substantially stricter than OpenEvidence's knowledge-based state-licensure quiz, which the Doximity court filings allege is bypassable. Verification rigor matters for CME, pharma-ad monetization, and enterprise upsell — ChatGPT for Clinicians is built to be the system of record for "this user is a verified physician," and the rest of the OpenAI clinical motion compounds on top of that.

The end user is the fastest way to see the difference. The person who opens ChatGPT Health is a patient sitting down with bloodwork results, an insurance plan, or a list of pre-appointment questions. The person who opens ChatGPT for Healthcare is a clinician or hospital admin using a SharePoint-grounded workspace deployed by their IT team. The person who opens Claude for Healthcare doesn't exist yet — the connectors and Skills Anthropic shipped go into other people's products. The end user is a biller at a clinic running a vertical app like Tennr or Candid, or a clinician using Elation with Claude underneath them.

4. What the partnerships point to

Brendan Keeler's April 24 Health API Guy post puts rough numbers on the overall shape: OpenAI is roughly 80% consumer / 20% API revenue; Anthropic is the inverse. Anthropic also has a 70% win rate on new enterprise deals and 54% of the enterprise coding market via Claude Code. Those aren't healthcare-specific numbers, but they're the load-bearing prediction for what each company's healthcare partnership shape would look like — OpenAI's center of gravity is consumer; Anthropic's is API-under-someone-else's-product. The partner lists below match.

OpenAI's consumer-product integrations are data-in streams. b.well is the most structurally interesting: a consumer-facing aggregator that pulls patient records from thousands of US providers via FHIR and hands the identity and consent layer to OpenAI without OpenAI building the provider-side plumbing. Apple Health, Function, Peloton, MyFitnessPal, AllTrails, Instacart, and Weight Watchers are lifestyle and vitals signals (steps, sleep, meals, workouts, lab results). Taken together, these integrations give ChatGPT Health a patient's full health context for a conversation. The user is a patient at their kitchen table.

OpenAI's named API customers in clinical tools are Abridge and Ambience (ambient clinical documentation), EliseAI (patient communication), and Penda Health (a Kenya-based primary care network with a published study on OpenAI-powered clinical copilots reducing diagnostic and treatment errors in routine care). These are products built on OpenAI, not OpenAI products distributed through them. The partnership value is OpenAI getting the per-token economics while its customers own the clinician-facing product. These are also customers willing to commit publicly: Abridge — a Series E ambient-documentation vendor used across hundreds of US health systems — went on the record in an April 2026 blog post as a GPT-5.5 launch partner, with their CDS product running on the new model and reporting a 25% lift in clinical-quality coverage and 30% more concise responses versus GPT-5.2. That's the shape of trust OpenAI gets back from this tier of partner.

Anthropic's named partner list is shaped like a platform's customer roll, with two visible layers. The tier-1 enterprise customers are Banner Health, Premier, and Elation Health. Banner Health (tier-1 nonprofit, $14B+ revenue) rolled out BannerWise to 55,000+ employees with a published physician-burnout case study; Premier (NYSE: PINC, ~$1.4B revenue) is the GPO channel into 4,400+ member organizations; Elation Health (the established small-practice EHR) reports a 61% chart-review-time reduction and is Anthropic's only direct surface into the sub-10-provider segment. Banner is the headline reference deployment; Premier and Elation each cover segments Anthropic otherwise wouldn't reach.

The vertical-AI-app cohort spans Series A through multi-billion-valued private companies. Mid-to-late-stage clinical-workflow vendors include Qualified Health (Series B, $125M; built on Claude Sonnet 4.5 for University of Texas System patient identification across a 2M+ population), Heidi Health (Series B, $65M; multilingual clinical scribe in 110+ languages), Commure (multi-billion private; AI-native RCM and ambient documentation), and Viz.ai (Series D; named without a public Claude use case detailed). Earlier-stage provider-admin and RCM apps include Carta Healthcare (66% data-processing-time reduction at 99% accuracy), Tomorrow Health (Horizon Suite hits 95%+ clean-order rates on DME orders), and Brellium (pre-billing clinical compliance). Pair Team sits at the care-management layer (Medicaid care coordination with Flora and Arc, 52% ED-visit reduction).

Compared to OpenAI's named clinical-partner roster — which clusters tighter and later, with Abridge (Series E, hundreds of US health systems) as the headline plus Ambience, EliseAI, and Penda Health as a small set of mature clinical-documentation and patient-comms peers — Anthropic's list reads as less focused. OpenAI has picked clinical tools as its segment and gone deep. Anthropic has eleven partners across four workflow categories and roughly four stage brackets, with no single deep relationship at the top. That's the platform posture in its honest form: wider footprint, no segment narrative. A procurement committee comparing "OpenAI is the clinical-tools standard, with Abridge integrated everywhere" against "Anthropic is under eleven different things at eleven different stages" will probably read the second as less convincing — not because breadth is wrong for platform economics, but because depth is what tells a buyer the platform has gotten good at their specific segment. The bet Anthropic is making is that breadth across vertical apps eventually compounds into platform dominance. It might. It also might be that they haven't decided yet which segment to commit to.

5. What the hires point to

I read each company's public health-team roster and cross-referenced public bios against resumes on LinkedIn and speaker profiles. Org charts and resumes point toward what each company wants to prioritize, and both companies' healthcare hires read clearly enough to compare.

OpenAI's top of the house is Nate Gross, MD/MBA, VP of Health. Gross co-founded Doximity and was its chief strategy officer; he co-founded Rock Health before that. The Doximity playbook is specific: NPI-verified free product, 80%+ US physician penetration, enterprise upsell on top. That's the exact shape ChatGPT for Clinicians took when it launched April 22. Ashley Alexander, who leads healthcare product, was co-head of product at Instagram — consumer PLG muscle, hired into the function that owns ChatGPT Health. Karan Singhal leads Health AI; he was Staff Research Scientist at Google Brain from September 2019 through April 2024 and co-led Med-PaLM and Med-PaLM 2. HealthBench and HealthBench Professional are his shop. James Hairston, Global Head of Innovation Policy for health, devices, and robotics, came from Meta's VR/AR policy team — the relevant experience is navigating regulators on a new device category.

Read together, those four are a clinician-PLG motion, a consumer-PLG product chief, a clinical-AI research lineage applied to benchmarks, and policy muscle sized for an FDA conversation. OpenAI is applying its native consumer-first motion to a B2B-flavored vertical.

Anthropic's top of the house is Syed Mohiuddin, DO, Head of Healthcare. The relevant line on his CV is not the DO from Michigan State or the Detroit Medical Center residency. It's that from October 2024 through August/September 2025 he was SVP and Chief AI Transformation and Strategy Officer at UnitedHealth Group — ten months inside the largest US payer, running AI strategy. Before UHG, from September 2021 to May 2024, he was Counselor to the HHS Deputy Secretary under Biden-Harris, co-lead of the HHS AI Task Force, and the person who directed HHS's input to the Biden Executive Order on AI. Seven years at McKinsey in healthcare and public sector practices before that. He's based in DC.

That is not a consumer-product hire. That is a hire built for payer-side admin, regulator trust, and the 12-to-24-month enterprise procurement cycles those segments run on. The public signal is that Anthropic's Phase 1 in healthcare is payer admin and federal systems, not a clinician PLG app.

The rest of the Anthropic org is consistent with that read. Amol Avasare is Head of Growth and was the spokesperson for the HealthEx consumer-records partnership at JPM26. Rohan Siddhanti runs healthcare GTM out of NYC; his background is VP of BD at Healthie, Director of Enterprise BD at Bright Health (a payer), and Oliver Wyman Payer/Provider strategy. Dave Nolan is the enterprise AE for Healthcare & Life Sciences in SF. Anthropic also hired a Payers Lead in late 2025 (the Greenhouse posting closed December 16).

No Anthropic hire is shaped like a consumer-PLG lead or an individual-clinician acquisition lead. That's a structural choice, not an oversight.

6. What I'd like to see Anthropic build

TL;DR. The single highest-leverage thing Anthropic could ship in the next 12 months is Claude Cowork for Healthcare Administration: a first-party provider-ops product on the existing Cowork form factor, with first-party Skills, official clearinghouse integrations (Availity, Stedi, and the CMS PA API as it goes live in 2027), and computer use as the platform capability that closes the long tail of payer-portal workflows. Around it: an internal eval suite for payer-side admin artifacts, a first-party secure CUA runtime that gives healthcare deployments a defensible safety envelope, and the discipline of publishing a monthly HealthAdminBench-equivalent reliability number per Claude release. The anti-roadmap is a Claude-branded product that renders, drafts, or scores insurance denials.

I spend most of my time at Nen on workflow productivity — the layer below clinician-facing products — so Anthropic's roadmap is the one that directly affects what I build. Writing from outside Anthropic; this is what I'd argue for if asked.

It helps to step back and name the platform primitives a foundation-model company has to ship to win an enterprise vertical. Roughly four:

  1. Domain model capabilities — measured by external benchmarks (HealthBench, MedQA, MedAgentBench, HealthAdminBench) and a much larger volume of internal evals on artifacts the model has to produce.
  2. Agent harness — split into context (Skills, memory, retrieval) and behavior (the Claude Code style of agent loops vs the Cursor-on-Claude style of in-IDE assistance).
  3. Integrations — connectors to systems of record, roughly split between payer portals and provider EHRs. Powered by both programmatic access (API/MCP) and computer use.
  4. Managed agents — infrastructure for running agents in production. New as of the last twelve months; still mostly infra, but starting to show up as a product surface.

A vertical healthcare play has to follow the same playbook, at least for the first year. As more general-purpose primitives ship — agents with first-class identities like email and phone numbers, for example — they'll become more or less applicable to healthcare-specific workflows. The wishlist below is what I'd ship inside each primitive over the next twelve months.

1. Domain model capabilities

HealthBench (the lead is in §2) tells the clinical-reasoning story for clinicians. The same model accuracy matters in the payer/provider world too — a coverage determination that misreads NCD or LCD guidance is a denied claim downstream.

What Anthropic should also develop, internally, are evals specific to the artifacts generated during prior authorization and revenue-cycle workflows. The artifacts have well-established industry names: PA submissions and decisions (X12 278 today, FHIR Da Vinci PAS v2.2.0 tomorrow, with reviewAction codes Approved / Denied / Pended / Partial); eligibility responses (271); claim-status responses (277); claim submissions (837); electronic remittance advice (835); Letters of Medical Necessity supporting appeals; coverage citations against NCDs and LCDs. Each is a structured-or-semi-structured document with a specific format, a specific set of failure modes, and a specific cost-of-error. Anthropic already has product examples on claude.com/solutions/healthcare; the question is whether there's an internal eval suite that scores model outputs against payer-acceptable formats with the precision payers expect.
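To make the eval-suite idea concrete, here is a minimal sketch of what one check in such a suite might look like. The JSON shape, field names, and codes are simplified illustrations of my own — not the actual Da Vinci PAS profile, and not anything Anthropic has published — using only the reviewAction vocabulary named above:

```python
# Hypothetical sketch of one eval in an internal payer-artifact suite.
# The artifact shape and field names are invented for illustration;
# a real harness would validate against the FHIR Da Vinci PAS profile
# and payer-specific submission rules.

ALLOWED_REVIEW_ACTIONS = {"Approved", "Denied", "Pended", "Partial"}
REQUIRED_FIELDS = {"patient_id", "service_code", "review_action", "coverage_citation"}

def score_pa_artifact(artifact: dict) -> dict:
    """Return per-check pass/fail results for one model-generated PA decision."""
    checks = {}
    # Structural completeness: every required field must be present.
    checks["has_required_fields"] = REQUIRED_FIELDS <= artifact.keys()
    # Vocabulary: the decision code must be one of the allowed values.
    checks["valid_review_action"] = artifact.get("review_action") in ALLOWED_REVIEW_ACTIONS
    # Cost-of-error check: a denial with no coverage citation can't
    # survive an appeal, so it fails the eval outright.
    checks["denial_is_cited"] = (
        artifact.get("review_action") != "Denied"
        or bool(artifact.get("coverage_citation"))
    )
    checks["pass"] = all(checks.values())
    return checks
```

The point is the shape, not the checks themselves: each artifact type in the list above gets its own scorer, each scorer encodes that artifact's specific failure modes, and the suite aggregates into a per-release number the way HealthBench aggregates clinical rubrics.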

Unlike clinical accuracy, evaluating insurance artifacts requires a close working relationship with payers, who hold internal guidelines on what a clean PA submission looks like, what an appeal letter has to cite to be considered, what counts as medical necessity for a given DRG. One tactical win here would be to partner more closely with Kaiser Permanente. As the largest integrated payer-provider system in the country (~12.5M members across the Foundation Health Plan, Permanente Medical Groups, and Foundation Hospitals), Kaiser is structurally unique: both sides of the PA-and-claim transaction live inside one customer. The "battle of the bots" Mohiuddin described in his UHG-era HealthTech Nerds interview doesn't run inside Kaiser — there's nobody on the other side of the wire to fight. That makes Kaiser the natural laboratory for the shared-data, shared-rules, real-time vision he described, and a Kaiser-Anthropic partnership lets Anthropic claim true ecosystem-player posture: reducing inefficiencies on both sides because both sides report to the same CEO. Kaiser also doesn't ship its denial product to anyone external, so Anthropic isn't taking on the brand risk of a "Claude denied my coverage" headline as long as the deployment stays internal.

2. Agent harness

Anthropic needs to deliver a first-party experience for providers — call it Claude Cowork for Healthcare Administration. The shape: a bundle of first-party Skills and connectors developed in collaboration with payers and EHR systems, packaged inside the Cowork form factor. The eval target is a HealthAdminBench-equivalent benchmark suite. HealthAdminBench (Kinetic Systems and Stanford, 135 expert-defined tasks across PA, appeals, and DME ordering, 1,177 deterministic plus 521 LLM-based verifiers across 1,698 subtasks) is the current best example of the right shape; an Anthropic-authored or co-authored successor would carry more weight downstream. Claude Opus 4.6 leads at 36.3% end-to-end today; the bar for "Claude Cowork for Healthcare Administration is shipping production-grade" is closer to the 70%+ range, with a published monthly reliability number per release. Quarterly customer-attested outcome reports from provider deployments add the second artifact. The pair matters because reliability claims about agentic systems are hard to verify from a model card alone — you need both an external benchmark with reproducible scoring and customer-attested outcomes from named deployments before anyone can evaluate the system against ground truth. It's the same eval-and-verifiability pattern Anthropic already runs on Claude Code (SWE-bench scores plus customer outcome stories), applied to a different domain. (Disclosure: I know the Kinetic Systems team and am a fan of the work; factor that in.)

The form factor matters as much as the underlying capability. The same Claude Desktop reach that ships to engineering teams as Claude Code, and to non-engineering knowledge workers as Cowork, can ship to provider ops teams as Claude Cowork for Healthcare Administration — same client, same enterprise controls, same procurement contract, different bundle of Skills and connectors. The plugin packs through PwC, Deloitte, and KPMG (pre-committed at Cowork GA in April 2026) are the distribution mechanism for the role-targeted bundles: PA coordinator, appeals specialist, eligibility checker, biller.

3. Integrations

The clearinghouse layer is where payer-admin transactions already flow, and right now Anthropic's integration story is implicit (Stedi is MCP-native, so any MCP client can in principle talk to it) rather than explicit (an Anthropic-Availity announcement, an Anthropic-Stedi connector named in the JPM26 connector roster). The version worth shipping: Availity and Stedi as first-class connectors in Claude for Healthcare, sitting alongside the CMS Coverage Database, ICD-10, NPI Registry, and PubMed connectors from JPM26. Pre-built Skills for 270/271 eligibility, 276/277 status, 278 prior auth (or its FHIR PAS replacement), and 837 claim submission turn each transaction into a single Skill invocation. Add first-class support for the CMS FHIR Prior Authorization API when it goes live on January 1, 2027.
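The "single Skill invocation" claim can be sketched as a dispatch table. Everything here is hypothetical — the skill names, the function signature, and the routing are my illustration of the idea, not Anthropic's Skills API or any real clearinghouse connector:

```python
# Hypothetical sketch of "each transaction becomes a single Skill
# invocation." Skill identifiers and invoke_transaction() are invented
# for illustration; a real deployment would route through a clearinghouse
# connector (e.g. an Availity or Stedi MCP server).

TRANSACTION_SKILLS = {
    "eligibility": "x12-270-271",   # eligibility inquiry/response
    "claim_status": "x12-276-277",  # claim-status inquiry/response
    "prior_auth": "x12-278",        # PA request (FHIR PAS as the successor)
    "claim_submit": "x12-837",      # claim submission
}

def invoke_transaction(kind: str, payload: dict) -> dict:
    """Resolve a transaction kind to its Skill and return the routing decision."""
    skill = TRANSACTION_SKILLS.get(kind)
    if skill is None:
        raise ValueError(f"no skill registered for transaction kind {kind!r}")
    # In this sketch we only echo the routing; the real Skill would
    # format, submit, and parse the clearinghouse response.
    return {"skill": skill, "payload": payload}
```

The value of collapsing each transaction behind one invocation is that the agent harness reasons about "check eligibility" rather than about X12 segment structure — the same abstraction Claude Code gets from a well-named tool.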

A platform capability that has to ship alongside the integrations: a serious answer to the long tail. What's available through API and MCP will always lag what's available in the human UI. Ayo Ademoye's analysis of UnitedHealthcare claim status is a great example — the X12 277 EDI response exposes 15 fields; the same claim's web portal exposes 37. 83% of insurers don't support automated claim-status checks at all. The same shape repeats across PA, eligibility, appeals, and DME: the machine interfaces are systematically degraded versions of the human ones. HealthAdminBench-equivalent benchmarks reach the same conclusion from the other direction — HealthAdminBench's 95-action average task lives mostly in the browser-against-portal layer, not on the structured-data rail. Computer use is therefore the platform primitive that closes the long tail.

The platform capability already exists — Claude Computer Use shipped in late 2024 and Cowork builds on it; many of the partners in §4 are already running production CUA workloads on Claude. What would move the ceiling for healthcare specifically is concrete, and the HealthAdminBench paper points directly at the levers.

The biggest one is token-space domain learning — context delivered through Skills, structured memory, and in-context portal guidance, not weight updates. The paper's ablation shows that giving the agent portal-specific guidance (each payer portal's structure, available fields, expected flow) plus accessibility-tree observation lifts Claude Opus 4.6 from 36.3% to 51.9% end-to-end task success — a ~16-point gain from in-context guidance alone, no fine-tuning. Fine-tuning Qwen 3.5 on a hundred training examples pulls the same lever through weights (40% end-to-end, +23 percentage points over its base), but that's harder to deploy at frontier-model scale; structured token-space context delivered through first-party healthcare Skills applies the lever at the platform layer instead of the model-weight layer. The implication for what Anthropic ships: a Skills library that captures payer-portal flows, payer-specific quirks, common failure recoveries, and the kind of structured task-scoped memory that compounds across a 95-action workflow.
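What "portal guidance as token-space context" might look like, structurally: a per-payer record the harness flattens into the prompt before the agent touches the portal. The payer name, fields, and quirks below are invented for illustration — the paper's actual guidance format isn't reproduced here:

```python
# Illustration of token-space portal guidance: structured per-payer
# context injected into the prompt, no weight updates. All values are
# hypothetical examples, not any real payer's portal.

PORTAL_GUIDANCE = {
    "payer": "ExampleCare",
    "login_flow": ["sso", "otp_email"],
    "claim_status_path": ["Claims", "Search", "Status detail"],
    "quirks": [
        "session expires after 15 min idle; re-auth resumes at dashboard",
        "status detail only renders after selecting a date range",
    ],
}

def build_system_context(guidance: dict) -> str:
    """Flatten portal guidance into prompt-ready text for the agent."""
    lines = [f"Payer portal: {guidance['payer']}"]
    lines.append("Claim status path: " + " > ".join(guidance["claim_status_path"]))
    lines += [f"- quirk: {q}" for q in guidance["quirks"]]
    return "\n".join(lines)
```

A Skills library is, in this framing, a curated collection of records like this plus the recovery procedures for each quirk — cheap to author per payer, and the ablation suggests it is worth double-digit points of end-to-end reliability.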

Layered on top, the three reliability primitives the paper documents as the failure modes behind that subtask-to-end-to-end gap: a persistent plan the agent can invalidate mid-task (closes hidden long-horizon dependencies), a file-first action default that swaps screenshot-state for disk-backed retrieval (closes file-operation avoidance), and a task-scoped structured-memory skill (closes information loss across long horizons). Each maps to a documented failure mode, each is testable against the benchmark, and shipping all three is the lever most likely to close the 78.4%-subtask-to-36.3%-end-to-end gap toward something a procurement committee accepts.
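The first of those primitives — a persistent plan the agent can invalidate mid-task — can be sketched as a small data structure. This is my own illustration of the mechanism, not the HealthAdminBench authors' harness:

```python
# Sketch of a persistent, invalidatable plan for a long-horizon agent
# task. Structure and names are my own illustration of the primitive,
# not code from the benchmark paper.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlanStep:
    description: str
    done: bool = False
    invalidated: bool = False
    reason: Optional[str] = None

@dataclass
class PersistentPlan:
    steps: list = field(default_factory=list)

    def invalidate_from(self, index: int, reason: str) -> None:
        """Mark a step and everything after it as needing replanning,
        e.g. when a portal reveals a hidden dependency mid-task."""
        for step in self.steps[index:]:
            step.invalidated = True
            step.reason = reason

    def next_step(self) -> Optional[PlanStep]:
        """Return the first actionable step, or None if replanning is needed."""
        for step in self.steps:
            if not step.done and not step.invalidated:
                return step
        return None
```

The file-first default and the task-scoped memory skill compose with this the same way: all three turn transient screen state into durable state the agent can re-derive after a 95-action horizon.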

4. Managed agents

Anthropic's center of gravity in healthcare is the platform, not the workflow product — every vertical app named in §4 is doing better workflow work than Anthropic could from outside the customer relationship. The company can support what amounts to vibe agent building: making it materially easier for agent developers to ship and improve agents on Claude, and easier in general to be an agent developer in healthcare. Managed agents is mostly infrastructure today (compute, sandboxing, observability, replay); the next twelve months will likely turn it into a product surface for non-engineering builders.

For healthcare specifically, that means SI partners — PwC, Deloitte, KPMG, Accenture — can build in-house solutions that today exist only as independent vendor products. A health-system ops team that wants a denial-triage agent built in-house, against their own EHR, with their own coverage policies, doesn't have to procure a third-party vertical app if the SI partner can spin one up on managed-agent infra in weeks. That's accretive to the vertical-app ecosystem (different buyer, different procurement cycle), not displacement.

One opportunity that isn't on Anthropic's roadmap today: a first-party secure CUA runtime. A CUA agent submitting a prior auth runs against payer portals with no formal API contract — same DOM a human sees, same MFA, same session timeouts, same brittle fingerprinting. Anthropic ships the model; the runtime ships the safety envelope (browser fingerprinting, session resume, credential vaulting, deterministic replay for audit, regional availability for compliance). Startups have been shipping that runtime layer for 18+ months — CloudCruise, Kaizen, Browserbase, Nen, plus the vertical apps that run their own. What's hard for any of them to guarantee at the rate healthcare procurement requires is the security posture: HIPAA BAA terms, hardware-rooted attestation, audit trails that hold up to OCR investigation, BAAs that survive long subprocessor chains. Runtime security is a frontier-lab-shape problem — the BAA-and-attestation work that healthcare requires is the same work the lab is already doing for the model itself, and the customer-side trust diagram collapses from "model + partner runtime + customer compliance review" to "Anthropic + customer compliance review" when the runtime is first-party. Disclosure: Nen is one of those startups, and I'd be arguing for Anthropic to ship into a segment Nen operates in. I'd still make the argument.

Anti-roadmap

The scope I wouldn't extend into is a Claude-branded product that renders, drafts, or scores insurance denials for payers. One bad case inside a payer deployment contaminates the Claude brand across coding, legal, financial services, consumer, and hiring. The value in payer admin can be captured at the platform layer — vertical apps on Claude, Anthropic earns per-token economics, payer-side vendors own the product face. That posture is consistent with what Mohiuddin has been saying publicly: "A lot of administrative work that a payer doesn't want to do if they don't have to do it. And this is all just about information flow. This is not like we're changing rules here."

What I'd watch

Whether Anthropic ships a public monthly reliability number on HealthAdminBench (or equivalent) by JPM27. It's the cheapest single artifact that would compound trust faster than competitors can close the capability gap, and the clearest signal that the payer-admin thesis is real. If it doesn't land by JPM27, I'd read the Phase 1 posture as slower than the hires suggest.


Disclosure: I run Nen, which builds computer-use infrastructure. Several of the vertical AI apps named in this post are Nen customers. The §6.4 paragraph arguing for a first-party secure CUA runtime is a segment Nen operates in — read it as me arguing for Anthropic to ship into Nen's segment, not for it.