What I'm Building
The end goal is LLM-as-a-judge for Biotech & Life Sciences: evaluation models that assess whether another model's output meets regulatory, technical, and quality standards. I'm starting in medtech because the FDA sets the bar, not user satisfaction scores.
But before I get there, I'm solving a more foundational problem.
As we scale into multi-agent and parallel-agent systems, the bottleneck changes. Skills, prompts, and inter-agent contracts are still written for human readers, and that's a constraint when the audience is another model. Machines still talk to each other in formats designed for humans.
You can't build reliable evaluation infrastructure on top of unreliable communication. So I'm working on the A2A layer first — structured protocols and schema-first contracts that cut tokens and raise fidelity between agents. Measurable reliability, not vibes.
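To make "schema-first contract" concrete, here is a minimal sketch of what one inter-agent message contract could look like. Everything in it (the `TaskRequest` type, the `encode`/`decode` helpers, the action vocabulary) is an illustrative assumption, not an existing A2A library: the point is that the sender emits compact structured data instead of prose, and the receiver validates on arrival so contract violations fail loudly instead of degrading silently.

```python
# Hypothetical sketch of a schema-first agent-to-agent contract.
# None of these names come from a real library; they illustrate the idea.
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TaskRequest:
    """Typed contract for one agent asking another to run a task."""
    task_id: str
    action: str        # drawn from a closed vocabulary, not free-form prose
    payload: dict      # structured arguments, not natural language
    schema_version: str = "1.0"

# Closed action vocabulary: the receiving agent never has to interpret intent.
ALLOWED_ACTIONS = {"summarize", "extract", "review"}

def encode(msg: TaskRequest) -> str:
    """Serialize to compact JSON -- typically far fewer tokens than a prose ask."""
    return json.dumps(asdict(msg), separators=(",", ":"))

def decode(raw: str) -> TaskRequest:
    """Parse and validate on receipt, so a bad message fails loudly."""
    msg = TaskRequest(**json.loads(raw))
    if msg.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {msg.action}")
    return msg

wire = encode(TaskRequest("t-42", "extract", {"doc": "section-7"}))
assert decode(wire).action == "extract"
```

Because every message round-trips through `decode`, reliability becomes measurable: you can count schema violations per thousand messages instead of eyeballing transcripts.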
The sequence
1. A2A communication infrastructure: structured inter-agent protocols that make agent-to-agent communication auditable and reliable at scale.
2. LLM-as-a-judge for Biotech & Life Sciences: evaluation models built on top of that infrastructure, assessing model outputs against regulatory, technical, and quality standards that regulators actually accept.
Thinking about A2A, multi-agent orchestration, or AI in regulated industries?
Get in touch →