Lemma 1: Announcing LAURA
TL;DR: We're building reinforcement-learning environments that teach models to reason like lawyers, not just summarize text and recite cases. Today, we're releasing a preview of LAURA, our benchmark for advanced legal reasoning and the first step toward structured RL environments that simulate precedent analysis, constitutional and statutory interpretation, and more in real-world litigation and legal academia.
Most legal-AI tools today stop at the surface—they organize documents into tables or automate "workflows." But even the best models falter when asked to:
- apply abstract doctrines to new fact patterns,
- interpret ambiguous statutes,
- distinguish cases that seem similar but aren’t, or
- select precedent that best bolsters a client’s position.
That’s because current benchmarks and fine-tuning pipelines optimize for classification or document output—not the adversarial, analogical reasoning that makes law a profession rather than a process.
Lemma Research builds reinforcement-learning environments that stress-test and reward the reasoning process itself, not just the final answer. These environments contain rich, multi-step legal reasoning tasks that provide fine-grained reward signals and reveal model reasoning failures in a measurable way.
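To make the idea of rewarding the reasoning process concrete, here is a minimal, hypothetical sketch (not Lemma's actual API or task data): a Gym-style environment where each step is one move in a legal analysis, such as issue spotting or rule application, and the reward scores each intermediate step rather than only the final answer. All class, step, and field names below are illustrative assumptions.

```python
# Hypothetical sketch of a multi-step legal-reasoning RL environment.
# Each episode walks one fact pattern through a sequence of reasoning
# moves; the per-step reward localizes failures to a specific step.
from dataclasses import dataclass, field


@dataclass
class LegalReasoningEnv:
    # Illustrative expected reasoning sequence for one fact pattern.
    expected_steps: list = field(default_factory=lambda: [
        "identify_issue", "select_rule", "apply_to_facts", "conclude"
    ])
    cursor: int = 0

    def reset(self):
        self.cursor = 0
        return {"fact_pattern": "placeholder", "step": self.expected_steps[0]}

    def step(self, action: str):
        # Fine-grained reward: +1 for a correct intermediate move, 0
        # otherwise, so the signal reveals *where* reasoning breaks down.
        reward = 1.0 if action == self.expected_steps[self.cursor] else 0.0
        self.cursor += 1
        done = self.cursor >= len(self.expected_steps)
        obs = None if done else {"step": self.expected_steps[self.cursor]}
        return obs, reward, done


# Usage: a policy that happens to take every step correctly earns
# full per-step reward across the episode.
env = LegalReasoningEnv()
obs = env.reset()
total = 0.0
done = False
for action in ["identify_issue", "select_rule", "apply_to_facts", "conclude"]:
    obs, r, done = env.step(action)
    total += r
```

The design choice to emit a reward at every step, instead of a single terminal score, is what makes a reasoning failure measurable: a model that spots the issue but applies the wrong rule is distinguishable from one that never spots the issue at all.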
We’re now building out a pipeline with top law schools and lawyers to begin systematically stress-testing models against these environments—pushing reasoning to its limits and grounding performance evaluation in real legal expertise.
It’s the first step toward a larger RL environment we’re building in partnership with labs to advance legal—and more broadly, logical—reasoning.
LAURA Benchmark Preview
The release of our initial benchmark, LAURA (Legal Analysis, Understanding, and Reasoning Assessment), previews this work: a suite of complex reasoning tasks designed to evaluate how models identify hidden legal issues, navigate textual ambiguity, apply precedent to novel facts, and make surgical corrections to subtle errors.
Some highlights:
- GPT-5 Pro attained the highest overall score, with Grok 4 and Gemini 2.5 Pro closely behind.
- SOTA closed-source models score significantly higher than their open-source counterparts, leaving a noticeable performance gap.
- OpenAI, Google, and xAI’s models noticeably outperform Anthropic’s on our LAURA benchmark.
Who we are
Founded by Stanford alumni Sina Mollaei and Sherwin Lai, Lemma Research is backed by Y Combinator.
Sherwin studied Symbolic Systems and Computer Science at Stanford (MS CS '24, BS '24), served as Undergraduate Chair of the Stanford Technology Law Review, and worked as an ML engineer at YouTube, where he built personalized ranking algorithms for posts and comments. He deferred his acceptance to Harvard Law School to launch Lemma Research full-time.
Sina studied Computer Science at Stanford (BS '25) and conducted AI research across the Stanford AI Lab (protein design), Stanford Medicine (diffusion models for molecular binding predictions), and Harvard Medical School (high-throughput metabolomics). His graduate-level coursework spanned mathematical biology and immunology, and he turned down an offer in high-frequency trading to start Lemma Research with Sherwin.
After meeting at Stanford, Sherwin and Sina discovered a shared deep interest in how reasoning can be modeled and trained. Having lived on both sides, law and ML, they saw the same bottleneck: without deeper reasoning skills, no legal-AI product will ever be considered truly dependable by an attorney.
We’d love to hear from:
- AI labs / research teams interested in reasoning-focused RL environments.
- Legal scholars / data partners open to collaboration on complex reasoning datasets.
- Researchers who want early access to the benchmark preview.
- Anyone working with legal tech.
📩 founders@withlemma.com
🌐 www.withlemma.com
If you're interested in our mailing list, click here.
