FindArticles © 2025. All Rights Reserved.

AI Agents Make Gains In Legal Task Benchmarks

Last updated: February 6, 2026 10:03 pm
By Gregory Zuckerman

AI agents just posted a meaningful jump on a benchmark built to test whether software can perform real legal work, renewing a question the profession has been asking for a year: if not today, how soon can AI do a lawyer’s job?

The latest results come from Mercor’s agent benchmark, which evaluates multistep professional tasks like drafting legal memos, spotting issues in hypotheticals, and analyzing contracts. Anthropic’s new Opus 4.6 model pushed the leaderboard forward, scoring just under 30% in one-shot attempts and roughly 45% when allowed multiple tries—well ahead of earlier models that clustered below 25% only weeks ago.


It’s not a victory lap for machines in court, but it is a clear sign that agentic features—tool use, planning, and coordinated “swarms” of subagents—are now moving the needle on tasks that resemble day-to-day legal practice.

What Changed In The Legal Agent Benchmarks

Mercor’s evaluation stresses end-to-end execution, not just token-by-token prediction. Models must read a prompt, plan a sequence of steps, call tools or external data when appropriate, and deliver a final work product under constraints. In prior rounds, models floundered on long-horizon reasoning and cross-referencing facts across documents.
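
The read, plan, call tools, deliver sequence described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mercor's harness or Anthropic's agent stack: the `search_clauses` tool, the tiny clause corpus, and the keyword-based planner are all hypothetical stand-ins for what a real system would delegate to a model and a document store.

```python
from dataclasses import dataclass, field
from typing import Callable


def search_clauses(query: str) -> str:
    """Hypothetical lookup tool; a real agent would query a document store."""
    corpus = {
        "indemnity": "Clause 9: Supplier indemnifies Buyer for third-party claims.",
        "termination": "Clause 14: Either party may terminate on 30 days' notice.",
    }
    return corpus.get(query, "no match")


# Hypothetical tool registry the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {"search_clauses": search_clauses}


@dataclass
class Step:
    tool: str
    argument: str


@dataclass
class AgentRun:
    prompt: str
    max_steps: int = 5  # constraint: cap on the number of tool calls
    notes: list[str] = field(default_factory=list)

    def plan(self) -> list[Step]:
        # Assumption: a keyword match stands in for model-generated planning.
        topics = ["indemnity", "termination"]
        return [Step("search_clauses", t) for t in topics if t in self.prompt.lower()]

    def execute(self) -> str:
        # Run each planned step, collect results, then assemble the work product.
        for step in self.plan()[: self.max_steps]:
            result = TOOLS[step.tool](step.argument)
            self.notes.append(f"{step.argument}: {result}")
        return "\n".join(self.notes) if self.notes else "No relevant clauses found."


if __name__ == "__main__":
    run = AgentRun("Summarize the indemnity and termination terms.")
    print(run.execute())
```

The step cap is the interesting design choice: a long-horizon benchmark rewards agents that finish within a budget, so the loop truncates the plan rather than running until it is satisfied.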

Opus 4.6 appears to improve each weak link. The model’s agentic stack supports iterative planning and self-critique, and Anthropic’s release included “agent swarms” that coordinate specialized workers. On multistep matters—think issue spotting across a fact pattern, synthesizing caselaw, then proposing edits to a clause—the compounded gains are visible in the scores.

Crucially, the uplift comes with limited prompt retries, suggesting higher baseline reliability. For firms evaluating AI as a workflow tool, fewer do-overs mean faster throughput and lower supervision costs.
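
To make the distinction between one-shot and multiple-try scores concrete, here is a small sketch of how the two rates could be computed from per-task attempt logs. The article does not describe Mercor's actual scoring procedure, so the function names and the sample results below are invented purely for illustration.

```python
def one_shot_rate(attempts: list[list[bool]]) -> float:
    """Share of tasks passed on the first attempt."""
    return sum(task[0] for task in attempts) / len(attempts)


def any_attempt_rate(attempts: list[list[bool]], k: int) -> float:
    """Share of tasks passed at least once within the first k attempts."""
    return sum(any(task[:k]) for task in attempts) / len(attempts)


# Invented sample data: four tasks, three attempts each (True = passed).
results = [
    [True, True, False],
    [False, True, False],
    [False, False, False],
    [False, False, True],
]

if __name__ == "__main__":
    print(one_shot_rate(results))        # 0.25
    print(any_attempt_rate(results, 2))  # 0.5
    print(any_attempt_rate(results, 3))  # 0.75
```

The gap between the two numbers is a rough proxy for supervision cost: the closer the one-shot rate sits to the multiple-try rate, the less often someone has to rerun or redirect the agent.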

Why 30% Matters More Than It Sounds In Legal Work

Thirty percent is not courtroom-ready. But in legal operations, partial automation compounds: shave 20–40% off document review, first drafts, or cite checks, and case teams redeploy hours to strategy. Goldman Sachs has estimated that roughly 44% of legal tasks are exposed to automation—largely the repetitive, text-heavy kind that pads billable hours but doesn’t decide outcomes.

Benchmarks also lag deployment realities. A model scoring 30% unguided may cross 60–70% in a workflow instrumented with retrieval, templates, checklists, and structured outputs. The lesson from e-discovery and contract lifecycle management is consistent: orchestrate the task well and average models look exceptional.

From Paralegal To Coauthor In Everyday Legal Work

Law firms and vendors have already been inching toward agent-like systems. Casetext pioneered GPT-4–powered brief drafting before its acquisition by Thomson Reuters, and Allen & Overy rolled out the Harvey platform to thousands of lawyers to assist with research and drafting. Corporate legal teams are using copilots to summarize NDAs, compare clauses against playbooks, and generate due diligence checklists.


What the new benchmark implies is that these tools won’t just autocomplete text; they will plan, verify, and ask for what they need. An “agent lawyer” doesn’t replace counsel—it drafts alternatives, flags risks tied to fact patterns, runs a quick analogical search over recent cases, and presents a reasoned memo for a human to approve or revise.

The Risk Ledger And The Rulebook For AI In Law

Legal work punishes errors. The Avianca case (Mata v. Avianca, 2023), in which fabricated citations from a chatbot slipped into a filing, remains a cautionary tale. Multiple U.S. judges now require certifications that attorneys verified AI-assisted filings, and several courts have issued standing orders on disclosure and citation checks.

Regulators are circling, too. The EU AI Act treats systems used to assist in administering justice as high-risk, triggering requirements around transparency, data governance, and human oversight. For firms, this translates into audit logs, source grounding, confidentiality controls, and red-teaming models against biased or hallucinated outputs before they touch client matters.

Measuring Real Legal Reasoning In Agent Workflows

Traditional exams tell only part of the story. Research communities have built domain-specific evaluations, such as LegalBench, along with newer suites that test citation fidelity, statutory interpretation, and contract edits constrained by policy. The next wave of evals will be scenario-based: did the agent find the controlling precedent, properly distinguish adverse authority, and preserve privilege throughout the workflow?

Vendors are already moving in this direction with “grounded generation,” forcing models to cite the passages that support each conclusion. Combine that with tool use (databases, calendaring, entity extraction) and agent reliability can be measured, not assumed.
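
One way to picture a grounding check is a verifier that rejects any conclusion whose quoted passage cannot be found in the cited source. The sketch below assumes a simple whitespace- and case-insensitive substring match. The `Claim` structure, the `ungrounded_claims` helper, and the sample contract text are hypothetical and do not reflect any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    conclusion: str
    source_id: str
    quoted_passage: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def ungrounded_claims(claims: list[Claim], sources: dict[str, str]) -> list[Claim]:
    """Return claims whose quoted passage is empty or absent from the cited source."""
    missing = []
    for claim in claims:
        source_text = sources.get(claim.source_id, "")
        quote = normalize(claim.quoted_passage)
        # An empty quote would trivially match any text, so reject it explicitly.
        if not quote or quote not in normalize(source_text):
            missing.append(claim)
    return missing


# Invented sample source and claims for illustration.
sources = {"msa": "The Supplier shall indemnify the Buyer against all third-party claims."}
claims = [
    Claim("Supplier bears indemnity risk.", "msa", "shall indemnify the Buyer"),
    Claim("Buyer has waived claims.", "msa", "Buyer waives all claims"),
    Claim("Notice period is 30 days.", "nda", "30 days' notice"),
]

if __name__ == "__main__":
    for claim in ungrounded_claims(claims, sources):
        print("UNGROUNDED:", claim.conclusion)
```

An exact-match check like this catches fabricated quotations and citations to documents that are not in the record, but it cannot tell whether a real passage actually supports the conclusion drawn from it. That judgment still sits with the reviewing lawyer.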

What Comes Next For Agent Lawyers In Practice

Expect rapid iteration. If a single model release can lift one-shot legal task scores from under 25% to nearly 30% in a matter of weeks, coordinated systems and domain-tuned policies will push higher. The realistic near-term picture is a paralegal-plus agent that drafts, checks, and explains, with a lawyer in the loop owning judgment calls and ethics.

Can AI agents be lawyers after all? Not in the licensure sense. But as coauthors and tireless analysts, they’re getting uncomfortably good—and the latest benchmarks suggest they’re moving faster than many in the profession predicted just a month ago.

Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.