Incident 704: Study Highlights Persistent Hallucinations in Legal AI Systems

Description: Stanford University’s Human-Centered AI Institute (HAI) conducted a study built on a "pre-registered dataset of over 200 open-ended legal queries" to test AI products from LexisNexis (creator of Lexis+ AI) and Thomson Reuters (creator of Westlaw AI-Assisted Research and Ask Practical Law AI). The researchers found that these legal models hallucinate in 1 out of 6 (or more) benchmarking queries.

Entities

Alleged: Thomson Reuters and LexisNexis developed an AI system deployed by Legal professionals, Law firms, and Organizations requiring legal research, which harmed Legal professionals, Clients of lawyers, and the Legal system.

Incident Stats

Incident ID: 704

Report Count

Incident Date: 2024-05-23

Editors: Daniel Atherton

Applied Taxonomies: MIT

MIT Taxonomy Classifications

Machine-Classified

Taxonomy Details

Risk Subdomain: 7.3. Lack of capability or robustness

Risk Domain: AI system safety, failures, and limitations

Entity

Timing: Post-deployment

Intent: Unintentional

Incident Reports

AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries

hai.stanford.edu · 2024

Artificial intelligence (AI) tools are rapidly transforming the practice of law. Nearly three quarters of lawyers plan on using generative AI for their work, from sifting through mountains of case law to drafting contracts to reviewing docu…

We asked ChatGPT for legal advice—here are five reasons why you shouldn't

theconversation.com · 2024

At some point in your life, you are likely to need legal advice. A survey carried out in 2023 by the Law Society, the Legal Services Board and YouGov found that two-thirds of respondents had experienced a legal issue in the past four years.…

Variants

A "variant" is an AI incident similar to a known case—it has the same causes, harms, and AI system. Instead of listing it separately, we group it under the first reported incident. Unlike other incidents, variants do not need to have been reported outside the AIID. Learn more from the research paper.

Similar Incidents

By textual similarity

COMPAS Algorithm Performs Poorly in Crime Recidivism Prediction

Gender Biases in Google Translate

Personal voice assistants struggle with black voices, new study shows