Artificial intelligence and machine learning
Research Article
Legal judgement analysis using large language models
Yuri Serdyuk1, Natalia Vlasova2, Seda Momot3, Elena Suleymanova4
1-4 Ailamazyan Program Systems Institute of RAS, Ves'kovo, Russia
Abstract. The article examines the use of the latest-generation large language models (LLMs), such as ChatGPT, Grok, DeepSeek, GigaChat, and YandexGPT, for analyzing legal judgments. The analysis covered civil, administrative, and criminal cases. A dataset of legal judgments was compiled from the database of judicial and regulatory acts of the Russian Federation, the official portal of the Moscow courts of general jurisdiction, and the website of the Russian Agency for Legal and Judicial Information. Several types of tests for large models were proposed and implemented, principles for selecting ground truth were outlined, and queries (prompts) were formulated. The models were tested on their ability to predict appellate decisions, map crime descriptions to law articles, and evaluate the decisions of multiple judicial authorities in a single case. The ability of the models to make their own consistent decisions was also examined. Testing showed that the correct prediction rate of LLMs on real-world judicial decisions rarely surpasses 50%. A brief overview of recent publications on the use of AI in legal practice is provided. (In Russian).
Keywords: large language models, LLM, legal judgements, dataset, prompt, AI in law, LegalAI
MSC-2020: 68T37; 91F99, 68T05, 68Q60
For citation: Yuri Serdyuk, Natalia Vlasova, Seda Momot, Elena Suleymanova. Legal judgement analysis using large language models. Program Systems: Theory and Applications, 2026, 17:1, pp. 21–56. (In Russ.). https://psta.psiras.ru/2026/1_21-56.
Full text of article (PDF): https://psta.psiras.ru/read/psta2026_1_21-56.pdf.
The article was submitted 24.12.2025; approved after reviewing 03.02.2026; accepted for publication 17.02.2026; published online 23.02.2026.