New Research Shows That AI Models Taught Legalese Are Surprisingly Efficient

By Tad Vezner

How good a lawyer should an artificial intelligence system be? More importantly, how good do you want one to be?

When it comes to teaching machines how to understand and use language, it turns out that the more legalese they know, the better, according to a new paper co-authored by Chicago-Kent College of Law Professor and Law Lab Director Daniel Martin Katz.

“It is pretty clear that a legally trained AI system is just going to perform better, but the open question is to identify the precise information diet to feed these models,” Katz says.

His paper explores how different large language models (LLMs) perform on a variety of tasks.

Pioneered by organizations such as Google, OpenAI, and the Allen Institute, LLMs such as BERT, ELMo, and GPT-3, among others, have grown increasingly popular in the field of natural language processing. Many LLMs have been trained on general language, but the question that Katz and his colleagues sought to explore is how to apply these LLMs to legal tasks. They analyzed several different models to evaluate the performance of LLMs on tasks such as evaluating contracts, including determining whether such contracts were unfair under European Union consumer law.

“A lot of effort in computer science goes into making machines understand language broadly,” Katz says. “How do you train a machine in the language of law? Well, how do you train a person? You send them [to law school] for three years, and you say a lot of words at them. You use words in a variety of contexts. In a real sense, you are training a student’s neural network (their brains).”

That’s what the researchers did with the models tested in Katz’s paper: they exposed machines to a large corpus of different words and measured how effective those words were at getting the machines to solve tasks.

It turned out that, of the seven models tested, the one trained on legal language performed better on average, not just on legal tasks but on any type of task.

“The diet of getting legal information when it’s being trained makes it better across all tasks,” Katz says.

The paper has been accepted for presentation at the 2022 annual meeting of the Association for Computational Linguistics in May.

“It’s a rare thing to see a law professor get a paper accepted into a computer science conference,” Katz notes. “It’s the type of place you should take this type of work: a group of people that can actually evaluate its technical merits.”

It’s a research area on the cutting edge of both computer science and the law, and one in which Illinois Institute of Technology and Chicago-Kent are uniquely situated to excel, Katz notes.

“Even though machines are getting good at understanding basic language, it’s a much harder problem to understand specialist languages: medical English or law,” Katz says. “We’re trying to answer: How do we build the scientific infrastructure to have machines understand legal language?”

“Laws and their interpretations, legal arguments and agreements, are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size,” the authors note in the paper, adding that “natural language understanding technologies can be a valuable tool to support legal practitioners in these endeavors.”

Along with Katz, the paper is co-authored by Ilias Chalkidis of the University of Copenhagen, Denmark; Abhik Jana of the Universität Hamburg, Germany; Dirk Hartung of Bucerius Law School, Hamburg, Germany; Michael Bommarito of CodeX, Stanford Law School; Ion Androutsopoulos of the Athens University of Economics and Business, Greece; and Nikolaos Aletras of the University of Sheffield, United Kingdom.

Photo: Chicago-Kent College of Law Professor and Law Lab Director Daniel Martin Katz