GPT-4 in Law School
In “A World Without Work”, Daniel Susskind refers to a 2016 study by Pew Research Center which found that just 16% of Americans think a four-year degree prepares students "very well" for a well-paying job. We can see how many of today’s most successful entrepreneurs dropped out of university: Sergey Brin, Larry Page, and Elon Musk left Stanford, Bill Gates and Mark Zuckerberg left Harvard, Steve Jobs left Reed College, Jack Dorsey left New York University, and Spotify founder Daniel Ek left the Royal Institute of Technology.
According to Susskind, some economists argue that colleges and universities are less about skill and knowledge acquisition and more about "signaling". Just as a peacock signals his virility to a potential mate with a fancy set of feathers, students signal their abilities to a potential employer with a degree from a well-regarded university, or with certain letters or numbers on a piece of paper that are considered better than other letters or numbers.
This “peacocking effect” of having a good degree with good grades is now being challenged by the fact that GPT-4 can outperform most human test-takers on a wide range of college exams, from the Uniform Bar Exam to sommelier examinations. If anyone can instantly leverage GPT-4’s internet-scale knowledge, how precious will university credentials be to future employers? And will computers go from highly practical efficiency tools to indispensable collaboration partners in the near-term future?
It's still far too early to draw sweeping conclusions about how generative AI will influence the lives of knowledge workers over the coming years. However, a burgeoning body of academic research has already surfaced on ChatGPT and GPT-4’s performance in work environments. Students are probably the most avid group of ChatGPT users, so much so that interest in ChatGPT has been shown to rise and fall with the school year and summer break (Business Insider).
Today, we will take a closer look at the study AI Assistance in Legal Analysis: An Empirical Study (Choi & Schwarcz 2023), which examines GPT-4's influence on students' performance in law school exams.
GPT-4 in Law School
Two researchers from the University of Minnesota Law School, Jonathan H. Choi and Daniel Schwarcz, were among the co-authors of the paper ChatGPT Goes to Law School from January 2023. The research team tested ChatGPT’s ability to answer four separate final exams for law school courses. ChatGPT passed all four exams and performed on average at the level of a C+ student, a low but passing grade, although it generally scored at or near the bottom of each class.
Then came GPT-4 in March. OpenAI proclaimed that while GPT-3.5, the underlying model for ChatGPT, scored around the bottom 10% in a simulated bar exam, GPT-4 passed the exam with a score around the top 10% of test takers (OpenAI). A full paper on GPT-4’s impressive bar exam performance can be found here.
In the near to medium-term future, AI language models will almost certainly not be able to replace lawyers. The AIs of today are far too unreliable and prone to hallucination, and the risks of deploying them in important legal decisions are far too high. For more information on why AI will not replace lawyers, see this well-reasoned piece: The End of Lawyers? Not Yet.
Rather, AI solutions may be used in legal practices as a kind of advanced search tool that assists lawyers rather than replacing them. Against this background, Choi and Schwarcz published a new paper on GPT-4’s law school performance in August, but this time the authors focused more on the interplay between law students and the AI tool. Not only did they test GPT-4’s zero-shot performance against human test-takers, but they also compared students’ performance with and without access to GPT-4. Additionally, they measured GPT-4’s performance using different prompting techniques. Let’s take a closer look at the experiment and the main results.
Students from two different law school classes at the University of Minnesota partook in the experiment. The two classes were:
Introduction to American Law and Legal Reasoning: an introductory class offered to undergraduates that covers a range of different topics.
Insurance Law: a more advanced class principally taught to upper-level students.
To make sure that all the participants were equipped for the task, they were required to complete a one-hour online training course on how to effectively use GPT-4 for legal analysis. Afterwards, the students had one additional hour to review relevant course material. Finally, the students had to answer exam questions with assistance from GPT-4. The exam questions came from the actual, unpublished 2022 exam.
Besides the AI-assisted human exams, the researchers generated four exams produced by GPT-4 by feeding it with different prompts:
The basic prompt: Copy-and-pasting the exam questions directly from the exam.
A “chain-of-thought” prompt: The AI model is asked to “think step-by-step” prior to producing the results.
Few-shot prompting: The AI model is provided with examples of good responses to shape its response. Specifically, the research team provided GPT-4 with a few questions and model answers from prior exams that had also been provided to students.
Grounded prompting: Involves providing the AI model with relevant sources. GPT-4 was provided with lecture notes from the course.
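The four prompting styles above can be sketched as simple prompt templates. This is an illustrative sketch only: the function names and exact wording are my own, and the placeholders stand in for the actual exam questions, model answers, and lecture notes the researchers pasted into GPT-4.

```python
# Illustrative templates for the four prompting strategies tested in the study.
# The wording of each template is hypothetical, not taken from the paper.

def basic_prompt(exam_question: str) -> str:
    # Basic prompt: the exam question, copied verbatim.
    return exam_question

def chain_of_thought_prompt(exam_question: str) -> str:
    # Chain-of-thought: ask the model to reason before answering.
    return f"{exam_question}\n\nLet's think step by step."

def few_shot_prompt(exam_question: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: prepend prior exam questions paired with model answers.
    shots = "\n\n".join(f"Question: {q}\nModel answer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {exam_question}\nModel answer:"

def grounded_prompt(exam_question: str, lecture_notes: str) -> str:
    # Grounded: supply relevant source material alongside the question.
    return f"Relevant lecture notes:\n{lecture_notes}\n\n{exam_question}"
```

Each template would then be sent to the model as a single user message; the point is simply that the same exam question can be wrapped in progressively more context.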
The AI-assisted exams, the human-only exams from 2022, and the GPT-4 generated exams were blindly graded and compared.
Here are the five main findings:
GPT-4 substantially improved students' average performance on multiple-choice questions, but not on essay questions.
GPT-4 substantially improved the scores of students at the bottom of the class and negatively impacted the scores of students at the top of the class.
GPT-4 significantly reduced the time students spent on the multiple-choice exam. The AI-assisted exam took students 62.9 minutes on average to complete, whereas the real exam in 2022 took on average 74.5 of the 75 minutes allotted.
The performance of GPT-4 alone varied substantially according to the inputs it was given, but overall GPT-4 performed well on relatively simple multiple-choice questions and had more difficulty with the more complex essay questions.
When provided with a few examples (few-shot prompting) or with lecture notes from the course material (grounded prompting), GPT-4 performed better than both humans and AI-assisted humans on average.
Implications of Generative AI in Education
As the study shows, generative AI can lift the worst-performing students' test results. Probably the least surprising finding of the year. It's a bit more surprising that assistance from GPT-4 can actually make the best-performing students in an upper-level law school course perform worse.
Some people say that AI is “a great equalizer”. In this context, it may be equally correct to call AI a “destroyer of excellence”. As the authors of the paper write: "Access to AI might discourage effort when used as a crutch. In particular, access to AI might stifle creativity or lead users to settle for easy answers rather than exerting themselves and spotting more difficult issues.” Other papers support this finding, see for instance here and here (I will write more on this in a future post).
I wonder how much access to GPT-4 substantially differs from access to Google. And how impressive is it really that GPT-4 can ace a law school exam when it's provided with all the relevant source material, effectively the answers? Of course, it's impressive, but as the authors also acknowledge, GPT-4 could not perform at the level it did if it had to find the relevant material itself.
After all is said and done, university is about much more than performing well on tests, although good grades may be “colorful feathers” that signal high value to future employers. The more important question is whether GPT-4 can support students in their personal and professional development throughout the education system and help them gain skills and knowledge that can be of service to others and have economic value in society.
Implications of Generative AI in Work
On the other side of education, I worry that growing reliance on GPT-4 and other powerful AI tools may lead to less skilled workers, who enjoy their work less and gradually lose touch with what is going on due to the black-box nature of generative AI models. Generative AI may provide companies with huge savings in time and cost, and the homogeneous AI output may be slightly above the average quality of human workers' output. At the same time, the human workers may pay less attention to their work, fear displacement, and overall be less happy.
A University of Pittsburgh study explored the relationship between the adoption of industrial robots and workplace injuries. In the US, the study found that industrial robots reduced workplace-related injuries by 1.2 cases per 100 workers. On the other hand, communities with more people working alongside robots saw a dramatic increase of 37.8 cases per 100,000 people in drug- or alcohol-related deaths. Additionally, communities where people work alongside robots experienced a subtle increase in suicide rates and mental health issues (GlobalSpec).
I would challenge the notion that "AI-assisted humans" are inevitable or even desirable in the foreseeable future, whether in academic institutions or offices. If I were a university professor, I would lean towards cautioning my students against using GPT-4. This holds even more true if I were a decision-maker at an elite law firm. Even drastic gains in productivity do not always justify a small compromise in quality, let alone the risk of catastrophic and hard-to-explain failures, such as when a New York lawyer relied on ChatGPT to prepare a court case. In a university setting, I would be concerned about how much students actually learn and develop when using GPT-4 as an assistant.
In this post, we have looked at an important paper about the use of GPT-4 in law school exams. Generative AI’s impact on education and work is complex and evolving. I plan to cover several other studies on the topic in a future post. If you have made it this far, click the like button and let me know your thoughts in the comments.
Reads of the Week
Confessions of a Viral AI Writer - Vauhini Vara, September 21, 2023 (Wired)
ChatGPT Vision - GPT-4V - Michael Spencer, October 3, 2023 (AI Supremacy)
From Feeding Moloch to 'Digital Minimalism' - Ruth Gaskovski, April 21, 2023 (School of the Unconformed)