AIs Can Write Gags, But Aren't in on the Joke

Two people sitting on a sofa looking at a screen laughing.
Credit: Surface / Unsplash.

Large neural networks, a form of artificial intelligence, can generate thousands of jokes along the lines of “Why did the chicken cross the road?” But do they understand why they’re funny?


Using hundreds of entries from the New Yorker magazine’s Cartoon Caption Contest as a testbed, researchers challenged AI models and humans with three tasks: matching a joke to a cartoon; identifying a winning caption; and explaining why a winning caption is funny.


In all tasks, humans performed demonstrably better than machines, even as AI advances such as ChatGPT have closed the performance gap. So are machines beginning to “understand” humor? In short, they’re making some progress, but aren’t quite there yet.

“The way people challenge AI models for understanding is to build tests for them – multiple choice tests or other evaluations with an accuracy score,” said Jack Hessel, Ph.D. ’20, research scientist at the Allen Institute for AI (AI2). “And if a model eventually surpasses whatever humans get at this test, you think, ‘OK, does this mean it truly understands?’ It’s a defensible position to say that no machine can truly ‘understand’ because understanding is a human thing. But, whether the machine understands or not, it’s still impressive how well they do on these tasks.”


Hessel is lead author of “Do Androids Laugh at Electric Sheep? Humor ‘Understanding’ Benchmarks from The New Yorker Caption Contest,” which won a best-paper award at the 61st annual meeting of the Association for Computational Linguistics, held July 9-14 in Toronto.


Lillian Lee ’93, the Charles Roy Davis Professor in the Cornell Ann S. Bowers College of Computing and Information Science, and Yejin Choi, Ph.D. ’10, professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and the senior director of common-sense intelligence research at AI2, are also co-authors on the paper.


For their study, the researchers compiled 14 years’ worth of New Yorker caption contests – more than 700 in all. Each contest included: a captionless cartoon; that week’s entries; the three finalists selected by New Yorker editors; and, for some contests, crowd quality estimates for each submission. 


For each contest, the researchers tested two kinds of AI – “from pixels” (computer vision) and “from description” (analysis of human summaries of cartoons) – for the three tasks.


“There are datasets of photos from Flickr with captions like, ‘This is my dog,’” Hessel said. “The interesting thing about the New Yorker case is that the relationships between the images and the captions are indirect, playful, and reference lots of real-world entities and norms. And so the task of ‘understanding’ the relationship between these things requires a bit more sophistication.”


In the experiment, matching required AI models to select the finalist caption for the given cartoon from among “distractors” that were finalists but for other contests; quality ranking required models to differentiate a finalist caption from a nonfinalist; and explanation required models to generate free text saying how a high-quality caption relates to the cartoon.
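
To make the matching setup concrete, here is a minimal Python sketch of how one multiple-choice item could be assembled and scored. It is an illustration under stated assumptions, not the authors' code: the function names, the `n_choices` parameter and the placeholder captions are all hypothetical, and the article does not say how many answer options each item had.

```python
import random


def build_matching_instance(contests, target_id, n_choices=4, seed=0):
    """Build one multiple-choice matching item for a single cartoon.

    contests: dict mapping a contest id to its list of finalist captions.
    The correct answer is a finalist from the target contest; distractors
    are finalists drawn from *other* contests, as the article describes.
    n_choices is an illustrative assumption, not a figure from the article.
    """
    rng = random.Random(seed)
    correct = rng.choice(contests[target_id])
    # Pool every finalist caption that belongs to a different contest.
    pool = [cap for cid, caps in contests.items() if cid != target_id for cap in caps]
    choices = rng.sample(pool, n_choices - 1) + [correct]
    rng.shuffle(choices)
    return {"contest": target_id, "choices": choices, "answer": choices.index(correct)}


def accuracy(predictions, instances):
    """Fraction of items where the predicted index equals the answer index."""
    hits = sum(pred == inst["answer"] for pred, inst in zip(predictions, instances))
    return hits / len(instances)


# Placeholder captions (hypothetical, not real contest entries).
toy_contests = {
    "contest_1": ["caption 1a", "caption 1b", "caption 1c"],
    "contest_2": ["caption 2a", "caption 2b", "caption 2c"],
    "contest_3": ["caption 3a", "caption 3b", "caption 3c"],
}

item = build_matching_instance(toy_contests, "contest_1")
# A model that always picks the right option would score 1.0 on this item.
print(accuracy([item["answer"]], [item]))
```

A real evaluation would replace the last line with a model's predicted indices over many such items; the accuracy figures reported for models and humans are scores of this kind.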


Hessel penned the majority of human-generated explanations himself, after crowdsourcing the task proved unsatisfactory. He generated 60-word explanations for more than 650 cartoons.


“A number like 650 doesn’t seem very big in a machine-learning context, where you often have thousands or millions of data points,” Hessel said, “until you start writing them out.”


This study revealed a significant gap between AI- and human-level “understanding” of why a cartoon is funny. The best AI performance in a multiple choice test of matching cartoon to caption was only 62% accuracy, far behind humans’ 94% in the same setting. And when it came to comparing human- vs. AI-generated explanations, humans’ were preferred roughly 2-to-1.


While AI might not be able to “understand” humor yet, the authors wrote, it could be a collaborative tool humorists could use to brainstorm ideas.


Other contributors include Ana Marasovic, assistant professor at the University of Utah School of Computing; Jena D. Hwang, research scientist at AI2; Jeff Da, research assistant at the University of Washington; Rowan Zellers, researcher at OpenAI; and humorist Robert Mankoff, president of Cartoon Collections and long-time cartoon editor at the New Yorker.


The authors wrote this paper in the spirit of the subject matter, with playful comments and footnotes throughout.


“This three or four years of research wasn’t always super fun,” Lee said, “but something we try to do in our work, or at least in our writing, is to encourage more of a spirit of fun.”


Reference: Jack Hessel, Ana Marasovic, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, and Yejin Choi. Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest. Proc Conf Assoc Comput Linguist (Volume 1: Long Papers). 2023. Pages 688–714. doi:


This article has been republished from the following materials. Note: material may have been edited for length and content. For further information, please contact the cited source.