Automatic humor detection, the task of computationally identifying humorous content, is increasingly critical as Large Language Models (LLMs) become integrated into human communication platforms such as chatbots and virtual assistants. However, understanding humor poses significant challenges for AI because it relies on complex context, cultural nuance, linguistic ambiguity, and multimodal cues. Current research is fragmented across humor types, languages, modalities, and evaluation benchmarks, particularly concerning the capabilities and limitations of modern LLMs.
This survey provides a comprehensive synthesis of the automatic humor detection field, tracing its evolution from foundational psychological and linguistic theories through classical machine learning, deep learning, and the recent transformer-based LLM paradigm. We organize and analyze computational methods, feature engineering techniques, benchmark datasets (text-only, multimodal, and multilingual), and evaluation metrics. We critically examine LLM adaptation strategies, including full fine-tuning, parameter-efficient fine-tuning (PEFT), prompt engineering, and multi-task learning, alongside developments in multimodal and cross-lingual humor understanding.
Our analysis reveals that while LLMs show improved performance in capturing surface humor patterns, significant gaps persist, relative to human cognition, in deep pragmatic reasoning, cultural grounding, multimodal integration, and explainability. We identify key open challenges, including data scarcity, evaluation inconsistencies, the humor-offensiveness boundary, and the need for more robust, culturally aware, and interpretable models. By consolidating the field's progress and pinpointing critical limitations, this survey aims to guide future interdisciplinary research toward more socially intelligent and nuanced AI systems capable of genuinely understanding human humor.