AIDB Daily Papers
LLMs Are Not a Silver Bullet: A Case Study on Software Fairness
※ The title and key points above are automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- This study compares LLM-based and traditional machine learning methods for bias mitigation in software fairness.
- Large language models (LLMs) have attracted much attention, but it has been unclear whether they outperform traditional machine learning for software fairness.
- Experiments show that machine learning methods consistently outperform LLMs in both fairness and predictive performance.
Abstract
Fairness is a critical requirement for human-related, high-stakes software systems, motivating extensive research on bias mitigation. Prior work has largely focused on tabular data settings using traditional Machine Learning (ML) methods. With the rapid rise of Large Language Models (LLMs), recent studies have begun to explore their use for bias mitigation in the same setting. However, it remains unclear whether LLM-based methods offer advantages over traditional ML methods, leaving software engineers without clear guidance for practical adoption. To address this gap, we present a large-scale study comparing state-of-the-art ML- and LLM-based bias mitigation methods. We find that ML-based methods consistently outperform LLM-based methods in both fairness and predictive performance, with even strong LLMs failing to surpass established ML baselines. To understand why prior LLM-based studies report favorable results, we analyze their evaluation settings and show that these gains are largely driven by artificially balanced test data rather than realistic imbalanced distributions. We further observe that existing LLM-based methods primarily rely on in-context learning and thus fail to leverage all available training data. Motivated by this, we explore supervised fine-tuning on the full training set and find that, while it achieves competitive results, its advantages over traditional ML methods remain limited. These findings suggest that LLMs are not a silver bullet for software fairness.
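To make the abstract's point about evaluation settings concrete, the following is a minimal sketch (not the paper's code) of how re-balancing a test set by label can shift reported metrics. It assumes a synthetic scikit-learn tabular task standing in for benchmarks like Adult or COMPAS; the downsampling step mimics the "artificially balanced test data" the abstract describes, in contrast to evaluating on the natural imbalanced distribution.

```python
# Hypothetical sketch: how evaluating on an artificially balanced test set
# (rather than the natural imbalanced one) can change reported results.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced tabular data (80/20 label split) as a stand-in
# for a real fairness benchmark dataset.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# 1) Realistic evaluation: keep the test set's natural label imbalance.
acc_imbalanced = accuracy_score(y_te, clf.predict(X_te))

# 2) Artificially balanced evaluation: downsample the majority class so
#    both labels are equally frequent before computing metrics.
rng = np.random.default_rng(0)
idx_pos = np.flatnonzero(y_te == 1)
idx_neg = rng.choice(np.flatnonzero(y_te == 0),
                     size=len(idx_pos), replace=False)
idx_bal = np.concatenate([idx_pos, idx_neg])
acc_balanced = accuracy_score(y_te[idx_bal], clf.predict(X_te[idx_bal]))

print(f"accuracy on imbalanced test set: {acc_imbalanced:.3f}")
print(f"accuracy on balanced test set:   {acc_balanced:.3f}")
```

The same re-sampling applies to fairness metrics as well: numbers computed on a balanced test split need not carry over to the imbalanced distributions the deployed system actually sees, which is the discrepancy the authors point to in prior LLM-based studies.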