AIDB Daily Papers
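A Study of Subjective Coding Preferences in Experts and Large Language Models
Note: The title and key points on this page were generated automatically by AI (originally in Japanese). Please consult the original paper for the exact content.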
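Key points
- The study defines four subjective coding preference axes (complexity, commenting, modularity, and readability) and validates them with 25 software engineers.
- The preference an LLM states in natural language does not always match the preference it shows when evaluating code, and the models exhibit positional (order) bias.
- Compared with humans, LLMs give more polarized ratings, and an analysis of GPT-5 found reliance on external assumptions and brittle reasoning.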
Abstract
Large Language Models (LLMs) have become increasingly popular for coding tasks, with subjective coding preferences being an essential element to adapt to programmers' personal needs. Existing work overlooks such characteristics and mainly focuses on code correctness. In this study, we propose a typification of four subjective coding preference axes - complexity, commenting, modularity, and readability - motivated by common engineering habits and validated by 25 software engineers. We collect a dataset of ~3,000 paired Python code snippets reflecting these axes, annotated by 73 experts who rate their preferences on a Likert scale. Using our dataset, we study how LLMs handle subjective coding preferences. We present 13 LLMs with pairs of solutions to the same programming task, first as textual descriptions and then as concrete code snippets. We find that models often prefer one option in natural language but the opposite when evaluating code. More consistent models (i.e., those that are coherent in their choices between deeds and words) frequently reveal positional bias: swapping the order of options changes the preferred alternative. We then use the five most consistent models to re-annotate the dataset. Compared to humans, models show polarized Likert distributions and notable divergence in ratings. A case study on GPT-5 reveals reliance on external assumptions and brittle reasoning.
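The two checks described in the abstract (agreement between the textual and the code preference, and sensitivity to the order of the options) can be illustrated with a minimal Python sketch. The abstract does not give the paper's prompts, models, or scoring procedure, so everything here is an assumption made for illustration: the `judge` callable, the helper names, and the toy judges are hypothetical stand-ins for an actual LLM call.

```python
from typing import Callable, Literal, Tuple

Position = Literal["first", "second"]
Choice = Literal["A", "B"]

# Hypothetical judge interface (not from the paper): it receives two options
# in presentation order and returns the position it prefers.
Judge = Callable[[str, str], Position]


def preferred_option(judge: Judge, a: str, b: str, swap: bool = False) -> Choice:
    """Map the judge's positional answer back to the underlying option A or B."""
    if swap:
        # Present B first, then translate the position back to A/B.
        return "B" if judge(b, a) == "first" else "A"
    return "A" if judge(a, b) == "first" else "B"


def has_positional_bias(judge: Judge, a: str, b: str) -> bool:
    """True if swapping the presentation order changes the preferred option."""
    return preferred_option(judge, a, b) != preferred_option(judge, a, b, swap=True)


def is_consistent(judge: Judge,
                  text_pair: Tuple[str, str],
                  code_pair: Tuple[str, str]) -> bool:
    """True if the judge picks the same option for descriptions and for code."""
    return preferred_option(judge, *text_pair) == preferred_option(judge, *code_pair)


# Toy judges standing in for an LLM (assumptions for demonstration only).
def always_first(x: str, y: str) -> Position:
    return "first"


def prefers_shorter(x: str, y: str) -> Position:
    return "first" if len(x) <= len(y) else "second"


if __name__ == "__main__":
    print(has_positional_bias(always_first, "x", "yy"))
    print(has_positional_bias(prefers_shorter, "x", "yy"))
```

A judge that always answers "first" flips its underlying choice when the options are swapped, which is the positional bias the abstract describes. A judge whose choice depends only on the content of the options does not, provided the options are not tied under its criterion.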