AIDB Daily Papers
Prompt Engineering in Generative Psychometrics: Developing Personality Assessment Scales
※ The translated title and key points are auto-generated by AI. Please refer to the original paper for accurate details.
Key Points
- Examined how prompt design affects item quality in the AI-GENIE framework, which uses large language models (LLMs) to generate personality assessment items.
- Adaptive prompting outperformed the other strategies by reducing semantic redundancy, raising structural validity, and preserving larger item pools, with synergy expected between prompting and model capability.
- AI-GENIE improved structural validity after item reduction; the combination of adaptive prompting and high-capacity models in particular was confirmed to balance creativity with psychometric coherence.
Abstract
This Monte Carlo simulation examines how prompt engineering strategies shape the quality of large language model (LLM)-generated personality assessment items within the AI-GENIE framework for generative psychometrics. Item pools targeting the Big Five traits were generated using multiple prompting designs (zero-shot, few-shot, persona-based, and adaptive), model temperatures, and LLMs, then evaluated and reduced using network psychometric methods. Across all conditions, AI-GENIE reliably improved structural validity following reduction, with the magnitude of its incremental contribution inversely related to the quality of the incoming item pool. Prompt design exerted a substantial influence on both pre- and post-reduction item quality. Adaptive prompting consistently outperformed non-adaptive strategies by sharply reducing semantic redundancy, elevating pre-reduction structural validity, and preserving substantially larger item pools, particularly when paired with newer, higher-capacity models. These gains were robust across temperature settings for most models, indicating that adaptive prompting mitigates common trade-offs between creativity and psychometric coherence. An exception was observed for the GPT-4o model at high temperatures, suggesting model-specific sensitivity to adaptive constraints at elevated stochasticity. Overall, the findings demonstrate that adaptive prompting is the strongest approach in this context, and that its benefits scale with model capability, motivating continued investigation of model-prompt interactions in generative psychometric pipelines.
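The abstract's notion of "semantic redundancy" in a generated item pool can be illustrated with a simplified sketch. The paper's actual reduction step uses network psychometric methods, which are more involved; the following is only a minimal stand-in, assuming items are represented as embedding vectors and filtered greedily by pairwise cosine similarity (all function names, items, and vectors below are hypothetical illustrations, not from the paper).

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_redundant(items, embeddings, threshold=0.9):
    """Greedily keep items whose similarity to every already-kept item
    stays below the threshold. This is a toy stand-in for AI-GENIE's
    network-psychometric item reduction, not its actual algorithm."""
    kept_idx = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept_idx):
            kept_idx.append(i)
    return [items[i] for i in kept_idx]

# Toy example: two near-duplicate extraversion items and one distinct item.
items = [
    "I enjoy meeting new people.",
    "I like getting to know new people.",
    "I keep my workspace organized.",
]
embeddings = [
    np.array([1.00, 0.10, 0.00]),
    np.array([0.98, 0.15, 0.02]),  # nearly parallel to the first vector
    np.array([0.00, 0.10, 1.00]),
]
print(filter_redundant(items, embeddings))  # the near-duplicate is dropped
```

A lower threshold prunes more aggressively, mirroring the paper's trade-off between keeping the item pool large and removing semantically overlapping items.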