Evaluation Prompts | X·myLog

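The first step of model evaluation is running inference with the model, and the quality of the prompt used at inference time affects the model's output. Prompt quality therefore deserves close attention during evaluation.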

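Extensive experiments show that even when the underlying test questions are identical, the way the prompt is constructed affects model performance. Factors that may matter include: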

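How the prompt itself is composed, including how the instruction, the in-context examples, and the question are written;

The choice of in-context examples, including how many are selected and how they are selected;

How the prompt is used: whether the model is asked to complete the prompt, or whether the best of several candidate prompts is selected as the answer.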
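The two ways of using a prompt can be sketched roughly as follows. This is a minimal illustration, not any framework's actual API: it assumes that each candidate prompt is the question with one answer option filled in, and that "best" means the candidate a scoring function rates highest. `toy_generate` and `toy_log_likelihood` are hypothetical stand-ins for a real model.

```python
from typing import Callable, Sequence


def answer_by_generation(prompt: str, generate: Callable[[str], str]) -> str:
    """Generation mode: the model completes the prompt; the completion is the answer."""
    return generate(prompt).strip()


def pick_by_score(candidates: Sequence[str], score: Callable[[str], float]) -> int:
    """Selection mode: score every candidate prompt and return the index of the best one."""
    scores = [score(text) for text in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def toy_generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call: always completes with " Paris".
    return " Paris"


def toy_log_likelihood(text: str) -> float:
    # Hypothetical stand-in for a model's log-likelihood of the full text.
    return 0.0 if "Paris" in text else -5.0


question = "Q: What is the capital of France?\nA:"
options = ["London", "Paris", "Berlin"]

# Generation mode: one prompt, free-form completion.
generated = answer_by_generation(question, toy_generate)

# Selection mode: one candidate prompt per option, highest score wins.
candidates = [f"{question} {option}" for option in options]
chosen = options[pick_by_score(candidates, toy_log_likelihood)]
```

In generation mode the completion still has to be parsed and compared with the reference answer, while selection mode only ever returns one of the given options, which is one reason the two modes can give different scores on the same questions.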

Two prompt strategies are commonly used: few-shot prompts and chain-of-thought (CoT) prompts.

Typically, we place an instruction at the start of the prompt, followed by a few in-context examples, and put the question at the very end.
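The layout described above (instruction, then in-context examples, then the question) can be sketched as a small helper. The field names, the "Question:/Reasoning:/Answer:" labels, and the sample arithmetic questions are illustrative assumptions, not taken from the original post. Giving an example a `rationale` is one way to make it chain-of-thought style.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    question: str
    answer: str
    rationale: Optional[str] = None  # step-by-step reasoning for CoT-style examples


def format_example(ex: Example) -> str:
    """Render one in-context example, with its reasoning if it has any."""
    lines = [f"Question: {ex.question}"]
    if ex.rationale:
        lines.append(f"Reasoning: {ex.rationale}")
    lines.append(f"Answer: {ex.answer}")
    return "\n".join(lines)


def build_prompt(instruction: str, examples: Sequence[Example], question: str,
                 cue: str = "Answer:") -> str:
    """Instruction first, then the examples, then the actual question last."""
    parts = [instruction]
    parts.extend(format_example(ex) for ex in examples)
    parts.append(f"Question: {question}\n{cue}")
    return "\n\n".join(parts)


# Plain few-shot prompt: examples carry only a question and an answer.
few_shot = build_prompt(
    "Answer the following arithmetic questions.",
    [Example("What is 2 + 3?", "5"), Example("What is 7 - 4?", "3")],
    "What is 6 + 1?",
)

# CoT-style prompt: the example shows its reasoning and the cue asks for reasoning.
cot = build_prompt(
    "Answer the following arithmetic questions.",
    [Example("What is 2 + 3?", "5", rationale="2 plus 3 equals 5.")],
    "What is 6 + 1?",
    cue="Reasoning:",
)
print(few_shot)
```

Keeping the question last means the model's continuation starts exactly where the answer (or the reasoning) is expected.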

Author:Alan_Hsu
URL:https://xmylog.com/article/articles_LLMTest_testPrompt
Copyright: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source!
