Large Language Models: Code Analysis of Reward Model Outputs by Model Type; Introduction to Seq. Classifiers; Full Seq. Classifier Scoring Code; Code Walkthrough

非那雄胺消费者

Published 2024-11-25 10:44:04

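1. Overview

Reward models fall into three main categories:

- Seq. Classifiers (sequence classifiers)

- Custom Classifiers

- Generative Models

The code needed to obtain a score differs for each category. This article walks through the scoring code for Seq. Classifiers in detail.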

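2. Introduction to Seq. Classifiers

A Seq. Classifier (Sequence Classifier) is a type of model that classifies an input sequence such as text, audio, or video. It takes one sequence as input and outputs one or more labels, and is commonly used for text classification, intent recognition, and sentiment analysis. A reward model of this type uses the same architecture with a single output, which is read as a scalar score for the input conversation.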
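
To make the idea concrete, here is a minimal sketch of a sequence-classification head using plain Python lists and made-up numbers. It assumes the head scores the last token's hidden vector, which is how decoder-only classifiers such as LlamaForSequenceClassification pool a sequence (the sketch ignores padding). The function name `classify_sequence` and all values are illustrative, not part of any library.

```python
# Toy sketch of a sequence-classification head (plain lists, made-up numbers).

def classify_sequence(hidden_states, head_weights):
    """Score a sequence from its per-token hidden states.

    hidden_states: one vector per token, shape (seq_len, hidden_size).
    head_weights:  one weight vector per label, shape (num_labels, hidden_size).
    Returns one logit per label, computed from the LAST token's vector.
    """
    last_token = hidden_states[-1]
    return [
        sum(w * h for w, h in zip(label_weights, last_token))
        for label_weights in head_weights
    ]


# Three tokens with hidden_size = 2 (values are illustrative only).
hidden_states = [[0.1, 0.2], [0.0, -0.5], [1.0, 2.0]]

# num_labels = 3: an ordinary classifier emits one logit per class.
three_class_logits = classify_sequence(
    hidden_states, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
)

# num_labels = 1: the reward-model case, a single scalar score.
reward_logits = classify_sequence(hidden_states, [[0.5, 0.25]])

print(three_class_logits)  # [1.0, 2.0, 3.0]
print(reward_logits)       # [1.0]
```

The only difference between the two calls is the number of weight rows in the head, which is what the num_labels argument controls when a real model is loaded.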

3. Seq. Classifier: Full Scoring Code

import torch

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]

# Format and tokenize the conversations
# If you instead call `apply_chat_template` with `tokenize=False` and then tokenize the text with `tokenizer()`,
# remember to remove the duplicated BOS token.
conv1_tokenized = rm_tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to(device)
conv2_tokenized = rm_tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to(device)

# Get the reward scores
with torch.no_grad():
    score1 = rm(conv1_tokenized).logits[0][0].item()
    score2 = rm(conv2_tokenized).logits[0][0].item()
print(f"Score for response 1: {score1}")
print(f"Score for response 2: {score2}")

# Output:
# 27B (a larger checkpoint than the 8B model loaded above):
# Score for response 1: 0.5625
# Score for response 2: -8.5

# 8B:
# Score for response 1: 13.6875
# Score for response 2: -9.1875
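
The two scores are most useful when compared with each other, since the 8B and 27B outputs above sit on visibly different scales. As a small sketch, the snippet below ranks the candidates by score using the 8B numbers printed above. The helper name `pick_best` is hypothetical, and it assumes at least two candidates.

```python
# Scores copied from the 8B output above.
scores = {"response1": 13.6875, "response2": -9.1875}


def pick_best(scores):
    """Return the highest-scoring response name and its margin over the runner-up.

    Assumes `scores` holds at least two entries.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_name, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_name, best_score - second_score


best, margin = pick_best(scores)
print(best, margin)  # response1 22.875
```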

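4. Code Walkthrough

4.1. Import the required libraries

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

- torch: handles tensor computation and interaction with the GPU.

- transformers:

  - AutoModelForSequenceClassification: loads a model with a sequence-classification head.

  - AutoTokenizer: loads the tokenizer that matches the model and converts text into model inputs.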

4.2. Define the device and model name

device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

- device: the compute device; "cuda:0" is the first GPU.

- model_name: the Hugging Face model ID, here the reward model [Skywork/Skywork-Reward-Llama-3.1-8B-v0.2](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B-v0.2).

4.3. Load a pretrained sequence classifier model

rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)

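4.3.1 Parameter description

from_pretrained is one of the core methods of the Transformers library. It loads a pretrained model's configuration and weights and instantiates a concrete model object. Here it is called as a class method of AutoModelForSequenceClassification and configured through a few key arguments:

- AutoModelForSequenceClassification.from_pretrained: loads a pretrained model with a sequence-classification head. This class fits because a reward model is essentially a regression task: with a single label, the head outputs one scalar instead of per-class scores.

- model_name: the model to load, here "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2".

- torch_dtype=torch.bfloat16: loads the weights in bfloat16, a 16-bit format that halves memory use relative to float32 while keeping the same exponent range, so accuracy usually stays close to full precision.

- device_map=device: places the model on the specified device (here the GPU).

- attn_implementation="flash_attention_2":

  - Selects the FlashAttention-2 implementation, which requires the flash-attn package and a supported GPU.

  - FlashAttention is an efficient attention implementation that reduces GPU memory use and speeds up Transformer inference.

- num_labels=1: a single-output task (regression; the output is the reward value), not multi-class classification.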
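
Two of these arguments lend themselves to a quick check before loading. The sketch below estimates the memory taken by the weights alone at different precisions, and falls back to PyTorch's built-in "sdpa" attention when the flash_attn package is not importable. The helper names and the round figure of 8 billion parameters are assumptions for illustration; the check covers only package presence, not GPU support.

```python
import importlib.util

# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def choose_attn_implementation():
    """Pick FlashAttention-2 only if the flash_attn package is importable."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"  # PyTorch scaled-dot-product attention


approx_params = 8e9  # assumption: roughly 8 billion parameters

print(weight_memory_gb(approx_params, "float32"))   # 32.0
print(weight_memory_gb(approx_params, "bfloat16"))  # 16.0
print(choose_attn_implementation())
```

The returned string could be passed as attn_implementation in the from_pretrained call above. Activations and framework overhead come on top of the weight estimate.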

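4.3.2 How from_pretrained works

The excerpts below are simplified, and the exact code differs between library versions, but the flow is the same.

- Resolve the configuration

config = kwargs.pop("config", None)
if not isinstance(config, PretrainedConfig):
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)

If the caller does not pass a config explicitly, the method loads the model's configuration file (such as config.json) from the model name or path.

- Select the concrete model class

for config_class, model_class in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.items():
    if isinstance(config, config_class):
        return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)

The method looks up the config's class in the mapping table MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING to find the matching model class (for example LlamaForSequenceClassification).

It then calls that model class's own from_pretrained method, which loads the weights and returns the model instance.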
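
The two steps above can be sketched with toy classes that need no downloads. Everything prefixed with Toy or toy, including the name-based config lookup, is a hypothetical stand-in and not the library's actual code.

```python
class ToyConfig:
    """Stand-in for a model configuration object."""


class ToyLlamaConfig(ToyConfig):
    pass


class ToyBertConfig(ToyConfig):
    pass


class ToyModel:
    def __init__(self, config):
        self.config = config

    @classmethod
    def from_pretrained(cls, name, config):
        # A real implementation would also load the weights here.
        return cls(config)


class ToyLlamaForSequenceClassification(ToyModel):
    pass


class ToyBertForSequenceClassification(ToyModel):
    pass


# Config class -> model class, mirroring the mapping table described above.
TOY_MAPPING = {
    ToyLlamaConfig: ToyLlamaForSequenceClassification,
    ToyBertConfig: ToyBertForSequenceClassification,
}


def load_toy_config(name):
    """Pretend to read config.json; here the type is guessed from the name."""
    return ToyLlamaConfig() if "llama" in name.lower() else ToyBertConfig()


def auto_from_pretrained(name, config=None):
    # Step 1: resolve the configuration if none was passed in.
    if not isinstance(config, ToyConfig):
        config = load_toy_config(name)
    # Step 2: dispatch on the config's type to the concrete model class.
    for config_class, model_class in TOY_MAPPING.items():
        if isinstance(config, config_class):
            return model_class.from_pretrained(name, config=config)
    raise ValueError(f"Unrecognized configuration class {type(config).__name__}")


model = auto_from_pretrained("Skywork/Skywork-Reward-Llama-3.1-8B-v0.2")
print(type(model).__name__)  # ToyLlamaForSequenceClassification
```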

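4.4. Load the tokenizer

rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

- AutoTokenizer loads the tokenizer that matches the model.

- The tokenizer converts natural-language text (such as "Hello, world!") into the token IDs the model consumes.

- The tokenizer and the model must match; loading both from the same model_name ensures this.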
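
A toy illustration of why the match matters: token IDs are only meaningful relative to the vocabulary that produced them. The two vocabularies and the whitespace-based `encode` and `decode` helpers below are invented for this sketch and are far simpler than a real tokenizer.

```python
# Two toy vocabularies standing in for the tokenizers of two different models.
vocab_a = {"hello": 0, "world": 1, "<unk>": 2}
vocab_b = {"world": 0, "<unk>": 1, "hello": 2}


def encode(text, vocab):
    """Map each lowercase, whitespace-separated word to its ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def decode(ids, vocab):
    """Map IDs back to words using the given vocabulary."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


ids = encode("Hello world", vocab_a)
print(ids)                   # [0, 1]
print(decode(ids, vocab_a))  # hello world
print(decode(ids, vocab_b))  # world <unk>
```

A model trained with vocab_b would read the same IDs as different words, which is the kind of silent error a mismatched tokenizer causes.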

4.5. Define the question (prompt) and two answers (response1 and response2)

prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

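4.6. Build the conversation samples (conv1 and conv2)

conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]

Each conversation has two parts:

- User turn (prompt): the question.

- Assistant turn (response1 or response2): the candidate answer to be scored. response1 divides the 9 apples among 3 people, as the question asks; response2 divides them among 2 and then rounds 4.5 down to 4, so a good reward model should score response1 higher.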
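
When there are more than two candidates, the same structure can be built in a loop. This is a small sketch with a hypothetical helper `build_conversation` and a shortened prompt invented for the example.

```python
def build_conversation(prompt, response):
    """Pair one user prompt with one candidate assistant reply."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]


# Shortened, made-up prompt and candidates for illustration.
prompt = "What is 2 + 2?"
candidates = ["2 + 2 = 4.", "2 + 2 = 5."]

conversations = [build_conversation(prompt, r) for r in candidates]
for conv in conversations:
    print([turn["role"] for turn in conv])  # ['user', 'assistant']
```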

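4.7. Convert to tensors

conv1_tokenized = rm_tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to(device)
conv2_tokenized = rm_tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to(device)

- rm_tokenizer.apply_chat_template formats a conversation sample (conv1 or conv2) with the chat template the reward model expects, then tokenizes it.

- Inputs:

  - conv1 and conv2: the conversation samples, each a list holding the user turn and the assistant turn.

  - tokenize=True: return token IDs rather than the formatted string.

  - return_tensors="pt": return a PyTorch tensor (torch.Tensor) that can be passed to the model.

- Output:

  - A tensor of token IDs in the format the reward model expects.

- .to(device)

  - Moves the tensor to the specified compute device.

  - device is typically "cuda:0" for a GPU, or "cpu" for the CPU.

  - The input and the model should be on the same device; otherwise PyTorch may raise a device-mismatch error.

- Result

  - conv1_tokenized and conv2_tokenized are two PyTorch tensors holding the token IDs of conv1 and conv2.

  - They can be passed directly to the reward model for inference.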
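
The comment in the full code about a duplicated BOS token is easier to see with a toy model of the two steps. In this sketch the template format, the `<bos>` placeholder, and all three helpers are invented; a real chat template and tokenizer are more involved.

```python
BOS = "<bos>"  # placeholder; real tokenizers define their own BOS token


def render_chat(conversation):
    """Flatten a list of {role, content} turns into one string, BOS first."""
    parts = [BOS]
    for turn in conversation:
        parts.append(f"[{turn['role']}]\n{turn['content']}\n")
    return "".join(parts)


def toy_tokenize(text, add_bos=True):
    """Whitespace 'tokenizer' that prepends BOS by default, as many real ones do."""
    tokens = text.replace(BOS, f" {BOS} ").split()
    return ([BOS] if add_bos else []) + tokens


def drop_duplicate_bos(tokens):
    """Remove the second BOS if the sequence starts with two of them."""
    if tokens[:2] == [BOS, BOS]:
        return tokens[1:]
    return tokens


conv = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]

text = render_chat(conv)     # template step only (like tokenize=False)
tokens = toy_tokenize(text)  # a separate tokenizer call adds BOS again

print(tokens[:3])                      # ['<bos>', '<bos>', '[user]']
print(drop_duplicate_bos(tokens)[:2])  # ['<bos>', '[user]']
```

Calling apply_chat_template with tokenize=True, as the article's code does, avoids the problem because templating and tokenization happen in one step. With a real Hugging Face tokenizer, passing add_special_tokens=False when tokenizing already-templated text is a common alternative to stripping the extra token afterwards.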

4.8. Compute the scores

with torch.no_grad():
    score1 = rm(conv1_tokenized).logits[0][0].item()
    score2 = rm(conv2_tokenized).logits[0][0].item()
print(f"Score for response 1: {score1}")
print(f"Score for response 2: {score2}")

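1. with torch.no_grad():

- Disables gradient tracking:

  - Inference does not need gradients, so torch.no_grad() saves memory and computation.

  - Use it for evaluation and inference, not for training.

2. rm(conv1_tokenized): the reward model's forward pass

- Purpose:

  - Passes conv1_tokenized through the reward model (rm) to obtain the output logits.

- Input:

  - conv1_tokenized, the templated and tokenized PyTorch tensor representing conversation conv1.

- Output:

  - An output object whose logits field holds the model's rating of the conversation.

3. .logits

- Purpose:

  - Extracts the reward model's output logits.

  - The logits are the model's unnormalized score for the input conversation.

- Shape:

  - logits has shape (batch_size, num_labels), which is (1, 1) here.

  - logits[0]: the model output is batched, so this selects the first (and only) conversation in the batch.

  - logits[0][0]: the first (and only) element of that row, which is the reward score.

4. .item()

- Purpose:

  - Converts the single-element PyTorch tensor to a Python float.

  - This makes the score easy to print or store.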
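
The four steps can be tried on the CPU without downloading the reward model. In this sketch a randomly initialized linear layer stands in for the model's scoring head, so the score itself is meaningless; the names `head` and `hidden` and the hidden size of 4 are arbitrary choices for the example.

```python
import torch

# Stand-in for the reward model's scoring head: hidden_size=4 -> num_labels=1.
head = torch.nn.Linear(4, 1, bias=False)

# Stand-in for the pooled hidden state of one conversation (batch_size=1).
hidden = torch.ones(1, 4)

# Without no_grad, the output is attached to the autograd graph.
tracked = head(hidden)
print(tracked.requires_grad)  # True

# With no_grad, no graph is built, which saves memory during inference.
with torch.no_grad():
    logits = head(hidden)
print(logits.requires_grad)  # False

print(logits.shape)           # torch.Size([1, 1])
score = logits[0][0].item()   # 0-dim tensor -> Python float
print(type(score).__name__)   # float
```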

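Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.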