In the previous article, "Leveling Up LLM Agents, Part 1: The Reflection Workflow — 91% Accuracy, Surpassing GPT-4 by 24%", we examined three pioneering papers that together sketch the blueprint for reflection-based agents built on large language models. Today we take the key step from theory to practice: by installing and testing the Reflexion framework, we will demystify the agent workflow and put that knowledge to use. Since the framework covers a lot of ground, we plan to split the walkthrough into three parts.
git clone https://github.com/noahshinn/reflexion.git
Browsing the project, you will see it contains four test sub-projects. They are independent of one another, so each needs its dependencies installed separately. For simplicity, we start our analysis with hotpotqa_runs.
Install the dependencies:
cd hotpotqa_runs
conda create -n reflexion python=3.10
conda activate reflexion
pip install -r requirements.txt
The project's actual entry points are the three notebooks under hotpotqa_runs/notebooks.
Before digging into Chain of Thought (CoT), let's use ReactQA.ipynb as an example to discuss problems you may hit in practice. When you open this Jupyter notebook, you will see errors and warnings complaining that many dependencies are not installed. I generated the required requirements.txt locally; contact me if you would like a copy.
pip install jupyter
pip install openai
pip install wikipedia
pip install "pandas<2.0.0"
It is hard to trace calls and errors inside a Jupyter notebook, and we also need to debug the code. So we use Jupyter's nbconvert tool to convert ReactQA.ipynb into a plain Python file.
jupyter nbconvert --to script hotpotqa_runs/notebooks/ReactQA.ipynb
# Produces hotpotqa_runs/notebooks/ReactQA.py
# Moving it out of the notebooks directory works around some local util import issues.
mv hotpotqa_runs/notebooks/ReactQA.py hotpotqa_runs/
For this project we take a more cost-effective approach to testing the large language model (LLM). The project defaults to OpenAI as the LLM backend, but given the fees this can incur in practice, we opt for a cheaper alternative: the llama.cpp server we deployed earlier, as a drop-in replacement for OpenAI's service. The deployment article is linked here as well.
Next, start the recently deployed Mistral 7B quantized model. The quantization preserves most of the model's quality while reducing resource consumption. This lets us evaluate not only how the framework runs, but also what works best under different cost constraints.
./server -m ./models/mymodels/mistral-7b-instruct-v0.2.Q4_K_S.gguf -c 8192 -n -1 -t 7 --embeddings
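Before wiring the framework to it, a quick smoke test is worthwhile. A minimal sketch, assuming your llama.cpp build exposes the OpenAI-compatible /v1/chat/completions endpoint (recent server builds do):

import requests

# The model name is ignored by llama.cpp's server; it serves whatever model it loaded.
resp = requests.post(
    'http://localhost:8080/v1/chat/completions',
    json={
        'model': 'mistral-7b-instruct',
        'messages': [{'role': 'user', 'content': 'Say hello in one word.'}],
    },
    timeout=60,
)
print(resp.json()['choices'][0]['message']['content'])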
Now modify the code so that the agent's LLM calls go to the local llama.cpp service.
export OPENAI_API_KEY="sk"
Because hotpotqa_runs/agents.py reads the key from os.environ['OPENAI_API_KEY'] in several places, setting a dummy API key lets us avoid changing that part of the code.
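If you prefer not to rely on the shell, you can also set the dummy key from Python before agents.py is imported; a minimal sketch (the value is arbitrary, since llama.cpp never validates it):

import os

# agents.py only needs the variable to exist; llama.cpp ignores the key itself.
os.environ.setdefault('OPENAI_API_KEY', 'sk-dummy')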
Open hotpotqa_runs/llm.py and set openai_api_base to http://localhost:8080:
class AnyOpenAILLM:
    def __init__(self, *args, **kwargs):
        # Determine model type from the kwargs
        model_name = kwargs.get('model_name', 'gpt-3.5-turbo')
        # Point every request at the local llama.cpp server instead of api.openai.com
        kwargs['openai_api_base'] = "http://localhost:8080"
        if model_name.split('-')[0] == 'text':
            self.model = OpenAI(*args, **kwargs)
            self.model_type = 'completion'
        else:
            self.model = ChatOpenAI(*args, **kwargs)
            self.model_type = 'chat'
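The repo's llm.py also gives AnyOpenAILLM a __call__ that dispatches to whichever client was built, so a quick sanity check looks roughly like this (prompt and settings are illustrative):

# Illustrative smoke test: the wrapper now talks to the local llama.cpp server.
llm = AnyOpenAILLM(temperature=0, max_tokens=100, model_name='gpt-3.5-turbo')
print(llm('Who created Wallace and Gromit? Answer with the name only.'))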
python ReactQA.py
Output:
Question: The creator of "Wallace and Gromit" also created what animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes?
Thought 1: The creator of "Wallace and Gromit" is Nick Park. I need to search for other animation comedies by Nick Park that match this description.
Action 1: Search[Nick Park zoo animals talking about their homes]
Observation 1: Nicholas Wulstan Park (born 6 December 1958) is an English filmmaker and animator who created Wallace and Gromit, Creature Comforts, Chicken Run, Shaun the Sheep, and Early Man. Park has been nominated for an Academy Award a total of six times and won four with Creature Comforts (1989), The Wrong Trousers (1993), A Close Shave (1995) and Wallace & Gromit: The Curse of the Were-Rabbit (2005).He has also received five BAFTA Awards, including the BAFTA for Best Short Animation for A Matter of Loaf and Death, which was also the most watched television programme in the United Kingdom in 2008. His 2000 film Chicken Run is the highest-grossing stop motion animated film.In 1985, Park joined Aardman Animations based in Bristol, and for his work in animation he was among the British cultural icons selected by artist Peter Blake to appear in a 2012 version of Blake's most famous artwork—the Beatles' Sgt. Pepper's Lonely Hearts Club Band album cover—to celebrate the British cultural figures of his life.Park was appointed a Commander of the Order of the British Empire (CBE) in the 1997 Birthday Honours for services to the animated film industry.
Thought 2: Nick Park also created Creature Comforts, which is the animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes.
Action 2: Finish[Creature Comforts]
The first question has been answered. So how does ReAct run through these multiple rounds to reach the correct answer?
The code in ReactQA.py is fairly simple. I trimmed and tweaked the main flow a little to make it easier to run and debug, and paste it below to begin the analysis.
import joblib
# Import paths assume the repo layout, where both agent classes and the strategy enum live in agents.py.
from agents import ReactAgent, ReactReflectAgent, ReflexionStrategy

# Load the HotpotQA sample
hotpot = joblib.load('data/hotpot-qa-distractor-sample.joblib').reset_index(drop=True)

# Define the Reflexion strategy
strategy: ReflexionStrategy = ReflexionStrategy.REFLEXION
agent_cls = ReactReflectAgent if strategy != ReflexionStrategy.NONE else ReactAgent

# Build a single agent from one sample question/answer pair
row = hotpot.iloc[3]
agents = [agent_cls(row['question'], row['answer'])]

# Run `n` trials; only agents that are still wrong get re-run
n = 5
for i in range(n):
    for agent in [a for a in agents if not a.is_correct()]:
        agent.run(reflect_strategy=strategy)
        print(f'Answer: {agent.key}')
hotpot = joblib.load('data/hotpot-qa-distractor-sample.joblib').reset_index(drop=True)
So what does this data look like? Each QA record contains the question, the answer, a difficulty level, the supporting facts, and the context, keyed by an _id such as:
5a7613c15542994ccc9186bf
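A quick way to inspect one record; the column names below follow the standard HotpotQA distractor format and are an assumption about this particular joblib dump:

import joblib

hotpot = joblib.load('data/hotpot-qa-distractor-sample.joblib').reset_index(drop=True)
row = hotpot.iloc[3]
# Column names assumed from the HotpotQA distractor format; adjust if the dump differs.
for col in ['question', 'answer', 'level', 'supporting_facts', 'context']:
    print(f'{col}: {row[col]}')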
strategy: ReflexionStrategy = ReflexionStrategy.REFLEXION
There are four reflection strategies in total.
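They are defined as an Enum in the repo's agents.py; the member names below come from the repo, while the string values are my best recollection and may differ slightly:

from enum import Enum

class ReflexionStrategy(Enum):
    NONE = 'base'                  # no reflection at all
    LAST_ATTEMPT = 'last_trial'    # replay the last reasoning trace as context
    REFLEXION = 'reflexion'        # apply a written reflection to the next trace
    LAST_ATTEMPT_AND_REFLEXION = 'last_trial_and_reflexion'  # both of the above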
Here it is set to REFLEXION, the strategy that applies the written reflection to the next reasoning trajectory.
agent_cls = ReactReflectAgent if strategy != ReflexionStrategy.NONE else ReactAgent
row = hotpot.iloc[3]
agents = [agent_cls(row['question'], row['answer'])]
Since the strategy is set to REFLEXION, agent_cls resolves to ReactReflectAgent.
n = 5
n caps the total number of trials: every agent is run up to 5 times.
for agent in [a for a in agents if not a.is_correct()]:
    agent.run(reflect_strategy=strategy)
    print(f'Answer: {agent.key}')
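The gating condition a.is_correct() boils down to an exact-match comparison between the agent's answer and the gold key. A minimal sketch of the conventional HotpotQA EM normalization it relies on (this is the standard recipe, not code copied from the repo):

import re
import string

def normalize_answer(s: str) -> str:
    # Lowercase, strip punctuation, drop articles, collapse whitespace.
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def em(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)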
So the answer to the first question is worked out inside agent.run. Since tracing through agent.run takes considerable time, we will leave its internals to the next installment.
Also, if you are interested in LLM application development, consider the LangChain course 《LangChain 实战:LLM 应用开发指南》 (LangChain in Action: A Guide to LLM Application Development).