
Local Inference on a Single Machine: Deploying a "Local" ChatGPT on an M1 Mac with the C++ Port of the LLaMA Large Language Model

User 9127725
Published 2023-03-28 08:36:29

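    OpenAI's GPT-based ChatGPT has been riding high. As the old line goes, "we watched it raise its vermilion towers, we watched it feast its guests." Facebook finally could not sit still and released LLaMA, also a large language model (LLM), in four reported sizes: 7 billion, 13 billion, 33 billion and 65 billion parameters. Parameters are the adjustable variables of a neural network, such as weights and biases, that are tuned during training to optimize the network's performance. "7 billion" means the network contains 7 billion such parameters, and likewise for the other sizes.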
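    To make the definition concrete, here is a small sketch that counts the trainable parameters of one fully connected layer, y = Wx + b. Such a layer has one weight per input/output pair plus one bias per output unit. The function name `dense_layer_params` is illustrative and not part of any library. The size 4096 is used only because it is the model dimension LLaMA 7B reports in the conversion log later in this article.

```python
def dense_layer_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of trainable parameters in a fully connected layer y = W x + b."""
    weights = in_features * out_features   # one weight per (input, output) pair
    biases = out_features if bias else 0   # one bias per output unit
    return weights + biases


print(dense_layer_params(4096, 4096))              # 16781312
print(dense_layer_params(4096, 4096, bias=False))  # 16777216
```

    LLaMA's own linear layers are built without bias terms. That is why the conversion log further down lists only `.weight` tensors.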

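    In some large neural networks, each parameter is stored as a 32-bit or 64-bit floating-point number, so each parameter takes 4 or 8 bytes. A network with 7 billion parameters therefore needs about 28 GB or 56 GB, respectively, just to store its weights.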
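    The arithmetic above is simply parameter count × bytes per parameter. The following is a minimal sketch that counts only the raw weights. It assumes a nominal 7 × 10⁹ parameters and decimal gigabytes (1 GB = 10⁹ bytes). The float16 row is included because that is the format of the checkpoint used later. `weight_storage_gb` is an illustrative name.

```python
def weight_storage_gb(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 7_000_000_000  # nominal "7B"; the real model is slightly smaller

for name, nbytes in [("float64", 8), ("float32", 4), ("float16", 2)]:
    print(f"{name}: {weight_storage_gb(N_PARAMS, nbytes):.0f} GB")
# prints: float64: 56 GB, float32: 28 GB, float16: 14 GB (one per line)
```

    The float16 figure lands close to the roughly 13.5 GB checkpoint downloaded below. The gap exists because LLaMA 7B actually has a little under 7 billion parameters.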

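    Moreover, the memory a model actually needs depends not only on its parameter count but also on its architecture (number of layers, hidden size and so on) and on implementation details such as the intermediate activations held during inference. A 7-billion-parameter network may therefore need more memory than its weights alone, depending on how it is built and run.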

    Running a model of this size on a single machine is a tall order, so this test uses the smallest one, LLaMA 7B.

    Installing the LLaMA project and configuring the model

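    Just like the Stable-Diffusion project, Facebook's open-source LLaMA project is hard-coded to use CUDA, which means an NVIDIA GPU is required to run it. Fortunately, the well-known developer Georgi Gerganov rewrote it in C++ as llama.cpp, a port of LLaMA that runs on the CPU.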

    Apple's M-series chips were the first target llama.cpp was optimized for, which is great news for Apple fans. First, clone the C++ version of LLaMA:

git clone https://github.com/ggerganov/llama.cpp

    Then change into the project directory:

cd llama.cpp

    Inside the project, create a separate folder named models to hold the model files:

mkdir models

    Next, download the LLaMA 7B model files from Hugging Face: https://huggingface.co/nyanko7/LLaMA-7B/tree/main

    Yes, the main model file alone weighs in at a hefty 13.5 GB, so think twice before downloading it if your disk is running low.
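    Before starting a download this large, you can check free disk space with Python's standard `shutil.disk_usage`. This is only a sketch. The 13.5 GB figure comes from the text above. Budgeting about twice that is an assumption: the conversion step later writes a second float16 model file of similar size next to the original. `free_gb` and `enough_space` are illustrative names.

```python
import shutil

CHECKPOINT_GB = 13.5  # size of consolidated.00.pth quoted in the text


def free_gb(path: str = ".") -> float:
    """Free space, in decimal GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(path: str = ".", needed_gb: float = 2 * CHECKPOINT_GB) -> bool:
    # Assumption: keep room for the checkpoint plus a converted copy of
    # similar size, since conversion writes a new file instead of replacing one.
    return free_gb(path) >= needed_gb


if __name__ == "__main__":
    print(f"free: {free_gb():.1f} GB, enough for download + conversion: {enough_space()}")
```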

    Then create the model subdirectory 7B under models:

mkdir models/7B

    Put tokenizer.model and tokenizer_checklist.chk in the models directory, at the same level as 7B:

➜  models git:(master) ✗ ls
7B                      tokenizer.model         tokenizer_checklist.chk

    Then put checklist.chk, consolidated.00.pth and params.json into the 7B directory:

➜  7B git:(master) ✗ ls
checklist.chk       consolidated.00.pth  params.json

    The model is now configured.
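    A quick script can confirm that the layout matches the two directory listings above before you move on. It can also check the downloads against the .chk files. This is a sketch (Python 3.9+). It assumes checklist.chk and tokenizer_checklist.chk use the `md5sum` format, one `<md5>  <file name>` line per file. All function names are illustrative. Hashing the 13.5 GB checkpoint takes a while, so the file is read in 1 MiB chunks rather than all at once.

```python
import hashlib
from pathlib import Path

# Expected layout, taken from the directory listings in the text.
TOP_LEVEL = ["tokenizer.model", "tokenizer_checklist.chk"]
MODEL_7B = ["checklist.chk", "consolidated.00.pth", "params.json"]


def missing_files(models_dir: str = "models") -> list[str]:
    """Return the expected model files that are not present under models_dir."""
    root = Path(models_dir)
    expected = [root / n for n in TOP_LEVEL] + [root / "7B" / n for n in MODEL_7B]
    return [str(p) for p in expected if not p.is_file()]


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so a multi-GB checkpoint is not loaded into RAM."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checklist(directory: Path, checklist: str = "checklist.chk") -> dict[str, bool]:
    """Check each '<md5>  <file>' line of an md5sum-style checklist in `directory`."""
    results: dict[str, bool] = {}
    for line in (directory / checklist).read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = directory / name
        results[name] = target.is_file() and md5_of(target) == digest.lower()
    return results


if __name__ == "__main__":
    missing = missing_files()
    print("missing:", missing or "none")
    if not missing:
        print(verify_checklist(Path("models/7B")))
        print(verify_checklist(Path("models"), "tokenizer_checklist.chk"))
```

    Run it from the llama.cpp project root. Any `False` entry means that file should be downloaded again.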

    Converting the LLaMA model

    Because we are not using Facebook's original project, its checkpoint must first be converted into the ggml format that the C++ port of LLaMA can load.

    The conversion is done with a Python script:

python3 convert-pth-to-ggml.py models/7B/ 1

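    The first argument is the model directory. The second selects the floating-point type used for the conversion: 0 means float32, which makes the output file twice as large, while 1 means float16, the default. Here we use the default.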
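    The effect of that second argument can be estimated from params.json alone. The sketch below rebuilds the 7B parameter count from the tensor shapes the script prints. Each layer has four 4096×4096 attention matrices, three feed-forward matrices and two norm vectors. On top of that come the token embedding, the output matrix and a final norm. The feed-forward width formula (two thirds of 4 × dim, rounded up to `multiple_of`) follows LLaMA's reference model code. Treat this as an illustration, not part of llama.cpp. The vocabulary size 32000 is taken from the `tok_embeddings` shape, because params.json stores -1 there. All function names are illustrative.

```python
import json
from pathlib import Path


def load_params(path: str = "models/7B/params.json") -> dict:
    """Read the hyper-parameters shipped with the checkpoint."""
    return json.loads(Path(path).read_text())


def ffn_hidden_dim(dim: int, multiple_of: int) -> int:
    # Two thirds of 4*dim, rounded up to a multiple of `multiple_of`.
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def count_params(dim: int, n_layers: int, multiple_of: int, vocab_size: int) -> int:
    hidden = ffn_hidden_dim(dim, multiple_of)
    attention = 4 * dim * dim          # wq, wk, wv, wo
    feed_forward = 3 * dim * hidden    # w1, w2, w3
    norms = 2 * dim                    # attention_norm, ffn_norm
    embeddings = 2 * vocab_size * dim  # tok_embeddings and output
    return n_layers * (attention + feed_forward + norms) + embeddings + dim  # + final norm


# Hyper-parameters as printed by the conversion script.
p = {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
n = count_params(p["dim"], p["n_layers"], p["multiple_of"], vocab_size=32000)
print(f"{n:,} parameters")  # 6,738,415,616 parameters
for ftype, (name, nbytes) in {0: ("float32", 4), 1: ("float16", 2)}.items():
    print(f"ftype {ftype} ({name}): about {n * nbytes / 1e9:.1f} GB")
# ftype 0 (float32): about 27.0 GB
# ftype 1 (float16): about 13.5 GB
```

    On a real checkout, `load_params()` can replace the hard-coded dictionary. The actual output file comes out slightly larger than these estimates. One reason is that the script keeps the 1-D norm weights in float32, as the `Converting to float32` lines in its log show. The file also carries a header.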

    Output of the conversion script:

➜  llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}
n_parts = 1

Processing part 0

Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.1.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.1.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.1.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.2.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.2.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.2.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.3.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.3.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.3.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.4.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.4.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.4.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.5.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.5.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.5.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.6.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.6.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.6.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.7.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.7.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.7.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.8.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.8.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.8.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.9.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.9.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.9.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.10.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.10.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.10.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.11.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.11.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.11.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.12.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.12.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.12.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.13.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.13.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.13.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.14.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.14.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.14.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.15.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.15.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.15.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.16.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.16.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.16.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.17.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.17.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.17.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.18.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.18.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.18.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.19.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.19.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.19.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.20.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.20.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.20.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.21.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.21.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.21.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.22.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.22.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.22.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.23.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.23.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.23.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.24.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.24.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.24.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.25.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.25.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.25.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.26.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.26.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.26.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.27.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.27.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.27.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.28.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.28.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.28.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.29.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.29.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.29.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.30.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.30.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.30.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.31.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.31.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.31.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Done. Output file: models/7B//ggml-model-f16.bin, (part 0)

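    As the output shows, a successful conversion writes a ggml-model-f16.bin model file to models/7B/ that the C++ program can load. The 2-D weight matrices stay in float16, while the 1-D norm weights are converted to float32 (the "Converting to float32" lines).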
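As a sanity check on the "7B" label, the tensor shapes printed in the log above can be tallied by hand. The minimal Python sketch below does this under a few stated assumptions: 32 layers with the per-layer shapes shown in the log; token-embedding and output matrices of shape [32000, 4096] plus a final [4096] norm (inferred from n_vocab, n_embd and n_layer in the model-load log later in the article, because those tensors scroll past before this excerpt); and the storage rule visible in the log, where 2-D tensors are stored as float16 (2 bytes) and 1-D tensors as float32 (4 bytes).

```python
# Tally LLaMA-7B parameters and the ggml f16 file size from the tensor shapes
# in the conversion log. The tok_embeddings/norm/output shapes are ASSUMED,
# inferred from n_vocab=32000, n_embd=4096, n_layer=32 in the model-load log.
from math import prod

N_VOCAB, N_EMBD, N_FF, N_LAYER = 32000, 4096, 11008, 32

# Per-layer tensors, as printed by convert-pth-to-ggml.py
LAYER_SHAPES = {
    "attention.wq.weight": (N_EMBD, N_EMBD),
    "attention.wk.weight": (N_EMBD, N_EMBD),
    "attention.wv.weight": (N_EMBD, N_EMBD),
    "attention.wo.weight": (N_EMBD, N_EMBD),
    "feed_forward.w1.weight": (N_FF, N_EMBD),
    "feed_forward.w2.weight": (N_EMBD, N_FF),
    "feed_forward.w3.weight": (N_FF, N_EMBD),
    "attention_norm.weight": (N_EMBD,),
    "ffn_norm.weight": (N_EMBD,),
}

# Model-level tensors (assumed shapes, see comment above)
GLOBAL_SHAPES = {
    "tok_embeddings.weight": (N_VOCAB, N_EMBD),
    "norm.weight": (N_EMBD,),
    "output.weight": (N_VOCAB, N_EMBD),
}


def all_shapes():
    """Yield the shape of every tensor in the checkpoint."""
    yield from GLOBAL_SHAPES.values()
    for _ in range(N_LAYER):
        yield from LAYER_SHAPES.values()


def bytes_for(shape):
    """2-D tensors stay float16 (2 bytes each); 1-D tensors become float32 (4 bytes each)."""
    return prod(shape) * (4 if len(shape) == 1 else 2)


shapes = list(all_shapes())
n_tensors = len(shapes)
n_params = sum(prod(s) for s in shapes)
size_mib = sum(bytes_for(s) for s in shapes) / (1024 * 1024)

print(f"tensors = {n_tensors}")
print(f"params  = {n_params:,}")
print(f"size    = {size_mib:.2f} MiB")
```

This prints 291 tensors, 6,738,415,616 parameters (about 6.7 billion) and 12853.02 MiB, which agrees with the `model size = 12853.02 MB / num tensors = 291` line llama.cpp reports when it loads the converted file (its "MB" is 1024 × 1024 bytes). At roughly 2 bytes per parameter, that is also consistent with the ~13.5 GB size of the original consolidated.00.pth.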

    Running the LLaMA model

    Now the converted model is ready to use. First, compile the C++ project:

make

    The build prints:

➜  llama.cpp git:(master) ✗ make
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX:      Apple clang version 14.0.0 (clang-1400.0.29.202)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread main.cpp ggml.o utils.o -o main  -framework Accelerate
./main -h
usage: ./main [options]

options:
  -h, --help            show this help message and exit
  -i, --interactive     run in interactive mode
  -ins, --instruct      run in instruction mode (use with Alpaca models)
  -r PROMPT, --reverse-prompt PROMPT
                        in interactive mode, poll user input upon seeing PROMPT (can be
                        specified more than once for multiple prompts).
  --color               colorise output to distinguish prompt and user input from generations
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: empty)
  --random-prompt       start with a randomized prompt.
  -f FNAME, --file FNAME
                        prompt file to start generation.
  -n N, --n_predict N   number of tokens to predict (default: 128)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --repeat_last_n N     last n tokens to consider for penalize (default: 64)
  --repeat_penalty N    penalize repeat sequence of tokens (default: 1.3)
  -c N, --ctx_size N    size of the prompt context (default: 512)
  --ignore-eos          ignore end of stream token and continue generating
  --memory_f16          use f16 instead of f32 for memory key+value
  --temp N              temperature (default: 0.8)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/llama-7B/ggml-model.bin)

c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize  -framework Accelerate

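    Once the build succeeds, it produces a main executable (plus a quantize tool) in the project root; main.cpp is only the source file it is compiled from.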

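    Then invoke the model as described in the help text printed by the build. The default model path (models/llama-7B/ggml-model.bin) does not match where the converted file lives, so pass it explicitly with -m: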

./main -m ./models/7B/ggml-model-f16.bin -p 'Hi i am '

    Output:

➜  llama.cpp git:(master)./main -m ./models/7B/ggml-model-f16.bin -p 'hi i am'
main: seed = 1679400707
llama_model_load: loading model from './models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 1
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-f16.bin'
llama_model_load: .................................... done
llama_model_load: model size = 12853.02 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: prompt: ' hi i am'
main: number of tokens in prompt = 6
     1 -> ''
 13450 -> ' hi'
   423 -> 'i'
 25523 -> ' am'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


 hi i am a pythoner, but sunk to become a ruby
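The `sampling parameters` line in the output (temp, top_k, top_p, repeat_last_n, repeat_penalty) corresponds to options listed by `./main -h`. To make their roles concrete, here is a simplified Python sketch of one common way to combine them when picking the next token. It is an illustration under assumptions, not llama.cpp's actual C++ implementation: the function name `sample_next_token` is hypothetical, `logits` is assumed to be the model's raw scores over the vocabulary, and `recent_tokens` stands for the last `repeat_last_n` generated token ids.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    recent_tokens: Sequence[int],
    temp: float = 0.8,
    top_k: int = 40,
    top_p: float = 0.95,
    repeat_penalty: float = 1.3,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical sketch: choose a token id using repeat penalty, temperature, top-k and top-p."""
    rng = rng or random.Random()
    scores = list(logits)

    # 1. Repetition penalty: make recently generated tokens less attractive.
    for tok in set(recent_tokens):
        if scores[tok] > 0:
            scores[tok] /= repeat_penalty
        else:
            scores[tok] *= repeat_penalty

    # 2. Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temp for s in scores]

    # 3. Top-k: keep only the k highest-scoring candidates, best first.
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # 4. Softmax over the survivors (subtract the max for numerical stability).
    best = scores[candidates[0]]
    weights = [math.exp(scores[i] - best) for i in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 5. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for tok, p in zip(candidates, probs):
        kept_ids.append(tok)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # 6. Draw one token; random.choices normalises the weights itself.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1679400707)  # fixed seed, playing the role of -s/--seed
    toy_logits = [0.1, 2.0, 1.5, -0.3]  # toy 4-token vocabulary, not real model output
    print(sample_next_token(toy_logits, recent_tokens=[1], rng=rng))
```

The order of the steps (penalty, temperature, top-k, then top-p) is one reasonable choice; llama.cpp's own code may order or implement them differently, so treat this only as a guide to what each knob does.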

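    Frankly, the inference speed leaves a lot to be desired, though this may partly be down to the author's modest hardware. Note also that the system_info line shows only 4 of 10 available threads in use (n_threads = 4 / 10), because -t defaults to 4; raising it with -t may help.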
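To script runs instead of typing them by hand, a small Python wrapper can assemble the same command line. This is a hypothetical sketch: the helper names `build_main_args` and `run_main`, and the choice to default `-t` to `os.cpu_count()`, are assumptions, but every flag it passes (`-m`, `-p`, `-n`, `-t`, `-s`, `--temp`) appears in the `./main -h` output above. It assumes `./main` was built in the current directory and the converted model is at `./models/7B/ggml-model-f16.bin`.

```python
import os
import shlex
import subprocess
from typing import List, Optional


def build_main_args(
    prompt: str,
    model: str = "./models/7B/ggml-model-f16.bin",
    n_predict: int = 128,
    threads: Optional[int] = None,
    seed: int = -1,
    temp: float = 0.8,
    binary: str = "./main",
) -> List[str]:
    """Build an argv list for llama.cpp's ./main using flags from its -h output."""
    if threads is None:
        # -t defaults to 4; here we ASSUME using every logical core is a sensible start.
        # On Apple Silicon this includes efficiency cores, so it is worth experimenting.
        threads = os.cpu_count() or 4
    return [
        binary,
        "-m", model,
        "-p", prompt,
        "-n", str(n_predict),
        "-t", str(threads),
        "-s", str(seed),
        "--temp", str(temp),
    ]


def run_main(prompt: str, **kwargs) -> str:
    """Run ./main and return its stdout.

    ASSUMPTION: llama.cpp writes generated text to stdout and its diagnostics to stderr.
    """
    args = build_main_args(prompt, **kwargs)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be executed, shell-quoted for readability.
    print(shlex.join(build_main_args("hi i am", threads=8)))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the prompt contains spaces or quotes.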

    Conclusion

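    The LLaMA 7B model generally needs English prompts; its understanding of Chinese is still limited. Its strength is that it genuinely runs on a single machine. Running locally also removes the network round trip, although overall generation speed then depends entirely on the local hardware. For the average AI hobbyist, that is enough.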

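This article is part of the Tencent Cloud self-media sync program and was shared from the author's personal site/blog.
Originally published: 2023-03-24. For infringement concerns, contact cloudcommunity@tencent.com to request removal.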
