Ollama [Deployment 05]: Installation, Upgrade, and Startup Script on Linux (connection reset by peer error)

Original

yuanzhengme

Published 2025-08-26 18:05:54


1. Installation

Download the installation package (ollama-linux-amd64.tgz) from GitHub, upload it to the Linux server, and then run:

# Extract
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
# Start (runs in the foreground; use a second terminal for the next command)
ollama serve
# Verify
ollama -v
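If you script the installation, it helps to fail loudly when the binary does not end up where you expect. The following is a minimal sketch, assuming the release archive has the bin/ and lib/ layout visible in the upgrade logs later in this post. install_ollama_tgz is a hypothetical helper, not part of Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract an Ollama release archive into a prefix and
# check that bin/ollama landed there. The tar flags match the command above.
install_ollama_tgz() {
  local archive="$1"            # e.g. ollama-linux-amd64.tgz
  local prefix="${2:-/usr}"     # /usr matches `sudo tar -C /usr ...`
  mkdir -p "$prefix" || return 1
  # -C switches to $prefix first, so the binary ends up at $prefix/bin/ollama
  tar -C "$prefix" -xzf "$archive" || return 1
  if [ ! -x "$prefix/bin/ollama" ]; then
    echo "bin/ollama not found under $prefix" >&2
    return 1
  fi
}
```

Source the function, run install_ollama_tgz ollama-linux-amd64.tgz /usr as root, then check the result with ollama -v.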

2. Upgrade

2.1 Official upgrade instructions

See the "Updating" section of the official "Manual install" documentation:

Update Ollama by running the install script again:

curl -fsSL https://ollama.com/install.sh | sh

Or by re-downloading Ollama:

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

In other words, upgrading simply overwrites the previously installed files.
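Since an upgrade only overwrites files, keeping a copy of the current version makes a rollback easy. This is a minimal sketch, assuming the default /usr prefix from the installation step; backup_ollama and the backup directory naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the current binary and backend libraries aside
# before an upgrade overwrites them. Prints the backup directory on success.
backup_ollama() {
  local prefix="${1:-/usr}"
  local dest="${2:-./ollama-backup-$(date +%Y%m%d%H%M%S)}"
  mkdir -p "$dest/bin" || return 1
  cp -p "$prefix/bin/ollama" "$dest/bin/" || return 1
  # lib/ollama holds the CUDA/CPU backend libraries; copy it if present
  if [ -d "$prefix/lib/ollama" ]; then
    mkdir -p "$dest/lib" || return 1
    cp -a "$prefix/lib/ollama" "$dest/lib/" || return 1
  fi
  echo "$dest"
}
```

To roll back, stop the server, remove the new <prefix>/lib/ollama, and copy the saved bin/ and lib/ trees back under the prefix.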

2.2 Actual upgrade steps

After downloading the latest package and uploading it to the server:

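# Extract again, in the directory that holds the existing install (or add -C <dir>), so the old files are overwritten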
tar -zxvf ollama-linux-amd64.tgz

# Output (upgrading from 0.5.11 to 0.6.2)
bin/ollama
lib/ollama/cuda_v11/
lib/ollama/cuda_v11/libcublas.so.11
lib/ollama/cuda_v11/libggml-cuda.so

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

# Output (upgrading from 0.6.2 to 0.6.8)
bin/ollama
lib/ollama/cuda_v11/
lib/ollama/cuda_v11/libcudart.so.11.0
lib/ollama/cuda_v11/libcublas.so.11.5.1.109
lib/ollama/cuda_v11/libcublasLt.so.11
lib/ollama/cuda_v11/libcublas.so.11
lib/ollama/cuda_v11/libcudart.so.11.3.109
lib/ollama/cuda_v11/libggml-cuda.so
lib/ollama/cuda_v11/libcublasLt.so.11.5.1.109
lib/ollama/cuda_v12/
lib/ollama/cuda_v12/libcudart.so.12
lib/ollama/cuda_v12/libcublasLt.so.12
lib/ollama/cuda_v12/libcublas.so.12
lib/ollama/cuda_v12/libcudart.so.12.8.90
lib/ollama/cuda_v12/libcublas.so.12.8.4.1
lib/ollama/cuda_v12/libcublasLt.so.12.8.4.1
lib/ollama/cuda_v12/libggml-cuda.so
lib/ollama/libggml-base.so
lib/ollama/libggml-cpu-alderlake.so
lib/ollama/libggml-cpu-haswell.so
lib/ollama/libggml-cpu-icelake.so
lib/ollama/libggml-cpu-sandybridge.so
lib/ollama/libggml-cpu-skylakex.so
lib/ollama/libggml-cpu-sse42.so
lib/ollama/libggml-cpu-x64.so

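The file listing printed during the upgrade can be ignored. The gzip/tar errors at the end of the first log are different: they mean the archive was truncated or corrupted and extraction stopped partway, so if you see them, re-download the package and extract it again.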
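A cheap integrity check before extracting catches a truncated download like the one in the 0.5.11 to 0.6.2 log. This sketch uses gzip -t and tar -tzf; verify_tgz is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero if a .tgz is corrupt or incomplete,
# without extracting anything.
verify_tgz() {
  local archive="$1"
  # gzip -t decompresses the whole stream in memory and reports format errors
  gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip stream: $archive" >&2; return 1; }
  # tar -tzf walks every entry; it fails on an unexpected EOF
  tar -tzf "$archive" > /dev/null 2>&1 || { echo "incomplete tar archive: $archive" >&2; return 1; }
}
```

Typical use: verify_tgz ollama-linux-amd64.tgz && tar -zxvf ollama-linux-amd64.tgz.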

3. Startup script

Use lscpu to check the number of CPUs (logical processors):

# Command
lscpu

# Output
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  40
  On-line CPU(s) list:   0-39
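The thread count can also be read programmatically instead of off the lscpu output. This sketch uses nproc from GNU coreutils, which prints the number of processing units available to the process (40 on the machine above, unless CPU affinity restricts it). GNU nproc also honors OMP_NUM_THREADS and OMP_THREAD_LIMIT, so those are cleared for the call. cpu_threads is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of usable logical CPUs.
cpu_threads() {
  local n
  # env -u drops OMP_NUM_THREADS/OMP_THREAD_LIMIT, which GNU nproc would honor
  n="$(env -u OMP_NUM_THREADS -u OMP_THREAD_LIMIT nproc 2>/dev/null)"
  # Fall back to getconf where nproc is unavailable
  if [ -z "$n" ]; then
    n="$(getconf _NPROCESSORS_ONLN)"
  fi
  echo "$n"
}
```

For example: export OMP_NUM_THREADS="$(cpu_threads)".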

Startup script:

export OMP_NUM_THREADS=40
export OLLAMA_NUM_THREADS=40
export OLLAMA_NUM_PARALLEL=0
export OLLAMA_KEEP_ALIVE="2h"
export OLLAMA_MODELS=/root/.ollama/models
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_HOST=0.0.0.0:11434
nohup ./ollama serve >> serve.log 2>&1 &
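The hard-coded 40 only fits this particular server. Below is a sketch of a reusable variant that sets the same variables, derives the thread count with nproc, and records the PID so the server can be stopped later. OLLAMA_BIN, LOG_FILE, PID_FILE, start_ollama and stop_ollama are hypothetical names; OLLAMA_MODELS defaults to $HOME/.ollama/models, which is /root/.ollama/models when run as root.

```shell
#!/usr/bin/env bash
# Hypothetical start/stop helpers around `ollama serve`.
start_ollama() {
  local bin="${OLLAMA_BIN:-./ollama}"
  local log="${LOG_FILE:-serve.log}"
  local pid_file="${PID_FILE:-ollama.pid}"
  local threads
  # Ignore any inherited OMP_NUM_THREADS, which GNU nproc would echo back
  threads="$(env -u OMP_NUM_THREADS -u OMP_THREAD_LIMIT nproc)"

  export OMP_NUM_THREADS="$threads"
  export OLLAMA_NUM_THREADS="$threads"
  export OLLAMA_NUM_PARALLEL=0
  export OLLAMA_KEEP_ALIVE="2h"
  export OLLAMA_MODELS="${OLLAMA_MODELS:-$HOME/.ollama/models}"
  export OLLAMA_FLASH_ATTENTION=1
  export OLLAMA_HOST=0.0.0.0:11434

  # Run in the background, append output to the log, remember the PID
  nohup "$bin" serve >> "$log" 2>&1 &
  echo $! > "$pid_file"
}

stop_ollama() {
  local pid_file="${PID_FILE:-ollama.pid}"
  [ -f "$pid_file" ] || return 1
  kill "$(cat "$pid_file")" && rm -f "$pid_file"
}
```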

3.1 Environment variables

OLLAMA_DEBUG (show additional debug information)

# Not enabled in the startup script above
export OLLAMA_DEBUG=1

OLLAMA_HOST (IP address and port the ollama server listens on; default 127.0.0.1:11434)

export OLLAMA_HOST=0.0.0.0:11434
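With 0.0.0.0 the server listens on all interfaces, so it can be checked from another machine as well. This sketch polls GET /api/version, an endpoint of Ollama's REST API, with curl; wait_for_ollama and its retry count are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until an Ollama server answers over HTTP.
wait_for_ollama() {
  local base_url="${1:-http://127.0.0.1:11434}"
  local retries="${2:-10}"
  local i
  for ((i = 0; i < retries; i++)); do
    # -f: fail on HTTP errors, -s: silent, --max-time: cap each attempt
    if curl -fs --max-time 2 "$base_url/api/version" > /dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "no response from $base_url after $retries attempts" >&2
  return 1
}
```

For example, from another host: wait_for_ollama http://<server-ip>:11434 && echo ready.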

OLLAMA_KEEP_ALIVE (how long a model stays loaded in memory; default 5m)

export OLLAMA_KEEP_ALIVE="2h"

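OLLAMA_MAX_LOADED_MODELS (maximum number of models that can be loaded at the same time per GPU)
OLLAMA_MAX_QUEUE (maximum number of queued requests)
OLLAMA_MODELS (path of the directory where models are stored)
OLLAMA_NUM_PARALLEL (maximum number of requests processed in parallel)
OLLAMA_NOPRUNE (do not prune model blob files on startup)
OLLAMA_ORIGINS (allowed cross-origin request origins, comma-separated)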

export OLLAMA_ORIGINS="http://example.com,https://localhost"

OLLAMA_SCHED_SPREAD (always spread a model evenly across all GPUs)
OLLAMA_FLASH_ATTENTION (enable Flash Attention to speed up the attention computation)
OLLAMA_KV_CACHE_TYPE (quantization type for the K/V cache; default f16)

export OLLAMA_KV_CACHE_TYPE="q4_0"
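According to Ollama's FAQ (worth re-checking for your version), the K/V cache accepts f16, q8_0 and q4_0, and a quantized cache only takes effect when Flash Attention is enabled. This is a sketch of a pre-start check under those assumptions; check_kv_cache_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check for the K/V cache settings.
check_kv_cache_env() {
  local type="${OLLAMA_KV_CACHE_TYPE:-f16}"
  case "$type" in
    f16) return 0 ;;
    q8_0|q4_0)
      # Assumption: quantized K/V cache requires Flash Attention
      case "${OLLAMA_FLASH_ATTENTION:-}" in
        1|true|TRUE|True) return 0 ;;
      esac
      echo "OLLAMA_KV_CACHE_TYPE=$type needs OLLAMA_FLASH_ATTENTION=1" >&2
      return 1 ;;
    *)
      echo "unknown OLLAMA_KV_CACHE_TYPE: $type" >&2
      return 1 ;;
  esac
}
```

Calling check_kv_cache_env || exit 1 near the top of the startup script stops it before ollama serve runs with a setting that would be ignored.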

OLLAMA_LLM_LIBRARY (specify the LLM library to bypass autodetection)
OLLAMA_GPU_OVERHEAD (VRAM to reserve on each GPU, in bytes)
OLLAMA_LOAD_TIMEOUT (maximum time to wait for a model to load; default 5m)

export OLLAMA_LOAD_TIMEOUT="10m"

4. connection reset by peer

Error: pull model manifest: Get "https://registry.ollama.ai/v2/library/qwen3/manifests/0.6b": read tcp xxx.xxx.x.xxx:49086->104.21.75.227:443: read: connection reset by peer

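This was caused by the company intranet: none of our servers could download model files, so I downloaded the model on a cloud server and imported it offline.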
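One way to do such an offline import (not necessarily the exact procedure used here) is to ollama pull the model on a machine with internet access, pack its models directory, which contains manifests/ and blobs/ subdirectories, copy the archive across, and unpack it into OLLAMA_MODELS on the intranet server. Blob files are named by their sha256 digest, so unpacking into an existing models directory adds to it rather than clashing. pack_models and unpack_models are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for moving Ollama models between machines.

# On the machine that pulled the model: archive manifests/ and blobs/
pack_models() {
  local models_dir="${1:-$HOME/.ollama/models}"
  local archive="${2:-ollama-models.tgz}"
  tar -C "$models_dir" -czf "$archive" manifests blobs
}

# On the intranet server: unpack into the models directory ollama serve uses
unpack_models() {
  local archive="$1"
  local models_dir="${2:-${OLLAMA_MODELS:-$HOME/.ollama/models}}"
  mkdir -p "$models_dir" || return 1
  tar -C "$models_dir" -xzf "$archive"
}
```

After unpacking, restart ollama serve and confirm the model appears in ollama list.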

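Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization. Reproduction without permission is prohibited.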

In case of infringement, please contact cloudcommunity@tencent.com for removal.
