SGLang推理引擎部署z-image-turbo
下载镜像
我7元花钱买了50G的镜像加速包,轩辕镜像加速包
#登录轩辕的专属域名才能享用加速
docker login -u 账号 -p '密码' docker.xuanyuan.run
docker pull docker.xuanyuan.run/lmsysorg/sglang:latest下载模型,之前写的文档有写,用魔塔的命令下载模型
conda activate vllm
modelscope download --model Tongyi-MAI/Z-Image-Turbo --local_dir /data/tongyi_z_image_turbo
modelscope download --model Qwen/Qwen3-8B --local_dir /data/qwen3-8b
modelscope download --model Qwen/Qwen-Image --local_dir /data/qwen-image
modelscope download --model Qwen/Qwen-Image-Edit --local_dir /data/qwen-image-edit
modelscope download --model Qwen/Qwen3-VL-8B-Instruct-GGUF --local_dir /data/Qwen3-VL-8B-Instruct-GGUF
modelscope download --model ZhipuAI/GLM-4.7-Flash --local_dir /data/GLM-4.7-Flash创建docker-compose.yml-qwen3-8b.yml文件
version: '3.8'
services:
sglang-z-image-turbo:
container_name: qwen3-8b
image: lmsysorg/sglang:latest
restart: always
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 2
capabilities: [gpu]
shm_size: 64g
ipc: host
ulimits:
memlock:
soft: -1
hard: -1
stack:
soft: 67108864
hard: 67108864
ports:
- "8001:8001"
volumes:
- /data/qwen3-8b:/model
command: >
python -m sglang.launch_server
--model /model
--tp 2
--trust-remote-code
--host 0.0.0.0
--port 8001账号:admin
密码:Calong@2015
用gptstack 部署 Z-Image-Turbo 模型
用 docker-compose 搭建gptstack
version: '3.8' # 推荐使用3.8版本,兼容主流Docker版本
services:
gpustack:
container_name: gpustack # 对应--name gpustack
image: gpustack/gpustack # 镜像名称v2.0.1
restart: unless-stopped # 对应--restart unless-stopped
ports:
- "8010:80" # 对应-p 80:80
volumes:
- gpustack-data:/var/lib/gpustack # 对应--volume gpustack-data:/var/lib/gpustack
# 如需后台运行,执行docker-compose up -d即可(-d参数替代docker run的-d)
# 定义命名卷(与docker run的命名卷对应)
volumes:
gpustack-data:
driver: local # 使用本地卷驱动(默认)账号:admin
密码:默认密码需要到服务器运行命令获取命令如下:
sudo docker exec gpustack cat /var/lib/gpustack/initial_admin_password添加节点
到服务器执行命令:
nvidia-smi >/dev/null 2>&1 && echo "NVIDIA driver OK" || (echo "NVIDIA driver issue"; exit 1) && sudo docker info 2>/dev/null | grep -q "nvidia" && echo "NVIDIA Container Toolkit OK" || (echo "NVIDIA Container Toolkit not configured"; exit 1)将命令复制到服务器跑一下:
sudo docker run -d --name gpustack-worker \
--restart=unless-stopped \
--privileged \
--network=host \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume gpustack-data:/var/lib/gpustack \
--volume /data:/data \
--runtime nvidia \
quay.io/gpustack/gpustack:v2.0.1 \
--server-url http://192.168.8.109:8010 \
--token gpustack_88864acb6deb1410_bbc56706bf1f7e8f9f34a59e76427266 \
--advertise-address 192.168.8.109 添加sqlang镜像
注:因为z-image-turbo为新模型,需要sglang的v0.5.6post2 版本以上才能支持;
确保服务器上有这个模型;
yaml格式如下:
description: null
version_configs:
0.5.6-custom:
image_name: quay.io/gpustack/runner:cuda12.8-sglang0.5.6.post2
run_command: ''
custom_framework: cuda
default_backend_param: []
部署模型
表示部署完成;
使用 API 调用
调用代码如下:
已知问题:z-image-turbo 模型在调用 API 的response_format 值只能返回base64格式,需要处理解码才能得到图片;
import requests
import base64
import json
API_KEY = "gpustack_7a9fad9cd1ce8923_4a09e34fb5cb4ae97def2b5fe7911723"
URL = "http://192.168.8.109:8010/v1/images/generations"
SAVE_IMAGE_PATH = "./Squirtle.jpg"
# response_format = "url"
response_format = "b64_json"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {API_KEY}"
}
data = {
"model": "z-image-turbo",
"n": 1,
"size": "1024x1024",
"response_format": response_format,
"style": "natural",
"prompt": "杰尼龟"
}
try:
response = requests.post(
url=URL,
headers=headers,
json=data,
timeout=120
)
response.raise_for_status()
res_json = response.json()
print(f"响应体:{json.dumps(res_json, ensure_ascii=False, indent=2)}")
if not res_json.get("data") or len(res_json["data"]) == 0:
raise ValueError("响应中未找到图片数据")
# 3. 根据响应格式处理图片
image_data = res_json["data"][0]
if response_format == "url":
# 处理 URL 格式:下载图片并保存
image_url = image_data.get("url")
if not image_url:
raise KeyError("响应中未找到 url 字段")
print(f"正在下载图片:{image_url}")
# 请求图片 URL(添加超时,避免卡死)
img_response = requests.get(image_url, timeout=60)
img_response.raise_for_status() # 检查图片URL是否可访问
# 保存图片到本地
with open(SAVE_IMAGE_PATH, "wb") as f:
f.write(img_response.content)
print(f"图片已从URL保存至: {SAVE_IMAGE_PATH}")
elif response_format == "b64_json":
# 处理 Base64 格式:解码并保存
image_b64 = image_data.get("b64_json")
if not image_b64:
raise KeyError("响应中未找到 b64_json 字段")
with open(SAVE_IMAGE_PATH, "wb") as f:
f.write(base64.b64decode(image_b64))
print(f"图片已从Base64解码保存至: {SAVE_IMAGE_PATH}")
else:
raise ValueError(f"不支持的响应格式:{response_format},仅支持 url / b64_json")
# 异常处理
except requests.exceptions.ConnectionError:
print(f"连接失败:无法访问 {URL},请检查IP/端口是否正确")
except requests.exceptions.HTTPError as e:
print(f"HTTP错误 {response.status_code}:{response.text}")
except requests.exceptions.Timeout:
print("请求超时:接口响应超过120秒(或图片URL下载超时)")
except KeyError as e:
print(f"响应格式错误:未找到关键字段 {e},响应内容:{res_json}")
except ValueError as e:
print(f"数据校验失败:{e}")
except Exception as e:
print(f"未知错误:{str(e)}")