Source: WeChat article
I. Goal
Build an end-to-end AI model delivery pipeline that provides:
- One-click triggering of fine-tuning (Fine-tune)
- Automatic evaluation (Eval) of model quality
- Seamless deployment (Serve) as an online service
- Version management, rollback, monitoring, and A/B testing
II. Core Stages and Functions
| Stage | Goal | Key Tasks |
| --- | --- | --- |
| 1. Fine-tune | Optimize the base model on domain-specific data | Data preprocessing; distributed training (DDP/FSDP/DeepSpeed); hyperparameter search (optional); model checkpoint saving |
| 2. Eval | Measure model performance objectively | Inference on the validation/test set; metric computation (accuracy, F1, BLEU, ROUGE, etc.); sampled human evaluation (optional); evaluation report generation |
| 3. Serve | Expose the model as a low-latency, highly available service | Model format conversion (e.g., ONNX/TensorRT); inference-server packaging (vLLM/TGI/FastAPI); autoscaling and health checks; API gateway integration |
III. Typical Tech Stack
| Function | Recommended Tools |
| --- | --- |
| Orchestration engine | Airflow, Kubeflow Pipelines, Metaflow, Prefect, GitHub Actions |
| Training framework | Hugging Face Transformers + Accelerate / DeepSpeed / FSDP |
| Evaluation framework | evaluate (HF), torchmetrics, custom scripts |
| Model registry | MLflow Model Registry, Weights & Biases (W&B), DVC |
| Serving stack | vLLM, TensorRT-LLM, TGI (Text Generation Inference), FastAPI + Docker |
| Infrastructure | Kubernetes (K8s), Docker, AWS SageMaker, Azure ML, GCP Vertex AI |
| Monitoring & logging | Prometheus + Grafana, ELK, OpenTelemetry |
IV. Example Pipeline Flow (Using LLM Fine-Tuning as an Example)
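As a textual stand-in for the flow, here is a minimal sketch using Prefect (one of the orchestration engines listed above, assuming Prefect 2.x). The task bodies and the threshold wiring are illustrative placeholders, not the project's actual code:

```python
# Minimal sketch of the train -> eval -> serve flow with Prefect 2.x.
# All task bodies are hypothetical placeholders.
from prefect import flow, task

@task
def fine_tune(config_path: str) -> str:
    # Launch the training job and return the saved checkpoint path.
    ...
    return "./lora_output/adapter"

@task
def evaluate(checkpoint: str) -> float:
    # Run inference on the validation set and compute a metric (e.g., ROUGE-L).
    ...
    return 0.47

@task
def deploy(checkpoint: str) -> None:
    # Build the inference image and apply the K8s manifests.
    ...

@flow
def model_delivery_pipeline(config_path: str = "config.yaml"):
    checkpoint = fine_tune(config_path)
    score = evaluate(checkpoint)
    # Evaluation-gated deployment: only ship if the metric clears the bar.
    if score >= 0.45:
        deploy(checkpoint)

if __name__ == "__main__":
    model_delivery_pipeline()
```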
V. Key Practices
1. Data and Code Versioning
- Manage training scripts with Git
- Manage dataset versions with DVC or a Lakehouse
- Record training hyperparameters (e.g., learning_rate=2e-5)
2. Model Traceability
- Each training run gets a unique run_id
- Link together: code commit + data version + hyperparameters + evaluation results (see the sketch below)
- Tools: MLflow / W&B / ClearML
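A minimal sketch of this linkage with MLflow; the tag keys and the data-version string are illustrative assumptions, not a fixed convention:

```python
# Tie a run_id to the code commit, data version, and hyperparameters.
import subprocess
import mlflow

def git_commit() -> str:
    # Current code version from the working repository.
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

with mlflow.start_run() as run:
    print(f"run_id: {run.info.run_id}")                   # unique id for this run
    mlflow.set_tag("git_commit", git_commit())            # code version
    mlflow.set_tag("data_version", "alpaca_zh_demo@v1")   # e.g., a DVC tag (assumed name)
    mlflow.log_params({"learning_rate": 2e-5, "epochs": 1, "lora_r": 64})
    # ... training happens here; metrics are logged via mlflow.log_metric(...)
```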
3. Evaluation-Gated Deployment
- Set a threshold (e.g., ROUGE-L ≥ 0.45)
- Block the deployment stage if the threshold is not met (see the sketch below)
- Support multi-dimensional evaluation (safety, bias, business metrics)
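A minimal sketch of such a gate using the Hugging Face `evaluate` library; the predictions and references are stand-ins, and the 0.45 threshold matches the example above:

```python
# Evaluation gate: fail the CI job (non-zero exit) when ROUGE-L is too low.
# Requires the `rouge_score` package alongside `evaluate`.
import sys
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the model generated answer"]   # stand-in model outputs
references = ["the reference answer"]          # stand-in gold answers

scores = rouge.compute(predictions=predictions, references=references)
print(f"ROUGE-L: {scores['rougeL']:.3f}")

if scores["rougeL"] < 0.45:
    # A non-zero exit code blocks the downstream deploy stage.
    sys.exit("ROUGE-L below threshold, blocking deployment")
```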
4. Deployment Strategies
- Blue-green deployment: run old and new versions in parallel and switch traffic between them
- Canary release: expose the new version to e.g. 5% of users first
- Rollback mechanism: automatically detect anomalies (e.g., a P99 latency spike) and roll back (see the sketch below)
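A minimal sketch of latency-triggered rollback. The Prometheus endpoint, query, threshold, and deployment name are all assumptions for illustration; in practice this logic usually lives in a tool such as Argo Rollouts or Flagger rather than a hand-rolled script:

```python
# Watch P99 latency in Prometheus and roll back the Deployment on a spike.
import subprocess
import requests

PROMETHEUS = "http://prometheus.internal:9090"   # assumed endpoint
QUERY = (
    'histogram_quantile(0.99, sum(rate('
    'http_request_duration_seconds_bucket{service="qwen-lora-service"}[5m])) by (le))'
)
P99_THRESHOLD_S = 2.0                             # assumed SLO

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
result = resp.json()["data"]["result"]
p99 = float(result[0]["value"][1]) if result else 0.0

if p99 > P99_THRESHOLD_S:
    # Roll the Deployment back to its previous revision.
    subprocess.run(
        ["kubectl", "rollout", "undo", "deployment/qwen-lora-service"],
        check=True,
    )
```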
5. Balancing Cost and Performance
- Fine-tuning: use Spot instances to cut cost
- Inference: scale replicas up and down based on QPS
- Model compression: quantization (INT4/INT8) plus distillation (see the loading sketch below)
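A minimal sketch of 4-bit quantized loading via bitsandbytes, one way to realize the INT4 compression mentioned above; the model path reuses the private path from the example later in this article:

```python
# Load a model in 4-bit NF4 quantization with bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "/models/Qwen-7B-Chat",               # private model path from the example below
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```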
VI. Features an enterprise-grade, complete automated training and deployment pipeline example needs to support:
- LoRA fine-tuning of Qwen-7B (Hugging Face + PEFT)
- MLflow model registration + automatically triggered deployment
- GitLab CI/CD pipeline
- Kubernetes deployment template (including a vLLM inference service)
- Fully private (no dependency on the public internet)
1. Overall Architecture Diagram
2. Project Structure
```
qwen-lora-pipeline/
├── data/
│   └── alpaca_zh_demo.json       # Chinese instruction-tuning data (Alpaca format)
├── src/
│   ├── train_lora.py             # LoRA fine-tuning script
│   ├── evaluate_model.py         # evaluation + registration to MLflow
│   └── create_k8s_manifest.py    # dynamically generate K8s YAML (optional)
├── docker/
│   ├── Dockerfile.vllm           # vLLM + LoRA inference image
│   └── entrypoint.sh             # startup script
├── k8s/
│   └── deployment.yaml.tpl       # K8s template
├── requirements.txt
├── .gitlab-ci.yml
└── config.yaml                   # hyperparameter configuration
```
3. Core Component Implementation
requirements.txt
```
torch==2.1.0
transformers>=4.35
accelerate>=0.24
peft>=0.6
datasets>=2.14
mlflow>=2.8
sentencepiece
einops
vllm>=0.3.0
```
src/train_lora.py (LoRA fine-tuning of Qwen-7B)
```python
import os
import mlflow
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForSeq2Seq,
)
from peft import LoraConfig, get_peft_model, TaskType


def main():
    mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://mlflow:5000"))
    mlflow.set_experiment("qwen-lora-finetune")

    with mlflow.start_run():
        model_name = "/models/Qwen-7B-Chat"  # private model path (pre-downloaded)
        tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=True,
            device_map="auto",
            torch_dtype="auto",
        )

        # Enable LoRA
        peft_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            inference_mode=False,
            r=64,
            lora_alpha=16,
            lora_dropout=0.1,
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        )
        model = get_peft_model(model, peft_config)
        model.print_trainable_parameters()  # print the number of trainable parameters

        # Load the data
        dataset = load_dataset("json", data_files="data/alpaca_zh_demo.json")["train"]

        def format_example(instruction, input_text, output):
            return (
                f"### Instruction:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n"
                f"### Response:\n{output}"
            )

        def tokenize_function(batch):
            # With batched=True, `batch` is a dict mapping column names to lists
            texts = [
                format_example(ins, inp, out)
                for ins, inp, out in zip(batch["instruction"], batch["input"], batch["output"])
            ]
            result = tokenizer(texts, truncation=True, padding="max_length", max_length=512)
            # For causal-LM fine-tuning, the labels are the input ids themselves
            result["labels"] = [ids.copy() for ids in result["input_ids"]]
            return result

        tokenized_dataset = dataset.map(
            tokenize_function,
            batched=True,
            remove_columns=dataset.column_names,
            num_proc=4,
        )

        training_args = TrainingArguments(
            output_dir="./lora_output",
            per_device_train_batch_size=2,
            gradient_accumulation_steps=8,
            learning_rate=2e-4,
            num_train_epochs=1,
            logging_steps=10,
            save_strategy="no",
            fp16=True,
            report_to="mlflow",
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=tokenized_dataset,
            data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True),
        )
        trainer.train()

        # Save the LoRA adapter
        model.save_pretrained("./lora_output/adapter")
        tokenizer.save_pretrained("./lora_output/adapter")
        mlflow.log_artifacts("./lora_output/adapter", artifact_path="model")


if __name__ == "__main__":
    main()
```
src/evaluate_model.py (evaluate + register the model)
```python
import os
import mlflow


def main():
    client = mlflow.MlflowClient(tracking_uri=os.getenv("MLFLOW_TRACKING_URI"))
    run_id = os.getenv("CI_MLFLOW_RUN_ID")  # passed in from CI

    # Simplified evaluation: check that the model files exist
    model_path = f"mlruns/.../{run_id}/artifacts/model"  # needs to be resolved in practice
    accuracy = 0.85  # simulated evaluation result

    # Register the model in the Registry
    model_uri = f"runs:/{run_id}/model"
    result = mlflow.register_model(model_uri, name="qwen-lora-zh")

    # Set the stage (Staging)
    client.transition_model_version_stage(
        name="qwen-lora-zh",
        version=result.version,
        stage="Staging",
    )

    # Write the deployment flag
    if accuracy >= 0.8:
        with open("deploy_flag", "w") as f:
            f.write("true")
        print(f"✅ Model registered as version {result.version}, ready for deployment.")
    else:
        with open("deploy_flag", "w") as f:
            f.write("false")


if __name__ == "__main__":
    main()
```
docker/Dockerfile.vllm
```dockerfile
FROM vllm/vllm-openai:latest

# Install the Chinese tokenizer dependency
RUN pip install sentencepiece

# Copy the LoRA adapter (injected by CI at build time; the CI job copies it
# into docker/, which is the build context, so the path is relative to that)
COPY adapter /app/lora_adapter

# Startup script
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

EXPOSE 8000
ENTRYPOINT ["/app/entrypoint.sh"]
```
docker/entrypoint.sh
```bash
#!/bin/bash
MODEL_PATH="/models/Qwen-7B-Chat"
LORA_PATH="/app/lora_adapter"

# Start vLLM, loading the base model plus the LoRA adapter
python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --enable-lora \
    --lora-modules zh-adapter=$LORA_PATH \
    --tokenizer-mode auto \
    --tensor-parallel-size 1 \
    --port 8000
```
k8s/deployment.yaml.tpl
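A minimal sketch of what the template can look like. Only the `{{ .Image }}` placeholder (substituted by `sed` in the CI job below) and the `qwen-lora-service` name and port 8000 (used by the curl example at the end) come from this article; everything else, such as the replica count, GPU request, and model volume, is an illustrative assumption:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen-lora-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qwen-lora-service
  template:
    metadata:
      labels:
        app: qwen-lora-service
    spec:
      containers:
        - name: vllm
          image: {{ .Image }}        # replaced with $IMAGE_NAME by the CI job
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1      # assumed single-GPU serving
          volumeMounts:
            - name: models
              mountPath: /models     # pre-downloaded base model (private env)
      volumes:
        - name: models
          hostPath:
            path: /data/models       # assumed host location of Qwen-7B-Chat
---
apiVersion: v1
kind: Service
metadata:
  name: qwen-lora-service
spec:
  selector:
    app: qwen-lora-service
  ports:
    - port: 8000
      targetPort: 8000
```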
.gitlab-ci.yml (key: automatically trigger the K8s deployment)
```yaml
stages:
  - train
  - evaluate
  - deploy

variables:
  MLFLOW_TRACKING_URI: "http://mlflow.internal:5000"
  REGISTRY: "registry.internal.example.com"
  IMAGE_NAME: "$REGISTRY/qwen-lora:$CI_COMMIT_SHORT_SHA"

before_script:
  - pip install -r requirements.txt

train:
  stage: train
  script:
    - python src/train_lora.py
  artifacts:
    paths:
      - lora_output/
    expire_in: 1 week

evaluate:
  stage: evaluate
  script:
    - export CI_MLFLOW_RUN_ID=$(cat mlflow_run_id.txt)  # in practice, pass this from the train stage
    - python src/evaluate_model.py
    - cat deploy_flag
  dependencies:
    - train
  artifacts:
    paths:
      - deploy_flag
      - lora_output/

deploy-prod:
  stage: deploy
  script:
    # Gate on the evaluation result written by the evaluate stage
    - test "$(cat deploy_flag)" = "true" || { echo "deploy_flag is false, blocking deployment"; exit 1; }
    # Build the image (build context is docker/, where the adapter was copied)
    - cp -r lora_output/adapter docker/
    - docker build -f docker/Dockerfile.vllm -t $IMAGE_NAME docker/
    - docker push $IMAGE_NAME
    # Render the K8s YAML and deploy
    - sed "s|{{ .Image }}|$IMAGE_NAME|g" k8s/deployment.yaml.tpl > k8s/deployment.yaml
    - kubectl apply -f k8s/deployment.yaml
  dependencies:
    - evaluate
  environment:
    name: production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: on_success
```
Example call after deployment
```bash
# Call the vLLM OpenAI-compatible API; the LoRA adapter registered as
# zh-adapter via --lora-modules is selected through the model field
curl http://qwen-lora-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zh-adapter",
    "messages": [{"role": "user", "content": "你好!"}]
  }'
```
Enhancement suggestions (production-grade)
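The same call from Python, using the official `openai` client pointed at the in-cluster service; the `api_key` value is a placeholder, since vLLM does not verify it unless configured to:

```python
# Call the vLLM OpenAI-compatible endpoint with the `openai` client (v1.x API).
from openai import OpenAI

client = OpenAI(base_url="http://qwen-lora-service:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zh-adapter",  # the LoRA module name registered via --lora-modules
    messages=[{"role": "user", "content": "你好!"}],
)
print(response.choices[0].message.content)
```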
| Function | Implementation |
| --- | --- |
| Automatic rollback | Monitor service health; on failure run `kubectl rollout undo` |
| Multiple environments | Separate K8s namespaces for dev/staging/prod |
| Model version routing | Ingress + header-based routing to different model versions |
| GPU resource isolation | K8s Device Plugin + Resource Quota |
| Logging & monitoring | Fluentd + Prometheus + Grafana |