ai-infra category archive | 土法炼钢兴趣小组的算法知识备份 (Backyard Steelmaking Interest Group's Algorithm Knowledge Backup)

[LLM Infrastructure Engineering] 01: The LLM Infrastructure Landscape: Training, Inference, RAG, Agents, Observability

2026-04-22 | architecture · ai-infra | #llm #infra #overview #training #inference #rag #agent #deepseek #openai

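An opening map of LLM infrastructure for engineers, covering the engineering watersheds from 2022 to 2026, the five-layer engineering stack, the engineering differences between training and inference, the industry landscape in China and worldwide, and the cost curve.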

[LLM Infrastructure Engineering] 02: GPU Computing Primer: SM, Tensor Core, HBM, NVLink

2026-04-22 | architecture · ai-infra | #llm #infra #gpu #cuda #tensor-core #hopper #blackwell #hbm #flashattention #ascend

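Starting from the architectural differences between CPUs and GPUs, this post explains what SM, Warp, Tensor Core, HBM, and NVLink mean in engineering terms, and draws on Roofline, FlashAttention, and the domestic (Chinese) compute stack to give LLM engineers a GPU mental model they can apply right away.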

[LLM Infrastructure Engineering] 03: The CUDA Ecosystem: cuBLAS, cuDNN, NCCL, Triton, CUTLASS

2026-04-22 | architecture · ai-infra | #llm #infra #cuda #cublas #cudnn #nccl #triton #cutlass #rocm #cann #tensor-engine

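From nvcc to Triton, this post takes apart every layer of the NVIDIA software stack for LLM engineers, and along the way discusses why ROCm and CANN have never managed to catch up.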

[LLM Infrastructure Engineering] 04: Interconnect and Networking: NVLink, InfiniBand, RoCE, and Domestic Alternatives

2026-04-22 | architecture · ai-infra | #llm #infra #interconnect #nvlink #infiniband #roce #nvswitch #nvl72 #fat-tree #ascend #huawei-cloudmatrix

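From NVLink / NVSwitch / NVL72 to InfiniBand NDR and RoCEv2, and on to Huawei CloudMatrix, Alibaba HPN, and Tencent Xingmai (星脉), a systematic review of interconnect selection and pitfalls for clusters of ten thousand GPUs.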

[LLM Infrastructure Engineering] 05: Training Overview: Pre-train, SFT, RLHF, DPO, Distillation

2026-04-22 | architecture · ai-infra | #llm #infra #training #pretrain #sft #rlhf #scaling-law #adamw #tokenizer #deepseek #chinchilla

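An engineering view that ties together the four-stage training stack of modern LLMs (pre-training, mid-training, SFT, and alignment), covering data, tokenizers, optimizers, precision, scaling laws, and representative training frameworks.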

[LLM Infrastructure Engineering] 06: 3D Parallelism in Depth: Data / Tensor / Pipeline / Sequence / ZeRO

2026-04-22 | architecture · ai-infra | #llm #infra #parallelism #data-parallel #tensor-parallel #pipeline #sequence-parallel #zero #fsdp #megatron #dualpipe

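The foundation of ten-thousand-GPU training: from DP, TP, PP, SP, and EP to ZeRO/FSDP, and on to DualPipe's zero-bubble pipeline, one post that thoroughly covers the engineering selection of parallelism strategies and communication optimization.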

[LLM Infrastructure Engineering] 07: Megatron-LM and DeepSpeed

2026-04-22 | architecture · ai-infra | #llm #infra #megatron #deepspeed #fsdp #torchtitan #colossal-ai #training-framework #zero #nemo

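A comparison of the two leading open-source training frameworks, covering Megatron-LM, DeepSpeed, FSDP2, torchtitan, and Colossal-AI, with selection guidance and hands-on engineering practice.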

[LLM Infrastructure Engineering] 08: MoE Training Engineering

2026-04-22 | architecture · ai-infra | #llm #infra #moe #mixture-of-experts #gshard #switch-transformer #mixtral #deepseek #deepep #megablocks #expert-parallel

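Hands-on training engineering for Mixture-of-Experts (MoE) models: from GShard, Switch, and Mixtral to DeepSeek-V3, covering gating, load balancing, Expert Parallel, All-to-All communication, and open-source stacks such as DeepEP and MegaBlocks.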

[LLM Infrastructure Engineering] 09: RLHF and the Alignment Pipeline

2026-04-22 | architecture · ai-infra | #llm #infra #rlhf #ppo #dpo #grpo #reward-model #alignment #deepseek-r1 #openai-o1 #trl #openrlhf

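Engineering practice for the full alignment pipeline, from SFT and reward models to PPO, DPO, and GRPO, covering the RL approaches of reasoning models such as OpenAI o1 and DeepSeek-R1 and the selection of mainstream frameworks.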

[LLM Infrastructure Engineering] 10: Checkpointing and Fault Tolerance

2026-04-22 | architecture · ai-infra | #llm #infra #checkpoint #fault-tolerance #resiliency #dcp #sdc #llama3 #xai-colossus #straggler

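Ten-thousand-GPU training clusters fail every day: from GPU HBM ECC and NVLink degradation to SDC, this post systematically covers the engineering practice of checkpointing, recovery, and elastic fault tolerance.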

[LLM Infrastructure Engineering] 11: Inference Engine Fundamentals

2026-04-22 | architecture · ai-infra | #llm #infra #inference #prefill #decode #kv-cache #gqa #mla #continuous-batching #ttft #flash-decoding

From the two-phase Prefill/Decode split, KV Cache, and Continuous Batching to PD disaggregation, a systematic explanation of the engineering fundamentals of LLM inference.

[LLM Infrastructure Engineering] 14: Quantization Engineering: INT8 / FP8 / FP4 / AWQ / GPTQ

2026-04-22 | architecture · ai-infra | #llm #infra #quantization #fp8 #fp4 #int8 #int4 #awq #gptq #smoothquant #bitnet #transformer-engine

From data types, PTQ/QAT algorithms, and KV Cache quantization to hardware support on H100/B200/MI300/Ascend, covering AutoAWQ, GPTQ, SmoothQuant, BitNet, and engineering deployment with vLLM/TensorRT-LLM/llama.cpp.

[LLM Infrastructure Engineering] 15: Speculative Decoding and MTP

2026-04-22 | architecture · ai-infra | #llm #infra #speculative-decoding #medusa #eagle #mtp #lookahead #jacobi #deepseek-v3 #self-speculative

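From classic Speculative Decoding to Medusa, EAGLE, Lookahead, MTP, and self-speculation: a systematic review of the engineering methods and engine support that let an LLM "emit a few more tokens per step".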

[LLM Infrastructure Engineering] 16: Long-Context Engineering

2026-04-22 | architecture · ai-infra | #llm #infra #long-context #rope #yarn #ring-attention #mamba #mla #nsa #streamingllm #ulysses

Training and inference engineering from 4K to 1M+ context: positional-encoding extension, sparse attention, Ring Attention, KV compression, and long-context evaluation.

[LLM Infrastructure Engineering] 17: RAG Engineering Overview

2026-04-22 | architecture · ai-infra | #llm #infra #rag #retrieval #embedding #rerank #chunking #hyde #graphrag #ragas #bge #colbert

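From document parsing, chunking, embedding, indexing, retrieval, and reranking to generation and evaluation, a systematic review of the RAG engineering pipeline, advanced paradigms, and the ecosystem in China and abroad.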

[LLM Infrastructure Engineering] 18: Vector Databases and Graph RAG

2026-04-22 | architecture · ai-infra | #llm #infra #vector-db #hnsw #ivfpq #diskann #milvus #qdrant #pgvector #graphrag #neo4j #colbert

From HNSW, IVF-PQ, and DiskANN to Milvus, Qdrant, and pgvector; from dense-sparse hybrid retrieval to hands-on engineering with Microsoft GraphRAG.

[LLM Infrastructure Engineering] 19: Agent Framework Engineering

2026-04-22 | architecture · ai-infra | #llm #infra #agent #langgraph #autogen #crewai #mcp #a2a #coze #browser-use #memgpt #react

From ReAct to LangGraph, AutoGen, CrewAI, and Coze, and on to the MCP and A2A protocols, a systematic review of the engineering stack and selection of LLM Agent frameworks.

[LLM Infrastructure Engineering] 20: Tool Calling and MCP

2026-04-22 | architecture · ai-infra | #llm #infra #function-call #tool-use #mcp #openai-tools #claude-tools #json-schema #outlines #xgrammar #structured-output

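From OpenAI function calling to Anthropic MCP, an in-depth analysis of LLM tool calling: formats, structured output, parallel calls, protocol design, and engineering security boundaries.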

[LLM Infrastructure Engineering] 21: Inference Serving

2026-04-22 | architecture · ai-infra | #llm #infra #serving #triton #ray-serve #kserve #bentoml #lora #mooncake #pd-disaggregation #serverless-gpu

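From a single-node engine to a production-grade cluster: a panoramic, hands-on engineering guide to Triton, Ray Serve, KServe, vLLM OpenAI Server, PD disaggregation, multi-tenant LoRA, KEDA autoscaling, and Serverless GPU.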

[LLM Infrastructure Engineering] 22: LLM Gateways

2026-04-22 | architecture · ai-infra | #llm-gateway #litellm #oneapi #portkey #kong-ai #semantic-cache #routellm #guardrails #fallback #multi-provider

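A unified entry point for enterprise LLM calls: multi-provider routing, quotas and billing, semantic caching, Guardrails, and observability, with engineering selection and deployment of LiteLLM, OneAPI, Portkey, and Kong/Envoy AI Gateway.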

[LLM Infrastructure Engineering] 23: LLM Observability

2026-04-22 | architecture · ai-infra | #llm #infra #observability #langsmith #langfuse #helicone #openllmetry #opentelemetry #ragas #phoenix #gpu-metrics #llm-eval

Hands-on observability engineering for LLM, RAG, and Agent systems, covering Metrics, Logs, Traces, token cost, hallucination evaluation, Langfuse / LangSmith / Phoenix / OpenLLMetry, and the OpenTelemetry GenAI semantic conventions.

[LLM Infrastructure Engineering] 24: Cost, Compliance, and Security

2026-04-22 | architecture · ai-infra | #llm #infra #cost #security #compliance #ai-act #prompt-injection #jailbreak #guardrails #owasp-llm #confidential-compute #tee

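From GPU-hours and electricity bills to the AI Act and Prompt Injection: a three-in-one handbook on LLM cost, compliance, and security, written for infrastructure engineers.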

[LLM Infrastructure Engineering] 25: The Future of LLM Infrastructure

2026-04-22 | architecture · ai-infra | #llm #infra #outlook #world-model #agentic-os #rubin #mamba #diffusion-llm #edge-llm #chip #career #ai-act

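Series finale: starting from the four-year inflection from 2022 to 2026, this post reviews eight major trends (inference-time scaling, world models, Agentic OS, specialized chips, architectural innovation, on-device deployment, halved costs, and compliance), and offers a growth path for engineers plus an index of the 25 posts.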

[LLM Infrastructure Engineering · Special Edition] DeepSeek-V4 and Domestic Chips: From Backup Route to Main Path

2026-04-25 | architecture · ai-infra | #llm #infra #deepseek #domestic-chip #ascend #cann #training #inference #ai-chip

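After the release of DeepSeek-V4, if domestic chips already support the key training or inference pipelines of a flagship model, how will that affect the technical choices of the NVIDIA ecosystem, domestic AI chip makers, cloud vendors, model teams, and engineers?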

[LLM Infrastructure Engineering · Special Edition] 27: Where DeepSeek-V4's Extreme Cost-Effectiveness Comes From

2026-05-27 | architecture · ai-infra | #llm #infra #deepseek #moe #long-context #kv-cache #fp4 #muon #agent

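From the MoE activation ratio, CSA/HCA hybrid attention, mHC, and Muon to disk-level KV cache, FP4 QAT, and expert distillation, a systematic breakdown of why DeepSeek-V4 can deliver 1M context and strong Agent capability at both high quality and low cost.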

LLM Infrastructure Engineering

2026-04-22 | architecture · ai-infra | #llm #infra #training #inference #rag #agent #vllm #sglang #deepseek #llmops

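An LLM infrastructure series for engineering teams in China. It covers the full LLMOps chain: GPU/CUDA/interconnect, training frameworks and 3D parallelism, the vLLM/SGLang inference engines, quantization and speculative decoding, RAG/Agent, and on to serving, gateways, observability, and security and compliance.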

[LLM Infrastructure Engineering] 13: vLLM / SGLang / TensorRT-LLM / TGI Comparison

2026-04-22 | architecture · ai-infra | #llm #infra #vllm #sglang #tensorrt-llm #tgi #lmdeploy #mindie #inference-engine #flashinfer

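An in-depth comparison of the architecture, performance, and ecosystem of mainstream inference engines, providing a basis for engineering selection and deployment decisions.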

[LLM Infrastructure Engineering] 12: PagedAttention and Continuous Batching

2026-04-22 | architecture · ai-infra | #llm #infra #vllm #pagedattention #continuous-batching #chunked-prefill #prefix-cache #radixattention #sglang

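vLLM's two core innovations: Continuous Batching keeps the GPU saturated, and PagedAttention eliminates GPU memory fragmentation, so inference throughput jumps by an order of magnitude. This post breaks it all down, from the operating-system analogy to hands-on engineering.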