nmt tag archive | Algorithm knowledge backup of 土法炼钢兴趣小组

[Transformer and Attention Mechanisms] 12 | Bahdanau Attention: An Early Form of Attention

2026-04-15 | transformer | #attention #bahdanau #nmt #additive-attention #history #transformer

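A point-by-point breakdown of the 2014 paper by Bahdanau, Cho, and Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate": the bottleneck of a fixed context vector, bidirectional RNN encoding, the additive attention formula vᵀtanh(W₁s + W₂h), the trade-offs against Luong 2015 multiplicative attention, and why this is the precursor of Q/K/V.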

[Transformer and Attention Mechanisms] 19 | Background of the Paper "Attention Is All You Need"

2026-04-15 | transformer | #transformer #history #attention #paper-reading #vaswani #google #nmt

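A return to that June 2017 paper: its eight authors, the internal background at Google Brain/Translate, the engineering difficulties of the LSTM era, why it was a "machine translation paper" at the time, and why seven years later it came to be read as the "bible of the large-model era".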