## Method

The warm-up technique was first explicitly described in the ResNet paper:

> We further explore n = 18 that leads to a 110-layer ResNet. In this case, we find that the initial learning rate of 0.1 is slightly too large to start converging. So we use 0.01 to warm up the training until the training error is below 80% (about 400 iterations), and then go back to 0.1 and continue training.
>
> ...
>
> With an initial learning rate of 0.1, it starts converging (<90% error) after several epochs, but still reaches similar accuracy.

1. Train with a learning rate of 0.01 until the training error drops below 80% (about 400 iterations);
2. Switch back to a learning rate of 0.1 and continue training, dividing the learning rate by 10 each time the loss plateaus, for a total of two decays.
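The two steps above can be sketched as a small state machine that tracks which phase the schedule is in. This is only an illustrative sketch: the class name `WarmupSchedule` and the `plateaued` flag are hypothetical, and detecting a plateau is left to the caller.

```python
class WarmupSchedule:
    """Sketch of the ResNet warm-up recipe: start at lr = 0.01,
    jump to 0.1 once the training error drops below 80%, then
    divide the lr by 10 at each loss plateau, at most twice."""

    def __init__(self):
        self.lr = 0.01       # warm-up learning rate
        self.warmed_up = False
        self.decays = 0      # number of 1/10 decays performed so far

    def update(self, train_error, plateaued):
        """Return the lr to use next, given the current training error
        and whether the caller judged the loss to have plateaued."""
        if not self.warmed_up:
            if train_error < 0.80:   # warm-up finished
                self.lr = 0.1        # back to the initial lr of 0.1
                self.warmed_up = True
        elif plateaued and self.decays < 2:
            self.lr /= 10            # decay by 10x, at most twice
            self.decays += 1
        return self.lr
```

At each training step (or validation check) the caller would do `lr = sched.update(train_error, plateaued)` and write `lr` into the optimizer; a third plateau no longer changes the learning rate.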