Stanford CS336 Language Modeling from Scratch | Spring 2025 | Scaling laws 2
2025-06-04 13:29
Scaling Laws and Model Training Optimization in Large Language Models
LLM
Scaling laws
µP (Maximal Update Parametrization)
Hyperparameter optimization
Model training optimization
WSD learning rate scheduler
Chinchilla law
Compute efficiency
Model initialization
IsoFLOP analysis
Reading time: 12 minutes (4191 characters)