May 11, 2026 TRL sequence packing → DeepSeek MLA: restoring the missing cu_seqlens
May 10, 2026 Modeling-side projection fusion for MLA training: q_a/kv_a batching + K-side absorption
May 10, 2026 Accelerating DeepSeek-family MoE training: Python expert loop → grouped GEMM