Zhao Dongyu's Blog

Marlin Code Walkthrough

Posted on 2024-09-26 Edited on 2026-04-13 In Tech
Symbols count in article: 28k Reading time ≈ 26 mins.

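The Marlin kernel, developed by IST-DASLab, is a high-performance FP16 (activation) x INT4 (weight) GEMM operator for GPTQ-quantized models. Among existing W4A16 GEMM kernels, Marlin delivers the best performance.

As a beginner who does not know CUDA, I came away from studying the Marlin kernel feeling refreshed.

[Warning: long post with many images]

Posted on 2024-11-04 Edited on 2026-04-13 In Tech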
Symbols count in article: 123 Reading time ≈ 1 mins.

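I came across this remark on Zhihu:

Work carefully through Song Han's course and do every lab yourself, especially the AWQ algorithm and its inference framework (quantization). Once you can read the whole codebase, you will be able to answer your own question.

I realized I had never studied this systematically, so I am working through Song Han's course and recording my notes here.

Posted on 2024-11-13 Edited on 2026-04-13 In Tech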
Symbols count in article: 6.3k Reading time ≈ 6 mins.

Learning C++.

Posted on 2024-11-14 Edited on 2026-04-13 In Tech
Symbols count in article: 9.7k Reading time ≈ 9 mins.

I ran some experiments on a Raspberry Pi Zero W; this post records the overall process.

Posted on 2024-11-22 Edited on 2026-04-13 In Tech
Symbols count in article: 1.7k Reading time ≈ 2 mins.

TensorFlow recently stopped working on my M1 Mac, failing with this error:

The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.

Posted on 2024-11-14 Edited on 2026-04-13 In Tech
Symbols count in article: 3.9k Reading time ≈ 4 mins.

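Lately my mind has been on consolidating my earlier inference-framework work into a paper. Today Ding, a senior colleague, reminded me not to work in isolation or limit myself to inference frameworks, and to learn from other people's engineering: Triton, TVM, mlc-llm, and the like. It was advice I needed to hear.

As the saying goes, heed good advice and you will eat well. Time to start learning Triton.

Posted on 2025-02-07 Edited on 2026-04-13 In Tech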
Symbols count in article: 962 Reading time ≈ 1 mins.

Solid foundations, systematic study.

Posted on 2025-03-12 Edited on 2026-04-13 In Tech
Symbols count in article: 23 Reading time ≈ 1 mins.

Optimizing FlashMLA on the Hopper H20 platform.

Posted on 2024-12-02 Edited on 2026-04-13 In Tech
Symbols count in article: 3k Reading time ≈ 3 mins.

A look at how TensorFlow implements the int8-quantized softmax operator.

Posted on 2025-03-25 Edited on 2026-04-13 In Tech
Symbols count in article: 7.5k Reading time ≈ 7 mins.

`torch.compile` speeds the flame,
Trade-offs linger, but worth the game.
Train or infer, it cuts the line,
With care and craft, its power’s thine.