ML4LM — Profiling torch.compile on DenseNet-121 Inference (GTX 1650)

less than 1 minute read

Published: September 11, 2025

Introduction

A deep dive into profiling torch.compile on DenseNet-121 inference on a GTX 1650, covering optimization techniques and performance metrics.

Read the full article on Medium

ML4LM — Speculative Decoding — from where we left off

less than 1 minute read

Published: October 02, 2025

Most blogs stop at the basics and skip the real details. I break down what’s usually missing: batching, accept/reject checks, and fallbacks.

ML4LM — Guards vs Graph Breaks in PyTorch: What You Need to Know [medium]

less than 1 minute read

Published: August 24, 2025

Guards vs Graph Breaks in PyTorch torch.compile

ML4LM — A tiny Triton primer (toy example) [medium]

less than 1 minute read

Published: August 23, 2025

A tiny Triton primer (toy example)

Hoyath Ali
