
Making Deep Learning Go Brrrr From First Principles - Horace He
So, if you want to keep your GPUs going brrrr, let's discuss the three components your system might be spending time on: compute, memory bandwidth, and overhead. Behind the bitter lesson is a legion of engineers keeping GPUs running efficiently.
GPUs Go Brrr · Hazy Research - Stanford University
May 12, 2024 · On the practical side, we’re going to talk about what we’ve learned about making GPUs go brr -- and release an embedded DSL, ThunderKittens, that we’ve built to help us write some particularly speedy kernels (which we are also releasing).
GPUs Go Brrr : r/LocalLLaMA - Reddit
This post is a mixture of practice and philosophy.
GPUs Go Brrr - Hacker News
Newer GPUs actually support dynamic memory allocation and recursion, and GPU threads have their own stacks, so you could in fact treat them as sequential devices and write games and simulators directly on them.
[D] Making Deep Learning Go Brrrr From First Principles
Mar 15, 2022 · To help address that, I wrote a blog called "Making Deep Learning Go Brrrr From First Principles": https://horace.io/brrr_intro.html. Basically, for most models, there are 3 regimes that you might be spending all of your time on - Compute, Memory-Bandwidth, and Overhead.
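A quick way to probe which regime you are in, sketched below in plain PyTorch (my own illustration, not code from the post, and it assumes a CUDA GPU is available): if doubling the batch size barely changes the per-iteration wall-clock time, the run is dominated by overhead rather than compute or memory bandwidth.

```python
import time
import torch

def time_batched_matmul(batch, size=1024, iters=50):
    """Average seconds per iteration for `iters` batched matmuls on the GPU."""
    a = torch.randn(batch, size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(batch, size, size, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()              # finish setup work before timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()              # wait for all queued kernels
    return (time.perf_counter() - start) / iters

for batch in (1, 2, 4, 8):
    # Overhead-bound: times stay roughly flat as batch grows.
    # Compute- or bandwidth-bound: times grow roughly in proportion.
    print(f"batch={batch}: {time_batched_matmul(batch) * 1e3:.2f} ms/iter")
```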
ThunderKittens: A Simple Embedded DSL for AI kernels
May 12, 2024 · Relatively quickly, we had a small library (DSL?) that we called ThunderKittens that we hope lets us write simple-to-understand clean code that indeed makes GPUs go brrr. Our observations for TK are pretty simple: You want to keep tensor cores busy.
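The sketch below is not ThunderKittens code, but the observation translates directly into a measurement you can make from plain PyTorch (again assuming a CUDA GPU): time a large bf16 matmul, which runs on the tensor cores, and compare the achieved FLOP/s against the peak on your GPU's datasheet.

```python
import time
import torch

n, iters = 8192, 20
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
elapsed = (time.perf_counter() - start) / iters

flops = 2 * n ** 3                        # multiply-adds in an n x n x n matmul
print(f"achieved ~{flops / elapsed / 1e12:.0f} TFLOP/s; "
      "compare against the bf16 tensor-core peak on your GPU's datasheet")
```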
Making your GPU go BRRR: Creating a CUDA Layer in PyTorch
Mar 13, 2024 · Implement the forward and backward pass in PyTorch. This gives access to an online debugger and the full functionality of Python, like Jupyter Notebooks. Validate the implementation with gradcheck. This somewhat magic function runs your forward pass and does numerical differentiation to validate your backward pass code.
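As a concrete illustration of that workflow (a toy stand-in, not the layer from the article): define the forward and backward pass as a `torch.autograd.Function`, then let `torch.autograd.gradcheck` perturb the inputs numerically and compare the result against the analytic backward.

```python
import torch

class ScaledExp(torch.autograd.Function):
    """Toy custom op: y = exp(scale * x)."""

    @staticmethod
    def forward(ctx, x, scale):
        out = torch.exp(scale * x)
        ctx.save_for_backward(out)
        ctx.scale = scale
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (out,) = ctx.saved_tensors
        # d/dx exp(scale * x) = scale * exp(scale * x); no gradient w.r.t. scale
        return grad_out * ctx.scale * out, None

# gradcheck wants double-precision inputs; it raises if the analytic and
# numerical Jacobians disagree.
x = torch.randn(8, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(lambda t: ScaledExp.apply(t, 0.5), (x,))
```

Once the Python version passes gradcheck, a CUDA implementation can be validated against it in the same way.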
GPU go brrr: Estimating OLS (with standard errors) via deep learning
Jul 20, 2020 · So a bunch of my criminologist friends have methods envy. To help them out, I made some Python functions to estimate OLS models using pytorch (a deep learning Python library).
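A minimal sketch of that idea (mine, not the author's code): fit the OLS coefficients by gradient descent in torch, then recover the classical standard errors from the usual closed-form variance of the estimator.

```python
import torch

torch.manual_seed(0)
n, k = 1_000, 3
X = torch.cat([torch.ones(n, 1), torch.randn(n, k - 1)], dim=1)  # intercept + 2 covariates
beta_true = torch.tensor([1.0, 2.0, -0.5])
y = X @ beta_true + 0.3 * torch.randn(n)

# Fit by gradient descent on the least-squares loss.
beta = torch.zeros(k, requires_grad=True)
opt = torch.optim.Adam([beta], lr=0.05)
for _ in range(2_000):
    opt.zero_grad()
    loss = torch.mean((y - X @ beta) ** 2)
    loss.backward()
    opt.step()

# Classical standard errors: sqrt of the diagonal of sigma^2 * (X'X)^-1.
with torch.no_grad():
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)    # unbiased residual variance
    se = torch.sqrt(sigma2 * torch.diag(torch.inverse(X.T @ X)))

print("beta:", beta.detach(), "\nse:  ", se)
```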
GPUs Go Brrr - Aili
May 12, 2024 · The article discusses optimizing the performance of AI models on GPUs, particularly the NVIDIA H100. It covers hardware features and techniques that can be leveraged to maximize GPU utilization.
GPUs Go Brrr - Simon Willison
May 13, 2024 · GPUs Go Brrr (via) Fascinating, detailed low-level notes on how to get the most out of NVIDIA's H100 GPUs (currently selling for around $40,000 apiece) from the research team at Stanford who created FlashAttention, among other things.