microgpt.py — 226 lines
Demystify LLMs in 200 Lines of Code.
Read the actual code behind language models. Highlight any line. Get a precise explanation of what it does and why.
class Head(nn.Module):
    """ one head of self-attention """
    def forward(self, x):
        k = self.key(x)                                   # (B, T, head_size)
        q = self.query(x)                                 # (B, T, head_size)
        wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5  # scaled attention scores (B, T, T)
AI › Head is one attention head. It projects input x into queries, keys, and values, then computes scaled dot-product attention...
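The snippet above is only a fragment. A self-contained sketch of such an attention head, in the style of Karpathy's minimal GPT implementations (the layer sizes, the causal mask, and the hyperparameter names here are illustrative, not the file's exact code), might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal self-attention (minimal sketch)."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # lower-triangular mask: each position attends only to itself and earlier positions
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                    # (B, T, head_size)
        q = self.query(x)                                  # (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1]**-0.5  # scaled scores (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)                       # attention weights, rows sum to 1
        v = self.value(x)                                  # (B, T, head_size)
        return wei @ v                                     # weighted sum of values

head = Head(n_embd=32, head_size=16, block_size=8)
out = head(torch.randn(4, 8, 32))
print(out.shape)  # torch.Size([4, 8, 16])
```

The `k.shape[-1]**-0.5` factor rescales the dot products so their variance stays near 1, which keeps the softmax from saturating early in training.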
Based on Andrej Karpathy's microgpt