2 hrs
In this hands-on course, Andrej Karpathy, a founding member of OpenAI and one of the world's foremost AI educators, walks you through building a byte-pair-encoding tokenizer from the ground up. Tokenization is how large language models split text into the discrete units (tokens) they actually read and generate, yet it remains one of the most misunderstood components of modern AI. By the end of this course, you'll understand exactly how text is turned into tokens before an LLM ever sees it.
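For a rough feel of what building a byte-pair-encoding tokenizer involves, here is a minimal Python sketch of the core idea: start from raw UTF-8 bytes, then repeatedly replace the most frequent adjacent pair with a new token id. This is not code from the course. The function names (`get_pair_counts`, `merge`, `train_bpe`, `decode`) and the toy input string are assumptions chosen for illustration, and a production tokenizer adds much more (pre-splitting rules, special tokens, and so on).

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace each left-to-right, non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # initial tokens are byte values 0..255
    merges = {}                       # (left_id, right_id) -> new token id
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break                     # fewer than two tokens left, nothing to merge
        pair = counts.most_common(1)[0][0]
        new_id = 256 + step           # new ids start right after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Map token ids back to text by expanding merges in the order they were learned."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts keep insertion order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


# Toy example: 11 bytes shrink to fewer tokens after three merges.
ids, merges = train_bpe("aaabdaaabac", 3)
print(len(ids), decode(ids, merges))
```

Each merge shortens the sequence and grows the vocabulary by one entry, which is the trade-off a tokenizer's vocabulary size controls.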
You're an AI enthusiast or engineer who's ready to move beyond tutorials and actually understand the internals of LLMs. This course is for anyone who wants to build, fine-tune, or deploy AI systems — not just use them.
You should be comfortable with Python and basic programming concepts. Familiarity with how neural networks work is helpful but not required; Karpathy explains each step clearly.
India's AI talent pool is growing fast, and companies like Flipkart, Amazon India, and early-stage AI startups across Bangalore, Delhi, and Mumbai are actively hiring engineers who understand LLM internals — not just those who know how to call APIs. Tokenization knowledge is especially valuable if you're working on multilingual AI (Hindi, Tamil, Bengali models) or adapting LLMs for Indian languages and dialects. Understanding this layer puts you ahead of most candidates and opens doors to senior engineering roles and research positions.
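One concrete reason tokenization matters for Indian languages: in UTF-8, each Devanagari or Tamil code point takes three bytes, while each ASCII letter takes one. A tokenizer that starts from raw bytes therefore sees a much longer initial sequence for the same number of characters, and how well its learned merges compress that sequence depends on its training data. The small Python sketch below only measures code points and bytes. The helper name `utf8_stats` and the sample greetings are assumptions for illustration, not material from the course.

```python
def utf8_stats(text):
    """Return (number of Unicode code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))


# Illustrative sample words; any text in these scripts behaves the same way.
samples = {
    "English": "hello",
    "Hindi": "नमस्ते",
    "Tamil": "வணக்கம்",
}

for language, word in samples.items():
    code_points, n_bytes = utf8_stats(word)
    print(f"{language}: {code_points} code points, {n_bytes} UTF-8 bytes")
```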
Yes, completely free. You can watch the full course on YouTube with no paywalls or hidden charges.
The course is about 2 hours. We'd suggest setting aside a focused weekend afternoon or breaking it into two 1-hour sessions during the week. Pause often to code along — that's where the real learning happens.
This course doesn't offer a formal certificate, but you'll get something more valuable: the ability to explain and build tokenizers from scratch. That knowledge speaks louder in interviews and in real work.