Summary
The open-source “llm-course” by Maxime Labonne offers a free roadmap in three tracks (LLM Fundamentals, LLM Scientist, LLM Engineer), supported by Colab notebooks and the paid LLM Engineer’s Handbook. A free assistant, available via HuggingChat and ChatGPT, quizzes learners and answers questions in real time.
Key Points
- Prerequisites: linear algebra, calculus, probability & statistics; Python fluency (NumPy, Pandas, Matplotlib, Seaborn).
- Core ML: scikit-learn algorithms, PCA/t-SNE, train/val/test splits, data-cleaning pipeline.
- Neural nets: back-propagation, optimizers (SGD, Adam), regularisation (dropout, L1/L2); build an MLP in PyTorch.
- NLP: tokenisation, TF-IDF, Word2Vec/GloVe/FastText, RNN/LSTM/GRU.
- Transformers: encoder-decoder → decoder-only GPT-style; self-attention; tokenisation choices affect speed & memory.
- Text generation: greedy/beam vs. temperature/nucleus sampling; visual guides by 3Blue1Brown, Karpathy, Bycroft.
- Pre-training: data-centric, compute-heavy (Llama 3.1 used 15 T tokens), but feasible below 1 B parameters with careful curation, deduplication, and tokenisation.
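The decoding strategies named above (greedy vs. temperature/nucleus sampling) can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the course: the function name and signature are my own, and it assumes a raw logit vector as input.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, greedy=False, rng=None):
    """Pick a next-token id from a logit vector.

    greedy=True  -> argmax (deterministic, no sampling)
    temperature  -> divides logits before softmax (<1 sharpens, >1 flattens)
    top_p        -> nucleus sampling: sample only from the smallest set of
                    tokens whose cumulative probability reaches top_p
    """
    rng = rng or np.random.default_rng()
    if greedy:
        return int(np.argmax(logits))
    # Temperature-scaled softmax (max-subtraction for numerical stability).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Nucleus truncation: sort descending, cut where cumulative prob >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept_ids = order[:cutoff]
    kept_probs = probs[kept_ids] / probs[kept_ids].sum()
    return int(rng.choice(kept_ids, p=kept_probs))
```

With `greedy=True` the output is the single most likely token every time; lowering `temperature` or `top_p` trades diversity for coherence, which is the practical knob the course's text-generation module explores.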
Next Steps
Clone the repo, pick a track, run the Colabs, and use the interactive assistant to test understanding.