Lesson 1. Machine Learning Basics
- 1.1. About the videos
- 1.2. About this lesson
- 1.3. AI and Machine Learning
- 1.3.1. Early Progress
- 1.3.2. AI Winters
- 1.3.3. The Modern Era
- Exercise 1
- 1.4. Model
- Exercise 2
- 1.5. Four-Step Machine Learning Process
- 1.6. Vector
- Exercise 3
- 1.7. Neural Network
- Exercise 4
- 1.8. Matrix
- 1.9. Gradient Descent
- Exercise 5
- 1.10. Automatic Differentiation
- Exercise 6
- Quiz 1
Lesson 2. Language Modeling Basics
- 2.1. Bag of Words
- Exercise 7
- 2.2. Word Embeddings
- Exercise 8
- 2.3. Byte-Pair Encoding
- Exercise 9
- 2.4. Language Model
- 2.5. Count-Based Language Model
- 2.6. Evaluating Language Models
- 2.6.1. Perplexity
- 2.6.2. ROUGE
- 2.6.3. Human Evaluation
- Exercise 10
- Quiz 2
Lesson 3. Recurrent Neural Network
- 3.1. Elman RNN
- 3.2. Mini-Batch Gradient Descent
- Exercise 11
- 3.3. Programming an RNN
- 3.4. RNN as a Language Model
- Exercise 12
- 3.5. Embedding Layer
- 3.6. Training an RNN Language Model
- Exercise 13
- 3.7. Dataset and DataLoader
- Exercise 14
- 3.8. Training Data and Loss Computation
- Quiz 3
Lesson 4. Transformer
- 4.1. Decoder Block
- Exercise 15
- 4.2. Self-Attention
- 4.2.1. Step 1 of Self-Attention
- 4.2.2. Step 2 of Self-Attention
- 4.2.3. Step 3 of Self-Attention
- 4.2.4. Step 4 of Self-Attention
- 4.2.5. Step 5 of Self-Attention
- 4.2.6. Step 6 of Self-Attention
- 4.3. Position-Wise Multilayer Perceptron
- Exercise 16
- 4.4. Rotary Position Embedding
- Exercise 17
- 4.5. Multi-Head Attention
- Exercise 18
- 4.6. Residual Connection
- 4.7. Root Mean Square Normalization
- Exercise 19
- 4.8. Key-Value Caching
- 4.9. Transformer in Python
- Exercise 20
- Quiz 4
Lesson 5. Large Language Model
- 5.1. Why Larger Is Better
- 5.1.1. Large Parameter Count
- 5.1.2. Large Context Size
- 5.1.3. Large Training Dataset
- 5.1.4. Large Amount of Compute
- Exercise 21
- 5.2. Supervised Finetuning
- 5.3. Finetuning a Pretrained Model
- 5.3.1. Baseline Emotion Classifier
- 5.3.2. Emotion Generation
- 5.3.3. Finetuning to Follow Instructions
- Exercise 22
- 5.4. Sampling From Language Models
- 5.4.1. Basic Sampling with Temperature
- 5.4.2. Top-k Sampling
- 5.4.3. Nucleus (Top-p) Sampling
- 5.4.4. Penalties
- 5.5. Low-Rank Adaptation (LoRA)
- 5.5.1. The Core Idea
- 5.5.2. Parameter-Efficient Finetuning (PEFT)
- 5.6. LLM as a Classifier
- Exercise 23
- 5.7. Prompt Engineering
- 5.7.1. Features of a Good Prompt
- 5.7.2. Follow-up Actions
- 5.7.3. Code Generation
- 5.7.4. Documentation Synchronization
- 5.8. Hallucinations
- 5.8.1. Reasons for Hallucinations
- 5.8.2. Preventing Hallucinations
- 5.9. LLMs, Copyright, and Ethics
- 5.9.1. Training Data
- 5.9.2. Generated Content
- 5.9.3. Open-Weight Models
- 5.9.4. Broader Ethical Considerations
- Quiz 5
Lesson 6. Further Reading
- 6.1. Mixture of Experts
- Exercise 24
- 6.2. Model Merging
- Exercise 25
- 6.3. Model Compression
- 6.4. Preference-Based Alignment
- Exercise 26
- 6.5. Advanced Reasoning
- 6.6. Language Model Security
- 6.7. Vision Language Model
- Exercise 27
- 6.8. Preventing Overfitting
- 6.9. Concluding Remarks
- 6.10. More From the Author
- Quiz 6
