
PyTorch & deep learning core

Tensors with automatic differentiation, modular networks, and the standard training loop. Includes a compact theory refresher on loss, optimization, and generalization, topics that AI engineering interviews often probe alongside code.

torch.nn autograd DataLoader CUDA
01

Tensors & autograd

🔥

torch.Tensor, devices, gradients

Like NumPy with GPU and derivatives

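A tensor is a multi-dimensional array. Set requires_grad=True to record operations for reverse-mode autodiff, then call .backward() to populate .grad. Pass device="cuda" when a GPU is available, and keep the model and its inputs on the same device: operations that mix CPU and CUDA tensors generally raise a RuntimeError rather than copying silently.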

python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.linspace(-1, 1, steps=100, device=device, requires_grad=True)
y = (x * x).sum()
y.backward()
# x.grad now holds ∂y/∂x = 2x, with the same shape and device as x
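As a quick sanity check on the snippet above, the sketch below compares the autograd result with the analytic gradient of y = Σ xᵢ², shows that repeated backward() calls accumulate into .grad, and shows that torch.no_grad() disables tracking. It runs on CPU with a small tensor for brevity; the variable names are illustrative only.

```python
import torch

x = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = (x * x).sum()
y.backward()

# For y = sum(x_i^2), the analytic gradient is dy/dx_i = 2 * x_i.
grad_matches = torch.allclose(x.grad, 2 * x.detach())

# A second backward pass ADDS to x.grad instead of overwriting it,
# which is why training loops reset gradients every step.
(x * x).sum().backward()
accumulated = torch.allclose(x.grad, 4 * x.detach())

# Inside no_grad, operations are not recorded on the autograd graph.
with torch.no_grad():
    z = x * 2
tracked = z.requires_grad
```

The accumulation behavior is the reason the training loop further down calls optimizer.zero_grad() before each backward pass.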
02

nn.Module & building blocks

🧠

Subclassing nn.Module

Layers register parameters automatically

python
import torch  # needed for the torch.Tensor annotations below
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
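Call model.train() before training and model.eval() before validation or inference so that dropout and batch norm switch between their training and inference behavior. Save checkpoints with torch.save(model.state_dict(), ...) and restore them with model.load_state_dict(...).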
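To make the eval-mode and checkpoint advice concrete, here is a minimal sketch. It assumes a small nn.Sequential model with a Dropout layer (not the MLP above) and uses an in-memory io.BytesIO buffer as a stand-in for a checkpoint path, so nothing is written to disk.

```python
import io

import torch
import torch.nn as nn


def build_model() -> nn.Sequential:
    # Hypothetical architecture, chosen only so that Dropout is present.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))


torch.manual_seed(0)
model = build_model()
x = torch.randn(3, 4)

model.eval()  # dropout becomes the identity in eval mode
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)
eval_is_deterministic = torch.allclose(out_a, out_b)

# Save only parameters and buffers, not the Python object itself.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the saved weights into it.
restored = build_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()
with torch.no_grad():
    same_outputs = torch.allclose(restored(x), out_a)
```

Saving the state_dict rather than the whole module keeps the checkpoint independent of the class definition's import path, at the cost of having to rebuild the architecture before loading.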
03

Training loop & DataLoader

Standard supervised loop

Mini-batches, loss, backward, step

python
# Assumes in_dim, hidden, num_classes, epochs, device, and train_loader are defined elsewhere.
model = MLP(in_dim, hidden, num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

for epoch in range(epochs):
    model.train()
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad(set_to_none=True)
        logits = model(xb)
        loss = criterion(logits, yb)
        loss.backward()
        optimizer.step()
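The loop above takes train_loader as given and stops at the optimizer step. One way to build the loaders and add a validation pass is sketched below. The synthetic data, the sizes, and the evaluate helper are all hypothetical, and everything stays on CPU for brevity.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 256 samples, 10 features, 3 classes (all hypothetical).
features = torch.randn(256, 10)
labels = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(features[:200], labels[:200]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(features[200:], labels[200:]), batch_size=32)

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()


def evaluate(model: nn.Module, loader: DataLoader) -> tuple[float, float]:
    """Return (mean loss, accuracy) over a loader without tracking gradients."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            logits = model(xb)
            # criterion returns the batch mean, so re-weight by batch size.
            total_loss += criterion(logits, yb).item() * yb.size(0)
            correct += (logits.argmax(dim=1) == yb).sum().item()
            count += yb.size(0)
    return total_loss / count, correct / count


val_loss, val_acc = evaluate(model, val_loader)
```

Calling evaluate at the end of each epoch, and switching back with model.train() before the next one, gives the train/validation comparison that the overfitting row in the theory table relies on.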
04

Theory — what to articulate in interviews

📚

Loss, optimization & generalization

Maps directly to knobs in PyTorch

| Topic | Short idea | PyTorch hook |
| Empirical risk | Train loss approximates expected loss over the data distribution | CrossEntropyLoss, MSELoss |
| SGD / Adam | Stochastic estimates of the gradient; Adam adapts per-parameter steps | torch.optim.* |
| Overfitting | Low train error, high val error (memorization) | Dropout, weight decay, more data, simpler model |
| Regularization | Add penalty or noise so weights stay small / robust | weight_decay, dropout, early stopping |
| Learning rate | Too large: unstable; too small: slow | Schedulers, warmup (see docs), monitor val loss |
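As an illustration of the learning-rate row, the sketch below implements a linear warmup with torch.optim.lr_scheduler.LambdaLR, which multiplies the base learning rate by whatever factor the supplied function returns. The warmup length and the toy model are assumptions for the example; real schedules usually continue with a decay phase after warmup.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # toy model, only here to supply parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

warmup_steps = 5  # hypothetical value

# Factor ramps linearly from 1/warmup_steps up to 1.0, then stays at 1.0.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

lrs = []
for step in range(8):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # in a real loop this follows loss.backward()
    scheduler.step()   # advance the schedule once per optimizer step
```

Here lrs starts at 2e-4 and reaches the base rate of 1e-3 on the fifth step, after which it stays flat.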
For production, you also care about latency, numerical stability (mixed precision with torch.amp, exposed as torch.cuda.amp in older releases), and reproducibility (torch.manual_seed, plus seeding of DataLoader shuffling and workers).
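On the reproducibility point, one small piece can be sketched directly: passing a seeded torch.Generator to a DataLoader fixes its shuffle order. The dataset and the first_epoch_order helper are hypothetical. With num_workers > 0, a worker_init_fn is typically also needed to seed other libraries inside each worker, which this single-process sketch does not cover.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))


def first_epoch_order(seed: int) -> list[int]:
    # A dedicated generator fixes the shuffle order independently of
    # whatever else has consumed the global RNG.
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=g)
    return torch.cat([batch[0] for batch in loader]).tolist()


order_a = first_epoch_order(0)
order_b = first_epoch_order(0)
same_order = order_a == order_b
```

Two loaders built from the same seed visit the samples in the same order, which makes a training run easier to compare against a rerun.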