PyTorch & deep learning core

Tensors with automatic differentiation, modular networks, and the standard training loop. Includes a compact theory refresher: loss, optimization, and generalization—what AI engineering interviews often probe alongside code.

torch.nn autograd DataLoader CUDA

Tensors & autograd

torch.Tensor, devices, gradients

Like NumPy with GPU and derivatives

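A tensor is a multi-dimensional array. Set requires_grad=True to record operations on it for reverse-mode autodiff, then call .backward() on a scalar result to populate .grad. Use device="cuda" when a GPU is available, and keep tensors that interact on the same device: most operations that mix CPU and CUDA tensors raise a RuntimeError, and every .to(device) transfer costs a copy.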

python

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.linspace(-1, 1, steps=100, device=device, requires_grad=True)
y = (x * x).sum()
y.backward()
# x.grad holds ∂y/∂x, which equals 2 * x for y = sum(x * x)
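
As a quick sanity check of the snippet above, the following sketch compares the autograd result with the analytic derivative 2x and shows torch.no_grad() for computations that should not be tracked. The tensor size and the variable names (grad_matches, tracks_grad) are illustrative assumptions.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Leaf tensor whose operations are recorded for reverse-mode autodiff.
x = torch.linspace(-1, 1, steps=5, device=device, requires_grad=True)
y = (x * x).sum()
y.backward()

# For y = sum(x_i^2), the analytic gradient is dy/dx_i = 2 * x_i.
grad_matches = torch.allclose(x.grad, 2 * x.detach())

# Inside no_grad, results are not attached to the autograd graph.
with torch.no_grad():
    z = x * 2
tracks_grad = z.requires_grad
```

Wrapping inference or metric code in torch.no_grad() avoids building a graph that will never be used for backward.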

nn.Module & building blocks

Subclassing nn.Module

Layers register parameters automatically

python

import torch  # needed for the torch.Tensor annotations below
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

Call model.train() before training and model.eval() before validation or inference so that dropout and batch norm behave correctly. Save checkpoints with torch.save(model.state_dict(), ...).
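
The note above can be sketched as follows: a small model with dropout is switched between modes, and a state_dict round trip restores its weights into a fresh instance. The layer sizes, the build_model helper, and the in-memory buffer (standing in for a checkpoint file) are assumptions for illustration.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)


def build_model() -> nn.Module:
    """Hypothetical toy model; dropout makes the train/eval switch matter."""
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))


model = build_model()
x = torch.randn(3, 4)

model.train()  # dropout active: units are zeroed at random
was_training = model.training

model.eval()  # dropout disabled: repeated forward passes agree
with torch.no_grad():
    eval_out_a = model(x)
    eval_out_b = model(x)

# Save only parameters and buffers, not the pickled module object.
buffer = io.BytesIO()  # stands in for a checkpoint file on disk
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the saved weights into it.
restored = build_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()
with torch.no_grad():
    restored_out = restored(x)
```

Saving the state_dict rather than the whole module keeps the checkpoint independent of how the class is imported, at the cost of having to rebuild the architecture before loading.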

Training loop & DataLoader

Standard supervised loop

Mini-batches, loss, backward, step

python

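# Assumes MLP, device, train_loader, in_dim, hidden, num_classes, and epochs are already defined.
model = MLP(in_dim, hidden, num_classes).to(device)
criterion = nn.CrossEntropyLoss()  # takes raw logits; targets here are class indices
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

for epoch in range(epochs):
    model.train()  # enable dropout and batch-norm statistic updates
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad(set_to_none=True)  # clear gradients from the previous step
        logits = model(xb)
        loss = criterion(logits, yb)
        loss.backward()  # accumulate gradients into each parameter's .grad
        optimizer.step()  # apply the parameter update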
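
The loop above assumes train_loader already exists. A minimal sketch of one way to build it, using synthetic tensors wrapped in a TensorDataset, together with a validation pass under model.eval() and torch.no_grad(). The data, the sizes, and the evaluate helper are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in data: 256 samples, 10 features, 3 classes.
features = torch.randn(256, 10)
labels = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(features[:200], labels[:200]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(features[200:], labels[200:]), batch_size=32)

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)


def evaluate(loader: DataLoader) -> float:
    """Return the mean per-sample loss over a loader (hypothetical helper)."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            total += criterion(model(xb), yb).item() * xb.size(0)
            count += xb.size(0)
    return total / count


val_losses = []
for epoch in range(3):
    model.train()
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    val_losses.append(evaluate(val_loader))
```

Tracking the validation loss per epoch is what makes overfitting visible: training loss keeps falling while the validation curve flattens or rises.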

Theory — what to articulate in interviews

Loss, optimization & generalization

Maps directly to knobs in PyTorch

Topic	Short idea	PyTorch hook
Empirical risk	Train loss approximates expected loss over the data distribution	`CrossEntropyLoss`, `MSELoss`
SGD / Adam	Stochastic estimates of the gradient; Adam adapts per-parameter steps	`torch.optim.*`
Overfitting	Low train error but high val error: the model memorizes instead of generalizing	Dropout, weight decay, more data, simpler model
Regularization	Add penalty or noise so weights stay small / robust	`weight_decay`, dropout, early stopping
Learning rate	Too large: unstable; too small: slow	`torch.optim.lr_scheduler.*`, warmup, monitor val loss
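
As one way to realize the learning-rate row, the sketch below uses LambdaLR for a linear warmup followed by a constant rate. The warmup length, the dummy objective, and the warmup_factor helper are assumptions chosen for illustration; torch.optim.lr_scheduler provides other schedules as well.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

warmup_steps = 4  # hypothetical warmup length


def warmup_factor(step: int) -> float:
    """Scale the base LR linearly from 1/warmup_steps up to 1.0, then hold."""
    return min(1.0, (step + 1) / warmup_steps)


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

lrs = []
for step in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])  # LR used for this step
    loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy objective
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule after the optimizer update
```

Here the rate climbs over the first four steps and then stays at the base value of 1e-3; in practice the warmup is usually followed by a decay schedule.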

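For production, you also care about latency, numerical stability under mixed precision (autocast with gradient scaling via torch.amp; recent PyTorch releases deprecate the older torch.cuda.amp spellings), and reproducibility (torch.manual_seed, plus seeded DataLoader shuffling and workers).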
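
For the reproducibility point, a minimal sketch: seeding a torch.Generator and passing it to the DataLoader makes the shuffle order repeatable, and torch.manual_seed fixes the global RNG. The seed value and the batch_order helper are hypothetical; with num_workers > 0, a worker_init_fn is typically also needed to seed libraries such as NumPy inside each worker.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(20))


def batch_order(seed: int) -> list:
    """Return the sample order a shuffled loader produces for a given seed."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=generator)
    return [int(i) for (batch,) in loader for i in batch]


order_a = batch_order(seed=0)
order_b = batch_order(seed=0)
same_order = order_a == order_b  # identical seeds give identical shuffles

# torch.manual_seed fixes the global RNG used by ops such as torch.randn.
torch.manual_seed(0)
first = torch.randn(3)
torch.manual_seed(0)
second = torch.randn(3)
```

Seeding makes runs repeatable on the same software and hardware; some GPU kernels remain nondeterministic unless deterministic algorithms are requested separately.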
