A kernel function k(x, y) computes an inner product between feature-space mappings of x and y without explicitly computing the mapping:
k(x, y) = ⟨φ(x), φ(y)⟩
The key benefit (the kernel trick) is that many learning algorithms (SVM, kernel ridge, kernel PCA) only need inner products; a kernel lets us operate in a high (possibly infinite) dimensional feature space implicitly and efficiently.
Common kernel functions
| Kernel | Formula | Notes |
|---|---|---|
| Linear | k(x,y) = x·y | Equivalent to no mapping; fast, good baseline. |
| Polynomial | k(x,y) = (α x·y + c)^d | Allows interactions up to degree d. |
| RBF / Gaussian | k(x,y) = exp(−‖x−y‖² / (2σ²)) | Infinite-dimensional; locally sensitive; popular. |
| Sigmoid | k(x,y) = tanh(α x·y + c) | Linked to neural nets; not always positive semidefinite. |
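These kernels are straightforward to compute directly; the sketch below uses NumPy, with illustrative parameter names (alpha, c, d, sigma) that mirror the formulas above:

```python
import numpy as np

def linear_kernel(x, y):
    # k(x, y) = x . y
    return np.dot(x, y)

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=2):
    # k(x, y) = (alpha * x . y + c)^d
    return (alpha * np.dot(x, y) + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); k(x, x) = 1
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, alpha=1.0, c=0.0):
    # k(x, y) = tanh(alpha * x . y + c); not always positive semidefinite
    return np.tanh(alpha * np.dot(x, y) + c)
```

Note that the RBF kernel always returns 1 when x = y, since the squared distance in the exponent vanishes.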
Properties required for valid kernels
Symmetry: k(x,y)=k(y,x).
Positive semidefiniteness: For any finite set {x₁,...,x_n}, the kernel matrix K with Kᵢⱼ=k(xᵢ,xⱼ) must be positive semidefinite (all eigenvalues ≥0). This is Mercer's condition.
Closure properties: Sums, products, and limits of valid kernels are valid kernels.
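Both conditions can be checked numerically on a finite sample: build the kernel matrix, confirm it is symmetric, and confirm its eigenvalues are non-negative. A minimal sketch, using the RBF kernel as one valid choice:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def gram_matrix(xs, k):
    # K[i, j] = k(x_i, x_j) for a finite set of points
    n = len(xs)
    return np.array([[k(xs[i], xs[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))           # five random 3-D points
K = gram_matrix(xs, rbf)

assert np.allclose(K, K.T)             # symmetry
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() >= -1e-10         # PSD, up to numerical tolerance
```

This is only a necessary check on one sample, not a proof of validity for all inputs.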
Example: Polynomial kernel (simple)
Let x=[x₁,x₂], y=[y₁,y₂], choose d=2, c=1:
k(x,y) = (x·y + 1)² = (x₁y₁ + x₂y₂ + 1)²
This equals the dot product in a 6-dimensional feature space of constant, linear, squared, and cross terms, but we avoid computing φ(x) explicitly.
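The correspondence can be verified by writing out one explicit feature map that realizes this kernel (a sketch; the √2 factors are what make the dot product match the expansion of (x·y + 1)²):

```python
import numpy as np

def phi(v):
    # Explicit 6-D feature map for k(x, y) = (x . y + 1)^2 with 2-D inputs
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

k_implicit = (np.dot(x, y) + 1.0) ** 2   # kernel trick: O(dim) work
k_explicit = np.dot(phi(x), phi(y))      # explicit mapping: 6-D dot product
assert np.isclose(k_implicit, k_explicit)
```

For higher degrees and dimensions the explicit space grows combinatorially, which is exactly why the implicit computation matters.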
Machine learning usage
SVM: Replace x·y by k(x,y) in the dual formulation to learn nonlinear boundaries.
Kernel PCA: Compute principal components from the kernel matrix to perform nonlinear dimensionality reduction.
Kernel ridge regression: Solve ridge regression in feature space via kernel matrices.
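Kernel ridge regression, for instance, reduces to a single linear solve with the kernel matrix. A minimal sketch assuming an RBF kernel, where lam is the ridge penalty:

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def fit_kernel_ridge(X, y, lam=0.1, sigma=1.0):
    # Solve (K + lam * I) alpha = y for the dual coefficients alpha
    n = len(X)
    K = np.array([[rbf(X[i], X[j], sigma) for j in range(n)] for i in range(n)])
    return np.linalg.solve(K + lam * np.eye(n), y)

def predict(X_train, alpha, x_new, sigma=1.0):
    # f(x) = sum_i alpha_i * k(x_i, x)
    return sum(a * rbf(xi, x_new, sigma) for a, xi in zip(alpha, X_train))
```

The model never forms feature vectors; training cost is governed by the n×n kernel matrix rather than the feature-space dimension.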
Part B — Norms in Linear Algebra
A norm is a function that assigns a non-negative length or size to vectors (and, by extension, matrices). Every norm ‖·‖ satisfies three properties: positive definiteness (‖x‖ ≥ 0, with equality iff x = 0), absolute homogeneity (‖αx‖ = |α| ‖x‖), and the triangle inequality (‖x + y‖ ≤ ‖x‖ + ‖y‖).
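The common vector norms and their defining properties can be checked numerically; a sketch using NumPy's np.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])

# Common vector norms
assert np.linalg.norm(x, 1) == 7.0        # l1: |3| + |-4|
assert np.linalg.norm(x, 2) == 5.0        # l2: sqrt(9 + 16)
assert np.linalg.norm(x, np.inf) == 4.0   # l-inf: max_i |x_i|

# Absolute homogeneity and triangle inequality for the l2 norm
assert np.isclose(np.linalg.norm(-2 * x), 2 * np.linalg.norm(x))
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```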