Lecture 27 — Bayesian Learning

Principles • Naïve Bayes variants • Bayesian networks • Bayesian regression • MAP vs MLE • Practical demo

1. Bayesian Learning — Big Picture

Bayesian learning treats model parameters as random variables and uses probability to represent uncertainty. Learning updates a prior belief about parameters θ to a posterior using observed data D via Bayes' rule:

p(θ | D) = p(D | θ) p(θ) / p(D)

- p(θ): prior (what you believed before seeing data).
- p(D|θ): likelihood (how probable the observed data is under θ).
- p(θ|D): posterior (updated belief).
- p(D): evidence (normalizing constant).

Example (coin toss): prior Beta(α,β) over coin bias θ. Observing heads/tails updates α,β; posterior is Beta(α+heads, β+tails).
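The conjugate update above is just counting, so it fits in a few lines of plain Python (a minimal sketch; the prior parameters and counts below are made up for illustration):

```python
# Conjugate Beta-Bernoulli update: prior Beta(alpha, beta), observe coin flips.
alpha_prior, beta_prior = 2, 2   # hypothetical prior belief
heads, tails = 7, 3              # hypothetical observed data

# Posterior is Beta(alpha + heads, beta + tails).
alpha_post = alpha_prior + heads
beta_post = beta_prior + tails

# Posterior mean of the coin bias theta.
posterior_mean = alpha_post / (alpha_post + beta_post)
print(posterior_mean)  # 9/14 ≈ 0.643
```

Note how the prior acts like "pseudo-counts": Beta(2, 2) behaves as if we had already seen one head and one tail plus one of each.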

2. Six Topics / Models Covered

1) Naïve Bayes (general)

Assumes features are conditionally independent given class: p(y|x) ∝ p(y) ∏ p(xᵢ|y). Fast, works well for text.

2) Multinomial Naïve Bayes

Used for count data (bag-of-words). Likelihood from word counts per class (Laplace smoothing often applied).

3) Bernoulli Naïve Bayes

Binary features (word present/absent). Useful when only occurrence matters.

4) Gaussian Naïve Bayes

Continuous features modeled as Gaussians per class: p(xᵢ|y=c) = N(μ_{c,i}, σ_{c,i}²).

5) Bayesian Networks (Directed Acyclic Graphs)

A DAG represents the conditional-independence structure; the joint distribution factorizes as ∏ p(Xᵢ | Parents(Xᵢ)). BNs support structured reasoning and, with care, causal modeling.

6) Bayesian Linear Regression & MAP

Place a prior on the weights (e.g., Gaussian). With a Gaussian likelihood, the posterior over weights is also Gaussian (conjugacy). The MAP estimate blends prior and likelihood and is equivalent to ridge (L2-regularized) regression.

3. MLE vs MAP vs Full Bayesian

  • MLE (Maximum Likelihood): choose θ that maximizes p(D|θ). No prior used.
  • MAP (Maximum A Posteriori): choose θ that maximizes p(θ|D) ∝ p(D|θ)p(θ). Prior acts as regularizer.
  • Full Bayesian: keep the entire posterior distribution p(θ|D) — enables uncertainty quantification and predictive distribution by integrating over θ.
Example: Gaussian likelihood + Gaussian prior → closed-form posterior (conjugacy). MAP with Gaussian prior = ridge regression.
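The MAP = ridge equivalence can be checked numerically: with a zero-mean Gaussian prior on the weights, the MAP solution is (XᵀX + λI)⁻¹Xᵀy, the ridge formula. A minimal sketch (synthetic data, hypothetical λ):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (hypothetical ground-truth weights).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

lam = 1.0  # lambda = sigma^2 / tau^2 for a N(0, tau^2 I) weight prior

# MAP estimate under the Gaussian prior: (X^T X + lam I)^{-1} X^T y.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Ridge with the same penalty gives the same weights.
w_ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_map, w_ridge))  # True
```

The ratio σ²/τ² makes the regularization strength interpretable: a tighter prior (small τ) or noisier data (large σ) pulls the MAP estimate harder toward zero.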

4. Bayesian Networks (BNs)

BNs encode conditional independencies with a DAG. They are powerful for modeling structured domains (medical diagnosis, fault trees). Inference can be done via variable elimination, belief propagation, or sampling (MCMC).
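The factorization makes exact inference by enumeration straightforward on small networks. A minimal sketch with a hypothetical rain/sprinkler/wet-grass DAG (all CPT numbers invented for illustration):

```python
# CPTs for a tiny DAG: Rain -> WetGrass <- Sprinkler (numbers are hypothetical).
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
p_wet_given = {  # p(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.85, (False, False): 0.05,
}

def joint(r, s, w):
    """Joint probability via the BN factorization p(R) p(S) p(W | R, S)."""
    pw = p_wet_given[(r, s)]
    return p_rain[r] * p_sprinkler[s] * (pw if w else 1 - pw)

# Infer p(Rain=True | WetGrass=True) by summing out Sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)  # ≈ 0.636: wet grass raises the probability of rain from 0.2
```

Enumeration is exponential in the number of variables; variable elimination and belief propagation exploit the same factorization to do this far more efficiently.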

5. Applications

  • Spam detection (Naïve Bayes) — simple and effective for text.
  • Medical diagnosis (Bayesian networks capture symptom-disease relations).
  • Probabilistic calibration and uncertainty-aware predictions (Bayesian regression).
  • Hyperparameter tuning via Bayesian optimization.

6. Short Python Examples (sketch)

# Gaussian Naive Bayes (scikit-learn); assumes X_train, X_test, y_train exist
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Multinomial Naive Bayes for text; assumes documents and labels exist
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
X = CountVectorizer().fit_transform(documents)  # bag-of-words counts
clf = MultinomialNB(alpha=1.0)  # alpha=1.0 gives Laplace smoothing
clf.fit(X, labels)

Next — interactive Naïve Bayes posterior calculator: enter classes, priors, and likelihoods (categorical) and compute posteriors.
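As a preview, the core of such a calculator is a one-step application of Bayes' rule over categorical classes (a minimal sketch; the class names and numbers below are hypothetical):

```python
def posteriors(priors, likelihoods):
    """Bayes' rule for categorical classes.

    priors:      {class: p(class)}
    likelihoods: {class: p(evidence | class)}
    Returns      {class: p(class | evidence)}.
    """
    unnorm = {c: priors[c] * likelihoods[c] for c in priors}
    z = sum(unnorm.values())  # evidence term p(D)
    return {c: v / z for c, v in unnorm.items()}

# Hypothetical spam example: the word "offer" appears in the message.
result = posteriors({"spam": 0.4, "ham": 0.6}, {"spam": 0.30, "ham": 0.05})
print(result)  # {'spam': 0.8, 'ham': 0.2}
```

With conditionally independent features, the likelihood entry for each class is just the product of the per-feature likelihoods, which is exactly the Naïve Bayes assumption from Section 2.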