1. Bayesian Learning — Big Picture
Bayesian learning treats model parameters as random variables and uses probability to represent uncertainty. Learning updates a prior belief about parameters θ to a posterior using observed data D via Bayes' rule:
p(θ | D) = p(D | θ) p(θ) / p(D)
- p(θ): prior (what you believed before seeing data).
- p(D|θ): likelihood (how probable the observed data is under θ).
- p(θ|D): posterior (updated belief).
- p(D): evidence (normalizing constant).
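Bayes' rule above can be computed directly for a categorical problem. A minimal sketch (the two-class numbers below are made up for illustration):

```python
def posterior(priors, likelihoods):
    """Compute p(class | D) from priors p(class) and likelihoods p(D | class)."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnorm)  # p(D), the normalizing constant
    return [u / evidence for u in unnorm]

# Two classes with priors 0.3 / 0.7 and data likelihoods 0.8 / 0.1:
post = posterior([0.3, 0.7], [0.8, 0.1])
print(post)  # posteriors sum to 1
```

Note that the evidence p(D) only rescales the numerators, which is why classification often works with p(D|θ)p(θ) alone.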
2. Six Topics / Models Covered
1) Naïve Bayes (general)
Assumes features are conditionally independent given class: p(y|x) ∝ p(y) ∏ p(xᵢ|y). Fast, works well for text.
2) Multinomial Naïve Bayes
Used for count data (bag-of-words). Class-conditional likelihoods are estimated from word counts per class; Laplace smoothing is typically applied so unseen words do not get zero probability.
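Laplace (add-one) smoothing is a one-line formula. A sketch with invented counts and vocabulary size:

```python
def smoothed_prob(count_word, total_count, vocab_size, alpha=1.0):
    # p(word | class) = (count + alpha) / (total + alpha * |V|)
    return (count_word + alpha) / (total_count + alpha * vocab_size)

# A word never seen in this class still gets non-zero probability:
p_unseen = smoothed_prob(0, 100, vocab_size=50)
p_seen = smoothed_prob(10, 100, vocab_size=50)
print(p_unseen, p_seen)
```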
3) Bernoulli Naïve Bayes
Binary features (word present/absent). Useful when only occurrence matters.
4) Gaussian Naïve Bayes
Continuous features modeled as Gaussians per class: p(xᵢ|y=c) = N(μ_{c,i}, σ_{c,i}²).
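The per-class Gaussian likelihood can be evaluated by hand. A sketch (the priors, means, and variances are placeholder values, and the product over features reflects the conditional-independence assumption):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    # N(x; mu, sigma^2) density for one continuous feature under one class
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def class_score(x_vec, prior, mus, sigma2s):
    # Unnormalized p(y=c | x): prior times per-feature Gaussian likelihoods
    score = prior
    for x, mu, s2 in zip(x_vec, mus, sigma2s):
        score *= gaussian_pdf(x, mu, s2)
    return score

score = class_score([0.0, 1.0], prior=0.5, mus=[0.0, 1.0], sigma2s=[1.0, 1.0])
print(score)
```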
5) Bayesian Networks (Directed Acyclic Graphs)
Represent conditional independence with a DAG. Joint factorizes as ∏ p(Xᵢ | Parents(Xᵢ)). Support structured reasoning and causal models (with care).
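The factorization ∏ p(Xᵢ | Parents(Xᵢ)) can be made concrete with a two-node toy DAG, Rain → WetGrass (the conditional probability table values below are invented):

```python
# Joint factorizes as p(Rain) * p(WetGrass | Rain)
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Marginal p(WetGrass = True) by summing out Rain:
p_wet = sum(joint(r, True) for r in (True, False))
# Posterior p(Rain = True | WetGrass = True) by Bayes' rule:
p_rain_given_wet = joint(True, True) / p_wet
print(p_wet, p_rain_given_wet)
```

The same sum-out-the-hidden-variables pattern is what variable elimination automates for larger networks.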
6) Bayesian Linear Regression & MAP
Place priors on weights (e.g., Gaussian). Posterior over weights is Gaussian (conjugacy). MAP estimate blends prior and likelihood — with a zero-mean Gaussian prior it is equivalent to Ridge (L2-regularized) regression.
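The MAP/Ridge equivalence has a closed form: with likelihood y = Xw + noise and prior w ~ N(0, (1/λ)I), the MAP weights are (XᵀX + λI)⁻¹Xᵀy. A sketch on synthetic data (true weights [1.5, -0.5] chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=50)

lam = 1.0  # prior precision; plays the role of the Ridge penalty
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w_map)  # near [1.5, -0.5], shrunk slightly toward zero by the prior
```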
3. MLE vs MAP vs Full Bayesian
- MLE (Maximum Likelihood): choose θ that maximizes p(D|θ). No prior used.
- MAP (Maximum A Posteriori): choose θ that maximizes p(θ|D) ∝ p(D|θ)p(θ). Prior acts as regularizer.
- Full Bayesian: keep the entire posterior distribution p(θ|D) — enables uncertainty quantification and predictive distribution by integrating over θ.
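The three estimates can be contrasted on a coin-flip example with a conjugate Beta prior (the data, 7 heads in 10 flips, and the Beta(2, 2) prior are illustrative choices):

```python
heads, flips = 7, 10
a, b = 2.0, 2.0  # Beta prior pseudo-counts

mle = heads / flips                              # maximizes p(D | theta)
map_est = (heads + a - 1) / (flips + a + b - 2)  # mode of the Beta posterior
post_mean = (heads + a) / (flips + a + b)        # full-Bayes point summary
print(mle, map_est, post_mean)
```

The prior pulls both MAP and the posterior mean toward 0.5, and the gap shrinks as the data grows — with few observations the prior matters most.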
4. Bayesian Networks (BNs)
BNs encode conditional independencies with a DAG. They are powerful for modeling structured domains (medical diagnosis, fault trees). Inference can be done via variable elimination, belief propagation, or sampling (MCMC).
5. Applications
- Spam detection (Naïve Bayes) — simple and effective for text.
- Medical diagnosis (Bayesian networks capture symptom-disease relations).
- Probabilistic calibration and uncertainty-aware predictions (Bayesian regression).
- Hyperparameter tuning via Bayesian optimization.
6. Short Python Examples (sketch)
# Gaussian Naive Bayes (scikit-learn)
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)
# Multinomial Naive Bayes for text
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
vec = CountVectorizer()
X_train = vec.fit_transform(train_documents)  # learn vocabulary from training text
X_test = vec.transform(test_documents)        # reuse the same vocabulary
clf = MultinomialNB(alpha=1.0)  # alpha=1.0 applies Laplace smoothing
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
Next — interactive Naïve Bayes posterior calculator: enter classes, priors, and likelihoods (categorical) and compute posteriors.