Lecture 30 — Ensemble Methods & Model Evaluation

Bagging • Random Forest • Boosting • Stacking • Voting • Practical tips & demo

1. Why use ensembles?

Ensemble methods combine multiple models to produce a single improved prediction. They typically reduce variance (bagging), reduce bias (boosting), or both (stacking). Ensembles are among the most effective techniques for tabular data and many competitions.

2. Six common ensemble techniques

1) Bagging (Bootstrap Aggregating)

Train multiple base models on different bootstrap samples (sampling with replacement) and average/vote their predictions. Reduces variance. Example: bagged decision trees.
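A minimal bagging sketch with scikit-learn's `BaggingClassifier` (the synthetic dataset and parameter values are illustrative only; the default base learner is a decision tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fit on a bootstrap sample of the training set;
# oob_score uses the left-out ("out-of-bag") rows as a free validation set
bag = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.oob_score_)
```

The out-of-bag score is a useful side effect of bootstrap sampling: each tree never saw roughly a third of the rows, so those rows can estimate generalization error without a separate holdout.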

2) Random Forest

Bagging applied to decision trees with additional random feature selection at each split. Highly robust, less prone to overfitting than single trees, provides feature importance.

3) AdaBoost

Boosting algorithm that sequentially trains weak learners (often stumps) and reweights misclassified examples. Emphasizes hard examples and reduces bias.
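A short AdaBoost sketch (synthetic data; `n_estimators` is an illustrative choice). scikit-learn's default base learner is a depth-1 tree, i.e. a stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, random_state=1)

# Each boosting round upweights the examples the previous stumps
# misclassified, so later learners focus on the hard cases
ada = AdaBoostClassifier(n_estimators=100, random_state=1)
ada.fit(X, y)
print(ada.score(X, y))
```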

4) Gradient Boosting Machines (GBM)

Build trees sequentially where each new tree fits the residuals (negative gradient) of the loss. Variants: XGBoost, LightGBM, CatBoost — highly performant on tabular data.

5) Stacking (Stacked Generalization)

Train diverse base models; then train a meta-learner on their predictions. Often yields gains by letting the meta-learner correct base models' errors.

6) Voting Ensembles

Combine predictions from multiple models by majority vote (classification) or averaging (regression). Voting can be hard (majority of predicted class labels) or soft (averaging predicted probabilities, optionally weighted per model).
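A soft-voting sketch combining three deliberately different model families (the estimator choices and synthetic data are illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)

# voting='soft' averages predicted probabilities across the three models;
# voting='hard' would instead take a majority vote of predicted labels
vote = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
        ('nb', GaussianNB()),
    ],
    voting='soft',
)
vote.fit(X, y)
print(vote.predict(X[:5]))
```

Soft voting generally requires that every estimator implements `predict_proba`; diverse base models tend to help because their errors are less correlated.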

3. Practical examples & when to choose

Example: For a Kaggle-style tabular problem, try Random Forest or Gradient Boosted Trees. If models disagree, use stacking with a simple meta-learner (logistic regression) to blend predictions.

4. Key trade-offs & tips

  • Bias vs Variance: bagging reduces variance; boosting reduces bias.
  • Interpretability: ensembles (especially boosting) are less interpretable; use SHAP/partial dependence for explanations.
  • Overfitting: boosting can overfit if trees are too deep or learning rate too high; tune carefully.
  • Feature importance: Random Forest provides built-in importance; prefer permutation importance for robustness.
  • Computational cost: ensembles require more compute; use subsampling, early stopping, or smaller learners if needed.
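The permutation-importance tip above can be sketched with `sklearn.inspection` (synthetic data; scoring on a held-out split, which is the main advantage over impurity-based importance):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Shuffle one feature at a time on the test set and measure the score drop;
# a large drop means the model genuinely relies on that feature
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # indices of top-5 features
```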

5. Short Python sketches

# Random Forest (assumes X_train, y_train are already defined)
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
rf.fit(X_train, y_train)

# Gradient Boosting (XGBoost sketch)
import xgboost as xgb
dtrain = xgb.DMatrix(X_train, label=y_train)
params = {'objective':'binary:logistic', 'eta':0.05, 'max_depth':6}
bst = xgb.train(params, dtrain, num_boost_round=300)

# Stacking (sklearn) — reuses rf and X_train, y_train from above
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
estimators = [('rf', rf),
              ('gb', GradientBoostingClassifier()),
              ('svc', SVC(probability=True))]
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)

6. Evaluation & Diagnostics

  • Use cross-validation (preferably stratified for classification).
  • Monitor learning curves (train vs validation error) to detect over/underfitting.
  • For boosting, use a validation set and early stopping to avoid overfitting.
  • Examine calibration (reliability) of predicted probabilities — use calibration plots or isotonic regression / Platt scaling.
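Two of the tips above — stratified cross-validation and early stopping for boosting — can be combined in one sketch (synthetic data; fold count and stopping patience are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=600, random_state=0)

# Stratified folds preserve the class ratio in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# n_iter_no_change enables early stopping: training halts once the score
# on an internal validation_fraction split stops improving for 10 rounds
gbm = GradientBoostingClassifier(
    n_estimators=500,
    learning_rate=0.05,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=0,
)
scores = cross_val_score(gbm, X, y, cv=cv)
print(scores.mean())
```

Early stopping usually means far fewer than the 500 requested trees are actually built, which addresses both the overfitting and the compute-cost concerns from section 4.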