Lecture 30 — Ensemble Methods & Model Evaluation

Bagging • Random Forest • Boosting • Stacking • Voting • Practical tips & demo

1. Why use ensembles?

Ensemble methods combine multiple models to produce a single improved prediction. They typically reduce variance (bagging), reduce bias (boosting), or both (stacking). Ensembles are among the most effective techniques for tabular data and many competitions.

2. Six common ensemble techniques

1) Bagging (Bootstrap Aggregating)

Train multiple base models on different bootstrap samples (sampling with replacement) and average/vote their predictions. Reduces variance. Example: bagged decision trees.
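A minimal bagging sketch with scikit-learn's `BaggingClassifier` (the synthetic dataset and parameter values are illustrative only; the default base learner is a decision tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fit on a bootstrap sample of the training set;
# oob_score uses the left-out ("out-of-bag") rows as a free validation set
bag = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.oob_score_)
```

The out-of-bag score is a useful side effect of bootstrap sampling: each tree never saw roughly a third of the rows, so those rows can estimate generalization error without a separate holdout.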

2) Random Forest

Bagging applied to decision trees with additional random feature selection at each split. Highly robust, less prone to overfitting than single trees, provides feature importance.

3) AdaBoost

Boosting algorithm that sequentially trains weak learners (often stumps) and reweights misclassified examples. Emphasizes hard examples and reduces bias.
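A short AdaBoost sketch (synthetic data; `n_estimators` is an illustrative choice). scikit-learn's default base learner is a depth-1 tree, i.e. a stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, random_state=1)

# Each boosting round upweights the examples the previous stumps
# misclassified, so later learners focus on the hard cases
ada = AdaBoostClassifier(n_estimators=100, random_state=1)
ada.fit(X, y)
print(ada.score(X, y))
```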

4) Gradient Boosting Machines (GBM)

Build trees sequentially where each new tree fits the residuals (negative gradient) of the loss. Variants: XGBoost, LightGBM, CatBoost — highly performant on tabular data.

5) Stacking (Stacked Generalization)

Train diverse base models; then train a meta-learner on their predictions. Often yields gains by letting the meta-learner correct base models' errors.

6) Voting Ensembles

Combine predictions from multiple models by majority vote (classification) or averaging (regression). Voting can be hard (majority of predicted class labels) or soft (averaging predicted probabilities, optionally weighted per model).
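A soft-voting sketch combining three deliberately different model families (the estimator choices and synthetic data are illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)

# voting='soft' averages predicted probabilities across the three models;
# voting='hard' would instead take a majority vote of predicted labels
vote = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
        ('nb', GaussianNB()),
    ],
    voting='soft',
)
vote.fit(X, y)
print(vote.predict(X[:5]))
```

Soft voting generally requires that every estimator implements `predict_proba`; diverse base models tend to help because their errors are less correlated.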

3. Practical examples & when to choose

Example: For a Kaggle-style tabular problem, try Random Forest or Gradient Boosted Trees. If models disagree, use stacking with a simple meta-learner (logistic regression) to blend predictions.

4. Key trade-offs & tips

  • Bias vs Variance: bagging reduces variance; boosting reduces bias.
  • Interpretability: ensembles (especially boosting) are less interpretable; use SHAP/partial dependence for explanations.
  • Overfitting: boosting can overfit if trees are too deep or learning rate too high; tune carefully.
  • Feature importance: Random Forest provides built-in importance; prefer permutation importance for robustness.
  • Computational cost: ensembles require more compute; use subsampling, early stopping, or smaller learners if needed.
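The permutation-importance tip above can be sketched with `sklearn.inspection` (synthetic data; scoring on a held-out split, which is the main advantage over impurity-based importance):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Shuffle one feature at a time on the test set and measure the score drop;
# a large drop means the model genuinely relies on that feature
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # indices of top-5 features
```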

5. Short Python sketches

# Random Forest (assumes X_train, y_train are already defined)
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
rf.fit(X_train, y_train)

# Gradient Boosting (XGBoost sketch)
import xgboost as xgb
dtrain = xgb.DMatrix(X_train, label=y_train)
params = {'objective':'binary:logistic', 'eta':0.05, 'max_depth':6}
bst = xgb.train(params, dtrain, num_boost_round=300)

# Stacking (sklearn) — reuses rf and X_train, y_train from above
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
estimators = [('rf', rf),
              ('gb', GradientBoostingClassifier()),
              ('svc', SVC(probability=True))]
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)

6. Evaluation & Diagnostics

  • Use cross-validation (preferably stratified for classification).
  • Monitor learning curves (train vs validation error) to detect over/underfitting.
  • For boosting, use a validation set and early stopping to avoid overfitting.
  • Examine calibration (reliability) of predicted probabilities — use calibration plots or isotonic regression / Platt scaling.
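Two of the tips above — stratified cross-validation and early stopping for boosting — can be combined in one sketch (synthetic data; fold count and stopping patience are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=600, random_state=0)

# Stratified folds preserve the class ratio in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# n_iter_no_change enables early stopping: training halts once the score
# on an internal validation_fraction split stops improving for 10 rounds
gbm = GradientBoostingClassifier(
    n_estimators=500,
    learning_rate=0.05,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=0,
)
scores = cross_val_score(gbm, X, y, cv=cv)
print(scores.mean())
```

Early stopping usually means far fewer than the 500 requested trees are actually built, which addresses both the overfitting and the compute-cost concerns from section 4.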