Multiple Linear Regression
Predict outcomes using multiple features simultaneously. Learn how combining multiple inputs creates more accurate and nuanced predictions.
From One to Many: The Power of Multiple Features
Evolution from Simple Linear Regression
While simple linear regression uses one feature (like house size) to predict an outcome (like price), multiple linear regression combines many features for more accurate predictions.
Simple vs Multiple:
- Simple: ŷ = w₀ + w₁x
- Multiple: ŷ = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ
Real-World Example
Predicting house prices becomes much more accurate when you consider:
- Square footage (x₁)
- Number of bedrooms (x₂)
- Location quality score (x₃)
- Age of house (x₄)
- Garage spaces (x₅)
Each feature contributes to the final prediction with its own weight.
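To make this concrete, here is a minimal sketch that fits such a model with scikit-learn. The numbers are made up for illustration; the columns match the feature list above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [sqft, bedrooms, location_score, age, garage_spaces]
X = np.array([
    [1400, 3, 7, 20, 1],
    [2100, 4, 8,  5, 2],
    [ 900, 2, 5, 35, 0],
    [1700, 3, 9, 10, 2],
    [2500, 5, 6, 15, 2],
])
y = np.array([240_000, 410_000, 150_000, 330_000, 420_000])  # sale prices

model = LinearRegression().fit(X, y)
print("bias (w0):", model.intercept_)
print("weights:  ", model.coef_)                     # one weight per feature
print("predicted:", model.predict([[1600, 3, 7, 12, 1]]))
```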
Mathematical Foundation
The Equation

ŷ = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ

Or in vector notation (more compact):

ŷ = wᵀx + w₀

Components:
- ŷ = predicted value
- w₀ = bias (base prediction)
- wᵢ = weight for feature i
- xᵢ = value of feature i
Interpretation:
Each weight wᵢ tells you how much the prediction ŷ changes when feature xᵢ increases by 1, holding all other features constant. This is the “partial effect” of that feature.
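In code, the vector form is a single dot product. The weights below are illustrative, not fitted:

```python
import numpy as np

w0 = 50_000.0                                                # bias
w = np.array([150.0, 10_000.0, 8_000.0, -1_200.0, 5_000.0])  # illustrative weights
x = np.array([1600, 3, 7, 12, 1])                            # one house's features

y_hat = w @ x + w0   # y_hat = w . x + w0
print(y_hat)

# Partial effect: one more year of age changes the prediction by w[3] = -1200,
# with every other feature held fixed.
```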
Geometric Interpretation
While simple linear regression fits a line through 2D space, multiple linear regression fits a hyperplane through n-dimensional space:
- 2 features: fits a plane in 3D space
- 3 features: fits a hyperplane in 4D space
- n features: fits a hyperplane in (n+1)D space
Key Concepts & Challenges
Feature Independence
Multiple linear regression assumes features are independent. When features are correlated (multicollinearity), it becomes hard to isolate each feature's effect.
Example of Multicollinearity:
- House size and number of rooms (highly correlated)
- Age and mileage in cars (usually correlated)
- Height and weight in people (often correlated)
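A quick way to spot a correlated pair is to check the correlation coefficient. Here is a sketch on synthetic data (the 400-sqft-per-room relationship is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3000, size=200)
rooms = sqft / 400 + rng.normal(0, 0.5, size=200)  # room count tracks size closely

r = np.corrcoef(sqft, rooms)[0, 1]
print(f"correlation: {r:.2f}")  # close to 1.0 -> fitted weights become unstable
```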
Feature Scaling
When features have different scales (e.g., age in years vs income in dollars), you need to normalize them for stable training.
Common Scaling Methods:
- Standardization: z = (x − μ) / σ
- Min-Max: x′ = (x − min) / (max − min)
- Robust: x′ = (x − median) / IQR (resistant to outliers)
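The first two methods are one-liners with scikit-learn; the numbers below are arbitrary examples:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[25, 40_000], [52, 95_000], [38, 61_000]])  # columns: [age, income]

print(StandardScaler().fit_transform(X))  # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))    # rescaled to [0, 1] per column
```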
Curse of Dimensionality
As you add more features, you need substantially more data to maintain the same prediction quality:
- 10 features → need ~100 samples minimum
- 100 features → need ~1,000 samples minimum
- 1,000 features → need ~10,000 samples minimum
Rule of thumb: At least 10-20 samples per feature
Feature Selection
Not all features improve predictions. Some add noise:
- Forward selection: Add features one by one
- Backward elimination: Remove features one by one
- L1 regularization: Automatically zeros out weak features (see the sketch after this list)
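A sketch of the L1 approach: on synthetic data where only two of five features carry signal, Lasso drives the noise weights to (near) zero. The alpha value is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the other three are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # noise features end up at (near) zero
```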
Training Process
Cost Function
Same as simple linear regression, except each prediction now combines all features:

J(w) = (1/2m) Σᵢ (ŷᵢ − yᵢ)²

where the sum runs over the m training examples and ŷᵢ = w₀ + w₁xᵢ₁ + w₂xᵢ₂ + … + wₙxᵢₙ is the prediction for example i.
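As a sanity check, the cost is a few lines of NumPy (the toy data here is hypothetical):

```python
import numpy as np

def mse_cost(X, y, w, b):
    """Cost J for predictions X @ w + b; the 1/2 matches the gradients below."""
    residuals = X @ w + b - y
    return (residuals ** 2).mean() / 2

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
print(mse_cost(X, y, np.array([1.0, 2.0]), 0.0))  # 0.0 for a perfect fit
```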
Optimization Methods
Normal Equation

w = (XᵀX)⁻¹ Xᵀy

- Direct solution
- No iterations needed
- Slow for many features (>10,000), since solving the n×n system costs roughly O(n³)
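A sketch in NumPy, with synthetic data whose true weights are known. Solving the linear system is preferred over forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0   # true weights, bias of 4

Xb = np.hstack([np.ones((100, 1)), X])     # prepend a column of ones for the bias
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # solves (X^T X) w = X^T y
print(w)                                   # ~[4.0, 2.0, -1.0, 0.5]
```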
Gradient Descent
Update rules, applied each step for every weight j and the bias (α is the learning rate; see the sketch after this list):

wⱼ := wⱼ − (α/m) Σᵢ (ŷᵢ − yᵢ) xᵢⱼ
w₀ := w₀ − (α/m) Σᵢ (ŷᵢ − yᵢ)
- Scales to large datasets
- Memory efficient
- Requires learning rate tuning
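A minimal batch implementation of those update rules, again on synthetic data with known weights (the learning rate and epoch count are arbitrary illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        error = X @ w + b - y            # shape (m,)
        w -= lr * (X.T @ error) / m      # gradient w.r.t. each weight
        b -= lr * error.mean()           # gradient w.r.t. the bias
    return w, b

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 0.3
print(gradient_descent(X, y))  # converges near the true weights and bias
```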
Interactive Playground
Experiment with multiple linear regression using real or synthetic multi-feature datasets. Notice how different features contribute to the final prediction.
When to Use Multiple Linear Regression
Perfect For:
- Predicting continuous values with multiple relevant features
- Understanding feature importance and relationships
- Business metrics (sales, revenue, costs) with multiple drivers
- Scientific measurements with multiple variables
- Real estate pricing, demand forecasting, risk assessment
Consider Alternatives When:
- Features have complex non-linear relationships → Try polynomial regression
- Too many features relative to samples → Use regularization
- Features are highly correlated → Consider PCA or feature selection
- Categorical target variable → Use logistic regression
- Very complex patterns → Try neural networks or tree-based models
Practical Tips
Feature Engineering
- Create interaction terms (x₁ × x₂), as sketched after this list
- Add domain-specific features
- Transform skewed features (log, sqrt)
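scikit-learn can generate interaction terms automatically; a small sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0], [4.0, 5.0]])

# interaction_only=True adds x1*x2 without the squared terms.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly.fit_transform(X))  # columns: x1, x2, x1*x2
```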
Validation Strategy
- Use cross-validation for small datasets
- Check residual plots for patterns
- Test for multicollinearity (VIF); a sketch follows this list
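One way to run the VIF check is with statsmodels (an assumption; the text only names the diagnostic). VIF values above roughly 5-10 are commonly read as problematic:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.1, size=100)           # nearly a copy of x1
x3 = rng.normal(size=100)
X = np.column_stack([np.ones(100), x1, x2, x3])  # constant column for the intercept

for i in range(1, X.shape[1]):                   # skip the constant itself
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```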
Interpretation
- Standardize features for comparing weights
- Check confidence intervals
- Consider partial dependence plots
Remember: More features isn't always better. Start simple, add complexity gradually, and always validate that each new feature actually improves out-of-sample performance.