Polynomial Regression

Capture curved relationships in your data. Learn how polynomial features transform linear regression into a powerful tool for modeling non-linear patterns.

Beyond Straight Lines: Modeling Curves

The Power of Polynomials

Real-world relationships are rarely perfectly linear. Polynomial regression lets you fit curves to your data while still using the linear regression framework you already understand.

Evolution of Complexity:

Linear: y = w_0 + w_1 x
Quadratic: y = w_0 + w_1 x + w_2 x^2
Cubic: y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
Degree n: y = \sum_{i=0}^{n} w_i x^i

Real-World Applications

Polynomial regression excels at modeling phenomena with natural curves:

  • Physics: Projectile motion, acceleration curves
    Example: h(t) = -\frac{1}{2} g t^2 + v_0 t + h_0
  • Economics: Supply-demand curves, diminishing returns
    Revenue often follows quadratic patterns
  • Biology: Growth curves, dose-response relationships
    Population growth, enzyme kinetics

How Polynomial Regression Works

The Clever Trick: Feature Engineering

Polynomial regression is still linear regression! The trick is creating new features that are powers of your original features:

Original Feature:

Single feature x

Example: temperature = 25°C

Polynomial Features:

  • x = 25
  • x^2 = 625
  • x^3 = 15,625

Now we use multiple linear regression on these engineered features! The model is still linear in the parameters (weights), just not in the original feature.
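
As a concrete sketch (assuming scikit-learn, which the code example at the end of the page also uses), the transformation of the temperature example looks like this:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with a single feature: temperature = 25
X = np.array([[25.0]])

# Expand into x, x^2, x^3 (include_bias=False drops the constant column)
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(X))   # [[   25.   625. 15625.]]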

Mathematical Formulation

Single Variable

y = w_0 + w_1 x + w_2 x^2 + \dots + w_n x^n

One input variable, multiple polynomial terms

Multiple Variables

y = w_0 + \sum_{i} w_i x_i + \sum_{i,j} w_{ij} x_i x_j + \dots

Includes interaction terms and powers
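
To make the interaction terms concrete, here is a small sketch with two arbitrary feature values, again using scikit-learn's PolynomialFeatures:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, e.g. x1 = 2, x2 = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))                      # [[2. 3. 4. 6. 9.]]
print(poly.get_feature_names_out(['x1', 'x2']))   # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']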

Choosing the Right Polynomial Degree

Low Degree (1-2)

  • Simple, interpretable
  • Less prone to overfitting
  • Stable predictions
  • May underfit complex patterns

Medium Degree (3-4)

  • Captures moderate complexity
  • Good balance
  • Needs more data
  • Less interpretable

High Degree (5+)

  • Fits complex patterns
  • Prone to overfitting
  • Unstable at edges
  • Hard to interpret

The Bias-Variance Tradeoff

  • Underfitting: degree too low → high bias, low variance
  • Just right: optimal degree → balanced bias and variance
  • Overfitting: degree too high → low bias, high variance

Common Pitfalls & Solutions

Pitfall: Runge's Phenomenon

High-degree polynomials oscillate wildly at the edges of your data range.

Solution:

  • Use lower degree polynomials
  • Try splines instead for local smoothness
  • Add regularization (Ridge/Lasso), as sketched below
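
A minimal sketch of the regularization option, assuming scikit-learn's Ridge (the expanded features are standardized so the penalty treats every term comparably):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Higher-degree fit, but the L2 penalty shrinks the coefficients
# that drive wild oscillation; tune alpha by cross-validation.
ridge_poly = Pipeline([
    ('poly', PolynomialFeatures(degree=8, include_bias=False)),
    ('scale', StandardScaler()),
    ('ridge', Ridge(alpha=1.0))
])
# ridge_poly.fit(X_train, y_train)   # data as in the code example below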

Pitfall: Numerical Instability

Large powers of x (like x¹⁰) can overflow or underflow floating-point ranges and make the fit numerically ill-conditioned.

Solution:

  • Always scale/normalize features first (see the sketch after this list)
  • Use orthogonal polynomials
  • Center data around zero
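
A quick numeric sketch of why scaling first matters (synthetic values, assuming scikit-learn; the exact numbers are only illustrative):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

x = np.linspace(0, 1000, 5).reshape(-1, 1)

# Raw degree-10 powers span ~30 orders of magnitude -> badly conditioned fit
raw = PolynomialFeatures(degree=10).fit_transform(x)
print(raw.max())                                   # 1e+30

# Standardizing first keeps every power in a modest range
scaled = make_pipeline(StandardScaler(), PolynomialFeatures(degree=10))
print(np.abs(scaled.fit_transform(x)).max())       # ~32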

Pitfall: Extrapolation Danger

Polynomials explode outside the training data range.

Solution:

  • Never extrapolate far from training range
  • Set prediction bounds
  • Use domain knowledge for constraints

Pitfall: Feature Explosion

With multiple input variables, the number of polynomial and interaction features grows combinatorially with both the number of variables and the degree.

Solution:

  • Limit interaction terms (see the sketch after this list)
  • Use feature selection
  • Apply L1 regularization for sparsity
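
To see the growth concretely, a small sketch counting the features PolynomialFeatures generates for 10 input variables; interaction_only=True is one way to rein it in:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 10))   # 10 original features

for degree in (2, 3, 4):
    full = PolynomialFeatures(degree=degree, include_bias=False)
    inter = PolynomialFeatures(degree=degree, interaction_only=True, include_bias=False)
    print(degree,
          full.fit_transform(X).shape[1],     # 65, 285, 1000
          inter.fit_transform(X).shape[1])    # 55, 175, 385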

Interactive Playground

Experiment with polynomial regression. Try different polynomial degrees and see how they affect the fit. Notice how higher degrees can overfit, especially with noisy data.

Tip: Start with degree 2 (quadratic) and increase gradually. Watch the training vs test error!

[Interactive widget: parameter sliders (10-50%), a "Data & Regression Line" plot, and a "Model Performance" panel.]

Model Selection Guide

| Scenario | Recommended Approach | Degree | Notes |
|---|---|---|---|
| Simple curve, lots of data | Standard polynomial | 2-3 | Good starting point |
| Complex curve, limited data | Polynomial + Ridge regularization | 3-4 | Prevents overfitting |
| Multiple peaks/valleys | Splines or piecewise polynomial | 3 per piece | Better local control |
| Periodic patterns | Fourier features | - | Sin/cos basis functions |
| Unknown complexity | Cross-validation to select degree | 1-5 | Let data decide |
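
For the "let the data decide" row, a sketch of degree selection by cross-validation (using scikit-learn's GridSearchCV; X_train and y_train are the training data from the code example below):

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipe = Pipeline([
    ('poly', PolynomialFeatures()),
    ('linear', LinearRegression())
])

# 5-fold CV over candidate degrees 1-5; lowest validation MSE wins
search = GridSearchCV(pipe, {'poly__degree': [1, 2, 3, 4, 5]},
                      cv=5, scoring='neg_mean_squared_error')
search.fit(X_train, y_train)
print(search.best_params_)   # e.g. {'poly__degree': 3}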

When to Use Polynomial Regression

Ideal For:

  • Known curved relationships (physics, chemistry)
  • Growth/decay curves (exponential-like patterns)
  • Optimization problems (finding maxima/minima)
  • Time series with trends
  • Dose-response relationships

Avoid When:

  • Data is truly linear → Use simple linear regression
  • Very high-dimensional data → Consider regularized methods
  • Need to extrapolate → Use domain-specific models
  • Discontinuous relationships → Try tree-based models
  • Limited data → Risk of overfitting

Practical Implementation Tips

Feature Preprocessing

  1. Center and scale your features
  2. Generate polynomial features
  3. Remove highly correlated features
  4. Apply regularization if needed

Validation Strategy

  • Use cross-validation for degree selection
  • Plot learning curves to detect overfitting
  • Check residual plots for patterns (see the sketch below)
  • Test on holdout data
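
A sketch of the residual-plot check, assuming matplotlib and the fitted poly_model, X_test, and y_test from the code example below:

import matplotlib.pyplot as plt

predictions = poly_model.predict(X_test)
residuals = y_test - predictions

plt.scatter(predictions, residuals, alpha=0.5)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Predicted value')
plt.ylabel('Residual')
plt.title('Residuals vs. predictions')
plt.show()
# Random scatter around zero is good; a curved pattern suggests the
# degree is too low, and a funnel shape suggests non-constant variance.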

Code Example (Python)

polynomial_regression.py

# Import libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Synthetic curved data (a stand-in for your own dataset)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(0, 1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create polynomial regression pipeline
poly_model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression())
])

# Fit and predict
poly_model.fit(X_train, y_train)
predictions = poly_model.predict(X_test)

Remember: Polynomial regression is a powerful tool, but with great power comes great responsibility. Always validate your model thoroughly and be cautious about interpreting high-degree polynomial coefficients.