Tutorial: The Gauss-Newton Method (with Python Examples)

Tutorial: The Gauss-Newton Method (with Python Examples)

📌
For more content check out the Circuit of Knowledge.

Introduction:

The Gauss-Newton method is an iterative optimization algorithm used to solve non-linear least squares problems. It is particularly useful for curve fitting, where the goal is to find the best-fit parameters of a model function to a given set of data points. In this tutorial, we will explore the Gauss-Newton method, its mathematical derivation, and its implementation in Python with and without using optimization libraries.

Problem Statement:

Let's consider a real-life problem of predicting the temperature of a city based on the day of the year. We have collected temperature data for various days throughout the year and want to fit a curve to this data to make predictions. The temperature model can be represented as a sine function:

T(d)=asin(b(dc))+eT(d) = a \sin(b (d - c)) + e

Where:

  • T(d)T(d) is the temperature on day dd
  • aa is the amplitude of the sine function
  • bb is the frequency of the sine function
  • cc is the phase shift of the sine function
  • ee is the vertical shift of the sine function

Our goal is to find the optimal values of the parameters aa, bb, cc, and ee that minimize the sum of squared residuals between the observed temperatures and the predicted temperatures.

Mathematical Derivation:

The Gauss-Newton method is based on the idea of linearizing the non-linear model function around the current estimate of parameters and iteratively updating the parameters to minimize the sum of squared residuals.

Let's denote the residual function as:

ri=TiT(di,p)r_i = T_i - T(d_i, \mathbf{p})

where TiT_i is the observed temperature on day did_i, T(di,p)T(d_i, \mathbf{p}) is the predicted temperature using the current parameter vector p=[a,b,c,e]T\mathbf{p} = [a, b, c, e]^T, and rir_i is the residual.

The sum of squared residuals is given by:

S(p)=i=1nri2=i=1n(TiT(di,p))2S(\mathbf{p}) = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (T_i - T(d_i, \mathbf{p}))^2

To minimize S(p)S(\mathbf{p}), the Gauss-Newton method linearizes the model function using a first-order Taylor expansion around the current parameter estimate pk\mathbf{p}_k:

T(di,p)T(di,pk)+Ji(pk)(ppk)T(d_i, \mathbf{p}) \approx T(d_i, \mathbf{p}_k) + \mathbf{J}_i(\mathbf{p}_k)(\mathbf{p} - \mathbf{p}_k)

where Ji(pk)\mathbf{J}_i(\mathbf{p}_k) is the Jacobian matrix evaluated at pk\mathbf{p}_k.

The Jacobian matrix for the sine function model is:

Ji(pk)=[sin(bk(dick))ak(dick)cos(bk(dick))akbkcos(bk(dick))1]\mathbf{J}_i(\mathbf{p}_k) = \begin{bmatrix} \sin(b_k(d_i - c_k)) & a_k(d_i - c_k)\cos(b_k(d_i - c_k)) & -a_kb_k\cos(b_k(d_i - c_k)) & 1 \end{bmatrix}

By substituting the linearized model into the sum of squared residuals and setting the derivative with respect to p\mathbf{p} to zero, we obtain the normal equations:

JTJΔp=JTr\mathbf{J}^T\mathbf{J}\Delta\mathbf{p} = \mathbf{J}^T\mathbf{r}

where J\mathbf{J} is the Jacobian matrix stacked vertically for all data points, r\mathbf{r} is the vector of residuals, and Δp\Delta\mathbf{p} is the update to the parameter vector.

The parameter update is then calculated as:

Δp=(JTJ)1JTr\Delta\mathbf{p} = (\mathbf{J}^T\mathbf{J})^{-1}\mathbf{J}^T\mathbf{r}

The Gauss-Newton method iteratively updates the parameters using:

pk+1=pk+Δp\mathbf{p}_{k+1} = \mathbf{p}_k + \Delta\mathbf{p}

until convergence or a maximum number of iterations is reached.

Python Implementation without Libraries:

Here's the Python code that implements the Gauss-Newton method from scratch:

import numpy as np
import matplotlib.pyplot as plt

# Define the sine function
def sine_func(params, d):
    a, b, c, e = params
    return a * np.sin(b * (d - c)) + e

# Define the Jacobian matrix
def jacobian(params, d):
    a, b, c, e = params
    J = np.zeros((len(d), 4))
    J[:, 0] = np.sin(b * (d - c))
    J[:, 1] = a * (d - c) * np.cos(b * (d - c))
    J[:, 2] = -a * b * np.cos(b * (d - c))
    J[:, 3] = 1
    return J

# Gauss-Newton method
def gauss_newton(params, d, temp, max_iter=100, tol=1e-6):
    for _ in range(max_iter):
        residuals = temp - sine_func(params, d)
        J = jacobian(params, d)
        delta = np.linalg.pinv(J.T @ J) @ J.T @ residuals
        params += delta
        if np.linalg.norm(delta) < tol:
            break
    return params

# Example data
days = np.array([0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330])
temperatures = np.array([5, 10, 20, 25, 30, 35, 40, 35, 25, 20, 10, 5])

# Initial guess for parameters
initial_params = np.array([20, 0.02, 90, 20])

# Perform Gauss-Newton optimization
optimized_params = gauss_newton(initial_params, days, temperatures)

# Generate points for the optimized curve
days_plot = np.linspace(0, 365, 365)
temperatures_plot = sine_func(optimized_params, days_plot)

# Plot the data points and optimized curve
plt.figure(figsize=(10, 6))
plt.scatter(days, temperatures, color='red', label='Data Points')
plt.plot(days_plot, temperatures_plot, color='blue', label='Optimized Curve')
plt.xlabel('Day of the Year')
plt.ylabel('Temperature')
plt.legend()
plt.show()

print("Optimized Parameters:")
print("a =", optimized_params[0])
print("b =", optimized_params[1])
print("c =", optimized_params[2])
print("e =", optimized_params[3])
Optimized Parameters:
a = 17.214381454450546
b = 0.01595915001087644
c = 69.05945422111704
e = 20.060318747383874
image

Python Implementation using Libraries:

Here's the Python code that utilizes the scipy.optimize library to perform the Gauss-Newton optimization:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares

# Define the sine function
def sine_func(params, d):
    a, b, c, e = params
    return a * np.sin(b * (d - c)) + e

# Define the residual function
def residuals(params, d, temp):
    return temp - sine_func(params, d)

# Example data
days = np.array([0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330])
temperatures = np.array([5, 10, 20, 25, 30, 35, 40, 35, 25, 20, 10, 5])

# Initial guess for parameters
initial_params = np.array([20, 0.02, 90, 20])

# Perform Gauss-Newton optimization using least_squares (with Levenberg Marquardt Regularization)
result = least_squares(residuals, initial_params, args=(days, temperatures), method='lm')

# Extract the optimized parameters
optimized_params = result.x

# Generate points for the optimized curve
days_plot = np.linspace(0, 365, 365)
temperatures_plot = sine_func(optimized_params, days_plot)

# Plot the data points and optimized curve
plt.figure(figsize=(10, 6))
plt.scatter(days, temperatures, color='red', label='Data Points')
plt.plot(days_plot, temperatures_plot, color='blue', label='Optimized Curve')
plt.xlabel('Day of the Year')
plt.ylabel('Temperature')
plt.legend()
plt.show()

print("Optimized Parameters:")
print("a =", optimized_params[0])
print("b =", optimized_params[1])
print("c =", optimized_params[2])
print("e =", optimized_params[3])
Optimized Parameters:
a = 17.214372262870874
b = 0.015959161056508728
c = 69.0595361694195
e = 20.060332299073693
image

The least_squares function from scipy.optimize internally uses the Levenberg-Marquardt algorithm, which is a variant of the Gauss-Newton method. It automatically handles the calculation of the Jacobian matrix and the parameter updates.

Another example:

In this example, we will use the Gauss-Newton method to fit a nonlinear function to a given dataset.

Problem Statement:

Suppose we have a dataset containing the population of a city at different years. The relationship between the population and time is known to follow a non-linear logistic growth model:

P(t)=K1+AertP(t) = \frac{K}{1 + Ae^{-rt}}

where P(t)P(t) is the population at time tt, KK is the carrying capacity (maximum population), AA is a constant related to the initial population, and rr is the growth rate. Our goal is to estimate the parameters KK, AA, and rr that best fit the given data using the Gauss-Newton method.

Mathematical Derivation:

The Gauss-Newton method aims to minimize the sum of squared residuals between the observed population values and the predicted population values using the non-linear model. The residual for the i-th data point is given by:

ri=PiP(ti,p)r_i = P_i - P(t_i, \mathbf{p})

where PiP_i is the observed population at time tit_i, P(ti,p)P(t_i, \mathbf{p}) is the predicted population using the current parameter vector p=[K,A,r]T\mathbf{p} = [K, A, r]^T, and rir_i is the residual.

The sum of squared residuals is given by:

S(p)=i=1nri2=i=1n(PiP(ti,p))2S(\mathbf{p}) = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (P_i - P(t_i, \mathbf{p}))^2

To minimize S(p)S(\mathbf{p}) using the Gauss-Newton method, we calculate the Jacobian matrix J\mathbf{J} of the residuals with respect to the parameters. The Jacobian matrix for the logistic growth model is:

Ji(p)=[1(1+Aerti)2Kerti(1+Aerti)2KAterti(1+Aerti)2]\mathbf{J}_i(\mathbf{p}) = \begin{bmatrix} \frac{1}{(1 + Ae^{-rt_i})^2} & -\frac{Ke^{-rt_i}}{(1 + Ae^{-rt_i})^2} & \frac{KAte^{-rt_i}}{(1 + Ae^{-rt_i})^2} \end{bmatrix}

The Gauss-Newton update step is then calculated as:

Δp=(JTJ)1JTr\Delta\mathbf{p} = (\mathbf{J}^T\mathbf{J})^{-1}\mathbf{J}^T\mathbf{r}

The parameters are updated iteratively using:

pk+1=pk+Δp\mathbf{p}_{k+1} = \mathbf{p}_k + \Delta\mathbf{p} until convergence or a maximum number of iterations is reached.

Python Implementation:

Here's the Python code that implements the Gauss-Newton method for the non-linear least squares problem:

import numpy as np
import matplotlib.pyplot as plt

# Logistic growth model
def logistic_growth(params, t):
    K, A, r = params
    return K / (1 + A * np.exp(-r * t))

# Jacobian matrix of the logistic growth model
def jacobian(params, t):
    K, A, r = params
    J = np.zeros((len(t), 3))
    J[:, 0] = 1 / (1 + A * np.exp(-r * t))**2
    J[:, 1] = -K * np.exp(-r * t) / (1 + A * np.exp(-r * t))**2
    J[:, 2] = K * A * t * np.exp(-r * t) / (1 + A * np.exp(-r * t))**2
    return J

# Gauss-Newton method for non-linear least squares
def gauss_newton(params, t, P, max_iter=100, tol=1e-6):
    for _ in range(max_iter):
        residuals = P - logistic_growth(params, t)
        J = jacobian(params, t)
        delta = np.linalg.pinv(J.T @ J) @ J.T @ residuals
        params += delta
        if np.linalg.norm(delta) < tol:
            break
    return params

# Example data
years = np.array([1900, 1920, 1940, 1960, 1980, 2000])
population = np.array([10000, 15000, 30000, 60000, 90000, 120000])

# Initial guess for parameters
initial_params = np.array([150000, 10, 0.02])

# Perform non-linear least squares using Gauss-Newton method
optimized_params = gauss_newton(initial_params, years - 1900, population)

# Generate points for the optimized curve
years_plot = np.linspace(0, 100, 100) + 1900
population_plot = logistic_growth(optimized_params, years_plot - 1900)

# Plot the data points and optimized curve
plt.figure(figsize=(8, 6))
plt.scatter(years, population, label='Data Points', color='red')
plt.plot(years_plot, population_plot, label='Optimized Curve', color='blue')
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('Non-linear Least Squares Fit')
plt.legend()
plt.show()

print("Optimized Parameters:")
print("K =", optimized_params[0])
print("A =", optimized_params[1])
print("r =", optimized_params[2])
Optimized Parameters:
K = 160463.49627790874
A = 20.848212649774148
r = 0.041219431657605395
Gauss-Newton Algorithm, fitting a logistic population growth
Gauss-Newton Algorithm, fitting a logistic population growth

In this example, we have a dataset containing the population of a city at different years. The relationship between the population and time follows a non-linear logistic growth model.

The code defines the logistic growth model function logistic_growth and the corresponding Jacobian matrix function jacobian. The gauss_newton function performs the parameter estimation using the Gauss-Newton method.

After obtaining the optimized parameters, we generate the fitted curve using the logistic growth model and plot it along with the data points to visualize the non-linear least squares fit.

This example demonstrates how the Gauss-Newton method can be applied to a non-linear least squares problem in a data science context, specifically for fitting a logistic growth model to a population dataset.

I apologize for the previous confusion and hope this corrected example accurately illustrates the application of the Gauss-Newton method for non-linear least squares problems in data science. Let me know if you have any further questions!

Conclusion:

In this tutorial, we explored the Gauss-Newton method for curve fitting. We derived the mathematical formulation of the method and implemented it in Python, both from scratch and using the scipy.optimize library. The Gauss-Newton method is a powerful tool for solving non-linear least squares problems and can be applied to various curve fitting and parameter estimation tasks in real-life scenarios.

I hope this tutorial helps you understand the Gauss-Newton method and its implementation in Python! Let me know if you have any further questions.