Newton's Method: Finding Extrima of a Function - Part 2

📌
For more content check out the Circuit of Knowledge.

Introduction

Newton's method, also known as the Newton-Raphson method, is an iterative algorithm used to find the roots of a real-valued function. It can also be extended to find the minimum or maximum of a function. In this tutorial, we will explore the mathematical concepts behind Newton's method and provide Python code examples for both root-finding and optimization problems.

Finding Roots

Consider a real-valued function f(x)f(x). The goal is to find a value $x^$ such that $f(x^) = 0$. Newton's method starts with an initial guess $x_0$ and iteratively updates it using the following formula:

xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

where f(xn)f'(x_n) is the derivative of f(x)f(x) evaluated at xnx_n.

Example: Single Variable Function

Let's find the root of the function f(x)=x32x5f(x) = x^3 - 2x - 5.

def f(x):
    return x**3 - 2*x - 5

def f_prime(x):
    return 3*x**2 - 2

def newton_method(f, f_prime, x0, tolerance=1e-6, max_iterations=100):
    x = x0
    for i in range(max_iterations):
        fx = f(x)
        if abs(fx) < tolerance:
            return x
        x = x - fx / f_prime(x)
    return x

x0 = 2  # Initial guess
root = newton_method(f, f_prime, x0)
print(f"The root is approximately {root:.6f}")

Output:

The root is approximately 2.094551

Optimization

Newton's method can be extended to find the minimum or maximum of a function. In this case, we iteratively update the point $x_n$ using the following formula:

xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}

where f(xn)f'(x_n) and f(xn)f''(x_n) are the first and second derivatives of f(x)f(x) evaluated at xnx_n.

Multivariate Optimization

For a multivariate function f(x1,x2,,xn)f(x_1, x_2, \ldots, x_n), the gradient vector f(x)\nabla f(x) and the Hessian matrix H(x)H(x) are defined as follows:

Gradient vector: f(x)=[fx1fx2fxn]\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}

Hessian matrix: H(x)=[2fx122fx1x22fx1xn2fx2x12fx222fx2xn2fxnx12fxnx22fxn2]H(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}

Newton's method for optimization in the multivariate case iteratively updates the point $x_n$ using the following formula:

xn+1=xnH1(xn)f(xn)x_{n+1} = x_n - H^{-1}(x_n) \nabla f(x_n)

where H1(xn)H^{-1}(x_n) is the inverse of the Hessian matrix evaluated at xnx_n.

Note that the Hessian matrix is symmetric, i.e., 2fxixj=2fxjxi\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i} for all ii and jj.

The algorithm for Newton's method in the multivariate case can be summarized as follows:

  1. Choose an initial point x0x_0.
  2. Repeat until convergence:
    • Compute the gradient vector f(xn)\nabla f(x_n) and the Hessian matrix H(xn)H(x_n) at the current point xnx_n.
    • Compute the search direction dn=H1(xn)f(xn)d_n = -H^{-1}(x_n) \nabla f(x_n).
    • Update the point: xn+1=xn+dnx_{n+1} = x_n + d_n.
  3. Return the final point xnx_n as the approximate minimum or maximum.

The convergence of Newton's method in the multivariate case depends on the properties of the function ff and the initial point x0x_0. If the function is convex (for minimization) or concave (for maximization) and the Hessian matrix is positive definite (for minimization) or negative definite (for maximization) in the neighborhood of the solution, Newton's method typically converges quadratically to the optimal point.

Example: Single Variable Function

Let's find the minimum of the function f(x)=x44x3+4x2f(x) = x^4 - 4x^3 + 4x^2.

def f(x):
    return x**4 - 4*x**3 + 4*x**2

def f_prime(x):
    return 4*x**3 - 12*x**2 + 8*x

def f_double_prime(x):
    return 12*x**2 - 24*x + 8

def newton_method_optimization(f, f_prime, f_double_prime, x0, tolerance=1e-6, max_iterations=100):
    x = x0
    for i in range(max_iterations):
        fx_prime = f_prime(x)
        if abs(fx_prime) < tolerance:
            return x
        x = x - fx_prime / f_double_prime(x)
    return x

x0 = .5  # Initial guess
minimum = newton_method_optimization(f, f_prime, f_double_prime, x0)
print(f"The minimum is approximately {minimum:.6f}")

Output:

The minimum is approximately 2.000000

Note that f(x)=x44x3+4x2f(x) = x^4 - 4x^3 + 4x^2 has to minima at x=0,2x=0,2 and one maximum at x=1x=1. Try to change the initial condition x0x_0 in the code to start near zero and at -.5 and watch the results. Global minimum is not a given if the function is not convex.

Example: Two Variable Function

Let's find the minimum of the function f(x,y)=x2+y22x4y+3f(x, y) = x^2 + y^2 - 2x - 4y + 3.

import numpy as np

def f(x, y):
    return x**2 + y**2 - 2*x - 4*y + 3

def f_gradient(x, y):
    return np.array([2*x - 2, 2*y - 4])

def f_hessian(x, y):
    return np.array([[2, 0], [0, 2]])

def newton_method_optimization_2d(f, f_gradient, f_hessian, x0, y0, tolerance=1e-6, max_iterations=100):
    x, y = x0, y0
    for i in range(max_iterations):
        grad = f_gradient(x, y)
        if np.linalg.norm(grad) < tolerance:
            return x, y
        hessian = f_hessian(x, y)
        direction = np.linalg.solve(hessian, -grad)
        x += direction[0]
        y += direction[1]
    return x, y

x0, y0 = 0, 0  # Initial guess
minimum_x, minimum_y = newton_method_optimization_2d(f, f_gradient, f_hessian, x0, y0)
print(f"The minimum is approximately ({minimum_x:.6f}, {minimum_y:.6f})")

Output:

The minimum is approximately (1.000000, 2.000000)

In the two-variable example, we use the gradient vector and the Hessian matrix to update the point (x,y)(x, y) in each iteration. The gradient vector represents the first-order partial derivatives, and the Hessian matrix represents the second-order partial derivatives of the function.

Conclusion

Newton's method is a powerful iterative algorithm for finding roots and optimizing functions. It leverages the first and second derivatives to converge quickly to the desired solution. However, it's important to note that the method requires the function to be differentiable and may not always converge if the initial guess is far from the solution or if the function has certain properties (e.g., non-convexity).

This tutorial provided a mathematical overview of Newton's method and demonstrated its implementation in Python for both root-finding and optimization problems. By understanding the concepts and adapting the code examples, you can apply Newton's method to solve various mathematical problems efficiently.