Wav-KAN: Wavelet Kolmogorov-Arnold Networks

The paper "Wav-KAN: Wavelet Kolmogorov-Arnold Networks" by Zavareh Bozorgasl and Hao Chen [1]. introduces a novel neural network architecture called Wav-KAN that integrates wavelet functions within the Kolmogorov-Arnold Networks (KAN) framework to enhance interpretability and performance. The authors address the limitations of traditional multilayer perceptrons (MLPs) and recent advancements like Spl-KAN, which face challenges related to interpretability, training speed, robustness, computational efficiency, and performance.

Wav-KAN incorporates wavelet functions into the KAN structure, enabling the network to efficiently capture both high-frequency and low-frequency components of the input data. The wavelet-based approximations employ orthogonal or semi-orthogonal basis functions and balance accurate representation of the underlying data structure against overfitting to noise.
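To illustrate this multiresolution idea, the sketch below uses the PyWavelets package (our choice purely for illustration; the paper itself builds wavelets directly into the network) to split a toy signal into low-frequency approximation and high-frequency detail coefficients:

```python
import numpy as np
import pywt  # PyWavelets, assumed here purely for illustration

# A toy signal: a slow trend (low frequency) plus a fast oscillation (high frequency)
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 60 * t)

# One level of discrete wavelet decomposition with a Daubechies-4 wavelet:
# `approx` holds the low-frequency content, `detail` the high-frequency content.
approx, detail = pywt.dwt(signal, "db4")

print(approx.shape, detail.shape)  # each roughly half the original length
```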

In KANs, the weights are replaced by learnable univariate functions placed on the edges of the network rather than on its nodes. The output of each layer is computed as:

$$x^{(l+1)} = T_o \left(\Psi^{l+1,l}(X^l)\right)$$

where $\Psi^{l+1,l}$ represents the matrix of activation functions connecting layer $l$ to layer $l+1$, and $T_o$ is an operator that sums the elements of each row of its matrix argument. The complete formula for the entire KAN network with $L$ layers is:

$$f_{KAN}(X^0) = x^{(L)} = T_o \left(\Psi^{L,L-1} \left( \left(T_o \left(\Psi^{L-1,L-2} \cdots \left(T_o \left(\Psi^{1,0}(X^0)\right)\right)^T\right)^T \cdots \right)^T\right)\right)$$
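To make the layer equation concrete, here is a minimal PyTorch sketch of one such layer, with the per-edge univariate functions realized as Mexican hat wavelet activations in the spirit of Wav-KAN; the class and parameter names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class WavKANLayer(nn.Module):
    """One KAN layer: a learnable univariate function on every edge (i, j),
    followed by the row-sum operator T_o. Names are illustrative."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Per-edge scale s, translation tau, and weight, as in psi((x - tau) / s)
        self.scale = nn.Parameter(torch.ones(out_features, in_features))
        self.translation = nn.Parameter(torch.zeros(out_features, in_features))
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> broadcast to (batch, out_features, in_features)
        z = (x.unsqueeze(1) - self.translation) / self.scale
        # Mexican hat mother wavelet psi(z) = (1 - z^2) exp(-z^2 / 2), up to a constant
        psi = (1.0 - z**2) * torch.exp(-0.5 * z**2)
        # T_o: sum each row, i.e. reduce over the in_features axis
        return (self.weight * psi).sum(dim=-1)

# Stacking layers gives f_KAN(X^0) = x^(L)
net = nn.Sequential(WavKANLayer(2, 8), WavKANLayer(8, 1))
y = net(torch.randn(16, 2))  # shape (16, 1)
```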

The paper also discusses the Continuous Wavelet Transform (CWT) and the criteria a mother wavelet must satisfy to serve as a basis for function approximation. The CWT represents a signal/function by its wavelet coefficients, calculated as:

$$C(s, \tau) = \int_{-\infty}^{+\infty} g(t) \frac{1}{\sqrt{s}} \psi \left(\frac{t - \tau}{s}\right) dt$$

where $g(t)$ is the signal/function, $\psi(t)$ is the mother wavelet, $s$ is the scale factor, and $\tau$ is the shift factor.
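As a sanity check on the definition, the following NumPy sketch evaluates a single coefficient $C(s, \tau)$ by direct Riemann summation of the integral; this is a toy illustration, not the paper's implementation.

```python
import numpy as np

def mexican_hat(t):
    # Mother wavelet psi(t): Mexican hat with sigma = 1
    return (2 / (np.pi**0.25 * np.sqrt(3))) * (1 - t**2) * np.exp(-t**2 / 2)

def cwt_coefficient(g, t, s, tau):
    """C(s, tau) = int g(t) (1/sqrt(s)) psi((t - tau)/s) dt, as a Riemann sum."""
    dt = t[1] - t[0]
    return np.sum(g * (1 / np.sqrt(s)) * mexican_hat((t - tau) / s)) * dt

t = np.linspace(-5, 5, 2000)
g = np.sin(2 * np.pi * t)                 # the signal g(t)
print(cwt_coefficient(g, t, s=0.5, tau=0.0))
```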

The authors compare Wav-KAN with Spl-KAN and MLPs, highlighting the advantages of Wav-KAN in terms of accuracy, interpretability, and training speed. They also present a comparison of the number of parameters required for each architecture, showing that Wav-KAN has fewer parameters than Spl-KAN while maintaining superior performance.

Simulation results on the MNIST dataset demonstrate that Wav-KAN achieves superior accuracy and faster training speeds compared to Spl-KAN. The authors evaluate the performance of different wavelet types, such as Mexican hat, Morlet, Derivative of Gaussian (DOG), and Shannon, and find that the choice of wavelet significantly impacts the performance of the KAN model. Mexican hat and DOG wavelets prove to be particularly effective at capturing essential features while maintaining robustness against noise.

The Mexican hat wavelet is defined as:

$$\psi(t) = \frac{2}{\pi^{1/4} \sqrt{3\sigma}} \left(1 - \frac{t^2}{\sigma^2}\right) \exp \left(-\frac{t^2}{2\sigma^2}\right)$$

where $\sigma$ is the adjustable standard deviation of the underlying Gaussian.
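One of the mother-wavelet criteria mentioned above is a zero mean (admissibility). The short NumPy sketch below implements the formula and checks this property numerically for a few values of $\sigma$; the function name is ours.

```python
import numpy as np

def mexican_hat(t, sigma=1.0):
    # psi(t) = 2 / (pi^(1/4) sqrt(3*sigma)) * (1 - t^2/sigma^2) * exp(-t^2/(2*sigma^2))
    norm = 2.0 / (np.pi**0.25 * np.sqrt(3.0 * sigma))
    return norm * (1.0 - (t / sigma) ** 2) * np.exp(-(t**2) / (2.0 * sigma**2))

# A mother wavelet must integrate to zero; verify by Riemann summation
t = np.linspace(-20.0, 20.0, 100_001)
dt = t[1] - t[0]
for sigma in (0.5, 1.0, 2.0):
    print(sigma, mexican_hat(t, sigma).sum() * dt)  # each close to 0
```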

In conclusion, Wav-KAN represents a significant advance in the design of interpretable neural networks, offering a powerful and versatile model that combines high performance with interpretability. The authors anticipate that Wav-KAN will be adopted across many fields and implemented in popular machine learning libraries such as PyTorch and TensorFlow. The paper sets the stage for further exploration and implementation of Wav-KAN, positioning wavelets as a widespread choice of activation function in KANs, much as ReLU and sigmoid are in networks built on the universal approximation theorem (UAT).

[1] Bozorgasl, Zavareh, and Hao Chen. "Wav-KAN: Wavelet Kolmogorov-Arnold Networks." arXiv preprint (2024).