Fast Fourier Transform: Algorithm Principles and Iterative Implementation
Polynomial Representations
A polynomial $A(x) = \sum_{i=0}^{n-1} a_i x^i$ of degree $n-1$ admits two canonical representations. The coefficient representation stores the vector $[a_0, a_1, \ldots, a_{n-1}]$. Alternatively, the point-value representation consists of $n$ distinct evaluation pairs ${(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_{n-1}, A(x_{n-1}))}$.
Point-value representations enable $O(n)$ polynomial multiplication: if $C(x) = A(x) \cdot B(x)$, then $C(x_i) = A(x_i) \cdot B(x_i)$ for any evaluation point $x_i$. The computational bottleneck lies in converting between coefficient and point-value forms efficiently.
Complex Roots of Unity
The Discrete Fourier Transform evaluates polynomials at the $n$-th roots of unity. Define $\omega_n = e^{2\pi i/n}$, where these complex numbers satisfy:
- Periodicity: $\omega_n^{k+n} = \omega_n^k$
- Reflection: $\omega_n^{k+n/2} = -\omega_n^k$ for even $n$
- Halving: $\omega_n^{2k} = \omega_{n/2}^k$
These properties enable the decomposition of large DFTs into smaller transforms.
The Forward Transform
To compute $A(\omega_n^k)$ for all $k \in [0, n-1)$, partition $A(x)$ by parity of indices:
$$A(x) = A_{\text{even}}(x^2) + x \cdot A_{\text{odd}}(x^2)$$
where $A_{\text{even}}$ contians coefficients $a_0, a_2, \ldots$ and $A_{\text{odd}}$ contains $a_1, a_3, \ldots$. Substituting $\omega_n^k$ yields:
$$A(\omega_n^k) = A_{\text{even}}(\omega_{n/2}^k) + \omega_n^k \cdot A_{\text{odd}}(\omega_{n/2}^k)$$
For indices in the second half ($k + n/2$):
$$A(\omega_n^{k+n/2}) = A_{\text{even}}(\omega_{n/2}^k) - \omega_n^k \cdot A_{\text{odd}}(\omega_{n/2}^k)$$
This recurrence reduces the time complexity from $O(n^2)$ to $O(n \log n)$ via divide-and-conquer.
Inverse Transformation
Recovering coefficients from point-values requires evaluating the polynomial at the conjugate roots $\omega_n^{-k}$ and normalizing by $n$:
$$a_k = \frac{1}{n} \sum_{j=0}^{n-1} y_j \omega_n^{-jk}$$
where $y_j = A(\omega_n^j)$. This symmetry permits using identical algorithmic strucutre for both forward and inverse transforms, differing only by the sign of the exponent and a scaling factor.
Iterative Optimization
Bit-Reversal Permutation
The recursive decomposition induces a specific ordering equivalent to bit-reversed indices. For a transform of size $n = 2^m$, if $j$ represents the bitwise reversal of $i$ (e.g., $001_2 \to 100_2$ for $n=8$), swapping elements at positions $i$ and $j$ (when $j > i$) eliminates recursive calls.
Compute reversals iteratively:
rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (logN - 1));
Butterfly Operations
Each computational level processes pairs of elements according to the butterfly pattern:
upper = even + omega * odd
lower = even - omega * odd
Processing by increasing segment lengths (2, 4, 8, ..., n) with in-place updates maximizes cache locality compared to the recursive approach.
Implementation
#include <bits/stdc++.h>
using namespace std;
struct Complex {
double real, imag;
Complex(double r = 0, double i = 0) : real(r), imag(i) {}
Complex operator+(const Complex& o) const { return Complex(real + o.real, imag + o.imag); }
Complex operator-(const Complex& o) const { return Complex(real - o.real, imag - o.imag); }
Complex operator*(const Complex& o) const {
return Complex(real * o.real - imag * o.imag, real * o.imag + imag * o.real);
}
};
void fft(vector<Complex>& data, bool invert) {
int n = data.size();
int logN = __builtin_ctz(n);
vector<int> bitRev(n);
for (int i = 0; i < n; ++i) {
bitRev[i] = (bitRev[i >> 1] >> 1) | ((i & 1) << (logN - 1));
if (bitRev[i] > i) swap(data[i], data[bitRev[i]]);
}
for (int len = 2; len <= n; len <<= 1) {
double angle = 2 * M_PI / len * (invert ? -1 : 1);
Complex omega(cos(angle), sin(angle));
for (int i = 0; i < n; i += len) {
Complex factor(1, 0);
int half = len >> 1;
for (int j = 0; j < half; ++j) {
Complex even = data[i + j];
Complex odd = factor * data[i + j + half];
data[i + j] = even + odd;
data[i + j + half] = even - odd;
factor = factor * omega;
}
}
}
if (invert) {
for (auto& x : data) {
x.real /= n;
x.imag /= n;
}
}
}
vector<int> multiply(const vector<int>& a, const vector<int>& b) {
int resultSize = 1;
while (resultSize < (int)a.size() + (int)b.size()) resultSize <<= 1;
vector<Complex> fa(resultSize), fb(resultSize);
for (int i = 0; i < (int)a.size(); ++i) fa[i] = Complex(a[i], 0);
for (int i = 0; i < (int)b.size(); ++i) fb[i] = Complex(b[i], 0);
fft(fa, false);
fft(fb, false);
for (int i = 0; i < resultSize; ++i) fa[i] = fa[i] * fb[i];
fft(fa, true);
vector<int> result(resultSize);
for (int i = 0; i < resultSize; ++i) result[i] = (int)(fa[i].real + 0.5);
return result;
}