Yet another attempt at developing intuition for the Fast Fourier Transform

Fourier analysis is one of the key building blocks for much of the technology in the modern world. Its widespread applicability and use are due in large part to the development of an efficient implementation of the discrete Fourier transform: the Fast Fourier Transform ("FFT" for short).

However, it can be a bit difficult to get a firm intuitive grasp on the FFT. Having struggled with it for a while, I found a way to look at the FFT that intuitively clicked; this note lays that approach out in the hope that at least some others will find it useful in augmenting their intuition.

Here's a first formulation of a related problem.

Given a vector y of length N, find a vector beta such that

    y = T beta
where T is a special matrix to be described below. T will turn out to be invertible, so a simple way to solve the problem is
    beta = inverse(T) y
A handy way to look at this is that we are simply re-expressing y in terms of a new set of basis vectors, the columns of T.

So, what are the vectors that make up the columns of T? They are based on the trigonometric functions sin(x) and cos(x) at various frequencies (with the exception of the first column, as noted below).

Specifically, the first column of T is a vector of all 1's. (This is used to "move y up or down" so that it is centered on the X axis.)

For the second vector, create a sin curve that goes from 0 to N in one "loop", i.e.,

    sin(2 PI t / N), t in [0 .. N]
and create an N-vector by sampling it at the values 0, 1, .. N-1.

The third vector is paired with the second one; it is the cosine curve that goes from 0 to N in one loop, again sampled at the values 0, 1, .. N-1.

The fourth and fifth columns of the T matrix are sin and cosine curves made from two "loops" going from 0 to N.

Then three loops, then four loops, etc. until you have N columns.
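
To make the construction concrete, here is a small Ruby sketch (my own, not part of the original note) that builds T for a given even N. One wrinkle worth noting: for even N, the sin curve at the final frequency (N/2 loops) samples to all zeros at the integer points, so this sketch keeps only the cosine of that last pair.

```ruby
# Build the N x N real trig basis T described above (a sketch, assuming
# even N). Column 0 is all 1's; then (sin, cos) pairs of 1, 2, ... loops.
def build_T(n)
  cols = [Array.new(n, 1.0)]
  freq = 1
  while cols.size < n
    s = (0...n).map { |t| Math.sin(2 * Math::PI * freq * t / n) }
    c = (0...n).map { |t| Math.cos(2 * Math::PI * freq * t / n) }
    # At freq = N/2 the sin samples are identically zero; skip that column.
    cols << s unless s.all? { |v| v.abs < 1e-12 }
    cols << c if cols.size < n
    freq += 1
  end
  cols.transpose   # cols were built column-by-column; transpose to rows
end
```

Multiplying this T by a coefficient vector beta rebuilds the time-domain vector y.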

So, the vector beta re-expresses the original vector y in a new basis where that basis de-composes y in terms of {sin,cos} pairs of various frequencies.

An interlude on motivation

Why would you want to do this recasting? My impression is that engineers find it incredibly useful to do things like take a digitally represented sound track, break it up into small pieces, re-express each piece in terms of constituent frequencies as described above, hack on those frequencies, and then map this back to the "time domain" (i.e., multiply the modified beta vector by T) and concatenate the pieces back together.

Flight dynamics people are interested in how and when an aircraft will respond to control inputs, vibration modes, oscillations, etc., and this sort of frequency analysis is very useful to them as well.

An interesting and crucial geometrical observation

Consider a given frequency "Freq" (a real number interpreted as how often something happens). For example if Freq were 10, this might mean something is happening repeatedly at a rate of 10 times per second.

Here is the interesting observation: For any real A and B, offset1 and offset2,

    A sin(Freq * t + offset1) + B sin(Freq * t + offset2)
is just another sin curve of the form
    K sin(Freq * t + offset)
In other words, for a given frequency, trigonometric curves are closed under scaling, phase shifting, and addition. Moreover, given any sin curve of some amplitude K1 and phase K2, you can find an A and B such that
    A sin(theta) + B cos(theta)
equals it. These observations are crucial to Fourier analysis; they say that any stretched and shifted sin curve of a given frequency can be re-expressed as the sum of a pair of unshifted trigonometric curves of the same frequency.
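
A quick numeric sanity check of this claim (my own sketch, not from the note): for any A and B, the curve A sin(x) + B cos(x) matches K sin(x + phi) where K = sqrt(A^2 + B^2) and phi = atan2(B, A).

```ruby
# Verify A*sin(x) + B*cos(x) == K*sin(x + phi) numerically,
# for one arbitrary choice of A and B.
a, b = 3.0, -2.0
k   = Math.hypot(a, b)        # amplitude K = sqrt(A^2 + B^2)
phi = Math.atan2(b, a)        # phase offset
max_err = (0..100).map { |i|
  x = i * 0.1
  ((a * Math.sin(x) + b * Math.cos(x)) - k * Math.sin(x + phi)).abs
}.max
```

The maximum discrepancy over the sampled range is at the level of floating-point noise.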

A handy way to visualize this is to think of

    A sin(Freq * t + offset1)
as a vector on the plane which rotates counterclockwise like the hand of an Alice in Wonderland clock. This clock hand has length A. At time t=0, the hand of the clock is at angle offset1. As t grows, the hand spins around counterclockwise, and returns to its original position every
    2 PI / Freq
time units. Another clock hand has length B, and starts at position offset2 at time t=0. We can imagine viewing these two clock hands as vectors, and adding them together to get a third vector with length C and position offset3 at time t=0. This latter vector is simply another clock hand that rotates with frequency Freq, and can be characterized by the formula
    C sin(Freq * t + offset3).

Author's note: Thanks for explaining the above to me, Dad.

Back to the main story

The next intuitive step is to consider a different but related matrix I will call W. The elements of W are complex numbers. Again, the problem is, given a vector y (which can now be complex, but as a practical matter is almost always real), find beta such that

y = W beta.

Again, "beta = inverse(W) y" is a perfectly fine solution in principle, but it is slow to compute directly.

(The transition from real-valued trigonometric vectors to complex vectors is a conceptual annoyance at first, but once you get comfortable with it the yield in terms of mathematical convenience is enormous. And, the FFT algorithm crucially depends on this transition.)

So, what is W? Again, the first column is a vector of 1.'s.

For the second column, take the unit circle on the complex plane, start where it crosses the positive X axis, and go counter-clockwise in N equally spaced (smallest possible) steps around the circle. (I.e., cut the unit circle up into N pie slices, and step around the edges of the slices.)

For the third column, you do the same thing, but going in the clockwise direction instead.

So, one of these vectors goes around the unit circle counter-clockwise, and the other goes around the unit circle clockwise. Note that respective pairs of points in the two vectors line up vertically, and they are the same distance above (resp. below) the X axis on the complex plane.

Fleshing out the remaining columns of W proceeds much as it did for the columns of T above.

The fourth column is made by again starting on the X axis, and taking N counter-clockwise steps around the complex unit circle, but this time using twice the step size. Doing this, you will go around the circle twice in N steps.

For the fifth column you do the same thing clockwise.

Then 3 steps at a time counter-clockwise (resp. clockwise), then 4 steps, etc.

Just keep going this way until you get N columns for your complete W matrix.

(Assuming N is even, the last "pair" formed this way will be two identical vectors of alternating 1's and -1's, so you only use one of them.)
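
The walk around the unit circle is short to code up. Here is a Ruby sketch (mine, mirroring the description above, and again assuming even N) that builds W column by column:

```ruby
# Build the N x N complex matrix W (a sketch, assuming even N).
# Column 0 is all 1's; then pairs of counter-clockwise / clockwise
# walks around the unit circle with step sizes 1, 2, 3, ...
def build_W(n)
  cols = [Array.new(n, Complex(1, 0))]
  step = 1
  while cols.size < n
    ccw = (0...n).map { |t| Complex.polar(1.0, 2 * Math::PI * step * t / n) }
    cw  = (0...n).map { |t| Complex.polar(1.0, -2 * Math::PI * step * t / n) }
    cols << ccw
    cols << cw if cols.size < n  # the Nyquist "pair" collapses to one column
    step += 1
  end
  cols.transpose
end
```

The columns produced this way are mutually orthogonal under the complex inner product, which is the property that will matter below.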

Now, we define an important quantity:

omega_N is defined to be exp(2 PI i / N).

(Geometrically, envision omega_N as follows: take the unit circle, and divide it up into N equal pie-slices. omega_N is the point on the unit circle that is one pie-slice width up from where the unit circle hits the positive X axis.)

In terms of omega_N, the second and third vectors above are

  | 1.              |
  | omega_N         |
  | omega_N ^ 2     |
  | omega_N ^ 3     |
  | ...             |
  | omega_N ^ (N-1) |


  | 1.               |
  | omega_N ^ -1     |
  | omega_N ^ -2     |
  | omega_N ^ -3     |
  | ...              |
  | omega_N ^ -(N-1) |

respectively. (Because exponentiation on the unit circle on the complex plane moves you around but leaves you on the unit circle, we can always use modular arithmetic to re-express the exponents of omega_N. So, the first few elements of the second vector above could have been written "1.", "omega_N ^ (N-1)", "omega_N ^ (N-2)" etc.)
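
For instance, a tiny check of the modular-arithmetic point (my own, using N = 8):

```ruby
# omega_8^-1 lands on the same point of the unit circle as omega_8^7.
omega = Complex.polar(1.0, 2 * Math::PI / 8)
diff = (omega**-1 - omega**7).abs
```

The difference is zero up to floating-point noise.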

As a notational convenience that will be very useful to the FFT algorithm as described below, I will call the first vector above "circlevec_N". It is just the vector made up of N equally spaced points around the unit circle starting at <1., 0.> and going counter-clockwise.

Using Euler identities, the first vector has elements

    cos(theta) + i sin(theta)

    theta = 0, 1 * 2 PI / N, 2 * 2 PI / N, .. (N-1) * 2 PI / N

and the second vector has elements

    cos(theta) - i sin(theta)

    theta = 0, 1 * 2 PI / N, 2 * 2 PI / N, .. (N-1) * 2 PI / N

So now, you have these basis vectors composed of all kinds of ugly complex numbers.

How do you use them to express real-valued vectors? It is interesting to look at a couple of examples. Let's say your real-valued function is cos(theta).

Well, picking constants of 0.5 and 0.5 to multiply columns 1 and 2 of W will do the trick:

    0.5 * (cos(theta) + i sin(theta))
  + 0.5 * (cos(theta) - i sin(theta))

The imaginary sin terms are canceled out and what is left is the real-valued vector cos(theta).

Similarly, if your desired function is sin(theta), the constants "-0.5 * i" and "0.5 * i" will work:

  - 0.5 * i * (cos(theta) + i sin(theta))
  + 0.5 * i * (cos(theta) - i sin(theta))

The cosine terms cancel out, and the i^2 factors cause the sin terms to add up to the real-valued function sin(theta).

(In fact, if the multipliers of the pair of columns at a given frequency K are any complex conjugates "a + bi" and "a - bi", the imaginary terms will always cancel out and the result will be real-valued at that frequency, specifically "2a cos(K theta) - 2b sin(K theta)".)
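
Here is a numeric check of that parenthetical claim (my own sketch): conjugate coefficients a+bi and a-bi on the counter-clockwise and clockwise columns at frequency K yield the real sequence 2a cos(K theta) - 2b sin(K theta).

```ruby
# Combine the frequency-k column pair with conjugate coefficients and
# confirm the result is real and equals 2a*cos(k*theta) - 2b*sin(k*theta).
n, k = 8, 2
a, b = 0.7, -1.3
max_err = (0...n).map { |t|
  theta = 2 * Math::PI * k * t / n
  z = Complex(a, b) * Complex.polar(1.0, theta) +
      Complex(a, -b) * Complex.polar(1.0, -theta)
  [z.imag.abs,
   (z.real - (2 * a * Math.cos(theta) - 2 * b * Math.sin(theta))).abs].max
}.max
```

Both the imaginary residue and the deviation from the predicted real curve are at floating-point noise level.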

Given the above hopefully intuitive description of the vectors of W, it is now useful to re-arrange them to make the description of the FFT algorithm cleaner. Column K of the re-arranged W matrix will be circlevec_N raised to the Kth power:

   | 1.  1.            1.                 ...   1.                    |
   | 1.  omega^1       omega^2            ...   omega^(N-1)           |
   | 1.  omega^2       omega^4            ...   omega^(2 * (N-1))     |
   | 1.  omega^3       omega^6            ...   omega^(3 * (N-1))     |
   | ..                                                               |
   | 1.  omega^(N-1)   omega^((N-1) * 2)  ...   omega^((N-1) * (N-1)) |

(I have used modular arithmetic to change the negative exponents used to define W above to positive ones here.)

Another interlude on motivation

Computer scientists have found a useful application of the FFT that is a bit more abstract than the previous examples: fast multiplication of polynomials.

If we want to multiply two polynomials f(x) and g(x) of degree n, we can always imitate the fourth-grade method for multiplying multiple-digit numbers: do roughly n^2 multiplications of all combinations of coefficients from f(x) and g(x), and add these products up to find the product polynomial. However, we can use the FFT to do much better than this.

A polynomial

    f(x) = c{0} + c{1} * x ..  + c{n-1} * x^(n-1) + c{n} * x^n
can be uniquely specified by giving its values on n+1 distinct points. (A straight line on the plane can be specified by giving two points on the line, a parabola can be specified by giving three points, etc.) Looking at the above matrix of powers of omega (taking N = n+1), if we post-multiply that matrix by the vector of coefficients
    [c{0}, c{1}, .. c{n}]'
of the polynomial, the output is exactly the evaluation of the polynomial at the n+1 points
    1, omega, omega^2, .. omega^(N-1).

To multiply f(x) and g(x) efficiently, we can take the FFT of each of the vectors of coefficients, multiply the resulting values pointwise, and then reverse-FFT those products to obtain the coefficients of the product polynomial. (Since the product polynomial has degree 2n, the coefficient vectors must first be zero-padded to a common length of at least 2n+1.) This is all accomplished in time O(n log n), as opposed to the O(n^2) time required for the fourth-grade approach.

One practical application of this is to multiply very large numbers, which is needed for various encryption algorithms. (When you visit a secure web site to pay with a credit card, you are relying on these sorts of encryption algorithms.)
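
To see the evaluate / multiply-pointwise / interpolate idea in code, here is a Ruby sketch (mine, not from the note). For brevity it uses the O(n^2) DFT matrix product directly for each transform; swapping in the FFT for the two forward transforms and the inverse transform is what yields the O(n log n) bound.

```ruby
# Naive DFT: evaluate a coefficient vector at the n-th roots of unity
# (sign = +1), or interpolate back (sign = -1, then divide by n).
def dft(v, sign)
  n = v.size
  (0...n).map { |j|
    (0...n).sum { |t| v[t] * Complex.polar(1.0, sign * 2 * Math::PI * j * t / n) }
  }
end

# Multiply polynomials given as coefficient vectors (lowest degree first).
def poly_mul(f, g)
  n = f.size + g.size - 1                  # the product has this many coefficients
  fe = dft(f + [0] * (n - f.size), 1)      # evaluate f at n points
  ge = dft(g + [0] * (n - g.size), 1)      # evaluate g at n points
  prod = fe.zip(ge).map { |u, v| u * v }   # multiply pointwise
  dft(prod, -1).map { |c| (c / n).real.round(9) }  # interpolate
end
```

For example, poly_mul([1, 2], [3, 4]) computes (1 + 2x)(3 + 4x) and returns [3.0, 10.0, 8.0].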

Back to the main story again..

The above matrix is symmetric, and importantly, its columns (and rows) are mutually orthogonal under the complex inner product, each of length sqrt(N). So, modulo a scalar constant (1/N), the inverse of W is obtained by taking the complex conjugate of each element of the matrix W. Call this inverse matrix U.

Geometrically, the U matrix can be constructed in a manner analogous to the W matrix, but at each step going in the opposite direction (clockwise versus counter-clockwise) from the direction we went for the W matrix.

So, the discrete fourier transform beta of y can be defined in either of two equivalent ways (modulo a scalar constant):

    y = W beta


    U y = beta.

(An annoying, inconvenient side note: as far as I can tell, engineers and computer scientists reverse the roles of U and W above. Until now I have given the engineering method because it seems more intuitive to me, but I will now "switch horses" to agree with the normal computer-science convention, and define W in terms of omega_N^-1 and U in terms of omega_N.)

With the above lead-in, here is the FFT algorithm:

input:  a vector y of length N, where N is a power of 2.

output:  a vector beta such that y = Wn beta, where "Wn" is the NxN W matrix.

beta = FFT(y):

    /* We will find beta by computing U y.
     * With this formulation, the FFT problem is nothing more than
     * an attempt to multiply our input vector y by a "special" matrix Un
     * in N log N time.
     */

    if len(y) == 1 then
        beta = y
    else
        /* break y into 2 vectors of length N/2 */

        let y_even be the even-indexed elements of y.   /* y[0], y[2], ... */
        let y_odd be the odd-indexed elements of y.     /* y[1], y[3], ... */

        beta_even = FFT(y_even)
        beta_odd  = FFT(y_odd)

        /* the above vectors are of length N/2;
         * concatenate two copies of each to create vectors of length N:
         */

        beta_even_N = concatenate(beta_even, beta_even)
        beta_odd_N = concatenate(beta_odd, beta_odd)

        /* Do a by-element multiply of beta_odd_N by "circlevec_N", the
         * complex N-vector gotten by dividing the complex unit circle into
         * N equal segments starting at 1 + 0i and going counter-clockwise.
         */

        beta = beta_even_N + circlevec_N * beta_odd_N

A working Ruby implementation of the above algorithm (designed for understandability rather than efficiency) can be found at "A really small Ruby implementation of the Fast Fourier Transform algorithm".

(There are two notations for multiplication floating around in this discussion: the juxtaposition of two matrices "M N" means the standard matrix multiply, whereas the notation "M * N" is used to denote element-by-element multiplication of two matrices or multiplication of a matrix by a scalar.)

(There are lots of cute tricks that will make the above algorithm faster by a constant factor, but it already runs in time N log N, and it seems worthwhile to hold off on such constant-factor speedups for the sake of clarity and understandability.)
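
The pseudocode above translates almost line-for-line into Ruby. Here is a sketch of such an implementation (my own, again favoring clarity over speed):

```ruby
# Compute beta = U y, where column k of U is circlevec_N raised to the
# k-th power. Assumes y.size is a power of 2 and y holds Complex values.
def fft(y)
  n = y.size
  return y.dup if n == 1
  beta_even = fft((0...n).step(2).map { |t| y[t] })   # y[0], y[2], ...
  beta_odd  = fft((1...n).step(2).map { |t| y[t] })   # y[1], y[3], ...
  circlevec = (0...n).map { |t| Complex.polar(1.0, 2 * Math::PI * t / n) }
  even_n = beta_even + beta_even    # concatenate two copies (length n)
  odd_n  = beta_odd + beta_odd
  (0...n).map { |t| even_n[t] + circlevec[t] * odd_n[t] }
end
```

Checking its output against a direct multiplication by the matrix of powers of omega confirms the two agree.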

Here's why the algorithm works.

We are computing "U y", where the columns of U are instances of our vector circlevec_N raised to various powers (the powers 0, 1, 2, .., N-1).

Given that we are looking for a vector beta such that "beta = U y", it turns out to be convenient to break the square matrix U into two "tall, thin" rectangular matrices U_even and U_odd.

(Note, by the way, that although we are talking a lot about matrices, in the FFT algorithm they are all fictitious; we never create or manipulate any matrices in the actual algorithm. For debugging and algorithm understanding, however, it is pretty handy to code up these matrix algorithms explicitly and play with them.)

U_even consists of the N/2 even columns of U, and U_odd consists of the N/2 odd columns of U. (The columns are numbered 0 .. N-1.)

Clearly, given beta such that beta = U y,

    beta = (U_even y_even) + (U_odd y_odd).

The square matrix of size (N/2) x (N/2) which is the top half of U_even (call it "U_even_top") is exactly U_(N/2), our familiar matrix U, but based on a new "omega" that takes twice the stepsize around the unit circle as our original omega_N.

This is (for me) easiest to see geometrically: Each of the even columns of the original matrix Un (i.e., each of the columns of U_even), is gotten by taking steps around the unit circle that are twice as long as the steps used to define the original matrix Un.

Furthermore, because of modular arithmetic over the unit circle on the complex plane and the fact that N is even, the bottom half of U_even exactly matches the top half of U_even. So, we can simply make two copies of the matrix product "U_even_top y_even" (gotten via the recursion) to get the desired N-vector beta_even = U_even y_even.
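
These two structural facts (the bottom half of U_even duplicates its top half, and the top half is exactly U_(N/2)) are easy to confirm numerically; here is a sketch of mine for N = 8:

```ruby
# U[t][k] = omega_n^(t*k); check the even-column structure for n = 8.
def u_matrix(n)
  (0...n).map { |t|
    (0...n).map { |k| Complex.polar(1.0, 2 * Math::PI * t * k / n) }
  }
end

n = 8
u_even = u_matrix(n).map { |row| (0...n).select(&:even?).map { |k| row[k] } }
top    = u_even[0, n / 2]        # rows 0 .. n/2 - 1
bottom = u_even[n / 2, n / 2]    # rows n/2 .. n - 1
half   = u_matrix(n / 2)         # the (n/2)-point matrix U_(N/2)
dup_err  = top.flatten.zip(bottom.flatten).map { |a, b| (a - b).abs }.max
half_err = top.flatten.zip(half.flatten).map { |a, b| (a - b).abs }.max
```

Both errors come out at floating-point noise level, confirming the duplication and the recursion to the half-size matrix.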

To treat the odd columns, U_odd, we note that they are odd powers of the circlevec_N vector, and we can factor out a copy of that vector. To be a bit proper about it, we need to form an NxN diagonal matrix with circlevec_N down the main diagonal and zeroes elsewhere. Then, we have

    U_odd = diag(circlevec_N) U_even.

Substituting, we get the following equation:

    beta = (U_even y_even) + (diag(circlevec_N) U_even) y_odd.

Using associativity of matrix multiplication, we can first solve "U_even y_odd", and then left-multiply the resulting N-vector by diag(circlevec_N).

But "U_even y_odd" is just another instance of the recursive sub-problem, which we can solve to yield an intermediate N-vector beta_odd_1. (As above, we recursively solve for the top half of U_even and then concatenate two copies of the result to get the N-vector.)

Furthermore, the left-multiply of beta_odd_1 by diag(circlevec_N) can be replaced by an element-by-element multiply of beta_odd_1 by the N-vector circlevec_N, to yield our desired beta_odd without the need for an N-squared matrix multiply.

We then add the even and odd sub-vectors together to arrive at the final desired result.