An elementary proof of a key lemma in Shor's quantum factoring algorithm 
The quantum factoring algorithm[1] of Shor depends on a crucial lemma involving the behavior of DFT's of periodic vectors. This step in the derivation is advanced, and is a difficult and deep line of reasoning. The excellent presentation of quantum computing by Vazirani[2] notes: "the quantum Fourier transform can detect the period of a periodic vector even if it does not divide M. But the derivation is not as clean as in the case when the period does divide M, so we shall not go any further into this." 
The point of this note is to provide an alternative proof of the lemma that is based on elementary mathematics, and is relatively easy to understand. 
We will find it useful to consider a particularly simple type of vector in C^{N}, which we will call a "comb vector". Informally, a comb vector is a vector with a sequence of evenly spaced 1's, and zeroes everywhere else: 
[1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]' [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]' [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]' [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]'
Definition: A "comb vector" V in C^{N} with period p ≤ N and offset M < p is a vector of 1's and 0's with a value of 1 at each position 
i = M + k * p
such that 0 ≤ i < N for some integer k, and 0 elsewhere. 
(The symbol "*" will be used to denote standard scalar and matrix multiplication.) 
We define "DFT(V)" here as "dftMatrix * V", where 
dftMatrix[i,j] = e^{i*j * 2 I π/N} / sqrt(N).
In what follows we will use a convenient notation "θ_{N}" as a shorthand for "2 * π / N", the angle obtained by dividing a full circle into N equal angles. The above definition is then expressed as 
dftMatrix[i,j] = e^{i*j * I θN} / sqrt(N).
Definition: "Prob(V)" is a function that takes a nonzero vector V in C^{N} and normalizes it: Each coordinate of V is multiplied by its own conjugate, giving the square of the length of that coordinate, and the resulting vector is divided by its length. This gives a vector of nonnegative real values that sum to 1.0. Such a vector can be interpreted as defining a discrete probability distribution. 
Theorem: Consider a comb vector V in C^{N} with period p and offset m. (Assume N is a power of 2.) 
Form the vector P by taking the discrete Fourier transform of V and forming probabilities: 
P = Prob( DFT(V) )
Then the probabilities in the vector P are concentrated around p locations: 
P[0], P[round(N/p)], P[round(2 * (N/p)] .. P[round((p1) * (N/p))].
Specifically, each of these p locations has a value greater than 0.4 / p. 
It is important to note that even if a comb vector has a nonzero initial offset (i.e., [ 0, 0, .. , 1, 0 .. ]'), the DFT of this vector has its p equally spaced regions of high weight "shifted" to the left, and the first one is at position zero of the output vector. 
Let P be Prob(DFT(V)) as above. A few observations: 
 The probability vector P has p separate coordinates with large probability; together, these coordinates have probability at least 0.4. 
 The first coordinate of P is one of those with large probability, and the gaps between coordinates with high probability are approximately N/p. 
 If one of the expressions i * (N/p) is an integer, then P[ i * (N/p) ] has a probability of 1/p. This is the value it would have if all of the probability were uniformly concentrated on p locations. 
 If one of the expressions i * (N/p) defining the locations of high probability is exactly half way between two integers, then the two adjacent coordinates bracketing i * (N/p) each have probability at least 0.4/p. So, that local region of P has a probability of at least 0.8/p. 
 It turns out that every one of the expressions i * (N/p) that is not an integer is bracketed on either side by coordinates that add up to at least 0.8/p. 
 Consequently, at least 80% of all of the probability in the entire vector P is concentrated around p evenly spaced locations starting at coordinate 0. 
(The simplest proof of this last assertion that I can come up with is substantially more complicated than the proof of the less powerful theorem proved here.) 
Before diving into the technical details of the proof, here is a synopsis of the essential idea. 
Consider a comb vector V in C^{N} with period p and offset m: 
[0, .. 0, 1, 0, .. 0, .. 0, 0, .. 0, 1, 0, .. 0, .. 0, .. 0, .. 0, 1, 0, .. 0]'.
(We make no assumption that p evenly divides N.) 
We will decompose V into a particularly useful set of p vectors
V_{0}, .. V_{p1} that exactly sum to
V.
[e^{( 0 * I θN)}, e^{( 1 * I θN)}, .. e^{(N1) * I θN)}]
exp(e^{iθ}, r) = e^{iθ' * r}
c_{0} Ω^{0 * N/p} c_{1} Ω^{1 * N/p} c_{2} Ω^{2 * N/p} .. c_{p1} Ω^{(p1) * N/p}
c_{i} = e^{m * 2 I π/N } / p
Ω^{i * N/p}, i in [0 .. p1].
Prob( DFT(Ω^{r}) )
An Example
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, <23 more repetitions> 1, 0, 0, 0, 0]'
[1, 1, ... 1]' / 10 Ω^{25.6} / 10 Ω^{51.2} / 10 Ω^{76.8} / 10 Ω^{102.4} / 10 Ω^{128.0} / 10 Ω^{153.6} / 10 Ω^{179.2} / 10 Ω^{204.8} / 10 Ω^{230.4} / 10

vector  index 1  index 2  prob 1  prob 2  total probability 
DFT([1, 1, ... 1]')  0    1.0    1.0 
DFT(Ω^{25.6})  25  26  0.2545  0.5727  0.8272 
DFT(Ω^{51.2})  51  52  0.8751  0.0646  0.9297 
DFT(Ω^{76.8})  76  77  0.0646  0.8751  0.9297 
DFT(Ω^{102.4})  102  103  0.5727  0.2545  0.8272 
DFT(Ω^{128.0})  128    1.0    1.0 
DFT(Ω^{153.6})  153  154  0.2545  0.5727  0.8272 
DFT(Ω^{179.2})  179  180  0.8751  0.0646  0.9297 
DFT(Ω^{204.8})  204  205  0.0646  0.8751  0.9297 
DFT(Ω^{230.4})  230  231  0.5727  0.2545  0.8272 
For the sequence of lemmas that lead to the final desired result, we will consider a single one of our p vectors of the form Ω^{r}. 
We will closely examine the behavior of this vector for values of r in [0.5 .. 0.5]. It will then be a simple matter to extend to all other real values for r. 
The first step will be to show that the first coordinate of Prob(DFT(Ω^{0.5})) has a value greater than 0.4. (This and the following results all refer to vectors in C^{N} for N a power of 2.) 
Second, we will show that the first coordinate of Prob(DFT(Ω^{r})) for r in [0.5 .. 0.5] is minimized at r = +/ 0.5. 
These facts together will allow us to conclude that the first coordinate of Prob(DFT(Ω^{r})) is greater than 0.4 for all r in [0.5 .. 0.5]. 
Then, using the shifting theorem we will observe that Prob(DFT(Ω^{k+r})) has probability greater than 0.4 in coordinate position "k mod N" for all integer k and real r in [0.5 .. 0.5]. 
The result then follows immediately: Given the sum of a collection of p "Ω^{r}" vectors that add up to a comb vector V, taking their DFT's individually, and then summing them, produces the desired output vector with p equally spaced local regions of high probability. The distributivity property then gives us the desired final result. 
Lemma: The set of all periodic vectors in C^{N} with period p ≤ N is contained in a pdimensional linear subspace S_{p} of C^{N}. 
Proof: Let M be any fulldimensional p x p matrix. Form an N x p matrix S out of the rows of M: the i'th row of S is the (i mod p)'th row of M. Every periodic vector P with period p is a linear combination of the columns of S. 
To see this, consider vector p1
that contains the first p elements of P. This pdimensional vector is a linear
combination of the columns of M,
since M is a fulldimensional square p x p matrix: p1 = M p1', where p1' = M^{1} p1.
Our original vector P is simply S p1', since the i'th element of P equals the (i mod p)'th
element of P and the i'th row of S equals the (i mod p)'th row of M by construction.
QED. 
Intuitively, we take the p x p matrix M, and stack copies of it vertically to make the N x p matrix S. (If p does not divide N, the last copy of M will not have all of its rows.) 
Lemma: A basis for S_{p} is the following set of p vectors based on Ω: 
Ω^{0 * (N/p)} Ω^{1 * (N/p)} Ω^{2 * (N/p)} ... Ω^{(p1) * (N/p)}
Proof: Take M in the above proof to be the p x p positive DFT matrix for C^{P}. The columns of the n x p matrix S as formed above are the desired p vectors. Consider coordinate j of vector k from the above list: 
ω^{j * k * (N/p) i 2 π / N} = ω^{j * k * i 2 π / p} = ω^{(j mod p) * k * i 2 π / p}
The last step above is due to the fact that f(m) = ω^{m i 2 π / p} is a periodic function with period p. Removing a multiple of p simply means that we go around the unit circle fewer times before ending up in the same place. 
On the other hand, S[j, k] = ω^{(j mod p) * k * i 2 π / p} by construction.
QED. 
In the above lemma, the "(N/p)" term in the exponent changes the scalar terms of Ω from powers of ω^{2 π i / N} to powers of ω^{2 π i / p}. This changes the stepsize or angle at which the elements of Ω step around the unit circle. 
Lemma: Given a comb vector V in C^{N} with period p, the following p vectors sum to V: 
Ω^{0 * (N/p)} / p Ω^{1 * (N/p)} / p Ω^{2 * (N/p)} / p ... Ω^{(p1) * (N/p)} / p
Proof:
The vectors are simply the basis vectors from the previous lemma. The constants 1/p are
the values of the vector V reexpressed in the new basis.
QED.
f(r) = Σ(Ω^{r}) * conjugate(Σ(Ω^{r}).
Σ(V) * Σ(conjugate(V)).
e^{(r * i * I θN * i)} * e^{(r * (j) * I θN)}.
e^{(r * (ij) * I θN)}.
e^{(r * (ij) * I θN)} + e^{(r * (ji) * I θN)}.
cos( r * (ij) * θ_{N}) + I sin( r * (ij) * θ_{N} + cos((r * (ij) * θ_{N})) + I sin((r * (ij) * θ_{N}) = cos( r * (ij) * θ_{N}) + I sin( r * (ij) * I θ_{N}) + cos( (r * (ij) * θ_{N}))  I sin( (r * (ij) * I θ_{N}) = 2 cos(r * (ij) * θ_{N}).
0.5 * θ_{N} * (N1) to 0.5 * θ_{N} * (N1).
cos(+/0.5 * θ_{N} * (N1)) for r = 0.5 or 0.5.
DFT(V) = DFT(V_{0} + .. + V_{p1}) = DFT(V_{0}) + .. + DFT(V_{k1}).
Conclusion
References
