The Hardy–Ramanujan theorem via the probabilistic method

One of my favourite examples of the probabilistic method at work is the Hardy–Ramanujan theorem. It says that if you take a large integer \(x\), the number of distinct prime factors it has, written \(\omega(x)\), is usually very close to \(\log\log x\). More precisely, the fluctuations are typically no larger than about \(\sqrt{\log\log x}\).
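To get a feel for the statement before proving anything, here is a quick numerical experiment (a minimal Python sketch, not part of the proof; the bound \(N = 10^7\) and the sample size are arbitrary choices, and the small additive gap in the observed mean is expected):

```python
import math
import random

def omega(n):
    """Count the distinct prime factors of n by trial division."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:  # whatever remains is itself a prime factor
        count += 1
    return count

N = 10**7
lln = math.log(math.log(N))
values = [omega(random.randrange(2, N + 1)) for _ in range(2000)]

print(f"log log N     = {lln:.2f}")
print(f"average omega = {sum(values) / len(values):.2f}")
# Fraction of samples within 2*sqrt(log log N) of log log N:
within = sum(abs(v - lln) <= 2 * math.sqrt(lln) for v in values)
print(f"fraction close: {within / len(values):.1%}")
```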

At first sight this is the kind of statement one might expect to need advanced analytic tools. After all, it concerns the distribution of prime factors, which is usually a delicate subject. But in fact the proof rests on a very simple probabilistic observation: if a random variable’s standard deviation is much smaller than its expectation, then Chebyshev’s inequality shows that the variable is asymptotically equal to its expectation for all but a vanishing proportion of outcomes. Thus, if we can show that the expectation and variance of \(\omega(x)\) are comparable (both about \(\log\log x\)), some clever bounds give an elementary proof of a seemingly difficult number-theoretic problem.

Theorem (Hardy–Ramanujan). Let \(\phi\) be any function that tends to infinity, however slowly. For large enough \(N\), we have that \(\vert \{x \leq N :\vert \omega(x) - \log\log x \vert > \phi(x)\sqrt{\log\log x} \} \vert = o(N)\).

To prove this statement, pick \(x\) uniformly at random from \(\{1, \dots, N\}\); it then suffices to show that \(\textbf{P}( \vert \omega(x) - \log\log x \vert > \phi(x)\sqrt{\log\log x}) = o(1)\). This looks like an application of Chebyshev’s inequality.
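For reference, Chebyshev’s inequality states that for any random variable \(Y\) with finite variance and any \(\lambda > 0\),

\[\textbf{P}(\vert Y - \textbf{E}[Y] \vert > \lambda) \leq \frac{\textbf{Var}(Y)}{\lambda^2}.\]

So if the expectation and the variance are both roughly \(\log\log N\), taking \(\lambda = \phi(x)\sqrt{\log\log N}\) gives a bound of roughly \(1/\phi(x)^2 = o(1)\).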

We can start by defining a random variable \(X\) counting the prime divisors of a uniformly random \(x \leq N\) that are each less than \(N^{\frac{1}{3}}\). Notice that \(x\) can have at most two distinct prime factors larger than \(N^{\frac{1}{3}}\) (three such primes would multiply to more than \(N\)), so \(X\) differs from \(\omega(x)\) by at most an additive constant. For a technical reason we will see later, it is easier to analyze prime factors less than \(N^{\frac{1}{3}}\) rather than \(N\).

Since exactly \(\lfloor \frac{N}{p} \rfloor\) multiples of \(p\) lie in \(\{1, \dots, N\}\), the probability that a prime \(p\) divides \(x\) is \(\frac{1}{N}\lfloor \frac{N}{p} \rfloor = \frac{1}{p} + O\left(\frac{1}{N}\right)\), so we have that

\[\textbf{E}[X] = \sum_{p \leq N^{\frac{1}{3}}} \left( \frac{1}{p} + O\left(\frac{1}{N}\right) \right)\]

by simply letting \(X\) be the sum of the indicator variables \(\mathbb{I}(p \mid x)\) over primes \(p \leq N^{\frac{1}{3}}\) and applying linearity of expectation. Since each indicator variable is a Bernoulli random variable, its individual variance is \(\textbf{P}(p \mid x) - \textbf{P}(p \mid x)^2\).
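For the variance of the sum we also need the covariances between indicators; recall the general decomposition

\[\textbf{Var}\left(\sum_{p} \mathbb{I}(p \mid x)\right) = \sum_{p} \textbf{Var}(\mathbb{I}(p \mid x)) + 2\sum_{p < q} \textbf{Cov}(\mathbb{I}(p \mid x), \mathbb{I}(q \mid x)).\]

Plugging in, this means that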

\[\textbf{Var}(X) = \left(\sum_{p \leq N^{\frac{1}{3}}} \left( \frac{1}{p} - \frac{1}{p^2} + O\left( \frac{1}{N} \right) \right) \right) + 2\left(\sum_{p < q \leq N^{\frac{1}{3}}} \left( \textbf{E}[(\mathbb{I}(p \mid x))(\mathbb{I}(q \mid x))] - \textbf{E}[(\mathbb{I}(p \mid x))]\textbf{E}[(\mathbb{I}(q \mid x))] \right) \right).\]

Notice that \((\mathbb{I}(p \mid x))(\mathbb{I}(q \mid x)) = 1\) if and only if both \(p\) and \(q\) divide \(x\). Since \(p\) and \(q\) are distinct primes, this is the same as \(pq\) dividing \(x\), which has probability \(\frac{1}{pq} + O\left(\frac{1}{N}\right)\). Thus,

\[\textbf{E}[(\mathbb{I}(p \mid x))(\mathbb{I}(q \mid x))] - \textbf{E}[(\mathbb{I}(p \mid x))]\textbf{E}[(\mathbb{I}(q \mid x))]\]
\[= \frac{1}{pq} + O\left(\frac{1}{N}\right) - \left(\frac{1}{p} + O\left(\frac{1}{N}\right)\right) \left(\frac{1}{q} + O\left(\frac{1}{N}\right) \right) = O\left(\frac{1}{N}\right)\]

and

\[\textbf{Var}(X) = \left(\sum_{p \leq N^{\frac{1}{3}}} \left( \frac{1}{p} - \frac{1}{p^2} + O\left( \frac{1}{N}\right) \right) \right) + 2\left(\sum_{p < q \leq N^{\frac{1}{3}}} O\left( \frac{1}{N}\right)\right).\]

Since we are summing over at most \(N^{\frac{2}{3}}\) pairs \(p < q \leq N^{\frac{1}{3}}\) (this is the technical detail we discussed earlier), the accumulated error is \(O(N^{\frac{2}{3}} \cdot \frac{1}{N}) = O(N^{-\frac{1}{3}})\), which vanishes. We can then use Mertens’ estimates, together with the fact that the sum of the reciprocals of the squares converges and that \(\log\log N^{\frac{1}{3}} = \log\log N - \log 3 = \log\log N + O(1)\), to get that

\[\textbf{E}[X] = \log\log N + O(1)\] \[\textbf{Var}(X) = \log\log N + O(1).\]
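As a sanity check on the expectation estimate (a small Python sketch; the values of \(N\) are arbitrary, and the bounded gap between the two columns is precisely the \(O(1)\) term):

```python
import math

def is_prime(k):
    """Trial-division primality test (fine at this scale)."""
    if k < 2:
        return False
    for d in range(2, int(k**0.5) + 1):
        if k % d == 0:
            return False
    return True

# Compare sum of 1/p over p <= N^(1/3) with log log N.
for exponent in (6, 9, 12, 15):
    N = 10**exponent
    cutoff = round(N ** (1 / 3))
    s = sum(1 / p for p in range(2, cutoff + 1) if is_prime(p))
    print(f"N = 1e{exponent:<2}: sum 1/p = {s:.3f}, log log N = {math.log(math.log(N)):.3f}")
```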

By Chebyshev’s inequality, together with the facts that \(\omega(x) = X + O(1)\) and that \(\log\log x = \log\log N + O(1)\) with probability \(1 - o(1)\) (all but \(\sqrt{N} = o(N)\) of the integers \(x \leq N\) are at least \(\sqrt{N}\)), we then have that

\[\textbf{P}\left( \vert \omega(x) - \log\log x \vert > \phi(x)\sqrt{\log\log x}\right) \ll o(1) + \frac{\log\log N}{\phi(x)^2\log\log N} = o(1)\]

meaning that

\[\vert \{x \leq N :\vert \omega(x) - \log\log x \vert > \phi(x)\sqrt{\log\log x} \} \vert = o(N).\]


Note on Mertens’ Estimates

Mertens’ estimates can be derived using elementary techniques. All you really need is that \(\log(n!) = n\log n + O(n)\) (compare the sum \(\sum_{k \leq n} \log k\) with the integral \(\int_1^n \log t \, dt\)), as

\[\log(n!) = \sum_{p^k \leq n} \left\lfloor \frac{n}{p^k} \right\rfloor \log p = O(n) + n\sum_{p \leq n} \frac{\log p}{p},\]

where the terms with \(k \geq 2\) and the rounding of the floors each contribute only \(O(n)\) (the latter by Chebyshev’s elementary bound \(\sum_{p \leq n} \log p = O(n)\)). Comparing the two expressions for \(\log(n!)\) gives

\[\sum_{p \leq n}\frac{\log p}{p} = \log n + O(1).\]

Then, writing \(\frac{1}{p} = \frac{\log p}{p} \cdot \frac{1}{\log p}\) and using summation by parts with \(A(t) = \sum_{p \leq t} \frac{\log p}{p} = \log t + O(1)\),

\[\sum_{p \leq n} \frac{1}{p} = \frac{A(n)}{\log n} + \int_2^n \frac{A(t)}{t \log^2 t}\, dt = O(1) + \int_2^n \frac{dt}{t\log t} + O(1),\]

we have that

\[\sum_{p \leq n} \frac{1}{p} = \log\log n + O(1).\]
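Both estimates are easy to check numerically (a minimal Python sketch; the point is only that the differences below stay bounded as \(n\) grows; they in fact approach known constants, but boundedness is all the proof needs):

```python
import math

def primes_up_to(m):
    """Sieve of Eratosthenes: all primes <= m."""
    sieve = [True] * (m + 1)
    sieve[0], sieve[1] = False, False
    for i in range(2, int(m**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

# The differences below should stay O(1) as n grows.
for n in (10**3, 10**4, 10**5, 10**6):
    ps = primes_up_to(n)
    first = sum(math.log(p) / p for p in ps) - math.log(n)
    second = sum(1 / p for p in ps) - math.log(math.log(n))
    print(f"n = {n:>7}: sum log(p)/p - log n = {first:+.3f}, "
          f"sum 1/p - log log n = {second:+.3f}")
```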


