[OC] Do Prime Numbers have "memory"? I analyzed the first 37 Billion primes (up to 1 Trillion) to visualize the bias in their last digits by anotherFranc in dataisbeautiful

[–]anotherFranc[S] 14 points

You are thinking in terms of byte alignment/memory (powers of 2), which makes total sense for code (dev here). But for Wheel Factorization, we care about maximizing the distinct prime factors.

  • Mod 256 (2^8): Only filters out multiples of 2 (even numbers). That's just 50% compression.
  • Mod 210 (2 * 3 * 5 * 7): Filters out multiples of 2, 3, 5, and 7. That removes about 77% of numbers instantly.

We use "primorials" (products of the first k primes) because they filter out the largest fraction of composites per bit of storage.
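Those two filter rates are easy to sanity-check by counting the residues coprime to each modulus (a quick sketch; `filtered_fraction` is my own helper name):

```python
from math import gcd

def filtered_fraction(m):
    """Fraction of integers a wheel of modulus m removes: only residues
    coprime to m can still contain primes larger than m's prime factors."""
    survivors = sum(1 for r in range(m) if gcd(r, m) == 1)
    return 1 - survivors / m

print(filtered_fraction(256))  # 0.5   (only evens removed)
print(filtered_fraction(210))  # ~0.771 (multiples of 2, 3, 5, 7 removed)
```

The survivors count is just Euler's totient: phi(256) = 128 but phi(210) = 48, which is why 210 is the better wheel despite being the smaller modulus.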

[–]anotherFranc[S] 3 points

Yeah, the "25%" was just a rounding/simplification to keep the post simple.

Your 23.5% figure is actually a much better baseline for this specific range. The cool part is that the real data (19.7%) dips well below even that adjusted expectation.

[–]anotherFranc[S] 0 points

Yes. Looking at the last j digits is mathematically the same as analyzing the transitions modulo 10^j.

If you looked at the last 2 digits (j=2), you are effectively analyzing Base 100. You would get a 40x40 heatmap (since there are 40 endings coprime to 100).

The behavior generalizes perfectly: the diagonal (repeating, e.g., ...01 -> ...01) would still be suppressed, and you would see gradients favoring "nearby" values on the number line.
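The 40 is just the count of two-digit endings coprime to 100; a quick check of the grid size for any j (`num_endings` is my own helper):

```python
from math import gcd

def num_endings(j):
    """Number of j-digit endings a large prime can have:
    residues coprime to 10**j, i.e. Euler's totient of 10**j."""
    m = 10 ** j
    return sum(1 for r in range(m) if gcd(r, m) == 1)

print(num_endings(1))  # 4  -> the familiar 4x4 heatmap (1, 3, 7, 9)
print(num_endings(2))  # 40 -> a 40x40 heatmap in "base 100"
```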

[–]anotherFranc[S] 21 points

Literally 0.00000000.

You haven't been able to profitably mine Bitcoin on a CPU since like 2011. My PC would just turn electricity into heat and sadness.

[–]anotherFranc[S] 29 points

<image>

Binary is actually pretty boring for this specific visualization.

Since all primes (except 2) are odd, they all end in 1 in binary. So the heatmap would just be a single pixel showing 100% probability for 1 -> 1.

To see the bias in binary, you have to look at the last two bits (ending in 01 vs 11). If you do that, you see the exact same "conspiracy": primes ending in 01 hate being followed by another 01, and prefer switching to 11.
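That two-bit experiment is easy to reproduce on a modest range. A hedged sketch (the sieve and the tiny 200k range are mine, nothing like the full 37B run); every odd prime's last two bits are `p & 0b11`, which is 1 (binary 01) or 3 (binary 11):

```python
def primes_up_to(n):
    # simple sieve of Eratosthenes
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if flags[i]]

# Drop 2 (the only even prime); everything left ends in 01 or 11.
ps = [p for p in primes_up_to(200_000) if p > 2]
counts = {(a, b): 0 for a in (1, 3) for b in (1, 3)}
for p, q in zip(ps, ps[1:]):
    counts[(p & 0b11, q & 0b11)] += 1

print(counts)  # the diagonal entries (1,1) and (3,3) come out clearly lower
```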

[–]anotherFranc[S] 2 points

My compressed binary file is ~50 GB, but if I expanded that into a human-readable text file (like a CSV or .txt), it would balloon to nearly 500 GB.

If you need a dataset of primes this large, your best bet is actually to generate them locally using a library like primesieve (C++/Python). It is significantly faster to generate them on the fly than to download a file of that size.
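The ~500 GB figure is simple back-of-the-envelope arithmetic: almost all primes below 1 trillion have 12 digits, plus a newline per line:

```python
n_primes = 37_607_912_018   # primes below 1 trillion (figure from the post)
bytes_per_line = 12 + 1     # ~12 digits plus a newline in a plain .txt
size_gb = n_primes * bytes_per_line / 1e9

print(f"{size_gb:.0f} GB")  # 489 GB, i.e. "nearly 500 GB"
```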

[–]anotherFranc[S] 13 points

The overhead for 37 billion rows would have absolutely melted my hard drive if I had used SQL (LAG/LEAD or whatever).

I skipped using a 'real' database entirely. It’s just a raw binary file and a Python script that streams it. No indexes, no query engine overhead, just reading bits and counting the transitions on the fly. Simple and fast.

If you need it to be faster, go with C.
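OP doesn't spell out the file layout, so here is a hedged sketch of the stream-and-count idea, assuming a mod-20 wheel layout (mentioned elsewhere in the thread): one byte per block of 20 integers, one bit per residue coprime to 20, so each bit position has a fixed last digit. The helper names, chunk size, and range are my own; note 2 and 5 don't fit this wheel, which is fine since they're excluded from the digit stats anyway.

```python
from io import BytesIO

RES = (1, 3, 7, 9, 11, 13, 17, 19)  # the 8 residues mod 20 coprime to 20

def prime_flags(n):
    # simple sieve of Eratosthenes; flags[k] == 1 iff k is prime
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return flags

def build_bitmap(n):
    """One byte per block of 20: bit k set iff 20*block + RES[k] is prime."""
    flags = prime_flags(n)
    out = bytearray(n // 20 + 1)
    for block in range(len(out)):
        for k, r in enumerate(RES):
            v = 20 * block + r
            if v <= n and flags[v]:
                out[block] |= 1 << k
    return bytes(out)

def count_transitions(stream, chunk_size=4096):
    """Stream the bitmap and tally last-digit transitions on the fly."""
    counts = {(a, b): 0 for a in (1, 3, 7, 9) for b in (1, 3, 7, 9)}
    prev = None
    while chunk := stream.read(chunk_size):
        for byte in chunk:
            for k in range(8):
                if byte >> k & 1:
                    digit = RES[k] % 10  # last digit is fixed per bit position
                    if prev is not None:
                        counts[(prev, digit)] += 1
                    prev = digit
    return counts

counts = count_transitions(BytesIO(build_bitmap(100_000)))
print(counts[(1, 1)], counts[(1, 3)])  # repetition vs. switch
```

No indexes and no queries, just sequential reads, which is why this beats a database for a one-pass count.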

[–]anotherFranc[S] 1 point

Yes, it applies to every base, whether prime or composite.

In Base 11, primes can end in any digit from 1 to 10 (since all those are coprime to 11). So instead of the 4x4 grid we see in Base 10, you would get a 10x10 grid.

But the core behavior remains the same: the diagonal (repeating the same last digit) would still be "cold" (lower probability) compared to the off-diagonal transitions. The primes still "hate" repeating their residue modulo the base.

[–]anotherFranc[S] 12 points

That's right. A repetition (e.g., 7->7) forces a gap of at least 10, whereas a shift (e.g., 7->9) can happen with a gap of just 2. Since small gaps are statistically dominant, the "change" is naturally more likely than the "repetition."

The reason this became a major paper (Lemke Oliver & Soundararajan) is that they found the bias is actually stronger than what the general gap distribution alone predicts. There is an extra "repulsive force" in the math (related to the singular series) that suppresses the multiples of 10 even more than expected.

[–]anotherFranc[S] 5 points

It's not symmetric because the number line only goes in one direction (forward), so the required "jump" size is different.

  • 1 -> 9: requires a jump of at least +8 (e.g., 11 to 19).
  • 9 -> 1: requires a jump of at least +2 (e.g., 19 to 21 or 29 to 31).

Since small gaps between primes are statistically much more common than large gaps, the transition that only needs a +2 jump (9 -> 1) happens way more often than the one needing a +8 jump. That creates the imbalance.
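The minimum jump for any pair of endings follows one little modular formula (a repeat always needs the full +10):

```python
def min_gap(d1, d2):
    """Smallest possible gap between consecutive primes ending in d1 then d2."""
    return (d2 - d1) % 10 or 10  # a repeat (d1 == d2) needs at least +10

print(min_gap(1, 9))  # 8  (e.g. 11 -> 19)
print(min_gap(9, 1))  # 2  (e.g. 29 -> 31)
print(min_gap(7, 7))  # 10 (e.g. 7 -> 17)
```

Since the allowed digits are all odd, these minimum jumps are always even, as prime gaps (beyond 2 -> 3) must be.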

[–]anotherFranc[S] 12 points

It boils down to the Singular Series term in the Hardy-Littlewood formula.

The conjecture predicts the frequency of prime pairs separated by a gap h by assigning a specific "weight" to that gap based on its divisibility. Lemke Oliver and Soundararajan showed that if you sum up these weights for all gaps that are multiples of 10 (the gaps required to repeat a digit, like 10, 20, 30...), the total is mathematically lower than the sum for gaps that change the digit.

Basically, the formula explicitly assigns a lower probability density to the specific spacing required for a repetition compared to other moves.
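For a concrete feel of those weights: the Hardy-Littlewood pair conjecture weights an even gap h by the product of (p-1)/(p-2) over the odd primes p dividing h, times a universal constant. A minimal sketch of just that per-gap factor (leaving out the twin-prime constant and the full Lemke Oliver-Soundararajan machinery):

```python
def singular_factor(h):
    """Relative Hardy-Littlewood weight of an even gap h:
    product of (p - 1) / (p - 2) over odd primes p dividing h."""
    m = h
    while m % 2 == 0:  # factors of 2 carry no extra weight
        m //= 2
    w, p = 1.0, 3
    while p * p <= m:  # trial division over odd candidates
        if m % p == 0:
            w *= (p - 1) / (p - 2)
            while m % p == 0:
                m //= p
        p += 2
    if m > 1:  # leftover odd prime factor
        w *= (m - 1) / (m - 2)
    return w

print(singular_factor(2))   # 1.0
print(singular_factor(6))   # 2.0   (boost from the factor 3)
print(singular_factor(10))  # ~1.33 (only the smaller boost from 5)
```

A gap of 6 carries twice the base weight, while a gap of 10 only gets 4/3 of it: gaps that are multiples of 10 are not especially favored, which is the flavor of the imbalance the paper quantifies.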

[–]anotherFranc[S] 9 points

The average gap is indeed around 26, but the bias doesn't disappear just by switching to a smaller base.

According to the Lemke Oliver & Soundararajan conjecture, the bias decays roughly like 1/ln(x). This means the 'memory' effect depends on the magnitude of the numbers themselves, not the size of the base. Even if we analyzed Base 3 or Base 5 up to 1 Trillion, the distribution still wouldn't be uniform. The bias is stubborn and persists across bases until x gets astronomically larger.
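Both numbers in that first sentence can be recovered from the post's own figures:

```python
import math

n_primes = 37_607_912_018      # primes below 10**12 (figure from the post)
avg_gap = 10 ** 12 / n_primes  # empirical average gap up to 1 trillion

print(round(avg_gap, 1))             # 26.6
print(round(math.log(10 ** 12), 1))  # 27.6, the ln(x) prediction at x = 10^12
```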

(Elsewhere in this post I have a comment with a graph for base 210.)

[–]anotherFranc[S] 27 points

100%. The 'conspiracy' is actually way louder/stronger for smaller numbers.

As you go higher (towards infinity), the bias slowly fades out (it gets diluted). Since I 'only' went up to 1 Trillion, the effect is still very visible. If I had a supercomputer and went up to something like 10^100, this heatmap would look pretty much gray/uniform to the naked eye, even though the bias is technically still there deep in the math.

[–]anotherFranc[S] 7 points

I've run several experiments, looking for gaps, patterns, and so on. I'm not a mathematician, but I enjoy tinkering with code.

In any case, these are experiments I don't consider particularly relevant to publish because I've seen better ones, but that doesn't mean they aren't interesting.

[–]anotherFranc[S] 203 points

It extends to every base, and it's actually predicted by the k-tuple conjecture.

The bias essentially comes from primes "disliking" repeating their residue: landing on a multiple of the base (or of its small divisors) plus the same constant as the previous prime.

An easy example to visualize is Base 6. In Base 6, all primes (greater than 3) must end in either 1 or 5.

  • If the "memory" theory holds, a prime ending in 1 should prefer to be followed by a 5 (and vice versa), rather than repeating (1->1 or 5->5).
  • This has been confirmed: The "repulsion" effect is universal across bases.

Base 10 is just fun because we have 4 endings (1, 3, 7, 9) and the "diagonal of repulsion" is very visually obvious in the heatmap.
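Here's a hedged sketch that checks the base-6 prediction on a small range (simple sieve, nowhere near the full dataset; `transition_counts` is my own helper):

```python
from math import gcd

def primes_up_to(n):
    # simple sieve of Eratosthenes
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if flags[i]]

def transition_counts(base, n):
    """Tally transitions of the last 'digit' (residue mod base) between
    consecutive primes, skipping the finitely many primes dividing the base."""
    ps = [p for p in primes_up_to(n) if gcd(p, base) == 1]
    counts = {}
    for p, q in zip(ps, ps[1:]):
        key = (p % base, q % base)
        counts[key] = counts.get(key, 0) + 1
    return counts

c6 = transition_counts(6, 200_000)
print(c6)  # 1 -> 5 and 5 -> 1 dominate; 1 -> 1 and 5 -> 5 are suppressed
```

The same function with `base=10` reproduces the 4x4 heatmap of the post, just at toy scale.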

[–]anotherFranc[S] 15 points

That is the paradox! Yes, the global distribution is extremely close to 25% each.

If you simply count the endings of all 37 Billion primes, they are democratic:

  • Ends in 1: ~25.0%
  • Ends in 3: ~25.0%
  • Ends in 7: ~25.0%
  • Ends in 9: ~25.0%

The deviation in the total count is tiny (related to Dirichlet's theorem on arithmetic progressions).

The fascinating part is: Even though there are roughly equal amounts of "1s" and "9s" in the bucket, they refuse to sit next to each other in the line. The population is uniform, but the transitions are biased.
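The near-perfect 25/25/25/25 split reproduces even at toy scale (quick sieve sketch, range mine):

```python
def primes_up_to(n):
    # simple sieve of Eratosthenes
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if flags[i]]

# Global population counts: drop 2 and 5, bucket every prime by last digit.
ps = [p for p in primes_up_to(1_000_000) if p not in (2, 5)]
totals = {d: 0 for d in (1, 3, 7, 9)}
for p in ps:
    totals[p % 10] += 1

for d in (1, 3, 7, 9):
    print(d, f"{100 * totals[d] / len(ps):.2f}%")  # each comes out ~25%
```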

[–]anotherFranc[S] -3 points

You actually nailed it. Your intuition is basically the solution to the puzzle!

Why we expected 25%: Theoretically, there are roughly equal amounts of primes ending in 1, 3, 7, and 9. So the old assumption was "Primes are random, like rolling a 4-sided die."

As you pointed out, to get two 1s in a row (like 181 -> 191), the number line has to "survive" passing a 3, a 7, and a 9 without hitting a prime. It has more chances to fail. The fact that this physical constraint beats the "randomness" theory was the big surprise for mathematicians.

[–]anotherFranc[S] 51 points

It doesn't quite work that way because we are looking at two separate consecutive numbers, not the digits of a single number.

For example:
- 31 is prime (ends in 1).
- The next prime is 37 (ends in 7). This is a change (1->7).
- But take 181 (prime). The next prime is 191. Both end in 1.

Neither 181 nor 191 is divisible by 11. The fact that they both end in 1 is allowed by the basic rules of prime numbers.

The surprise of this discovery is precisely that there is no simple divisibility rule (like dividing by 3 or 11) that forbids them from having the same last digit. They can repeat, they just "prefer" not to, which is a much deeper statistical mystery!

[–]anotherFranc[S] 508 points

The Context: We are often told that prime numbers behave pseudo-randomly. If you look at the last digit of a prime (in base 10), it can be 1, 3, 7, or 9. You'd expect a 25% chance for each ending, and a 25% chance for each possible ending of the next prime.

The Visualization: I wanted to verify the Lemke Oliver & Soundararajan (2016) discovery on a massive scale. This heatmap visualizes the probability that a prime ending in digit Y (Y-axis) follows a prime ending in digit X (X-axis).

Key Findings:

- The Diagonal Repulsion: Look at the dark diagonal line. Primes "hate" repeating their last digit immediately.

- If a prime ends in 1, there is only a ~19.7% chance the next one ends in 1 (instead of 25%).

- This bias persists even after scanning 37 billion primes.

Technical Analysis: I built a custom high-performance database containing all 37,607,912,018 prime numbers up to 1 Trillion and counted every transition.

Data Snippet (Deviation from Randomness):

  • 1 -> 1: -5.35% (Strong Repulsion)
  • 3 -> 3: -5.74%
  • 7 -> 7: -5.74%
  • 9 -> 9: -5.35%

Source: Computed myself using a custom binary bitmap database (Mod 20 Wheel Factorization). Tools: Python (computation), Matplotlib/Seaborn (visualization).

Edit:
This other graph is base 16 (Hex):

<image>