What is the longest known sequence that repeats in Pi? (homelab)

  • On a side note (it's mentioned in the article), it has been a while since I last visited OEIS (Online Encyclopedia of Integer Sequences), which has been of use over the years.

    Very much appreciate the amazing effort that is OEIS!

    I thought I should mention it to raise awareness: https://oeis.org/

    The sequence in the article: https://oeis.org/A197123

  • I wonder if you could do something with Bloom filters similar to:

      filters = {}
      candidates = []
      for i in range(len(pi)):
          block = pi[i:i+10]
          hash_prefix = block[:4]
          try:
              bloom = filters[hash_prefix]
          except KeyError:
              filters[hash_prefix] = BloomFilter().add(block)
              candidates.append((i, block))
              continue
          if block in bloom:
              candidates.append((i, block))
          else:
              bloom.add(block)
    
    Basically, make one pass through to build a list of statistically likely candidates to evaluate later. Then use full string matching on those candidates to find the real matching sequences.

    That might be helpful here because Bloom filters can say "we might have seen this string before" or "we definitely have not seen it before". That may greatly reduce the amount of storage you need.

  • OK cool. But...

    Finding random numbers repeating is a simple brute force problem. It is cool, but slightly boring.

    Related to this, what I cannot wrap my head across, is the Infinite monkey theorem. Is it not possible that we keep expanding the numbers and never reach a complex enough set of values?

  • The binary representation of Pi contains both the MIT license and the GPL. I'm confused.

  • Extending that to all known digits would be a fun technical challenge. I once wrote a code to calculate positions where k subsequent decimal digits equal to the position index itself [1], but I had no storage to spare so I instead used a stream of digits from the public GCP dataset [2]. You would eventually want to try this approach if you are like me (i.e. don't want to pay for two-digit-TB storage).

    [1] https://github.com/lifthrasiir/remote-pi-reader/

    [2] https://storage.googleapis.com/pi100t/index.html (used to be `pi50t` back then)

  • This can easily be optimized to use less memory, since pi is thankfully random.

    Do N passes over the data, and at each pass only put in your hashmap the values that mod N equal the current iteration. If you take N~100, the runtime would probably be in the same ballpark, since the only thing that increase is streaming all the data N times. With a fast SSD, that's not that much.

  • Is there a research paper is this?

    Seriously, I find the most interesting works self-published in people’s personal blogs and I often wonder if there’s a paper in there and why it didn’t make it into a paper.

    Maybe there’s a paper in finding the largest repeating sequence in Pi. There have certainly been more niche papers than that.

  • Assuming that pi has an infinite number of digits, then the answer to "What is the longest known sequence that repeats in Pi?" is "The longest known sequence of Pi".

  • Breaking news: As of this writing, OP is 23 hours old. OEIS has already been updated with the new terms OP discovered.

  • Can you do something fancy with FFTs and convolution here?

    I _think_ you can but my brain doesn't have the bandwidth to think past that.

  • Isn't the entire known sequence of pi also known to repeat due to its nature?

  • The right algorithm for this problem starts with a linear time suffix sort.