python - Why factorization of products of close primes is much slower than products of dissimilar primes - Stack Overflow

This is a purely academic question without any practical consideration. This is not homework, I dropped

This is a purely academic question without any practical consideration. This is not homework, I dropped out of high school long ago. I am just curious, and I can't sleep well without knowing why.

I was messing around with Python. I decided to factorize big integers and measure the runtime of calls for each input.

I used a bunch of numbers and found that some numbers take much longer to factorize than others.

I then decided to investigate further, I quickly wrote a prime sieve function to generate primes for testing. I found out that a product of a pair of moderately large primes (two four-digit primes) take much longer to be factorized than a product of one very large prime (six-digits+) and a small prime (<=three-digits).

At first I thought my first simple function for testing is inefficient, that is indeed the case, so I wrote a second function that pulled primes directly from pre-generated list of primes, the second function was indeed more efficient, but strangely it exhibits the same pattern.

Here are some numbers that I used:

13717421 == 3607 * 3803
13189903 == 3593 * 3671
56267023 == 7187 * 7829
65415743 == 8087 * 8089

12345679 == 37 * 333667
38760793 == 37 * 1047589
158202851 == 151 * 1047701
762312571 == 727 * 1048573

Code:

import numpy as np
from itertools import cycle

def factorize(n):
    factors = []
    while not n % 2:
        factors.append(2)
        n //= 2

    i = 3
    while i**2 <= n:
        while not n % i:
            factors.append(i)
            n //= i
        i += 2
    
    return factors if n == 1 else factors + [n]

TRIPLE = ((4, 2), (9, 6), (25, 10))
WHEEL = ( 4, 2, 4, 2, 4, 6, 2, 6 )
def prime_sieve(n):
    primes = np.ones(n + 1, dtype=bool)
    primes[:2] = False
    for square, double in TRIPLE:
        primes[square::double] = False
    
    wheel = cycle(WHEEL)
    k = 7
    while (square := k**2) <= n:
        if primes[k]:
            primes[square::2*k] = False
        
        k += next(wheel)
    
    return np.flatnonzero(primes)
    
PRIMES = list(map(int, prime_sieve(1048576)))
TEST_LIMIT = PRIMES[-1] ** 2

def factorize_sieve(n):
    if n > TEST_LIMIT:
        raise ValueError('Number too large')

    factors = []
    for p in PRIMES:
        if p**2 > n:
            break
        while not n % p:
            factors.append(p)
            n //= p
    
    return factors if n == 1 else factors + [n]

Test result:

In [2]: %timeit factorize(13717421)
279 μs ± 4.29 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: %timeit factorize(12345679)
39.6 μs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit factorize_sieve(13717421)
64.1 μs ± 688 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit factorize_sieve(12345679)
12.6 μs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %timeit factorize_sieve(13189903)
64.6 μs ± 964 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit factorize_sieve(56267023)
117 μs ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit factorize_sieve(65415743)
130 μs ± 1.38 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit factorize_sieve(38760793)
21.1 μs ± 232 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [10]: %timeit factorize_sieve(158202851)
21.4 μs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: %timeit factorize_sieve(762312571)
22.1 μs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

As you can clearly see, factorization of two medium primes on average takes much longer than two extremes. Why is it this case?

This is a purely academic question without any practical consideration. This is not homework, I dropped out of high school long ago. I am just curious, and I can't sleep well without knowing why.

I was messing around with Python. I decided to factorize big integers and measure the runtime of calls for each input.

I used a bunch of numbers and found that some numbers take much longer to factorize than others.

I then decided to investigate further, I quickly wrote a prime sieve function to generate primes for testing. I found out that a product of a pair of moderately large primes (two four-digit primes) take much longer to be factorized than a product of one very large prime (six-digits+) and a small prime (<=three-digits).

At first I thought my first simple function for testing is inefficient, that is indeed the case, so I wrote a second function that pulled primes directly from pre-generated list of primes, the second function was indeed more efficient, but strangely it exhibits the same pattern.

Here are some numbers that I used:

13717421 == 3607 * 3803
13189903 == 3593 * 3671
56267023 == 7187 * 7829
65415743 == 8087 * 8089

12345679 == 37 * 333667
38760793 == 37 * 1047589
158202851 == 151 * 1047701
762312571 == 727 * 1048573

Code:

import numpy as np
from itertools import cycle

def factorize(n):
    factors = []
    while not n % 2:
        factors.append(2)
        n //= 2

    i = 3
    while i**2 <= n:
        while not n % i:
            factors.append(i)
            n //= i
        i += 2
    
    return factors if n == 1 else factors + [n]

TRIPLE = ((4, 2), (9, 6), (25, 10))
WHEEL = ( 4, 2, 4, 2, 4, 6, 2, 6 )
def prime_sieve(n):
    primes = np.ones(n + 1, dtype=bool)
    primes[:2] = False
    for square, double in TRIPLE:
        primes[square::double] = False
    
    wheel = cycle(WHEEL)
    k = 7
    while (square := k**2) <= n:
        if primes[k]:
            primes[square::2*k] = False
        
        k += next(wheel)
    
    return np.flatnonzero(primes)
    
PRIMES = list(map(int, prime_sieve(1048576)))
TEST_LIMIT = PRIMES[-1] ** 2

def factorize_sieve(n):
    if n > TEST_LIMIT:
        raise ValueError('Number too large')

    factors = []
    for p in PRIMES:
        if p**2 > n:
            break
        while not n % p:
            factors.append(p)
            n //= p
    
    return factors if n == 1 else factors + [n]

Test result:

In [2]: %timeit factorize(13717421)
279 μs ± 4.29 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: %timeit factorize(12345679)
39.6 μs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit factorize_sieve(13717421)
64.1 μs ± 688 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit factorize_sieve(12345679)
12.6 μs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %timeit factorize_sieve(13189903)
64.6 μs ± 964 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit factorize_sieve(56267023)
117 μs ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit factorize_sieve(65415743)
130 μs ± 1.38 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit factorize_sieve(38760793)
21.1 μs ± 232 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [10]: %timeit factorize_sieve(158202851)
21.4 μs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: %timeit factorize_sieve(762312571)
22.1 μs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

As you can clearly see, factorization of two medium primes on average takes much longer than two extremes. Why is it this case?

Share Improve this question edited Jan 17 at 18:10 Ξένη Γήινος asked Jan 17 at 17:58 Ξένη ΓήινοςΞένη Γήινος 3,8101 gold badge18 silver badges60 bronze badges 4
  • Taken to the extreme, would you not expect a number with 2 and some big prime to be discovered almost immediately? – JonSG Commented Jan 17 at 21:42
  • @JonSG I wouldn't. Would you? – no comment Commented Jan 17 at 22:41
  • @nocomment If I'm factoring a number that i know is a factor of two primes like a 1223.........222 then yes, I know immediately via inspection or by testing the first prime number 2 to discover the pair. If the number is odd and with factors close to root n then I have to try comparatively many failures. It follows the the lower the min factor of the prime pair is, the faster we would expect to uncover the pair. Why else would we use huge relatively prime pairs in security rather than (3,5)? – JonSG Commented Jan 17 at 23:08
  • @JonSG But their factorize function doesn't know/assume that it's a factor of two primes. After immediately finding factor 2, it continues trying to factorize the big prime. And that takes time. – no comment Commented Jan 17 at 23:41
Add a comment  | 

2 Answers 2

Reset to default 4

I don't recognize, that this question is related to programming.

Your algorithm executes trial divisions (starting with the smallest numbers) and terminates at the square root of the input number, so the worst case is actually trying to factorize the square number of a prime (maximum factor possible). Encountering a low factor speeds up the process, since you immediately divide n and there is a chance the test factor reached when squared already exceeds the reduced n.

Your loop stops when i**2 exceeds n, with all known factors divided out of n. For a squarefree input, this happens once i is bigger than both

  • the second-largest factor of the original input, and
  • the square root of the largest factor of the original input.

This is the point when your code knows it's found all the factors. So for a product of two primes p*q with p<q, your code takes time proportional to max(p, sqrt(q)). This value is smaller for your "lopsided" tests.

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745350436a4623799.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信