
NAME

Math::Prime::Util - Utilities related to prime numbers, including fast sieves and factoring

VERSION

Version 0.26

SYNOPSIS

  # Normally you would just import the functions you are using.
  # Nothing is exported by default.  List the functions, or use :all.
  use Math::Prime::Util ':all';


  # Get a big array reference of many primes
  my $aref = primes( 100_000_000 );

  # All the primes between 5k and 10k inclusive
  my $aref = primes( 5_000, 10_000 );

  # If you want them in an array instead
  my @primes = @{primes( 500 )};


  # For non-bigints, is_prime and is_prob_prime will always return 0 or 2.
  # They return 0 (composite), 2 (prime), or 1 (probably prime)
  say "$n is prime"  if is_prime($n);
  say "$n is ", (qw(composite maybe_prime? prime))[is_prob_prime($n)];

  # Strong pseudoprime test with multiple bases, using Miller-Rabin
  say "$n is a prime or 2/7/61-psp" if is_strong_pseudoprime($n, 2, 7, 61);

  # Strong Lucas-Selfridge test
  say "$n is a prime or slpsp" if is_strong_lucas_pseudoprime($n);

  # step to the next prime (returns 0 if not using bigints and we'd overflow)
  $n = next_prime($n);

  # step back (returns 0 if given input less than 2)
  $n = prev_prime($n);


  # Return Pi(n) -- the number of primes <= n.
  $primepi = prime_count( 1_000_000 );
  $primepi = prime_count( 10**14, 10**14+1000 );  # also does ranges

  # Quickly return an approximation to Pi(n)
  my $approx_number_of_primes = prime_count_approx( 10**17 );

  # Lower and upper bounds.  lower <= Pi(n) <= upper for all n
  die unless prime_count_lower($n) <= prime_count($n);
  die unless prime_count_upper($n) >= prime_count($n);


  # Return p_n, the nth prime
  say "The ten thousandth prime is ", nth_prime(10_000);

  # Return a quick approximation to the nth prime
  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

  # Lower and upper bounds.   lower <= nth_prime(n) <= upper for all n
  die unless nth_prime_lower($n) <= nth_prime($n);
  die unless nth_prime_upper($n) >= nth_prime($n);


  # Get the prime factors of a number
  @prime_factors = factor( $n );

  # Get all factors
  @divisors = all_factors( $n );

  # Euler phi (Euler's totient) on a large number
  use bigint;  say euler_phi( 801294088771394680000412 );
  say jordan_totient(5, 1234);  # Jordan's totient

  # Moebius function used to calculate Mertens
  $sum += moebius($_) for (1..200); say "Mertens(200) = $sum";
  # Mertens function directly (more efficient for large values)
  say mertens(10_000_000);

  # Exponential of Mangoldt function
  say "lamba(49) = ", log(exp_mangoldt(49));

  # divisor sum
  $sigma  = divisor_sum( $n );
  $sigma2 = divisor_sum( $n, sub { $_[0]*$_[0] } );

  # primorial n#, primorial p(n)#, and lcm
  say "The product of primes below 47 is ",     primorial(47);
  say "The product of the first 47 primes is ", pn_primorial(47);
  say "lcm(1..1000) is ", consecutive_integer_lcm(1000);

  # Ei, li, and Riemann R functions
  my $ei   = ExponentialIntegral($x);   # $x a real: $x != 0
  my $li   = LogarithmicIntegral($x);   # $x a real: $x >= 0
  my $R    = RiemannR($x);              # $x a real: $x > 0
  my $Zeta = RiemannZeta($x);           # $x a real: $x >= 0


  # Precalculate a sieve, possibly speeding up later work.
  prime_precalc( 1_000_000_000 );

  # Free any memory used by the module.
  prime_memfree;

  # Alternate way to free.  When this leaves scope, memory is freed.
  my $mf = Math::Prime::Util::MemFree->new;


  # Random primes
  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range
  my $rand_prime = random_ndigit_prime(6);   # random 6-digit prime
  my $rand_prime = random_nbit_prime(128);   # random 128-bit prime
  my $rand_prime = random_strong_prime(256); # random 256-bit strong prime
  my $rand_prime = random_maurer_prime(256); # random 256-bit provable prime

DESCRIPTION

A set of utilities related to prime numbers. These include multiple sieving methods, is_prime, prime_count, nth_prime, approximations and bounds for the prime_count and nth prime, next_prime and prev_prime, factoring utilities, and more.

The default sieving and factoring are intended to be (and currently are) the fastest on CPAN, including Math::Prime::XS, Math::Prime::FastSieve, Math::Factor::XS, Math::Prime::TiedArray, Math::Big::Factors, Math::Factoring, and Math::Primality (when the GMP module is available). For numbers in the 10-20 digit range, it is often orders of magnitude faster. Typically it is faster than Math::Pari for 64-bit operations.

All operations support both Perl UV's (32-bit or 64-bit) and bignums. It requires no external software for big number support, as there are Perl implementations included that solely use Math::BigInt and Math::BigFloat. However, performance will be improved for most big number functions by installing Math::Prime::Util::GMP, which is definitely recommended if you do many bignum operations. Also look into Math::Pari as an alternative.

The module is thread-safe and allows concurrency between Perl threads while still sharing a prime cache. It is not itself multithreaded. See the Limitations section if you are using Win32 and threads in your program.

Two scripts are also included and installed by default:

  • primes.pl displays primes between start and end values or expressions, with many options for filtering (e.g. twin, safe, circular, good, lucky, etc.). Use --help to see all the options.

  • factor.pl operates similarly to the GNU factor program. It supports bigint and expression inputs.

BIGNUM SUPPORT

By default all functions support bignums. With a few exceptions, the module will not turn on bignum support for you -- you will need to use bigint, use bignum, or pass in a Math::BigInt or Math::BigFloat object as your input. The functions take some care to perform all bignum operations using the same class as was passed in, allowing the module to work properly with Calc, FastCalc, GMP, Pari, etc. You should try to install Math::Prime::Util::GMP if you plan to use bigints with this module, as it will make it run much faster.

Some of the functions, including:

  factor
  is_prime
  is_prob_prime
  is_strong_pseudoprime
  next_prime
  prev_prime
  nth_prime
  moebius
  mertens
  euler_phi
  exp_mangoldt

work very fast (under 1 microsecond) on small inputs, but the wrappers for input validation and bigint support take more time than the function itself. Using the flag '-nobigint', e.g.:

  use Math::Prime::Util qw(-nobigint);

will turn off bigint support for those functions. Those functions will then go directly to the XS versions, which will speed up very small inputs a lot. This is useful if you're using the functions in a loop, but since the difference is less than a millisecond, it's really not important in general (also, a future implementation may find a way to speed this up without the option).

If you are using bigints, here are some performance suggestions:

  • Install Math::Prime::Util::GMP, as that will vastly increase the speed of many of the functions. This does require the GMP library to be installed on your system, but it increasingly comes pre-installed or is easily available using the OS vendor's package installation tool.

  • Install and use Math::BigInt::GMP or Math::BigInt::Pari, then put use bigint try => 'GMP,Pari' in your script, or use -Mbigint=lib,GMP on the command line. Large modular exponentiation is much faster using the GMP or Pari backends, as are the math and approximation functions when called with very large inputs.

  • Install Math::MPFR if you use the Ei, li, Zeta, or R functions. If that module can be loaded, these functions will run much faster on bignum inputs, and are able to provide higher accuracy.

  • Having run these functions on many versions of Perl, I recommend upgrading to Perl 5.14 or later if you use bignums a lot. There are some brittle behaviors with bignums on 5.12.4 and earlier.

FUNCTIONS

is_prime

  print "$n is prime" if is_prime($n);

Returns 2 if the number is prime, 0 if not. For numbers larger than 2^64 it will return 0 for composite and 1 for probably prime, using a strong BPSW test. If Math::Prime::Util::GMP is installed, some quick primality proofs are run on larger numbers, so it will return 2 for many of those as well.

Also see the "is_prob_prime" function, which will never do additional tests, and the "is_provable_prime" function which will try very hard to return only 0 or 2 for any input.

For native precision numbers (anything smaller than 2^64), all three functions are identical and use a deterministic set of Miller-Rabin tests. While "is_prob_prime" and "is_prime" return probable prime results for larger numbers, they use the strong Baillie-PSW test, for which no counterexample has been found since it was published in 1980 (though counterexamples certainly exist).

primes

Returns all the primes between the lower and upper limits (inclusive), with a lower limit of 2 if none is given.

An array reference is returned (with large lists this is much faster and uses less memory than returning an array directly).

  my $aref1 = primes( 1_000_000 );
  my $aref2 = primes( 1_000_000_000_000, 1_000_000_001_000 );

  my @primes = @{ primes( 500 ) };

  print "$_\n" for (@{primes( 20, 100 )});

Sieving will be done if required. The algorithm used will depend on the range and whether a sieve result already exists. Possibilities include trial division (for ranges with only one expected prime), a Sieve of Eratosthenes using wheel factorization, or a segmented sieve.

next_prime

  $n = next_prime($n);

Returns the next prime greater than the input number. If the input is not a bigint, then 0 is returned if the next prime is larger than a native integer type (the last representable primes being 4,294,967,291 in 32-bit Perl and 18,446,744,073,709,551,557 in 64-bit).

prev_prime

  $n = prev_prime($n);

Returns the largest prime smaller than the input number. 0 is returned if the input is 2 or lower.

prime_count

  my $primepi = prime_count( 1_000 );
  my $pirange = prime_count( 1_000, 10_000 );

Returns the Prime Count function Pi(n), also called primepi in some math packages. When given two arguments, it returns the inclusive count of primes between the two values (e.g. (13,17) returns 2; (14,17) and (13,16) return 1; and (14,16) returns 0).

The current implementation decides based on the ranges whether to use a segmented sieve with a fast bit count, or Lehmer's algorithm. The former is preferred for small sizes as well as small ranges. The latter is much faster for large ranges.

The segmented sieve is very memory efficient and is quite fast even with large base values. Its complexity is approximately O(sqrt(a) + (b-a)), where the first term is typically negligible below ~ 10^11. Memory use is proportional only to sqrt(a), with total memory use under 1MB for any base under 10^14.

Lehmer's method has complexity approximately O(b^0.7) + O(a^0.7). It does, however, use more memory. A calculation of Pi(10^14) completes in under 1 minute, Pi(10^15) in under 5 minutes, and Pi(10^16) in under 20 minutes, though the last uses about 500MB of peak memory. In contrast, even primesieve using 12 cores would take over a week on this same computer to determine Pi(10^16).

Also see the function "prime_count_approx" which gives a very good approximation to the prime count, and "prime_count_lower" and "prime_count_upper" which give tight bounds to the actual prime count. These functions return quickly for any input, including bigints.

prime_count_upper

prime_count_lower

  my $lower_limit = prime_count_lower($n);
  my $upper_limit = prime_count_upper($n);
  #   $lower_limit  <=  prime_count(n)  <=  $upper_limit

Returns an upper or lower bound on the number of primes below the input number. These are analytical routines, so will take a fixed amount of time and no memory. The actual prime_count will always be equal to or between these numbers.

A common use of these is sizing an array to hold the primes below $n: prime_count_upper($n) may reserve slightly more memory than necessary, but avoids the cost of calling prime_count.
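
For example, a small illustrative sketch of that sizing idea (the variable names are arbitrary):

  # Reserve room for every prime below $n using only the analytic bound
  my @primes;
  $#primes = prime_count_upper($n) - 1;   # may allocate a few more slots than needed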

These routines use verified tight limits for inputs below at least 2^35, and use the Dusart (2010) bounds of

    x/logx * (1 + 1/logx + 2.000/log^2x) <= Pi(x)

    x/logx * (1 + 1/logx + 2.334/log^2x) >= Pi(x)

above that range. These bounds do not assume the Riemann Hypothesis. If the configuration option assume_rh has been set (it is off by default), then the Schoenfeld (1976) bounds are used for large values.

prime_count_approx

  print "there are about ",
        prime_count_approx( 10 ** 18 ),
        " primes below one quintillion.\n";

Returns an approximation to the prime_count function, without having to generate any primes. The current implementation uses the Riemann R function which is quite accurate: an error of less than 0.0005% is typical for input values over 2^32. A slightly faster (0.1ms vs. 1ms) but much less accurate answer can be obtained by averaging the upper and lower bounds.

nth_prime

  say "The ten thousandth prime is ", nth_prime(10_000);

Returns the prime that lies in index n in the array of prime numbers. Put another way, this returns the smallest p such that Pi(p) >= n.

For relatively small inputs (below 2 million or so), this does a sieve over a range containing the nth prime, then counts up to the number. This is fairly efficient in time and memory. For larger values, a binary search is performed between the Dusart 2010 bounds using Riemann's R function, then Lehmer's fast prime counting method is used to calculate the count up to that point, then sieving is done in the typically small error zone.

While this method is hundreds of times faster than generating primes, and doesn't involve big tables of precomputed values, it still can take a fair amount of time and space for large inputs. Calculating the 10^11th prime takes a bit under 2 seconds, the 10^12th prime takes 10 seconds, and the 10^13th prime (323780508946331) takes 1 minute. Think about whether a bound or approximation would be acceptable, as they can be computed analytically.

If the bigint or bignum module is not in use, this will generate an overflow exception if the number requested would result in a prime that cannot fit in a native type. If bigints are in use, then the calculation will proceed, though it will be exceedingly slow. A later version of Math::Prime::Util::GMP may include this functionality which would help for 32-bit machines.

nth_prime_upper

nth_prime_lower

  my $lower_limit = nth_prime_lower($n);
  my $upper_limit = nth_prime_upper($n);
  #   $lower_limit  <=  nth_prime(n)  <=  $upper_limit

Returns an analytical upper or lower bound on the Nth prime. These are very fast as they do not need to sieve or search through primes or tables. An exact answer is returned for tiny values of n. The lower limit uses the Dusart 2010 bound for all n, while the upper bound uses one of the two Dusart 2010 bounds for n >= 178974, a Dusart 1999 bound for n >= 39017, and a simple bound of n * (logn + 0.6 * loglogn) for small n.

nth_prime_approx

  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

Returns an approximation to the nth_prime function, without having to generate any primes. Uses the Cipolla 1902 approximation with two polynomials, plus a correction for small values to reduce the error.

is_strong_pseudoprime

  my $maybe_prime = is_strong_pseudoprime($n, 2);
  my $probably_prime = is_strong_pseudoprime($n, 2, 3, 5, 7, 11, 13, 17);

Takes a positive number as input and one or more bases. The bases must be greater than 1. Returns 1 if the input is a prime or a strong pseudoprime to all of the bases, and 0 if not.

If 0 is returned, then the number really is a composite. If 1 is returned, then it is either a prime or a strong pseudoprime to all the given bases. Given enough distinct bases, the chances become very, very strong that the number is actually prime.

This is usually used in combination with other tests to make either stronger tests (e.g. the strong BPSW test) or deterministic results for numbers less than some verified limit (e.g. it has long been known that no more than three selected bases are required to give correct primality test results for any 32-bit number). Given the small chances of passing multiple bases, there are some math packages that just use multiple MR tests for primality testing.

Even numbers other than 2 will always return 0 (composite). While the algorithm does run with even input, most sources define it only on odd input. Returning composite for all non-2 even input makes the function match most other implementations including Math::Primality's is_strong_pseudoprime function.

miller_rabin

An alias for is_strong_pseudoprime. This name is being deprecated.

is_strong_lucas_pseudoprime

Takes a positive number as input, and returns 1 if the input is a strong Lucas pseudoprime using the Selfridge method of choosing D, P, and Q (some sources call this a strong Lucas-Selfridge pseudoprime). This is one half of the BPSW primality test (the Miller-Rabin strong pseudoprime test with base 2 being the other half).
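
For example, a rough sketch of a BPSW-style check assembled from this test and the base-2 strong pseudoprime test:

  # The two halves of the BPSW test described above
  my $likely_prime = is_strong_pseudoprime($n, 2)
                  && is_strong_lucas_pseudoprime($n);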

is_prob_prime

  my $prob_prime = is_prob_prime($n);
  # Returns 0 (composite), 2 (prime), or 1 (probably prime)

Takes a positive number as input and returns back either 0 (composite), 2 (definitely prime), or 1 (probably prime).

For 64-bit input (native or bignum), this uses a tuned set of Miller-Rabin tests such that the result will be deterministic. Either 2, 3, 4, 5, or 7 Miller-Rabin tests are performed (no more than 3 for 32-bit input), and the result will then always be 0 (composite) or 2 (prime). A later implementation may change the internals, but the results will be identical.

For inputs larger than 2^64, a strong Baillie-PSW primality test is performed (aka BPSW or BSW). This is a probabilistic test, so only 0 (composite) and 1 (probably prime) are returned. There is a possibility that composites may be returned marked prime, but since the test was published in 1980, not a single BPSW pseudoprime has been found, so it is extremely likely to be prime. While we believe (Pomerance 1984) that an infinite number of counterexamples exist, there is a weak conjecture (Martin) that none exist under 10000 digits.

is_provable_prime

  say "$n is definitely prime" if is_provable_prime($n) == 2;

Takes a positive number as input and returns back either 0 (composite), 2 (definitely prime), or 1 (probably prime). This gives it the same return values as "is_prime" and "is_prob_prime".

The current implementation of both the Perl and GMP proofs uses theorem 5 of BLS75 (Brillhart-Lehmer-Selfridge), requiring n-1 to be factored to (n/2)^(1/3). This takes less time than factoring to n^0.5 as required by the generalized Pocklington test, or to n-1 for the Lucas test. However, it is possible that a factor cannot be found in a reasonable amount of time, so you should always check that the result is 2 to ensure the number was proven prime.
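
For example, a minimal sketch of checking the return value:

  my $result = is_provable_prime($n);
  if    ($result == 2) { say "$n is proven prime"; }
  elsif ($result == 1) { say "$n is probably prime, but no proof was found"; }
  else                 { say "$n is composite"; }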

A later implementation will use an ECPP test for larger inputs.

prime_certificate

  my @cert = prime_certificate($n);
  say verify_prime(@cert) ? "proven prime" : "not prime";

Given a positive integer n as input, returns either an empty array (we could not prove n prime) or an array representing a certificate of primality. This may be examined or given to "verify_prime" for verification. The latter function contains the description of the format.

is_provable_prime_with_cert

Given a positive integer as input, returns a two element array containing the result of "is_provable_prime" and an array reference containing the primality certificate like "prime_certificate". The certificate will be an empty array reference if the result is not 2 (definitely prime).
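
For example, an illustrative sketch (the variable names are arbitrary):

  my ($result, $certref) = is_provable_prime_with_cert($n);
  if ($result == 2) {
    say verify_prime(@$certref) ? "certificate verified" : "verification failed";
  }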

verify_prime

  my @cert = prime_certificate($n);
  say verify_prime(@cert) ? "proven prime" : "not prime";

Given an array representing a certificate of primality, returns either 0 (not verified), or 1 (verified). The computations are all done using pure Perl Math::BigInt and should not be time consuming (the Pari or GMP backends will help with large inputs).

A certificate is an array holding an n-cert. An n-cert is one of:

  n
       implies n,"BPSW"

  n,"BPSW"
       the number n is small enough to be proven with BPSW.  This
       currently means smaller than 2^64.

  n,"Pratt",[n-cert, ...],a
       A Pratt certificate.  We are given n, the method "Pratt" or
       "Lucas", a list of n-certs that indicate all the unique factors
       of n-1, and an 'a' value to be used in the Lucas primality test.
       The certificate passes if:
         1 all factor n-certs can be verified
         2 all n-certs are factors of n-1 and none are missing
         3 a is coprime to n
         4 a^(n-1) = 1 mod n
         5 a^((n-1)/f) != 1 mod n for each factor

  n,"n-1",[n-cert, ...],[a,...]
       An n-1 certificate suitable for the generalized Pocklington or the
       BLS75 (Brillhart-Lehmer-Selfridge 1975, theorem 5) test.  The
       proof is performed using BLS75 theorem 5 which requires n-1 to be
       factored up to (n/2)^1/3.  If n-1 is factored to more than
       sqrt(n), then the conditions are identical to the generalized
       Pocklington test.
       The certificate passes if:
         1 all factor n-certs can be verified
         2 all factor n-certs are factors of n-1
         3 there must be a corresponding 'a' for each factor n-cert
         4 given A (the factored part of n-1), B = (n-1)/A (the
           unfactored part), s = int(B/(2A)), r = B-s*2A:
             - n < (A+1)(2*A*A+(r-a)A+a)    [ n-1 factored to (n/2)^1/3 ]
             - s = 0 or r*r-8s not a perfect square
             - A and B are coprime
         5 for each pair (f,a) representing a factor n-cert and its 'a':
             - a^(n-1) = 1 mod n
             - gcd( a^((n-1)/f)-1, n ) = 1

  n,"AGKM",[ec-block],[ec-block],...
       An Elliptic Curve certificate.  We are given n, the method "AGKM"
       or "ECPP", and one or more 6-element blocks representing a
       standard ECPP or Atkin-Goldwasser-Kilian-Morain certificate.
       In its traditional form, it is non-recursive, with each q value
       being proved by successive blocks (this makes it easy to use for
       programs like Sage and GMP-ECPP).  A q value is also allowed to
       be an n-cert, which allows an alternative proof for the last q.
       Every ec-block has 6 elements:
         N   the N value this block proves prime if q is prime
         a   value describing the elliptic curve to be used
         b   value describing the elliptic curve to be used
         m   order of the curve
         q   a probable prime > (N^1/4+1)^2 (may be an n-cert)
         P   a point [x,y] on the curve (affine coordinates)
       The certificate passes if:
         - the final q can be proved with BPSW.
         - for each block:
             - N is the same as the preceding block's q
             - N is not divisible by 2 or 3
             - gcd( 4a^3 + 27b^2, N ) == 1;
             - q > (N^1/4+1)^2
             - U = (m/q)P is not the point at infinity
             - V = qU is the point at infinity

is_aks_prime

  say "$n is definitely prime" if is_aks_prime($n);

Takes a positive number as input, and returns 1 if the input passes the Agrawal-Kayal-Saxena (AKS) primality test. This is a deterministic unconditional primality test which runs in polynomial time for general input.

This function is only included for completeness and as an example. While the implementation is fast compared to the only other Perl implementation available (in Math::Primality), it is slow compared to others. However, even optimized AKS implementations are far slower than ECPP or other modern primality tests.

moebius

  say "$n is square free" if moebius($n) != 0;
  $sum += moebius($_) for (1..200); say "Mertens(200) = $sum";

Returns μ(n), the Möbius function (also called the Moebius, Mobius, or MoebiusMu function) for a non-negative integer input. This function is 1 if n = 1, 0 if n is not square free (i.e. n has a repeated factor), and (-1)^t if n is a product of t distinct primes. This is an important function in prime number theory. Like SAGE, we define moebius(0) = 0 for convenience.

If called with two arguments, they define a range low to high, and the function returns an array with the value of the Möbius function for every n from low to high inclusive. Large values of high will result in a lot of memory use. The algorithm used is Deléglise and Rivat (1996) algorithm 4.1, which is a segmented version of Lioen and van de Lune (1994) algorithm 3.2.
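
For example, a small sketch of the ranged form:

  # Moebius values for every n from 1 to 20 in a single call
  my @mu = moebius(1, 20);
  say "mu($_) = ", $mu[$_-1] for 1 .. 20;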

mertens

  say "Mertens(10M) = ", mertens(10_000_000);   # = 1037

Returns M(n), the Mertens function for a non-negative integer input. This function is defined as sum(moebius(1..n)), but calculated more efficiently for large inputs. For example, computing Mertens(100M) takes:

   time    approx mem
     0.4s      0.1MB   mertens(100_000_000)
    74.8s   7000MB     List::Util::sum(moebius(1,100_000_000))
    88.5s      0MB     $sum += moebius($_) for 1..100_000_000   [-nobigint]
   181.8s      0MB     $sum += moebius($_) for 1..100_000_000

The summation of individual terms via factoring is quite expensive in time, though uses O(1) space. This function will generate the equivalent output via a sieving method, which will use some more memory, but be much faster. The current method is a simple n^1/2 version of Deléglise and Rivat (1996), which involves calculating all moebius values to n^1/2, which in turn will require prime sieving to n^1/4.

Various algorithms exist for this, using differing quantities of μ(n). The simplest way is to efficiently sum all n values. Benito and Varona (2008) show a clever and simple method that only requires n/3 values. Deléglise and Rivat (1996) describe a segmented method using only n^1/3 values. The current implementation does a simple non-segmented n^1/2 version of their method. Kuznetsov (2011) gives an alternate method that he indicates is even faster. Lastly, one of the advanced prime count algorithms could be theoretically used to create a faster solution.

euler_phi

  say "The Euler totient of $n is ", euler_phi($n);

Returns φ(n), the Euler totient function (also called Euler's phi or phi function) for an integer value. This is an arithmetic function that counts the number of positive integers less than or equal to n that are relatively prime to n. Given the definition used, euler_phi will return 0 for all n < 1. This follows the logic used by SAGE. Mathematica/WolframAlpha also returns 0 for input 0, but returns euler_phi(-n) for n < 0.

If called with two arguments, they define a range low to high, and the function returns an array with the totient of every n from low to high inclusive. Large values of high will result in a lot of memory use.
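
For example, a short sketch of the ranged form:

  # Totients of 1 .. 1000 computed in one sweep
  my @phi = euler_phi(1, 1000);
  say "phi(10) = ", $phi[9];   # index 0 holds phi(1)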

jordan_totient

  say "Jordan's totient J_$k($n) is ", jordan_totient($k, $n);

Returns Jordan's totient function for a given integer value. Jordan's totient is a generalization of Euler's totient, where jordan_totient(1,$n) == euler_phi($n). This counts the number of k-tuples less than or equal to n that form a coprime tuple with n. As with euler_phi, 0 is returned for all n < 1. This function can be used to generate some other useful functions, such as the Dedekind psi function, where psi(n) = J(2,n) / J(1,n).
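
For example, an illustrative sketch of the Dedekind psi function using that relation (the sub name is just for illustration):

  # psi(n) = J(2,n) / J(1,n), per the relation above
  sub dedekind_psi {
    my $n = shift;
    return jordan_totient(2, $n) / jordan_totient(1, $n);
  }
  say dedekind_psi(12);   # 24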

exp_mangoldt

  say "exp(lambda($_)) = ", exp_mangoldt($_) for 1 .. 100;

Returns exp(Λ(n)), the exponential of the Mangoldt function (also known as von Mangoldt's function) for an integer value. It is equal to log p if n is prime or a power of a prime, and 0 otherwise. We return the exponential so all results are integers. Hence the return value for exp_mangoldt is:

   p   if n = p^m for some prime p and integer m >= 1
   1   otherwise.

chebyshev_theta

  say chebyshev_theta(10000);

Returns θ(n), the first Chebyshev function for a non-negative integer input. This is the sum of the logarithm of each prime p where p <= n. An alternate computation is as the logarithm of the primorial of n. Hence these functions:

  use List::Util qw/sum/;  use Math::BigFloat;

  sub c1a { 0+sum( map { log($_) } @{primes(shift)} ) }
  sub c1b { Math::BigFloat->new(primorial(shift))->blog }

yield similar results, albeit slower and using more memory.

chebyshev_psi

  say chebyshev_psi(10000);

Returns ψ(n), the second Chebyshev function for a non-negative integer input. This is the sum of the logarithm of each prime p where p^k <= n for an integer k. An alternate computation is as the summatory Mangoldt function. Another alternate computation is as the logarithm of lcm(1,2,...,n). Hence these functions:

  use List::Util qw/sum/;  use Math::BigFloat;

  sub c2a { 0+sum( map { log(exp_mangoldt($_)) } 1 .. shift ) }
  sub c2b { Math::BigFloat->new(consecutive_integer_lcm(shift))->blog }

yield similar results, albeit slower and using more memory.

divisor_sum

  say "Sum of divisors of $n:", divisor_sum( $n );

This function takes a positive integer as input and returns the sum of all the divisors of the input, including 1 and itself. This is known as the sigma function (see Hardy and Wright section 16.7, or OEIS A000203).

The more general form takes a code reference as a second parameter, which is applied to each divisor before the summation. This allows computation of numerous functions such as OEIS A000005 [d(n), sigma_0(n), tau(n)]:

  divisor_sum( $n, sub { 1 } );

OEIS A001157 [sigma_2(n)]:

  divisor_sum( $n, sub { $_[0]*$_[0] } )

the general sigma_k (OEIS A000005, A000203, A001157, A001158, etc.):

  divisor_sum( $n, sub { $_[0] ** $k } );

the 5th Jordan totient (OEIS A059378):

  divisor_sum( $n, sub { my $d=shift; $d**5 * moebius($n/$d); } );

though in the last case we have a function "jordan_totient" to compute it more efficiently.

This function is useful for calculating things like aliquot sums, abundant numbers, perfect numbers, etc.
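
For example, a quick illustrative sketch of an aliquot sum and a perfect-number check:

  # Aliquot sum: sum of the proper divisors of $n
  my $aliquot = divisor_sum($n) - $n;
  say "$n is a perfect number" if $aliquot == $n;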

The summation is done as a bigint if the input was a bigint object. You may need to ensure the result of the subroutine does not overflow a native int.

primorial

  $prim = primorial(11); #        11# = 2*3*5*7*11 = 2310

Returns the primorial n# of the positive integer input, defined as the product of the prime numbers less than or equal to n. This is the OEIS series A034386: primorial numbers second definition.

  primorial(0)  == 1
  primorial($n) == pn_primorial( prime_count($n) )

The result will be a Math::BigInt object if it is larger than the native bit size.

Be careful about which version (primorial or pn_primorial) matches the definition you want to use. Not all sources agree on the terminology, though they should give a clear definition of which of the two versions they mean. OEIS, Wikipedia, and Mathworld are all consistent, and these functions should match that terminology.

pn_primorial

  $prim = pn_primorial(5); #      p_5# = 2*3*5*7*11 = 2310

Returns the primorial number p_n# of the positive integer input, defined as the product of the first n prime numbers (compare to the factorial, which is the product of the first n natural numbers). This is the OEIS series A002110: primorial numbers first definition.

  pn_primorial(0)  == 1
  pn_primorial($n) == primorial( nth_prime($n) )

The result will be a Math::BigInt object if it is larger than the native bit size.

consecutive_integer_lcm

  $lcm = consecutive_integer_lcm($n);

Given an unsigned integer argument, returns the least common multiple of all integers from 1 to n. This can be done by manipulation of the primes up to n, resulting in much faster and memory-friendly results than using a factorial.

random_prime

  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range

Returns a pseudo-randomly selected prime that will be greater than or equal to the lower limit and less than or equal to the upper limit. If no lower limit is given, 2 is implied. Returns undef if no primes exist within the range.

The goal is to return a uniform distribution of the primes in the range, meaning that each prime in the range is equally likely to be seen. This removes from consideration such algorithms as PRIMEINC, which, although efficient, gives very non-random output.

For small numbers, a random index selection is done, which gives ideal uniformity and is very efficient with small inputs. For ranges larger than this ~16-bit threshold but within the native bit size, a Monte Carlo method is used (multiple calls to rand may be made if necessary). This also gives ideal uniformity and can be very fast for reasonably sized ranges. For even larger numbers, we partition the range, choose a random partition, then select a random prime from the partition. This gives some loss of uniformity but results in many fewer bits of randomness being consumed as well as being much faster.

If an irand function has been set via "prime_set_config", it will be used to construct any ranged random numbers needed. The function should return a uniformly random 32-bit integer, which is how the irand functions exported by Math::Random::Secure, Math::Random::MT, Math::Random::ISAAC and most other modules behave.

If no irand function was set, then Bytes::Random::Secure is used with a non-blocking seed. This will create good quality random numbers, so there should be little reason to change unless one is generating long-term keys, where using the blocking random source may be preferred.

Examples of various ways to set your own irand function:

  # Math::Random::Secure.  Uses ISAAC and strong seed methods.
  use Math::Random::Secure;
  prime_set_config(irand => \&Math::Random::Secure::irand);

  # Bytes::Random::Secure (OO interface with full control of options):
  use Bytes::Random::Secure ();
  BEGIN {
    my $rng = Bytes::Random::Secure->new( Bits => 512 );
    sub irand { return $rng->irand; }
  }
  prime_set_config(irand => \&irand);

  # Crypt::Random.  Uses Pari and /dev/random.  Very slow.
  use Crypt::Random qw/makerandom/;
  prime_set_config(irand => sub { makerandom(Size=>32, Uniform=>1); });

  # Mersenne Twister.  Very fast, decent RNG, auto seeding.
  use Math::Random::MT::Auto;
  prime_set_config(irand=>sub {Math::Random::MT::Auto::irand() & 0xFFFFFFFF});

random_ndigit_prime

  say "My 4-digit prime number is: ", random_ndigit_prime(4);

Selects a random n-digit prime, where the input is an integer number of digits between 1 and the maximum native type (10 for 32-bit, 20 for 64-bit, 10000 if bigint is active). One of the primes within that range (e.g. 1000 - 9999 for 4-digits) will be uniformly selected using the irand function as described above.

If the number of digits is greater than or equal to the maximum native type, then the result will be returned as a BigInt. However, if the '-nobigint' tag was used, then numbers larger than the threshold will be flagged as an error, and numbers on the threshold will be restricted to native numbers. For better performance with large bit sizes, install Math::Prime::Util::GMP.

random_nbit_prime

  my $bigprime = random_nbit_prime(512);

Selects a random n-bit prime, where the input is an integer number of bits between 2 and the maximum representable bits (32, 64, or 100000 for native 32-bit, native 64-bit, and bigint respectively). A prime with the nth bit set will be uniformly selected, with randomness supplied via calls to the irand function as described above.

Since this uses the random_prime function, all uniformity properties of that function apply to this. The n-bit range is partitioned into nearly equal segments less than 2^32, a segment is randomly selected, then the trivial Monte Carlo algorithm is used to select a prime from within the segment. This gives a reasonably uniform distribution, doesn't use excessive random source, and can be very fast.

The result will be a BigInt if the number of bits is greater than the native bit size. For better performance with large bit sizes, install Math::Prime::Util::GMP.

random_strong_prime

  my $bigprime = random_strong_prime(512);

Constructs an n-bit strong prime using Gordon's algorithm. We consider a strong prime p to be one where

  • p is large. This function requires at least 128 bits.

  • p-1 has a large prime factor r.

  • p+1 has a large prime factor s.

  • r-1 has a large prime factor t.

Using a strong prime in cryptography guards against easy factoring with algorithms like Pollard's Rho. Rivest and Silverman (1999) present a case that using strong primes is unnecessary, and most modern cryptographic systems agree. First, the smoothness does not affect more modern factoring methods such as ECM. Second, modern factoring methods like GNFS are far faster than either method, making the point moot. Third, due to key size growth and advances in factoring and attacks, for practical purposes using large random primes offers security equivalent to using strong primes.

Similar to "random_nbit_prime", the result will be a BigInt if the number of bits is greater than the native bit size. For better performance with large bit sizes, install Math::Prime::Util::GMP.

random_maurer_prime

  my $bigprime = random_maurer_prime(512);

Construct an n-bit provable prime, using the FastPrime algorithm of Ueli Maurer (1995). This is the same algorithm used by Crypt::Primes. Similar to "random_nbit_prime", the result will be a BigInt if the number of bits is greater than the native bit size. For better performance with large bit sizes, install Math::Prime::Util::GMP.

The differences between this function and that in Crypt::Primes are described in the "SEE ALSO" section.

Internally this additionally runs the BPSW probable prime test on every partial result, and constructs a primality certificate for the final result, which is verified. These add additional checks that the resulting value has been properly constructed.

random_maurer_prime_with_cert

  my ($n, $cert_ref) = random_maurer_prime_with_cert(512);

As with "random_maurer_prime", but returns a two-element array containing the n-bit provable prime along with a primality certificate. The certificate is the same as produced by "prime_certificate" or "is_provable_prime_with_cert", and can be parsed by "verify_prime" or any other software that can parse the certificate (the "n-1" form is described in detail in "verify_prime").

UTILITY FUNCTIONS

prime_precalc

  prime_precalc( 1_000_000_000 );

Let the module prepare for fast operation up to a specific number. It is not necessary to call this, but it gives you more control over when memory is allocated and gives faster results for multiple calls in some cases. In the current implementation this will calculate a sieve for all numbers up to the specified number.

prime_memfree

  prime_memfree;

Frees any extra memory the module may have allocated. As with prime_precalc, it is not necessary to call this, but if you're done making calls, or want things cleaned up, you can use this. The object method might be a better choice for complicated uses.

Math::Prime::Util::MemFree->new

  my $mf = Math::Prime::Util::MemFree->new;
  # perform operations.  When $mf goes out of scope, memory will be recovered.

This is a more robust way of making sure any cached memory is freed, as it will be handled by the last MemFree object leaving scope. This means if your routines were inside an eval that died, things will still get cleaned up. If you call another function that uses a MemFree object, the cache will stay in place because you still have an object.

prime_get_config

  my $cached_up_to = prime_get_config->{'precalc_to'};

Returns a reference to a hash of the current settings. The hash is a copy of the configuration, so changing it has no effect. The settings include:

  precalc_to      primes up to this number are calculated
  maxbits         the maximum number of bits for native operations
  xs              0 or 1, indicating the XS code is available
  gmp             0 or 1, indicating GMP code is available
  maxparam        the largest value for most functions, without bigint
  maxdigits       the max digits in a number, without bigint
  maxprime        the largest representable prime, without bigint
  maxprimeidx     the index of maxprime, without bigint
  assume_rh       whether to assume the Riemann hypothesis (default 0)

prime_set_config

  prime_set_config( assume_rh => 1 );

Allows setting of some parameters. Currently the only parameters are:

  xs              Allows turning off the XS code, forcing the Pure Perl
                  code to be used.  Set to 0 to disable XS, set to 1 to
                  re-enable.  You probably will never want to do this.

  gmp             Allows turning off the use of Math::Prime::Util::GMP,
                  which means using Pure Perl code for big numbers.  Set
                  to 0 to disable GMP, set to 1 to re-enable.
                  You probably will never want to do this.

  assume_rh       Allows functions to assume the Riemann hypothesis is
                  true if set to 1.  This defaults to 0.  Currently this
                  setting only impacts prime count lower and upper
                  bounds, but could later be applied to other areas such
                  as primality testing.  A later version may also have a
                  way to indicate whether no RH, RH, GRH, or ERH is to
                  be assumed.

  irand           Takes a code ref to an irand function returning a
                  uniform number between 0 and 2**32-1.  This will be
                  used for all random number generation in the module.

FACTORING FUNCTIONS

factor

  my @factors = factor(3_369_738_766_071_892_021);
  # returns (204518747,16476429743)

Produces the prime factors of a positive number input, in numerical order. The special cases of n = 0 and n = 1 will return n, which guarantees multiplying the factors together will always result in the input value, though those are the only cases where the returned factors are not prime.

The current algorithm for non-bigints is a sequence of small trial division, a few rounds of Pollard's Rho, SQUFOF, Pollard's p-1, Hart's OLF, a long run of Pollard's Rho, and finally trial division if anything survives. This process is repeated for each non-prime factor. In practice, it is very rare to require more than the first Rho + SQUFOF to find a factor, and I have not seen anything go to the last step.

Factoring bigints works with pure Perl, and can be very handy on 32-bit machines for numbers just over the 32-bit limit, but it can be very slow for "hard" numbers. Installing the Math::Prime::Util::GMP module will speed up bigint factoring a lot, and all future effort on large number factoring will be in that module. If you do not have that module for some reason, use the GMP or Pari version of bigint if possible (e.g. use bigint try => 'GMP,Pari'), which will run 2-3x faster (though still 100x slower than the real GMP code).

all_factors

  my @divisors = all_factors(30);   # returns (2, 3, 5, 6, 10, 15)

Produces all the divisors of a positive number input. 1 and the input number are excluded (which implies that an empty list is returned for any prime number input). The divisors are the products over the power set of the prime factors, returned as a sorted list of unique values.

trial_factor

  my @factors = trial_factor($n);

Produces the prime factors of a positive number input. The factors will be in numerical order. The special cases of n = 0 and n = 1 will return n, while with all other inputs the factors are guaranteed to be prime. For large inputs this will be very slow.

fermat_factor

  my @factors = fermat_factor($n);

Produces factors, not necessarily prime, of the positive number input. The particular algorithm is Knuth's algorithm C. For small inputs this will be very fast, but it slows down quite rapidly as the number of digits increases. It is very fast for inputs with a factor close to the midpoint (e.g. a semiprime p*q where p and q are the same number of digits).

holf_factor

  my @factors = holf_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This uses Hart's One Line Factorization with no premultiplier. It is an interesting alternative to Fermat's algorithm, and there are some inputs it can rapidly factor. In the long run it has the same advantages and disadvantages as Fermat's method.

squfof_factor

rsqufof_factor

  my @factors = squfof_factor($n);
  my @factors = rsqufof_factor($n);  # racing multiplier version

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This function typically runs very fast.

prho_factor

pbrent_factor

  my @factors = prho_factor($n);
  my @factors = pbrent_factor($n);

  # Use a very small number of rounds
  my @factors = prho_factor($n, 1000);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. These attempt to find a single factor using Pollard's Rho algorithm, either the original version or Brent's modified version. These are more specialized algorithms usually used for pre-factoring very large inputs, as they are very fast at finding small factors.

pminus1_factor

  my @factors = pminus1_factor($n);
  my @factors = pminus1_factor($n, 1_000);          # set B1 smoothness
  my @factors = pminus1_factor($n, 1_000, 50_000);  # set B1 and B2

Produces factors, not necessarily prime, of the positive number input. This is Pollard's p-1 method, using two stages with default smoothness settings of 1_000_000 for B1, and 10 * B1 for B2. This method can rapidly find a factor p of n where p-1 is smooth (it has no large factors).

MATHEMATICAL FUNCTIONS

ExponentialIntegral

  my $Ei = ExponentialIntegral($x);

Given a non-zero floating point input x, this returns the real-valued exponential integral of x, defined as the integral of e^t/t dt from -infinity to x.

If the bignum module has been loaded, all inputs will be treated as if they were Math::BigFloat objects.

For non-BigInt/BigFloat objects, the result should be accurate to at least 14 digits.

For BigInt / BigFloat objects, we first check to see if Math::MPFR is available. If so, then it is used since it is very fast and has high accuracy. Accuracy when using MPFR will be equal to the accuracy() value of the input (or the default BigFloat accuracy, which is 40 by default).

MPFR is used for positive inputs only. If Math::MPFR is not available or the input is negative, then other methods are used: continued fractions (x < -1), rational Chebyshev approximation ( -1 < x < 0), a convergent series (small positive x), or an asymptotic divergent series (large positive x). Accuracy should be at least 14 digits.

LogarithmicIntegral

  my $li = LogarithmicIntegral($x)

Given a non-negative floating point input, returns the floating point logarithmic integral of x, defined as the integral of dt/ln t from 0 to x. If given a negative input, the function will croak. The function returns 0 at x = 0, and -infinity at x = 1.

This is often known as li(x). A related function is the offset logarithmic integral, sometimes known as Li(x), which avoids the singularity at 1. It may be defined as Li(x) = li(x) - li(2). Crandall and Pomerance use the term li0 for this function, and define li(x) = li0(x) - li0(2). Due to this terminology confusion, it is important to check which exact definition is being used.
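
For example, an illustrative sketch of the offset logarithmic integral using the relation above (the Li sub name is just for illustration):

  # Offset logarithmic integral: Li(x) = li(x) - li(2)
  sub Li {
    my $x = shift;
    return LogarithmicIntegral($x) - LogarithmicIntegral(2);
  }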

If the bignum module has been loaded, all inputs will be treated as if they were Math::BigFloat objects.

For non-BigInt/BigFloat objects, the result should be accurate to at least 14 digits.

For BigInt / BigFloat objects, we first check to see if Math::MPFR is available. If so, then it is used, as it will return results much faster and can be more accurate. Accuracy when using MPFR will be equal to the accuracy() value of the input (or the default BigFloat accuracy, which is 40 by default).

MPFR is used for inputs greater than 1 only. If Math::MPFR is not installed or the input is less than 1, results will be calculated as Ei(ln x).

RiemannZeta

  my $z = RiemannZeta($s);

Given a floating point input s where s >= 0, returns the floating point value of ζ(s)-1, where ζ(s) is the Riemann zeta function. One is subtracted to ensure maximum precision for large values of s. The zeta function is the sum from k=1 to infinity of 1 / k^s. This function only uses real arguments, so is basically the Euler Zeta function.
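
For example, a one-line sketch to recover the zeta value itself:

  # Add back the 1 that RiemannZeta subtracts for precision
  my $zeta = 1 + RiemannZeta($s);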

If the bignum module has been loaded, all inputs will be treated as if they were Math::BigFloat objects.

For non-BigInt/BigFloat objects, the result should be accurate to at least 14 digits. The XS code uses a rational Chebyshev approximation between 0.5 and 5, and a series for other values. The PP code uses an identical series for all values.

For BigInt / BigFloat objects, we first check to see if the Math::MPFR module is installed. If so, then it is used, as it will return results much faster and can be more accurate. Accuracy when using MPFR will be equal to the accuracy() value of the input (or the default BigFloat accuracy, which is 40 by default).

If Math::MPFR is not installed, then results are calculated using either Borwein (1991) algorithm 2, or the basic series. Full input accuracy is attempted, but there are defects in Math::BigFloat with high accuracy computations that make this difficult. It is also very slow. I highly recommend installing Math::MPFR for BigFloat computations.

RiemannR

  my $r = RiemannR($x);

Given a positive non-zero floating point input, returns the floating point value of Riemann's R function. Riemann's R function gives a very close approximation to the prime counting function.
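
For example, a quick illustrative comparison against the exact count:

  # R(x) tracks Pi(x) very closely
  say "R(10^9)  is approximately ", RiemannR(10**9);
  say "Pi(10^9) is exactly       ", prime_count(10**9);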

If the bignum module has been loaded, all inputs will be treated as if they were Math::BigFloat objects.

For non-BigInt/BigFloat objects, the result should be accurate to at least 14 digits.

For BigInt / BigFloat objects, we first check to see if the Math::MPFR module is installed. If so, then it is used, as it will return results much faster and can be more accurate. Accuracy when using MPFR will be equal to the accuracy() value of the input (or the default BigFloat accuracy, which is 40 by default). Accuracy without MPFR should be 35 digits.

EXAMPLES

Print pseudoprimes base 17:

    perl -MMath::Prime::Util=:all -E 'my $n=$base|1; while(1) { print "$n " if is_strong_pseudoprime($n,$base) && !is_prime($n); $n+=2; } BEGIN {$|=1; $base=17}'

Print some primes above 64-bit range:

    perl -MMath::Prime::Util=:all -Mbigint -E 'my $start=100000000000000000000; say join "\n", @{primes($start,$start+1000)}'
    # Similar code using Pari:
    # perl -MMath::Pari=:int,PARI,nextprime -E 'my $start = PARI "100000000000000000000"; my $end = $start+1000; my $p=nextprime($start); while ($p <= $end) { say $p; $p = nextprime($p+1); }'

Examining the η3(x) function of Planat and Solé (2011):

  sub nu3 {
    my $n = shift;
    my $phix = chebyshev_psi($n);
    my $nu3 = 0;
    foreach my $nu (1..3) {
      $nu3 += (moebius($nu)/$nu)*LogarithmicIntegral($phix**(1/$nu));
    }
    return $nu3;
  }
  say prime_count(1000000);
  say prime_count_approx(1000000);
  say nu3(1000000);

Project Euler, problem 3 (Largest prime factor):

  use Math::Prime::Util qw/factor/;
  use bigint;  # Only necessary for 32-bit machines.
  say "", (factor(600851475143))[-1]

Project Euler, problem 7 (10001st prime):

  use Math::Prime::Util qw/nth_prime/;
  say nth_prime(10_001);

Project Euler, problem 10 (summation of primes):

  use Math::Prime::Util qw/primes/;
  my $sum = 0;
  $sum += $_ for @{primes(2_000_000)};
  say $sum;

Project Euler, problem 21 (Amicable numbers):

  use Math::Prime::Util qw/divisor_sum/;
  sub dsum { my $n = shift; divisor_sum($n) - $n; }
  my $sum = 0;
  foreach my $a (1..10000) {
    my $b = dsum($a);
    $sum += $a + $b if $b > $a && dsum($b) == $a;
  }
  say $sum;

Project Euler, problem 41 (Pandigital prime), brute force command line:

  perl -MMath::Prime::Util=:all -E 'my @p = grep { /1/&&/2/&&/3/&&/4/&&/5/&&/6/&&/7/} @{primes(1000000,9999999)}; say $p[-1];'

Project Euler, problem 47 (Distinct primes factors):

  use Math::Prime::Util qw/pn_primorial factor/;
  use List::MoreUtils qw/distinct/;
  sub nfactors { scalar distinct factor(shift); }
  my $n = pn_primorial(4);  # Start with the first 4-factor number
  $n++ while (nfactors($n) != 4 || nfactors($n+1) != 4 || nfactors($n+2) != 4 || nfactors($n+3) != 4);
  say $n;

Project Euler, problem 69, stupid brute force solution (about 5 seconds):

  use Math::Prime::Util qw/euler_phi/;
  my ($n, $max) = (0,0);
  do {
    my $ndivphi = $_ / euler_phi($_);
    ($n, $max) = ($_, $ndivphi) if $ndivphi > $max;
  } for 1..1000000;
  say "$n  $max";

Here's the right way to do PE problem 69 (under 0.03s):

  use Math::Prime::Util qw/pn_primorial/;
  my $n = 0;
  $n++ while pn_primorial($n+1) < 1000000;
  say pn_primorial($n);

Project Euler, problem 187, stupid brute force solution, ~3 minutes:

  use Math::Prime::Util qw/factor -nobigint/;
  my $nsemis = 0;
  do { my @f = factor($_); $nsemis++ if scalar @f == 2; }
     for 1 .. int(10**8)-1;
  say $nsemis;

Here's the best way for PE187. Under 30 milliseconds from the command line:

  use Math::Prime::Util qw/primes prime_count -nobigint/;
  use List::Util qw/sum/;
  my $limit = shift || int(10**8);
  my @primes = @{primes(int(sqrt($limit)))};
  say sum( map { prime_count(int(($limit-1)/$primes[$_-1])) - $_ + 1 }
               1 .. scalar @primes );

LIMITATIONS

I have not completed testing all the functions near the word size limit (e.g. 2^32 for 32-bit machines). Please report any problems you find.

Perl versions earlier than 5.8.0 have a rather broken 64-bit implementation, in that the values are actually stored as doubles. Hence any value larger than ~ 2^49 will start losing bottom bits. This causes numerous functions to not work properly. The test suite will try to determine if your Perl is broken (this only applies to really old versions of Perl compiled for 64-bit when using numbers larger than ~ 2^49). The best solution is updating to a more recent Perl.

The module is thread-safe and should allow good concurrency on all platforms that support Perl threads except Win32. With Win32, either don't use threads or make sure prime_precalc is called before using primes, prime_count, or nth_prime with large inputs. This is only an issue if you use non-Cygwin Win32 and call these routines from within Perl threads.

SEE ALSO

This section describes other CPAN modules available that have some feature overlap with this one. Also see the "REFERENCES" section. Please let me know if any of this information is inaccurate. Also note that just because a module doesn't match what I believe are the best set of features, doesn't mean it isn't perfect for someone else.

I will use SoE to indicate the Sieve of Eratosthenes, and MPU to denote this module (Math::Prime::Util). Some quick alternatives I can recommend if you don't want to use MPU:

  • Math::Prime::FastSieve is the alternative module I use for basic functionality with small integers. It's fast and simple, and has a good set of features.

  • Math::Primality is the alternative module I use for primality testing on bigints.

  • Math::Pari if you want the kitchen sink and can install it and handle using it. There are still some functions it doesn't do well (e.g. prime count and nth_prime).

Math::Prime::XS has is_prime and primes functionality. There is no bigint support. The is_prime function uses well-written trial division, meaning it is very fast for small numbers, but terribly slow for large 64-bit numbers. The prime sieve is an unoptimized non-segmented SoE which returns an array. It works well for 32-bit values, but speed and memory are problematic for larger values.

Math::Prime::FastSieve supports primes, is_prime, next_prime, prev_prime, prime_count, and nth_prime. The caveat is that all functions only work within the sieved range, so are limited to about 10^10. It uses a fast SoE to generate the main sieve. The sieve is 2-3x slower than the base sieve for MPU, and is non-segmented so cannot be used for larger values. Since the functions work with the sieve, they are very fast. All this functionality is present in MPU as well, though not required.

Bit::Vector supports the primes and prime_count functionality in a somewhat similar way to Math::Prime::FastSieve. It is the slowest of all the XS sieves, and has the most memory use. It is, however, faster than the pure Perl code in MPU or elsewhere.

Crypt::Primes supports random_maurer_prime functionality. MPU has more options for random primes (n-digit, n-bit, ranged, and strong) in addition to Maurer's algorithm. MPU does not have the critical bug RT81858. MPU should have a more uniform distribution as well as return a larger subset of primes (RT81871). MPU does not depend on Math::Pari, though it can run slowly for bigints unless the Math::BigInt::GMP or Math::BigInt::Pari modules are installed. Having Math::Prime::Util::GMP installed also helps performance for MPU. Crypt::Primes is hardcoded to use Crypt::Random, while MPU uses Bytes::Random::Secure, and also allows plugging in a random function. This is more flexible, faster, has fewer dependencies, and uses a CSPRNG for security. What Crypt::Primes has that MPU does not is support for returning a generator.

Math::Factor::XS calculates prime factors and factors, which correspond to the "factor" and "all_factors" functions of MPU. These functions do not support bigints. Both are implemented with trial division, meaning they are very fast for really small values, but quickly become unusably slow (factoring 19 digit semiprimes is over 700 times slower). It has additional functions count_prime_factors and matches which have no direct equivalent in MPU.
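
To illustrate the crossover (a sketch; assumes a 64-bit Perl, since the product below is a 19-digit native integer):

  use Math::Prime::Util qw/factor -nobigint/;

  # A 19-digit semiprime whose smallest factor is near 10**9, so trial
  # division would have to walk almost a billion candidates to find it.
  my $n = 1000000007 * 1000000009;
  print join(" * ", factor($n)), "\n";   # 1000000007 * 1000000009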

Math::Big version 1.12 includes primes functionality. The current code is only usable for very tiny inputs as it is incredibly slow and uses lots of memory. RT81986 has a patch to make it run much faster and use much less memory. Since it is in pure Perl it will still run quite slow compared to MPU.

Math::Big::Factors supports factorization using wheel factorization (smart trial division). It supports bigints. Unfortunately it is extremely slow on any input that isn't comprised entirely of small factors. Even 7 digit inputs can take hundreds or thousands of times longer to factor than MPU or Math::Factor::XS. 19-digit semiprimes will take hours vs. MPU's single milliseconds.

Math::Factoring is a placeholder module for bigint factoring. Version 0.02 only supports trial division (the Pollard-Rho method does not work).

Math::Prime::TiedArray allows random access to a tied primes array, almost identically to what MPU provides in Math::Prime::Util::PrimeArray. MPU has attempted to fix Math::Prime::TiedArray's shift bug (RT58151). MPU is typically much faster and will use less memory, but there are some cases where MP:TA is faster (MP:TA stores all entries up to the largest request, while MPU:PA stores only a window around the last request).
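
For reference, the MPU:PA interface looks like this (a minimal sketch):

  use Math::Prime::Util::PrimeArray;

  # Tie an array so $primes[$i] is the (i+1)-th prime, computed on demand.
  tie my @primes, 'Math::Prime::Util::PrimeArray';

  print "@primes[0..4]\n";     # 2 3 5 7 11
  print "$primes[9999]\n";     # 104729, the 10,000th prime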

Math::Primality supports is_prime, is_strong_pseudoprime, is_strong_lucas_pseudoprime, next_prime, prev_prime, prime_count, and is_aks_prime functionality. It also adds is_pseudoprime, which MPU does not have and has no plans to add. This is a great little module that implements primality functionality. It was the first module to support the BPSW and AKS tests. All inputs are processed using GMP, so it of course supports bigints. In fact, Math::Primality was made originally with bigints in mind, while MPU was originally targeted to native integers, but both have added better support for the other. The main differences are extra functionality (MPU has more functions) and performance. With native integer inputs, MPU is generally much faster, especially with "prime_count". For bigints, MPU is slower unless the Math::Prime::Util::GMP module is installed, in which case they have somewhat similar speeds. Math::Primality also installs a primes.pl program, but it has much less functionality than the one included with MPU.

Math::NumSeq is more a related module rather than one with direct functionality. It does however offer a way to get similar results such as primes, twin primes, Sophie-Germain primes, lucky primes, moebius, divisor count, factor count, Euler totient, primorials, etc. Math::NumSeq is mainly set up for accessing these values in order, rather than for arbitrary values, though some sequences support that. The primary advantage I see is the uniform access mechanism for a lot of sequences. For those methods that overlap, MPU is usually much faster. Importantly, most of the sequences in Math::NumSeq are limited to 32-bit indices.

Math::Pari supports a lot of features, with a great deal of overlap. In general, MPU will be faster for native 64-bit integers, while Pari will be faster for bigints (installing Math::Prime::Util::GMP is critical for getting good performance with bigints, but even then, Pari's better algorithms will eventually win out). Trying to hit some of the highlights:

isprime

Similar to MPU's is_prob_prime or is_prime functions. MPU is deterministic for native integers, and uses a strong BPSW test for bigints (with a quick primality proof tried as well). The default version of Pari used by Math::Pari (2.1.7) uses 10 random M-R bases, which is a quick probable prime test (it also supports a Pocklington-Lehmer test by giving a 1 as the second argument). Using the newer 2.3.5 library makes isprime use an APRCL primality proof, which can take longer (though will be much faster than the BLS75 proof used in MPU's is_provable_prime routine).

primepi

Similar to MPU's prime_count function. Pari uses a naive counting algorithm with its precalculated primes, so this is not very useful.

primes

Doesn't support ranges, and requires bumping up the precalculated primes for larger numbers, which means knowing in advance the upper limit for primes. Support for numbers larger than 400M requires building with Pari version 2.3.5. If that is used, sieving is about 2x faster than MPU, but doesn't support segmenting.

factorint

Similar to MPU's factor though with a different return (I find the result value quite inconvenient to work with, but others may like its vector of factor/exponent format). Slower than MPU for all 64-bit inputs on an x86_64 platform, it may be faster for large values on other platforms. Bigints are slightly faster with Math::Pari for "small" values, and MPU slows down rapidly (the difference is noticeable with 30+ digit numbers).

eulerphi

Similar to MPU's euler_phi. MPU is 2-5x faster for native integers. There is also support for a range (see the sketch after this list), which can be much more efficient. Without Math::Prime::Util::GMP installed, MPU is very slow with bigints. With it installed, it is about 2x slower than Math::Pari.

moebius

Similar to MPU's moebius. Comparisons are similar to eulerphi.

sumdiv

Similar to MPU's divisor_sum. The standard sum (sigma_1) is very fast in MPU. Giving it a sub makes it much slower, and for numbers with very many factors, Pari is much faster.

eint1

Similar to MPU's ExponentialIntegral.

zeta

A more feature-rich version of MPU's RiemannZeta function (supports negative and complex inputs).
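
The range form of MPU's euler_phi mentioned under eulerphi above looks like this (a minimal sketch; the two-argument call is assumed here, so check your installed version):

  use Math::Prime::Util qw/euler_phi/;

  # Totients for every n in 1 .. 10 in one call, which is much cheaper
  # than ten separate calls when the range is large.
  my @phi = euler_phi(1, 10);
  print "@phi\n";    # 1 1 2 2 4 2 6 4 6 4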

Overall, Math::Pari supports a huge variety of functionality and has a sophisticated and mature code base behind it. For native integers sometimes the functions can be slower, but bigints are usually superior, and it rarely has any performance surprises. Some of the unique features MPU offers include super fast prime counts, nth_prime, approximations and limits for both, random primes, fast Mertens calculations, Chebyshev theta and psi functions, and the logarithmic integral and Riemann R functions. All with fairly minimal installation requirements.

PERFORMANCE

Counting the primes to 10^10 (10 billion), with time in seconds. Pi(10^10) = 455,052,511. The numbers below are for sieving. Calculating Pi(10^10) takes 0.064 seconds using the Lehmer algorithm in version 0.12.

   External C programs in C / C++:

       1.9  primesieve 3.6 forced to use only a single thread
       2.2  yafu 1.31
       3.8  primegen (optimized Sieve of Atkin, conf-word 8192)
       5.6  Tomás Oliveira e Silva's unoptimized segmented sieve v2 (Sep 2010)
       6.7  Achim Flammenkamp's prime_sieve (32k segments)
       9.3  http://tverniquet.com/prime/ (mod 2310, single thread)
      11.2  Tomás Oliveira e Silva's unoptimized segmented sieve v1 (May 2003)
      17.0  Pari 2.3.5 (primepi)

   Small portable functions suitable for plugging into XS:

       4.1  My segmented SoE used in this module (with unrolled inner loop)
      15.6  My Sieve of Eratosthenes using a mod-30 wheel
      17.2  A slightly modified version of Terje Mathisen's mod-30 sieve
      35.5  Basic Sieve of Eratosthenes on odd numbers
      33.4  Sieve of Atkin, from Praxis (not correct)
      72.8  Sieve of Atkin, 10-minute fixup of basic algorithm
      91.6  Sieve of Atkin, Wikipedia-like

Perl modules, counting the primes to 800_000_000 (800 million):

  Time (s)   Module                      Version  Notes
  ---------  --------------------------  -------  -----------
       0.007 Math::Prime::Util           0.12     using Lehmer's method
       0.27  Math::Prime::Util           0.17     segmented mod-30 sieve
       0.39  Math::Prime::Util::PP       0.24     Perl (Lehmer's method)
       0.9   Math::Prime::Util           0.01     mod-30 sieve
       2.9   Math::Prime::FastSieve      0.12     decent odd-number sieve
      11.7   Math::Prime::XS             0.26     needs some optimization
      15.0   Bit::Vector                 7.2
      48.9   Math::Prime::Util::PP       0.14     Perl (fastest I know of)
     170.0   Faster Perl sieve (net)     2012-01  array of odds
     548.1   RosettaCode sieve (net)     2012-06  simplistic Perl
    3048.1   Math::Primality             0.08     Perl + Math::GMPz
  ~11000     Math::Primality             0.04     Perl + Math::GMPz
  >20000     Math::Big                   1.12     Perl, > 26GB RAM used

Python's standard modules are very slow: mpmath v0.17 primepi takes 169.5s and 25+ GB of RAM. sympy 0.7.1 primepi takes 292.2s. However there are very fast solutions written by Robert William Hanks (included in the xt/ directory of this distribution): pure Python in 12.1s and numpy in 2.8s.

is_prime: my impressions for various sized inputs:

   Module                   1-10 digits  10-20 digits  BigInts
   -----------------------  -----------  ------------  --------------
   Math::Prime::Util        Very fast    Pretty fast   Slow to Fast (3)
   Math::Prime::XS          Very fast    Very slow (1) --
   Math::Prime::FastSieve   Very fast    N/A (2)       --
   Math::Primality          Very slow    Very slow     Fast
   Math::Pari               Slow         OK            Fast

   (1) trial division only.  Very fast if every factor is tiny.
   (2) Too much memory to hold the sieve (11dig = 6GB, 12dig = ~50GB)
   (3) If Math::Prime::Util::GMP is installed, then all three of the
       BigInt capable modules run at reasonably similar speeds, capable of
       performing the BPSW test on a 3000 digit input in ~ 1 second.  Without
       that module all computations are done in Perl, so this module using
       GMP bigints runs 2-3x slower, using Pari bigints about 10x slower,
       and using the default bigints (Calc) it can run much slower.

The differences are in the implementations:

Math::Prime::Util

looks in the sieve for a fast bit lookup if that exists (default up to 30,000 but it can be expanded, e.g. prime_precalc), uses trial division for numbers higher than this but not too large (100k on 64-bit machines, 100M on 32-bit machines), a deterministic set of Miller-Rabin tests for 64-bit numbers, and a BPSW test for bigints.

Math::Prime::XS

does trial divisions, which is wonderful if the input has a small factor (or is small itself). But if given a large prime it can take orders of magnitude longer. It does not support bigints.

Math::Prime::FastSieve

only works in a sieved range, which is really fast if you can do it (M::P::U will do the same if you call prime_precalc). Larger inputs just need too much time and memory for the sieve.

Math::Primality

uses GMP for all work. Under ~32-bits it uses 2 or 3 MR tests, while above 4759123141 it performs a BPSW test. This is fantastic for bigints over 2^64, but it is significantly slower than native precision tests. With 64-bit numbers it is generally an order of magnitude or more slower than any of the others. Once bigints are being used, its performance is quite good. It is faster than this module unless Math::Prime::Util::GMP has been installed, in which case this module is just a little bit faster.

Math::Pari

has some very effective code, but it has some overhead to get to it from Perl. That means for small numbers it is relatively slow: an order of magnitude slower than M::P::XS and M::P::Util (though arguably this is only important for benchmarking since "slow" is ~2 microseconds). Large numbers transition over to smarter tests so don't slow down much. With the default Pari version, isprime will do M-R tests for 10 randomly chosen bases, but can perform a Pocklington-Lehmer proof if requested using isprime(x,1). Both could fail to identify a composite. If pari 2.3.5 is used instead (this requires hand-building the Math::Pari module) then the options are quite different. ispseudoprime(x,0) performs a strong BPSW test, while isprime now performs a primality proof using a fast implementation of the APRCL method. While the APRCL method is very fast compared to MPU's primality proof methods, it is still much, much slower than a BPSW probable prime test for large inputs.

Factoring performance depends on the input, and the algorithm choices used are still being tuned. Math::Factor::XS is very fast when given input with only small factors, but it slows down rapidly as the smallest factor increases in size. For numbers larger than 32 bits, Math::Prime::Util can be 100x or more faster (a number with only very small factors will be nearly identical, while a semiprime with large factors will be the extreme end). Math::Pari's underlying algorithms and code are much more mature than this module, and for 21+ digit numbers will be a better choice. Small numbers factor much faster with Math::Prime::Util. For 30+ digit numbers, Math::Pari is much faster. Without the Math::Prime::Util::GMP module, almost all actions on numbers greater than native scalars will be much faster in Pari.

This slide presentation has a lot of data on 64-bit and GMP factoring performance I collected in 2009. Assuming you do not know anything about the inputs, trial division and optimized Fermat or Lehman work very well for small numbers (<= 10 digits), while native SQUFOF is typically the method of choice for 11-18 digits (I've seen claims that a lightweight QS can be faster for 15+ digits). Some form of Quadratic Sieve is usually used for inputs in the 19-100 digit range, and beyond that is the General Number Field Sieve. For serious factoring, I recommend looking at yafu, msieve, gmp-ecm, GGNFS, and Pari. The latest yafu should cover most uses, with GGNFS likely only providing a benefit for numbers large enough to warrant distributed processing.

The primality proving algorithms leave much to be desired. If you have numbers larger than 2^128, I recommend isprime(n, 2) from Pari 2.3+ (which runs a fast APRCL test), GMP-ECPP, or Primo. Any of those will be much faster than the Lucas or BLS algorithms used in MPU for large inputs. For very large numbers, Primo is the one to use.
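
For sizes where MPU's own proof is still practical, usage is simple (a sketch, assuming is_provable_prime follows the same 0/1/2 return convention as is_prime):

  use bigint;
  use Math::Prime::Util qw/is_provable_prime/;

  # 2**89 - 1 is a small Mersenne prime; a proof at this size is quick,
  # but the time grows rapidly as the input gets larger.
  my $n = 2**89 - 1;
  print "proven prime\n" if is_provable_prime($n) == 2;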

AUTHORS

Dana Jacobsen <dana@acm.org>

ACKNOWLEDGEMENTS

Eratosthenes of Cyrene provided the elegant and simple algorithm for finding primes.

Terje Mathisen, A.R. Quesada, and B. Van Pelt all had useful ideas which I used in my wheel sieve.

Tomás Oliveira e Silva has released the source for a very fast segmented sieve. The current implementation does not use these ideas. Future versions might.

The SQUFOF implementation being used is a slight modification to the public domain racing version written by Ben Buhrow. Enhancements with ideas from Ben's later code as well as Jason Papadopoulos's public domain implementations are planned for a later version. The old SQUFOF implementation, still included in the code, is my modifications to Ben Buhrow's modifications to Bob Silverman's code.

REFERENCES

  • Pierre Dusart, "Estimates of Some Functions Over Primes without R.H.", preprint, 2010. Updates to the best non-RH bounds for prime count and nth prime. http://arxiv.org/abs/1002.0442/

  • Pierre Dusart, "Autour de la fonction qui compte le nombre de nombres premiers", PhD thesis, 1998. In French. The mathematics is readable and highly recommended reading if you're interested in prime number bounds. http://www.unilim.fr/laco/theses/1998/T1998_01.html

  • Gabriel Mincu, "An Asymptotic Expansion", Journal of Inequalities in Pure and Applied Mathematics, v4, n2, 2003. A very readable account of Cipolla's 1902 nth prime approximation. http://www.emis.de/journals/JIPAM/images/153_02_JIPAM/153_02.pdf

  • David M. Smith, "Multiple-Precision Exponential Integral and Related Functions", ACM Transactions on Mathematical Software, v37, n4, 2011. http://myweb.lmu.edu/dmsmith/toms2011.pdf

  • Vincent Pegoraro and Philipp Slusallek, "On the Evaluation of the Complex-Valued Exponential Integral", Journal of Graphics, GPU, and Game Tools, v15, n3, pp 183-198, 2011. http://www.cs.utah.edu/~vpegorar/research/2011_JGT/paper.pdf

  • William H. Press et al., "Numerical Recipes", 3rd edition.

  • W. J. Cody and Henry C. Thacher, Jr., "Chebyshev approximations for the exponential integral Ei(x)", Mathematics of Computation, v23, pp 289-303, 1969. http://www.ams.org/journals/mcom/1969-23-106/S0025-5718-1969-0242349-2/

  • W. J. Cody and Henry C. Thacher, Jr., "Rational Chebyshev Approximations for the Exponential Integral E_1(x)", Mathematics of Computation, v22, pp 641-649, 1968.

  • W. J. Cody, K. E. Hillstrom, and Henry C. Thacher Jr., "Chebyshev Approximations for the Riemann Zeta Function", "Mathematics of Computation", v25, n115, pp 537-547, July 1971.

  • Ueli M. Maurer, "Fast Generation of Prime Numbers and Secure Public-Key Cryptographic Parameters", 1995. Generating random provable primes by building up the prime. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2151

  • Pierre-Alain Fouque and Mehdi Tibouchi, "Close to Uniform Prime Number Generation With Fewer Random Bits", pre-print, 2011. Describes random prime distributions, their algorithm for creating random primes using few random bits, and comparisons to other methods. Definitely worth reading for the discussions of uniformity. http://eprint.iacr.org/2011/481

  • Douglas A. Stoll and Patrick Demichel, "The impact of ζ(s) complex zeros on π(x) for x < 10^{10^{13}}", "Mathematics of Computation", v80, n276, pp 2381-2394, October 2011. http://www.ams.org/journals/mcom/2011-80-276/S0025-5718-2011-02477-4/home.html

  • OEIS: Primorial

  • Walter M. Lioen and Jan van de Lune, "Systematic Computations on Mertens' Conjecture and Dirichlet's Divisor Problem by Vectorized Sieving", in From Universal Morphisms to Megabytes, Centrum voor Wiskunde en Informatica, pp. 421-432, 1994. Describes a nice way to compute a range of Möbius values. http://walter.lioen.com/papers/LL94.pdf

  • Marc Deléglise and Joël Rivat, "Computing the summation of the Möbius function", Experimental Mathematics, v5, n4, pp 291-295, 1996. Enhances the Möbius computation in Lioen/van de Lune, and gives a very efficient way to compute the Mertens function. http://projecteuclid.org/euclid.em/1047565447

  • Manuel Benito and Juan L. Varona, "Recursive formulas related to the summation of the Möbius function", The Open Mathematics Journal, v1, pp 25-34, 2007. Among many other things, shows a simple formula for computing the Mertens functions with only n/3 Möbius values (not as fast as Deléglise and Rivat, but really simple). http://www.unirioja.es/cu/jvarona/downloads/Benito-Varona-TOMATJ-Mertens.pdf

COPYRIGHT

Copyright 2011-2012 by Dana Jacobsen <dana@acm.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.