Mers package programs input and output format description

$Id: mersfmt.txt,v 1.17 2021/12/26 04:31:42 wedgingt Exp $

The basic format of the mers programs, server, and so on is:

M( {exponent} ){letter}[: {extent}][{,| #} {info}]

... where {exponent} is the power of two, {letter} is defined below,
{extent} is the extent of trial factoring, a factor, or some other
value depending on {letter}, a comma (,) or pound sign (#), and
arbitrary {info} in free form.  Every space in the format is literal
in that at least one must be present.  Square brackets like '[]'
surround optional items and braces like '{}' surround descriptions of
the data that goes there.

Note that, unlike most of the GIMPS work and data, the exponents in my
data are not always prime (though some of the data files on my web site
are restricted to prime exponents).

Everything after the first comma or pound sign is presently ignored by
my database update routines; this will change as I make improvements
to retain the discoverer and method used in the database itself.
Since most of the routines are shell scripts that include sorting on
the exponent, extent, and letter, the spaces are important to separate
fields for the generic UNIX/Linux sort program.

The intent is to have one fairly simple format for all data so that
all programs and scripts can share code to read and write it.

Now, for the descriptions of the {letter}s presently in use.  All
examples contain actual data as of when each section was originally
written or was last modified.

M( {exponent} )P[{,| #} {info}]
M( 3021377 )P
        'P'rime, perhaps with info about the discovery.
        See <A HREF="primeM.txt"> primeM.txt </A> for a complete list of
        known Mersenne prime exponents.

M( {exponent} )C: {prime-factor}[{,| #} {info}
M( 5056039 )C: 58799938797499937, golem.arc.nasa.gov (fd 48) #0 @ Fri Jul 25 02:34:39 1997
        'C'omposite with known prime factor.  The example is the
        output format of my (unreleased) TCP/IP server and denotes
        that the factor was sent to it by the listed computer at the
        given time.  Factors that are composite will not be listed in
        the data files on my web site.  Factors that are also factors
        of smaller Mersenne numbers will also not be listed.

M( {exponent} )C, {residue}[{,| #} {info}]
M( 41 )C, 0x000000c771a34e19, n = 8, mersenne1 v2.50  Kline
        'C'omposite with the Lucas-Lehmer residue and usually info on
        the program, version, user, and/or date.  Verified residues
        are available at <A HREF="http://www.mersenne.org/">
        mersenne.org </A> on the GIMPS status page in George Woltman's
        format.  Rarely used now.

M( {exponent} )c: {digits}[{,| #} {info}]
M( 1201 )c: 344
        'c'ofactor of {digits} digits is known to be composite.  The
        number of decimal digits is recorded since that will change
        every time a new factor is found.

M( {exponent} )H: {extent}[{,| #} {info}]
M( 727 )H: 9007199397668929
        'H'ighest trial factor attempted.  Mersenne is known to be
        composite.  Note that some smaller trial factors may not have
        been attempted since my database updates presently assume that
        all factorers are trial factorers (see also the 'G' line
        below).  The extent may be in (integer) power of two format:
M( 727 )H: 2^53
        The mers package program extract.c prints George Woltman's
        DATABASE file in this last format when given the -4 or the -m
        flag.

M( {exponent} )U: {extent}[{,| #} {info}]
M( 2147483743 )U: 844412082585031
        'U'nknown primality.  Extent is the highest trial factor
        attempted.  The extent may be in power of two format as
        explained for 'H' lines.

M( {exponent} )D[{,| #} {info}]
M( 7673 )D
        'D'one; all prime factors are known and I should have an ECPP
        primality certificate for each (primitive) prime factor.  The
        largest prime factor may not be listed explicitly in a 'C'
        line; it is often much larger and thus only implied by the 'D'
        line.  If the exponent is composite, Mersenne numbers with
        exponents that divide this exponent may still not be
        completely factored.  This and the other data that relates to
        the cofactor only applies to factors that are "primitive";
        that is, to factors that do not evenly divide smaller Mersenne
        numbers and cannot be derived algebraicly.  The 'd', 'c', 'l',
        'm', 'L', and 'M' lines have similar issues since they also
        refer to the cofactor.  See my <a href="mersenne.html">
        Mersenne page </a> file for related proofs.

M( {exponent} )[dS][{,| #} {info}]
M( {exponent} )[dS]: {digits}[ # {info}]
M( 6375 )C: 261586984897586338470001
M( 6375 )C: 123110416038843347492649222001
M( 6375 )S: 911
        'd'one or 'S'PRP; all (primitive) prime factors are thought to
        be known.  The largest factor is at least a pseudo-prime in
        some base other than 2 and is rarely if ever listed explicitly
        in a 'C' line.  See also the 'D' line just above and the 'L'
        and 'M' lines below.  I used 'd' until January 2019 and now
        use 'S' to avoid 'd' and 'D' being confused by mysql.

M( {exponent} )e: {ECM bound} {curves/trials} [CPUsecs/curve] [{,| #} {info}]
M( 601 )e: 3000000 1
        Elliptic Curve Method curves tried at the given (first stage)
        bound.  The database does not store this but instead adds it
        to the E: line for the exponent.

M( {exponent} )E: {ECM "work"} {ECM largest bound} [maxCPUsecs/curve] [{,| #} {info}]
M( 601 )E: 17684960000 3000000
        The ECM "work" and largest (stage 1) ECM bound attempted.  The
        "work" is presently defined as the sum for all curves of the
        stage 1 bound used as discussed on the Mersenne mailing list
        at the time I started tracking this data.

M( {exponent} )o: {P-1 first stage bound} [P-1 second stage bound][{,| #} {info}]
M( 727 )o: 15000000000 
        First and second stage bounds tested using a P-1 factorer.
        Note that the second stage bound is optional.

M( {exponent} )q: {P+1 first stage bound} [P+1 second stage bound][{,| #} {info}]
M( 727 )q: 0 0
        First and second stage bounds tested using a P+1 factorer.

M( {exponent} )G: {start of gap} {end of gap}[{,| #} {info}]
M( 617 )G: 4093841106587383 4503599627370496
        This denotes a gap in the trial factoring data.  I presently
        keep gaps (and joins) separately from the rest of the
        database.  Note that this can also be used to account for the
        range having been tested by a buggy factorer that may have
        missed factors (which is the original purpose, in fact, for
        these lines).

M( {exponent} )g: {start} {stop}[{,| #} {info}]
M( 4608557 )g: 50 51
        This also denotes a gap in the trial factoring data, but
        restricted to starting and ending at a power of 2.  The gap is
        from 2^start thru 2^stop.  E.g., the example indicates that
        M460857 has not been reliably checked for factors between 2^50
        and 2^51.  This is also used as a request to a trial factorer
        to do the indicated factoring, which is similar the format
        used in worktodo.ini files used by Prime95 and mprime and in
        worktodo.txt files used by two of the GPU factorers.  One
        common cause of these lines is an old version of the GIMPS
        trial factorer finding a factor as it did not check possible
        factors in order but checked each "order of 2" before the next
        one.

M( {exponent} )J: {start of "joined"/closed gap} {end of joined gap}[{,| #} {info}]
M( 601 )J: 1203 1181614081
        Denotes a "joined" gap in that it was produced by a trial
        factoring program that understands the 'G:' lines and is
        well tested and "well behaved" (e.g., will not produce gaps).
        This data "covers" 'G' and 'g' gaps, shrinking or eliminating them.

The last four letters, L, M, l, and m, all relate to an algebraic factorization
that applies only to composite exponents of the form n = 8*a - 4; the
factors are the 'L' and 'M' "halves" referred to in the descriptions.
The factorization is derived as follows:

   M(n) = M(8*a - 4)
        = 2^(8*a - 4) - 1
        = (2^(4*a - 2))^2 - 1^2
        = (2^(4*a - 2) - 1)*(2^(4*a - 2) + 1)
        = M(4*a - 2)*(M(4*a - 2) + 2)

Thus, half of the factorization of M(n) (for any even n) will be taken
care of by factoring M(n/2) = M(4*a - 2).  The other half can be factored
further into L and M:

   M(4*a - 2) + 2 = 2^(4*a - 2) + 1
        = (2^(2*n - 1) - 2^n + 1)*(2^(2*n - 1) + 2^n + 1)
        = L*M

Note that this factorization into L and M only works if the base, as for
Mersenne numbers, is 2.

M( {exponent} )L: {digits}[{,| #} {info}]
M( 12988 )C: 8234393
M( 12988 )C: 10987849
M( 12988 )C: 3837892943413
M( 12988 )C: 139199988925862811694530245823709
M( 12988 )C: 20098778207148252111516575825479349
M( 12988 )E: 40903926 305323
M( 12988 )L: 862
M( 12988 )m: 877
M( 12988 )o: 10000000
M( 12988 )q: 30000000
        All prime factors are thought to be known of the 'L' part.
        The largest factor is at least a pseudo-prime in some base
        other than 2 and may not be listed explicitly in a 'C' line.

M( {exponent} )M: {digits}[{,| #} {info}]
M( 4892 )C: 44029
M( 4892 )C: 68489
M( 4892 )E: 6045948689 6012334
M( 4892 )M: 4892
M( 4892 )l: 359
M( 4892 )o: 10000000
M( 4892 )q: 12000000
        All prime factors are thought to be known of the 'M' part.
        The largest factor is at least a pseudo-prime in some base
        other than 2 and may not be listed explicitly in a 'C' line.

Note that if both the L and M parts are thought to be completely
factored, then the database may contain an 'S', 'd', or 'D' line (as
described above) rather than the 'L' and 'M' lines.

M( {exponent} )l: {digits}[{,| #} {info}]
M( 4252 )C: 4253
M( 4252 )C: 119057
M( 4252 )C: 2351357
M( 4252 )C: 6262064969
M( 4252 )C: 118640804957
M( 4252 )C: 51799226966893295289509
M( 4252 )C: 51114358532758153182876066459473
M( 4252 )C: 1603777152900827521607976402498110749
M( 4252 )E: 240004674971 95459286
M( 4252 )l: 219
M( 4252 )m: 219
        The L half's cofactor of {digits} digits is known to be composite.
        The number of decimal digits is recorded since that will change
        every time a new factor is found.

M( {exponent} )m: {digits}[{,| #} {info}]
        The M half's cofactor of {digits} digits is known to be composite.
        The number of decimal digits is recorded since that will change
        every time a new factor is found.

Back to my <a href="mersenne.html"> Mersenne page </a>.

The "mers package" of software that I maintain, including the <a
href="README.html"> README.html </a> file, is available <a
href="mers.tar.gz"> here </a> and the current beta release is in <a
href="beta.tgz"> beta.tgz </a>; both, with the '.tgz' extension, can
be read by WinZip as well as the FSF's gunzip and UNIX/Linux tar
programs.

The current lowM.txt file, which records all data known to me for
exponents less than 200,000, is in <A HREF="mersdata.zip">
mersdata.zip, zip'd </A> and <A HREF="mersdata.tgz"> mersdata.tgz,
tar'd and then zip'd </A>.

Please send comments, ideas, questions, etc. to me,
<a href="mailto:wedgingt@acm.org"> Will Edgington </a> (permanent).