The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=pod

=encoding UTF-8

=head1 NAME

JSON::Parse - Read JSON into a Perl variable

=head1 SYNOPSIS

    
    use JSON::Parse 'parse_json';
    my $json = '["golden", "fleece"]';
    my $perl = parse_json ($json);
    # Same effect as $perl = ['golden', 'fleece'];
    


Convert JSON into Perl.

=head1 DESCRIPTION

JSON means "JavaScript Object Notation" and it is specified in L</RFC
4627>.

JSON::Parse converts JSON into the nearest equivalent Perl. The
function L</parse_json> takes one argument, a string containing JSON,
and returns a Perl reference. The input to C<parse_json> must be a
complete JSON structure.

The module differs from the L<JSON> module by simplifying the handling
of Unicode. If its input is marked as Unicode characters, the strings
in its output are also marked as Unicode characters.

JSON::Parse also provides two high speed validation functions,
L</valid_json> and L</assert_valid_json>, and a function to read JSON from
a file, L</json_file_to_perl>.

=head1 FUNCTIONS

=head2 parse_json

    use JSON::Parse 'parse_json';
    my $perl = parse_json ('{"x":1, "y":2}');

This function converts JSON into a Perl structure, either an array
reference or a hash reference.

If the first argument does not contain a complete valid JSON text,
C<parse_json> throws a fatal error ("dies"). If the first argument is
the undefined value, an empty string, or a string containing only
whitespace, C<parse_json> returns the undefined value.

If the argument contains valid JSON, the return value is either a hash
or an array reference. If the input JSON text is a serialized object,
a hash reference is returned:

    
    use JSON::Parse ':all';
    my $perl = parse_json ('{"a":1, "b":2}');
    print ref $perl, "\n";
    # Prints "HASH".
    


If the input JSON text is a serialized array, an array reference is
returned:

    
    use JSON::Parse ':all';
    my $perl = parse_json ('["a", "b", "c"]');
    print ref $perl, "\n";
    # Prints "ARRAY".
    


=head2 json_file_to_perl

    use JSON::Parse 'json_file_to_perl';
    my $p = json_file_to_perl ('filename');

This is exactly the same as L</parse_json> except that it reads the
JSON from the specified file rather than a scalar. The file must be in
the UTF-8 encoding, and is opened as a character file using
C<:encoding(UTF-8)> (see L<PerlIO::encoding> and L<perluniintro> for
details). The output is marked as character strings.

=head2 valid_json

    use JSON::Parse 'valid_json';
    if (valid_json ($json)) {
        # do something
    }

C<Valid_json> returns I<1> if its argument is valid JSON and I<0> if
not. It also returns I<0> if the input is undefined or the empty
string.

This is a high-speed validator which runs between roughly two and
eight times faster than L</parse_json>.

C<Valid_json> does not supply the actual errors which caused
invalidity. Use L</assert_valid_json> to get error messages when the
JSON is invalid.

=head2 assert_valid_json

    
    use JSON::Parse 'assert_valid_json';
    eval {
        assert_valid_json ('["xyz":"b"]');
    };
    if ($@) {
        print "Your JSON was invalid: $@\n";
    }
    # Prints "Unexpected character ':' parsing array"


This is the underlying function for L</valid_json>. It runs at the
same high speed, but throws an error if the JSON is wrong, rather than
returning 1 or 0. See L</DIAGNOSTICS> for the error format, which is
identical to L</parse_json>.

=head1 OLD INTERFACE

The following alternative function names are accepted. These are the
names used for the functions in old versions of this module. These
names are not deprecated and will never be removed from the module.

=head2 json_to_perl

This is exactly the same function as L</parse_json>.

=head2 validate_json

This is exactly the same function as L</assert_valid_json>.

=head1 Mapping from JSON to Perl

JSON elements are mapped to Perl as follows:

=head2 JSON numbers

JSON numbers become Perl numbers, either integers or double-precision
floating point numbers, or possibly strings containing the number if
parsing of a number by the usual methods fails somehow.

JSON does not allow leading zeros, or leading plus signs, so numbers
like I<+100> or I<0123> cause an L</Unexpected character> error. JSON
also does not allow numbers of the form I<1.> but it does allow things
like I<0e0> or I<1E999999>. As far as possible these are accepted by
JSON::Parse.

=head2 JSON strings

JSON strings become Perl strings. The JSON escape characters such as
C<\t> for the tab character (see section 2.5 of L</RFC 4627>) are
mapped to the equivalent ASCII character.

=head3 Handling of Unicode

If the input to L</parse_json> is marked as Unicode characters, the
output strings will be marked as Unicode characters. If the input is
not marked as Unicode characters, the output strings will not be
marked as Unicode characters. Thus, 

    
    use JSON::Parse ':all';
    # The scalar $sasori looks like Unicode to Perl
    use utf8;
    my $sasori = '["蠍"]';
    my $p = parse_json ($sasori);
    print utf8::is_utf8 ($p->[0]);
    # Prints 1.
    


but

    
    use JSON::Parse ':all';
    # The scalar $ebi does not look like Unicode to Perl
    no utf8;
    my $ebi = '["海老"]';
    my $p = parse_json ($ebi);
    print utf8::is_utf8 ($p->[0]);
    # Prints nothing.
    


Escapes of the form \uXXXX (see page three of L</RFC 4627>) are mapped
to ASCII if XXXX is less than 0x80, or to UTF-8 if XXXX is greater
than or equal to 0x80.

Strings containing \uXXXX escapes greater than 0x80 are also upgraded
to character strings, regardless of whether the input is a character
string or a byte string, thus regardless of whether Perl thinks the
input string is Unicode, escapes like \u87f9 are converted into the
equivalent UTF-8 bytes and the particular string in which they occur
is marked as a character string:

    
    use JSON::Parse ':all';
    no utf8;
    # 蟹
    my $kani = '["\u87f9"]';
    my $p = parse_json ($kani);
    print "It's marked as a character string" if utf8::is_utf8 ($p->[0]);
    # Prints "It's marked as a character string" because it's upgraded
    # regardless of the input string's flags.


This is modelled on the behaviour of Perl's C<chr>:

    
    no utf8;
    my $kani = '87f9';
    print "hex is character string\n" if utf8::is_utf8 ($kani);
    # prints nothing
    $kani = chr (hex ($kani));
    print "chr makes it a character string\n" if utf8::is_utf8 ($kani);
    # prints "chr makes it a character string"


Since every byte of input is validated as UTF-8 (see L</UTF-8 only>),
this hopefully will not upgrade invalid strings.

Surrogate pairs in the form C<\uD834\uDD1E> are also handled. If the
second half of the surrogate pair is missing, an L</Unexpected
character> or L</Unexpected end of input> error is thrown. If the
second half of the surrogate pair is present but contains an
impossible value, a L</Not surrogate pair> error is thrown.

=head2 JSON arrays

JSON arrays become Perl array references. The elements of the Perl
array are in the same order as they appear in the JSON.

Thus

    my $p = parse_json ('["monday", "tuesday", "wednesday"]');

has the same result as a Perl declaration of the form

    my $p = [ 'monday', 'tuesday', 'wednesday' ];

=head2 JSON objects

JSON objects become Perl hashes. The members of the JSON object become
key and value pairs in the Perl hash. The string part of each object
member becomes the key of the Perl hash. The value part of each member
is mapped to the value of the Perl hash.

Thus

    my $j = <<EOF;
    {"monday":["blue", "black"],
     "tuesday":["grey", "heart attack"],
     "friday":"Gotta get down on Friday"}
    EOF

    my $p = parse_json ($j);

has the same result as a Perl declaration of the form

    my $p = {
        monday => ['blue', 'black'],
        tuesday => ['grey', 'heart attack'],
        friday => 'Gotta get down on Friday',
    };

=head2 null

The JSON null literal is mapped to a readonly scalar
C<$JSON::Parse::null> containing the undefined value.

=head2 true

The JSON true literal is mapped to a readonly scalar
C<$JSON::Parse::true> containing the value 1.

=head2 false

The JSON false literal is mapped to a readonly scalar
C<$JSON::Parse::false> containing the value 0.

=head1 RESTRICTIONS

This module imposes the following restrictions on its input.

=over

=item JSON only

JSON::Parse is a strict parser. It only accepts input which exactly
meets the criteria of L</RFC 4627>. That means, for example,
JSON::Parse does not accept single quotes (') instead of double quotes
("), or numbers with leading zeros, like 0123. JSON::Parse does not
accept control characters (0x00 - 0x1F) in strings, missing commas
between array or hash elements like C<["a" "b"]>, or trailing commas
like C<["a","b","c",]>. It also does not accept trailing
non-whitespace, like the second "]" in C<["a"]]>.

=item No incremental parsing

JSON::Parse does not do incremental parsing. JSON::Parse only parses
fully-formed JSON strings which include all opening and closing
brackets.

=item UTF-8 only

Although JSON may come in various encodings of Unicode, JSON::Parse
only parses the UTF-8 format. If input is in a different Unicode
encoding than UTF-8, convert the input before handing it to this
module. For example, for the UTF-16 format,

    use Encode 'decode';
    my $input_utf8 = decode ('UTF-16', $input);
    my $perl = parse_json ($input_utf8);

or, for a file, use C<:encoding> (see L<PerlIO::encoding> and
L<perluniintro>):

    open my $input, "<:encoding(UTF-16)", 'some-json-file'; 

JSON::Parse does not determine the nature of the octet stream, as
described in part 3 of L</RFC 4627>.

This restriction to UTF-8 applies regardless of whether Perl thinks
that the input string is a character string or a byte
string. Non-UTF-8 input will cause an L</Unexpected character> error
to be thrown.

=back

=head1 DIAGNOSTICS

L</valid_json> does not produce error messages. L</parse_json> and
L</assert_valid_json> die on encountering invalid input.

Error messages have the line number and the byte number where
appropriate of the input which caused the problem. The line number is
formed simply by counting the number of "\n" (linefeed, ASCII 0x0A)
characters in the whitespace part of the JSON.

Parsing errors are fatal, so to continue after an error occurs, put
the parsing into an C<eval> block:

    my $p;                       
    eval {                       
        $p = parse_json ($j);  
    };                           
    if ($@) {                    
        # handle error           
    }

The following error messages are produced:



=head2 Unexpected character


An unexpected character (byte) was encountered in the input. For
example, when looking at the beginning of a string supposedly
containing JSON, there are six possible characters, the four JSON
whitespace characters plus "[" and "{". If the module encounters a
plus sign, it will give an error like this:

    assert_valid_json ('+');

gives output

    JSON error at line 1, byte 1/1: Unexpected character '+' parsing initial state: expecting whitespace: '\n', '\r', '\t', ' ' or start of an array or object: '{', '[' 



The message always includes a list of what characters are allowed.

If there is some recognizable structure being parsed, the error
message will include its starting point in the form "starting from
byte n":

    assert_valid_json ('{"this":"\a"}');

gives output

    JSON error at line 1, byte 11/13: Unexpected character 'a' parsing string starting from byte 9: expecting escape: '\', '/', '"', 'b', 'f', 'n', 'r', 't', 'u' 



A feature of JSON is that parsing it requires only one byte to be
examined at a time. Thus almost all parsing problems can be handled
using the "Unexpected character" error type, including spelling errors
in literals:

    assert_valid_json ('[true,folse]');

gives output

    JSON error at line 1, byte 8/12: Unexpected character 'o' parsing literal starting from byte 7: expecting 'a' 



and the missing second half of a surrogate pair:

    assert_valid_json ('["\udc00? <-- should be a second half here"]');

gives output

    JSON error at line 1, byte 9/44: Unexpected character '?' parsing unicode escape starting from byte 3: expecting '\' 



All kinds of errors can occur parsing numbers, for example a missing
fraction,

    assert_valid_json ('[1.e9]');

gives output

    JSON error at line 1, byte 4/6: Unexpected character 'e' parsing number starting from byte 2: expecting digit: '0-9' 



and a leading zero,

    assert_valid_json ('[0123]');

gives output

    JSON error at line 1, byte 3/6: Unexpected character '1' parsing number starting from byte 2: expecting whitespace: '\n', '\r', '\t', ' ' or comma: ',' or end of array: ']' or dot: '.' or exponential sign: 'e', 'E' 



The error message is this complicated because all of the following are
valid here: whitespace: C<[0 ]>; comma: C<[0,1]>, end of array:
C<[0]>, dot: C<[0.1]>, or exponential: C<[0e0]>.

These are all handled by this error.  Thus the error messages are a
little confusing as diagnostics.

Versions of this module prior to 0.29 gave more informative messages
like "leading zero in number". (The messages weren't documented.) The
reason to change over to the single message was because it makes the
parsing code simpler, and because the testing code described in
L</TESTING> makes use of the internals of this error to check that the
error message produced actually do correspond to the invalid and valid
bytes allowed by the parser, at the exact byte given.

This is a bytewise error, thus for example if a miscoded UTF-8 appears
in the input, an error message saying what bytes would be valid at
that point will be printed.

    
    no utf8;
    use JSON::Parse 'assert_valid_json';
    
    # Error in first byte:
    
    my $bad_utf8_1 = chr (hex ("81"));
    eval { assert_valid_json ("[\"$bad_utf8_1\"]"); };
    print "$@\n";
    
    # Error in third byte:
    
    my $bad_utf8_2 = chr (hex ('e2')) . chr (hex ('9C')) . 'b';
    eval { assert_valid_json ("[\"$bad_utf8_2\"]"); };
    print "$@\n";


prints

    JSON error at line 1, byte 3/5: Unexpected character 0x81 parsing string starting from byte 2: expecting printable ASCII or first byte of UTF-8: '\x20-\x7f', '\xC2-\xF4' at examples/bad-utf8.pl line 10.
    
    JSON error at line 1, byte 5/7: Unexpected character 'b' parsing string starting from byte 2: expecting bytes in range 80-bf: '\x80-\xbf' at examples/bad-utf8.pl line 16.
    







=head2 Unexpected end of input


The end of the string was encountered before the end of whatever was
being parsed was. For example, if a quote is missing from the end of
the string, it will give an error like this:

    assert_valid_json ('{"first":"Suzuki","second":"Murakami","third":"Asada}');

gives output

    JSON error at line 1: Unexpected end of input parsing string starting from byte 47 








=head2 Not surrogate pair


While parsing a string, a surrogate pair was encountered. While trying
to turn this into UTF-8, the second half of the surrogate pair turned
out to be an invalid value.

    assert_valid_json ('["\uDC00\uABCD"]');

gives output

    JSON error at line 1: Not surrogate pair parsing unicode escape starting from byte 11 








=head2 Empty input


This error occurs for L</assert_valid_json> when it's given an empty
or undefined value. Given empty input, L</parse_json> returns an
undefined value rather than throwing an error.






=head1 SPEED

On the author's computer, the module's speed of parsing is
approximately the same as L<JSON::XS>, with small variations depending
on the type of input. For validation, L</valid_json> is faster than
any other module known to the author, and up to ten times faster than
JSON::XS.

Some special types of input, such as floating point numbers containing
an exponential part, like "1e09", seem to be about two or three times
faster to parse with this module than with L<JSON::XS>. In
JSON::Parse, parsing of exponentials is done by the system's C<strtod>
function, but JSON::XS contains its own parser for exponentials, so
these results may be system-dependent.

On the other hand, JSON::XS makes better use of Perl's inbuilt string
handling than JSON::Parse and so it's faster for some types of
strings. The main focus of the version 0.29 release is increased
accuracy and better handling of edge cases. I'm planning to attend to
the speed issues in future versions.

There is some benchmarking code in the github repository under the
directory "benchmarks" for those wishing to test these claims. The
script F<benchmarks/bench> is an adaptation of the similar script in
the L<JSON::XS> distribution.

The following benchmark tests used version 0.29 of JSON::Parse and
version 3.01 of JSON::XS on the files in the "benchmarks" directory of
JSON::Parse. "short.json" and "long.json" are the benchmarks used by
JSON::XS.

=over

=item short.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 358487.521 |  0.0000279 |
    JSON::Parse   | 179243.761 |  0.0000558 |
    JSON::XS      | 156503.881 |  0.0000639 |
    --------------+------------+------------+


=item long.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |   6385.968 |  0.0015659 |
    JSON::Parse   |   2803.492 |  0.0035670 |
    JSON::XS      |   3506.357 |  0.0028520 |
    --------------+------------+------------+


=item words-array.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 164482.510 |  0.0000608 |
    JSON::Parse   |  22622.999 |  0.0004420 |
    JSON::XS      |  21936.736 |  0.0004559 |
    --------------+------------+------------+


=item exp.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |  88487.426 |  0.0001130 |
    JSON::Parse   |  35726.610 |  0.0002799 |
    JSON::XS      |  13662.228 |  0.0007319 |
    --------------+------------+------------+


=item literals.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 204600.195 |  0.0000489 |
    JSON::Parse   |  31230.856 |  0.0003202 |
    JSON::XS      |  17578.810 |  0.0005689 |
    --------------+------------+------------+


=item cpantesters.json

    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |    631.187 |  0.0158432 |
    JSON::Parse   |    132.401 |  0.0755279 |
    JSON::XS      |    131.020 |  0.0763240 |
    --------------+------------+------------+

=back

=head1 SEE ALSO

=over

=item RFC 4627

JSON is specified in L<RFC 4627 "The application/json Media Type for JavaScript Object Notation (JSON)"|http://www.ietf.org/rfc/rfc4627.txt>.

=item json.org

L<http://json.org> is the website for JSON, authored by Douglas
Crockford.

=item JSON, JSON::XS, and friends

These modules allow both reading and writing of JSON. JSON::Parse
originated as a response to the overcomplex interface of L<JSON>, in
particular its exasperating handling of Unicode.

There are also a lot of other modules for parsing and producing JSON
on CPAN. I have found the following ones: L<JSON::DWIW>, L<JSON::Any>,
L<JSON::YAJL>, L<JSON::Util>, L<JSON::Tiny>, L<Pegex::JSON>,
L<JSON::Streaming::Reader>, L<JSON::Syck>, L<Mojo::JSON>. Please let
me know of any others I've missed.

A fork of JSON::XS also exists as L<Cpanel::JSON::XS>. This is related
to a disagreement about how to report bugs. Please see the module for
details. Another module, L<JSON::XS::VersionOneAndTwo>, supports two
different interfaces of JSON::XS. However, JSON::XS is now onto
version 3. L<JSON::SL> also seems to be a fork of JSON::XS. I was
unable to run its example code.

=back

=head1 TEST RESULTS

The CPAN testers results are at the usual place. At the time of
release of this 0.29 version of the module, apart from pre-5.8.9
versions of Perl, there is only one CPAN testers testing machine on
which JSON::Parse fails its tests, a Windows 5.16.3 multithreaded
Perl. So far I have been unable to work out why these tests are
failing on that machine. If JSON::Parse does not install on your
machine, let me know.

The ActiveState test results are at
L<http://code.activestate.com/ppm/JSON-Parse/>.

=head1 EXPORTS

The module exports nothing by default. All of the functions,
L</parse_json>, L</json_file_to_perl>, L</valid_json> and
L</assert_valid_json>, as well as the old function names
L</validate_json> and L</json_to_perl>, can be exported on request.

All of the functions can be exported using the tag ':all':

    use JSON::Parse ':all';

=head1 SUPPORT

There is a mailing list at <json-parse@googlegroups.com> for
announcements and discussions about the module. You can read it on the
web at
L<https://groups.google.com/forum/#!forum/json-parse>. Membership is
open to anyone.

=head1 TESTING

The module incorporates extensive testing related to the production of
error messages and validation of input. Some of the testing code is
supplied with the module in the F</t/> subdirectory of the
distribution.

More extensive testing code is in the git repository. This is not
supplied in the CPAN distribution. A script, F<randomjson.pl>,
generates a set number of bytes of random JSON and checks that the
module's bytewise validation of input is correct. This setup relies on
a C file F<json-random-test.c> which isn't in the CPAN distribution,
and it also requires F<Json3.xs> to be edited to make the macro
C<TESTRANDOM> true (uncomment line 7 of the file). The testing code
uses C setjmp/longjmp, so it's not guaranteed to work on all operating
systems and is commented out for CPAN releases.

A pure C version called F<random-test.c> also exists. This applies
exactly the same tests, and requires no Perl at all.

=head1 AUTHOR

Ben Bullock, <bkb@cpan.org>

=head1 LICENSE

JSON::Parse can be used, copied, modified and redistributed under the
same terms as Perl itself.

=cut