#
#  Copyright 2009-2013 10gen, Inc.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#


# PODNAME: MongoDB::DataTypes
# ABSTRACT: The data types used with MongoDB

__END__

=pod

=head1 NAME

MongoDB::DataTypes - The data types used with MongoDB

=head1 VERSION

version 0.702.1

=head1 DESCRIPTION

This document describes the types you can save to the database and use in
queries with the Perl driver.  If you are using another language, please refer
to that language's documentation (L<http://api.mongodb.org>).

=head1 NOTES FOR SQL PROGRAMMERS

=head2 You must query for data using the correct type.

For example, it is perfectly valid to have some records where the field "foo" is
123 (integer) and other records where "foo" is "123" (string).  Thus, you must
query for the correct type.  If you save C<{"foo" =E<gt> "123"}>, you cannot query
for it with C<{"foo" =E<gt> 123}>.  MongoDB is strict about types.

If the type of a field is ambiguous and important to your application, you
should document what you expect the application to send to the database and
convert your data to those types before sending.  There are some object-document
mappers that will enforce certain types for certain fields for you.

You generally shouldn't save numbers as strings, as they will behave like
strings (e.g., range queries won't work correctly) and the data will take up
more space.  If you set L<MongoDB::BSON/looks_like_number>, the driver will
automatically convert everything that looks like a number to a number before
sending it to the database.

Numbers are the only exception to the strict typing: all number types stored by
MongoDB (32-bit integers, 64-bit integers, 64-bit floating point numbers) will
match each other.

=head1 TYPES

=head2 Numbers

By default, numbers with a decimal point will be saved as doubles (64-bit).

=head3 32-bit Platforms

Numbers without decimal points will be saved as 32-bit integers.  To save a
number as a 64-bit integer, use bigint:

    use bigint;

    $collection->insert({"user_id" => 28347197234178});

The driver will die if you try to insert a number beyond the signed 64-bit
range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.

Numbers that are saved as 64-bit integers will be decoded as doubles.

=head3 64-bit Platforms

Numbers without a decimal point will be saved and returned as 64-bit integers.
Note that there is no way to save a 32-bit int on a 64-bit machine.

Keep in mind that this can change how a value is stored and decoded if some
machines are 32-bit and others are 64-bit.  Take the following example:

=over 4

=item * Programmer 1 saves an int on a 32-bit platform.

=item * Programmer 2 retrieves the document on a 64-bit platform and re-saves
it, effectively converting it to a 64-bit int.

=item * Programmer 1 retrieves the document on their 32-bit machine, which
decodes the 64-bit int as a double.

=back

Nothing drastic, but good to be aware of.
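
To see which of the two cases applies to a given perl, you can ask perl how it
was built.  The following is a minimal sketch using only the core L<Config>
module.  It assumes that the driver's notion of a "64-bit platform" follows
perl's native integer width, which this document does not state explicitly:

    use strict;
    use warnings;
    use Config;

    # ivsize is the size in bytes of perl's native integer (IV):
    # 8 when perl was built with 64-bit integers, 4 otherwise.
    my $int_bits = $Config{ivsize} * 8;

    print "This perl uses $int_bits-bit integers\n";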

=head4 64-bit integers in the shell

The Mongo shell has one numeric type: the 8-byte float.  This means that it
cannot always represent an 8-byte integer exactly.  Thus, when you display a
64-bit integer in the shell, it will be wrapped in a subobject that indicates
it might be an approximate value.  For instance, if we run this Perl on a
64-bit machine:

    $coll->insert({_id => 1});

then look at it in the shell, we see:

    > db.whatever.findOne()
    {
        "_id" :
            {
                "floatApprox" : 1
            }
    }

This doesn't mean that we saved a float.  It just means that the float value of
a 64-bit integer may not be exact.
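
The loss of precision can be reproduced in Perl without a database.  This
sketch assumes a perl built with 64-bit integers and standard 64-bit doubles
(it dies otherwise), and formats the same integer once as an integer and once
as a float:

    use strict;
    use warnings;
    use Config;

    die "this sketch assumes 64-bit integers and 64-bit doubles\n"
        unless $Config{ivsize} == 8 && $Config{nvsize} == 8;

    # 2**53 + 1 is the smallest positive integer a double cannot hold.
    my $int = 9007199254740993;

    my $exact  = sprintf("%d",   $int);   # formatted as an integer
    my $approx = sprintf("%.0f", $int);   # converted to a double first

    print "integer: $exact\n";            # integer: 9007199254740993
    print "double:  $approx\n";           # double:  9007199254740992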

=head4 Dealing with numbers and strings in Perl

Perl is very flexible about whether something is a number or a string: it
generally infers the type from context.  Unfortunately, the driver doesn't have
any context when it has to choose how to serialize a variable.  Therefore, the
default behavior is to introspect the flags that are set on that variable, which
are generally affected by the last operation performed on it, and decide from
them what the user meant.

    my $var = "4";
    # stored as the string "4"
    $collection->insert({myVar => $var});

    $var = int($var) if (int($var) eq $var);
    # stored as the int 4
    $collection->insert({myVar => $var});

Because of this, users often find that they end up with more strings than they
wanted in their database.
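
You can look at these flags yourself with the core L<B> module.  The sketch
below is for illustration only: C<type_flags> is a hypothetical helper, not part
of the driver, and how the driver treats a scalar that has several flags set is
not shown here.

    use strict;
    use warnings;
    use B ();

    # Report which of perl's public "value is valid" flags are set on a scalar.
    sub type_flags {
        my $flags = B::svref_2object(\$_[0])->FLAGS;
        return join ",",
            ($flags & B::SVf_IOK() ? "IOK" : ()),   # integer slot is valid
            ($flags & B::SVf_NOK() ? "NOK" : ()),   # floating-point slot is valid
            ($flags & B::SVf_POK() ? "POK" : ());   # string slot is valid
    }

    my $var = "4";
    print type_flags($var), "\n";    # POK

    my $sum = $var + 1;              # numeric use caches an integer in $var
    print type_flags($var), "\n";    # IOK,POK

    my $int = int($var);
    print type_flags($int), "\n";    # IOK

Note that merely using C<$var> in an addition changed its flags, even though
C<$var> itself was never assigned to.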

If you would like to have everything that looks like a number saved as a number,
set the L<MongoDB::BSON/looks_like_number> option.

    $MongoDB::BSON::looks_like_number = 1;

    my $var = "4";
    # stored as the int 4
    $collection->insert({myVar => $var});

This will send anything that "looks like" a number as a number.  It can
recognize anything that L<Scalar::Util>'s C<looks_like_number> function can
recognize.
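
To get a feel for what that covers, you can call C<looks_like_number> directly.
This sketch exercises only the L<Scalar::Util> function, on a few sample values
chosen for illustration, and does not involve the driver:

    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    for my $value ("4", "-4.5", "1e3", "04101", "0x10", "four", "") {
        my $verdict = looks_like_number($value) ? "number" : "string";
        printf "%-8s %s\n", "'$value'", $verdict;
    }

The first four values are reported as numbers and the last three as strings.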

On the other hand, sometimes there is data that looks like a number but should
be saved as a string.  For example, suppose we were storing zip codes.  If we
wanted to generally convert strings to numbers, we might have something like:

    $MongoDB::BSON::looks_like_number = 1;

    # zip is stored as an int: 4101
    $collection->insert({city => "Portland", "zip" => "04101"});

To force a "number" to be saved as a string with aggressive number conversion
on, bless the string as a C<MongoDB::BSON::String> type:

    my $z = "04101";
    my $zip = bless(\$z, "MongoDB::BSON::String");

    # zip is stored as "04101"
    $collection->insert({city => "Portland", zip => $zip});
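
Two plain-Perl facts sit behind this example, and both can be checked without
the driver.  The sketch below shows that numeric conversion drops the leading
zero, and that blessing a scalar reference only tags it with a class name (the
C<MongoDB::BSON::String> package does not even need to be loaded for this):

    use strict;
    use warnings;

    my $z = "04101";

    # Treated as a number, the zip code loses its leading zero.
    my $as_number = $z + 0;
    print "$as_number\n";          # 4101

    # Blessing a reference to the scalar leaves the string itself intact.
    my $zip = bless(\$z, "MongoDB::BSON::String");
    print ref($zip), "\n";         # MongoDB::BSON::String
    print ${$zip}, "\n";           # 04101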

Additionally, there are two utility functions, C<force_int> and
C<force_double>, to explicitly set Perl's internal type flags to
Integer (C<IV>) and Double (C<NV>) respectively, thus triggering
MongoDB's recognition of the values as Int32/Int64 (depending on the
platform) or Double:

    my $x = 1.0;
    MongoDB::force_int($x);
    $coll->insert({x => $x}); # Inserts an integer

    MongoDB::force_double($x);
    $coll->insert({x => $x}); # Inserts a double

=head2 Strings

All strings must be valid UTF-8 to be sent to the database.  If a string is not
valid, it will not be saved.  If you need to save a non-UTF-8 string, you can
save it as a binary blob (see the Binary Data section below).

All strings returned from the database have the UTF-8 flag set.

Unfortunately, due to Perl weirdness, UTF-8 is not very pretty.  For example,
suppose we have a UTF-8 string:

    my $str = 'Åland Islands';

Now, let's print it:

    print "$str\n";

You can see in the output:

    "\x{c5}land Islands"

Lovely, isn't it?  This is how Perl prints UTF-8.  To make it "pretty," there
are a couple of options:

    utf8::encode($str);

This converts C<$str> in place (the function returns nothing) and,
unintuitively, clears the UTF-8 flag.

You can also just run

    binmode STDOUT, ':utf8';

and then the string (and all future UTF-8 strings) will print "correctly."

You can also turn off C<$MongoDB::BSON::utf8_flag_on>, and the UTF-8 flag will
not be set when strings are decoded:

    $MongoDB::BSON::utf8_flag_on = 0;
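
The relationship between the UTF-8 flag, characters and bytes can be seen with
perl's built-in C<utf8::> functions alone.  This is a minimal sketch that does
not involve the driver.  It builds the string from an escape so that it does
not depend on the encoding of the source file:

    use strict;
    use warnings;

    my $str = "\x{c5}land Islands";   # 13 characters, the first is U+00C5
    utf8::upgrade($str);              # make sure the UTF-8 flag is on

    my $bytes = $str;                 # work on a copy
    utf8::encode($bytes);             # in place; nothing useful is returned

    print length($str),   "\n";       # 13 (characters)
    print length($bytes), "\n";       # 14 (bytes: U+00C5 takes two)
    print utf8::is_utf8($str)   ? "flag on" : "flag off", "\n";   # flag on
    print utf8::is_utf8($bytes) ? "flag on" : "flag off", "\n";   # flag off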

=head2 Arrays

Arrays must be saved as array references (C<\@foo>, not C<@foo>).

=head2 Embedded Documents

Embedded documents are of the same form as top-level documents: either hash
references or L<Tie::IxHash>s.

=head2 Dates

The L<DateTime> or L<DateTime::Tiny> packages can be used to insert
and query for dates. Dates stored in the database will be returned as
instances of one of these classes, depending on the C<dt_type> setting
of the connection:

    $conn->dt_type( 'DateTime::Tiny' );

An example of storing and retrieving a date:

    use DateTime;

    my $now = DateTime->now;
    $collection->insert({'ts' => $now});

    my $obj = $collection->find_one;
    print "Today is ".$obj->{'ts'}->ymd."\n";

An example of querying for a range of dates:

    my $start = DateTime->from_epoch( epoch => 100000 );
    my $end = DateTime->from_epoch( epoch => 500000 );

    my $cursor = $collection->query({event => {'$gt' => $start, '$lt' => $end}});

B<Warning: creating Perl L<DateTime> objects is extremely slow.  Consider saving
dates as numbers or C<DateTime::Tiny> objects and converting the numbers to
L<DateTime>s only when needed.  A single L<DateTime> field can make
deserialization up to 10 times slower.>

For example, you could use the L<time|perlfunc/time> function to store seconds
since the epoch:

    $collection->update($criteria, {'$set' => {"last modified" => time()}});

This will be faster to deserialize.
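
When such a number does need to be shown as a date, it can be converted on
demand without creating a L<DateTime> object.  This sketch uses only core Perl;
the value of C<$stored> is a made-up stand-in for a number read back from the
database:

    use strict;
    use warnings;
    use POSIX qw(strftime);

    # Pretend this number came back from the database.
    my $stored = 100000;

    # gmtime() splits the epoch value into UTC date parts;
    # strftime() formats them.
    my $date = strftime("%Y-%m-%d %H:%M:%S", gmtime($stored));
    print "$date\n";    # 1970-01-02 03:46:40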

Note that (at least, as of C<DateTime::Tiny> version 1.04) there is no
time-zone attribute for C<DateTime::Tiny> objects.  We therefore
consider all such times to be in the C<UTC> time zone.  Likewise,
C<DateTime::Tiny> has no notion of milliseconds (yet?), so the
milliseconds portion of the datetime will be set to zero.

=head2 Regular Expressions

Use C<qr/.../> to use a regular expression in a query:

    my $cursor = $collection->query({"name" => qr/[Jj]oh?n/});

Regular expressions will match strings saved in the database.

You can also save and retrieve regular expressions themselves:

    $collection->insert({"regex" => qr/foo/i});
    $obj = $collection->find_one;
    if ("FOO" =~ $obj->{'regex'}) { # matches
        print "hooray\n";
    }

Note for Perl 5.8 users: flags are lost when regular expressions are retrieved
from the database (this does not affect queries or Perl 5.10+).
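
On Perl 5.10 and later, a compiled regular expression can be split into its
pattern and its flags with C<regexp_pattern> from the core L<re> module.  The
sketch below only illustrates that the two parts are separate pieces of
information; whether the driver uses this function internally is not stated
here:

    use strict;
    use warnings;
    use re qw(regexp_pattern);   # available from Perl 5.10

    my ($pattern, $flags) = regexp_pattern(qr/foo/i);
    print "pattern: $pattern\n";   # pattern: foo
    print "flags:   $flags\n";     # the flags include "i"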

=head2 Booleans

Use the L<boolean> package to get boolean values.  C<boolean::true> and
C<boolean::false> are the only parts of the package used, currently.

An example of inserting boolean values:

    use boolean;

    $collection->insert({"okay" => true, "name" => "fred"});

An example using boolean values for query operators (only returns documents
where the name field exists):

    my $cursor = $collection->query({"name" => {'$exists' => boolean::true}});

Most of the time, you can just use 1 or 0 instead of C<true> and C<false>, such
as for specifying fields to return.  L<boolean> is the only way to save
booleans to the database, though.

By default, booleans are returned from the database as integers.  To return
booleans as L<boolean>s, set C<$MongoDB::BSON::use_boolean> to 1.

=head2 MongoDB::OID

"OID" stands for "Object ID", and is a unique id that is automatically added to
documents if they do not already have an C<_id> field before they are saved to
the database.  OIDs are 12 bytes long and are guaranteed to be unique.  Their
string form is a 24-character string of hexadecimal digits.

To create a unique id:

    my $oid = MongoDB::OID->new;

To create a C<MongoDB::OID> from an existing 24-character hexadecimal string:

    my $oid = MongoDB::OID->new("value" => "123456789012345678901234");
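
The relationship between the 12 bytes and the 24-character string is ordinary
hexadecimal encoding, which can be checked in plain Perl.  The validation
pattern below is a suggested sanity check of our own, not something the driver
is documented to perform:

    use strict;
    use warnings;

    my $hex = "123456789012345678901234";

    # A plausible check before handing a string to MongoDB::OID->new.
    die "not a 24-digit hex string\n" unless $hex =~ /\A[0-9a-fA-F]{24}\z/;

    # Two hex digits encode one byte, so 24 digits are 12 bytes.
    my $bytes = pack("H*", $hex);
    print length($bytes), "\n";          # 12
    print unpack("H*", $bytes), "\n";    # 123456789012345678901234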

=head2 Binary Data

By default, all database strings are UTF-8.  To save images, binaries, and other
non-UTF-8 data, you need to store it as binary data.  There are two ways to do
this.

=head3 String Refs

In general, you can pass the string as a reference.  For example:

    # non-utf8 string
    my $string = "\xFF\xFE\xFF";

    $collection->insert({"photo" => \$string});

This will save the variable as binary data, bypassing the UTF-8 check.

Binary data can be matched exactly by the database, so this query will match
the object we inserted above:

    $collection->find({"photo" => \$string});
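
To decide whether a value needs to go in as binary data, you can test it for
well-formed UTF-8 yourself.  This sketch uses perl's built-in C<utf8::decode>;
the driver's own validity check may differ in detail:

    use strict;
    use warnings;

    my $string = "\xFF\xFE\xFF";

    # utf8::decode() works in place and returns false when the bytes are
    # not well-formed UTF-8, so test a copy.
    my $copy    = $string;
    my $is_utf8 = utf8::decode($copy) ? 1 : 0;
    print "valid UTF-8: $is_utf8\n";     # valid UTF-8: 0

    # Taking a reference is what marks the value as binary for the driver.
    my $binary = \$string;
    print ref($binary), "\n";            # SCALAR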

=head3 L<MongoDB::BSON::Binary> type

You can also use the L<MongoDB::BSON::Binary> class.  This allows you to
preserve the I<subtype> of your data.  Binary data in MongoDB stores a "type"
field, which can be any integer between 0 and 255.  Identical data will only
match if the subtype is the same.

The Perl driver uses the default subtype of C<SUBTYPE_GENERIC>.

The driver defaults to returning binary data as strings, not instances of
L<MongoDB::BSON::Binary> (or even string references) for backwards compatibility
reasons.  If you need to round-trip binary data, set the
C<$MongoDB::BSON::use_binary> flag:

    $MongoDB::BSON::use_binary = 1;

Comparisons (e.g., C<$gt>, C<$lt>) may not work as you expect with binary data,
so it is worth experimenting.

=head2 MongoDB::Code

L<MongoDB::Code> is used to represent JavaScript code and, optionally, scope.
To create one:

    use MongoDB::Code;

    my $code = MongoDB::Code->new("code" => "function() { return 'hello, world'; }");

Or, with a scope:

    my $code = MongoDB::Code->new("code" => "function() { return 'hello, '+name; }",
        "scope" => {"name" => "Fred"});

This would return "hello, Fred" when run.

=head2 MongoDB::MinKey

C<MongoDB::MinKey> is "less than" any other value of any type.  This can be useful
for always returning certain documents first (or last).

C<MongoDB::MinKey> has no methods, fields, or string form.  To create one, it is
sufficient to say:

    my $minKey = bless {}, "MongoDB::MinKey";

=head2 MongoDB::MaxKey

C<MongoDB::MaxKey> is "greater than" any other value of any type.  This can be useful
for always returning certain documents last (or first).

C<MongoDB::MaxKey> has no methods, fields, or string form.  To create one, it is
sufficient to say:

    my $maxKey = bless {}, "MongoDB::MaxKey";

=head2 MongoDB::Timestamp

    my $ts = MongoDB::Timestamp->new({sec => $seconds, inc => $increment});

Timestamps are used internally by MongoDB's replication.  You can see them in
their natural habitat by querying C<local.oplog.$main>.  Each entry looks
something like:

    { "ts" : { "t" : 1278872990000, "i" : 1 }, "op" : "n", "ns" : "", "o" : { } }

In the shell, timestamps are shown in milliseconds, although they are stored as
seconds.  So, to represent this document in Perl, we would do:

    my $oplog = {
        "ts" => MongoDB::Timestamp->new("sec" => 1278872990, "inc" => 1),
        "op" => "n",
        "ns" => "",
        "o" => {}
    };

Timestamps are not dates.  You should not use them unless you are doing
something low-level with replication.  To save dates or times, use a number,
L<DateTime> object, or L<DateTime::Tiny> object.
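
When copying a timestamp out of the shell, the only conversion needed is from
milliseconds to seconds.  A minimal sketch, using the C<t> value from the oplog
entry shown above:

    use strict;
    use warnings;

    my $shell_t = 1278872990000;          # "t" as printed by the shell
    my $sec     = int($shell_t / 1000);   # value for the "sec" attribute
    print "$sec\n";                       # 1278872990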

=head1 AUTHORS

=over 4

=item *

Florian Ragwitz <rafl@debian.org>

=item *

Kristina Chodorow <kristina@mongodb.org>

=item *

Mike Friedman <mike.friedman@10gen.com>

=back

=head1 COPYRIGHT AND LICENSE

This software is Copyright (c) 2013 by 10gen, Inc.

This is free software, licensed under:

  The Apache License, Version 2.0, January 2004

=cut