Module Version: 0.72

NAME

Hailo::Storage - A base class for Hailo storage backends

METHODS

The following two methods must be implemented by subclasses:

_build_dbd

Should return the name of the database driver (e.g. 'SQLite') which will be passed to DBI.

_build_dbd_options

Subclasses can override this method to add options of their own. For example:

    override _build_dbd_options => sub {
        return {
            %{ super() },
            sqlite_unicode => 1,
        };
    };

Comparison of backends

This benchmark shows how the backends compare when training on the small test suite dataset, as reported by the utils/hailo-benchmark utility (found in the distribution):

                         Rate DBD::Pg DBD::mysql DBD::SQLite/file DBD::SQLite/memory
    DBD::Pg            2.22/s      --       -33%             -49%               -56%
    DBD::mysql         3.33/s     50%         --             -23%               -33%
    DBD::SQLite/file   4.35/s     96%        30%               --               -13%
    DBD::SQLite/memory 5.00/s    125%        50%              15%                 --
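
The percentage cells in a table like this can be read as "how much faster (positive) or slower (negative) the row backend is than the column backend". The following is a minimal sketch of that arithmetic, assuming the table follows the layout of the `cmpthese` report from Perl's core Benchmark module; the helper name `relative_speed` is made up for illustration and is not part of Hailo. Because the rates printed in the table are already rounded, recomputing from them can differ from a printed cell by one percentage point.

```perl
use strict;
use warnings;

# Rates (training runs per second) copied from the table above.
my %rate = (
    'DBD::Pg'            => 2.22,
    'DBD::mysql'         => 3.33,
    'DBD::SQLite/file'   => 4.35,
    'DBD::SQLite/memory' => 5.00,
);

# Hypothetical helper: percentage by which the row backend is faster
# (positive) or slower (negative) than the column backend.
sub relative_speed {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f', ($row_rate / $col_rate - 1) * 100;
}

# Compare every backend against the slowest one, DBD::Pg.
for my $name (sort { $rate{$a} <=> $rate{$b} } keys %rate) {
    next if $name eq 'DBD::Pg';
    printf "%-18s is %s%% faster than DBD::Pg\n",
        $name, relative_speed($rate{$name}, $rate{'DBD::Pg'});
}
```

Swapping the two arguments gives the cell on the other side of the diagonal, which is why the 50% in the DBD::mysql row pairs with -33% in the DBD::Pg row rather than -50%.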

Under real-world workloads SQLite is much faster than these results indicate, since the time it takes to train or reply depends on the size of the existing database. Here's how long it took to train on a 214,710-line IRC log on a Linode 1080 with Hailo 0.18:

In the case of PostgreSQL it's actually much faster to first train with SQLite, dump that database, and then import it with psql(1); see failo's README for how to do that.

However, replying with an existing database (using utils/hailo-benchmark-replies) yields different results. SQLite can reply very quickly without being warmed up (which is the typical use case for chatbots), but once PostgreSQL and MySQL are warmed up they start replying faster.

Here's a comparison of doing 10 replies:

                        Rate PostgreSQL MySQL SQLite-file SQLite-file-28MB SQLite-memory
    PostgreSQL        71.4/s         --  -14%        -14%             -29%          -50%
    MySQL             83.3/s        17%    --          0%             -17%          -42%
    SQLite-file       83.3/s        17%    0%          --             -17%          -42%
    SQLite-file-28MB 100.0/s        40%   20%         20%               --          -30%
    SQLite-memory      143/s       100%   71%         71%              43%            --

In this test MySQL uses around 28MB of memory (using Debian's my-small.cnf) and PostgreSQL around 34MB. Plain SQLite uses 2MB of cache, but it was also tested with 28MB of cache as well as with the entire database in memory.

But doing 10,000 replies is very different:

                       Rate SQLite-file PostgreSQL SQLite-file-28MB MySQL SQLite-memory
    SQLite-file      85.1/s          --        -7%             -18%  -27%          -38%
    PostgreSQL       91.4/s          7%         --             -12%  -21%          -33%
    SQLite-file-28MB  103/s         21%        13%               --  -11%          -25%
    MySQL             116/s         37%        27%              13%    --          -15%
    SQLite-memory     137/s         61%        50%              33%   18%            --

Once MySQL gets more memory (using Debian's my-large.cnf) and a chance to warm up, it starts yielding better results (I couldn't find out how to make PostgreSQL take as much memory as it wanted):

                   Rate MySQL SQLite-memory
    MySQL         121/s    --          -12%
    SQLite-memory 138/s   14%            --

AUTHOR

Ævar Arnfjörð Bjarmason <avar@cpan.org>

Hinrik Örn Sigurðsson <hinrik.sig@gmail.com>

LICENSE AND COPYRIGHT

Copyright 2010 Ævar Arnfjörð Bjarmason and Hinrik Örn Sigurðsson

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.