
NAME

DiaColloDB - diachronic collocation database, top-level

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB;
 
 ##========================================================================
 ## Constructors etc.
 
 $coldb = CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## I/O: open/close
 
 $coldb_or_undef = $coldb->open($dbdir,%opts);
 @dbkeys = $coldb->dbkeys();
 $coldb_or_undef = $coldb->close();
 $bool = $coldb->opened();
 @files = $coldb->diskFiles();
 
 ##========================================================================
 ## create: utils
 
 $multimap = $coldb->create_multimap($base, \%ts2i, $packfmt, $label="multimap");
 \@attrs = $coldb->attrs();
 $atitle = $CLASS_OR_OBJECT->attrTitle($attr_or_alias);
 $acbexpr = $CLASS_OR_OBJECT->attrCountBy($attr_or_alias,$matchid=0);
 $aquery_or_filter_or_undef = $CLASS_OR_OBJECT->attrQuery($attr_or_alias,$cquery);
 \@attrdata = $coldb->attrData();
 $bool = $coldb->hasAttr($attr);
 
 ##========================================================================
 ## create: from corpus
 
 $bool = $coldb->create($corpus,%opts);
 
 ##========================================================================
 ## create: union (aka merge)
 
 $coldb = $CLASS_OR_OBJECT->union(\@coldbs_or_dbdirs,%opts);
 
 ##========================================================================
 ## I/O: header
 
 @keys = $coldb->headerKeys();
 $bool = $coldb->loadHeaderData();
 
 ##========================================================================
 ## Export/Import
 
 $bool = $coldb->dbexport();
 $coldb = $coldb->dbimport();
 
 ##========================================================================
 ## Info
 
 \%info = $coldb->dbinfo();
 
 ##========================================================================
 ## Profiling: Utils
 
 $relname      = $coldb->relname($rel);
 $obj_or_undef = $coldb->relation($rel);
 \@ids         = $coldb->enumIds($enum,$req,%opts);
 
 ($dfilter,$sliceLo,$sliceHi,$dateLo,$dateHi)
                  = $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);
 
 $compiler        = $coldb->qcompiler();
 $cquery_or_undef = $coldb->qparse($ddc_query_string);
 $cquery          = $coldb->parseQuery([[$attr1,$val1],...], %opts); ##-- compat: ARRAY-of-ARRAYs
 
 \@aqs     = $coldb->queryAttributes($cquery,%opts);
 \@aqs     = $coldb->parseRequest($request, %opts);
 \%groupby = $coldb->groupby($groupby_request, %opts);
 $cqfilter = $coldb->query2filter($attr,$cquery,%opts);
 
 ($CQCountKeyExprs,\$CQRestrict,\@CQFilters)
           = $coldb->parseGroupBy($groupby_string_or_request,%opts);
 
 ##========================================================================
 ## Profiling: Generic
 
 $mprf  = $coldb->profile($relation, %opts);
 $mprf  = $coldb->extend($relation,%opts);
 \%opts = $CLASS_OR_OBJECT->profileOptions(\%opts);
 
 ##========================================================================
 ## Profiling: Comparison (diff)
 
 $mprf  = $coldb->compare($relation, %opts);
 \%opts = $CLASS_OR_OBJECT->compareOptions(\%opts);
 

DESCRIPTION

DiaColloDB is the top-level module of the DiaColloDB diachronic collocation database package. As a Perl class, a DiaColloDB object can be used to create or query a local native database instance.

Globals & Constants

Variable: $VERSION

Package version.

Variable: @ISA

DiaColloDB inherits from DiaColloDB::Client, and provides the low-level basis for the DiaColloDB::Client API.

Variable: $PGOOD_DEFAULT

Default positive part-of-speech tag regex for document parsing. Do not use qr// here, since Storable does not handle pre-compiled Regexps. Default = q/^(?:N|TRUNC|VV|ADJ)/.

Variable: $PBAD_DEFAULT

Default negative pos regex for document parsing. Default = undef (none).

Variable: $WGOOD_DEFAULT

Default positive word regex for document parsing. Default = q/[[:alpha:]]/.

Variable: $WBAD_DEFAULT

Default negative word regex for document parsing. Default = q/[\.]/.

Variable: $LGOOD_DEFAULT

Default positive lemma regex for document parsing. Default = undef (none).

Variable: $LBAD_DEFAULT

Default negative lemma regex for document parsing. Default = undef (none).

Variable: $TDF_MGOOD_DEFAULT

Default positive meta-field regex for document parsing (tdf only). Default = q/^(?:author|pnd|title|basename|collection|flags|textClass|genre)$/.

Variable: $TDF_MBAD_DEFAULT

Default negative meta-field regex for document parsing (tdf only). Default = q/_$/.
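
Each positive ("good") regex and its negative ("bad") counterpart act together as a two-sided filter. The following minimal sketch assumes that a value is kept iff it matches the positive regex (when defined) and does not match the negative regex (when defined); the helpers passes() and keep_token() are hypothetical and not part of DiaColloDB. The regexes are kept as plain strings, as documented above, and compiled only at match time.

```perl
use strict;
use warnings;

# Default filter strings as documented above (plain strings, not qr//).
my $PGOOD = q/^(?:N|TRUNC|VV|ADJ)/;
my $PBAD  = undef;
my $WGOOD = q/[[:alpha:]]/;
my $WBAD  = q/[\.]/;

# Hypothetical helper: keep $val iff it matches $good (if defined)
# and does not match $bad (if defined).
sub passes {
  my ($val, $good, $bad) = @_;
  return 0 if defined($good) && $val !~ $good;
  return 0 if defined($bad)  && $val =~ $bad;
  return 1;
}

# Hypothetical token filter combining the part-of-speech and word checks.
sub keep_token {
  my ($word, $pos) = @_;
  return passes($pos, $PGOOD, $PBAD) && passes($word, $WGOOD, $WBAD);
}

printf "%s\n", keep_token('Haus', 'NN')   ? 'keep' : 'skip';  # keep
printf "%s\n", keep_token('und',  'KON')  ? 'keep' : 'skip';  # skip: tag fails $PGOOD
printf "%s\n", keep_token('z.B.', 'ADJD') ? 'keep' : 'skip';  # skip: word matches $WBAD
```

Lemma filters (lgood, lbad) and the tdf meta-field filters (mgood, mbad) would follow the same pattern under this assumption.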

Variable: $ECLASS

Enum class. Default = 'DiaColloDB::EnumFile::MMap'.

Variable: $XECLASS

Fixed-length enum class. Default = 'DiaColloDB::EnumFile::FixedLen'.

Variable: $MMCLASS

Multimap class. Default = 'DiaColloDB::MultiMapFile'.

Variable: %TDF_OPTS

Default options for DiaColloDB::Relation::TDF->new(). Default:

 mgood => $TDF_MGOOD_DEFAULT, ##-- positive filter regex for metadata attributes
 mbad  => $TDF_MBAD_DEFAULT,  ##-- negative filter regex for metadata attributes
 ##
 minFreq=>undef,    ##-- minimum total term-frequency for model inclusion (default=from $coldb->{tfmin})
 minDocFreq=>4,     ##-- minimum "doc-frequency" (#/docs per term) for model inclusion
 minDocSize=>4,     ##-- minimum doc size (#/tokens per doc) for model inclusion (default=8; formerly $coldb->{vbnmin})
 maxDocSize=>'inf', ##-- maximum doc size (#/tokens per doc) for model inclusion (default=inf; formerly $coldb->{vbnmax})
 ##
 vtype=>'float',    ##-- store compiled values as 32-bit floats
 itype=>'long',     ##-- store compiled indices as 32-bit integers

Constructors etc.

new
 $coldb = CLASS_OR_OBJECT->new(%args);

%args, object structure:

   (
    ##-- options
    dbdir => $dbdir,      ##-- database directory; REQUIRED
    flags => $fcflags,    ##-- fcntl flags or open()-style mode string; default='r'
    attrs => \@attrs,     ##-- index attributes (input as space-separated or array; compiled to array); default=undef (==>['l'])
                          ##    + each attribute can be token-attribute qw(w p l) or a document metadata attribute "doc.ATTR"
                          ##    + document "date" attribute is always indexed
    info => \%info,       ##-- additional data to return in info() method (e.g. collection, maintainer)
    pack_id => $fmt,      ##-- pack-format for IDs (default='N')
    pack_f  => $fmt,      ##-- pack-format for frequencies (default='N')
    pack_date => $fmt,    ##-- pack-format for dates (default='n')
    pack_off => $fmt,     ##-- pack-format for file offsets (default='N')
    pack_len => $len,     ##-- pack-format for string lengths (default='n')
    dmax  => $dmax,       ##-- maximum distance for collocation-frequencies and implicit ddc near() queries (default=5)
    cfmin => $cfmin,      ##-- minimum co-occurrence frequency for Cofreqs and ddc queries (default=2)
    tfmin => $tfmin,      ##-- minimum global term-frequency WITHOUT date component (default=2)
    fmin_${a} => $fmin,   ##-- minimum independent frequency for value of attribute ${a} (default=undef:from $tfmin)
    keeptmp => $bool,     ##-- keep temporary files? (default=0)
    index_xf  => $bool,   ##-- xf: create/use unigram index (default=1)
    index_cof => $bool,   ##-- cof: create/use co-frequency index (default=1)
    index_tdf => $bool,   ##-- tdf: create/use (term x document) frequency matrix index? (default=undef: if available)
    dbreak => $dbreak,    ##-- tdf: use break-type $break for tdf index (default=undef: files)
    tdfopts => \%tdfopts, ##-- tdf: options for DiaColloDB::Relation::TDF->new(); default=undef (all inherited from %TDF_OPTS)
    ##
    ##-- runtime ddc relation options
    ddcServer => $server, ##-- server for ddc relation ("$host:$port")
    ddcTimeout => $secs,  ##-- timeout for ddc relation
    ##
    ##-- source filtering (for create())
    pgood  => $regex,     ##-- positive filter regex for part-of-speech tags
    pbad   => $regex,     ##-- negative filter regex for part-of-speech tags
    wgood  => $regex,     ##-- positive filter regex for word text
    wbad   => $regex,     ##-- negative filter regex for word text
    lgood  => $regex,     ##-- positive filter regex for lemma text
    lbad   => $regex,     ##-- negative filter regex for lemma text
    ##
    ##-- logging
    logOpen => $level,       ##-- log-level for open/close (default='info')
    logCreate => $level,     ##-- log-level for create messages (default='info')
    logCorpusFile => $level, ##-- log-level for corpus file-parsing (default='trace')
    logCorpusFileN => $N,    ##-- log corpus file-parsing only for every N files (0 for none; default:undef ~ $corpus->size()/100)
    logExport => $level,     ##-- log-level for export messages (default='info')
    logProfile => $level,    ##-- log-level for verbose profiling messages (default='trace')
    logRequest => $level,    ##-- log-level for request-level profiling messages (default='debug')
    logCompat  => $level,    ##-- log-level for compatibility warnings (default='warn')
    ##
    ##-- runtime limits
    maxExpand => $size,    ##-- maximum number of elements in query expansions (default=65535)
    ##
    ##-- administrivia
    version => $version,   ##-- DiaColloDB version of stored db (==$DiaColloDB::VERSION)
    upgraded=>\@upgraded,  ##-- optional administrative information about auto-magic upgrades
    ##
    ##-- attribute data
    ${a}enum => $aenum,    ##-- attribute enum: $aenum : ($dbdir/${a}_enum.*) : $astr<=>$ai : A*<=>N
                           ##    e.g.  lemmata: $lenum : ($dbdir/l_enum.*   ) : $lstr<=>$li : A*<=>N
    ${a}2t   => $a2t,      ##-- attribute multimap: $a2t : ($dbdir/${a}_2t.*) : $ai=>@tis   : N=>N*
    pack_t$a => $fmt,      ##-- pack format: extract attribute-id $ai from a packed tuple-string $ts ; $ai=unpack($coldb->{"pack_x$a"},$ts)
    ##
    ##-- tuple data (-dates)
    ##   + as of v0.10.000, packed term tuples EXCLUDING dates ("t-tuples") are mapped by $coldb->{tenum}
    ##   + prior to v0.10.000, term tuples INCLUDING dates ("x-tuples") were mapped by $coldb->{xenum}, now obsolete
    tenum  => $tenum,      ##-- enum: tuples ($dbdir/tenum.*) : \@ais<=>$ti : N*<=>N
    pack_t => $fmt,        ##-- symbol pack-format for $tenum : "${pack_id}[Nattrs]"
    xenum  => $xenum,      ##-- enum: tuples ($dbdir/xenum.*) : [@ais,$di]<=>$xi : N*n<=>N
    xdmin => $xdmin,       ##-- minimum date (>= v0.04)
    xdmax => $xdmax,       ##-- maximum date (>= v0.04)
    ##
    ##-- relation data
    xf    => $xf,          ##-- ug: [$ti, $date]      => f($ti, $date)
    cof   => $cof,         ##-- cf: [$ti1,$date,$ti2] => f($ti1,$date,$ti2)
    ddc   => $ddc,         ##-- ddc client relation
    tdf   => $tdf,         ##-- tdf: (term x document) frequency matrix relation
   )
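
The t-tuple layout described under "tuple data" can be illustrated with core pack() and unpack(). This is a minimal sketch assuming the default pack_id='N' and two indexed attributes (l, p); the per-attribute extraction templates in the hypothetical %pack_ta ("@OFFSET" followed by the ID format) stand in for the stored "pack_t$a" formats, whose exact values this page does not specify.

```perl
use strict;
use warnings;

# Assumed settings: default pack_id and two indexed attributes.
my $pack_id = 'N';
my @attrs   = qw(l p);
my $pack_t  = $pack_id . '[' . scalar(@attrs) . ']';   # "N[2]"

# Pack a t-tuple of attribute-IDs (no date component) into a key string.
my $ts = pack($pack_t, 42, 7);

# Hypothetical per-attribute templates: jump to the byte offset of the
# attribute's field ('N' is 4 bytes wide), then read one ID.
my %pack_ta = map { ("pack_t$attrs[$_]" => '@' . (4 * $_) . $pack_id) } 0 .. $#attrs;

my $li = unpack($pack_ta{pack_tl}, $ts);   # 42
my $pi = unpack($pack_ta{pack_tp}, $ts);   # 7
print "pack_t=$pack_t l=$li p=$pi\n";
```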
promote
 $cli_or_undef = $cli->promote($class,%opts);

DiaColloDB::Client method override: unsupported.

I/O: open/close

open
 $coldb_or_undef = $coldb->open($dbdir,%opts);
 $coldb_or_undef = $coldb->open();

Opens the database in $dbdir. Returns $coldb on success or undef on failure.

dbkeys
 @dbkeys = $coldb->dbkeys();

Returns list of %$coldb keys whose values are expected to be sub-objects.

close
 $coldb_or_undef = $coldb->close();

Close current DB, if opened.

opened
 $bool = $coldb->opened();

Returns true iff the db is opened.

diskFiles
 @files = $coldb->diskFiles();

Returns the list of disk files for $coldb.

create: utils

Variables: (%ATTR_ALIAS,%ATTR_RALIAS,%ATTR_TITLE,%ATTR_CBEXPR);

Global attribute alias hacks.

 %ATTR_ALIAS  = ($name_or_alias=>$name, ...)
 %ATTR_RALIAS = ($name=>\@aliases, ...)
 %ATTR_CBEXPR = ($name=>$ddcCountByExpr, ...)
 %ATTR_TITLE  = ($name_or_alias=>$title, ...)

create_multimap
 $multimap = $coldb->create_multimap($base, \%ts2i, $packfmt, $label="multimap");

Create an expansion multimap, used by create().

attrs
 \@attrs = $coldb->attrs();
 \@attrs = $coldb->attrs($attrs=$coldb->{attrs}, $default=[]);

Parses the attributes in $attrs (a space-separated string or an ARRAY-ref) and returns them as an ARRAY-ref.

attrName
 $aname = $CLASS_OR_OBJECT->attrName($attr)

Returns canonical (short) attribute name for $attr. Supports aliases in %ATTR_ALIAS = ($alias=>$name, ...).
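
Attribute parsing (attrs()) and alias resolution (attrName()) can be pictured as follows. This is a minimal sketch with hypothetical free functions and a hypothetical alias table: the real %ATTR_ALIAS contents are not listed on this page, and returning $default for empty input is an assumption.

```perl
use strict;
use warnings;

# Hypothetical alias table in the style of %ATTR_ALIAS ($alias => $name).
my %ATTR_ALIAS = (lemma => 'l', pos => 'p', word => 'w');

# Canonical short name: resolve aliases, otherwise return the input unchanged.
sub attrName {
  my $attr = shift;
  return $ATTR_ALIAS{$attr} // $attr;
}

# Parse a space-separated string or an ARRAY-ref into an ARRAY-ref of
# canonical names; fall back to $default for empty input (assumed).
sub attrs {
  my ($attrs, $default) = @_;
  $default //= [];
  return $default if !defined($attrs) || $attrs eq '';
  my @list = ref($attrs) ? @$attrs : grep { $_ ne '' } split(/\s+/, $attrs);
  return [map { attrName($_) } @list];
}

print join(',', @{ attrs('lemma pos doc.genre') }), "\n";   # l,p,doc.genre
```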

attrTitle
 $atitle = $CLASS_OR_OBJECT->attrTitle($attr_or_alias);

Returns an attribute title for $attr_or_alias

attrCountBy
 $acbexpr = $CLASS_OR_OBJECT->attrCountBy($attr_or_alias,$matchid=0);

Returns a DDC::XS::CQCountKeyExpr object for $attr_or_alias with match-id $matchid.

attrQuery
 $aquery_or_filter_or_undef = $CLASS_OR_OBJECT->attrQuery($attr_or_alias,$cquery);

returns a DDC::XS::CQuery or DDC::XS::CQFilter object for condition $cquery on $attr_or_alias.

attrData
 \@attrdata = $coldb->attrData();
 \@attrdata = $coldb->attrData(\@attrs=$coldb->attrs)

get attribute data for \@attrs; returns @attrdata = ({a=>$a, i=>$i, enum=>$aenum, pack_x=>$pack_xa, a2x=>$a2x, ...})

hasAttr
 $bool = $coldb->hasAttr($attr);

Returns true iff $coldb natively supports the attribute (or alias) $attr.

create: from corpus

create
 $bool = $coldb->create($corpus,%opts);

%opts:

 $key => $val,  ##-- clobbers $coldb->{$key}

create: union (aka merge)

union
 $coldb = $CLASS_OR_OBJECT->union(\@coldbs_or_dbdirs,%opts);

Populates $coldb as the union over @coldbs_or_dbdirs. Clobbers the keys {_union_${a}i2u}, {_union_xi2u}, and {_union_argi} in the argument databases.

I/O: header

Largely inherited from DiaColloDB::Persistent.

headerKeys
 @keys = $coldb->headerKeys();

Returns the keys to save in the header.

loadHeaderData
 $bool = $coldb->loadHeaderData();
 $bool = $coldb->loadHeaderData($data)

loads header data.

Export/Import

dbexport
 $bool = $coldb->dbexport();
 $bool = $coldb->dbexport($outdir,%opts);

$outdir defaults to "$coldb->{dbdir}/export". %opts:

 export_sdat => $bool,  ##-- whether to export *.sdat (stringified tuple files for debugging; default=0)
 export_cof  => $bool,  ##-- do/don't export cof.* (default=do)

dbimport
 $coldb = $coldb->dbimport();
 $coldb = $coldb->dbimport($txtdir,%opts)

Import ColocDB data from $txtdir

TODO

Info

dbinfo
 \%info = $coldb->dbinfo();

Returns database information as a HASH-ref.

Profiling: Utils

relname
 $relname = $coldb->relname($rel);

Returns an appropriate relation name for profile() and friends:

  • returns $rel if $coldb->{$rel} supports a profile() method

  • otherwise, heuristically parses $rel as /xf|f?1|ug/ (unigrams, xf) or /f1?2|c/ (co-frequencies, cof)
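
The relname() heuristic above can be sketched as a short cascade of checks. This is a minimal sketch with a hypothetical function relname_sketch() and a hash %has_profile standing in for the "$coldb->{$rel} supports profile()" test; anchoring the regexes with ^...$ is an assumption (unanchored, "f12" would also match /f?1/), as is returning unmatched names unchanged.

```perl
use strict;
use warnings;

# Relations assumed to support profile() on a hypothetical $coldb.
my %has_profile = map { ($_ => 1) } qw(xf cof ddc tdf);

sub relname_sketch {
  my $rel = shift;
  return $rel  if $has_profile{$rel};
  # Anchored matching is an assumption made for this sketch.
  return 'xf'  if $rel =~ /^(?:xf|f?1|ug)$/;
  return 'cof' if $rel =~ /^(?:f1?2|c)$/;
  return $rel;   # fall through unchanged (assumed)
}

print join(' ', map { relname_sketch($_) } qw(ug f1 f12 c tdf)), "\n";  # xf xf cof cof tdf
```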

relation
 $obj_or_undef = $coldb->relation($rel);

returns an appropriate relation-like object for profile() and friends; really just wraps $coldb->{$coldb->relname($rel)}.

relations
 @relnames = $coldb->relations();

Returns the list of relation names supported by $coldb.

enumIds
 \@ids = $coldb->enumIds($enum,$req,%opts);

parses enum IDs for $req, which is one of:

  • a DDC::XS::CQTokExact, ::CQTokInfl, ::CQTokSet, ::CQTokSetInfl, or ::CQTokRegex : interpreted

  • an ARRAY-ref : list of literal symbol-values

  • a Regexp ref : regexp for target strings, passed to $enum->re2i()

  • a string /REGEX/ : regexp for target strings, passed to $enum->re2i()

  • another string : space-, comma-, or |-separated list of literal values

%opts:

 logLevel  => $logLevel, ##-- logging level (default=undef)
 logPrefix => $prefix,   ##-- logging prefix (default="enumIds(): fetch ids")
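
The non-DDC request forms accepted by enumIds() amount to a dispatch on the request's type. This is a minimal sketch with hypothetical helpers classify_enum_request() and ids_for(); a plain Perl array stands in for the enum (IDs are array indices), where DiaColloDB would instead pass regexes to $enum->re2i(). DDC::XS query objects are not handled here.

```perl
use strict;
use warnings;

# Hypothetical classifier for the ARRAY, Regexp, /REGEX/, and list forms.
sub classify_enum_request {
  my $req = shift;
  return ['list', @$req]   if ref($req) eq 'ARRAY';
  return ['regex', $req]   if ref($req) eq 'Regexp';
  return ['regex', qr/$1/] if $req =~ m{^/(.*)/$}s;
  return ['list', grep { $_ ne '' } split(/[\s,|]+/, $req)];
}

# A toy "enum" of symbol strings; IDs are array indices (assumption).
my @enum = qw(Haus Hausbau Baum Maus);

sub ids_for {
  my ($type, @args) = @{ classify_enum_request(shift) };
  if ($type eq 'regex') {
    my $re = $args[0];
    return [grep { $enum[$_] =~ $re } 0 .. $#enum];
  }
  my %want = map { ($_ => 1) } @args;
  return [grep { $want{$enum[$_]} } 0 .. $#enum];
}

print "@{ ids_for('/aus/') }\n";       # 0 1 3
print "@{ ids_for('Baum|Maus') }\n";   # 2 3
```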
parseDateRequest
 ($dfilter,$sliceLo,$sliceHi,$dateLo,$dateHi) = $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);
 \%dateRequest                                = $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);

low-level parsing for date (slice) requests. Returns limit and filter information as a list if called in list context (first form) or as a HASH-ref \%dateRequest if called in scalar context (second form). Returned \%dateRequest has keys corresponding to the list-elements returned in list context:

 dfilter => $dfilter,  ##-- filter-sub, called as: $wanted=$dfilter->($date); undef for none
 slo     => $sliceLo,  ##-- minimum slice (inclusive)
 shi     => $sliceHi,  ##-- maximum slice (inclusive)
 dlo     => $dateLo,   ##-- minimum date (inclusive); undef for none, always defined if $fill is true
 dhi     => $dateHi,   ##-- maximum date (inclusive); undef for none, always defined if $fill is true

Accepted formats for input parameter $dateRequest:

Empty Date

An empty string or a string containing only whitespace and asterisk (*) characters is ignored ($dlo=$dhi=undef); this should be interpreted by the caller as requesting the full indexed date range.

Date Regex

A date request /REGEX/ enclosed in slashes is treated as a regular expression matching all and only the desired dates. Throws an error if $ddcMode is true, since DDC currently does not support date regexes.

Date Range

A date request of the form MIN:MAX matches all dates in the range [MIN..MAX] (inclusive). For convenience, either or both of MIN and MAX may be an asterisk (*) to indicate the minimum (respectively, maximum) date stored in the index.

Date List

A whitespace-, comma-, or |-separated list of values is treated as a literal list of target dates. Throws an error if $ddcMode is true.

Date Value

Any other value is treated as a literal single target date.
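
The five $dateRequest formats can be told apart in the order listed above. This is a minimal sketch with a hypothetical function parse_date_request_sketch(); $dmin and $dmax stand in for the indexed date range, slice handling and the $fill option are omitted, and leaving dlo/dhi undefined for regex and list requests is an assumption.

```perl
use strict;
use warnings;

sub parse_date_request_sketch {
  my ($req, $dmin, $dmax, $ddcMode) = @_;
  $req //= '';
  if ($req =~ /^[\s\*]*$/) {                         # Empty Date: full range
    return { dlo => undef, dhi => undef, dfilter => undef };
  }
  if ($req =~ m{^/(.*)/$}s) {                        # Date Regex
    die "date regexes are not supported in ddc mode\n" if $ddcMode;
    my $re = qr/$1/;
    return { dlo => undef, dhi => undef, dfilter => sub { $_[0] =~ $re } };
  }
  if ($req =~ /^\s*([^:]+?)\s*:\s*([^:]+?)\s*$/) {    # Date Range MIN:MAX
    my ($lo, $hi) = ($1, $2);
    $lo = $dmin if $lo eq '*';
    $hi = $dmax if $hi eq '*';
    return { dlo => $lo, dhi => $hi, dfilter => sub { $_[0] >= $lo && $_[0] <= $hi } };
  }
  my @dates = grep { $_ ne '' } split(/[\s,|]+/, $req);
  if (@dates > 1) {                                  # Date List
    die "date lists are not supported in ddc mode\n" if $ddcMode;
    my %want = map { ($_ => 1) } @dates;
    return { dlo => undef, dhi => undef, dfilter => sub { exists $want{$_[0]} } };
  }
  my $d = $dates[0];                                 # Date Value
  return { dlo => $d, dhi => $d, dfilter => sub { $_[0] == $d } };
}

my $r = parse_date_request_sketch('1900:*', 1600, 1999);
print "$r->{dlo}..$r->{dhi}\n";   # 1900..1999
```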

qcompiler
 $compiler = $coldb->qcompiler();

Returns the DDC::XS::CQueryCompiler for this object (cached in $coldb->{_qcompiler}).

qparse
 $cquery_or_undef = $coldb->qparse($ddc_query_string);

Parses $ddc_query_string inside an eval {...} block; on failure, sets $coldb->{error} and returns undef.

parseQuery
 $cquery = $coldb->parseQuery([[$attr1,$val1],...], %opts); ##-- compat: ARRAY-of-ARRAYs
 $cquery = $coldb->parseQuery(["$attr1:$val1",...], %opts); ##-- compat: ARRAY-of-requests
 $cquery = $coldb->parseQuery({$attr1=>$val1, ...}, %opts); ##-- compat: HASH
 $cquery = $coldb->parseQuery("$attr1=$val1, ...", %opts);  ##-- compat: string
 $cquery = $coldb->parseQuery($ddcQueryString, %opts);      ##-- ddc string (with shorthand ","->WITH, "&&"->WITH)

Guts for parsing user target and groupby requests; returns a DDC::XS::CQuery object representing the request. Index-only items such as "$l" are mapped to "$l=*".

%opts:

 warn    => $level,     ##-- log-level for unknown attributes (default: 'warn')
 logas   => $reqtype,   ##-- request type for warnings
 default => $attr,      ##-- default attribute (for query requests)
 mapand  => $bool,      ##-- map CQAnd to CQWith? (default=true unless '&&' occurs in query string)
 ddcmode => $bool,      ##-- force ddc query mode? (default=false)

If the first argument is a reference, it is parsed as a native query request. Otherwise, it is assumed to be a string either in the "native" (backwards-compatible) single-token request-notation or a valid DDC query. If the request looks like a simple request, it is parsed into a DDC::XS::CQuery object using local heuristics; DDC queries are parsed directly. The query syntax for "native" DiaColloDB queries is:

 q_native  ::= qn_clause ((" "|",") qn_clause)*
 qn_clause ::= ("$"? qn_attr "=")? qn_value
 qn_attr   ::= STRING
 qn_value  ::= qn_regex | qn_words
 qn_regex  ::= "/" REGEX "/" qn_regmod
 qn_regmod ::= ("g"|"i"|"m"|"s"|"a"|"l"|"u"|"x")*
 qn_words  ::= qn_word ("|" qn_word)*
 qn_word   ::= STRING

Native request clauses are parsed into queries of type CQTokSet, CQTokExact, CQTokRegex, or CQTokAny, and the returned query object conjoins multiple native request clauses using CQWith.

DDC queries are much more flexible, but not all DiaColloDB::Relation types support the full range of the DDC query syntax. In particular, the default relation classes DiaColloDB::Relation::Cofreqs and DiaColloDB::Relation::Unigrams support only those query types accepted by the queryAttributes() method.
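
The native request grammar can be sketched as a small clause parser. This is a minimal sketch with a hypothetical function parse_native_sketch() that returns plain [$attr, $type, @values] arrays instead of DDC::XS objects; the mapping of values onto CQTokAny, CQTokRegex, CQTokSet, and CQTokExact is an assumption for illustration, and values may not themselves contain spaces or commas. Runs of separators are tolerated so that requests like "l=Haus, p=NN" parse.

```perl
use strict;
use warnings;

sub parse_native_sketch {
  my ($req, $default) = @_;
  my @clauses;
  foreach my $clause (grep { $_ ne '' } split(/[\s,]+/, $req)) {
    my ($attr, $val);
    if    ($clause =~ /^\$?([\w.]+)=(.*)$/) { ($attr, $val) = ($1, $2) }
    elsif ($clause =~ /^\$([\w.]+)$/)       { ($attr, $val) = ($1, '*') }  # "$l" ~ "$l=*"
    else                                    { ($attr, $val) = ($default, $clause) }

    if    ($val eq '' || $val eq '*')        { push @clauses, [$attr, 'CQTokAny'] }
    elsif ($val =~ m{^/(.*)/([gimsalux]*)$}) { push @clauses, [$attr, 'CQTokRegex', $1, $2] }
    elsif ($val =~ /\|/)                     { push @clauses, [$attr, 'CQTokSet', split(/\|/, $val)] }
    else                                     { push @clauses, [$attr, 'CQTokExact', $val] }
  }
  return \@clauses;
}

my $q = parse_native_sketch('l=Haus|Baum, $p', 'l');
print join('; ', map { join(' ', @$_) } @$q), "\n";  # l CQTokSet Haus Baum; p CQTokAny
```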

queryAttributes
 \@aqs = $coldb->queryAttributes($cquery,%opts);

Utility for decomposing DDC queries into attribute-wise requests; returns an ARRAY-ref [[$attr1,$val1], ...]. Each value $vali is empty or undef (all values), a CQTokSet, a CQTokExact, a CQTokRegex, or a CQTokAny. Chokes on unsupported query types or filters.

%opts:

 warn    => $level,     ##-- log-level for unknown attributes (default: 'warn')
 logas   => $reqtype,   ##-- request type for warnings
 default => $attr,      ##-- default attribute (for query requests)
 allowUnknown => $bool, ##-- allow unknown attributes? (default: 0)

parseRequest
 \@aqs = $coldb->parseRequest($request, %opts);

Guts for parsing user target and groupby requests into attribute-wise ARRAY-ref [[$attr1,$val1], ...], used by native profiling methods. See parseQuery() method for supported $request formats and %opts. Wraps $coldb->queryAttributes($coldb->parseQuery($request,%opts)).

groupby
 \%groupby = $coldb->groupby($groupby_request, %opts);
 \%groupby = $coldb->groupby(\%groupby,        %opts);

Parse a user groupby request, used by native profiling methods. See parseRequest() for details on syntax of $groupby_request. Unlike "query" request parsing, native query-request attributes are obligatory and values are optional in "groupby" parsing mode:

 q_groupby ::= qg_clause ((" "|",") qg_clause)*
 qg_clause ::= "$"? qn_attr ("=" qn_value)?

Returns a HASH-ref of the form:

 req => $request,      ##-- save request
 ti2g => \&ti2g,       ##-- group-tuple extraction code ($ti => $gtuple) : $g_packed = $ti2g->($ti)
 ts2g => \&ts2g,       ##-- group-tuple extraction code ($ts => $gtuple) : $g_packed = $ts2g->($ts)
 g2s   => \&g2s,       ##-- stringification object suitable for DiaColloDB::Profile::stringify() [CODE,enum, or undef]
 g2txt => \&g2txt,     ##-- backwards-compatible join()-string stringification sub: join("\t",unpack($pack_g,$g_packed))
 tpack => \@tpack,     ##-- group-attribute-wise pack-templates, given @ttuple
 gpack => \@gpack,     ##-- group-attribute-wise pack-templates, given @gtuple
 areqs => \@areqs,     ##-- parsed attribute requests ([$attr,$ahaving],...)
 attrs => \@attrs,     ##-- like $coldb->attrs($groupby_request), modulo "having" parts
 titles => \@titles,   ##-- like map {$coldb->attrTitle($_)} @attrs

Options %opts:

 warn  => $level,    ##-- log-level for unknown attributes (default: 'warn')
 relax => $bool,     ##-- allow unsupported attributes (default=0)
 tenum => $tenum,    ##-- enum to use for \&t2g and \&t2s (default: $coldb->{tenum})
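
The ts2g and g2txt entries can be pictured as a pack()/unpack() projection: unpack the t-tuple, keep only the groupby attributes, and repack. This is a minimal sketch assuming the default 'N' ID format; the attribute lists and variable names are hypothetical, "having" restrictions are ignored, and ti2g would additionally look up the packed tuple for $ti in the tenum first.

```perl
use strict;
use warnings;

# Assumed t-tuple attributes and groupby selection.
my @attrs  = ('l', 'p', 'doc.genre');
my @gattrs = ('l', 'doc.genre');
my $pack_t = 'N[' . scalar(@attrs)  . ']';
my $pack_g = 'N[' . scalar(@gattrs) . ']';

# Positions of the groupby attributes within the t-tuple.
my %pos  = map { ($attrs[$_] => $_) } 0 .. $#attrs;
my @gidx = map { $pos{$_} } @gattrs;

# ts2g: packed t-tuple -> packed group tuple.
my $ts2g  = sub { pack($pack_g, (unpack($pack_t, $_[0]))[@gidx]) };
# g2txt: join()-string stringification, as documented above.
my $g2txt = sub { join("\t", unpack($pack_g, $_[0])) };

my $ts  = pack($pack_t, 42, 7, 3);
my $txt = $g2txt->($ts2g->($ts));
print "$txt\n";   # 42<TAB>3
```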
query2filter
 $cqfilter = $coldb->query2filter($attr,$cquery,%opts);

Converts a CQToken to a CQFilter, for ddc parsing. %opts:

 logas => $logas,   ##-- log-prefix for warnings

parseGroupBy
 ($CQCountKeyExprs,\$CQRestrict,\@CQFilters) = $coldb->parseGroupBy($groupby_string_or_request,%opts);

%opts:

 date => $date,
 slice => $slice,
 matchid => $matchid,    ##-- default match-id

ddc-mode groupby parsing utility. In addition to the native groupby syntax supported by the groupby() method, ddc-mode parsing also allows specification of a literal DDC count-key list by enclosing it in square brackets:

 ddc_groupby ::= q_groupby | ("#BY"? "[" l_countkeys "]")

This is mainly useful in conjunction with user-defined match-ids in the corresponding parsed query, document metadata attributes, and/or server-side regex key transformations; see http://odo.dwds.de/~moocow/software/ddc/ddc_query.html#rule_count_key for details.
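
Deciding between the two ddc_groupby alternatives comes down to checking for the bracketed form. This is a minimal sketch with a hypothetical function classify_groupby_sketch(); the count-key list inside the brackets is passed through unparsed, and everything else is treated as a native groupby request.

```perl
use strict;
use warnings;

# "[keys]" or "#BY[keys]" -> literal DDC count-key list; else native groupby.
sub classify_groupby_sketch {
  my $req = shift;
  if ($req =~ /^\s*(?:#BY\s*)?\[(.*)\]\s*$/s) {
    return ('ddc', $1);
  }
  return ('native', $req);
}

my ($kind, $body) = classify_groupby_sketch('#BY[$l,$p]');
print "$kind: $body\n";   # ddc: $l,$p
```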

Profiling: Generic

profile
 $mprf = $coldb->profile($relation, %opts);

Get a relation profile for selected items as a DiaColloDB::Profile::Multi object. %opts:

 ##-- selection parameters
 query => $query,           ##-- target request ATTR:REQ...
 date  => $date1,           ##-- string or array or range "MIN-MAX" (inclusive) : default=all
 ##
 ##-- aggregation parameters
 slice   => $slice,         ##-- date slice (default=1, 0 for global profile)
 groupby => $groupby,       ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
 ##
 ##-- scoring and trimming parameters
 eps     => $eps,           ##-- smoothing constant (default=0)
 score   => $func,          ##-- scoring function (f,fm,lf,lfm,mi,ld) : default="f"
 kbest   => $k,             ##-- return only $k best collocates per date (slice) : default=-1:all
 cutoff  => $cutoff,        ##-- minimum score
 global  => $bool,          ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
 ##
 ##-- profiling and debugging parameters
 strings => $bool,          ##-- do/don't stringify (default=do)
 fill    => $bool,          ##-- if true, returned multi-profile will have null profiles inserted for missing slices
 onepass => $bool,          ##-- if true, use old, fast, incorrect 1-pass method (default=0)

Sets default %opts and wraps $coldb->relation($rel)->profile($coldb, %opts).
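
The kbest and cutoff options describe per-slice trimming. This is a minimal sketch of local trimming only, with a hypothetical function trim_local_sketch() operating on a plain {$slice => {$item => $score}} hash rather than a DiaColloDB::Profile::Multi object; applying the cutoff before kbest, and breaking score ties alphabetically, are assumptions, and global trimming (global=>1) is not modelled.

```perl
use strict;
use warnings;

sub trim_local_sketch {
  my ($prf, %opts) = @_;
  my $kbest  = $opts{kbest} // -1;   # -1: keep all
  my $cutoff = $opts{cutoff};        # undef: no minimum score
  my %out;
  foreach my $slice (keys %$prf) {
    my $scores = $prf->{$slice};
    my @items  = grep { !defined($cutoff) || $scores->{$_} >= $cutoff } keys %$scores;
    @items = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } @items;
    splice(@items, $kbest) if $kbest >= 0 && @items > $kbest;
    $out{$slice} = { map { ($_ => $scores->{$_}) } @items };
  }
  return \%out;
}

my $trimmed = trim_local_sketch(
  { 1900 => { Haus => 5, Baum => 3, Maus => 1 } },
  kbest => 2, cutoff => 2,
);
print join(',', sort keys %{ $trimmed->{1900} }), "\n";   # Baum,Haus
```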

extend
 $mprf = $coldb->extend($relation, %opts);

Get independent f2 frequencies for $opts{slice2keys}, which is EITHER a HASH-ref {$sliceLabel1=>\@sliceKeys1, ...}, OR a JSON-string encoding such a HASH-ref. Options %opts are as for the profile() method (mostly ignored), and also:

 slice2keys => \%slice2keys, ##-- target f2-items or JSON-string (REQUIRED)

Returns a DiaColloDB::Profile::Multi object containing the appropriate f2 entries. Used by list-clients (DiaColloDB::Client::list) to ensure correct f2 counts for "missing" collocate items; see "Incorrect Independent Collocate Frequencies" in DiaColloDB::Client::list for details.
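
Since slice2keys may arrive either as a HASH-ref or as a JSON string, a caller has to normalize it before use. This is a minimal sketch with a hypothetical function slice2keys_sketch() using the core JSON::PP module; how extend() itself decodes the string is not specified here.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Accept {$sliceLabel => \@sliceKeys, ...} or a JSON string encoding one.
sub slice2keys_sketch {
  my $in = shift;
  die "slice2keys is REQUIRED\n" if !defined($in);
  $in = decode_json($in) if !ref($in);
  die "slice2keys must be a HASH-ref\n" if ref($in) ne 'HASH';
  return $in;
}

my $s2k = slice2keys_sketch('{"1900":["Haus","Baum"],"1910":["Maus"]}');
print join(' ', map { "$_=" . scalar(@{ $s2k->{$_} }) } sort keys %$s2k), "\n";  # 1900=2 1910=1
```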

profileOptions
 \%opts = $CLASS_OR_OBJECT->profileOptions(\%opts);

Instantiates default options for profile() method. May be used e.g. by DiaColloDB::Client subclasses.

Profiling: Comparison (diff)

compare
 $mprf = $coldb->compare($relation, %opts);

Get a relation comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object. %opts:

 ##-- selection parameters
 (a|b)?query => $query,       ##-- target query as for parseRequest()
 (a|b)?date  => $date1,       ##-- string or array or range "MIN-MAX" (inclusive) : default=all
 ##
 ##-- aggregation parameters
 groupby     => $groupby,     ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
 (a|b)?slice => $slice,       ##-- date slice (default=1, 0 for global profile)
 ##
 ##-- scoring and trimming parameters
 eps     => $eps,             ##-- smoothing constant (default=0)
 score   => $func,            ##-- scoring function (f,fm,lf,lfm,mi,ld) : default="f"
 kbest   => $k,               ##-- return only $k best collocates per date (slice) : default=-1:all
 cutoff  => $cutoff,          ##-- minimum score (UNUSED for comparison profiles)
 global  => $bool,            ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
 diff    => $diff,            ##-- low-level score-diff operation (diff|adiff|sum|min|max|avg|havg|gavg|lavg); default='adiff'
 ##
 ##-- profiling and debugging parameters
 strings => $bool,            ##-- do/don't stringify (default=do)

Sets default %opts and wraps $coldb->relation($rel)->compare($coldb, %opts).
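
The diff option selects how an item's scores from the two compared profiles are combined. This is a minimal sketch giving hypothetical readings of six of the listed operations, applied to an item's score from profile "a" and its score from profile "b"; the exact definitions in DiaColloDB may differ, and havg, gavg, and lavg are omitted.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical score-combination operations: $_[0] from "a", $_[1] from "b".
my %DIFF = (
  diff  => sub { $_[0] - $_[1] },
  adiff => sub { abs($_[0] - $_[1]) },
  sum   => sub { $_[0] + $_[1] },
  min   => sub { min(@_) },
  max   => sub { max(@_) },
  avg   => sub { ($_[0] + $_[1]) / 2 },
);

printf "%s=%s\n", $_, $DIFF{$_}->(2, 5) foreach sort keys %DIFF;
# adiff=3 avg=3.5 diff=-3 max=5 min=2 sum=7 (one per line)
```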

compareOptions
 \%opts = $CLASS_OR_OBJECT->compareOptions(\%opts);

Instantiates default options for compare() method. May be used e.g. by DiaColloDB::Client subclasses.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2016 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Client(3pm), DiaColloDB::Corpus(3pm), DiaColloDB::Document(3pm), DiaColloDB::Persistent(3pm), DiaColloDB::Profile(3pm), DiaColloDB::Relation(3pm), DiaColloDB::Temp(3pm), DiaColloDB::Utils(3pm), dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB::WWW(3pm), perl(1), ...