WAIT::Table -- Module for maintaining Tables / Relations
The constructor WAIT::Table->new is normally called via the create_table method of a database handle. This is not enforced, but creating a table does not make any sense unless the table is registered by the database because the latter implements persistence of the meta data. Registering is done automatically by letting the database handle the creation of a table.
    my $db = WAIT::Database->create(name => 'sample');
    my $tb = $db->create_table(
        name   => 'test',
        access => $access,
        layout => $layout,
        attr   => ['docid', 'headline'],
    );
The constructor returns a handle for the table. This handle is hidden by the table module, to prevent direct access if called via Table.
A reference to an access object for the external parts (attributes) of tuples. As you may remember, the WAIT System does not require objects to be stored completely inside the system; this avoids duplication. There is no (strong) point in storing all your HTML documents inside the system when indexing your WWW server.
The access object is designed to work like a tied hash. You pass the reference to the object, not the tied hash though. An example implementation of an access class that works for manpages is WAIT::Document::Nroff.
The implementation needs to take into account that WAIT will keep this object in a Data::Dumper or Storable database and re-use it when sman is run. So it is not enough that the object can produce the index now, while we create or actively access the table; WAIT must also be able to retrieve documents on its own, in a different context. This happens specifically during retrieval. To get this working seamlessly, the access-defining class must implement a close method. This method will be called before the Data::Dumper dump takes place. At that moment the access-defining class must get rid of all data structures that cannot be reconstructed via the Data::Dumper dump, such as database handles or C pointers.
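The close requirement can be illustrated with a minimal sketch. The class name My::Access, its dir and last_fh fields, and the file-based FETCH are assumptions made up for this example; they are not part of WAIT. Storable is used for the round trip because it normally refuses to serialize file handles, which makes the need for close visible.

```perl
use strict;
use warnings;

package My::Access;    # hypothetical class, not shipped with WAIT

sub new {
    my ($class, %arg) = @_;
    return bless { dir => $arg{dir} }, $class;    # plain data only
}

# Tied-hash style lookup: the key is a file name below dir.
sub FETCH {
    my ($self, $key) = @_;
    open my $fh, '<', "$self->{dir}/$key" or return;
    $self->{last_fh} = $fh;    # cached handle: cannot be dumped
    local $/;                  # slurp the whole file
    return scalar <$fh>;
}

# Called before the object is dumped: drop whatever a dump cannot rebuild.
sub close {
    my ($self) = @_;
    delete $self->{last_fh};
    return 1;
}

package main;

use Storable qw(freeze thaw);

my $access = My::Access->new(dir => '.');
$access->{last_fh} = \*STDOUT;    # simulate a cached open handle

# Expected to fail while the handle is still present.
my $ok_with_handle = eval { freeze($access); 1 } ? 1 : 0;

$access->close;                             # get rid of the handle
my $copy = thaw(freeze($access));           # now the round trip works
```

After close the object holds only the directory name, so the restored copy can reopen files on demand in a later process.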
The filename of the records file. Files for indexes will have fname as prefix. Mandatory, but usually taken care of by the WAIT::Database handle when the constructor is called via WAIT::Database::create_table().
The name of this table. Mandatory.
attr=> [ attr ... ]
A reference to an array of attribute names. WAIT will keep the contents of these attributes in its table. Mandatory.
djk=> [ attr ... ]
A reference to an array of attribute names which make up the disjointness key. Don't think about it: it's of no use yet.
A reference to an external parser object. Defaults to a new instance of WAIT::Parse::Base. For an example implementation see WAIT::Parse::Nroff. A layout class can be implemented as a singleton class if you so like.
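As a rough illustration of the singleton option, the following sketch shows a class whose constructor always returns the same object. The name My::Layout is hypothetical, and the class deliberately does not inherit from WAIT::Parse::Base so that the example runs without WAIT installed; a real layout class would presumably build on that base class.

```perl
use strict;
use warnings;

package My::Layout;    # hypothetical name, for illustration only

my $instance;          # the one shared object

sub new {
    my ($class) = @_;
    $instance ||= bless {}, $class;    # create on first call, reuse afterwards
    return $instance;
}

package main;

my $first  = My::Layout->new;
my $second = My::Layout->new;
# Both variables now refer to the same object.
```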
The set of attributes needed to identify a record. Defaults to all attributes.
invindex=> inverted index
A reference to an anonymous array defining attributes of each record that need to be indexed. See the source of smakewhatis for how to set this up.
This method must be called with a list of attributes, which must be a subset of the attributes specified when the table was created. Currently this method must be called before the first tuple is inserted in the table!
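The subset condition can be checked before calling the method. The helper below is a sketch only; not_in_table is a made-up name, and WAIT may perform its own validation.

```perl
use strict;
use warnings;

# Return the index attributes that are not among the table attributes.
sub not_in_table {
    my ($table_attr, @index_attr) = @_;
    my %known = map { $_ => 1 } @$table_attr;
    return grep { !$known{$_} } @index_attr;
}

# 'author' was not declared when the (hypothetical) table was created.
my @missing = not_in_table(['docid', 'headline'], 'docid', 'author');
```

An empty result means the attribute list is a valid subset.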
    $tb->create_inverted_index(
        attribute => 'au',
        pipeline  => ['detex', 'isotr', 'isolc', 'split2', 'stop'],
        predicate => 'plain',
    );
The attribute to build the index on. This attribute may not be in the set of attributes specified when the table was created.
A pipeline specification is a reference to an array of method names (from package WAIT::Filter) which are to be applied in sequence to the contents of the named attribute. The attribute name may not be in the attribute list.
An indication of which predicate the index implements. This may be e.g. 'plain', 'stemming' or 'soundex'. The indicator will be used for query processing. Currently there is no standard set of predicate names. The predicate defaults to the last member of the pipeline if omitted.
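How a pipeline is applied, and how the predicate default falls out of it, can be sketched as follows. The filters lowercase, words and stop are simplified stand-ins invented for this example; the real filter methods live in WAIT::Filter and may behave differently. The helper names run_pipeline and default_predicate are likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for filter methods, keyed by name.
my %filter = (
    lowercase => sub { map { lc $_ } @_ },
    words     => sub { map { split ' ', $_ } @_ },
    stop      => sub {
        my %stop = map { $_ => 1 } qw(the a of);
        grep { !$stop{$_} } @_;
    },
);

# Apply the named filters in sequence; each one sees the previous output.
sub run_pipeline {
    my ($pipeline, @data) = @_;
    for my $name (@$pipeline) {
        my $code = $filter{$name} or die "unknown filter '$name'";
        @data = $code->(@data);
    }
    return @data;
}

# The predicate falls back to the last pipeline member when omitted.
sub default_predicate {
    my (%spec) = @_;
    return defined $spec{predicate} ? $spec{predicate} : $spec{pipeline}[-1];
}

my @terms = run_pipeline(['lowercase', 'words', 'stop'], 'The Art of Indexing');
# @terms is ('art', 'indexing')

my $pred = default_predicate(pipeline => ['lowercase', 'words', 'stop']);
# $pred is 'stop'
```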
Currently this method must be called before the first tuple is inserted in the table!
Returns the reference to the associated parser object.
Returns the array of attribute names.
Must be called via