Algorithm::SkipList - Perl implementation of skip lists
The following non-standard modules are used:
enum
my $list = new Algorithm::SkipList(); $list->insert( 'key1', 'value' ); $list->insert( 'key2', 'another value' ); $value = $list->find('key2'); $list->delete('key1');
This is an implementation of skip lists in Perl.
Skip lists are similar to linked lists, except that they have random links at various levels that allow searches to skip over sections of the list, like so:
4 +---------------------------> +----------------------> + | | | 3 +------------> +------------> +-------> +-------> +--> + | | | | | | 2 +-------> +--> +-------> +--> +--> +--> +-------> +--> + | | | | | | | | | 1 +--> +--> +--> +--> +--> +--> +--> +--> +--> +--> +--> + A B C D E F G H I J NIL
A search would start at the top level: if the link to the right exceeds the target key, then it descends a level.
Skip lists generally perform as well as balanced trees for searching but do not have the overhead with respect to inserting new items. See the included file Benchmark.txt for a comparison of performance with other Perl modules.
Benchmark.txt
For more information on skip lists, see the "SEE ALSO" section below.
Only alphanumeric keys are supported "out of the box". To use numeric or other types of keys, see "Customizing the Node Class" below.
A detailed description of the methods used is below.
$list = new Algorithm::SkipList();
Creates a new skip list.
If you need to use a different node class for using customized comparison routines, you will need to specify a different class:
$list = new Algorithm::SkipList( node_class => 'MyNodeClass' );
See the "Customizing the Node Class" section below.
Specialized internal parameters may be configured:
$list = new Algorithm::SkipList( max_level => 32 );
Defines a different maximum list level.
The initial list (see the "list" method) will be a random number of levels, and will increase over time if inserted nodes have higher levels, up until "max_level" levels. See "max_level" for more information on this parameter.
You can also control the probability used to determine level sizes for each node by setting the P and k values:
$list = new Algorithm::SkipList( p => 0.25, k => 1 );
See P for more information on this parameter.
You can enable duplicate keys by using the following:
$list = new Algorithm::SkipList( duplicates => 1 );
This is an experimental feature. See the "KNOWN ISSUES" section below.
$list->insert( $key, $value );
Inserts a new node into the list.
You may also use a search finger with insert, provided that the finger is for a key that occurs earlier in the list:
$list->insert( $key, $value, $finger );
Using fingers for inserts is not recommended since there is a risk of producing corrupted lists.
if ($list->exists( $key )) { ... }
Returns true if there exists a node associated with the key, false otherwise.
This may also be used with search fingers:
if ($list->exists( $key, $finger )) { ... }
$value = $list->find_with_finger( $key );
Searches for the node associated with the key, and returns the value. If the key cannot be found, returns undef.
undef
Search fingers may also be used:
$value = $list->find_with_finger( $key, $finger );
To obtain the search finger for a key, call "find_with_finger" in a list context:
($value, $finger) = $list->find_with_finger( $key );
$value = $list->find( $key ); $value = $list->find( $key, $finger );
This method is slightly faster than "find_with_finger" since it does not return a search finger when called in list context.
If you are searching for duplicate keys, you must use "find_with_finger" or "find_duplicates".
@values = $list->find_duplicates( $key ); @values = $list->find_duplicates( $key, $finger );
Returns an array of values from the list.
This is an autoloading method.
Search is an alias to "find".
$key = $list->first_key;
Returns the first key in the list.
If called in a list context, will return a search finger:
($key, $finger) = $list->first_key;
A call to "first_key" implicitly calls "reset".
$key = $list->next_key( $last_key );
Returns the key following the previous key. List nodes are always maintained in sorted order.
Search fingers may also be used to improve performance:
$key = $list->next_key( $last_key, $finger );
($key, $finger) = $list->next_key( $last_key, $finger );
If no arguments are called,
$key = $list->next_key;
then the value of "last_key" is assumed:
$key = $list->next_key( $list->last_key );
Note: calls to "delete" will "reset" the last key.
($key, $value) = $list->next( $last_key, $finger );
Returns the next key-value pair.
$last_key and $finger are optional.
$last_key
$finger
$key = $list->last_key; ($key, $finger, $value) = $list->last_key;
Returns the last key or the last key and finger returned by a call to "first_key", "next_key", "index_by_key", "key_by_index" or "value_by_index". This is not the greatest key.
Deletions and inserts may invalidate the "last_key" value. (Deletions will actually "reset" the value.)
Values for "last_key" can also be set by including parameters, however this feature is meant for internal use only:
$list->last_key( $node );
Note that this is a change form versions prior to 0.71.
$list->reset;
Resets the "last_key" to undef.
$index = $list->index_by_key( $key );
Returns the 0-based index of the key (as if the list were an array). This is not an efficient method of access.
$key = $list->key_by_index( $index );
Returns the key associated with an index (as if the list were an array). Negative indices return the key from the end. This is not an efficient method of access.
$value = $list->value_by_index( $index );
Returns the value associated with an index (as if the list were an array). Negative indices return the value from the end. This is not an efficient method of access.
$value = $list->delete( $key );
Deletes the node associated with the key, and returns the value. If the key cannot be found, returns undef.
$value = $list->delete( $key, $finger );
Calling "delete" in a list context will not return a search finger.
$list->clear;
Erases existing nodes and resets the list.
$size = $list->size;
Returns the number of nodes in the list.
$list2 = $list1->copy;
Makes a copy of a list. The "p", "max_level" and node class are copied, although the exact structure of node levels is not copied.
$list2 = $list1->copy( $key_from, $finger, $key_to );
Copy the list between $key_from and $key_to (inclusive). If $finger is defined, it will be used as a search finger to find $key_from. If $key_to is not specified, then it will be assumed to be the end of the list.
$key_from
$key_to
If $key_from does not exist, undef will be returned.
$list1->merge( $list2 );
Merges two lists. If both lists share the same key, then the valie from $list1 will be used.
$list1
Both lists should have the same node class.
$list1->append( $list2 );
Appends (concatenates) $list2 after $list1. The last key of $list1 must be less than the first key of $list2.
$list2
This method affects both lists. The "header" of the last node of $list1 points to the first node of $list2, so changes to one list may affect the other list.
If you do not want this entanglement, use the "merge" or "copy" methods instead:
or
$list1->append( $list2->copy );
$list2 = $list1->truncate( $key );
Truncates $list1 and returns $list2 starting at $key. Returns undef is the key does not exist.
$key
It is asusmed that the key is not the first key in $list1.
($key, $value) = $list->least;
Returns the least key and value in the list, or undef if the list is empty.
($key, $value) = $list->greatest;
Returns the greatest key and value in the list, or undef if the list is empty.
@keys = $list->keys;
Returns a list of keys (in sorted order).
@keys = $list->keys( $low, $high);
Returns a list of keys between $low and $high, inclusive. (This is only available in versions 1.02 and later.)
$low
$high
@values = $list->values;
Returns a list of values (corresponding to the keys returned by the "keys" method).
Internal methods are documented below. These are intended for developer use only. These may change in future versions.
($node, $finger, $cmp) = $list->_search_with_finger( $key );
Searches for the node with a key. If the key is found, that node is returned along with a "header". If the key is not found, the previous node from where the node would be if it existed is returned.
Note that the value of $cmp
$cmp
$cmp = $node->key_cmp( $key )
is returned because it is already determined by "_search".
Search fingers may also be specified:
($node, $finger, $cmp) = $list->_search_with_finger( $key, $finger );
Note that the "header" is actually a search finger.
($node, $finger, $cmp) = $list->_search( $key, [$finger] );
Same as "_search_with_finger", only that a search finger is not returned. (Actually, an initial "dummy" finger is returned.)
This is useful for searches where a finger is not needed. The speed of searching is improved.
$k = $list->k;
Returns the k value.
$list->k( $k );
Sets the k value.
Higher values will on the average have less pointers per node, but take longer for searches. See the section on the P value.
$plevel = $list->p;
Returns the P value.
$list->p( $plevel );
Changes the value of P. Lower values will on the average have less pointers per node, but will take longer for searches.
The probability that a particular node will have a forward pointer at level i is: p**(i+k-1).
For more information, consult the references below in the "SEE ALSO" section.
$max = $list->max_level;
Returns the maximum level that "_new_node_level" can generate.
eval { $list->max_level( $level ); };
Changes the maximum level. If level is less than "MIN_LEVEL", or greater than "MAX_LEVEL" or the current list "level", this will fail (hence the need for setting it in an eval block).
eval
The value defaults to "MAX_LEVEL", which is 32. There is usually no need to change this value, since the maximum level that a new node will have will not be greater than it actually needs, up until 2^32 nodes. (The current version of this module is not designed to handle lists larger than 2^32 nodes.)
Decreasing the maximum level to less than is needed will likely degrade performance.
$level = $list->_new_node_level;
This is an internal function for generating a random level for new nodes.
Levels are determined by the P value. The probability that a node will have 1 level is P; the probability that a node will have 2 levels is P^2; the probability that a node will have 3 levels is P^3, et cetera.
The value will never be greater than "max_level".
Note: in earlier versions it was called _random_level.
_random_level
$node = $list->list;
Returns the initial node in the list, which is a Algorithm::SkipList::Node (See below.)
Algorithm::SkipList::Node
The key and value for this node are undefined.
$node = $list->_first_node;
Returns the first node with a key (the second node) in a list. This is used by the "first_key", "least", "append" and "merge" methods.
$node = $list->_greatest_node;
Returns the last node in the list. This is used by the "append" and "greatest" methods.
$node_class_name = $list->_node_class;
Returns the name of the node class used. By default this is the Algorithm::SkipList::Node, which is discussed below.
$list->_build_distribution;
Rebuilds the probability distribution array {P_LEVELS} upon calls to "_set_p" and "_set_k".
{P_LEVELS}
These methods are used during initialization of the object.
$list->_debug;
Used for debugging skip lists by developer. The output of this function is subject to change.
Methods for the Algorithm::SkipList::Node object are documented in that module. They are for internal use by the main Algorithm::SkipList module.
Algorithm::SkipList
Hashes can be tied to Algorithm::SkipList objects:
tie %hash, 'Algorithm::SkipList'; $hash{'foo'} = 'bar'; $list = tied %hash; print $list->find('foo'); # returns bar
See the perltie manpage for more information.
The default node may not handle specialized data types. To define your own custom class, you need to derive a child class from Algorithm::SkipList::Node.
Below is an example of a node which redefines the default type to use numeric instead of string comparisons:
package NumericNode; our @ISA = qw( Algorithm::SkipList::Node ); sub key_cmp { my $self = shift; my $left = $self->key; # node key my $right = shift; # value to compare the node key with unless ($self->validate_key($right)) { die "Invalid key: \'$right\'"; } return ($left <=> $right); } sub validate_key { my $self = shift; my $key = shift; return ($key =~ s/\-?\d+(\.\d+)?$/); # test if key is numeric }
To use this, we say simply
$number_list = new Algorithm::SkipList( node_class => 'NumericNode' );
This skip list should work normally, except that the keys must be numbers.
For another example of customized nodes, see Tie::RangeHash version 1.00_b1 or later.
A side effect of the search function is that it returns a finger to where the key is or should be in the list.
We can use this finger for future searches if the key that we are searching for occurs after the key that produced the finger. For example,
($value, $finger) = $list->find('Turing');
If we are searching for a key that occurs after 'Turing' in the above example, then we can use this finger:
$value = $list->find('VonNeuman', $finger);
If we use this finger to search for a key that occurs before 'Turing' however, it may fail:
$value = $list->find('Goedel', $finger); # this may not work
Therefore, use search fingers with caution.
Search fingers are specific to particular instances of a skip list. The following should not work:
($value1, $finger) = $list1->find('bar'); $value2 = $list2->find('foo', $finger);
One useful feature of fingers is with enumerating all keys using the "first_key" and "next_key" methods:
($key, $finger) = $list->first_key; while (defined $key) { ... ($key, $finger) = $list->next_key($key, $finger); }
See also the "keys" method for generating a list of keys.
This module intentionally has a subset of the interface in the Tree::Base and other tree-type data structure modules, since skip lists can be used in place of trees.
Because pointers only point forward, there is no prev method to point to the previous key.
prev
Some of these methods (least, greatest) are autoloading because they are not commonly used.
One thing that differentiates this module from other modules is the flexibility in defining a custom node class.
See the included Benchmark.txt file for performance comparisons.
If you are upgrading a prior version of List::SkipList, then you may want to uninstall the module before installing Algorithm::SkipList, so as to remove unused autoloading files.
Certain methods such as "find" and "delete" will return the the value associated with a key, or undef if the key does not exist. However, if the value is undef, then these functions will appear to claim that the key cannot be found.
In such circumstances, use the "exists" method to test for the existence of a key.
Duplicate keys are an experimental feature in this module, since most methods have been designed for unique keys only.
Access to duplicate keys is akin to a stack. When a duplicate key is added, it is always inserted before matching keys. In searches, to find duplicate keys one must use "find_with_finger" or the "find_duplicates" method.
The "copy" method will reverse the order of duplicates.
The behavior of the "merge" and "append" methods is not defined for duplicates.
Skip lists are non-deterministic. Because of this, bugs in programs that use this module may be subtle and difficult to reproduce without many repeated attempts. This is especially true if there are bugs in a custom node.
Additional issues may be listed on the CPAN Request Tracker at http://rt.cpan.org/NoAuth/Bugs.html?Dist=Algorithm-SkipList or http://rt.cpan.org/NoAuth/Bugs.html?Dist=List-SkipList.
Robert Rothenberg <rrwo at cpan.org>
Carl Shapiro <cshapiro at panix.com> for introduction to skip lists.
Feedback is always welcome. Please use the CPAN Request Tracker at http://rt.cpan.org to submit bug reports.
Copyright (c) 2003-2005 Robert Rothenberg. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See the article by William Pugh, "A Skip List Cookbook" (1989), or similar ones by the author at http://www.cs.umd.edu/~pugh/ which discuss skip lists.
Another article worth reading is by Bruce Schneier, "Skip Lists: They're easy to implement and they work", Doctor Dobbs Journal, January 1994.
Tie::Hash::Sorted maintains a hash where keys are sorted. In many cases this is faster, uses less memory (because of the way Perl5 manages memory), and may be more appropriate for some uses.
If you need a keyed list that preserves the order of insertion rather than sorting keys, see List::Indexed or Tie::IxHash.
To install Algorithm::SkipList, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Algorithm::SkipList
CPAN shell
perl -MCPAN -e shell install Algorithm::SkipList
For more information on module installation, please visit the detailed CPAN module installation guide.