Panda::Lib - Collection of useful functions and classes with Perl and C interface.
Panda::Lib contains a number of very fast, useful functions written in C. You can use them from Perl or directly from XS code. It also contains several C++ classes.
    use Panda::Lib qw/ hash_merge merge compare lclone fclone clone crypt_xor string_hash string_hash32 /;

    $result   = hash_merge($dest, $source, $flags);
    $result   = merge($dest, $source, $flags);
    $is_equal = compare($hash1, $hash2);
    $is_equal = compare($array1, $array2);
    $cloned   = lclone($data);
    $cloned   = fclone($data);
    $crypted  = crypt_xor($data, $key);
    $val      = string_hash($str);
    $val      = string_hash32($str);
    #include <xs/lib.h>
    using namespace xs::lib;

    HV*  result   = hash_merge(hvdest, hvsource, flags);
    SV*  result   = merge(hvdest, hvsource, flags);
    bool is_equal = hash_cmp(hv1, hv2);
    bool is_equal = av_cmp(av1, av2);
    SV*  cloned   = clone(sv, with_cross_checks);
    panda::string str = sv2string(sv, ref_type);

    #include <panda/lib.h>
    using namespace panda::lib;

    char*    crypted = crypt_xor(source, slen, key, klen);
    uint32_t val     = string_hash32(str, len);
    uint64_t val     = string_hash(str, len);

    #include <panda/string.h>
    using panda::string;

    string abc("lala");
    ... // everything that std::string supports

    #include <panda/lib/memory.h>

    void* mem = StaticMemoryPool<128>::instance()->alloc();          // extremely fast memory allocations
    void* mem = StaticMemoryPool<128>::threaded_instance()->alloc(); // thread-safe version (still very fast)
    void* mem = ObjectAllocator::instance()->alloc(256);             // dynamic-size fast allocator

    class MyClass : public AllocatedObject<MyClass> {};
    MyClass* obj = new MyClass(); // get fast and thread-safe allocations with less code

    class SingleThreadedClass : public AllocatedObject<SingleThreadedClass, false> {}; // faster but thread-unsafe

    // override behaviour from thread-unsafe to thread-safe
    class MultiThreadedClass : public SingleThreadedClass, public AllocatedObject<MultiThreadedClass> {
    public:
        using AllocatedObject<MultiThreadedClass>::operator new;
        using AllocatedObject<MultiThreadedClass>::operator delete;
    };

    MemoryPool mypool(32);
    void* mem = mypool.alloc(); // custom pool with maximum speed, but thread-unsafe
hash_merge merges hash $source into $dest. The merge is done extremely fast. $source and $dest must be HASHREFs or undefs. New keys from source are added to dest; values of existing keys are replaced. If a key contains a HASHREF both in source and dest, the two hashes are merged recursively; otherwise the value in dest gets replaced by the value from source. Returns the resulting hashref (it may or may not be the same ref as $dest, depending on the $flags provided).
$flags is a bitmask of these flags:
By default, if a key contains an ARRAYREF both in source and dest, it gets replaced by the array from source. If you enable this flag, such arrays will be concatenated (like: push @{$dest->{key}}, @{$source->{key}}).
With MERGE_ARRAY_MERGE set, if a key contains an ARRAYREF both in source and dest, the arrays are merged element by element: $dest->{key}[0] is merged with $source->{key}[0], and so on. Values are merged using the following rules: if both are hashrefs or both are arrayrefs, they are merged recursively; otherwise the value in dest gets replaced.
If you set MERGE_LAZY, the merge process won't override any existing and defined values in dest. Keep in mind that if you also set MERGE_ARRAY_MERGE, the same applies while merging array elements.
    my $hash1 = {a => 1, b => undef};
    my $hash2 = {a => 2, b => 3, c => undef};
    hash_merge($hash1, $hash2, MERGE_LAZY);
    # $hash1 is {a => 1, b => 3, c => undef}
If MERGE_SKIP_UNDEF is enabled, values from source that are undefs won't replace anything in dest.
    my $hash1 = {a => 1};
    my $hash2 = {a => undef, b => undef, c => 2};
    hash_merge($hash1, $hash2, MERGE_SKIP_UNDEF);
    # $hash1 is {a => 1, c => 2}
If MERGE_DELETE_UNDEF is enabled, values from source that are undefs act as 'deleters', i.e. the corresponding keys get deleted from dest.
    my $hash1 = {a => 1, b => 2};
    my $hash2 = {a => undef};
    hash_merge($hash1, $hash2, MERGE_DELETE_UNDEF);
    # $hash1 is {b => 2}
With MERGE_COPY_DEST set, hash_merge makes a deep copy of $dest, merges $source into that copy and returns the new hashref; $dest itself is left unchanged.
By default, if any value from source replaces a value from dest, it doesn't get deep copied. For example:
    use feature 'say';
    my $hash1 = {};
    my $hash2 = {a => [1,2]};
    hash_merge($hash1, $hash2);
    shift @{$hash1->{a}};
    say scalar @{$hash2->{a}}; # prints 1
Moreover, even primitive values are not copied; instead they get aliased for speed. For example:
    my $hash1 = {};
    my $hash2 = {a => 'mystring'};
    hash_merge($hash1, $hash2);
    substr($hash1->{a}, 0, 2, ''); # 4-argument substr removes 'my' in place
    say $hash2->{a}; # prints 'string'
If you enable MERGE_COPY_SOURCE, replacing values from source will be copied (references are deep copied).
Equivalent to MERGE_COPY_DEST + MERGE_COPY_SOURCE.
This is how undefined $source or undefined $dest are handled:
If $source is undef, nothing is merged; however, if MERGE_COPY_DEST is set, a deep copy of $dest is still returned. If $dest is also undef, an empty hashref is returned regardless of the MERGE_COPY_DEST flag.
If $dest is undef, an empty hashref is created, merged with $source and returned.
merge acts much like 'hash_merge', but accepts any scalar as $dest and $source, not only hashrefs. It returns the merged value, which may or may not be the same scalar (modified or not) as $dest.
This function does the same work as 'hash_merge' does for its elements. I.e. if both $dest and $source are HASHREFs, they are merged via 'hash_merge'. If both are ARRAYREFs, then depending on $flags, $dest is either replaced, concatenated or merged. Otherwise $source replaces $dest following the rules described for the 'hash_merge' function with respect to the flags MERGE_COPY_DEST, MERGE_COPY_SOURCE and MERGE_LAZY.
For example, if $source and $dest are scalars (not refs) and no flags are provided, $dest becomes equal to $source. If MERGE_LAZY is provided and $dest is not undef, $dest is unchanged. If MERGE_COPY_DEST is provided, $dest is unchanged and the result is returned in a new scalar. And so on.
However, there is one difference: if $dest and $source are primitive scalars, instead of creating an alias, the $source variable is copied to $dest (or to the new result). If MERGE_COPY_SOURCE is disabled, the copy is not deep, like $dest = $source.
lclone is a light clone: it makes a deep copy of $source and returns it.
It does not handle cross-references: references to the same data will become different references in the copy. If a cyclic reference is present in $source, it will croak.
It handles CODEREFs and IOREFs, but doesn't clone them; it just copies the pointer to the same CODE and IO into a new reference. All other data types are cloned normally.
If clone encounters a blessed object and it has a HOOK_CLONE method, the return value of this method is used instead of a default behaviour. You can call [lf]clone($self) again from HOOK_CLONE if you need to, for example to prevent cloning some of your properties:
    sub HOOK_CLONE {
        my $self = shift;
        my $tmp = delete local $self->{big_obj_backref};
        my $ret = lclone($self);
        $ret->{big_obj_backref} = $tmp;
        return $ret;
    }
In this case the second lclone() call won't call HOOK_CLONE again and will clone $self in the standard manner.
fclone is a full clone: the same as lclone(), but it handles cross-references, so references to the same data will remain references to the same data in the copy. If a cyclic reference is present in $source, it will remain cyclic in the cloned data.
clone($source, $with_cross_checks) behaves like lclone() if $with_cross_checks is false or omitted, otherwise like fclone().
compare performs a deep comparison and returns true if every element of $data1 is equal to the corresponding element of $data2.
The rules of equality for two elements (including the top-level $data1 and $data2 themselves):
Two objects that are not of the same class are not equal.
If the objects' class has an overloaded '==' operation, it is used for checking equality. If not, the objects' underlying data structures are compared.
Two hashrefs are equal if all of their key/value pairs are equal.
Two arrayrefs are equal if their corresponding elements are equal (a[0] equals b[0], etc).
Two coderefs are equal if they are references to the same code.
Two IO refs are equal if both IOs contain the same fileno.
Two globrefs are equal if both are references to the same glob.
Any other references are dereferenced and checked again from the beginning.
Plain scalars are equal if perl's 'eq' or '==' (depending on data type) returns true.
crypt_xor performs a round-robin XOR of $string with $key. The algorithm is symmetric, i.e.:
crypt_xor(crypt_xor($string, $key), $key) eq $string
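The symmetry follows from XOR being its own inverse. The following C++ sketch illustrates the idea; it is not Panda::Lib's implementation. The name xor_round_robin is hypothetical, and the sketch assumes that "round-robin" means byte i of the data is XORed with byte i modulo the key length, and that the key is non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Panda::Lib): XORs every byte of `data`
// with the key byte at the same position, wrapping around the key.
// Assumption of this sketch: the key must not be empty.
std::string xor_round_robin(const std::string& data, const std::string& key) {
    assert(!key.empty());
    std::string out(data);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

Applying the function twice with the same key XORs each byte with the same key byte twice, which restores the original byte.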
string_hash calculates a 64-bit hash value for $string. Currently uses the MurmurHash64A algorithm (very fast).
string_hash32 calculates a 32-bit hash value for $string. Currently uses the jenkins_one_at_a_time_hash algorithm.
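For reference, the textbook Jenkins one-at-a-time hash looks like the sketch below. This is the commonly published form of the algorithm, written as a standalone function with the hypothetical name one_at_a_time; whether string_hash32 produces bit-identical values (for example, whether it starts from the same initial value) is an assumption that the documentation does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Textbook Jenkins one-at-a-time hash over `len` bytes.
// Hypothetical standalone function; not Panda::Lib's string_hash32 itself.
std::uint32_t one_at_a_time(const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < len; ++i) {
        hash += static_cast<unsigned char>(data[i]);  // mix in one byte at a time
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    // final avalanche
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```

With this definition, one_at_a_time("a", 1) evaluates to 0xca2e9442 and an empty input hashes to 0.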
Functions marked with [pTHX] must receive aTHX_ as a first arg.
All of the functions above behave like their Perl equivalents. See the PERL FUNCTIONS docs.
crypt_xor performs XOR encryption. If 'dest' is null, it mallocs and returns a new buffer, which the caller must release manually via 'free'. If 'dest' is not null, the result is placed into that buffer, which must have enough space to hold it.
sv2string creates a panda::string from an SV string. If 'ref' is COPY, the content of the SV is copied to the string. If 'ref' is REF, the returned string is a copy-on-write string holding the SV's buffer. In this case you must NOT change or delete your SV until you're done with the string.
Panda::Lib installs a typemap for panda::string, so it is okay to receive it in XS function params without copying.
    using panda::string;
    ...
    void
    myfunc (string str)
    PPCODE:
        // don't change ST(0) while working with str
        printf("string is %s, len is %zu", str.data(), str.length());
        str.retain();
        // it is OK to change ST(0) now, as str is detached from the original string
        ...
This string is fully compatible with the std::string API, but it supports COW (copy-on-write) and therefore runs much faster in many cases. Some std::string implementations (for example, libstdc++ before its C++11 ABI) share buffers between strings via COW, but std::string cannot do COW over an external pointer, which matters when creating a string from a literal: string("mystring"), or myhash["mykey"]
    using panda::string;
    using std::cout;

    string str("abcd");    // "abcd" is not copied, COW mode
    str.append("ef");      // str is copied on modification
    cout << str;           // prints 'abcdef'

    char* mystr = new char[13]; // room for "hello, world" plus the terminating NUL
    memcpy(mystr, "hello, world", 13);
    str.assign(mystr, 12); // COW mode, don't free mystr until you're done with str
    str.retain();          // abort COW, str is detached, buffer is copied

    string str2(mystr, string::COPY); // no COW, std::string-like behaviour, mystr is copied to str2
    str2.resize(5);
    cout << str2;          // 'hello'

    str = str2;            // COW mode, buffer is not copied. Unlike with char* pointers, you can safely destroy str2 at any time
    cout << str;           // 'hello'
    str.append("!");       // detach on modification
    cout << str << str2;   // 'hello!hello'
panda::string is converted into std::string on demand. Also it can be used in ostream's and istream's << >> operators.
Only new methods or methods with additional params are listed. All other methods have the same syntax and meaning as in std::string.
If 'ref' is REF, the newly created string will use COW mode with buffer 'p'. It is your responsibility to keep the 'p' pointer valid until you are done with the string or have modified it (a modification detaches the string from 'p').
If 'ref' is COPY, then 'p' is copied to string and it won't depend on 'p' pointer.
The default is REF, it saves time in such common cases as:
    void myfunc (const string str) { ... }
    myfunc("hello");
or
    std::map<string, int> myhash;
    int val = myhash["mykey"];
buf() returns the string buffer like 'data' or 'c_str', but this buffer is writable. Therefore, if the string was in COW mode, it detaches. A common case is parsing something directly into a string:
    string str;
    str.reserve(1000);
    char* buf = str.buf();
    // fill buf
    str.resize(actual_length);
retain() detaches the string if it's in COW mode and does nothing otherwise. Returns the string itself.
'ref' has the same meaning as in constructor.
MemoryPool is the base object for fast allocation of memory blocks of one particular size (commonly used for allocating small objects). This class is thread-unsafe: only one thread at a time may allocate memory through a given object. It is roughly 10x to 40x faster than new+delete.
The constructor creates an object which allocates blocks of size blocksize. There is no memory overhead, because it doesn't store any additional data before or after a memory block. However, if you pass a blocksize of less than 8 bytes, it will still allocate blocks large enough to hold 8 bytes.
Allocates new block. Can throw std::bad_alloc if no memory.
Returns ptr back to pool. If ptr is a pointer that this object never allocated, the behaviour is undefined.
Frees internal storage and returns memory to system. All pointers ever allocated by this object become invalid.
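One common way to get header-free fixed-size blocks is an intrusive free list. The C++ sketch below illustrates that technique under stated assumptions; it is not Panda::Lib's MemoryPool. The class name TinyPool, the blocks_per_chunk parameter and the chunked growth strategy are all hypothetical. It also shows a plausible reason for a minimum block size: a free block must be able to hold a pointer to the next free block (8 bytes on typical 64-bit platforms).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical sketch of a fixed-size block pool (not Panda::Lib's MemoryPool).
// Free blocks are chained through their own first bytes, so no per-block
// header is stored before or after a block.
class TinyPool {
public:
    explicit TinyPool(std::size_t blocksize, std::size_t blocks_per_chunk = 64)
        : blocksize_(round_up(blocksize)), per_chunk_(blocks_per_chunk) {
        assert(blocks_per_chunk > 0);
    }
    TinyPool(const TinyPool&) = delete;
    TinyPool& operator=(const TinyPool&) = delete;
    ~TinyPool() { for (void* c : chunks_) std::free(c); }

    std::size_t blocksize() const { return blocksize_; }

    // Pops a block from the free list, growing the pool when it is empty.
    void* alloc() {
        if (!free_list_) grow();
        FreeNode* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Pushes a block back. `p` must come from alloc() on this pool;
    // anything else is undefined behaviour. Not thread-safe.
    void dealloc(void* p) {
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = free_list_;
        free_list_ = node;
    }

private:
    struct FreeNode { FreeNode* next; };

    // Blocks must hold a FreeNode and stay pointer-aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = sizeof(FreeNode);
        if (n < a) n = a;
        return (n + a - 1) / a * a;
    }

    void grow() {
        chunks_.reserve(chunks_.size() + 1);  // so push_back below cannot throw
        char* chunk = static_cast<char*>(std::malloc(blocksize_ * per_chunk_));
        if (!chunk) throw std::bad_alloc();
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < per_chunk_; ++i) dealloc(chunk + i * blocksize_);
    }

    std::size_t blocksize_;
    std::size_t per_chunk_;
    FreeNode* free_list_ = nullptr;
    std::vector<void*> chunks_;
};
```

Both alloc() and dealloc() are a handful of pointer operations with no locking, which is where the speed of this kind of pool comes from and also why a single pool object cannot be shared between threads without extra synchronization.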
This class provides access to singleton memory pool objects for particular block size. It is recommended to use memory pools via this interface to reduce memory consumption and fragmentation.
Returns MemoryPool object for BLOCKSIZE which is global to the whole process. This object is thread-unsafe.
Returns the MemoryPool object for BLOCKSIZE which is local to the current thread. Using this object is thread-safe. Thread safety is provided by TLS (thread-local storage), without any mutexes, rwlocks and so on, so performance is still great.
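The difference between the two accessors can be sketched with C++11 function-local statics. This is only an illustration of the process-wide versus per-thread singleton idea, not Panda::Lib's code: DemoPool and DemoStaticPool are hypothetical names, DemoPool is a stand-in that only records its block size, and the use of the thread_local keyword is an assumption (the library may implement TLS differently).

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a real pool type; it only records its block size.
struct DemoPool {
    explicit DemoPool(std::size_t bs) : blocksize(bs) { assert(bs > 0); }
    std::size_t blocksize;
};

// Hypothetical sketch (not Panda::Lib's StaticMemoryPool): one pool per block
// size for the whole process, and one per block size for each thread.
template <std::size_t BLOCKSIZE>
struct DemoStaticPool {
    static DemoPool* instance() {
        static DemoPool pool(BLOCKSIZE);        // one object shared by all threads
        return &pool;
    }
    static DemoPool* threaded_instance() {
        thread_local DemoPool pool(BLOCKSIZE);  // a separate object in each thread
        return &pool;
    }
};
```

Because every thread gets its own object from threaded_instance(), no two threads ever touch the same pool, so no locking is needed.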
Sometimes you don't know the size of a block at compile time and therefore can't use StaticMemoryPool. On the other hand, creating MemoryPool objects for a particular size every time is expensive. ObjectAllocator provides an interface for allocating memory blocks of arbitrary size. It holds a collection of MemoryPool objects of various sizes which are created on demand.
The constructor creates an allocator object. However, I would recommend using the singleton interface via instance/threaded_instance, see below.
Allocates block size bytes long. If you pass size less than 8 bytes, it will still allocate 8 bytes.
Returns ptr back to the pool. size is required because MemoryPool doesn't store block sizes before or after blocks, to avoid memory overhead. If you pass a wrong size, or a pointer that was never allocated via this object, the behaviour is undefined.
Frees internal storage of all pools and returns memory to system. All pointers ever allocated by this object become invalid.
Returns ObjectAllocator object which is global to the whole process. This object is thread-unsafe.
Returns ObjectAllocator object which is global to the current thread. This object is thread-safe.
This class is a helper base class. If you inherit from it, objects of your class will be allocated via memory pools instead of using default new/delete operators.
Normally, you would need to write this code in order to allocate your objects via MemoryPool:
    class MyClass {
    public:
        static void* operator new (size_t size) {
            if (size == sizeof(MyClass)) return StaticMemoryPool<sizeof(MyClass)>::threaded_instance()->alloc();
            return ObjectAllocator::threaded_instance()->alloc(size);
        }
        static void operator delete (void* p, size_t size) {
            if (size == sizeof(MyClass)) StaticMemoryPool<sizeof(MyClass)>::threaded_instance()->dealloc(p);
            else ObjectAllocator::threaded_instance()->dealloc(p, size);
        }
        ...
    };
The size check (if/else) is needed to support inheritance, because for a child class size won't match sizeof(MyClass). Programmers usually fall back to the default operators ::new/::delete when the sizes don't match; however, ObjectAllocator can handle dynamic sizes and is much faster than the default operators, so even in this case we save time.
To avoid writing this code every time, just inherit from AllocatedObject, passing your class name as a template parameter. If you don't need thread-safe allocations, you can pass false as the second template parameter to achieve even better performance.
    class MyClass : public AllocatedObject<MyClass> { ... };
    class MyChild : public MyClass { ... };
When a child class such as MyChild inherits from MyClass, it still uses a memory pool, but via the dynamic ObjectAllocator, which is slightly slower. To restore the original performance, redefine the new/delete operators again, passing your child class name. You will also need to resolve the resulting multiple inheritance conflict with using declarations.
    class MyChild : public MyClass, public AllocatedObject<MyChild> {
    public:
        using AllocatedObject<MyChild>::operator new;
        using AllocatedObject<MyChild>::operator delete;
        ...
    };
This code will allocate MyChild objects via static memory pool.
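The inheritance behaviour described here can be reproduced with a small CRTP sketch. It is a hedged illustration, not Panda::Lib's AllocatedObject: PooledObject, Base, Child and FastChild are hypothetical names, and std::malloc plus two counters stand in for the static pool and the dynamic allocator so that the chosen path can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical CRTP base (not Panda::Lib's AllocatedObject). It routes
// new/delete for Derived through placeholders and counts which path was taken.
template <class Derived>
class PooledObject {
public:
    static void* operator new(std::size_t size) {
        assert(size >= sizeof(Derived));      // Derived itself or a subclass
        void* p = std::malloc(size);          // placeholder for the real pools
        if (!p) throw std::bad_alloc();
        if (size == sizeof(Derived)) ++fixed_allocs();    // exact type: static pool path
        else                         ++dynamic_allocs();  // subclass: dynamic path
        return p;
    }
    static void operator delete(void* p, std::size_t) { std::free(p); }

    static int& fixed_allocs()   { static int n = 0; return n; }
    static int& dynamic_allocs() { static int n = 0; return n; }
};

struct Base : PooledObject<Base> { int a = 0; };

// Inherits Base's operators, so its size no longer matches sizeof(Base).
struct Child : Base { int extra[4] = {}; };

// Re-targets the operators at FastChild; the using declarations resolve
// the ambiguity between the two PooledObject bases.
struct FastChild : Base, PooledObject<FastChild> {
    using PooledObject<FastChild>::operator new;
    using PooledObject<FastChild>::operator delete;
    int extra[4] = {};
};
```

Allocating a Child goes through the operators inherited from Base with a size that differs from sizeof(Base), so it takes the dynamic path; allocating a FastChild uses its own operators with a matching size and takes the fixed-size path.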
For thread-safe instances to work correctly and without memory leaks, you must use the pthreads interface to create and join threads. As far as I know, nearly everybody uses pthreads on UNIX nowadays, so this shouldn't be a problem, I hope :-)
no requirements
typemap for panda::string or std::string or anything else you see as 'string' in your local scope. Such a class must have std::string-compatible API.
Pronin Oleg <syber@crazypanda.ru>, Crazy Panda, CP Decision LTD
You may distribute this code under the same terms as Perl itself.