NAME

Panda::URI - fast URI.pm-like framework written in C, with Perl and C interface

DESCRIPTION

Panda::URI provides functionality similar to URI.pm, but is much faster (sometimes 100x). It is used as the base URI unit in all Panda::* modules.

SYNOPSIS

    use feature 'say';
    use Data::Dumper;
    use Panda::URI qw/uri :const/;
    
    my $u = Panda::URI->new("http://mysite.com:8080/my/path?a=b&c=d#myhash");
    say $u->scheme;
    say $u->host;
    say $u->port;
    say $u->path;
    say $u->query_string;
    say Dumper($u->query); # query params as a hashref
    say $u->fragment;
    
    $u = Panda::URI->new("about:blank");
    say $u->scheme;
    say $u->path;

    my $copy = $u->clone; # independent copy of the object
    

FUNCTIONS

uri($url, [$flags])

Creates a URI object from the string $url. If the scheme is supported, the created object is of a scheme-specific subclass (Panda::URI::http, Panda::URI::ftp, ...). Otherwise it is of class Panda::URI.

The created object is in "strict" mode, i.e. it has additional methods according to its scheme, but you cannot change its scheme. You can still set a new url on this object, but it must have the same scheme or an error will be raised.

"Strict" (customized) classes also have their own constructors, possibly with additional arguments, for example:

    my $url = Panda::URI::http->new("http://google.com?b=20", q => 'something', a => 10);
    say $url->query_string; # q=something&a=10&b=20
    $url->scheme('ftp'); # CROAKS, changing scheme is disallowed.
    

See custom classes' docs for details.

$flags is a bitmask of one or more of these:

ALLOW_LEADING_AUTHORITY

By default, as the RFC requires, a url is not allowed to begin with an authority (i.e. host and port). For example

    www.google.com/hello
    

is not interpreted as you might think. In this case, the url is treated as relative and "www.google.com/hello" is a path.

Enabling this flag makes Panda::URI detect such urls:

    www.google.com/hello      no scheme, host is www.google.com, path is /hello
    hello/world               no scheme, host is hello, path is world
    /hello/world              no scheme, no host, path is /hello/world
    

However, Panda::URI never produces urls that do not comply with the RFC on output, so

    say uri("www.google.com/hello", ALLOW_LEADING_AUTHORITY);
    

prints "//www.google.com/hello" (scheme-relative format), which is valid.

PARAM_DELIM_SEMICOLON

If set, Panda::URI uses ';' instead of the default '&' as the delimiter between query string params, for both input and output.

register_scheme($scheme, $perl_class)

Registers a new scheme and a perl class for that scheme (the class must inherit from Panda::URI). This only applies when creating "strict" (customized) urls via the uri() function (or via the custom class's constructor).

As Panda::URI is a C++ framework at its base, you will also want to register a C++ class for that scheme in XS, to be able to do something when such uris are constructed (even from XS/C code).

See REGISTERING SCHEMAS for how to.

encode_uri_component($bytes, [$use_plus]), encodeURIComponent($bytes, [$use_plus])

Does what JavaScript's encodeURIComponent does.

    $uri = encode_uri_component("http://www.example.com/");
    # http%3A%2F%2Fwww.example.com%2F

If $use_plus is true, spaces are encoded as '+' instead of '%20'.
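
The encoding rule can be sketched in a few lines of standalone C++. This is only an illustration, assuming the character set that JavaScript's encodeURIComponent keeps unescaped (ASCII letters, digits and - _ . ! ~ * ' ( )). The names is_unreserved and encode_component_sketch are hypothetical and not part of Panda::URI, and the module's own table may differ.

```cpp
#include <string>

// Characters kept as-is by JavaScript's encodeURIComponent:
// ASCII letters, digits and - _ . ! ~ * ' ( )
static bool is_unreserved(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) return true;
    switch (c) {
        case '-': case '_': case '.': case '!': case '~':
        case '*': case '\'': case '(': case ')':
            return true;
        default:
            return false;
    }
}

// Hypothetical helper (not the module's function): percent-encodes every byte
// outside the unreserved set. With use_plus, a space becomes '+' instead of %20.
std::string encode_component_sketch(const std::string& bytes, bool use_plus = false) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 3); // worst case: every byte becomes %XX
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else if (c == ' ' && use_plus) {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

Calling encode_component_sketch("a b") yields "a%20b", and encode_component_sketch("a b", true) yields "a+b".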

decode_uri_component($bytes), decodeURIComponent($bytes)

Does what JavaScript's decodeURIComponent does.

    $str = decode_uri_component("http%3A%2F%2Fwww.example.com%2F");
    # http://www.example.com/
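
The reverse operation can be sketched the same way. The code below is a minimal standalone illustration, not the module's implementation. The names hex_value and decode_component_sketch are hypothetical, and two behaviours are assumptions of the sketch: malformed %XX sequences are copied through unchanged, and '+' is left as-is.

```cpp
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each well-formed %XX with the byte it encodes.
// Anything else, including a malformed or truncated escape, is copied verbatim.
std::string decode_component_sketch(const std::string& src) {
    std::string out;
    out.reserve(src.size()); // decoding never makes the string longer
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        if (src[i] == '%' && i + 2 < src.size()) {
            int hi = hex_value(src[i + 1]);
            int lo = hex_value(src[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out += src[i];
    }
    return out;
}
```

Because every escape shrinks three bytes to one, the output is never longer than the input.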

CLASS METHODS

new($url, [$flags])

Creates a URI object from the string $url. The created object is "non-strict", i.e. it is of class "Panda::URI" and has no scheme-specific methods, but you can change its scheme and set urls with a different scheme on the object.

register_scheme() has no effect on this method.

$flags are the same as for uri() function.

OBJECT METHODS

url([$newurl], [$flags])

Returns the url as a string. If $newurl is given, sets it as the object's url (respecting $flags). Croaks if the object is in "strict" mode and $newurl's scheme differs from the current one. If the object is strict and $newurl has no scheme, the current scheme is kept (a non-strict object would leave the scheme empty). Examples:

    my $u = Panda::URI->new("http://facebook.com"); # non-strict mode
    $u->query({a => 1, b => 2});
    say $u->url; # http://facebook.com?a=1&b=2
    $u->url("//twitter.com"); # scheme-relative url
    say $u; # //twitter.com
    
    $u = uri("http://facebook.com"); # strict mode
    $u->url("//twitter.com");
    say $u; # http://twitter.com, force object's scheme as it cannot change
    $u->url("svn://svn.com"); # croaks, scheme cannot change
    
    $u = Panda::URI::ftp->new("//my.com"); # strict mode
    say $u; # ftp://my.com
    $u->url("http://ya.ru"); # croaks
    

scheme([$new_scheme]), proto([$new_scheme]), protocol([$new_scheme])

Sets/returns uri's scheme. May croak if object is strict and new scheme differs from current.

user_info([$new_uinfo])

Sets/returns user_info part of uri (ftp://<user_info>@host/...)

host([$new_host])

Sets/returns host part of uri

port([$new_port])

Sets/returns the port. If no port is explicitly present in the uri, returns the default port for the uri's scheme. If the uri has no scheme, returns 0.

explicit_port()

Returns port if it was explicitly set via port() or was present in uri. Otherwise returns 0.

default_port()

Returns default port for the uri's scheme. Returns 0 if scheme is not specified/not supported.

path([$new_path])

Sets/returns path part of uri as string.

query_string([$new_query_string])

Sets/returns the query string part of the uri as a string. The string is expected/returned in decoded, plain format, i.e. after the individual params have been uri-encoded, but before encode_uri_component is applied to the whole resulting string.

raw_query([$new_query_string])

Sets/returns the query string part of the uri as a string. The string is expected/returned in RAW (encoded) format, i.e. after the individual params have been uri-encoded and after encode_uri_component has been applied to the whole resulting string.

query([\%new_query | %new_query | $new_query_string])

If no params specified, returns query part of uri as hashref. Keys/values are returned unencoded. If uri has no query params, empty hash is returned.

If you change returned hash, no changes will occur in uri object. To commit these changes, set this hash again via query($hash) or use param() method.

If params are specified, sets new query from hash or hashref or string. Keys/values are accepted unencoded for hash/hashref.

If you pass query as string, the effect will be the same as calling query_string($new_query).

If you want to make query strings like 'a=1&a=2&a=3', set "a"'s value to an arrayref of values, like:

    $u = Panda::URI->new("http://ya.ru");
    $u->query(b => 10, a => [1,2,3]);
    say $u; # http://ya.ru?b=10&a=1&a=2&a=3
    

Note, however, that multiparams are NOT returned in the hashref:

    say Dumper($u->query); # {b => 10, a => 1/2/3 }
    

The value of "a" may be any of 1/2/3 depending on hash order. This is done because most of the time you don't want multiparams and don't want to be surprised by an arrayref in the query if someone passes you a second value for some key.

If you want to get all values of multiparam, use multiparam().
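
One way to picture multiparams is as repeated keys in a multimap, which is how the C++ interface described later exposes the query. The standalone sketch below assumes that model. QuerySketch and parse_query_sketch are hypothetical names, std::string stands in for panda::string, and no percent-decoding is done.

```cpp
#include <map>
#include <string>
#include <utility>

// Local stand-in for a query container: one entry per key/value pair,
// so a multiparam is simply a key that appears more than once.
typedef std::multimap<std::string, std::string> QuerySketch;

// Hypothetical helper: splits "k=v&k2=v2" on delim and stores each pair.
// Empty items (as in "a=1&&b=2") are skipped; a key without '=' gets an empty value.
QuerySketch parse_query_sketch(const std::string& qs, char delim = '&') {
    QuerySketch q;
    std::string::size_type pos = 0;
    while (pos < qs.size()) {
        std::string::size_type end = qs.find(delim, pos);
        if (end == std::string::npos) end = qs.size();
        if (end > pos) {
            std::string item = qs.substr(pos, end - pos);
            std::string::size_type eq = item.find('=');
            if (eq == std::string::npos) q.insert(std::make_pair(item, std::string()));
            else q.insert(std::make_pair(item.substr(0, eq), item.substr(eq + 1)));
        }
        pos = end + 1;
    }
    return q;
}
```

With this model, parse_query_sketch("b=10&a=1&a=2&a=3") holds four entries, three of them under the key "a". Passing ';' as the second argument mirrors the effect of a semicolon delimiter.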

add_query(\%query | %query | $query_string)

Like query() but instead of replacing, adds passed query to existing query. If some key already exists in uri's query, it doesn't get replaced, instead it becomes a multiparam.

param($name, [$value | \@values])

Without a second arg, returns the value of the query param '$name'. If no such param exists, returns undef. If param $name is a multiparam, returns one of its values.

With $value supplied, replaces current value(values) of $name with $value.

With \@values supplied, replaces current value(values) of $name with \@values ($name becomes multiparam).

multiparam($name, [$value | \@values])

Does the same as param(). The only difference is that, when called without a second arg, it returns a list of the param's values if the param is a multiparam, and an empty list instead of undef if there is no such param in the query.

nparam()

Returns the number of parameters in the query, counting each value of a multiparam separately. For example:

    "http://google.com"; # nparam() == 0
    "http://google.com?a=1&b=2"; # nparam() == 2
    "http://google.com?a=1&b=2&b=3&b=4"; # nparam() == 4

remove_param($name)

Removes param $name from query. If param is a multiparam, removes all its values.

fragment([$new_fragment]), hash([$new_fragment])

Sets/returns fragment (hash) part of uri.

location([$new_location])

Sets/returns the location part of the uri. Location is "host:port" together. If no port was explicitly set, the returned location contains the default port for the scheme. If no scheme is defined, or the scheme is unknown, the returned location contains port 0, i.e. "host:0". Examples:

    say Panda::URI->new("http://ya.ru:8080")->location; # ya.ru:8080
    say Panda::URI->new("http://ya.ru")->location; # ya.ru:80
    say Panda::URI->new("//ya.ru")->location; # ya.ru:0
    
    say Panda::URI->new("http://ya.ru")->explicit_location; # ya.ru
    say Panda::URI->new("http://ya.ru:8080")->explicit_location; # ya.ru:8080
    

explicit_location()

Returns location with explicit port set if any, otherwise returns location without port (i.e. just host).

Effect is the same as

    $u->explicit_port ? $u->host.':'.$u->port : $u->host
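
The port fallback behind location() and explicit_location() can be sketched in standalone C++ as follows. Everything here is illustrative: UriPartsSketch and the *_sketch functions are hypothetical names, and the small table of default ports (80, 443, 21) is an assumption made for the example, not the module's registry.

```cpp
#include <sstream>
#include <string>

// Hypothetical, minimal view of the parts that matter for location().
struct UriPartsSketch {
    std::string scheme;
    std::string host;
    unsigned short explicit_port; // 0 means "no port was set explicitly"
};

// Assumed defaults for a few well-known schemes; unknown or empty scheme gives 0.
unsigned short default_port_sketch(const std::string& scheme) {
    if (scheme == "http")  return 80;
    if (scheme == "https") return 443;
    if (scheme == "ftp")   return 21;
    return 0;
}

// port(): the explicit port if present, otherwise the scheme's default.
unsigned short port_sketch(const UriPartsSketch& u) {
    return u.explicit_port ? u.explicit_port : default_port_sketch(u.scheme);
}

// location(): always "host:port", falling back to the default port or 0.
std::string location_sketch(const UriPartsSketch& u) {
    std::ostringstream os;
    os << u.host << ':' << port_sketch(u);
    return os.str();
}

// explicit_location(): "host:port" only when a port was set explicitly.
std::string explicit_location_sketch(const UriPartsSketch& u) {
    if (!u.explicit_port) return u.host;
    std::ostringstream os;
    os << u.host << ':' << u.explicit_port;
    return os.str();
}
```

Under these assumptions the sketch gives the same strings as the examples above: "ya.ru:80" for http without a port and "ya.ru:0" when there is no scheme.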
    

relative(), rel()

Returns uri, relative to current scheme and location, for example:

    say uri("http://ya.ru/mypath")->relative; # /mypath
    

to_string(), as_string(), '""'

Returns the whole uri as string.

to_bool(), 'bool'

Returns true if the url is not empty. Note that if a uri object has only user_info or only port set, it is considered empty, as it is not printable.

Actually

    if ($uri) {} # the same as if ($uri->to_bool())
    

is the same as

    if ($uri->to_string) {}
    

but runs faster.

secure()

Returns true if uri's scheme is secure (for example, https).

set($other_uri)

Sets uri from another uri object making them equal. May croak if current object is strict and other object has different scheme.

assign($url, [$flags])

Same as url($url, [$flags])

equals($other_uri), 'eq'

Returns true if $other_uri contains the same url (including all parts - query, fragment, etc).

clone()

Clones current uri. If current uri is in strict mode, then cloned uri will be in strict mode too.

path_segments([@new_segments])

Sets/returns path segments as list.

    $u = uri("http://ya.ru/abc/def/jopa");
    say join(", ", $u->path_segments); # abc, def, jopa
    $u->path_segments('my', 'folder');
    say $u; # http://ya.ru/my/folder
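
The relation between a path string and its segments can be sketched in standalone C++. split_path_sketch and join_path_sketch are hypothetical names. Dropping empty segments and always emitting a leading '/' are assumptions of the sketch, not documented behaviour of the module.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits "/abc/def" into {"abc", "def"}.
// Empty segments (leading, trailing or doubled slashes) are dropped.
std::vector<std::string> split_path_sketch(const std::string& path) {
    std::vector<std::string> segments;
    std::string::size_type pos = 0;
    while (pos < path.size()) {
        std::string::size_type end = path.find('/', pos);
        if (end == std::string::npos) end = path.size();
        if (end > pos) segments.push_back(path.substr(pos, end - pos));
        pos = end + 1;
    }
    return segments;
}

// Hypothetical helper: joins segments back into an absolute path.
std::string join_path_sketch(const std::vector<std::string>& segments) {
    std::string path;
    for (std::vector<std::string>::size_type i = 0; i < segments.size(); ++i) {
        path += '/';
        path += segments[i];
    }
    return path;
}
```

Splitting "/abc/def/jopa" gives the three segments from the example above, and joining "my" and "folder" gives "/my/folder".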
    

STRICT CLASSES

Panda::URI::http

new($url, [\%query | %query | $query_string])

If provided, adds query params to $url after creating object.

Panda::URI::https

new($url, [\%query | %query | $query_string])

If provided, adds query params to $url after creating object.

Panda::URI::ftp

user([$new_user])

Sets/returns user part of user_info in uri.

password([$new_pass])

Sets/returns password part of user_info in uri.

CLONING/SERIALIZING

Panda::URI supports:

cloning via Storable
cloning via Panda::Lib's clone
serializing/deserializing via Storable
serializing via JSON::XS with convert_blessed flag enabled

C++ INTERFACE

Only brief details are given here and below. For full docs see the perl interface docs above. All functions and classes are in the panda::uri:: namespace.

string is not a std::string, it's a panda::string, which has the same API, but is more efficient and supports Copy-On-Write regardless of your compiler version. See Panda::Lib for details.

panda::uri::URI

static URI* create (const string& source, int flags = 0)

Creates uri object in strict mode. Returns object of customized class (panda::uri::URI::http, ...). If no scheme specified or scheme is not supported, returns object of a default class panda::uri::URI.

static URI* create (const URI& source)

Creates strict uri object from another uri object.

URI ()

Creates empty non-strict uri object.

URI (const string& source, int flags = 0)

Creates non-strict uri object from string.

URI (const URI& source)

Creates non-strict uri object from another object (cloning).

URI& operator= (const URI& source)

URI& operator= (const string& source)

Sets data from another uri object or url string.

const string& scheme () const

const string& user_info () const

const string& host () const

const string& path () const

const string& fragment () const

uint16_t explicit_port () const

uint16_t default_port () const

uint16_t port () const

bool secure () const

Returns properties of uri.

virtual void assign (const URI& source)

Assign data from another uri. Same as URI& operator= (const URI& source).

void assign (const string& uristr, int flags = 0)

Assign data from url string.

const string& query_string () const

Returns unencoded query string

const string raw_query () const

Returns encoded query string

Query& query ()

Returns the query params as an object of class Query (std::multimap<string,string>). Unlike with the perl method, you can change this multimap object and the changes will take effect in the uri object.

const Query& query () const

Same as previous method but only for reading.

virtual void scheme (const string& scheme)

Changes the object's scheme. May throw an exception of class WrongScheme if the object is in strict mode and the schemes differ.

void user_info (const string& user_info)

void host (const string& host)

void fragment (const string& fragment)

void port (uint16_t port)

void path (const string& path)

Changes properties of uri.

void query_string (const string& qstr)

Sets unencoded query string

void raw_query (const string& rq)

Sets encoded query string

void query (const string& qstr)

Same as query_string(qstr).

void query (const Query& query)

Replaces current query with new one supplied as multimap.

void add_query (const string& addstr)

Adds query params from addstr to current query.

void add_query (const Query& addquery)

Adds query params from addquery to current query.

const string& param (const string& key) const

Returns value for param with key 'key'. If it's a multiparam, returns first of its values.

void param (const string& key, const string& val)

Sets param value replacing existing one. If it's a multiparam, replaces just first of its values.
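
The "first value" behaviour of the two param() overloads can be sketched against a plain std::multimap. This is a standalone illustration: QuerySketch, first_param_sketch and set_first_param_sketch are hypothetical names, std::string stands in for panda::string, and both returning an empty string for a missing key and inserting when the key is absent are assumptions of the sketch.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, std::string> QuerySketch;

// Returns the first stored value for key, or an empty string if the key is absent.
// lower_bound is used because it points at the first entry of an equal-key run.
std::string first_param_sketch(const QuerySketch& q, const std::string& key) {
    QuerySketch::const_iterator it = q.lower_bound(key);
    return (it != q.end() && it->first == key) ? it->second : std::string();
}

// Replaces only the first value stored for key and leaves other values of a
// multiparam untouched. If the key is absent, a new entry is inserted (assumption).
void set_first_param_sketch(QuerySketch& q, const std::string& key, const std::string& val) {
    QuerySketch::iterator it = q.lower_bound(key);
    if (it != q.end() && it->first == key) it->second = val;
    else q.insert(std::make_pair(key, val));
}
```

For a query holding b=2 and b=3, setting "b" to "9" through this helper leaves b=9 and b=3.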

string explicit_location () const

string location () const

void location (const string& newloc)

const std::vector<string> path_segments () const

template <class It> void path_segments (It begin, It end)

'It' is an iterator over string or string_view elements.

string to_string (bool relative = false) const

string relative () const

bool equals (const URI& uri) const

See perl interface docs for methods above.

void swap (URI& uri)

Swaps content of two uri objects.

typedef URI* (*uricreator) (const URI& uri)

Creator function type for custom scheme objects.

static void register_scheme (const string& scheme, const std::type_info* ti, uricreator, uint16_t default_port, bool secure = false)

Registers new scheme. "ti" is a typeinfo for your scheme's class. It's required for URI framework to automatically convert scheme names to classes and vice-versa.

See REGISTERING SCHEMAS for how to.

panda::uri::URI::http

http (const string& source, const Query& query, int flags = 0)

panda::uri::URI::https

https (const string& source, const Query& query, int flags = 0)

panda::uri::URI::ftp

const string user () const

void user (const string& user)

const string password () const

void password (const string& password)

panda::uri functions

size_t encode_uri_component (const std::string_view src, char* dest, const char* unsafe = unsafe_query_component)

Does what JavaScript's encodeURIComponent does.

'dest' must have enough space to hold the result (in the worst case, srclen*3 + 1). Returns the actual length of the resulting string.

'unsafe' is an array char[256] where index is char code to be replaced and value is either 0 or the same char code. If value is 0 then this char should be replaced with %XX. If value isn't 0, then it is replaced with value code. By default the alphabet for query param names and values is used. You can use one of these predefined arrays (in panda::uri::): unsafe_scheme, unsafe_uinfo, unsafe_host, unsafe_path, unsafe_path_segment, unsafe_query, unsafe_query_component, unsafe_fragment.

void encode_uri_component (const std::string_view src, panda::string& dest, const char* unsafe = unsafe_query_component)

panda::string encode_uri_component (const std::string_view src, const char* unsafe = unsafe_query_component)

String versions.
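
The table layout described for 'unsafe' can be sketched in standalone C++. The code below is an illustration only. fill_table_sketch and encode_with_table_sketch are hypothetical names, the set of kept characters (ASCII letters, digits, '-', '_', '.', '~') is an assumption and not the contents of any predefined array, and the sketch writes no terminating zero byte.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Fills a 256-entry table in the layout described above:
// table[c] == c keeps the byte, table[c] == 0 forces a %XX escape.
void fill_table_sketch(char table[256]) {
    std::memset(table, 0, 256);
    const char* keep =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";
    for (const char* p = keep; *p; ++p) table[static_cast<unsigned char>(*p)] = *p;
}

// Encodes src into dest using the table and returns the number of bytes written.
// dest must hold at least src.size() * 3 bytes, since every byte may become %XX.
std::size_t encode_with_table_sketch(const std::string& src, char* dest, const char* table) {
    static const char hex[] = "0123456789ABCDEF";
    char* out = dest;
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (table[c]) {
            *out++ = table[c]; // non-zero entry: emit the table's value
        } else {
            *out++ = '%';      // zero entry: emit the byte as %XX
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    return static_cast<std::size_t>(out - dest);
}
```

Swapping in a different table changes which bytes are escaped without touching the encoding loop, which is presumably why separate arrays exist for path, query, fragment and so on.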

size_t decode_uri_component (const std::string_view src, char* dest)

Does what JavaScript's decodeURIComponent does.

'dest' must have enough space to hold the result (in the worst case, srclen). Returns the actual length of the resulting string.

void decode_uri_component (std::string_view, panda::string& dest)

panda::string decode_uri_component (std::string_view)

String versions.

REGISTERING SCHEMAS

Let's create a custom scheme "myproto" which, like FTP, uses some info from "user_info". Our protocol won't be secure, and its default port is, for example, 12345.

First, we need to create our own C++ class. It must inherit from panda::uri::URI::Strict:

    #include <panda/uri.h>
    using panda::uri::URI;

    class URI::myproto : public Strict {
    public:
        myproto () : Strict() {}
        myproto (const string& source, int flags = 0) : Strict(source, flags) { strict_scheme(); }
        myproto (const URI& source)                   : Strict(source)        { strict_scheme(); }

        using Strict::operator=;

        const string some_data_from_user_info () const {
            // parse user_info
            // return result
        }
        
        void some_data_from_user_info (const string& new_data) {
            // change user_info
        }
    };

Notice the strict_scheme() call in the constructors. It is required because it throws an exception if the scheme is wrong. In all other methods this is done automatically, but unfortunately, while in the panda::uri::URI::Strict class constructor, the object is not yet ready for type_info manipulations.

Second, create a function that creates a URI::myproto object from a default URI object.

    static URI* new_myproto (const URI& source) {
        return new URI::myproto(source);
    }

Now, register your new scheme somewhere in the program's initialization:

    void init () {
        ...
        URI::register_scheme("myproto", &typeid(URI::myproto), new_myproto, 12345, false);
    }

That's it. Now use your custom scheme:

    URI* uri = URI::create("myproto://myinfo@google.com");
    URI::myproto* myuri = dynamic_cast<URI::myproto*>(uri); // non-null, since the scheme is registered
    std::cout << myuri->some_data_from_user_info();

Finally, create an XS module and register a perl class:

    # TYPEMAP
    
    URI::myproto* XT_PANDA_URI_STRICT
    
    # XS
    
    #include <xs/uri.h>

    MODULE = MyURI                PACKAGE = MyURI::myproto
    PROTOTYPES: DISABLE
    
    string URI::myproto::some_data_from_user_info (SV* newval = NULL) {
        if (newval) {
            THIS->some_data_from_user_info(sv2string(newval));
            XSRETURN_UNDEF;
        }
        RETVAL = THIS->some_data_from_user_info();
    }
    
    ...
    
    # Somewhere in perl
    
    Panda::URI::register_scheme("myproto", "MyURI::myproto");
    

Usage from perl:

    my $u = uri("myproto://info@google.com");
    say ref $u; # MyURI::myproto
    say $u->some_data_from_user_info;

EXPORTED TYPEMAPS

TYPEMAPS

URI*

Typemap for input/output of any URI object.

URI::http*, URI::https*, URI::ftp*

Typemaps for input/output strict uris.

URIx*

Output-only typemap for autodetecting strict uri type and setting right perl class to bless to. You must not define a 'CLASS' variable.

    # XS
    URI* my_cool_uri_create1 (string url) {
        const char* CLASS = "Panda::URI";
        RETVAL = URI::create(url);
    }
    
    URIx* my_cool_uri_create2 (string url) {
        RETVAL = URI::create(url);
    }
    
    # Perl
    
    say ref my_cool_uri_create1("http://ya.ru"); # Panda::URI
    say ref my_cool_uri_create1("ftp://ya.ru"); # Panda::URI
    say ref my_cool_uri_create2("http://ya.ru"); # Panda::URI::http
    say ref my_cool_uri_create2("ftp://ya.ru"); # Panda::URI::ftp

TYPEMAP CLASSES

XT_PANDA_URI

Typemap to inherit from for your custom typemap classes for non-strict uris.

XT_PANDA_URI_STRICT

Typemap to inherit from for your custom typemap classes for strict uris. The difference is that this typemap class will automatically set CLASS variable to the right perl class to bless to.

AUTHOR

Pronin Oleg <syber@crazypanda.ru>, Crazy Panda, CP Decision LTD

LICENSE

You may distribute this code under the same terms as Perl itself.