
NAME

WWW::Link::Repair::Substitutor - repair links by text substitution

SYNOPSIS

    use WWW::Link::Repair::Substitutor;
    $dirsubs = WWW::Link::Repair::Substitutor::gen_substitutor
       ( "http://bounce.bounce.com/frodo/dogo" ,
         "http://thing.thong/ding/dong",
          1, 0,  ); # directory substitution; don't replace subsidiary links
    &$dirsubs ($line_from_file);

DESCRIPTION

A module for substituting one link in a file for another.

This link repairer works by going through a file line by line and doing a substitute on each line. It will substitute absolute links all of the time, including within the text of the HTML page. This is useful because it means that things like instructions to people about what to do with URLs will be corrected.

SUBSTITUTORS

A substitutor is a function which substitutes one URL for another in a string. Typically it is fed a file one line at a time. It works on its argument directly, changing the string in place.

The two URLs should be provided in absolute form.
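
To make the idea concrete, here is a minimal sketch of such a function in plain Perl. It is not the code this module uses: make_link_substitutor is a hypothetical name, and the test for where a URL ends is a crude assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of this module: build a closure that
# replaces $old with $new wherever $old appears as a complete URL.
sub make_link_substitutor {
    my ($old, $new) = @_;
    return sub {
        # $_[0] is an alias for the caller's variable, so the change
        # is made in place. \Q...\E makes the old URL literal text.
        # The lookahead is a crude end-of-URL test (an assumption):
        # optional sentence punctuation, then a delimiter or the end.
        my $count = $_[0] =~ s/\Q$old\E(?=[.,;:!?)]*(?:[\s"'<>]|\z))/$new/g;
        return $count || 0;    # number of links replaced
    };
}

my $subs = make_link_substitutor(
    "http://bounce.bounce.com/frodo/dogo",
    "http://thing.thong/ding/dong",
);
my $line  = 'See <a href="http://bounce.bounce.com/frodo/dogo">this</a>.';
my $count = $subs->($line);
print "$count: $line\n";
```

The closure returns the number of replacements, so a caller can tell whether a line was changed at all.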

FILE HANDLERS

A file handler goes through files calling a substitutor as needed.
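
As a rough illustration of what a file handler does, the sketch below walks a file line by line and hands each line to a substitutor. The name substitute_in_file and the ".orig" backup suffix are assumptions for this example, not the module's own behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: apply $substitutor (a code ref that edits its
# argument in place and returns a true value when it changed something)
# to every line of $filename. The original is kept as "$filename.orig".
sub substitute_in_file {
    my ($filename, $substitutor) = @_;
    my $backup = "$filename.orig";
    rename $filename, $backup or die "cannot rename $filename: $!";
    open my $in,  '<', $backup   or die "cannot read $backup: $!";
    open my $out, '>', $filename or die "cannot write $filename: $!";
    my $changed = 0;
    while (my $line = <$in>) {
        # Count the line if the substitutor reports a change.
        $changed++ if $substitutor->($line);
        print {$out} $line;
    }
    close $in;
    close $out or die "cannot close $filename: $!";
    return $changed;    # number of lines that were altered
}
```

Because the file is read a whole line at a time, the substitutor always sees complete URLs.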

gen_directory_substitutor

Warning: I think the logic around here is more than a little dubious.

gen_substitutor

This function was previously an exported interface and currently remains visible. I think its interface is likely to change, though. Preferably use gen_file_substitutor as an entry point instead.

This function generates a function which can be called either on a complete line of text from a file or on a URL, and which will update the URL based on the URLs it has been given.

If the third argument is true then the function will return a substitutor which works on all of the links below a given URL and substitutes them all together. Thus if we change

  http://fred.jim/eating/

to

  http://roger.jemima/food/eating-out/

we also change

  http://fred.jim/eating/hotels.html

to

  http://roger.jemima/food/eating-out/hotels.html
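
The effect can be imitated with a plain prefix substitution. This is only a sketch of the idea, with a hypothetical function name, and not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch of tree mode: rewrite the base URL and carry
# over whatever path follows it.
sub make_tree_substitutor {
    my ($old_base, $new_base) = @_;
    return sub {
        # \Q...\E quotes regex metacharacters in the old base URL.
        # Only the matching prefix is replaced, so the remainder of
        # each URL beneath the base stays attached to the new base.
        my $count = $_[0] =~ s/\Q$old_base\E/$new_base/g;
        return $count || 0;
    };
}

my $tree_subs = make_tree_substitutor(
    "http://fred.jim/eating/",
    "http://roger.jemima/food/eating-out/",
);
my $text = "http://fred.jim/eating/ and http://fred.jim/eating/hotels.html";
my $replaced = $tree_subs->($text);
print "$replaced: $text\n";
```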

This function should handle fragments correctly. This means that we should allow fragments to be substituted to and from normal links. In addition, when we substitute one URL for another, all of the fragments within it should follow. Fragments are not relative links. The cases are:

  1. substitution of fragment for fragment

  2. substitution of link for link

  3. substitution of link to fragment

  4. substitution of fragment to link

  5. substitution of url base for url base with all relative links

Note that right now it isn't possible to substitute a tree under a fragment. There is no such thing as a sub-fragment defined in the standards.

If we substitute a link to a fragment then we should not substitute fragments under that link. That would lose information. Rather we should issue a warning. Maybe there should be an option that lets this happen.
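
One way to tell the first four cases apart is to check each URL for a fragment part. The sketch below uses a hypothetical classify_substitution function and assumes that a "#" in the URL marks a fragment; case 5 is not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch: name the kind of substitution requested,
# assuming that any "#" in a URL introduces a fragment.
sub classify_substitution {
    my ($old, $new) = @_;
    my $old_has_fragment = $old =~ /#/ ? 1 : 0;
    my $new_has_fragment = $new =~ /#/ ? 1 : 0;
    return "fragment for fragment" if $old_has_fragment && $new_has_fragment;
    return "fragment to link"      if $old_has_fragment;
    # This is the case where fragments under the old link would be
    # lost, so a caller might want to warn here.
    return "link to fragment"      if $new_has_fragment;
    return "link for link";
}

print classify_substitution("http://fred.jim/a.html",
                            "http://fred.jim/b.html#menu"), "\n";
```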

gen_file_substitutor(<original url>, <new url>, [args...])

This function returns a function which will act on a text file or other file which can be treated as a text file and will carry out URL substitutions within it.

The returned code reference should be called with a filename as an argument; it will then replace all occurrences of original url with new url.

Various options can be set by passing key-value pairs in the call.

  fakeit - set to create a function which actually does nothing

  tree_mode - set to true to substitute also URLs which are "beneath"
              original url

  keep_orig - set to false to inhibit creation of backup files

  relative - substitute also relative URLs which are equivalent
             to original url (requires file_to_url)

  file_to_url - provide a function which can translate a given filename
                to a URL, so we can work out relative URLs for the current
                file.

so a call like

  $subs=gen_file_substitutor
             ("http://www.example.com/friendstuff/old",
              "http://www.example.com/friendstuff/new",
              relative => 1, tree_mode => 1,
              file_to_url =>
              sub { my $ret=shift;
                $ret =~ s,/var/www/me,http://www.example.com/mystuff,;
                return $ret; });

  &$subs("/var/www/me/index.html");
  &$subs("/var/www/me/friends.html");

should allow you to fix your web pages if your friend renames a whole directory.

BUGS

One problem with directory substitutors is the treatment of the two different URLs

  http://fred.jim/eating/

and

  http://fred.jim/eating

Most of the time, the latter of the pair is really just a mistaken reference to the former. This is not always true. What is more, where it is true, a user of LinkController will usually have changed to the correct version. For this reason, if gen_directory_substitutor is passed the first form of a URL, it will not substitute the second. If passed the second, it will substitute the first.
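
The asymmetry is what a simple prefix match would produce. The sketch below uses a hypothetical replace_prefix function as a stand-in for a directory substitutor, and http://new.example/ is an invented target; the real logic may differ.

```perl
use strict;
use warnings;

my $with_slash    = "http://fred.jim/eating/";
my $without_slash = "http://fred.jim/eating";

# Hypothetical stand-in for a directory substitutor: replace every
# occurrence of the prefix $old in $text and return the result.
sub replace_prefix {
    my ($text, $old, $new) = @_;
    $text =~ s/\Q$old\E/$new/g;
    return $text;
}

# Passed the first form: the URL without the slash is left alone.
my $kept = replace_prefix($without_slash, $with_slash,
                          "http://new.example/food/");

# Passed the second form: the URL with the slash is rewritten too.
my $rewritten = replace_prefix($with_slash, $without_slash,
                               "http://new.example/food");

print "$kept\n$rewritten\n";
```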

We have to be fed whole URLs at a time. If a URL is split between two different chunks then we may not handle it correctly. Always feeding in a complete line protects us from this because a URL cannot contain an unencoded line break.
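
A small experiment shows the problem. The count_replacements function is hypothetical and uses a plain pattern substitution, which is assumed here to behave like a substitutor for this purpose.

```perl
use strict;
use warnings;

# Hypothetical helper: run a substitution over each piece of text
# separately and return the total number of replacements made.
sub count_replacements {
    my ($old, $new, @pieces) = @_;
    my $total = 0;
    for my $piece (@pieces) {
        # A failed substitution returns a false value, hence "|| 0".
        $total += ($piece =~ s/\Q$old\E/$new/g) || 0;
    }
    return $total;
}

my $old  = "http://fred.jim/eating/";
my $new  = "http://roger.jemima/food/eating-out/";
my $line = "Visit http://fred.jim/eating/ soon\n";

# Fed as one complete line, the URL is found.
my $whole = count_replacements($old, $new, $line);

# Fed as two chunks that cut the URL in half, it is missed.
my $split = count_replacements($old, $new,
                               substr($line, 0, 15), substr($line, 15));

print "whole line: $whole, split chunks: $split\n";
```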