
PHP::Strings - Implement some of PHP's string functions.

use PHP::Strings;
my $slashed = addcslashes( $not_escaped, $charlist );
my $wordcount = str_word_count( $string );
my @words = str_word_count( $string, 1 );
my %positions = str_word_count( $string, 2 );
my $clean = strip_tags( $html, '<a><b><i><u>' );
my $unslashed = stripcslashes( '\a\b\f\n\r\xae' );

PHP has many functions. This is one of the main problems with PHP.
People do, however, get used to said functions, and when they come to a better-designed language they get lost because they have to implement some of these somewhat vapid functions themselves.
So I wrote PHP::Strings. It implements most of the string functions of PHP. For those it doesn't implement, it describes how to do the same thing in native Perl.
Any function that would be silly to implement has not been and has been marked as such in this documentation. They will still be exportable, but if you attempt to use said function you will get an error telling you to read these docs.


All arguments are checked using Params::Validate. Bad arguments will cause an error to be thrown. If you wish to catch it, use eval.
Attempts to use functions I've decided to not implement (as distinct from functions that aren't implemented because I've not gotten around to either writing or deciding whether to write) will cause an error displaying the documentation for said function.

By default, nothing is exported.
Each function and constant can be exported by explicit name.
use PHP::Strings qw( str_pad addcslashes );
To get a function and its associated constants as well, prefix them with a colon:
use PHP::Strings qw( :str_pad );
# This grabs str_pad, STR_PAD_LEFT, STR_PAD_BOTH, STR_PAD_RIGHT.
To export everything:
use PHP::Strings qw( :all );
For more information on what you can add there, consult "Specialised Import Lists" in Exporter.

http://www.php.net/addcslashes
my $slashed = addcslashes( $not_escaped, $charlist );
Returns a string with backslashes before characters that are listed in $charlist.
PHP::Strings::addslashes WILL NOT BE IMPLEMENTED.
Returns a string with backslashes before characters that need to be quoted in SQL queries. You should never need this function. I mean, never.
DBI, the standard method of accessing databases with Perl, does all this for you. It provides a quote method to escape anything, and it provides placeholders and bind values so you don't even have to worry about escaping. In PHP, PEAR DB also provides this facility.
DBI is also aware that some databases don't escape in this method, such as mssql which uses doubled characters to escape (like some versions of BASIC). This function doesn't.
The less said about PHP's magic_quotes "feature", the better.
PHP::Strings::bin2hex WILL NOT BE IMPLEMENTED.
This is trivially implemented using unpack.
my $hex = unpack "H*", $data;
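As a small illustration of the one-liner above, the following sketch converts a byte string to hex and back again. The sample data is made up for the example.

```perl
use strict;
use warnings;

my $data = "Perl\n";

# Bytes to lowercase hex, high nybble first (what PHP calls bin2hex).
my $hex = unpack "H*", $data;

# And back again with pack.
my $back = pack "H*", $hex;

print "$hex\n";    # prints 5065726c0a
```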
PHP::Strings::chop WILL NOT BE IMPLEMENTED.
PHP's chop function is an alias to its "rtrim" function.
Perl has a builtin named chop. Thus we do not support the use of chop as an alias to "rtrim".
PHP::Strings::chr WILL NOT BE IMPLEMENTED.
PHP's and Perl's chr functions operate sufficiently identically.
Note that PHP's chr claims to take an ASCII value as input, whereas Perl's assumes Unicode. See the documentation for a precise definition.
Note that it returns one character, which in some string encodings may not necessarily be one byte.
http://www.php.net/chunk_split
Returns the given string, split into smaller chunks.
my $split = chunk_split( $body [, $chunklen [, $end ] ] );
Where $body is the data to split, $chunklen is the optional length of data between each split (default 76), and $end is what to insert both between each split (default "\r\n") and on the end.
Also trivially implemented as a regular expression:
$body =~ s/(.{1,$chunklen})/$1$end/sg;
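The regular-expression approach can be wrapped in a small function. This is only a sketch under stated assumptions: the name chunk_split_sketch is hypothetical, it is not necessarily how the module implements chunk_split, and it uses {1,$chunklen} so that a trailing partial chunk also receives $end exactly once.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's own code.
sub chunk_split_sketch {
    my ( $body, $chunklen, $end ) = @_;
    $chunklen = 76     unless defined $chunklen;    # documented default
    $end      = "\r\n" unless defined $end;         # documented default

    # Every run of 1..$chunklen characters is followed by $end.
    ( my $out = $body ) =~ s/(.{1,$chunklen})/$1$end/sg;
    return $out;
}

print chunk_split_sketch( "abcdefg", 3, "|" ), "\n";    # prints abc|def|g|
```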
http://www.php.net/convert_cyr_string
PHP::Strings::convert_cyr_string WILL NOT BE IMPLEMENTED.
Perl has the Encode module to convert between character encodings.
http://www.php.net/count_chars
A somewhat daft function that returns counts of characters in a string.
It's daft because it assumes characters have values in the range 0-255. This is patently false in today's world of Unicode. In fact, the PHP documentation for this function happily talks about characters in one part and bytes in another, not realising the distinction.
So, I've implemented this function as if it were called count_bytes. It will count raw bytes, not characters.
Takes two arguments: the byte sequence to analyse and a 'mode' flag that indicates what sort of return value to return. The default mode is 0.
Mode Return value
---- ------------
0 Return hash of byte values and frequencies.
1 As for 0, but hash does not contain bytes with frequency of 0.
2 As for 0, but hash only contains bytes with frequency of 0.
3 Return string composed of used byte-values.
4 Return string composed of unused byte-values.
my %freq = count_chars( $string, 1 );
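For comparison, mode 1 can be approximated in a few lines of plain Perl. This is a sketch, assuming the input is already a byte string; the name count_bytes_sketch is hypothetical and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: frequency of each byte value that occurs.
sub count_bytes_sketch {
    my ($bytes) = @_;
    my %freq;
    $freq{$_}++ for unpack "C*", $bytes;    # one number per byte
    return %freq;
}

my %freq = count_bytes_sketch("hello");
printf "%3d (%s): %d\n", $_, chr, $freq{$_} for sort { $a <=> $b } keys %freq;
```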
PHP::Strings::crc32 WILL NOT BE IMPLEMENTED.
See the String::CRC32 module.
PHP::Strings::crypt WILL NOT BE IMPLEMENTED.
PHP's crypt is the same as Perl's. Thus there's no need for PHP::Strings to provide an implementation.
The CRYPT_* constants are not provided.
PHP::Strings::echo WILL NOT BE IMPLEMENTED.
See "print" in perlfunc.
PHP::Strings::explode WILL NOT BE IMPLEMENTED.
Use the \Q regex metachar and split.
my @pieces = split /\Q$separator/, $string, $limit;
See "split" in perlfunc for more details.
Note that split // will split between every character, rather than returning false. Note also that split "..." is the same as split /.../ which means to split everywhere three characters are matched. The first argument to split is always a regex.
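A short example shows why the \Q matters. The separator and string here are made up for illustration.

```perl
use strict;
use warnings;

my $separator = ".";
my $string    = "www.example.com";

# \Q...\E makes the separator literal, as explode would treat it.
my @pieces = split /\Q$separator\E/, $string;

# A limit caps the number of fields returned.
my @limited = split /\Q$separator\E/, $string, 2;

# Without \Q the dot is a regex wildcard: every character is a
# separator, all fields are empty, and split returns nothing.
my @wrong = split /$separator/, $string;

print join( "|", @pieces ),  "\n";    # prints www|example|com
print join( "|", @limited ), "\n";    # prints www|example.com
```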
PHP::Strings::fprintf WILL NOT BE IMPLEMENTED.
Perl's printf can be told to which file handle to print.
printf FILEHANDLE $format, @args;
See "printf" in perlfunc and "print" in perlfunc for details.
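As a self-contained illustration, the sketch below prints to an in-memory filehandle so the result can be inspected. The braces around the handle are optional for a simple scalar but make the syntax unambiguous.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file here.
open my $fh, '>', \my $buffer or die "Cannot open in-memory handle: $!";

# Note there is no comma between the filehandle and the format.
printf {$fh} "%-5s|%03d\n", "ab", 7;
close $fh;

print $buffer;    # prints "ab   |007"
```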
http://www.php.net/get_html_translation_table
PHP::Strings::get_html_translation_table WILL NOT BE IMPLEMENTED.
Use the HTML::Entities module to escape and unescape characters.
PHP::Strings::hebrev WILL NOT BE IMPLEMENTED.
Use the Encode module to convert between character encodings.
PHP::Strings::hebrevc WILL NOT BE IMPLEMENTED.
Use the Encode module to convert between character encodings.
http://www.php.net/html_entity_decode
PHP::Strings::html_entity_decode WILL NOT BE IMPLEMENTED.
Use the HTML::Entities module to decode character entities.
http://www.php.net/htmlentities
PHP::Strings::htmlentities WILL NOT BE IMPLEMENTED.
Use the HTML::Entities module to encode character entities.
http://www.php.net/htmlspecialchars
PHP::Strings::htmlspecialchars WILL NOT BE IMPLEMENTED.
Use the HTML::Entities module to encode character entities.
PHP::Strings::implode WILL NOT BE IMPLEMENTED.
See "join" in perlfunc. Note that join cannot accept its arguments in either order because that's just not how Perl arrays and lists work. Note also that the joining sequence is not optional.
PHP::Strings::join WILL NOT BE IMPLEMENTED.
PHP's join is an alias for implode. See "implode".
http://www.php.net/levenshtein
PHP::Strings::levenshtein WILL NOT BE IMPLEMENTED.
I have no idea why PHP has this function.
See Text::Levenshtein, Text::LevenshteinXS, String::Approx, Text::PhraseDistance and probably any number of other modules on CPAN.
PHP::Strings::ltrim WILL NOT BE IMPLEMENTED.
As per perlfaq:
$string =~ s/^\s+//;
A basic glance through perlretut or perlreref should give you an idea on how to change what characters get trimmed.
PHP::Strings::md5 WILL NOT BE IMPLEMENTED.
See Digest::MD5 which provides a number of functions for computing MD5 hashes from various sources and to various formats.
Note: the user notes for this function at http://www.php.net/md5 are among the most unintentionally funny and misinformed I've read.
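For the common case of hashing a string, the functional interface of Digest::MD5 is enough. A minimal example:

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex md5_base64 );

my $hex = md5_hex("hello");       # same format as PHP's md5()
my $b64 = md5_base64("hello");    # base64 form, without padding

print "$hex\n";    # prints 5d41402abc4b2a76b9719d911017c592
```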
PHP::Strings::md5_file WILL NOT BE IMPLEMENTED.
The Digest::MD5 module provides sufficient support.
use Digest::MD5;
sub md5_file
{
my $filename = shift;
my $ctx = Digest::MD5->new;
open my $fh, '<', $filename or die $!;
binmode( $fh );
$ctx->addfile( $fh )->digest; # or hexdigest, or b64digest
}
Despite providing that possible implementation just above, I've chosen to not include it as an export due to the amount of flexibility of Digest::MD5 and the number of ways you may want to get your file handle. After all, you may want to use Digest::SHA1, or Digest::MD4 or some other digest mechanism.
Again, I wonder why PHP has the function as they so arbitrarily hobble it.
PHP::Strings::metaphone WILL NOT BE IMPLEMENTED.
Text::Metaphone and Text::DoubleMetaphone and Text::TransMetaphone all provide metaphonic calculations.
http://www.php.net/money_format
sprintf for money.
PHP::Strings::nl2br WILL NOT BE IMPLEMENTED.
This is trivially implemented as:
s,(\r?\n),<br />$1,g;
http://www.php.net/nl_langinfo
PHP::Strings::nl_langinfo WILL NOT BE IMPLEMENTED.
I18N::Langinfo has a langinfo command that corresponds to PHP's nl_langinfo function.
http://www.php.net/number_format
TBD
PHP::Strings::ord WILL NOT BE IMPLEMENTED.
See "ord" in perlfunc. Note that Perl returns a Unicode code point, not an ASCII value.
PHP::Strings::parse_str WILL NOT BE IMPLEMENTED.
See instead the CGI and URI modules, which handle that sort of thing.
PHP::Strings::print WILL NOT BE IMPLEMENTED.
See "print" in perlfunc.
PHP::Strings::printf WILL NOT BE IMPLEMENTED.
See "printf" in perlfunc.
http://www.php.net/quoted_printable_decode
PHP::Strings::quoted_printable_decode WILL NOT BE IMPLEMENTED.
MIME::QuotedPrint provides functions for encoding and decoding quoted-printable strings.
PHP::Strings::quotemeta WILL NOT BE IMPLEMENTED.
PHP::Strings::rtrim WILL NOT BE IMPLEMENTED.
Another trivial regular expression:
$string =~ s/\s+$//;
See the notes on "ltrim".
PHP::Strings::setlocale WILL NOT BE IMPLEMENTED.
setlocale is provided by the POSIX module.
PHP::Strings::sha1 WILL NOT BE IMPLEMENTED.
See "md5", mentally substituting Digest::SHA1 for Digest::MD5, although the user notes are not as funny.
PHP::Strings::sha1_file WILL NOT BE IMPLEMENTED.
See "md5_file"
http://www.php.net/similar_text
TBD
PHP::Strings::soundex WILL NOT BE IMPLEMENTED.
See Text::Soundex, which was a core module up to Perl 5.18 and is available from CPAN for later versions.
PHP::Strings::sprintf WILL NOT BE IMPLEMENTED.
PHP::Strings::sscanf WILL NOT BE IMPLEMENTED.
This is a godawful function. You should be using regular expressions instead. See perlretut and perlre.
http://www.php.net/str_ireplace
PHP::Strings::str_ireplace WILL NOT BE IMPLEMENTED.
Use the s/// operator instead. See perlop and perlre for details.
TBD
PHP::Strings::str_repeat WILL NOT BE IMPLEMENTED.
Instead, use the x operator. See perlop for details.
my $by_ten = "-=" x 10;
http://www.php.net/str_replace
PHP::Strings::str_replace WILL NOT BE IMPLEMENTED.
See the s/// operator. perlop and perlre have details.
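A brief sketch of literal replacement with s///, using made-up strings. The \Q...\E keeps the search string from being read as a pattern, and adding /i gives the case-insensitive variant that str_ireplace provides.

```perl
use strict;
use warnings;

my $search  = "a.b";
my $replace = "X";

# Literal, global replacement: only the two real "a.b" are changed.
( my $literal = "a.b aXb a.b" ) =~ s/\Q$search\E/$replace/g;

# Without \Q the dot is a wildcard, so "aXb" is replaced too.
( my $pattern = "a.b aXb a.b" ) =~ s/$search/$replace/g;

# Case-insensitive replacement.
( my $nocase = "PHP php" ) =~ s/\Qphp\E/Perl/gi;

print "$literal\n";    # prints X aXb X
print "$pattern\n";    # prints X X X
print "$nocase\n";     # prints Perl Perl
```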
PHP::Strings::str_rot13 WILL NOT BE IMPLEMENTED.
This is rather trivially implemented as:
$message =~ tr/A-Za-z/N-ZA-Mn-za-m/;
(As per "Programming Perl", 3rd edition, section 5.2.4.)
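A runnable version of the same transliteration, working on a copy so the original string is left alone:

```perl
use strict;
use warnings;

my $message = "Hello, World!";

# Copy first, then transliterate the copy in place.
( my $rot13 = $message ) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

# Applying rot13 twice gives back the original.
( my $again = $rot13 ) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

print "$rot13\n";    # prints Uryyb, Jbeyq!
```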
http://www.php.net/str_shuffle
Implemented, against my better judgement. It's trivial, like so many of the others.
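To show just how trivial, here is one possible version using the core List::Util module. The name str_shuffle_sketch is hypothetical, and this is an illustration rather than the code the module actually uses.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper: split into characters, shuffle, rejoin.
sub str_shuffle_sketch {
    my ($string) = @_;
    return join '', shuffle split //, $string;
}

print str_shuffle_sketch("abcdef"), "\n";    # some permutation of abcdef
```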
PHP::Strings::str_split WILL NOT BE IMPLEMENTED.
See "split" in perlfunc for details, although a global match is simpler here:
my @bits = $string =~ /(.{1,$len})/gs;
http://www.php.net/str_word_count
my $wordcount = str_word_count( $string );
my @words = str_word_count( $string, 1 );
my %positions = str_word_count( $string, 2 );
With a single argument, returns the number of words in that string. Equivalent to:
my $wordcount = () = $string =~ m/(\S+)/g;
With 2 arguments, where the second is the value 0, returns the same as with no second argument.
With 2 arguments, where the second is the value 1, returns each of those words. Equivalent to:
my @words = $string =~ m/(\S+)/g;
With 2 arguments, where the second is the value 2, returns a hash where the values are the words, and the keys are their position in the string (offsets are 0 based).
If words are duplicated, then they are duplicated. The definition of a word is anything that isn't a space. When I say equivalent above, I mean that's the exact code this function uses.
This function should really be three different functions, but as PHP already has over 3000, I can only assume they wanted to restrain themselves. Implementation wise, it is three different functions. I just keep them in an array and dispatch appropriately.
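The position-keyed form (second argument 2) has no one-line equivalent given above, so here is a sketch of how it could be done with the same \S+ definition of a word. The name word_positions_sketch is hypothetical and this is not necessarily the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: map 0-based offset => word.
sub word_positions_sketch {
    my ($string) = @_;
    my %positions;
    while ( $string =~ m/(\S+)/g ) {
        $positions{ $-[1] } = $1;    # $-[1] is where group 1 started
    }
    return %positions;
}

my %positions = word_positions_sketch("the quick  fox");
print "$_ => $positions{$_}\n" for sort { $a <=> $b } keys %positions;
```

For the sample string this gives offsets 0, 4 and 11.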
PHP::Strings::strcasecmp WILL NOT BE IMPLEMENTED.
Equivalent to:
lc($a) cmp lc($b)
PHP::Strings::strchr WILL NOT BE IMPLEMENTED.
See "strstr"
PHP::Strings::strcmp WILL NOT BE IMPLEMENTED.
Equivalent to:
$a cmp $b
PHP::Strings::strcoll WILL NOT BE IMPLEMENTED.
Equivalent to:
use locale;
$a cmp $b
PHP::Strings::strcspn WILL NOT BE IMPLEMENTED.
Trivially equivalent to:
my $cspn = $string =~ m/[chars]/ ? $-[0] : length $string;
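Wrapped up as a function, the idea looks like the sketch below. It assumes the usual strcspn meaning (the length of the leading run that contains none of the listed characters); the name strcspn_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module.
sub strcspn_sketch {
    my ( $string, $chars ) = @_;
    return length $string if $chars eq '';    # nothing to stop on

    # \Q keeps characters such as ] or ^ literal inside the class.
    # $-[0] is the offset at which the match started.
    return $string =~ /[\Q$chars\E]/ ? $-[0] : length $string;
}

print strcspn_sketch( "hello world", "ow" ), "\n";    # prints 4
```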
my $clean = strip_tags( $html, '<a><b><i><u>' );
You really want HTML::Scrubber.
This function tries to return a string with all HTML tags stripped from a given string. It errs on the side of caution in case of incomplete or bogus tags.
You can use the optional second parameter to specify tags which should not be stripped.
For more control, use HTML::Scrubber.
http://www.php.net/stripcslashes
my $unslashed = stripcslashes( '\a\b\f\n\r\xae' );
Returns a string with backslashes stripped off. Recognizes C-like \n, \r ..., octal and hexadecimal representation.
PHP::Strings::stripos WILL NOT BE IMPLEMENTED.
Trivially implemented as:
my $pos = index( lc $haystack, lc $needle );
my $second = index( lc $haystack, lc $needle, $pos + 1 );
Note that unlike stripos, index returns -1 if $needle is not found. This makes testing much simpler.
If you want the additional behaviour of non-strings being converted to integers and from there to characters of that value, then you're silly. If you want to find a character of particular value, explicitly use the chr function:
my $charpos = index( lc $haystack, lc chr $char );
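The -1 return value makes it easy to loop over every occurrence. A minimal sketch, with a hypothetical function name and made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: all case-insensitive match offsets.
sub all_positions_ci {
    my ( $haystack, $needle ) = @_;
    return if $needle eq '';    # an empty needle would never stop matching

    my ( $h, $n ) = ( lc $haystack, lc $needle );
    my @found;
    my $pos = index( $h, $n );
    while ( $pos != -1 ) {
        push @found, $pos;
        $pos = index( $h, $n, $pos + 1 );    # resume just past this hit
    }
    return @found;
}

print join( ",", all_positions_ci( "Banana BANDANA", "an" ) ), "\n";
# prints 1,3,8,11
```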
http://www.php.net/stripslashes
PHP::Strings::stripslashes WILL NOT BE IMPLEMENTED.
If you can think of a good reason for this function, you have more imagination than I do.
PHP::Strings::stristr WILL NOT BE IMPLEMENTED.
Use substr() and index() instead.
my $strstr = substr( $haystack, index( lc $haystack, lc $needle ) );
Or a regex:
my ( $strstr ) = $haystack =~ /(\Q$needle\E.*$)/si;
PHP::Strings::strlen WILL NOT BE IMPLEMENTED.
See "length" in perlfunc.
http://www.php.net/strnatcasecmp
PHP::Strings::strnatcasecmp WILL NOT BE IMPLEMENTED.
See Sort::Naturally.
PHP::Strings::strnatcmp WILL NOT BE IMPLEMENTED.
See Sort::Naturally.
http://www.php.net/strncasecmp
PHP::Strings::strncasecmp WILL NOT BE IMPLEMENTED.
Unnecessary. Perl is smart enough. Use substr.
PHP::Strings::strncmp WILL NOT BE IMPLEMENTED.
Unnecessary. Perl is smart enough. Use substr.
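To spell out what "use substr" means for both strncmp and strncasecmp, here is a small sketch. The function names are hypothetical wrappers, added only so the comparison can be shown in one place.

```perl
use strict;
use warnings;

# Compare only the first $n characters of each string.
sub strncmp_sketch {
    my ( $x, $y, $n ) = @_;
    return substr( $x, 0, $n ) cmp substr( $y, 0, $n );
}

# The same, ignoring case.
sub strncasecmp_sketch {
    my ( $x, $y, $n ) = @_;
    return lc( substr( $x, 0, $n ) ) cmp lc( substr( $y, 0, $n ) );
}

print strncmp_sketch( "foobar", "foobaz", 3 ), "\n";        # prints 0
print strncmp_sketch( "foobar", "foobaz", 6 ), "\n";        # prints -1
print strncasecmp_sketch( "FOObar", "fooBAZ", 3 ), "\n";    # prints 0
```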
PHP::Strings::strpos WILL NOT BE IMPLEMENTED.
This function is Perl's index function, however index has a sensible return value.
PHP::Strings::strrchr WILL NOT BE IMPLEMENTED.
See "rindex" in perlfunc. Note that all characters in the $needle are used: if you just want to find the first character, then extract it.
PHP::Strings::strrev WILL NOT BE IMPLEMENTED.
See "reverse" in perlfunc. Note the note about scalar context.
my $derf = reverse "fred";
print scalar reverse "fred";
PHP::Strings::strripos WILL NOT BE IMPLEMENTED.
This is just getting silly.
PHP::Strings::strrpos WILL NOT BE IMPLEMENTED.
See rindex.
PHP::Strings::strstr WILL NOT BE IMPLEMENTED.
Use substr() and index() instead.
my $strstr = substr( $haystack, index( $haystack, $needle ) );
Or a regex:
my ( $strstr ) = $haystack =~ /(\Q$needle\E.*$)/s;
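One caveat with the substr() and index() version: when $needle is absent, index returns -1 and substr then returns the last character of the string. A guarded sketch, with a hypothetical function name, that returns undef in that case:

```perl
use strict;
use warnings;

# Hypothetical helper: tail of $haystack from the first $needle,
# or undef when $needle does not occur.
sub strstr_sketch {
    my ( $haystack, $needle ) = @_;
    my $pos = index( $haystack, $needle );
    return $pos == -1 ? undef : substr( $haystack, $pos );
}

print strstr_sketch( 'user@example.com', '@' ), "\n";    # prints @example.com
```

The regex form does not have this problem, since a failed match simply leaves the variable undefined.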
PHP::Strings::strtolower WILL NOT BE IMPLEMENTED.
See "lc" in perlfunc.
PHP::Strings::strtoupper WILL NOT BE IMPLEMENTED.
See "uc" in perlfunc.
This function, like many in PHP, is really two functions.
The first is the same as the tr operator. And you really should use tr instead of this function.
The second is more complicated.
PHP::Strings::substr WILL NOT BE IMPLEMENTED.
See "substr" in perlfunc.
http://www.php.net/substr_compare
PHP::Strings::substr_compare WILL NOT BE IMPLEMENTED.
Use substr and the cmp operator.
http://www.php.net/substr_count
PHP::Strings::substr_count WILL NOT BE IMPLEMENTED.
This is even in the FAQ.
http://faq.perl.org/perlfaq4.html#How_can_I_count_the_
my $count = () = $string =~ /regex/g;
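When the thing being counted is a literal substring rather than a pattern, quote it with \Q. A short example with made-up data:

```perl
use strict;
use warnings;

my $string = "a.b.c.d";
my $needle = ".";

# Literal count: the empty-list assignment forces list context,
# and assigning that to a scalar yields the number of matches.
my $count = () = $string =~ /\Q$needle\E/g;

# Without \Q the dot matches every character.
my $oops = () = $string =~ /$needle/g;

print "$count $oops\n";    # prints 3 7
```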
http://www.php.net/substr_replace
PHP::Strings::substr_replace WILL NOT BE IMPLEMENTED.
See "substr" in perlfunc.
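Perl's substr covers replacement in two ways: a fourth argument, or assignment to the substr itself. A brief example:

```perl
use strict;
use warnings;

# Four-argument form: replace 5 characters starting at offset 7.
my $s = "Hello, world";
substr( $s, 7, 5, "Perl" );

# Lvalue form: assign to the substring directly.
my $t = "Hello, world";
substr( $t, 0, 5 ) = "Howdy";

print "$s\n";    # prints Hello, Perl
print "$t\n";    # prints Howdy, world
```

Both forms modify the string in place, whereas PHP's substr_replace returns a new string, so copy first if the original is still needed.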
PHP::Strings::trim WILL NOT BE IMPLEMENTED.
Also in the FAQ.
http://faq.perl.org/perlfaq4.html#How_do_I_strip_blank
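In short, the FAQ approach is two substitutions, one per end. A sketch with a hypothetical wrapper name:

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim_sketch {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

print "[", trim_sketch("  \t padded text \n"), "]\n";    # prints [padded text]
```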
PHP::Strings::ucfirst WILL NOT BE IMPLEMENTED.
PHP::Strings::ucwords WILL NOT BE IMPLEMENTED.
Another Perl FAQ.
http://faq.perl.org/perlfaq4.html#How_do_I_capitalize_
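A minimal example of both: ucfirst is a Perl builtin, and a ucwords-like effect comes from \u in a substitution. This sketch treats a word as any run of non-whitespace, which is an assumption chosen so that apostrophes do not start a new word.

```perl
use strict;
use warnings;

my $first = ucfirst "hello world";

# \u uppercases the next character of the replacement.
( my $words = "don't stop now" ) =~ s/(\S+)/\u$1/g;

print "$first\n";    # prints Hello world
print "$words\n";    # prints Don't Stop Now
```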
PHP::Strings::vprintf WILL NOT BE IMPLEMENTED.
Unlike PHP, Perl isn't stupid. See printf.
PHP::Strings::vsprintf WILL NOT BE IMPLEMENTED.
Unlike PHP, Perl isn't stupid. See sprintf.
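The point is that Perl lists flatten, so an array of arguments can be handed straight to sprintf or printf. A short example with made-up values:

```perl
use strict;
use warnings;

my $format = "%s-%02d";
my @args   = ( "id", 7 );

# The array flattens into the argument list; no v-variant is needed.
my $formatted = sprintf $format, @args;

print "$formatted\n";    # prints id-07
```

Keep the format as a separate first argument: sprintf takes the format in scalar context, so passing a single array that also contains the format does not work.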
PHP::Strings::wordwrap WILL NOT BE IMPLEMENTED.
See Text::Wrap, a core module.
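A small usage sketch for Text::Wrap follows. The width and sample text are arbitrary; Text::Wrap keeps each line shorter than $Text::Wrap::columns.

```perl
use strict;
use warnings;
use Text::Wrap qw( wrap );

my $text = "The quick brown fox jumps over the lazy dog and keeps running.";

# Lines come out at most $columns - 1 characters long.
local $Text::Wrap::columns = 21;

# The two empty strings are the first-line and subsequent-line indents.
my $wrapped = wrap( '', '', $text );

print "$wrapped\n";
```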

Just in case you missed which functions were actually implemented in that huge mass of unimplemented functions, here's the condensed list of implemented functions:

All functions that I think are worthless are still exportable, with the exception of any that would clash with a Perl builtin function.
If you try to actually use said function, a big fat error will result.

Yes, this module is mostly a joke. I wrote a lot of it after being asked for the hundredth time: What's the equivalent to PHP's X in Perl?
That said, although it's a joke, I'm happy to receive amendments, additions and such. It's incomplete at present, and I would like to see it complete at some point.
In particular, the test suite needs a lot of work. (If you feel like it. Hint Hint.)
If you want to implement some of the functions that I've said will not be implemented, then I'll be happy to include them. After all, what I think is worthless is my opinion.

Log bugs via the CPAN RT system, either on the web or by email:
http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PHP-Strings ( shorter URL: http://xrl.us/4at )
bug-php-strings@rt.cpan.org
This makes it much easier for me to track things and thus means your problem is less likely to be neglected.

Andy Lester (PETDANCE) for taking care of Iain's modules.
Juerd Waalboer (JUERD) for suggesting a link, and the assorted regex functions.
Matthew Persico (PERSICOM) for the idea of having the functions give their documentation as their error.

PHP::Strings modifications from version 0.27 are copyright © Petras Kudaras. All rights reserved.
PHP::Strings is copyright © Iain Truskett, 2003. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.000 or, at your option, any later version of Perl 5 you may have available.
The full text of the licences can be found in the Artistic and COPYING files included with this module, or in perlartistic and perlgpl as supplied with Perl 5.8.1 and later.

Iain Truskett <spoon@cpan.org>
Petras Kudaras <kudarasp@cpan.org>
