NAME

WordPress::Grep - Search WordPress titles and content

SYNOPSIS

        use WordPress::Grep;

        my $wp_grep = WordPress::Grep->connect(
                # required
                user     => $user,
                database => $db,

                # optional
                password => $pass,

                # has defaults
                host     => 'localhost',
                port     => '3306',
                );

        my $posts = $wp_grep->search(
                sql_like        => '....',
                regex           => qr/ ... /,
                code            => sub { ... },
                include_columns => [ ],  # not implemented
                exclude_columns => [ ],  # not implemented
                );

        foreach my $post_id ( keys %$posts ) {
                printf "%4d %s\n",
                        $posts->{$post_id}{ID}, $posts->{$post_id}{post_title};
                }

DESCRIPTION

[This is alpha software.]

This module allows you to search through the posts in a WordPress database by directly examining the wp_posts table. Forget about these limited APIs. Use the power of Perl directly on the content.

I've long wanted this tool to examine consistency in my posts. I want to check my use of CSS and HTML across all posts to check what I may need to change when I change how I do things. This sort of thing is hard to do with existing tools and the WordPress API (although there is a WordPress::API).

I want to go through all posts with all the power of Perl, so my grep:

1 Takes an optional LIKE argument that it applies to post_title and post_content.
2 Takes an optional regex argument that it uses to filter the returned rows, keeping only the rows whose titles or content satisfy the regex.
3 Takes an optional code argument that it uses to filter the returned rows, keeping only the rows for which that subroutine returns true.
4 Returns the matching rows in the same form that DBI's fetchall_hashref returns. The top-level key is the value in the ID column.
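
To make the return shape concrete, here is a small sketch that uses hand-written sample data instead of a live database. The two rows and their values are invented for illustration, and titles_by_id is a hypothetical helper; only the ID, post_title, and post_content column names come from the description above.

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the result of DBI's
# fetchall_hashref keyed on ID: a hash of row hashes.
my $posts = {
    12 => { ID => 12, post_title => 'Second post', post_content => '<p>More</p>' },
    3  => { ID => 3,  post_title => 'First post',  post_content => '<p>Hello</p>' },
};

# Hash keys come back in no particular order, so sort numerically by ID.
sub titles_by_id {
    my ($rows) = @_;
    return map { sprintf '%4d %s', $rows->{$_}{ID}, $rows->{$_}{post_title} }
           sort { $a <=> $b } keys %$rows;
}

print "$_\n" for titles_by_id($posts);
```

Because the top-level keys are the ID values, looking up a single post is a plain hash access such as $posts->{3}{post_title}.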

Right now, there are some limitations based on my particular use:

  • I only select the post types.

  • I assume UTF-8 everywhere, including in the database.

  • Applying a regex or code filter always returns (at least) the post_title and post_content.

  • The LIKE and regex filters only work on post_title and post_content. The code filter gets the entire row as a hash reference and can do what it likes.
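
Because the code filter receives the whole row, it can test columns that the LIKE and regex filters never see. The following is a sketch of such a callback, tried against hand-made rows. post_status and post_date are standard wp_posts columns, but the sample values and the name $published_in_2013 are made up for this example.

```perl
use strict;
use warnings;

# A callback suitable for the code argument: it gets one row as a
# hash reference and returns true to keep that row.
my $published_in_2013 = sub {
    my ($row) = @_;
    return 0 unless ( $row->{post_status} // '' ) eq 'publish';
    return ( $row->{post_date} // '' ) =~ /\A2013-/ ? 1 : 0;
};

# Hypothetical rows standing in for what the database would return.
my @rows = (
    { ID => 1, post_status => 'publish', post_date => '2013-04-01 09:00:00' },
    { ID => 2, post_status => 'draft',   post_date => '2013-05-01 09:00:00' },
    { ID => 3, post_status => 'publish', post_date => '2012-12-31 23:59:59' },
);

my @kept = map { $_->{ID} } grep { $published_in_2013->($_) } @rows;
print "kept: @kept\n";
```

Passing the same subroutine as code => $published_in_2013 should apply this test to each candidate row.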

I've set up a slave of the MySQL server that runs my WordPress installations. In that slave, I set up a read-only user for this tool.

Methods

connect

Connect to the WordPress database. You must specify these parameters, which should match the ones in your wp-config.php (although if you need this tool frequently, consider setting up a read-only user for it, or running it against a slave).

        user
        database

If you need a password, you'll have to provide that:

        password

These parameters have defaults:

        host    defaults to localhost
        port    defaults to 3306
        user    defaults to root
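
For illustration, this is roughly how the database, host, and port parameters could map onto a DBD::mysql data source name. It is a sketch under assumptions, not the module's actual code: build_dsn is a hypothetical helper, replica.example.com is a placeholder host, and the module may assemble its connection differently.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in the documented defaults for host and
# port, then build a DBD::mysql-style DSN string.
sub build_dsn {
    my (%args) = @_;
    die "database is required\n" unless defined $args{database};

    # Caller-supplied values override the defaults listed first.
    my %with_defaults = ( host => 'localhost', port => 3306, %args );

    return sprintf 'dbi:mysql:database=%s;host=%s;port=%s',
        @with_defaults{qw(database host port)};
}

print build_dsn( database => 'wordpress' ), "\n";
print build_dsn( database => 'wordpress', host => 'replica.example.com' ), "\n";
```

A string like this is what DBI->connect( $dsn, $user, $password ) expects as its first argument.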

db

Return the db connection. This is a vanilla DBI connection to MySQL. If you subclass this, you can do further setup by overriding _db_init.

search

The possible arguments:

        sql_like - a string
        regex    - a regular expression reference (qr//)
        code     - a subroutine reference

        categories - an array reference of category names
        tags       - an array reference of tag names

This method first builds a query to search through the wp_posts table.

If you specify sql_like, it limits the returned rows to those whose post_title or post_content match that argument.

If you specify categories or tags, another query annotates the returned rows with term information. If the categories or tags have values, the returned rows are reduced to those that have those categories or tags. If you don't want to reduce the rows just yet, you can use code to examine the row yourself.

If you specify regex, it filters the returned rows to those whose post_title or post_content satisfy the regular expression.

If you specify code, it filters the returned rows to those for which the subroutine reference returns true. The coderef gets a hash reference of the current row. It's up to you to decide what to do with it.

These filters are consecutive. You can specify any combination of them but they always happen in that order. The regex only gets the rows that satisfied the sql_like, and the code only gets the rows that satisfied sql_like and regex.
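
The ordering can be sketched in plain Perl on in-memory rows. This does not reproduce the module's internals: filter_rows is a hypothetical function, the sample rows are invented, and the sketch starts from rows assumed to have already passed the sql_like stage in the database.

```perl
use strict;
use warnings;

# Hypothetical sketch: apply the regex filter, then the code filter,
# to rows that have already passed the SQL LIKE stage.
sub filter_rows {
    my ( $rows, %args ) = @_;
    my %kept = %$rows;    # shallow copy; the caller's hash is untouched

    if ( defined $args{regex} ) {
        for my $id ( keys %kept ) {
            my $row = $kept{$id};
            delete $kept{$id}
                unless $row->{post_title}   =~ $args{regex}
                    or $row->{post_content} =~ $args{regex};
        }
    }

    if ( defined $args{code} ) {
        for my $id ( keys %kept ) {
            delete $kept{$id} unless $args{code}->( $kept{$id} );
        }
    }

    return \%kept;
}

my $rows = {
    1 => { ID => 1, post_title => 'CSS tips',   post_content => '<div class="note">x</div>' },
    2 => { ID => 2, post_title => 'Perl notes', post_content => '<p>plain</p>' },
    3 => { ID => 3, post_title => 'Old CSS',    post_content => '<div class="warn">y</div>' },
};

my @seen_by_code;    # records which rows reach the code filter
my $result = filter_rows(
    $rows,
    regex => qr/class="/,
    code  => sub { push @seen_by_code, $_[0]{ID}; $_[0]{ID} < 3 },
);

print join( ',', sort { $a <=> $b } keys %$result ), "\n";
```

In this sample, row 2 fails the regex and so never reaches the subroutine, which mirrors the rule that each later filter sees only the survivors of the earlier ones.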

TO DO

SEE ALSO

WordPress::API

SOURCE AVAILABILITY

The source is on GitHub:

        http://github.com/briandfoy/wordpress-grep/

AUTHOR

brian d foy, <bdfoy@gmail.com>

COPYRIGHT AND LICENSE

Copyright (c) 2013, brian d foy, All Rights Reserved.

You may redistribute this under the same terms as Perl itself.