
combineCtrl - controls a Combine crawling job

combineCtrl <action> --jobname <name>
where action can be one of start, kill, load, recyclelinks, reharvest, stat, howmany, records, hosts, initMemoryTables, open, stop, pause, continue

jobname - used to find the appropriate configuration (mandatory)
start - takes an optional switch --harvesters n, where n is the number of crawler processes to start
kill - kills all active crawlers (and their associated combineRun monitors) for jobname
load - reads a list of URLs from STDIN (one per line) and schedules them for crawling
recyclelinks - schedules all links newly found in crawled pages (since the last invocation of recyclelinks) for crawling
reharvest - schedules all pages in the database for crawling again (in order to check whether they have changed)
open - opens the database for URL scheduling (for example after a stop)
stop - stops URL scheduling
pause - pauses URL scheduling
continue - continues URL scheduling after a pause
stat - prints out rudimentary status of the ready queue (i.e. eligible now) of URLs to be crawled
howmany - prints out rudimentary status of all URLs to be crawled
records - prints out the number of records in the SQL database
hosts - prints out rudimentary status of all hosts that have URLs to be crawled
initMemoryTables - initializes the administrative MySQL tables that are kept in memory
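
As an illustration of how the pause and continue actions might be paired, the following sketch wraps an arbitrary command so that URL scheduling is paused before it runs and resumed afterwards. The function name with_paused_scheduling and the COMBINECTRL variable are hypothetical helpers, not part of Combine; only the pause and continue actions and the --jobname switch come from the list above.

```shell
#!/bin/sh
# Hypothetical: command used to reach combineCtrl, overridable for dry runs.
COMBINECTRL="${COMBINECTRL:-combineCtrl}"

# with_paused_scheduling JOB COMMAND [ARGS...]
# Pauses URL scheduling for JOB, runs COMMAND, then resumes scheduling.
# Returns the exit status of COMMAND unless pause or continue itself fails.
with_paused_scheduling() {
    job="$1"
    shift
    $COMBINECTRL pause --jobname "$job" || return 1
    "$@"
    status=$?
    # Resume scheduling even if the wrapped command failed.
    $COMBINECTRL continue --jobname "$job" || return 1
    return "$status"
}
```

It could be called as, for example, with_paused_scheduling aatest sleep 60. Setting COMBINECTRL=echo prints the combineCtrl arguments instead of executing them, which is handy for checking the sequence first.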

Implements various control functions for administering a crawling job, such as starting and stopping crawlers, injecting URLs into the crawl queue, scheduling newly found links for crawling, and controlling scheduling.
This is the preferred way of controlling a crawl job.

echo 'http://www.yourdomain.com/' | combineCtrl load --jobname aatest
    Seed the crawling job aatest with a URL.
combineCtrl start --jobname aatest --harvesters 3
    Start 3 crawling processes for job aatest.
combineCtrl recyclelinks --jobname aatest
    Schedule all new links for crawling.
combineCtrl stat --jobname aatest
    See how many URLs are eligible for crawling right now.
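
The four commands above can be chained in a small script. This is only a sketch: the function name crawl_cycle, the COMBINECTRL variable, and the seed file are hypothetical, and it assumes that start returns once the crawler processes have been launched, which this page does not state.

```shell
#!/bin/sh
# Hypothetical: command used to reach combineCtrl, overridable for dry runs.
COMBINECTRL="${COMBINECTRL:-combineCtrl}"

# crawl_cycle JOB SEEDFILE HARVESTERS
# Seeds JOB with the URLs in SEEDFILE (one per line), starts HARVESTERS
# crawler processes, schedules newly found links, and prints queue status.
crawl_cycle() {
    job="$1"
    seeds="$2"
    harvesters="$3"
    # load reads the URL list from STDIN.
    $COMBINECTRL load --jobname "$job" < "$seeds" || return 1
    $COMBINECTRL start --jobname "$job" --harvesters "$harvesters" || return 1
    $COMBINECTRL recyclelinks --jobname "$job" || return 1
    $COMBINECTRL stat --jobname "$job"
}
```

A call such as crawl_cycle aatest seeds.txt 3 would repeat the examples above for a whole file of seed URLs. With COMBINECTRL=echo the function only prints the arguments it would pass, so the sequence can be inspected without touching a real job.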

combine
Combine configuration documentation in /usr/share/doc/combine/.

Anders Ardö, <anders.ardo@it.lth.se>

Copyright (C) 2005 Anders Ardö
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
See the file LICENCE included in the distribution at http://combine.it.lth.se/