Lingua::ZH::Summarize - Summarizing bodies of Chinese text
    use Lingua::ZH::Summarize;

    my $text = do { local $/; <STDIN> };         # text to summarize

    print summarize( $text );                    # Easy, no? :-)
    print summarize( $text, maxlength => 500 );  # 500-byte summary
    print summarize( $text, wrap => 75 );        # Wrap output to 75 col.
This is a simple module which makes an unscientific effort at summarizing Chinese text. It recognizes simple patterns which look like statements, abridges them, and concatenates them into something vaguely resembling a summary. It needs more work on large bodies of text, but it seems to have a decent effect on small inputs at the moment.
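The recognize, abridge, and concatenate steps described above can be illustrated with a toy pipeline. This is a minimal sketch under loudly stated assumptions, not the module's actual code: the function name naive_summarize is hypothetical, "statement-like" is assumed to mean "ends with the full-width full stop 。", and "abridging" is assumed to mean dropping asides in full-width parentheses （）.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical sketch, NOT Lingua::ZH::Summarize's implementation.
sub naive_summarize {
    my ($text) = @_;

    # Split after each sentence-ending mark, keeping the mark with its sentence.
    my @sentences = grep { length } split /(?<=[。！？])/, $text;

    # Assumed heuristic: a "statement" is a sentence ending in 。
    # (questions and exclamations are discarded).
    my @statements = grep { /。\z/ } @sentences;

    # Assumed abridging step: remove parenthetical asides and trim whitespace.
    for (@statements) {
        s/（[^）]*）//g;
        s/^\s+|\s+$//g;
    }

    # Concatenate what is left into something resembling a summary.
    return join '', @statements;
}

binmode STDOUT, ':encoding(UTF-8)';
print naive_summarize("今天天氣很好。你去哪裡？我們去公園。"), "\n";
```

Here the question 你去哪裡？ is dropped and the two statements are joined, printing 今天天氣很好。我們去公園。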
Lingua::ZH::Summarize exports one function, summarize(), which takes the text to summarize as its first argument and any number of optional directives in name => value form. The options it takes are:
    maxlength
        Specifies the maximum length, in bytes, of the generated summary.
    wrap
        Prettyprints the summary output by wrapping it to the number of columns you specify. This requires the Lingua::ZH::Wrap module.
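Because maxlength counts bytes rather than characters, a Chinese summary holds fewer characters than the number suggests, and a naive byte cut can split a character in half. The sketch below shows the name => value calling convention together with a character-safe byte limit. Its names (demo_summarize, truncate_to_bytes) are hypothetical, and it assumes UTF-8, where a typical CJK character takes 3 bytes; the encoding the module itself counts in is not stated here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: trim a character string so that its UTF-8 encoding
# is at most $max bytes, never cutting a multi-byte character in half.
sub truncate_to_bytes {
    my ($str, $max) = @_;
    my $out   = '';
    my $bytes = 0;
    for my $char (split //, $str) {
        my $len = length encode('UTF-8', $char);
        last if $bytes + $len > $max;
        $out   .= $char;
        $bytes += $len;
    }
    return $out;
}

# Hypothetical stand-in for summarize(): the text comes first, then any
# number of name => value directives, collected into a hash.
sub demo_summarize {
    my ($text, %opts) = @_;
    my $summary = $text;    # the actual summarizing step is omitted here
    $summary = truncate_to_bytes($summary, $opts{maxlength})
        if defined $opts{maxlength};
    return $summary;
}

binmode STDOUT, ':encoding(UTF-8)';
print demo_summarize("中文摘要測試", maxlength => 10), "\n";
```

With a 10-byte budget only three 3-byte characters fit, so this prints 中文摘. Collecting the trailing arguments into %opts is what lets callers pass options in any order, or none at all.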
Needless to say, this is a very simple scheme and not a universally effective one, but it's good enough for a first draft, and I'll bang on it more later. As I said, it's not a scientific approach to the problem, but it's better than nothing.
Algorithm adapted from the Lingua::EN::Summarize module by Dennis Taylor, <firstname.lastname@example.org>.
Autrijus Tang <email@example.com>
Copyright 2003 by Autrijus Tang <firstname.lastname@example.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.