Module Version: 1.0

NAME

Matts::Message::Parser - a MIME message parser for email and NNTP

SYNOPSIS

  use Matts::Message::Parser;
  open(my $fh, '<', 'foo.eml') or die "Cannot open foo.eml: $!";
  my $msg = Matts::Message::Parser->parse($fh);

DESCRIPTION

This is an email parser I originally wrote when I ran my own business. It tries quite hard to decode the various parts of an email correctly, all the way down to Unicode, so that all strings can be treated the same in Perl.

DO NOT USE THIS MODULE

I urge you, please don't. It's not a very good API. I'm only uploading it to CPAN because it works better for my purposes than most of the Email::* and Mail::* classes I can find, it's fast, and it uses hardly any memory when parsing very large emails, which is a huge bonus for me. But I have no intention of documenting this module any more than I have to.

AUTHOR

Matt Sergeant, <matt@sergeant.org>

LICENSE

This is free software. You may use it and redistribute it under the same terms as Perl itself.

HACKING NOTES

This is how mail messages can come in:

1. Plain text

Plain text messages come in with a content-type of text/plain. They may contain attachments as uuencoded strings.

2. HTML text

Straight HTML messages come in with a content-type of text/html. They may not contain attachments as far as I'm aware.

3. Mixed text, HTML and maybe other.

These messages come in as MIME messages with a content-type of multipart/alternative ("alternative" means you get to pick which view of the message to display, as all parts must contain the same basic information).

There may not be attachments this way as far as I'm aware.

4. Plain text with attachments

Here the content-type is multipart/mixed. The first part of the multipart message is the plain text message (after the preamble, that is), with a content-type of text/plain. The remaining parts are attachments.

5. HTML text with attachments

Again, the content-type is multipart/mixed. The first part of the multipart message is the HTML message, with a content-type of text/html. The remaining parts are attachments.

6. Mixed text, HTML with attachments

Here the main part of the message has a content-type of multipart/mixed. The first part has a content-type of multipart/alternative, and is identical to item 3 above. The remaining parts are the attachments.

7. Report.

This is a delivery status report. It comes with the main part of the message having a content-type of multipart/report, the first one or two parts of which may be textual content of some sort, and the last seems to be of type message/rfc822.
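
The seven cases above can be told apart by looking at the top-level content-type and, for multipart/mixed, at the type of the first part. The following is only a sketch of that decision, not how this module does it: the hash layout (a 'type' key plus an optional 'parts' array ref) and the classify() function are hypothetical and exist purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical structure: each part is a hash with a 'type' (the
# content-type) and, for multipart types, an array ref of child 'parts'.
sub classify {
    my ($msg) = @_;
    my $type  = lc $msg->{type};
    my @parts = @{ $msg->{parts} || [] };

    return 'plain text'          if $type eq 'text/plain';
    return 'HTML text'           if $type eq 'text/html';
    return 'mixed text and HTML' if $type eq 'multipart/alternative';
    return 'report'              if $type eq 'multipart/report';

    # For multipart/mixed the first part is the message body and
    # everything after it is treated as an attachment.
    if ($type eq 'multipart/mixed' && @parts) {
        my $first = lc $parts[0]{type};
        return 'plain text with attachments' if $first eq 'text/plain';
        return 'HTML text with attachments'  if $first eq 'text/html';
        return 'mixed text and HTML with attachments'
            if $first eq 'multipart/alternative';
    }
    return 'unknown';
}

# Case 6: a multipart/alternative body followed by one attachment.
my $msg = {
    type  => 'multipart/mixed',
    parts => [
        { type  => 'multipart/alternative',
          parts => [ { type => 'text/plain' }, { type => 'text/html' } ] },
        { type => 'application/pdf' },
    ],
};
print classify($msg), "\n";   # prints "mixed text and HTML with attachments"
```

Anything that does not match one of the seven shapes falls through to 'unknown' rather than being guessed at.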

Overall this is a fairly naive way to view email messages, as the attachments can be email messages themselves, and thus it gets very recursive. But this should be enough for us to deal with right now.
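
To illustrate the recursion, here is a minimal sketch that walks a nested message and lists the content-types of its leaf parts, descending into both multipart/* containers and message/rfc822 attachments. The data layout ('type', 'parts', and a 'message' key for the embedded message) and the leaf_types() function are assumptions made for this example and are not part of this module's API. The depth limit of 50 is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical structure, as plain hashes: 'type' is the content-type,
# 'parts' holds the children of a multipart/* part, and 'message' holds
# the embedded message of a message/rfc822 part.
sub leaf_types {
    my ($part, $depth) = @_;
    $depth ||= 0;
    die "nesting too deep\n" if $depth > 50;   # arbitrary guard

    my $type = lc $part->{type};
    if ($type =~ m{^multipart/}) {
        # A container: collect the leaves of every child part.
        return map { leaf_types($_, $depth + 1) } @{ $part->{parts} || [] };
    }
    if ($type eq 'message/rfc822' && $part->{message}) {
        # An attached email: treat it as a whole new message.
        return leaf_types($part->{message}, $depth + 1);
    }
    return ($type);
}

# A plain text note with a forwarded text-plus-HTML email attached.
my $forward = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain' },
        { type    => 'message/rfc822',
          message => {
              type  => 'multipart/alternative',
              parts => [ { type => 'text/plain' }, { type => 'text/html' } ],
          } },
    ],
};
print join(', ', leaf_types($forward)), "\n";
# prints "text/plain, text/plain, text/html"
```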
