NAME

File::Type - Determine a file's type by looking at its name and contents

SYNOPSIS

   use File::Type qw( get_type type_2_mime ) ;

   File::Type::load_magic( "type_file" ) ;
   File::Type::load_magic( \@type_defs ) ;

   my $file_type = get_type( "foo.pl" ) ;

   print type_2_mime( $file_type ) ;

DESCRIPTION

A Perl module that acts much like the traditional Unix file command, but uses regular expressions to do the job.

File types are defined in a data structure that is either passed in directly or read from a file containing such a data structure. Default file types are defined in the module, so in some cases you do not need to call load_magic().

FUNCTIONS

add_magic

Adds more types to the current magic database.

NOT IMPLEMENTED

add_mime_types

Adds the contents of a mime types file to the current magic database.

load_magic

load_magic() takes either a file name or a reference to an array and sets up the internal data structures needed by get_type() and type_2_mime(). See the source code for the module for more information on the data structure required.

The types included with this module are not that comprehensive, since Safari needs to know about very few of them. Submissions of new and better recognizers are appreciated.
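The two argument forms accepted by load_magic() can be told apart with ref(). The following is only a minimal sketch of that dispatch, not the module's code: sketch_load_magic and @magic_db are hypothetical names, and the sketch assumes that a type file holds a single Perl expression that evaluates to an array reference.

```perl
use strict;
use warnings;
use File::Spec;

my @magic_db;    # hypothetical stand-in for the module's internal store

# sketch_load_magic( $file_name ) or sketch_load_magic( \@type_defs )
sub sketch_load_magic {
    my ($source) = @_;
    my $defs;

    if ( ref($source) eq 'ARRAY' ) {
        # Definitions were passed in directly.
        $defs = $source;
    }
    elsif ( !ref($source) ) {
        # Assumption: the file contains one Perl expression that yields
        # an array reference. An absolute path keeps do() away from @INC.
        my $path = File::Spec->rel2abs($source);
        $defs = do $path;
        die "cannot load type definitions from $source\n"
            unless ref($defs) eq 'ARRAY';
    }
    else {
        die "expected a file name or an array reference\n";
    }

    @magic_db = @$defs;
    return scalar @magic_db;    # number of definitions loaded
}
```

Evaluating a file with do runs arbitrary Perl, so a loader like this is only suitable for type files you trust.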

get_type

get_type() performs three levels of checks and returns the result of the first successful check.

get_type() first stats the file, then looks at its extension, then looks inside the file using regular expressions. Since Perl 5 regular expressions are quite comprehensive, this should allow complete emulation of the magic files used by the Unix file command as well as the language identification heuristics.

If a second argument is provided, it will be used as the file's contents, and the file will not be opened. The contents must be from the beginning of the file for most binary file types, and should be for most text file types. As much data as is feasible should be provided.
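The three levels described for get_type() can be pictured with a small standalone sketch. It does not call the module: sketch_get_type, %EXT_TYPES and @CONTENT_TESTS are hypothetical, the type names are made up, and skipping the stat step when contents are supplied is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny type tables (not the module's data).
my %EXT_TYPES     = ( pl => 'perl', pm => 'perl', html => 'html' );
my @CONTENT_TESTS = (
    [ perl => qr/\A#!.*\bperl\b/ ],
    [ html => qr/<html\b/i ],
);

# sketch_get_type( $name [, $contents] )
sub sketch_get_type {
    my ( $name, $contents ) = @_;

    # Level 1: stat-style check on the file itself.
    if ( !defined $contents ) {
        return 'directory' if -d $name;
    }

    # Level 2: the file name extension.
    if ( $name =~ /\.([^.\/]+)\z/ ) {
        my $type = $EXT_TYPES{ lc $1 };
        return $type if defined $type;
    }

    # Level 3: look inside the file, reading it only if the caller
    # did not supply the contents.
    if ( !defined $contents ) {
        open my $fh, '<', $name or return;
        binmode $fh;
        read( $fh, $contents, 4096 );
        close $fh;
        $contents = '' unless defined $contents;
    }
    for my $test (@CONTENT_TESTS) {
        my ( $type, $re ) = @$test;
        return $type if $contents =~ $re;
    }

    return;    # no check succeeded
}
```

Because the content patterns here are anchored near the start of the data, passing contents taken from the beginning of the file matters, as the paragraph above advises.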

type_2_mime

Takes the result of a get_type() call and returns the corresponding MIME type.

MAGIC DATA STRUCTURE

The format of the magic data structure is:

   {
      'file type' => [  # reported when a match is found
         [
           'long type',   # Unix file-like description
           'mime type',   # used to translate file type to mime type
         ],
         name_test,     # the test applied when only the file name is known
         guts_test_1,   # the first test applied if the file name test fails.
         guts_test_2,   # the second test applied if guts_test_1 fails
         ...
      ],
      'another type' => [
         ...
      ],
      ...
   }

See get_type for a description of the testing algorithm.
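To make the layout concrete, here is a hedged example with two made-up entries and two small helpers that walk it. The entries, regular expressions and MIME strings are illustrative only, and sketch_type_2_mime and sketch_match are hypothetical helpers, not the module's functions; the only thing taken from the layout above is the position of each slot.

```perl
use strict;
use warnings;

# Hypothetical entries that follow the documented slot order.
my $magic = {
    'perl' => [
        [ 'Perl source', 'application/x-perl' ],    # long type, mime type
        qr/\.(?:pl|pm)\z/i,                         # name_test
        qr/\A#!.*\bperl\b/,                         # guts_test_1
        qr/^\s*use\s+strict\b/m,                    # guts_test_2
    ],
    'html' => [
        [ 'HTML document', 'text/html' ],
        qr/\.html?\z/i,
        qr/<html\b/i,
    ],
};

# The MIME type lives in the second element of the first slot.
sub sketch_type_2_mime {
    my ($type) = @_;
    my $entry = $magic->{$type} or return;
    return $entry->[0][1];
}

# Try the name test first, then each guts test in order.
sub sketch_match {
    my ( $name, $contents ) = @_;
    for my $type ( sort keys %$magic ) {
        my ( $info, $name_test, @guts_tests ) = @{ $magic->{$type} };
        return $type if $name =~ $name_test;
        next unless defined $contents;
        for my $test (@guts_tests) {
            return $type if $contents =~ $test;
        }
    }
    return;
}
```

With these entries, sketch_type_2_mime( sketch_match('index.htm') ) yields text/html.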

Primitive tests

These functions may be used in the magic data structure as complete tests or as part of other tests.

The text / binary primitives only test the file state once and cache the results.

abort

Aborts all testing by dying with the message passed.

debug

Prints a message if debugging is enabled.

has_extension

Returns 1 if the file name has an extension matching any of the arguments.

is_text

Returns 1 if the file is text, 0 if it is not.

is_binary

Returns 1 if the file is not text, 0 if it is.

match_and_score

Returns a score based on where and how many of the words or regular expression arguments match. This is the routine used internally when a word list or a regular expression is used in the magic structure.
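The exact scoring rule is not stated here, so the following only illustrates the idea of scoring by position and by number of matches. The weights, the 128-byte cutoff and the name sketch_match_and_score are all invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical rule: each matching pattern scores 2 if it first matches
# within the first 128 characters, otherwise 1.
sub sketch_match_and_score {
    my ( $contents, @patterns ) = @_;
    my $score = 0;
    for my $pat (@patterns) {
        # Plain words are matched literally on word boundaries;
        # compiled regular expressions are used as given.
        my $re = ref($pat) eq 'Regexp' ? $pat : qr/\b\Q$pat\E\b/;
        next unless $contents =~ $re;

        # $-[0] holds the offset where the last successful match began.
        $score += $-[0] < 128 ? 2 : 1;
    }
    return $score;    # 0 when nothing matched
}
```

A result of 0 when nothing matches fits the convention described under must_be_text, where 0 means "can't tell".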

matches_name

Returns 1 if the file name matches the regular expression or strings passed in.

must_be_text

Returns -1 if the file is not text (according to -T), 0 if it is. This is used to disqualify a type for a file without scoring the file, since 0 means 'can't tell', and -1 means it's not that type.

must_be_binary

Returns -1 if the file is text (according to -T), 0 otherwise.
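The text and binary primitives and their return conventions can be sketched as follows. This is an illustration under assumptions rather than the module's implementation: the sketch_ names and the cache hash are hypothetical, and the way sketch_combine adds up scores is a guess at how a -1 result might disqualify a type.

```perl
use strict;
use warnings;

my %is_text_cache;    # file name => 1 (text) or 0 (not text)

# Run the -T file test once per file and remember the answer.
sub sketch_is_text {
    my ($file) = @_;
    $is_text_cache{$file} = ( -T $file ) ? 1 : 0
        unless exists $is_text_cache{$file};
    return $is_text_cache{$file};
}

sub sketch_is_binary      { return sketch_is_text( $_[0] ) ? 0  : 1 }
sub sketch_must_be_text   { return sketch_is_text( $_[0] ) ? 0  : -1 }
sub sketch_must_be_binary { return sketch_is_text( $_[0] ) ? -1 : 0 }

# Combine the results of several tests for one candidate type:
# any -1 disqualifies the type, 0 adds nothing, positive values add up.
sub sketch_combine {
    my (@results) = @_;
    my $total = 0;
    for my $r (@results) {
        return -1 if $r < 0;
        $total += $r;
    }
    return $total;
}
```

Caching by file name means a file that changes between tests keeps its first classification, which is the trade-off implied by testing the file state only once.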

AUTHOR

Barrie Slaymaker
