NAME

Win32::LongPath - provide functions to access long paths and Unicode in the Windows environment

SYNOPSIS

        use File::Spec::Functions;
        use Win32::LongPath;
        use utf8;

        # make a really long path w/Unicode from around the world
        $path = 'c:';
        while (length ($path) < 5000) {
          $path = catdir ($path, 'ελληνικά-русский-日本語-한국-中國的-עִברִית-عربي');
          if (!testL ('e', $path)) {
            mkdirL ($path) or die "unable to create $path ($^E)";
          }
        }
        print 'ShortPath: ' . shortpathL ($path) . "\n";

        # next, create a file in the path
        $file = catfile ($path, 'more interesting characters فارسی-தமிழர்-​ພາສາ​ລາວ');
        openL (\$FH, '>:encoding(UTF-8)', $file)
          or die ("unable to open $file ($^E)");
        print $FH "writing some more Unicode characters\n";
        print $FH "דאס שרייבט אַ שורה אין ייִדיש.\n";
        close $FH;

        # now undo everything
        unlinkL ($file) or die "unable to delete file ($^E)";
        while ($path =~ /[\/\\]/) {
          rmdirL ($path) or die "unable to remove $path ($^E)";
          $path =~ s#[/\\][^/\\]+$##;
        }

DESCRIPTION

Although Perl natively supports functions that can access files in Windows, these functions fail for Unicode or long file paths (i.e., greater than the Windows MAX_PATH value, which is about 255 characters). Win32::LongPath overcomes these limitations by using the Windows wide-character functions, which support Unicode and extended-length paths. The end result is that you can process any file in the Windows environment without worrying about Unicode or path length.

Win32::LongPath provides replacement functions for most of the native Perl file functions. These functions attempt to imitate the native functionality and format as closely as possible and accept file paths which include Unicode characters and can be up to 32,767 characters long.

Some additional functions are also available to provide low-level features that are specific to Windows files.

Paths

File and directory paths can be provided containing any of the following components.

  • path separators: Both the forward (/) and reverse (\) slashes can be used to separate the path components.

  • Unicode: Unicode characters can be used anywhere in the path provided they are supported by the Windows file naming standard. If Unicode is used, the string must be internally identified as UTF-8. See perlunicode for more information on using Unicode with Perl.

  • drive letter: The path can begin with an upper or lower case letter from A to Z followed by a colon to indicate a drive letter path. For example, C:/path (fullpath) or c:path (relative path).

  • UNC: The path can begin with a UNC path in the form \\server\share or //server/share.

  • extended-length: The path can begin with an extended-length prefix in the form of \\?\ or //?/.

All input paths will be converted (normalized) to a fullpath using the extended-length format and wide characters. This allows paths to be up to 32,767 characters long and to include Unicode characters. The Microsoft specification still limits the directory component to MAX_PATH (about 255) characters.

Output paths will be converted back (denormalized) to a UTF-8 fullpath that begins with a drive letter or UNC.

NOTE: See the Naming Files, Paths, and Namespaces topic in the Microsoft MSDN Library for more information about extended-length paths.
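
The normalization described above can be sketched in pure Perl. This is only an illustration, not the module's implementation: to_extended is a hypothetical helper, it handles only paths that are already absolute, and it does not resolve relative paths or the . and .. components.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an absolute path in extended-length form.
sub to_extended {
  my $path = shift;
  $path =~ tr{/}{\\};                        # use backslashes throughout
  return $path if $path =~ /^\\\\\?\\/;      # already has the \\?\ prefix
  if ($path =~ /^\\\\([^\\]+\\.*)$/s) {      # UNC: \\server\share\...
    return "\\\\?\\UNC\\$1";
  }
  if ($path =~ /^[A-Za-z]:\\/) {             # drive letter: C:\...
    return "\\\\?\\$path";
  }
  return undef;                              # relative paths are not handled here
}

for my $p ('c:/dir/file.txt', '//server/share/dir', '//?/c:/dir', 'c:relative') {
  my $ext = to_extended ($p);
  printf "%-20s -> %s\n", $p, defined $ext ? $ext : '(not handled)';
}
```

A UNC path takes the \\?\UNC\ prefix rather than plain \\?\, which is why the two cases are handled separately.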

Return Values

Unless stated otherwise, all functions return true (a numeric value of 1) if successful or false (undef) if an error occurred. Generally, if a function fails it will set the $! value to the failure. However, $^E will have the more specific Windows error value.

FILE FUNCTIONS

This section lists the replacements for native Perl file functions. Since "openL" returns a native Perl file handle, functions that use open file handles (read, write, close, binmode, etc.) can be used as is and do not have replacement functions. In like manner, "sysopenL" also returns a native Perl file handle.

Functions that are specific to the Unix environment (chmod, chown, umask, etc.) do not have replacements.

linkL OLDFILE,NEWFILE

If the Windows file system supports it, a hard link is created from NEWFILE to OLDFILE.

        linkL ('goodbye', 'до свидания')
          or die ("unable to link file ($^E)");

lstatL PATH

Does the same thing as the "statL" function but will retrieve the statistics for the link and not the file it links to.

openL FILEHANDLEREF,MODE,PATH

open is a very powerful and versatile Perl function with many modes and capabilities. The openL replacement does not provide the full range of capability but does provide what is needed to open files in the Windows file system. It only supports the three-argument form of open.

FILEHANDLEREF cannot be a bareword file handle or a scalar variable. It must be a reference to a scalar value which will be set to be a Perl file handle. For example:

        openL (\$fh, '<', $file) or die ("unable to open $file: ($^E)");

For the most part, MODE matches the native definition and can begin with <, >, >>, +<, +> and +>> to indicate read/write behavior. The |-, -|, <-, -, >- modes are not valid since they apply to pipes, STDIN and STDOUT. Read-only is assumed if the read/write symbols are not used. MODE can also include a colon followed by the I/O layer definition. For example:

        openL (\$fh, '>:encoding(UTF-8)', $file);

PATH is the relative or fullpath name of the file. It cannot be undef for temporary files, a reference to a variable for in-memory files or a file handle.

        # these are WRONG!
        openL ($infh, '', $infile);
        openL (INFILE, '', $infile);
        openL (\$infh, '', undef);
        openL (\$infh, '', \$memory);
        openL (\$infh, '', INFILE);
        openL (\$infh, '-|', "file<$infile");

        # these are correct
        # append infile to outfile
        openL (\$infh, '', $infile)
          or die ("unable to open $infile: ($^E)");
        openL (\$outfh, '>>', $outfile)
          or die ("unable to open $outfile: ($^E)");
        while (<$infh>) {
          print $outfh $_;
        }
        eof ($infh) or print "terminated before EOF!\n";
        close $infh;
        close $outfh;

readlinkL PATH

Returns the path that a junction/mount point or symbolic link points to. If PATH is not provided, $_ is used. It will fail for hard links.

        # symlinks should always be equal
        symlinkL ($orig, $slink) or die ("unable to symlink file ($^E)");
        $rlink = readlinkL ($slink) or die ("unable to read link ($^E)");
        die ("links not equal!") if ($rlink ne $orig);

        # hard links should always be undef
        linkL ($orig, $hlink) or die ("unable to link file ($^E)");
        !readlinkL ($hlink) or die ("should have failed!");

renameL OLDNAME,NEWNAME

Changes the name or moves OLDNAME to NEWNAME. Renames directories as well as files. Cannot move directories across volumes.

NOTE: See MoveFile in the Microsoft MSDN Library for more information.

        # should work
        renameL ('c:/file', 'c:/newfile');
        # fails, can't move file to directory
        renameL ('d:/file', '.');
        # should work for files
        renameL ('e:/file', 'f:/newfile');
        # should work
        renameL ('d:/dir', 'd:/topdir/subdir');
        # fails, can't move directory across volumes
        renameL ('c:/dir', 'd:/newdir');

statL PATH

Returns an object with the statistics for the file. PATH must be a path to a file and cannot be a file or directory handle. If it is not provided, $_ is used. If there is an error gathering the statistics, undef is returned and the error variables are set. The definitions of the object elements are very similar to those of the native Perl stat function, although the access method is like File::stat.

    atime: Last access time in seconds. NOTE: Different file systems have different time resolutions. For example, FAT has a resolution of 1 day for the access time. See the Microsoft MSDN Library for more information about file time.

    attribs: File attributes as returned by the Windows GetFileAttributes () function. Use the following constants to retrieve the individual values. See the Microsoft MSDN Library for more information about the meaning of these values. Import these values into your environment if you do not want to refer to them with the Win32::LongPath:: prefix.

      FILE_ATTRIBUTE_ARCHIVE
      FILE_ATTRIBUTE_COMPRESSED
      FILE_ATTRIBUTE_DEVICE
      FILE_ATTRIBUTE_DIRECTORY
      FILE_ATTRIBUTE_ENCRYPTED
      FILE_ATTRIBUTE_HIDDEN
      FILE_ATTRIBUTE_INTEGRITY_STREAM
      FILE_ATTRIBUTE_NORMAL
      FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
      FILE_ATTRIBUTE_NO_SCRUB_DATA
      FILE_ATTRIBUTE_OFFLINE
      FILE_ATTRIBUTE_PINNED
      FILE_ATTRIBUTE_READONLY
      FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
      FILE_ATTRIBUTE_RECALL_ON_OPEN
      FILE_ATTRIBUTE_REPARSE_POINT
      FILE_ATTRIBUTE_SPARSE_FILE
      FILE_ATTRIBUTE_SYSTEM
      FILE_ATTRIBUTE_TEMPORARY
      FILE_ATTRIBUTE_UNPINNED
      FILE_ATTRIBUTE_VIRTUAL

    ctime: Although defined to be inode change time in seconds for native Perl, it will reflect the Windows creation time.

    dev: The Windows serial number for the volume. See the Microsoft MSDN Library for more information.

    gid: Is always zero.

    ino: Is always zero.

    mode: File mode (type and permissions). use Fcntl ':mode' can be used to extract the meaning of the mode. Regardless of the actual user and group permissions, the following bits are set.

    • Directories: S_IFDIR, S_IRWXU, S_IRWXG and S_IRWXO

    • Files: S_IFREG, S_IRUSR, S_IRGRP and S_IROTH

    • Files without read-only attribute: S_IWUSR, S_IWGRP and S_IWOTH

    • Files with BAT, CMD, COM and EXE extension: S_IXUSR, S_IXGRP and S_IXOTH

    mtime: Last modify time in seconds. NOTE: Different file systems have different time resolutions. For example, FAT has a resolution of 2 seconds for the modification time. See the Microsoft MSDN Library for more information about file time.

    nlink: Is always one.

    rdev: Same as dev.

    size: Total size of the file in bytes. Has a value of zero for directories.

    uid: Is always zero.
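
The mode rules listed above can be sketched in pure Perl. This is an illustration under stated assumptions, not the module's code: mode_from_attribs is a hypothetical helper, and the two attribute values are copied from the Windows SDK so the sketch runs without the module installed.

```perl
use strict;
use warnings;
use Fcntl ':mode';

# Assumed values, copied from the Windows SDK for illustration only.
use constant {
  ATTR_READONLY  => 0x01,   # FILE_ATTRIBUTE_READONLY
  ATTR_DIRECTORY => 0x10,   # FILE_ATTRIBUTE_DIRECTORY
};

# Hypothetical helper: build the documented mode bits from attribs and name.
sub mode_from_attribs {
  my ($attribs, $name) = @_;
  if ($attribs & ATTR_DIRECTORY) {
    return S_IFDIR | S_IRWXU | S_IRWXG | S_IRWXO;
  }
  my $mode = S_IFREG | S_IRUSR | S_IRGRP | S_IROTH;
  # write bits only when the read-only attribute is absent
  $mode |= S_IWUSR | S_IWGRP | S_IWOTH if !($attribs & ATTR_READONLY);
  # execute bits depend only on the file extension
  $mode |= S_IXUSR | S_IXGRP | S_IXOTH if $name =~ /\.(?:bat|cmd|com|exe)$/i;
  return $mode;
}

# print the permission bits in octal for a few made-up entries
printf "%-10s %04o\n", $_->[1], S_IMODE (mode_from_attribs (@$_))
  for [0x10, 'dir'], [0x00, 'notes.txt'], [0x01, 'setup.exe'];
```

Because the bits are derived from attributes and extensions only, they say nothing about the actual Windows access control lists on the file.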

        use Fcntl ':mode';
        use Win32::LongPath qw(:funcs :fileattr);

        # get object
        testL ('e', $file)
          or die "$file doesn't exist!";
        $stat = statL ($file)
          or die ("unable to get stat for $file ($^E)");

        # this test for directory
        $stat->{mode} & S_IFDIR ? print "Directory\n" : print "File\n";
        # is the same as this one
        $stat->{attribs} & FILE_ATTRIBUTE_DIRECTORY ? print "Directory\n" : print "File\n";

        # show file times as local time
        printf "Created: %s\nAccessed: %s\nModified: %s\n",
          scalar localtime $stat->{ctime},
          scalar localtime $stat->{atime},
          scalar localtime $stat->{mtime};

symlinkL OLDFILE,NEWFILE

If the Windows OS, file system and user permissions support it, a symbolic link is created from NEWFILE to OLDFILE.

OLDFILE can be a relative or full path. If a relative path is used, it will not be converted to an extended-length path.

NOTE: See CreateSymbolicLink in the Microsoft MSDN Library for more information about symbolic links.

        symlinkL ('no problem', '問題ない')
          or die ("unable to link file ($^E)");
        symlinkL ('c:/', 'rootpath')
          or die ("unable to link file ($^E)");

sysopenL FILEHANDLEREF,PATH,MODE

Performs the same function as the native Perl sysopen function but only supports the three-argument form of sysopen.

FILEHANDLEREF cannot be a bareword file handle or a scalar variable. It must be a reference to a scalar value which will be set to be a Perl file handle. For example:

        use Fcntl;    # exports the O_* constants
        sysopenL (\$fh, $file, O_CREAT | O_EXCL) or die ("unable to open $file: ($^E)");

PATH is the relative or fullpath name of the file.

MODE matches the native definition.

testL TYPE,PATH

Used to replace the native -X functions. TYPE is the same value as the -X function. For example:

        # these are equivalent
        die 'unable to read!' if !-r $file;
        die 'unable to read!' if !testL ('r', $file);

The supported TYPEs and their values are:

  • b: Block device. Always returns undef.

  • c: Character device. Always returns undef.

  • d: Directory.

  • e: Exists.

  • f: Plain file. Returns true if not a directory or Windows offline file.

  • l: Link file. Only returns true for junction/mount points and symbolic links.

  • o or O: Owned. Always returns true.

  • r or R: Read. Always returns true.

  • s: File has nonzero size (returns size in bytes).

  • w or W: Write. Returns true if the file does not have the read-only attribute.

  • x or X: Execute. Returns true if the file has one of the following extensions: bat, cmd, com, exe.

  • z: Zero size.

unlinkL PATH[,...]

Deletes the list of files. If successful, it returns the number of files deleted. It will fail if the file has the read-only attribute set. It returns undef if an error occurs, and the error variable is set to the value of the last error encountered.

        # if you do this you don't know which failed
        die ("delete of some files failed!") if !unlinkL ($f1, $f2, $f3, $f4);

        # this identifies the failures
        foreach my $file ($f1, $f2, $f3, $f4) {
          unlinkL ($file) or print "Unable to delete $file ($^E)\n";
        }

utimeL [ATIME],[MTIME],PATH[,...]

Changes the access and modification times on each file. ATIME and MTIME are the numeric times from the time () function. If both are undef, the times will be changed to the current time. If only one is undef, that one will use a time value of zero.

PATH must be the path to a file.

If successful, it returns the number of files changed. It returns undef if an error occurs, and the error variable is set to the value of the last error encountered.

NOTE: This function is not supported in Cygwin and will return an error.

NOTE: Different file systems have different time resolutions. For example, FAT has a resolution of 2 seconds for modification time and 1 day for the access time. See the Microsoft MSDN Library for more information about file time.

        # set back 24 hours
        $yesterday = time () - (24 * 60 * 60);
        utimeL ($yesterday, $yesterday, $file)
          or die ("unable to change time on $file ($^E)");

        # this is the same as the touch command
        utimeL (undef, undef, $file)
          or die ("unable to change time on $file ($^E)");
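
The undef handling described above can be sketched as follows. resolve_times is a hypothetical helper written only to illustrate the rule; it is not part of the module and does not touch any file.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented ATIME/MTIME rules.
sub resolve_times {
  my ($atime, $mtime) = @_;
  if (!defined $atime && !defined $mtime) {
    my $now = time ();
    return ($now, $now);               # both undef: use the current time
  }
  return ($atime // 0, $mtime // 0);   # a single undef becomes zero
}

my @both = resolve_times (undef, undef);
my @one  = resolve_times (undef, 1_700_000_000);
print "both undef:  @both\n";
print "atime undef: @one\n";
```

Since a single undef is treated as zero rather than "leave unchanged", it is probably safest to read the existing value with statL and pass it back when only one of the two times should change.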

DIRECTORY FUNCTIONS

NOTE: Although extended-length paths are used, the Microsoft specification still limits the directory component to MAX_PATH (about 255) characters.

chdirL PATH

Changes the working directory. If PATH is missing it tries to change to $ENV{HOME} if it is set, or $ENV{LOGDIR} if that is set. If neither is set then it will do nothing and return.

Unlike other functions, the PATH cannot exceed MAX_PATH characters, although it can contain Unicode and be in the extended-path format.

        chdirL ($path)
          or die ("unable to change to $path ($^E)");

getcwdL

Returns the fullpath of the current working directory. This does not replace a native Perl function since none exists. It works like the getcwd function in Cwd.

        print "The current directory is: ", getcwdL (), "\n";

mkdirL PATH

Creates a directory which inherits the permissions of the parent. If PATH is not provided, $_ is used. An error is returned if the parent directory does not exist.

        mkdirL ($dir)
          or die ("unable to create $dir ($^E)");

rmdirL PATH

Deletes a directory. If PATH is not provided, $_ is used. An error is returned if the directory is not empty.

        rmdirL ($dir)
          or die ("unable to delete $dir ($^E)");

OPENDIR FUNCTIONS

Unlike the "openL" function which returns a native handle, the open directory functions must create a directory object and then use that object to manipulate the directory. The native Perl rewinddir, seekdir and telldir functions are not supported.

new

Creates a directory object.

        $dir = Win32::LongPath->new ();

closedirL

Closes the current directory for reading.

        $dir->closedirL ();

opendirL PATH

Opens a directory for reading. If the directory object is already open the existing directory will be closed before opening the new one.

        my $path = 'c:/rootdir/very long directory name/First Level';
        $dir->opendirL ($path)
          or die ("unable to open $path ($^E)");

readdirL

Reads the next item in the directory. In list context returns all the items as a list. Otherwise returns the next item or undef if there are no more items or an error occurred.

NOTE: Only the item name is returned, not the whole path to the item.

        use Win32::LongPath qw(:funcs :fileattr);

        # search down the whole tree
        search_tree ($rootdir);
        exit 0;

        sub search_tree {

        # open directory and read contents
        my $path = shift;
        my $dir = Win32::LongPath->new ();
        $dir->opendirL ($path)
          or die ("unable to open $path ($^E)");
        foreach my $file ($dir->readdirL ()) {
          # skip parent dir
          if ($file eq '..') {
            next;
          }

          # get file stats
          my $name = $file eq '.' ? $path : "$path/$file";
          my $stat = lstatL ($name)
            or die "unable to stat $name ($^E)";

          # recurse if dir
          if (($file ne '.') && (($stat->{attribs}
            & (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT))
            == FILE_ATTRIBUTE_DIRECTORY)) {
            search_tree ($name);
            next;
          }

          # output stats
          print "$name\t$stat->{attribs}\t$stat->{size}\t",
            scalar localtime $stat->{ctime}, "\t",
            scalar localtime $stat->{mtime}, "\n";
        }
        $dir->closedirL ();
        return;
        }

MISCELLANEOUS FUNCTIONS

The following functions are not native Perl functions but are useful when working with Windows.

abspathL PATH

Returns the absolute (fullpath) for PATH. If the path exists, it will replace the components with Windows' long path names. Otherwise, it returns a path that may contain short path names.

        $short = '../SYSTEM~2.PPT';
        $long = abspathL ($short);
        print "$short = $long\n";
        # if it exists it could print something like
        # ../SYSTEM~2.PPT = c:\rootdir\subdir\System File.ppt
        # if not, it might print
        # ../SYSTEM~2.PPT = c:\rootdir\subdir\SYSTEM~2.PPT

        # probably not the same because TMP is short path
        chdirL ($ENV {TMP}) or die "unable to change to TMP dir!";
        $curdir = getcwdL ();
        if (abspathL ($curdir) ne $curdir) {
          print "not the same!\n";
        }

attribL ATTRIBS,PATH

Sets file attributes like the DOS attrib command.

ATTRIBS is a string that identifies the attributes to enable or disable. A plus sign (+) enables and a minus sign (-) disables the attributes that follow. If not provided, a plus sign is assumed.

The attributes are identified by letters which can be upper or lower case. The letters and their values are:

  • H: Hidden.

  • I: Not content indexed. This value may not be valid for all file systems.

  • R: Read-only.

  • S: System.
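
The ATTRIBS syntax can be illustrated with a small parser. This is a sketch under assumptions, not the module's parsing code: parse_attribs is a hypothetical helper that only reports which letters would be enabled or disabled.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ATTRIBS string into letters to enable/disable.
sub parse_attribs {
  my $spec = uc ($_[0]);                # letters may be upper or lower case
  my $enable = 1;                       # a plus sign is assumed at the start
  my %state;
  foreach my $ch (split //, $spec) {
    if ($ch eq '+') { $enable = 1; next; }
    if ($ch eq '-') { $enable = 0; next; }
    return undef if $ch !~ /^[HIRS]$/;  # unknown attribute letter
    $state{$ch} = $enable;
  }
  my %result = (
    enable  => [grep { $state{$_} } sort keys %state],
    disable => [grep { !$state{$_} } sort keys %state],
  );
  return \%result;
}

# the three spellings below describe the same change
for my $spec ('sh-r', '-r+sh', '+hs-r') {
  my $p = parse_attribs ($spec);
  print "$spec: enable @{$p->{enable}}, disable @{$p->{disable}}\n";
}
```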

        # sets System and hidden but disables read-only
        # could also be '-r+sh', 's-r+h', '+hs-r', etc.
        attribL ('sh-r', $file)
          or die "unable to set attributes for $file ($^E)";

copyL FROM,TO

Copies the FROM file to the TO file. If the file exists it is overwritten unless it is hidden or read-only. If it does not exist it inherits the permissions of the parent directory. File attributes are copied with the file. If the FROM file is a symbolic link the target is copied and not the symbolic link. If the TO file is a symbolic link the target is overwritten.

        copyL ($from, $to)
          or die "unable to copy $from to $to ($^E)";

shortpathL PATH

Returns the short path of the file. It returns a blank string if it is unable to get the short path.

        if (shortpathL ($file) eq '') {
          die "unable to get shortpath for $file";
        }

volinfoL PATH

Returns an object with the volume information for the PATH. PATH can be a relative or fullpath to any object on the volume. The object elements are:

    maxlen: The maximum length of path components (the characters between the backslashes; usually directory names).

    name: The name of the volume.

    serial: The Windows serial number for the volume.

    sysflags: System flags. Indicates the features that are supported by the file system. Use the following constants to retrieve the individual values. Import these values into your environment if you do not want to refer to them with the Win32::LongPath:: prefix.

      FILE_CASE_PRESERVED_NAMES
      FILE_FILE_COMPRESSION
      FILE_NAMED_STREAMS
      FILE_PERSISTENT_ACLS
      FILE_READ_ONLY_VOLUME
      FILE_SEQUENTIAL_WRITE_ONCE
      FILE_SUPPORTS_ENCRYPTION
      FILE_SUPPORTS_EXTENDED_ATTRIBUTES
      FILE_SUPPORTS_OBJECT_IDS
      FILE_SUPPORTS_OPEN_BY_FILE_ID
      FILE_SUPPORTS_REPARSE_POINTS
      FILE_SUPPORTS_SPARSE_FILES
      FILE_SUPPORTS_TRANSACTIONS
      FILE_SUPPORTS_USN_JOURNAL
      FILE_UNICODE_ON_DISK
      FILE_VOLUME_IS_COMPRESSED
      FILE_VOLUME_QUOTAS

NOTE: See the Microsoft MSDN Library for more information about this feature.

        use Win32::LongPath qw(:funcs :volflags);

        $vol = volinfoL ($file)
          or die "unable to get volinfo for $file";
        if (!($vol->{sysflags} & FILE_SUPPORTS_REPARSE_POINTS)) {
          die "symbolic links will not work on $vol->{name}!";
        }
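
Since sysflags is a bit mask, several features can be checked at once. The sketch below lists the names of the flags set in a value. It is illustrative only: flag_names is a hypothetical helper, the sample value is made up, and the numeric values are assumed from the Windows SDK headers so the code runs without the module. With the module installed, import the real constants with the :volflags tag instead.

```perl
use strict;
use warnings;

# Assumed values from the Windows SDK, for illustration only.
my %FLAG = (
  FILE_CASE_PRESERVED_NAMES    => 0x00000002,
  FILE_UNICODE_ON_DISK         => 0x00000004,
  FILE_PERSISTENT_ACLS         => 0x00000008,
  FILE_SUPPORTS_SPARSE_FILES   => 0x00000040,
  FILE_SUPPORTS_REPARSE_POINTS => 0x00000080,
  FILE_READ_ONLY_VOLUME        => 0x00080000,
);

# Hypothetical helper: names of the known flags set in a sysflags value.
sub flag_names {
  my $sysflags = shift;
  return grep { $sysflags & $FLAG{$_} } sort keys %FLAG;
}

# made-up sysflags value, not read from a real volume
my $sysflags = 0x02 | 0x04 | 0x80;
print "$_\n" for flag_names ($sysflags);
```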

MODULE EXPORTS

All functions are automatically exported by default. The following tags export specific values:

  • :all: all values

  • :funcs: all functions

  • :fileattr: file attributes used by the "statL" and "lstatL" functions

  • :volflags: system flags used by the "volinfoL" function

LIMITATIONS

This module was developed for the Microsoft WinXP and greater environment. It also supports the Cygwin environment.

AUTHOR

Robert Boisvert <rdbprog@gmail.com>

CREDITS

Many thanks to Jan Dubois for getting Windows support started with Win32. It remains the number one module in use on almost every Windows installation of Perl.

A big thank you (どうもありがとうございました) to Yuji Shimada for Win32::Unicode. The concepts used there are the basis for much of Win32::LongPath.

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.