Linux::NFS::BigDir - use Linux getdents syscall to read large directories over NFS
    use Linux::NFS::BigDir qw(getdents);
    # entries_ref is an array reference
    my $entries_ref = getdents($very_large_dir);
This module was created to solve a very specific problem: you have a directory over NFS, mounted by a Linux OS, and that directory has a very large number of entries (files, directories, etc). The number of entries is so large that you have trouble listing the contents with readdir or even with ls from the shell. In extreme cases, the operation just "hangs" and only returns hours later.
I observed this behavior only with NFS version 3 (and wasn't able to reproduce it with local EXT3/EXT4). You might see it in other situations, but in that case it might be caused by a wrong filesystem configuration. Ask your administrator first.
If you can't fix the problem (or get it fixed), then you might want to try this module. It uses the getdents syscall from Linux. You can read the documentation about this syscall with man getdents in a shell.
In short, this syscall will return a data structure, but you probably will want to use only the name of each entry in the directory.
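As a rough illustration of what "only the name of each entry" means, the sketch below decodes a buffer of records with pack and unpack. It assumes the struct linux_dirent layout described in the getdents manual page (two unsigned longs, one unsigned short, then a NUL-terminated name, with a type byte at the very end of the record). The helpers fake_record and names_from_buffer are hypothetical and exist only for this example; the buffer is built by hand rather than read from the kernel, and the module's own code may do this differently.

```perl
use strict;
use warnings;

# Assumed record layout, following the getdents(2) manual page:
#   unsigned long  d_ino     inode number
#   unsigned long  d_off     offset to the next record
#   unsigned short d_reclen  length of this whole record
#   char           d_name[]  NUL-terminated entry name
#   one d_type byte stored at offset d_reclen - 1
my $HEADER     = 'L! L! S';
my $HEADER_LEN = length pack( $HEADER, 0, 0, 0 );

# Hypothetical helper: build one fake record so the sketch is self-contained.
sub fake_record {
    my ( $inode, $name ) = @_;
    my $body   = pack( 'Z*', $name );                 # name plus trailing NUL
    my $reclen = $HEADER_LEN + length($body) + 1;     # + 1 for the type byte
    return pack( $HEADER, $inode, 0, $reclen ) . $body . "\0";
}

# Hypothetical helper: walk the buffer record by record and keep only the names.
sub names_from_buffer {
    my ($buffer) = @_;
    my @names;
    my $pos = 0;
    while ( $pos + $HEADER_LEN <= length $buffer ) {
        my ( $inode, $offset, $reclen ) =
          unpack( $HEADER, substr( $buffer, $pos, $HEADER_LEN ) );
        last if $reclen <= $HEADER_LEN;               # malformed record, stop
        my $name_area =
          substr( $buffer, $pos + $HEADER_LEN, $reclen - $HEADER_LEN - 1 );
        push @names, unpack( 'Z*', $name_area );      # stops at the first NUL
        $pos += $reclen;                              # jump to the next record
    }
    return \@names;
}

my $buffer = fake_record( 11, 'report.txt' ) . fake_record( 12, 'data.csv' );
my $names  = names_from_buffer($buffer);
print "$_\n" for @{$names};
```

The inode and offset fields are unpacked but ignored, which is the point: for most cleanup jobs only the name matters.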
How can this be useful? Here are some directions:
You want to remove all directory content.
You want to remove files from the directory with a pattern in their filename (using regular expressions, for example).
You want to select specific files by their filenames and then test something else (like atime).
These are only examples, but they should cover the vast majority of what you want to do. The getdents syscall is more efficient here because it does not call stat on each of those files before returning the information to you. That means you have the opportunity to filter whatever you need first and then call stat only if you really need it.
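The "filter by name first, stat later" idea can be sketched as follows. The helper select_by_name_then_atime is hypothetical (it is not part of this module), and the demo runs against a throwaway local directory with a hand-written list of names. With the module, that list would be the array reference returned by getdents.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: filter by name first, then call stat only on the
# names that survived the filter.
sub select_by_name_then_atime {
    my ( $dir, $entries_ref, $pattern, $min_idle_seconds ) = @_;
    my $now = time;
    my @selected;
    for my $name ( grep { $_ =~ $pattern } @{$entries_ref} ) {
        my $path = File::Spec->catfile( $dir, $name );
        my @info = stat $path;
        next unless @info;    # entry vanished or is not readable
        # element 8 of the list returned by stat is the last access time
        push @selected, $path if $now - $info[8] >= $min_idle_seconds;
    }
    return \@selected;
}

# Demo setup: three empty files in a temporary directory.
my $dir     = tempdir( CLEANUP => 1 );
my @entries = qw(app.log db.log notes.txt);
for my $name (@entries) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

# Pretend app.log was last accessed two days ago.
my $two_days_ago = time - 2 * 86_400;
utime $two_days_ago, $two_days_ago, File::Spec->catfile( $dir, 'app.log' );

# Select *.log files that have not been accessed for at least one day.
my $old_logs =
  select_by_name_then_atime( $dir, \@entries, qr/\.log\z/, 86_400 );
print "$_\n" for @{$old_logs};
```

Only the two names ending in .log are ever passed to stat; notes.txt is discarded by the regular expression alone.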
I came across getdents after researching "how to remove millions of files". After a while I found an example C program that uses getdents to print the filenames under a directory. By using it, I was able to clean up directories with thousands (or even millions) of files in a couple of minutes, instead of many hours.
This module is a Perl implementation of that.
The subs getdents and getdents_safe are exported on demand.
getdents expects the complete path to the directory as its only parameter.
Returns an array reference with all files inside that directory but without the 'dot' files.
Although simple (and probably faster), you should be careful regarding memory restrictions when using this function.
If you have too many files, your program may try to allocate too much memory, with all the undesired effects. See getdents_safe.
getdents_safe is a "safe" version of getdents: it writes each entry read to a text file instead of storing all the entries in memory.
Expects as parameters:
The complete path to the directory to be read.
The complete path to the file that will be used to print each entry, one per line. As a convenience, all filenames will be prepended with the complete path to the directory given as parameter.
The filename given will be created. If it already exists, this function will die.
This function returns the number of files read from the given directory.
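Once the listing file exists, it can be processed one line at a time so memory use stays flat. The sketch below assumes a listing with one complete path per line, as described above. The helper remove_listed is hypothetical (not part of this module), and the demo writes the listing by hand as a stand-in for the file that getdents_safe($dir, $list_file) would produce.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: read a listing with one complete path per line and
# unlink every path that matches $pattern. Returns how many were removed.
sub remove_listed {
    my ( $list_file, $pattern ) = @_;
    my $removed = 0;
    open my $in, '<', $list_file or die "Cannot read $list_file: $!";
    while ( my $path = <$in> ) {
        chomp $path;
        next unless $path =~ $pattern;
        if ( unlink $path ) {
            $removed++;
        }
        else {
            warn "Could not remove $path: $!\n";
        }
    }
    close $in;
    return $removed;
}

# Demo setup: a directory with three files, plus a hand-written listing kept
# in a separate directory so it is not mixed with the files being listed.
my $dir       = tempdir( CLEANUP => 1 );
my $work      = tempdir( CLEANUP => 1 );
my $list_file = File::Spec->catfile( $work, 'entries.txt' );

open my $out, '>', $list_file or die "Cannot create $list_file: $!";
for my $name (qw(a.tmp b.tmp keep.dat)) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
    print {$out} "$path\n";
}
close $out;

my $removed = remove_listed( $list_file, qr/\.tmp\z/ );
print "removed $removed files\n";
```

A line-based listing cannot represent filenames that contain newlines, so a directory with such names would need a different approach.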
Create C versions of getdents and getdents_safe with Inline::C to see if they get close to readdir speed when running over a local file system (currently they are slower).
pack
syscall
The manual page of getdents.
This discussion about it at PerlMonks.
Alceu Rodrigues de Freitas Junior, <arfreitas@cpan.org>
This software is copyright (c) 2016 of Alceu Rodrigues de Freitas Junior, <glasswalk3r@yahoo.com.br>.
This file is part of Linux-NFS-BigDir distribution.
Linux-NFS-BigDir is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Linux-NFS-BigDir is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Linux-NFS-BigDir. If not, see <http://www.gnu.org/licenses/>.