STL meets glob(): Power, robustness, and genericity without sacrificing efficiency.
This article is the second of a two-part series looking at techniques for adapting UNIX file-system enumeration APIs to STL-like sequences. In the first article [1] I described an adaptation of the UNIX opendir()/readdir() API to readdir_sequence, an STL-like sequence class supporting Input Iterators. readdir_sequence reflects the semantics of opendir()/readdir() in that it supports enumeration of the entries in a single directory. It provides the additional features of being able to select files and/or directories, and the ability to elide the dots directories, "." (current directory) and ".." (parent directory), from the resultant range if required.
UNIX has another, more powerful, method of searching the file-system: the glob() API. Rather than taking the name of a directory and enumerating its contents, glob() takes a wildcard pattern, such as "/usr/*/std*h/", and returns all matching file-system entries.
glob()-ing Manually

We saw in [1] that there are several advantages to using an STL-like sequence class instead of a raw API, and that principle applies equally to using glob().
Consider the task of finding all hidden ("dot") files, such as .bashrc, in a given directory using the glob() API directly, as shown in the following code.
    // enumwithglob.cpp: Enumerating hidden ("dot") files using glob()

    #include <glob.h>       // glob(), globfree()
    #include <sys/stat.h>   // stat()
    #include <string.h>     // strrchr()

    #include <algorithm>    // std::copy
    #include <iterator>     // std::ostream_iterator
    #include <iostream>     // std::cout, std::endl
    #include <string>       // std::string
    #include <vector>       // std::vector

    using std::copy;
    using std::cout;
    using std::endl;
    using std::ostream_iterator;
    using std::string;
    using std::vector;

    const char HOME[]     = "/home/matty/";
    const char PATTERN[]  = ".*";

    int main()
    {
      vector<string>  dotNames;
      glob_t          gl;

      if(0 == glob((string(HOME) + PATTERN).c_str(), 0, NULL, &gl))
      {
        for(char **begin = &gl.gl_pathv[0], **end = &gl.gl_pathv[gl.gl_pathc];
            begin != end; ++begin)
        {
          // glob() returns each match with its directory prefix, so the
          // dots test must look at the final path component only
          char const  *slash = strrchr(*begin, '/');
          char const  *name  = (NULL != slash) ? slash + 1 : *begin;
          struct stat st;

          // Skip dots
          if( '.' == name[0] &&
              ( '\0' == name[1] ||
                ( '.' == name[1] &&
                  '\0' == name[2])))
          {
            // do nothing
          }
          else
          {
            if(0 == stat(*begin, &st))
            {
              // Mask with S_IFMT so that only regular files match
              if(S_IFREG == (st.st_mode & S_IFMT))
              {
                dotNames.push_back((*begin));
              }
            }
          }
        }

        globfree(&gl);
      }

      cout << "Dumping . files in " << HOME << endl;
      copy(dotNames.begin(), dotNames.end(), ostream_iterator<string>(cout, "\n"));

      return 0;
    }
As well as being quite a large amount of code, this listing has several specific problems. First, we must concatenate the directory and search pattern before passing the combined string value as the first argument in the call to glob(). Admittedly, in many cases the two will not be separate, so that's a small thing, but there are other issues that are not so trivial.
A successful call to glob() returns a block of memory in the gl_pathv member of the glob_t structure passed as the fourth argument. (glob()'s second and third arguments are used for flags, which I'll discuss later, and a callback error-handling function, which I don't discuss in this article.) The returned memory block contains the paths of all entries that matched the specified pattern at the time of the call. As with many UNIX library functions, the const keyword was added to C too late in the game, so the type of gl_pathv is char**, rather than char const**. Hence, the first significant concern in the given code is that begin and end are declared to be of a non-const pointer type. Although I've obviously avoided it in this case, it's always possible to introduce bugs by writing to non-const pointers.
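One way to reduce that risk in hand-written code is to view the results only through const-qualified pointers. The following is a minimal sketch, not part of the original listing: copy_paths is a hypothetical helper name, and it assumes a glob_t filled by a successful glob() call made without the GLOB_DOOFFS flag.

```cpp
#include <glob.h>    // glob_t

#include <cassert>   // assert
#include <cstddef>   // NULL
#include <string>    // std::string
#include <vector>    // std::vector

// Hypothetical helper (illustration only): copies the matched paths out
// of a glob_t while reading them solely through const pointers.
std::vector<std::string> copy_paths(glob_t const &gl)
{
  assert(0 == gl.gl_pathc || NULL != gl.gl_pathv);

  // char** converts implicitly to char const* const*, so neither the
  // pointers nor the characters can be written through begin or end.
  char const *const *begin = gl.gl_pathv;
  char const *const *end   = begin + gl.gl_pathc;

  // *begin = NULL;     // would not compile
  // (*begin)[0] = 'x'; // would not compile

  return std::vector<std::string>(begin, end);
}
```

Because the two pointers behave as ordinary random-access iterators, they can be handed directly to the vector's range constructor, which hints at why an STL-like wrapper is a natural fit for this API.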
The second issue is that the memory block must be freed, and this is done by calling the beguilingly named globfree(). This issue of calling paired allocate/release functions represents a classic problem area in C++, since any statements occurring between them may be a source of exception unsafety. Indeed, the code in enumwithglob.cpp is not exception safe, since the call to dotNames.push_back() may result in an exception being thrown, in which case globfree() would not be called, and the memory block would leak. To correct this would require inserting try-catch scoping into our sample, making it even more verbose.
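To give a feel for that extra verbosity, here is a rough sketch of the try-catch form. It is an illustration under assumptions rather than a fragment of the original program: collect_matches is a hypothetical function name, and the file-type and dots tests are left out to keep the focus on the cleanup path.

```cpp
#include <glob.h>    // glob(), globfree(), glob_t

#include <cassert>   // assert
#include <cstddef>   // NULL
#include <string>    // std::string
#include <vector>    // std::vector

// Hypothetical helper (illustration only): returns every path matching
// pattern, releasing the glob() results even if push_back() throws.
std::vector<std::string> collect_matches(char const *pattern)
{
  assert(NULL != pattern);

  std::vector<std::string> names;
  glob_t                   gl;

  if(0 == glob(pattern, 0, NULL, &gl))
  {
    try
    {
      for(char const *const *begin = gl.gl_pathv,
                      *const *end   = gl.gl_pathv + gl.gl_pathc;
          begin != end; ++begin)
      {
        names.push_back(*begin); // may throw, e.g. std::bad_alloc
      }
    }
    catch(...)
    {
      globfree(&gl); // release on the exceptional path ...
      throw;
    }

    globfree(&gl);   // ... and again on the normal path
  }

  return names;
}
```

The release call now appears twice, which is exactly the kind of duplication that a class with a destructor removes.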
The remaining issues with this code are more prosaic, but still detract from readability and maintainability. The elision of the dots directories is a manual process, which is a pain for all but the rare cases when you do actually want them. Finally, each entry must be stat()-ed, and the return code of stat() tested along with the resultant flags. (For the non-UNIX folks, stat() is the logical equivalent of Win32's GetFileAttributesEx().)
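The stat() step can at least be pulled out into a small function. The sketch below is one possible shape for it, with is_regular_file being a hypothetical name of my choosing; it relies on the POSIX S_ISREG() macro from <sys/stat.h> to interpret st_mode.

```cpp
#include <sys/stat.h>  // stat(), struct stat, S_ISREG()

#include <cassert>     // assert
#include <cstddef>     // NULL

// Hypothetical helper (illustration only): true if path names an
// existing regular file. stat() follows symbolic links, so a link to a
// regular file also counts.
bool is_regular_file(char const *path)
{
  assert(NULL != path);

  struct stat st;

  if(0 != stat(path, &st))
  {
    return false; // entry missing or inaccessible
  }

  return 0 != S_ISREG(st.st_mode);
}
```

Both the return code and the mode bits still have to be checked, but only in one place.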
(I must admit that in this particular case, the specific dots directory elision test is not needed, because the stat() test ensures that directories are not added to the dotNames vector. But I'm sure you can see how this test could be necessary in the more general case.)
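For that more general case, where directories are wanted in the results, the dots test might be factored out along the following lines. This is a sketch only, and is_dots_directory is a hypothetical name. Since glob() returns each match with its directory prefix, the test is applied to the final path component rather than to the start of the string.

```cpp
#include <cassert>   // assert
#include <cstring>   // std::strrchr, NULL

// Hypothetical helper (illustration only): true if the final component
// of path is "." or "..".
bool is_dots_directory(char const *path)
{
  assert(NULL != path);

  // Find the final component: everything after the last '/', or the
  // whole string when there is no '/'.
  char const *slash = std::strrchr(path, '/');
  char const *name  = (NULL != slash) ? slash + 1 : path;

  return  '.' == name[0] &&
          ( '\0' == name[1] ||
            ( '.' == name[1] && '\0' == name[2]));
}
```

With this in hand, the loop body reduces to a single readable condition, such as if(!is_dots_directory(*begin)).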