Sponsored Link •
This article shows how to easily process UNIX-style directory entries as STL sequences.Copyright © 2004, Matthew Wilson
The C++ community is gradually moving towards the STL paradigm—containers, iterators, function objects, algorithms and adaptors—which provides great advantages in terms of commonality of expression, reduction in developer effort, and greater robustness, maintainability and reuse. However, apart from the standard library and one or two exceptions (Boost , STLSoft ), there is relatively little STL-compliant code, which is, in part, due to the complexities involved in its development.
In this article I'll demonstrate how one kind of enumeration
opendir()/readdir() API—can be mapped
into an STL-compliant sequence-like class providing Input
Iterators. I examine the implementation of
readdir_sequence, from UNIXSTL, the UNIX-specific
sub-project of STLSoft ().
Before getting into the details, consider the advantages of an
STL-compliant approach. Let's say you need to load the names of
the sub-directories of
/home/matty/ into a vector of
strings. Using the raw
opendir() API, your code
might look something like that shown in Listing 1.
It's not an enormous amount of code, to be sure, but it's
still quite a bit for something so conceptually straightforward.
Let's look at the
readdir_sequence-based version, in
Listing 2. It's clear in
comparing the two code snippets that
readdir_sequence represents a big win over the
version using the
opendir() API. There are several
closedir(), is automatically handled, via Resource Acquisition Is Initialisation, which improves robustness. Indeed, the first example is not exception-safe, since the
std::stringconstructor and the
std::vector<>::push_back()method may throw exceptions; the second version is exception-safe.
The one flaw in the design is that the value_type is
struct dirent const*, which means you have to
explicitly enumerate the entries, rather than use algorithms or
iterator-based constructors (see Sidebar).
Let's look now how it's implemented. Listing 3 shows the definition of
the readdir_sequence class. It provides
end() methods, which return iterators of member type
const_iterator, since only non-mutating access to
the entries is provided. Hence,
provides a read-only view on a directory.
It also provides an
empty() method, which tests
deliberately omitted a
size() member, since the size
could only be obtained by conducting an enumeration over the
range, which is a costly operation. Not providing this method is
an unequivocal documentation of this fact . If one really wants
to calculate the range size, it can be done via
For convenience to client code, a
method is provided which returns the search directory, having
ensured, in the constructor, that it has a trailing path name
separator ('/'). This means that if you need to express the
returned values in absolute form, you can do so simply, as
dirNames.push_back(dir.get_directory() + (*b)->d_name);
The two parameters to the constructor are the directory to
search and the flags, which control the search. I'll look at
these shortly when I discuss the
Since all the member variables of this class are fully-fledged value types, there is no need to proscribe or provide explicit implementations of either the copy constructor or the copy assignment operator. The compiler provided ones will work quite nicely and safely.
Another notable aspect to the class is that the type of the
m_directory, depends on whether the
PATH_MAX is defined . It if is, then the
operating environment has a fixed maximum path limit, and the
class uses the STLSoft string
which has an internal array of the given size, so there's no
memory allocation involved. If
PATH_MAX is not
defined, then the operating environment does not have a fixed
maximum path limit, so the STLSoft string
basic_simple_string is used instead . It's a small
efficiency, certainly, but I'm funny like that.
A further saving comes from the use of the
m_scratch member. Since the directory is constant
for the life of the sequence/iterator, and it's contents are not
revealed directly outside the iterator instance, you are able to
reuse it for constructing the full path names of the entries, in
order to call
stat() on them. The
m_dirLen member remembers the original length of the
directory alone, so it can be truncated to that length (which
includes path name separator) ready for each entry name to be
appended to it.
opendir() API provides a two-step process to
directory enumeration (in contrast to, say, Win32's
FindFirstFile()), so the sequence class starts the
begin() method—and hands the DIR
pointer over to the iterator to walk through the matched
4 shows the definition of the
readdir_sequence::const_iterator class, which is
where all the action happens. Because
affords only single-pass manipulation, only the Input Iterator
concept  is
supported. Iterator copy construction and copy assignment
semantics are supported by the use of a reference-counted shared
handle, of type
, to support
iterator instance copying. Multiple concurrent enumerations may
be conducted by calling
begin() but since we're
dealing with file systems, whose contents may change at any time
as a result of other processes, subsequent ranges obtained from
begin() may not contain the same elements.
opendir() API returns all the entries in a
particular directory, irrespective of whether they are files or
directories, and includes the dots directories—"." and "..".
Specifying files without directories, or vice versa, as the
second argument to the
causes only the matching entries to be returned, which is a nice
convenience. If neither are specified, it defaults to returning
both, because this conforms to the API's behaviour and also
because it is more efficient, as you'll see shortly. Since most
directory enumeration—in my experience, at least—is not concerned
with the dots directories, they are elided from the enumeration
range by default. Specifying the
readdir_sequence::includeDots flag causes them to be
included in the range.
Entry filtering is performed within
Detection of files and/or directories is done by calling
stat(), but this is only called when one type or the
other is to be returned; hence no unnecessary costs ensue
. If the
entry does not match, the
for loop is not broken,
and the next entry is retrieved. Dot elision is similarly done by
detecting whether the entry is "." or ".." . When
NULL the enumeration
is complete, and the iterator enters a state whereby a comparison
with that returned by
end() would evaluate to true,
terminating the client iteration loop (assuming it's written
In my next article, I'll describe the mapping of the UNIX
glob() API, which supports a more refined STL
Iterator model , and presents a
number of different challenges to providing a simple and
efficient sequence class. If you're interested, you can download
the STLSoft libraries here.
std::basic_string, since you can use algorithms for them. It is compatible with the IOStreams, while having no knowledge of them whatsoever. But I'm not given to attempting to sell its use to others; we can settle on it being an internal STLSoft implementation component.
readdir_sequenceis not written to be thread-safe, so
rds_shared_handledoes not use atomic integer operations. If you want thread-safety, you must handle that in client code.
Discuss this article in the Articles Forum topic, Reading Unix-style Directories via STL-compliant Sequences.