Although glob_sequence provides a more refined iterator concept
(Random Access) than does readdir_sequence (Input Iterator), its
implementation (see globsequencemethods.h) is conceptually simpler.
This is because glob() returns an array of pointers to entries; hence
the iterator type is, in effect, char const**. Because it provides
Random Access iterators, glob_sequence is also able to provide a
subscript operator, within which the requested index is subject to a
validity assertion, as well as the rbegin() and rend() functions that
provide reverse iterators to the managed range. All we would need to do
in a dumb mapping is provide pointers to the first (&gl_pathv[0]) and
one-past-the-end (&gl_pathv[gl_pathc]) locations in the array for our
begin() and end() methods.
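To make the "dumb mapping" concrete, here is a minimal sketch. The class name simple_glob and its members are hypothetical and are not the article's glob_sequence; the sketch assumes a POSIX glob() and, as a design choice of the sketch only, treats GLOB_NOMATCH as an empty range rather than an error.

```cpp
#include <glob.h>       // POSIX glob(), globfree(), glob_t

#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>

// Hypothetical wrapper illustrating the unfiltered mapping only.
class simple_glob
{
public:
    typedef char const* const*                      const_iterator;
    typedef std::reverse_iterator<const_iterator>   const_reverse_iterator;

    explicit simple_glob(char const* pattern)
        : m_gl()        // zero-initialise the glob_t
        , m_count(0)
    {
        int const r = ::glob(pattern, 0, NULL, &m_gl);

        if (0 != r && GLOB_NOMATCH != r)
        {
            ::globfree(&m_gl);  // release any partial results
            throw std::runtime_error("glob() failed");
        }
        m_count = (0 == r) ? m_gl.gl_pathc : 0;
    }
    ~simple_glob() { ::globfree(&m_gl); }

    simple_glob(simple_glob const&) = delete;
    simple_glob& operator =(simple_glob const&) = delete;

    // The iterators are plain pointers into the array that glob() returned.
    const_iterator begin() const { return m_gl.gl_pathv; }
    const_iterator end() const   { return begin() + m_count; }

    const_reverse_iterator rbegin() const { return const_reverse_iterator(end()); }
    const_reverse_iterator rend() const   { return const_reverse_iterator(begin()); }

    std::size_t size() const { return m_count; }

    // Subscripting is subject to a validity assertion.
    char const* operator [](std::size_t index) const
    {
        assert(index < m_count);
        return m_gl.gl_pathv[index];
    }

private:
    glob_t      m_gl;
    std::size_t m_count;
};
```

Because the iterators are raw pointers, any standard algorithm that requires random access can be applied to the range from begin() to end() directly.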
Because we want to be able to select files and/or directories, and to
trim the dots directories ("." and ".."), we need to put in a little
more effort. As demonstrated elsewhere, when dealing with sequential
APIs such as opendir()/readdir() on UNIX, or
FindFirstFile()/FindNextFile() on Win32, it's relatively
straightforward to remove an element from the enumerated range: just
call readdir() (or FindNextFile()) again. Such logic can be added to
the iterator class. But glob() returns the entire unfiltered set en
bloc, and to continue to support the Random Access Iterator concept we
need to ensure that the pointers to all the selected entries are stored
contiguously. This means we have to manipulate the pointers in situ.
All the action takes place in the glob_init_() function, called by the
two constructors. Each constructor initialises its m_flags member via a
static method, which adds both files and directories to the flags in
the case where neither is specified in the constructor arguments.
glob_init_() performs three main functions. First, it
concatenates the directory, if specified, and the search pattern, and
translates the directory to an absolute path if absolutePath is specified.
Second, it calls
glob(), presenting the composite
directory+pattern and the appropriate translation of the
glob_sequence constructor flags (
files/directories) to those of the
glob() API. If the call to
glob() fails, a
glob_sequence_exception is thrown.
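The second step might look something like the following sketch. The enumerator names echo the flags mentioned in the text, but their values, the translate_flags() and checked_glob() functions, and the glob_error class are all hypothetical stand-ins rather than the library's actual declarations. GLOB_ONLYDIR is an extension that is not available everywhere, so it is guarded; treating GLOB_NOMATCH as "no results" rather than a failure is a choice made for this sketch.

```cpp
#include <glob.h>       // POSIX glob(), glob_t, GLOB_* flags

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sequence flags; the values are illustrative only.
enum search_flags
{
    files       = 0x01,
    directories = 0x02,
    markDirs    = 0x04,
    noSort      = 0x08
};

// Translates the (hypothetical) sequence flags to those of the glob() API.
int translate_flags(unsigned flags)
{
    unsigned const kinds = flags & (files | directories);

    assert(0 != kinds); // the constructors are assumed to have defaulted this

    int globFlags = 0;

    if (flags & markDirs)
    {
        globFlags |= GLOB_MARK;     // append '/' to directory entries
    }
    if (flags & noSort)
    {
        globFlags |= GLOB_NOSORT;   // leave entries in directory order
    }
#ifdef GLOB_ONLYDIR
    if (directories == kinds)
    {
        globFlags |= GLOB_ONLYDIR;  // directories requested, files not
    }
#endif
    return globFlags;
}

// Hypothetical exception type carrying the glob() return code.
class glob_error : public std::runtime_error
{
public:
    explicit glob_error(int code)
        : std::runtime_error("glob() failed")
        , m_code(code)
    {}
    int code() const { return m_code; }
private:
    int m_code;
};

// Calls glob() and throws if it reports anything other than success or
// "no matches". The caller owns *results and must call globfree() on it.
void checked_glob(std::string const& pattern, unsigned flags, glob_t* results)
{
    int const r = ::glob(pattern.c_str(), translate_flags(flags), NULL, results);

    if (0 != r && GLOB_NOMATCH != r)
    {
        throw glob_error(r);
    }
}
```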
The final role of glob_init_() is to process the results. This is done
in four parts. First, if there are to be any changes, the entry
pointers are copied into a member buffer, which is an instance of
stlsoft::auto_buffer. This allows us to manipulate the array of
pointers without altering the contents of the buffer returned by
glob(), which would be very poor form.
Second, if the dots directories are to be elided, the function searches
through the array of pointers until it has found them both. For each
one found, the pointer is swapped with the one at the base of the
array, the base is advanced by one, and the item count is decremented.
In the first implementation I simply skipped past "." and ".." if they
were present, since they were always the first two (or one) entries in
the array. Unfortunately, this was only an artefact of my test
environment, and when I moved to a different platform I quickly saw the
error of my ways. This is an important point: when doing cross-platform
development, you must test on more than one platform. Because glob()
will append a path name separator to directories if asked, we cannot
simply test the entry names for "." and ".." using the
filesystem_traits::is_dots() function we saw in the readdir_sequence.
Instead, helper members, including is_end_of_path_elements_(), provide
the requisite functionality, and enable the processing to be terminated
once both dots directories are found in a given set of results.
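The swap/advance technique can be sketched as follows. The functions is_dots_entry() and elide_dots() are hypothetical names used for illustration, not the library's members, and the sketch assumes '/' is the only path name separator. As in the text, the loop stops once two dots entries have been found, which assumes the results come from a single directory.

```cpp
#include <algorithm>    // std::swap (pre-C++11 location)
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>      // std::swap

// Returns true if the last path element is "." or "..", tolerating one
// trailing '/' such as GLOB_MARK would append. Hypothetical helper.
bool is_dots_entry(char const* path)
{
    std::size_t len = std::strlen(path);

    if (len > 0 && '/' == path[len - 1])
    {
        --len;                      // ignore the directory mark
    }

    std::size_t start = len;

    while (start > 0 && '/' != path[start - 1])
    {
        --start;                    // walk back to the start of the element
    }

    std::size_t const n = len - start;

    return  (1 == n && '.' == path[start]) ||
            (2 == n && '.' == path[start] && '.' == path[start + 1]);
}

// Moves any dots entries to the front of [base, base + count), then returns
// the advanced base and reduces count accordingly. Only the pointers move.
char const** elide_dots(char const** base, std::size_t& count)
{
    assert(NULL != base || 0 == count);

    char const** const  end     =   base + count;
    int                 found   =   0;

    for (char const** p = base; p != end && found < 2; ++p)
    {
        if (is_dots_entry(*p))
        {
            std::swap(*p, *base);   // park the dots entry at the base
            ++base;                 // ... and step the base past it
            --count;
            ++found;
        }
    }

    return base;
}
```

Note that the entry swapped away from the base has always been examined already, so nothing is skipped; the cost is that the surviving entries may no longer be in their original order.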
The third part of the filtering is to elide directories. Since glob()
supports the GLOB_ONLYDIR flag to request only directories, we can
trust it to not return files when we've not asked for them with the
files flag. There's no flag to ask glob() to return only files,
however, so we need to do that ourselves. As you saw in
globsequencemethods.h, this is achieved in the same manner as for the
dots elision, with the test on each entry done in one of two ways. If
we asked for directories to be marked with a trailing path name
separator (with the markDirs flag), then we simply test whether the
entry name is so marked; if it is, we do the swap/advance. If the flag
was not specified, we test the entry's file type, and do the
swap/advance if it's not a file.
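A sketch of the two tests follows. The function names are hypothetical, and the use of stat() with S_ISREG for the unmarked case is an assumption of this sketch; the text does not show which call the library uses there.

```cpp
#include <sys/stat.h>   // POSIX stat(), S_ISREG

#include <algorithm>    // std::swap (pre-C++11 location)
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>      // std::swap

// Decides whether an entry should be dropped when only files are wanted.
// Hypothetical helper; dirsAreMarked corresponds to the markDirs flag.
bool should_elide(char const* path, bool dirsAreMarked)
{
    if (dirsAreMarked)
    {
        // Marked directories end in a path name separator: no system call.
        std::size_t const len = std::strlen(path);

        return len > 0 && '/' == path[len - 1];
    }
    else
    {
        // Otherwise ask the file system; drop anything that is not a
        // regular file (including entries that cannot be examined).
        struct stat st;

        return 0 != ::stat(path, &st) || !S_ISREG(st.st_mode);
    }
}

// Same swap/advance scheme as for the dots directories, but without an
// early exit, since any number of entries may be directories.
char const** elide_directories(char const** base, std::size_t& count, bool dirsAreMarked)
{
    assert(NULL != base || 0 == count);

    char const** const end = base + count;

    for (char const** p = base; p != end; ++p)
    {
        if (should_elide(*p, dirsAreMarked))
        {
            std::swap(*p, *base);
            ++base;
            --count;
        }
    }

    return base;
}
```

The marked case is the cheaper of the two, since it only inspects the final character of each name.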
All this swapping and advancing is quite efficient, but it leaves the
entries out of their original order. Hence, the last part of the
filtering task for glob_init_() is to call std::sort() on the remaining
entries if the noSort flag was not specified, and if the resultant
number of entries is different to that returned by glob().
[See the sidebar, "Taking Exception to Error Handling", for more
information.]
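The conditional re-sort at the end of the filtering can be sketched like this. The function restore_order() and the comparison function are hypothetical, and ordering by strcmp() is an assumption: a given glob() implementation may collate its results differently.

```cpp
#include <algorithm>    // std::sort
#include <cassert>
#include <cstddef>
#include <cstring>

// Orders C-style strings by their contents rather than by address.
bool cstr_less(char const* lhs, char const* rhs)
{
    return std::strcmp(lhs, rhs) < 0;
}

// Re-sorts the surviving entries, but only when sorting was wanted and the
// filtering actually removed something (and so may have disturbed the order).
void restore_order(char const** base, std::size_t count, std::size_t originalCount, bool noSort)
{
    assert(count <= originalCount);

    if (!noSort && count != originalCount)
    {
        std::sort(base, base + count, cstr_less);
    }
}
```

Skipping the sort when the counts match avoids redundant work in the common case where nothing was elided.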
You may be wondering, given the power of glob_sequence, why one should
ever want to use readdir_sequence. There are several reasons:
glob() is implemented in terms of readdir(). Consequently, if your requirement is better suited to readdir(), then using readdir() / readdir_sequence directly will inevitably be more efficient.
glob() returns all the matching entries in an array. Naturally, the search has to be conducted to completion before any results may be returned. If you are looking for a subset of entries in a given search, it may well be more efficient to search and process them in a step-wise fashion with readdir(), especially if the entries you are after may occur early in the search.
Because glob() (and therefore glob_sequence) accepts a pattern that may contain wildcards, and those wildcards may also appear in the directory parts of the given pattern, it increases the complexity of your program. If you want to search a given directory, then you constrain this search more clearly by using readdir(), which will balk at being given any wildcards.
Notwithstanding these specific aspects,
glob() is a very
powerful and useful tool on UNIX. Using it via glob_sequence
affords all the advantages over the raw API demonstrated here, and costs
FindFile enumeration API to STL iterators, in the form of the WinSTL (http://winstl.org)
(basic_)findfile_sequence<> class. These three file enumeration classes represent an interesting spectrum of models:
readdir_sequence does not take wildcards and returns all files and/or directories in a single directory;
findfile_sequence does the same, but can specify wildcards for filtering the entries;
glob_sequence may use wildcards in the entry names and in the directory names.
If you keep wildcards out of the directory when using glob_sequence, then it has virtually identical semantics to findfile_sequence, and the two may be used to provide the file-searching functionality in cross-platform developments: client code can remain largely unchanged, save for the selection of the requisite component from its namespace [5, 6].
stlsoft::pointer_iterator<value_type const, const_pointer, const_reference>::iterator_type. This is to ensure that it is compatible with standard library implementations whose random-access iterators are actually implemented as class objects. This could be the subject of an entire article in itself (and probably will be in the future), so you'll have to trust me for now. :-)
auto_buffer<> is an optimised dynamically (re-)sizable memory buffer, suitable for manipulating arrays of POD types. In most cases, its use in glob_sequence won't result in any memory allocation anyway, since the internal array size is set to 64 (see globsequence.h in this article).
globfree() is not going to check the order of the pointers in the gl_pathv buffer. It seemed too fantastic to me to imagine that any implementation would. Note that I was not so cavalier with the actual contents of the pointer array. Specifically, they were swapped in situ, rather than, say, setting to NULL the ones selected out, in case the glob() implementation allocated the storage for each entry individually, rather than in a single block. Nonetheless, a few years older and wiser, this assumption seemed much less smart. Making assumptions about the insides of APIs you've not implemented is a very bad habit to get into, and this instance certainly didn't warrant taking the risks.
glob() for Win32 using the FindFirstFile() API, available online at http://synesis.com.au/software/index.html#unixem. The initial version returned "." and ".." as the first two items, because FindFirstFile() orders the matched items; hence my consternation when I moved the class and test program to one of my Linux boxes.
ls") library (http://recls.org/) is a platform-independent search library, which provides recursive enumeration of file-system and, as of version 1.5, FTP systems. It has a C-API, and provides mappings to several languages/technologies, including C++, C#/.NET, COM, D, Java, and STL. It has been the exemplar for the first few installments of my Positive Integration column, for C/C++ Users Journal (http://www.cuj.com/).
FindFileChangeNotification() function, which means that application code is inherently biased towards looking at file-system contents as static. Given the fact that UNIX focuses, or originally focused, on short-lived discrete tasks, this is understandable. Nonetheless, it is certainly possible for file-system contents to change, and when writing file-system code, one needs to be prepared to handle it.