|
|
|
Sponsored Link •
|
|
Advertisement
|
Despite the fact that glob_sequence provides a more
refined iterator concept (Random Access [7]) than does
readdir_sequence (Input Iterator [8]), its implementation (see globsequencemethods.h)
is conceptually simpler. This is because glob() returns an
array of pointers to entries, hence, the const_iterator type
is char const** [9]. Because it provides Random
Access iterators, glob_sequence is able to also provide a
subscript operator, within which the requested index is subject to a
validity assertion, and also the rbegin() and
rend() functions that provide reverse iterators to the
managed range. All we would need to do in a dumb mapping is provide
pointers to the first (&gl_pathv[0]) and one-past-the-end
(&gl_pathv[gl_pathc]) locations in the array for our
begin() and end() iterators.
Because we want to be able to select files and/or directories, and trim
dots directories, we need to put in a little more effort. As demonstrated
in [1] and [3], when dealing with sequential
APIs such as opendir()/readdir() on UNIX, or
FindFirstFile() on Win32, it's relatively straightforward to
remove an element from the enumerated range: just call
(readdir() or FindNextFile()) again. Such logic
can be added to the iterator class. But glob() returns the
entire unfiltered set en bloc, and to continue to support the
Random Iterator concept we need to ensure that the pointers to all the
selected entries are stored contiguously. This means we have to
manipulate the pointers in situ.
All the action takes place in the glob_init_() member
function, called by the two constructors. Each constructor initialises
the m_flags members via the validate_flag_()
static method, which adds both files and directories to the flags in the
case where neither is specified in the constructor arguments.
glob_init_() performs three main functions. First, it
concatenates the directory, if specified, and the search pattern, and
translates the directory to an absolute path if absolutePath is specified.
Second, it calls glob(), presenting the composite
directory+pattern and the appropriate translation of the
glob_sequence constructor flags (markDirs,
noSort, files/directories) to those of the
glob() API. If the call to glob() fails, a
glob_sequence_exception is thrown.
The final role of glob_init_() is to process the results.
This is done in four parts. First, if there are to be any changes, the
entry pointers are copied into the member buffer m_buffer,
which is an instance of stlsoft::auto_buffer [10]. This is to allow us to
manipulate the array of pointers without altering the contents of the
buffer returned by glob(), which would be very poor form [11].
Second, if the dots directories are to be elided, the function searches
through the array of pointers until it's found them both. For each one
found the pointer is swapped with the one at the base of the array, and
then the base advanced one and the item count dropped. In the first
implementation I simply incremented m_base past
"." and ".." if they were present, since they
were always the first two (or one) entries in the array. Unfortunately,
this was only an artefact of my test environment [12], and when I moved to a
different platform I quickly saw the error of my ways. This is an
important point: when doing cross-platform development, you must test on
several platforms.
Since glob() will append a path name separator to
directories if asked, we cannot simply test the entry names for
equivalence to "." or ".." using the
filesystem_traits::is_dots() function we saw in the
implementation of readdir_sequence. Instead, the member
functions is_dots_maybe_slashed_() and
is_end_of_path_elements_() provide the requisite
functionality, and enable the processing to be terminated once both dots
directories are found in a given set of results.
The third part of the filtering is to elide directories. Since
glob() supports the GLOB_ONLYDIR to request only
directories, we can trust it to not return files when we've not asked for
them with the files flags. There's no flag to ask glob() to
return only files, however, so we need to do that ourselves. As you saw
in globsequencemethods.h, this is
achieved in the same manner as for the dots elision. This is done in one
of two ways. If we asked for directories to be marked with a trailing
path name separator (with the markDirs flag) then we simply
test whether the entry name is so marked. If it is, then we do the
swap/advance. If not, we test it with stat(), and
swap/advance if it's not a file.
All this swapping and advancing is quite efficient, but it leaves the
entries out of their original order. Hence, the last part of the
filtering task for glob_init_() is to call
std::sort() on the remaining entries if the
noSort flag was not specified, and if the resultant number of
entries is different to that returned by glob().
The implementation is available online [2] as part of the STLSoft
libraries, which includes sample/test programs. This class is also used
within the UNIX implementation of the recls library [13].
[See the sidebar, Taking Exception to
Error Handling for more information on glob_sequence and
exception safety.]
You may be wondering, given the power of glob() /
glob_sequence, why should one ever want to use readdir() /
readdir_sequence? There are several reasons:
glob() is implemented in terms of readdir().
Consequently, it's inevitable that if your requirement is more suitable to
readdir(), then using readdir() /
readdir_sequence directly will be more efficient.
glob() returns all the matching entries in an
array. Naturally, the search has to be conducted to completion before any
results may be returned. If you are looking for a subset of entries in a
given search, it may well be more efficient to be able to search and
process them in a step-wise fashion with readdir(),
especially if the entries you are after may occur earlier in the search.
glob() (or
glob_sequence) accepts a pattern that may contain wildcards,
and those wildcards may also appear in the directory parts of the given
pattern, it increases the complexity of your program. If you want to
search a given directory, then you constrain this search more clearly by
using readdir(), which will balk at being given any
wildcards.
Notwithstanding these specific aspects, glob() is a very
powerful and useful tool on UNIX. Using it via glob_sequence
affords all the advantages over the raw API demonstrated here, and costs
you nothing.
FindFile enumeration API to STL iterators, in the form of the
WinSTL (http://winstl.org)
(basic_)findfile_sequence<> class. These three file
enumeration classes represent an interesting spectrum of models:
readdir_sequence does not take wildcards and returns all
files and/or directories in a single directory;
findfile_sequence does the same, but can specify wildcards
for filtering the entries; glob_sequence may use wildcards in
the entry names and in the directory names. If you keep wildcards out of
the directory when using glob_sequence, then it has virtually
identical semantics as findfile_sequence, and the two may be
used to provide the file-searching functionality in cross-platform
developments: client code can remain largely unchanged, save for the
selection of the requisite component from its namespace [5, 6]
stlsoft::pointer_iterator<value_type const , const_pointer ,
const_reference >::iterator_type. This is to ensure that it is
compatible with standard library implementations whose random-access
iterators are actually implemented as class objects. This could be the
subject of an entire article in itself—and probably will be in the
future—so you?ll have to trust me for now. :-)
auto_buffer<> is an optimised dynamically (re-)sizable
memory buffer, suitable for manipulating arrays of POD types. In most
cases, its use in glob_sequence won't result in any memory
allocation anyway, since the internal array size is set to 64 (see globsequence.h in this article).
globfree() is not going
to check that the order of the pointers in the gl_pathv
buffer. It seemed too fantastic to me to imagine that any
implementation would. Note that I was not so cavalier with the actual
contents of the pointer array. Specifically they were swapped in situ,
rather than, say, setting to NULL the ones selected out, in case the
glob() implementation allocated the storage for each entry
individually, rather than in a single block. Nonetheless, a few years
older and wiser, and this assumption seemed much less smart. Making
assumptions about the insides of APIs you've not implemented is a very bad
habit to get into, and this instance certainly didn't warrant taking the
risks.
glob() for Win32 using the
FindFirstFile() API, available online at
http://synesis.com.au/software/index.html#unixem. The initial version
returned "." and ".." as the first two items—because
FindFirstFile() orders the matched items—hence my
consternation when I moved the class and test program to one of my Linux
boxes.
recls ("recursive ls") library ( http://recls.org/) is a platform-independent
search library, which provides recursive enumeration of file-system and,
as of version 1.5, FTP systems. It has a C-API, and provides mappings to
several languages/technologies, including C++, C#/.NET, COM, D, Java, and
STL. It has been the exemplar for the first few installments of my
Positive Integration column, for C/C++ Users Journal (http://www.cuj.com/).
FindFileChangeNotification()
function, which means that application code is inherently biased towards
looking at file-system contents as static. Given the fact that UNIX
focuses, or originally focused, on short-lived discrete tasks, this is
understandable. Nonetheless, it is certainly possible for file-system
contents to change, and when writing file-system code, one needs to be
prepared to handle it.
|
Sponsored Links
|