How To Walk A Directory Tree in Jython
In this article, Frank Cohen shows how to write a Jython script to walk through the contents of a nested set of file directories.
Python has gained features steadily over the years, but Jython has not kept up: Jython 2.1 is still at the Python 2.1 level, and Jython development is only slowly moving toward Python 2.3. One place where this hurts is the lack of a simple way to walk through a nested set of file directories, because Jython 2.1 does not have the os.walk() function introduced in Python 2.3.
I often find I need an os.walk()-like function when working with TestMaker and a set of results files. Imagine being given an archive of results from a series of load tests. The results come in a set of nested directories with files ending in .XML. The following Jython 2.1 script shows how to implement an os.walk-like function.
import os

class DirectoryWalk:
    '''
    Walks a filesystem directory and performs an operation on
    each file.
    '''

    def __init__( self, directory ):
        ''' Initialize the DirectoryWalk class '''
        self.dir = directory

    def walk( self, process ):
        '''walk a directory tree'''
        print "dir =", self.dir
        self.process = process
        for self.f in os.listdir( self.dir ):
            self.fullpath = os.path.join( self.dir, self.f)
            if os.path.isdir( self.fullpath ) and not os.path.islink( self.fullpath ):
                self.dw = DirectoryWalk( self.fullpath )
                self.dw.walk( self.process )
            if os.path.isfile( self.fullpath ):
                self.s = str( self.fullpath )
                if self.fullpath[-4:].upper()=='.XML':
                    print self.f
                    self.process.tallyxml( self.fullpath )
This script implements a DirectoryWalk class with a walk() method. The method finds every file ending in .XML in the given directory and in any of its subdirectories, and hands each one to a callback object.
To explain the script in more depth, I will comment on each part in turn.
class DirectoryWalk:
    '''
    Walks a filesystem directory and performs an operation on
    each file.
    '''

    def __init__( self, directory ):
        ''' Initialize the DirectoryWalk class '''
        self.dir = directory
The code above defines the DirectoryWalk class. I used a class because it makes the recursive pattern easier to implement, as you will see shortly. The __init__ method takes the directory path at which the walk starts.
    def walk( self, process ):
        '''walk a directory tree'''
        print "dir =", self.dir
        self.process = process
The code above defines the walk() method and stores the callback object passed in as process. The path of each matching file is later sent to that object's tallyxml() method.
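The only thing walk() asks of the callback object is a tallyxml() method that accepts one path. A minimal sketch of such an object follows; the class name XmlTally and its attributes are hypothetical and are not part of TestMaker. It is written for a current Python interpreter and has not been tried on Jython 2.1.

```python
import os

class XmlTally:
    '''Hypothetical callback object: records each path it is handed.'''

    def __init__(self):
        self.paths = []

    def tallyxml(self, fullpath):
        # Called once for every matching file the walker finds.
        self.paths.append(fullpath)

    def count(self):
        return len(self.paths)


# Stand-alone demonstration: call the callback directly, the way
# DirectoryWalk.walk() would for each .XML file it encounters.
tally = XmlTally()
tally.tallyxml(os.path.join('results', 'run1', 'summary.XML'))
tally.tallyxml(os.path.join('results', 'run2', 'summary.XML'))
print(tally.count())   # prints 2
```

With the class from this article, the same object would be passed in as DirectoryWalk( top ).walk( tally ), where top is the starting directory.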
        for self.f in os.listdir( self.dir ):
            self.fullpath = os.path.join( self.dir, self.f)
The os.listdir function returns a list of the names of everything in the directory, both subdirectories and files. The names do not include the parent path, so the script iterates through the list and joins each name to the directory with os.path.join to get a full path.
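The listing step can be tried on its own. The sketch below uses a hypothetical helper, list_entries, to show that os.listdir yields bare names and that each name has to be joined to its parent before os.path.isdir can test it. It is an illustration for a current Python interpreter, not code taken from the script.

```python
import os

def list_entries(directory):
    '''Return a sorted list of (fullpath, is_directory) pairs.'''
    names = os.listdir(directory)   # bare names only, in arbitrary order
    names.sort()
    entries = []
    for name in names:
        # The bare name must be joined to its parent before it can be tested.
        fullpath = os.path.join(directory, name)
        entries.append((fullpath, os.path.isdir(fullpath)))
    return entries


# Show the entries of the current working directory.
for fullpath, is_directory in list_entries(os.curdir):
    print(fullpath)
```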
            if os.path.isdir( self.fullpath ) and not os.path.islink( self.fullpath ):
                self.dw = DirectoryWalk( self.fullpath )
                self.dw.walk( self.process )
If the entry is a directory, and not a symbolic link to a directory, then the script instantiates a new DirectoryWalk object to walk down that directory, passing along the same callback object. This is the recursive pattern: each subdirectory is processed by its own instance.
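The same recursion can be written as a plain function, which may make the pattern easier to see in isolation. This is a sketch with a hypothetical function name, subdirectories, written for a current Python interpreter; it applies the same isdir and islink tests as the script.

```python
import os

def subdirectories(top):
    '''Return top plus every directory beneath it, skipping symbolic links.'''
    found = [top]
    for name in os.listdir(top):
        fullpath = os.path.join(top, name)
        # Same test as the article: a real directory, not a link to one.
        if os.path.isdir(fullpath) and not os.path.islink(fullpath):
            # Recurse: the nested call handles everything below fullpath.
            found.extend(subdirectories(fullpath))
    return found


for directory in subdirectories(os.curdir):
    print(directory)
```

Skipping links matters because a link that points back at one of its own parent directories would otherwise make the recursion loop until it fails.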
            if os.path.isfile( self.fullpath ):
                self.s = str( self.fullpath )
                if self.fullpath[-4:].upper()=='.XML':
                    print self.f
                    self.process.tallyxml( self.fullpath )
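This part of the script identifies the files that we are interested in. In the above case, we are looking for files that end in .XML. Because the last four characters are converted to upper case before the comparison, files ending in .xml match as well.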
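To look for a different file type, the suffix test can be pulled out into a small helper. The function below, has_suffix, is a hypothetical name used only for this sketch; it performs the same case-insensitive comparison as the slice in the script, but for a suffix of any length.

```python
def has_suffix(path, suffix):
    '''True if path ends with suffix, ignoring upper and lower case.'''
    return path.upper().endswith(suffix.upper())


print(has_suffix('results/run1/summary.XML', '.xml'))
print(has_suffix('results/run1/summary.xml.bak', '.xml'))
```

The first path matches and the second does not, since only the very end of the name is compared.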
Eventually the Jython project will release a version that is functionally equivalent to Python 2.3, and the above approach to walking a directory tree will no longer be needed. Until then I find this to be a simple and effective way to process directory hierarchies.
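For comparison, here is roughly what the same search looks like on an interpreter that does provide os.walk(), that is, Python 2.3 or later. The function name find_xml is hypothetical, and the sketch returns the matching paths in a list instead of calling a callback object.

```python
import os

def find_xml(top):
    '''Return the paths of all .XML files under top, using os.walk().'''
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.upper().endswith('.XML'):
                matches.append(os.path.join(dirpath, name))
    return matches


for path in find_xml(os.curdir):
    print(path)
```

By default os.walk() does not descend into symbolic links to directories, which matches the islink test in the script above.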
-Frank
After I wrote the above article, Kent Johnson sent me his os.walk-like module. It has a much cleaner design than mine. So here is Kent's code:
Hi Frank: Here is an os.walk() work-alike for Jython 2.1. I have tested it and it seems to work but no guarantees. It has the same interface as os.walk() without the optional arguments (though they would not be hard to add).
Kent
import os

class walk:
    '''
    walk() generates the file names in a directory tree, by walking the
    tree top down. For each directory in the tree rooted at directory top
    (including top itself), it yields a 3-tuple
    (dirpath, dirnames, filenames).

    This is os.walk() implemented with the old-style iterator protocol.
    '''

    def __init__(self, dirpath):
        self.dirpath = dirpath
        self.filenames = []
        self.dirnames = []

        for f in os.listdir(dirpath):
            if os.path.isdir(os.path.join(dirpath, f)):
                self.dirnames.append(f)
            else:
                self.filenames.append(f)

        # Index into self.dirnames; -1 signals to yield current dir
        self.next = -1

    def __getitem__(self, ix):
        if self.next == -1:
            # Yield the current base dir
            self.next = 0
            return (self.dirpath, self.dirnames, self.filenames)

        try:
            # Return the next item from the current subdirectory
            return self.subwalk.__getitem__(None)
        except (AttributeError, IndexError):
            # No subdirectory started or current subdirectory exhausted
            # In either case, try to start the next
            self._next_sub()
            return self.subwalk.__getitem__(None)

    def _next_sub(self):
        ''' Start walking the next sub-directory or raise IndexError '''
        if self.next < len(self.dirnames):
            self.subwalk = walk(os.path.join(self.dirpath, self.dirnames[self.next]))
            self.next += 1
        else:
            # Nothing left to do
            raise IndexError
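The "old-style iterator protocol" mentioned in the docstring is what lets a for loop run over an object that defines only __getitem__: the loop calls __getitem__ with 0, 1, 2 and so on, and stops when IndexError is raised. The sketch below shows the protocol by itself with a hypothetical class, Countdown, that has nothing to do with directories.

```python
class Countdown:
    '''Hypothetical example: iterable through __getitem__ alone.'''

    def __init__(self, start):
        self.start = start

    def __getitem__(self, ix):
        # The for loop passes 0, 1, 2, ... and stops at IndexError.
        if ix >= self.start:
            raise IndexError
        return self.start - ix


values = []
for value in Countdown(3):
    values.append(value)
print(values)   # prints [3, 2, 1]
```

Kent's walk class uses the same mechanism but ignores the index it is given and keeps its own position, so a for loop over walk( top ) receives one (dirpath, dirnames, filenames) tuple per directory, and each walk object can be iterated only once.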


