os.walk() in Python

os.walk() is a function in Python's os module that generates the file names in a directory tree by walking the tree either top-down or bottom-up. It can be used to search for files in a directory hierarchy or to perform operations on all files in a directory tree.

Syntax:
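
The signature, as given in the Python standard library documentation, is shown in the comment below. The call underneath it is only a sketch: the starting directory "." is an assumption chosen for illustration, and the keyword arguments simply spell out the documented defaults.

```python
import os

# Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)

# Calling os.walk() only creates the generator; no directory is read
# until iteration starts. "." is an illustrative starting directory.
walker = os.walk(".", topdown=True, onerror=None, followlinks=False)
```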

The parameters for os.walk():

  • top: It is the starting directory for the walk.
  • topdown: It is a Boolean value that determines whether the walk is done top-down (from the root directory down to the leaves) or bottom-up (from the leaves up to the root directory).
  • onerror: It is an optional function that is called with an OSError instance when listing a directory fails during the walk. If onerror is not specified (the default is None), such errors are ignored and the walk continues.
  • followlinks: This Boolean value determines whether the walk follows symbolic links. If followlinks is True, symbolic links are followed; if it is False, they are not.

os.walk() returns a generator object that yields a 3-tuple (dirpath, dirnames, filenames) for each directory in the directory tree. Here's what each part of the tuple represents:

  • dirpath: It is a string that contains the path of the directory.
  • dirnames: It is a list of the names of the subdirectories in dirpath (excluding '.' and '..').
  • filenames: It is a list of the names of the non-directory files in dirpath.

Here's an example of how to use os.walk() to print the names of all the files in a directory tree:
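
A minimal sketch of such a loop follows. The helper name print_all_files is hypothetical and chosen only for this illustration; the loop variables follow the tuple names described above.

```python
import os


def print_all_files(top):
    """Print the full path of every file found under the directory top."""
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Join the directory path and the bare file name into a full path.
            print(os.path.join(dirpath, filename))
```

Calling print_all_files(".") would list every file under the current working directory.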

In this example, os.path.join() is used to join dirpath and filename to create the full path of each file. It ensures that the correct path separator is used, regardless of the operating system.

Some other key points about 'os.walk()' in Python:

  1. By default, os.walk() traverses the directory tree top-down. It means that the tuple for a directory is yielded before the tuples for any of its subdirectories. If you want to traverse the directory tree bottom-up, so that a directory is yielded only after all of its subdirectories, you can pass topdown=False as a parameter.
  2. The dirnames list that is returned for each directory includes only the names of the immediate subdirectories of that directory. You do not need to call os.walk() again on them: the walk descends into each subdirectory on its own and yields a separate tuple for it.
  3. If the followlinks parameter is True, os.walk() will follow symbolic links and traverse the linked directory tree. However, be aware that os.walk() does not keep track of the directories it has already visited, so it can recurse infinitely if a symbolic link points to a parent directory of itself.
  4. You can specify a function that is invoked if an error happens during the walk by using the onerror parameter. The function takes one argument, an OSError instance; the path that caused the error is available as the filename attribute of that exception. The function can report the error and let the walk continue, or raise the exception to abort the walk. If onerror is not specified, os.walk() ignores the error and continues the walk.
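
The traversal order and the onerror callback described in the points above can be sketched together as follows. The helper names walk_collecting_errors and remember are hypothetical; only os.walk() and the OSError attributes filename and strerror come from the standard library.

```python
import os


def walk_collecting_errors(top, topdown=True):
    """Return the directories visited under top and any errors reported."""
    errors = []

    def remember(exc):
        # os.walk() calls the handler with a single OSError instance.
        errors.append((exc.filename, exc.strerror))

    visited = [
        dirpath
        for dirpath, _dirnames, _filenames in os.walk(
            top, topdown=topdown, onerror=remember
        )
    ]
    return visited, errors
```

With topdown=True the starting directory comes first in the visited list; with topdown=False it comes last. A directory that cannot be listed shows up in the errors list instead of stopping the walk.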

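Because os.walk() is a generator, it holds the names for only one directory at a time rather than the whole tree, but processing the contents of many files can still be memory-intensive if the directory tree is very large. If you want to minimize memory usage, you can use a with statement to open each file in the directory tree and process it one at a time, so that each file is closed before the next one is opened. Here's an example: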
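
A minimal sketch of this pattern, assuming the per-file work is counting lines (the task and the helper name count_lines_per_file are illustrative choices, not part of os.walk() itself):

```python
import os


def count_lines_per_file(top):
    """Yield (path, line_count) for each readable file under top."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # The with statement closes the file as soon as the block ends.
                with open(path, "rb") as handle:
                    # Iterating the handle reads one line at a time.
                    line_count = sum(1 for _ in handle)
            except OSError:
                # Skip files that cannot be opened or read.
                continue
            yield path, line_count
```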

In this example, each file is opened with a with statement and processed inside the loop, so only one file is open at a time.

os.walk() can be used to perform various operations on files and directories in a directory tree, such as copying, moving, or deleting files. For example, you can use the shutil module to copy all files in a directory tree to a new location:
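
One possible sketch is given below. It assumes the subdirectory layout should be recreated under dest_dir and that dest_dir is not located inside source_dir; the helper name copy_tree_files is hypothetical.

```python
import os
import shutil


def copy_tree_files(source_dir, dest_dir):
    """Copy every file under source_dir to the same relative path in dest_dir."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        # Work out where this directory sits relative to the source root.
        relative = os.path.relpath(dirpath, source_dir)
        target_dir = os.path.normpath(os.path.join(dest_dir, relative))
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            target = os.path.join(target_dir, filename)
            # copy2 copies the contents and tries to preserve file metadata.
            shutil.copy2(os.path.join(dirpath, filename), target)
            copied.append(target)
    return copied
```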

This code copies all files in the source_dir directory tree to the dest_dir directory.

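If you need to skip certain directories during the walk, you can modify the dirnames list in-place (for example, with del or slice assignment) to remove the unwanted directories. This only affects the walk when topdown is True, because in bottom-up mode the subdirectories have already been visited by the time their parent is yielded. Removing names from filenames does not change the walk itself, but it is a convenient way to choose which files you process. For example, if you want to skip all the files with a particular extension, you can use a list comprehension to filter the filenames list: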
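
A short sketch of both kinds of filtering follows. The helper name walk_filtered, the skipped directory names, and the ".tmp" extension are assumptions made up for this illustration.

```python
import os


def walk_filtered(top, skip_dirs=(".git", "__pycache__"), skip_ext=".tmp"):
    """Yield file paths under top, pruning skip_dirs and ignoring skip_ext files."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment edits the list in-place, so os.walk() sees the
        # change and never descends into the removed directories.
        dirnames[:] = [name for name in dirnames if name not in skip_dirs]
        # Filtering filenames only affects which files this loop handles.
        wanted = [name for name in filenames if not name.endswith(skip_ext)]
        for filename in wanted:
            yield os.path.join(dirpath, filename)
```

Rebinding the name instead (dirnames = [...]) would create a new list that os.walk() never looks at, which is why the slice assignment matters for pruning.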





