
How to work with tarball/tar files in Python

Learn to manage tar files using the tarfile standard module.


TAR (short for tape archive) is a format used to bundle a set of files into a single file. This is especially helpful when archiving older files or sending many files over a network.

Python ships with the tarfile standard module, which can read and write tar files and supports gzip, bz2, and lzma compression.

In this article, we will see how tarfile is used to read and write tar files in Python.

Reading a tar file

The tarfile.open function is used to read a tar file. It returns a tarfile.TarFile object.

The two most important arguments are name and mode: the former is the path to the tar file, and the latter is the mode in which the file should be opened ("r" to read, "w" to write, "a" to append).

The mode can optionally be paired with a compression method using the syntax mode[:compression], for example "r:gz" to read a gzip-compressed archive. When reading, plain "r" detects the compression automatically.

Following are the abbreviations for supported compression techniques:

  • gz for gzip
  • bz2 for bzip2
  • xz for lzma

Example:

import tarfile

with tarfile.open("sample.tar", "r") as tf:
    print("Opened tarfile")
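The example above opens an uncompressed archive. The sketch below shows the mode[:compression] syntax in practice, assuming a throwaway temporary directory and a hypothetical file named hello.txt: it writes a gzip-compressed archive with "w:gz", then reads it back once with an explicit "r:gz" and once with plain "r", which detects the compression on its own.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input file to archive.
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w") as f:
        f.write("hello, tar\n")

    # "w:gz" creates a new gzip-compressed archive ("w:bz2" and "w:xz" work the same way).
    archive = os.path.join(workdir, "sample.tar.gz")
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(src, arcname="hello.txt")

    # "r:gz" names the compression method explicitly...
    with tarfile.open(archive, "r:gz") as tf:
        names_explicit = tf.getnames()

    # ...while plain "r" (equivalent to "r:*") detects it automatically.
    with tarfile.open(archive, "r") as tf:
        names_auto = tf.getnames()

print(names_explicit)  # ['hello.txt']
print(names_auto)      # ['hello.txt']
```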

Extracting tar file contents

After opening a file, you can extract its contents with the tarfile.TarFile.extractall method. Its most important arguments are:

  • path: the directory into which the archive is extracted; defaults to the current directory (".")
  • members: the members to extract; this must be a subset of the list returned by tarfile.TarFile.getmembers(). By default, all members are extracted.

Example:

import tarfile

with tarfile.open("sample.tar", "r") as tf:
    print("Opened tarfile")
    tf.extractall(path="./extraction_dir")
    print("All files extracted")
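The members argument makes it possible to extract only part of an archive. The following is a minimal sketch, assuming a temporary directory and two hypothetical files (notes.txt and debug.log): it keeps only the .txt entries from getmembers() and passes them to extractall. Recent Python releases also accept filter="data", which refuses members that would be written outside the target directory; the sketch uses it only when the running interpreter supports it.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small archive from two hypothetical files.
    for name in ("notes.txt", "debug.log"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write(name + "\n")
    archive = os.path.join(workdir, "sample.tar")
    with tarfile.open(archive, "w") as tf:
        tf.add(os.path.join(workdir, "notes.txt"), arcname="notes.txt")
        tf.add(os.path.join(workdir, "debug.log"), arcname="debug.log")

    out_dir = os.path.join(workdir, "extraction_dir")
    with tarfile.open(archive, "r") as tf:
        # members must come from getmembers(); keep only the .txt entries.
        txt_members = [m for m in tf.getmembers() if m.name.endswith(".txt")]
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tf.extractall(path=out_dir, members=txt_members, filter="data")
        else:
            tf.extractall(path=out_dir, members=txt_members)

    extracted = sorted(os.listdir(out_dir))

print(extracted)  # ['notes.txt']
```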

Extracting a single file

To extract files selectively, pass either a tarfile.TarInfo object or the member's name (its path inside the archive, as a string) to the tarfile.TarFile.extract method.

To list all files inside a tar file, use the tarfile.TarFile.getmembers method, which returns a list of tarfile.TarInfo instances.

Example:

import tarfile

with tarfile.open("./sample.tar", "r") as tf:
    print("Opened tarfile")
    print(tf.getmembers())
    print("Members listed")

Output:

Opened tarfile
[<TarInfo 'sample' at 0x7fe14b53a048>, <TarInfo 'sample/sample_txt1.txt' at 0x7fe14b53a110>, <TarInfo 'sample/sample_txt2.txt' at 0x7fe14b53a1d8>, <TarInfo 'sample/sample_txt3.txt' at 0x7fe14b53a2a0>, <TarInfo 'sample/sample_txt4.txt' at 0x7fe14b53a368>]
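The printed TarInfo objects are not very readable on their own. As a sketch (building a small archive in memory with io.BytesIO so no files are needed; the entry names mirror the output above but are otherwise hypothetical), the snippet below uses getnames() for plain names and reads a few TarInfo attributes (name, size, isdir()) for each member.

```python
import io
import tarfile

# Build an archive entirely in memory so the example needs no files on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    # A directory entry...
    dir_info = tarfile.TarInfo(name="sample")
    dir_info.type = tarfile.DIRTYPE
    tf.addfile(dir_info)
    # ...and a regular file entry with some content.
    data = b"first sample file\n"
    file_info = tarfile.TarInfo(name="sample/sample_txt1.txt")
    file_info.size = len(data)
    tf.addfile(file_info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tf:
    # getnames() returns just the member names as strings.
    names = tf.getnames()
    # Each TarInfo carries metadata such as name, size, and entry type.
    summary = [(m.name, m.size, m.isdir()) for m in tf.getmembers()]

for row in summary:
    print(row)
# ('sample', 0, True)
# ('sample/sample_txt1.txt', 18, False)
```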

Single file extraction

import tarfile

file_name = "sample/sample_txt1.txt"
with tarfile.open("sample.tar", "r") as tf:
    print("Opened tarfile")
    tf.extract(member=file_name, path="./extraction_dir")
    print(f"{file_name} extracted")
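The example above passes the member name as a string. The sketch below, again using a hypothetical in-memory archive, looks up a TarInfo with getmember() and passes that object to extract() instead. It also uses extractfile(), which returns a file-like object, to read a second member's bytes without writing anything to disk. The filter="data" keyword is passed only when the running Python supports it.

```python
import io
import os
import tarfile
import tempfile

# Build a tiny in-memory archive with two hypothetical text members.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    for name, text in [("sample/sample_txt1.txt", b"one\n"), ("sample/sample_txt2.txt", b"two\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(text)
        tf.addfile(info, io.BytesIO(text))
buffer.seek(0)

# Use the safer "data" extraction filter where this Python version has it.
extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

with tempfile.TemporaryDirectory() as out_dir:
    with tarfile.open(fileobj=buffer, mode="r") as tf:
        # getmember() looks up a TarInfo by name (raises KeyError if absent)...
        member = tf.getmember("sample/sample_txt1.txt")
        # ...and extract() accepts that TarInfo just as it accepts the name string.
        tf.extract(member, path=out_dir, **extract_kwargs)
        # extractfile() reads a member's content without writing it to disk.
        second = tf.extractfile("sample/sample_txt2.txt").read()
    extracted = sorted(os.listdir(os.path.join(out_dir, "sample")))

print(extracted)  # ['sample_txt1.txt']
print(second)     # b'two\n'
```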

Writing a tar file

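To add files to an existing tar file, open it in append mode ("a") and call the tarfile.TarFile.add method, which takes the path of the file to add. The optional arcname argument sets the name the file is stored under inside the archive. Append mode works only with uncompressed archives.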

import tarfile

file_name = "sample_txt5.txt"
with tarfile.open("./sample.tar", "a") as tf:
    print("Opened tarfile")
    print(f"Members before addition of {file_name}")
    print(tf.getmembers())
    # arcname is the path stored inside the archive; place the new file under sample/
    tf.add(file_name, arcname=f"sample/{file_name}")
    print(f"Members after addition of {file_name}")
    print(tf.getmembers())
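Since append mode cannot be combined with compression, a compressed archive is usually written in one pass with a "w:<compression>" mode. The sketch below assumes a temporary directory and a hypothetical sample directory holding two text files. It creates sample.tar.gz with "w:gz", relying on add() recursing into directories by default, lists the result, and then shows that opening the same file with "a:gz" is rejected with a ValueError.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input: a "sample" directory holding two text files.
    src_dir = os.path.join(workdir, "sample")
    os.mkdir(src_dir)
    for name in ("sample_txt1.txt", "sample_txt2.txt"):
        with open(os.path.join(src_dir, name), "w") as f:
            f.write(name + "\n")

    archive = os.path.join(workdir, "sample.tar.gz")
    # "w:gz" creates (or overwrites) a gzip-compressed archive.
    with tarfile.open(archive, "w:gz") as tf:
        # add() recurses into directories by default; arcname sets the stored path.
        tf.add(src_dir, arcname="sample")

    with tarfile.open(archive, "r:gz") as tf:
        names = sorted(tf.getnames())

    # Append mode works only on uncompressed archives, so "a:gz" is refused.
    try:
        tarfile.open(archive, "a:gz")
        append_gz_ok = True
    except ValueError:
        append_gz_ok = False

print(names)         # ['sample', 'sample/sample_txt1.txt', 'sample/sample_txt2.txt']
print(append_gz_ok)  # False
```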
