Extract Files From a .tar.gz or .tar.bz2 File on Linux
Tar files are archives that are usually compressed. You’ll encounter them frequently while using a Linux distribution like Ubuntu or even while using the terminal on macOS. Here’s how to extract (or untar) the contents of a tar file, also known as a tarball.
What Do .tar.gz and .tar.bz2 Mean?
Files that have a
.tar.gz or a
.tar.bz2 extension are compressed archive files. A file with just a
.tar extension is uncompressed, but those will be very rare.
The .tar portion of the file extension stands for tape archive, and is the reason that both of these file types are called tar files. Tar files date all the way back to 1979 when the
tar command was created to allow system administrators to archive files onto tape. Forty years later we are still using the
tar command to extract tar files onto our hard drives. Someone somewhere is probably still using
tar with tape.
The .gz or .bz2 extension suffix indicates that the archive has been compressed, using either the gzip or the
bzip2 compression algorithm. The
tar command will work happily with both types of file, so it doesn’t matter which compression method was used—and it should be available everywhere you have a Bash shell. You just need to use the appropriate
tar command line options.
Extracting Files from Tar Files
Let’s say you’ve downloaded two files of sheet music. One file is called
ukulele_songs.tar.gz, the other is called
guitar_songs.tar.bz2. These files are in the Downloads directory.
Let’s extract the ukulele songs:
tar -xvzf ukulele_songs.tar.gz
As the files are extracted, they are listed in the terminal window.
The command line options we used are:
- -x: Extract, retrieve the files from the tar file.
- -v: Verbose, list the files as they are being extracted.
- -z: Gzip, use gzip to decompress the tar file.
- -f: File, the name of the tar file we want tar to work with. This option must be followed by the name of the tar file.
List the files in the directory with
ls and you’ll see that a directory has been created called Ukulele Songs. The extracted files are in that directory. Where did this directory come from? It was contained in the
tar file, and was extracted along with the files.
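To try the gzip extraction without downloading anything, here is a minimal sketch that builds its own small archive first. The scratch directory and the sample file name are assumptions made for illustration; they are not the sheet music files described above.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory standing in for Downloads
cd "$workdir"

# Build a small sample archive so the example is self-contained.
mkdir -p "Ukulele Songs"
printf 'chords\n' > "Ukulele Songs/001 - Example.txt"   # hypothetical sample file
tar -czf ukulele_songs.tar.gz "Ukulele Songs"
rm -r "Ukulele Songs"

# Extract it again; -v lists each member as it is written to disk.
tar -xvzf ukulele_songs.tar.gz
ls
```

The Ukulele Songs directory reappears because it was stored inside the archive along with the file.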
Now let’s extract the guitar songs. To do this we’ll use almost exactly the same command as before but with one important difference. The
.bz2 extension suffix tells us it has been compressed using the bzip2 command. Instead of using the
-z (gzip) option, we will use the
-j (bzip2) option.
tar -xvjf guitar_songs.tar.bz2
Once again, the files are listed to the terminal as they are extracted. To be clear, the command line options we used with
tar for the
.tar.bz2 file were:
- -x: Extract, retrieve the files from the tar file.
- -v: Verbose, list the files as they are being extracted.
- -j: Bzip2, use bzip2 to decompress the tar file.
- -f: File, name of the tar file we want tar to work with.
If we list the files in the Downloads directory we will see that another directory called Guitar Songs has been created.
Choosing Where to Extract the Files To
If we want to extract the files to a location other than the current directory, we can specify a target directory using the
-C (specified directory) option.
tar -xvjf guitar_songs.tar.bz2 -C ~/Documents/Songs/
Looking in our Documents/Songs directory we’ll see the Guitar Songs directory has been created.
Note that the target directory must already exist;
tar will not create it if it is not present. If you need to create a directory and have
tar extract the files into it all in one command, you can do that as follows:
mkdir -p ~/Documents/Songs/Downloaded && tar -xvjf guitar_songs.tar.bz2 -C ~/Documents/Songs/Downloaded/
The -p (parents) option causes
mkdir to create any parent directories that are required, ensuring the target directory is created.
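The same pattern can be tried safely in a scratch directory. This is a sketch under stated assumptions: the archive is gzip-compressed (so the example does not depend on bzip2 being installed), and the sample file and target path are hypothetical.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

# Sample archive; gzip is used here instead of bzip2 purely for portability.
mkdir -p "Guitar Songs"
printf 'tab\n' > "Guitar Songs/riff.txt"                # hypothetical sample file
tar -czf guitar_songs.tar.gz "Guitar Songs"
rm -r "Guitar Songs"

# Create the target (and any missing parents), then extract into it with -C.
target="$workdir/Documents/Songs/Downloaded"
mkdir -p "$target" && tar -xzf guitar_songs.tar.gz -C "$target"
ls "$target"
```

Because the two commands are joined with &&, tar only runs if mkdir succeeded.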
Looking Inside Tar Files Before Extracting Them
So far we’ve just taken a leap of faith and extracted the files sight unseen. You might like to look before you leap. You can review the contents of a
tar file before you extract it by using the
-t (list) option. It is usually convenient to pipe the output through the less command.
tar -tf ukulele_songs.tar.gz | less
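Notice that we don’t need to use the -z option to list the files. GNU tar detects the compression type automatically when it reads an archive, so the -z option is optional for listing; spelling it out when extracting does no harm and keeps the command explicit. Likewise, we don’t need the -j option to list the files in a .tar.bz2 file.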
Scrolling through the output we can see that everything in the tar file is held within a directory called Ukulele Songs, and within that directory, there are files and other directories.
We can see that the Ukulele Songs directory contains directories called Random Songs, Ramones and Possibles.
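As a quick check that listing really leaves the disk untouched, the following sketch builds a sample archive, removes the source directory, and then lists the archive. The file names are hypothetical, and -z is kept so the command also works with tar builds that do not detect compression automatically.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

mkdir -p "Ukulele Songs/Ramones"
printf 'chords\n' > "Ukulele Songs/Ramones/song.txt"    # hypothetical sample file
tar -czf ukulele_songs.tar.gz "Ukulele Songs"
rm -r "Ukulele Songs"

# -t prints the member names only; nothing is extracted.
listing=$(tar -tzf ukulele_songs.tar.gz)
printf '%s\n' "$listing"
```

The paths printed here are the exact member names to pass to tar when extracting only part of an archive.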
To extract all the files from a directory within a tar file use the following command. Note that the path is wrapped in quotation marks because there are spaces in the path.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/Ramones/"
To extract a single file, provide the path and the name of the file.
tar -xvzf ukulele_songs.tar.gz "Ukulele Songs/023 - My Babe.odt"
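Here is a minimal sketch of single-file extraction against a throwaway archive. The two sample files are hypothetical stand-ins; the point is that only the named member is written to disk.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

mkdir -p "Ukulele Songs/Ramones"
printf 'a\n' > "Ukulele Songs/Ramones/song.txt"         # hypothetical sample file
printf 'b\n' > "Ukulele Songs/023 - My Babe.txt"        # hypothetical sample file
tar -czf ukulele_songs.tar.gz "Ukulele Songs"
rm -r "Ukulele Songs"

# Name the member exactly as it appears in the archive listing.
# The quotes keep the spaces in the path together as one argument.
tar -xzf ukulele_songs.tar.gz "Ukulele Songs/023 - My Babe.txt"
find . -type f
```

The parent directory Ukulele Songs is still created, because the member's stored path includes it.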
You can extract a selection of files by using wildcards, where
* represents any string of characters and
? represents any single character. Using wildcards requires the use of the --wildcards option.
tar -xvz --wildcards -f ukulele_songs.tar.gz "Ukulele Songs/Possibles/B*"
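The wildcard behavior can be checked with a small throwaway archive. This sketch assumes GNU tar, since --wildcards is a GNU tar option, and the three sample file names are hypothetical.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

mkdir -p "Ukulele Songs/Possibles"
for name in Blue Bells Angel; do                        # hypothetical sample files
    printf 'chords\n' > "Ukulele Songs/Possibles/$name.txt"
done
tar -czf ukulele_songs.tar.gz "Ukulele Songs"
rm -r "Ukulele Songs"

# Quote the pattern so the shell passes it to tar unexpanded;
# tar then matches it against the member names inside the archive.
tar -xz --wildcards -f ukulele_songs.tar.gz "Ukulele Songs/Possibles/B*"
ls "Ukulele Songs/Possibles"
```

Only the members whose names start with B inside Possibles are extracted; the file starting with A stays in the archive.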
Extracting Files Without Extracting Directories
If you don’t want the directory structure in the tar file to be recreated on your hard drive, use the
--strip-components option. The
--strip-components option requires a numerical parameter. The number represents how many levels of directories to ignore. Files from the ignored directories are still extracted, but the directory structure is not replicated on your hard drive.
If we specify
--strip-components=1 with our example tar file, the Ukulele Songs top-most directory within the tar file is not created on the hard drive. The files and directories that would have been extracted to that directory are extracted in the target directory.
tar -xvzf ukulele_songs.tar.gz --strip-components=1
There are only two levels of directory nesting within our example tar file. So if we use
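--strip-components=2, the files inside those second-level directories are extracted straight into the target directory, and no other directories are created. Files with fewer than two leading directory components in their path are skipped.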
tar -xvzf ukulele_songs.tar.gz --strip-components=2
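To see what stripping one level does, here is a sketch against a small sample archive. It assumes GNU tar; the file names and the out directory are hypothetical.

```shell
set -e
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

mkdir -p "Ukulele Songs/Ramones"
printf 'a\n' > "Ukulele Songs/top.txt"                  # hypothetical sample file
printf 'b\n' > "Ukulele Songs/Ramones/song.txt"         # hypothetical sample file
tar -czf ukulele_songs.tar.gz "Ukulele Songs"
rm -r "Ukulele Songs"

# Drop the first path component ("Ukulele Songs") from every member name.
mkdir -p out
tar -xzf ukulele_songs.tar.gz --strip-components=1 -C out
find out
```

The contents of Ukulele Songs land directly in out, while the Ramones subdirectory is kept because only one level was stripped.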
If you look at the Linux man page you’ll see that
tar has got to be a good candidate for the title of “command having the most command line options.” Thankfully, to allow us to extract files from
.tar.gz and .tar.bz2 files with a good degree of granular control, we only need to remember a handful of these options.