How to tar certain file types in all subdirectories?

I want to tar all .php and .html files in a directory and its subdirectories. If I use

tar -cf my_archive *

it tars all the files, which I don't want. If I use

tar -cf my_archive *.php *.html

it ignores subdirectories. How can I make it tar recursively but include only two types of files?


DeeDee

find ./someDir -name "*.php" -o -name "*.html" | tar -cf my_archive -T -
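
If file names may contain spaces or newlines, a NUL-delimited variant of the same pipeline may be safer. The following is a minimal sketch, assuming GNU find and GNU tar (for `--null`); the `demo` tree and its file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a throwaway tree (hypothetical names) to exercise the pipeline.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo/sub dir"
touch "$workdir/demo/index.php" "$workdir/demo/sub dir/my page.html" "$workdir/demo/notes.txt"

cd "$workdir" || exit 1

# -type f skips directories; the parentheses group the two -name tests.
# -print0 and --null make both sides use NUL as the name separator.
find ./demo -type f \( -name "*.php" -o -name "*.html" \) -print0 |
  tar --null -T - -cf my_archive.tar

# List what was archived.
tar -tf my_archive.tar
```

The listing should show only the .php and .html files, including the one whose path contains spaces.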


@DeeDee Are there any limitations on the number of files, etc.?
@DeeDee - no, what I meant was you don't need the parens!
@user1566515 There may be some filesystem limit or overall space limit which would put an upper limit on your tar file. That entirely depends on your own system. Otherwise, the piping will essentially create the tar file on-the-fly, so you won't be constrained by file number or size.
Thanks! ... how do I add more than 2 conditions / kinds of file?
@gluuke use -o -name [pattern] for each new condition
Stabledog

If you're using bash 4.0 or later, you can use shopt -s globstar to make short work of this:

shopt -s globstar; tar -czvf deploy.tar.gz **/Alice*.yml **/Bob*.json

This adds every .yml file whose name starts with Alice, and every .json file whose name starts with Bob, from the current directory and any subdirectory.
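
Applied to the original question, the same idea might look like the sketch below. It assumes bash 4.0 or later; the temporary tree and file names are hypothetical. `nullglob` is an addition here, so that a pattern with no matches expands to nothing instead of being passed to tar literally.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

# Throwaway tree (hypothetical names) with files at three depths.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/a/b"
touch "$workdir/site/top.php" "$workdir/site/a/one.html" \
      "$workdir/site/a/b/deep.php" "$workdir/site/a/skip.css"

cd "$workdir/site" || exit 1

# With globstar on, ** matches zero or more directory levels,
# so top.php and a/b/deep.php are both picked up.
files=( **/*.php **/*.html )
tar -czf "$workdir/deploy.tar.gz" "${files[@]}"

# List what was archived.
tar -tzf "$workdir/deploy.tar.gz"
```

Because the shell expands the patterns before tar runs, a very large tree can still run into the command-line length limit with this approach.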


The only answer that just uses tar, the best answer IMO.
Despite the impression given by the '**' glob, this command does not recurse into sub-sub-folders.
@Eddie ** should work. Maybe something is different about your parameters. Also check whether there is a space in any folder name that you pass on the command line. If not, can you paste your actual command?
'**' is evaluated by the shell before it reaches the command, and it is only seen as two independent * that each resolve to 0 or more characters; it has no recursive functionality to span directories: tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm
@eddie Yes, it is evaluated by the shell, but bash 4.0 and later has a shopt -s globstar option, so the answer is correct and is actually the best one.
steampowered

One method is:

tar -cf my_archive.tar $( find -name "*.php" -or -name "*.html" )

There are some caveats with this method however:

It will fail if any file or directory name contains spaces, and it will fail if there are so many files that the maximum command-line length is exceeded.

A workaround to these could be to output the contents of the find command into a file, and then use the "-T, --files-from FILE" option to tar.
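
To gauge whether a given file list is likely to hit the command-line limit, one rough check is to compare the total length of the matched names with the system's `ARG_MAX`. This is only an estimate: the real limit also covers environment variables and some per-argument overhead, and the sample tree below is made up.

```shell
#!/usr/bin/env bash
# Throwaway tree (hypothetical names).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib"
touch "$workdir/src/a.php" "$workdir/src/lib/b.html" "$workdir/src/lib/c.txt"

# Upper bound, in bytes, on the arguments plus environment of a new process.
arg_max=$(getconf ARG_MAX)

# Total bytes of all matching paths, one per line. Each newline stands in
# for the terminating NUL that every argument needs.
list_bytes=$(find "$workdir/src" -type f \( -name "*.php" -o -name "*.html" \) | wc -c | tr -d ' ')

echo "ARG_MAX:   $arg_max"
echo "file list: $list_bytes"

if [ "$list_bytes" -lt "$arg_max" ]; then
  echo "list probably fits on one command line"
else
  echo "list is too long; feed it to tar with -T instead"
fi
```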


1) By "fail" do you mean the files with spaces will be skipped or the tar archive will not be created? 2) I have about 100K files. Is that over the maximum command line length?
1. It will create the archive, but it will report missing files. 2. That will be too long, I expect. Given this, you'd be best using a method like @DeeDee suggests below, it'll work around these problems quite nicely.
Ian Reinhart Geiser

This will handle paths with spaces:

find ./ -type f \( -name "*.php" -o -name "*.html" \) -exec tar uvf myarchives.tar {} +

Noam Geffen

Put them in a file

find . \( -name "*.php" -o -name "*.html" \) -print > files.txt

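Then use the file as input to tar, with -I or -T depending on your version of tar (GNU tar uses -T; there, -I selects a compression program instead).

Add h to follow symbolic links and archive the files they point to rather than the links themselves.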

tar cfh my.tar -I files.txt 

dmitry_podyachev

find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -I 'pigz -9' -cf target.tgz

The command above compresses with pigz on multiple cores. For single-core gzip:

find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -czf target.tgz


John Delaney

Easy with zsh:

tar cvzf foo.tar.gz **/*.(php|html)

Did you mean -czvf?
The - is optional with tar.
Walter Tross

If you want to produce a zipped tar file (.tgz) and want to avoid problems with spaces in filenames:

find . \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz

The -print0 “primary” of find separates output filenames with the NUL (\0) byte, which pairs with the -0 option of xargs.

The parentheses around the two -name primaries are needed, because otherwise the -print0 would only output the filenames of the second -name (there is no implied printing if -print or -print0 is present, and these only have an effect if they are evaluated).
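
The effect of the parentheses can be seen by running both forms against a small tree. The sketch below uses made-up file names and plain -print for readability; it only lists names and does not create an archive.

```shell
#!/usr/bin/env bash
# Throwaway directory (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/a.php" "$workdir/b.html" "$workdir/c.txt"
cd "$workdir" || exit 1

# Without parentheses, -print binds only to the second -name test
# (the implicit "and" has higher precedence than -o), so a.php is not printed.
without_parens=$(find . -name "*.php" -o -name "*.html" -print | sort)

# With parentheses, -print applies to whichever -name test matched.
with_parens=$(find . \( -name "*.php" -o -name "*.html" \) -print | sort)

echo "without parentheses:"; echo "$without_parens"
echo "with parentheses:";    echo "$with_parens"
```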

If you need to skip some filenames or directories (e.g., the node_modules directory if you work with Node.js), prepend one or more -prune primaries like this:

find . -name skipThisName -prune -o \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
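
As a concrete illustration, the sketch below runs the pruning form against a throwaway tree with a node_modules directory; all file names are hypothetical. One caveat worth keeping in mind: with a very large file list, xargs may split the names over several tar invocations, and with -c each invocation would overwrite the archive, so piping the list to tar's -T option may be the safer route in that case.

```shell
#!/usr/bin/env bash
# Throwaway tree (hypothetical names) with one directory to skip.
workdir=$(mktemp -d)
mkdir -p "$workdir/app/node_modules/pkg" "$workdir/app/views"
touch "$workdir/app/index.php" "$workdir/app/views/home.html" \
      "$workdir/app/node_modules/pkg/doc.html"
cd "$workdir/app" || exit 1

# -prune stops find from descending into node_modules; the -o branch
# handles everything else.
find . -name node_modules -prune -o \( -name "*.php" -o -name "*.html" \) -print0 |
  xargs -0 tar -czf "$workdir/my_archive.tgz"

# List what was archived; doc.html under node_modules should be absent.
tar -tzf "$workdir/my_archive.tgz"
```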