I want to tar all .php and .html files in a directory and its subdirectories. If I use
tar -cf my_archive *
it tars all the files, which I don't want. If I use
tar -cf my_archive *.php *.html
it ignores subdirectories. How can I make it tar recursively but include only two types of files?
find ./someDir -name "*.php" -o -name "*.html" | tar -cf my_archive -T -
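A variant of the same pipeline that should also survive spaces and newlines in file names is sketched below. It assumes GNU find and GNU tar (for -print0 and --null), and the temporary directory and sample file names are made up for the demonstration.

```shell
# Sketch: archive only .php and .html files, NUL-delimited end to end.
# Assumes GNU find and GNU tar; the demo tree and file names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/sub dir"
touch "$workdir/site/index.html" "$workdir/site/sub dir/page one.php" "$workdir/site/notes.txt"

cd "$workdir/site" || exit 1

# The parentheses make -print0 apply to both patterns; --null must come before -T.
find . -type f \( -name '*.php' -o -name '*.html' \) -print0 |
  tar --null -cf "$workdir/my_archive.tar" -T -

# List the archive contents to check what was stored.
listing=$(tar -tf "$workdir/my_archive.tar" | sort)
printf '%s\n' "$listing"
```

Because tar reads the whole list from stdin, it runs exactly once, so the archive cannot be overwritten by a second invocation the way it can when xargs splits a long list.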
If you're using bash version 4.0 or later, you can exploit shopt -s globstar to make short work of this:
shopt -s globstar; tar -czvf deploy.tar.gz **/Alice*.yml **/Bob*.json
This adds every .yml file whose name starts with Alice and every .json file whose name starts with Bob, in the current directory or any subdirectory.
shopt -s globstar
option, so the answer is correct and is actually the best one
One method is:
tar -cf my_archive.tar $( find -name "*.php" -or -name "*.html" )
There are some caveats with this method however:
It will fail if any file or directory names contain spaces, and it will fail if there are so many files that the maximum command line length is exceeded.
A workaround to these could be to output the contents of the find command into a file, and then use the "-T, --files-from FILE" option to tar.
This will handle paths with spaces:
find ./ -type f \( -name "*.php" -o -name "*.html" \) -exec tar uvf myarchives.tar {} +
Put them in a file
find . \( -name "*.php" -o -name "*.html" \) -print > files.txt
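Then use the file as input to tar. Which option reads the list depends on your tar: GNU tar uses -T (in current GNU tar, -I selects a compression program instead), while some other implementations, such as Solaris tar, use -I.
Add h to archive the files that symbolic links point to rather than the links themselves.
tar cfh my.tar -T files.txt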
find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -I 'pigz -9' -cf target.tgz
for multicore, or just for one core:
find ./ -type f \( -name "*.php" -o -name "*.html" \) -printf '%P\n' | xargs tar -czf target.tgz
Easy with zsh:
tar cvzf foo.tar.gz **/*.(php|html)
-czvf? The - is optional with tar.
If you want to produce a zipped tar file (.tgz) and want to avoid problems with spaces in filenames:
find . \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
The -print0 “primary” of find separates output filenames using the NUL (\0) byte, thus playing well with the -0 option of xargs.
The parentheses around the two -name primaries are needed, because otherwise the -print0 would only output the filenames of the second -name (there is no implied printing if -print or -print0 is present, and these only have an effect if they are evaluated).
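The precedence rule is easy to see on a tiny tree. The sketch below uses a made-up temporary directory with one file of each type and compares the two forms; -print is used instead of -print0 only so the output is readable.

```shell
# Demo of find's operator precedence; the temp directory and file names are hypothetical.
demo=$(mktemp -d)
touch "$demo/a.php" "$demo/b.html"
cd "$demo" || exit 1

# Without parentheses this parses as:
#   -name '*.php'  OR  ( -name '*.html' AND -print )
# so only the .html file is printed.
without=$(find . -name '*.php' -o -name '*.html' -print | sort)

# With parentheses, -print applies to files matching either pattern.
with=$(find . \( -name '*.php' -o -name '*.html' \) -print | sort)

printf 'without parentheses:\n%s\n' "$without"
printf 'with parentheses:\n%s\n' "$with"
```

The same grouping matters for -exec, -printf and any other explicit action placed after an -o chain.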
If you need to skip some filenames or directories (e.g., the node_modules directory if you work with Node.js), prepend one or more -prune primaries like this:
find . -name skipThisName -prune -o \( -name \*.php -o -name \*.html \) -print0 | xargs -0 tar -cvzf my_archive.tgz
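To check that pruning behaves as expected, here is a small worked example. The project layout is invented for the demonstration, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do).

```shell
# Sketch: confirm that a pruned directory stays out of the archive.
# The project layout below is hypothetical.
proj=$(mktemp -d)
mkdir -p "$proj/src" "$proj/node_modules/pkg"
touch "$proj/src/app.php" "$proj/node_modules/pkg/readme.html"
cd "$proj" || exit 1

# -prune stops find from descending into node_modules; everything else
# is matched against the two name patterns and printed NUL-delimited.
find . -name node_modules -prune -o \( -name '*.php' -o -name '*.html' \) -print0 |
  xargs -0 tar -czf "$proj/my_archive.tgz"

# Only src/app.php should be listed.
pruned_listing=$(tar -tzf "$proj/my_archive.tgz")
printf '%s\n' "$pruned_listing"
```

Note that with a very long file list xargs may run tar more than once, and each run with -c would replace the archive written by the previous one.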
Add -o -name [pattern] for each new condition.