I have several hundred PDFs under a directory in UNIX. The names of the PDFs are really long (approx. 60 chars).
When I try to delete all PDFs together using the following command:
rm -f *.pdf
I get the following error:
/bin/rm: cannot execute [Argument list too long]
What is the solution to this error? Does this error occur for the mv and cp commands as well? If so, how do I solve it for those commands?
The reason this occurs is that bash expands the asterisk to every matching file, producing a very long command line.
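You can see the mismatch yourself (a quick sketch, assuming a Linux system where getconf is available; printf is a builtin, so it is not itself subject to the limit):
getconf ARG_MAX              # maximum bytes allowed for arguments + environment at exec() time
printf '%s ' *.pdf | wc -c   # approximate size of the expanded *.pdf list
If the second number approaches the first, rm *.pdf will fail.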
Try this:
find . -name "*.pdf" -print0 | xargs -0 rm
Warning: this is a recursive search and will find (and delete) files in subdirectories as well. Tack on -f
to the rm command only if you are sure you don't want confirmation.
You can do the following to make the command non-recursive:
find . -maxdepth 1 -name "*.pdf" -print0 | xargs -0 rm
Another option is to use find's -delete
flag:
find . -name "*.pdf" -delete
tl;dr
It's a kernel limitation on the size of the command line argument. Use a for
loop instead.
Origin of problem
This is a system issue, related to execve and the ARG_MAX constant. There is plenty of documentation about that (see man execve, Debian's wiki, ARG_MAX details).
Basically, the expansion produces a command (with its parameters) that exceeds the ARG_MAX limit. On kernel 2.6.23, the limit was set at 128 kB. This constant has since been increased, and you can get its value by executing:
getconf ARG_MAX
# 2097152 # on 3.5.0-40-generic
Solution: Using for Loop
Use a for loop, as recommended on BashFAQ/095; there is no limit except for RAM/memory space:
Dry run to ascertain it will delete what you expect:
for f in *.pdf; do echo rm "$f"; done
And execute it:
for f in *.pdf; do rm "$f"; done
This is also a portable approach, as globs have strong and consistent behavior among shells (they are part of the POSIX spec).
Note: As noted by several comments, this is indeed slower but more maintainable, as it can adapt to more complex scenarios, e.g. where one wants to do more than just one action.
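For example, a sketch of doing more than one action per file (the backup directory and log file here are just hypothetical placeholders):
for f in *.pdf; do
  cp -- "$f" /path/to/backup/ && rm -- "$f" && printf 'removed %s\n' "$f" >> removed.log
done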
Solution: Using find
If you insist, you can use find
but really don't use xargs as it "is dangerous (broken, exploitable, etc.) when reading non-NUL-delimited input":
find . -maxdepth 1 -name '*.pdf' -delete
Using -maxdepth 1 ... -delete instead of -exec rm {} + allows find to simply execute the required system calls itself without using an external process, hence faster (thanks to @chepner's comment).
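For comparison, a sketch of the portable -exec form (find still batches the arguments, but spawns at least one external rm process):
find . -maxdepth 1 -name '*.pdf' -exec rm -- {} +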
References
I'm getting "Argument list too long". How can I process a large list in chunks? @ wooledge
execve(2) - Linux man page (search for ARG_MAX) ;
Error: Argument list too long @ Debian's wiki ;
Why do I get “/bin/sh: Argument list too long” when passing quoted arguments? @ SuperUser
I prefer the for loop. I've used find before, but I'm always looking up how to do it as I forget the options all the time; for seems easier to recall, IMHO.
for f in *; do rm "$f"; done
works like a charm.
The find -exec solution seems to be MUCH faster than the for loop.
I'm on a newer kernel (4.15.0-1019-gcp to be exact) and the limit is still 2097152. Interestingly enough, searching for ARG_MAX in the Linux git repo gives a result showing ARG_MAX to be 131702.
find
has a -delete
action:
find . -maxdepth 1 -name '*.pdf' -delete
Using xargs, as per Dennis' answer, works as intended.
The only use of -exec here is to remove a bunch of files. -exec rm {} + would do the same thing, but still requires starting at least one external process. -delete allows find to simply execute the required system calls itself without using an external wrapper.
Another answer is to force xargs
to process the commands in batches. For instance to delete
the files 100
at a time, cd
into the directory and run this:
echo *.pdf | xargs -n 100 rm
This only works because echo is a shell builtin. If you end up using an external echo command, you'll still run into the program argument limit.
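For illustration, a sketch of the difference (with enough matching files, the external binary fails while the builtin succeeds):
echo *.pdf > /dev/null        # builtin: no exec(), so ARG_MAX does not apply
/bin/echo *.pdf > /dev/null   # external: may fail with "Argument list too long"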
Also, echo could produce something other than the literal file names. If a file name contains a newline, it will look to xargs like two separate file names, and you get rm: firsthalfbeforenewline: No such file or directory. On some platforms, file names which contain single quotes will also confuse xargs with the default options. (And -n 100 is probably way too low; just omit the option to let xargs figure out the optimal number of processes it needs.)
I picked -n 100 as a conservatively low number because I don't think there is any reason to increase n to the point of being "optimal" here.
If you’re trying to delete a very large number of files at one time (I deleted a directory with 485,000+ today), you will probably run into this error:
/bin/rm: Argument list too long.
The problem is that when you type something like rm -rf *
, the *
is replaced with a list of every matching file, like “rm -rf file1 file2 file3 file4” and so on. There is a relatively small buffer of memory allocated to storing this list of arguments and if it is filled up, the shell will not execute the program.
To get around this problem, a lot of people will use the find command to find every file and pass them one-by-one to the “rm” command like this:
find . -type f -exec rm -v {} \;
My problem is that I needed to delete 500,000 files and it was taking way too long.
I stumbled upon a much faster way of deleting files – the “find” command has a “-delete” flag built right in! Here’s what I ended up using:
find . -type f -delete
Using this method, I was deleting files at a rate of about 2000 files/second – much faster!
You can also show the filenames as you’re deleting them:
find . -type f -print -delete
…or even show how many files will be deleted, then time how long it takes to delete them:
root@devel# ls -1 | wc -l && time find . -type f -delete
100000
real 0m3.660s
user 0m0.036s
sys 0m0.552s
I used sudo find . -type f -delete to delete about 485 thousand files and it worked for me. It took about 20 seconds.
Or you can try:
find . -name '*.pdf' -exec rm -f {} \;
find . -maxdepth 1 -name '*.pdf' -exec rm -f {} \;
You can try this:
for f in *.pdf
do
rm "$f"
done
EDIT: ThiefMaster's comment suggested I not disclose such dangerous practices to young shell jedis, so I'll add a "safer" version (for the sake of preserving things when someone has a "-rf . ..pdf" file):
echo "# Whooooo" > /tmp/dummy.sh
for f in *.pdf
do
echo "rm -i \"$f\""
done >> /tmp/dummy.sh
After running the above, just open the /tmp/dummy.sh
file in your favorite editor and check every single line for dangerous filenames, commenting them out if found.
Then copy the dummy.sh
script in your working dir and run it.
All this for security reasons.
With a file named "-rf .. .pdf", -rf takes precedence over -i, so your 2nd version is no better (without manual inspection). And it's basically useless for mass deletes, because of the prompting for every file.
You could use a bash array:
files=(*.pdf)
for((I=0;I<${#files[@]};I+=1000)); do
rm -f "${files[@]:I:1000}"
done
This way it will erase in batches of 1000 files per step.
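The same batching idea covers the mv and cp cases from the question (a sketch; /path/to/target is a placeholder):
files=(*.pdf)
for ((i = 0; i < ${#files[@]}; i += 1000)); do
  cp -- "${files[@]:i:1000}" /path/to/target/
done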
This is basically reimplementing xargs, rather poorly.
You can use this command:
find -name "*.pdf" -delete
The rm command has a limit on how many files you can remove at once.
One possibility is to run the rm command multiple times, based on your file patterns, like:
rm -f A*.pdf
rm -f B*.pdf
rm -f C*.pdf
...
rm -f *.pdf
You can also remove them through the find command:
find . -name "*.pdf" -exec rm {} \;
rm has no such limit on the number of files it will process (other than that its argc cannot be larger than INT_MAX). It's the kernel's limitation on the maximum size of the entire argument array (that's why the length of the filenames is significant).
If the filenames have spaces or special characters, use:
find -name "*.pdf" -delete
For files in current directory only:
find -maxdepth 1 -name '*.pdf' -delete
This command searches for all files in the current directory only (-maxdepth 1) with the .pdf extension (-name '*.pdf'), and then deletes them.
The whole point of -exec is that you don't invoke a shell. The quotes here do absolutely nothing useful. (They prevent any wildcard expansion and token splitting on the string in the shell where you type this command, but the string {} doesn't contain any whitespace or shell wildcard characters.)
find
, but still too long
I was facing the same problem while copying from a source directory to a destination directory.
The source directory had ~300,000 (3 lakh) files.
I used cp with the -r option and it worked for me:
cp -r abc/ def/
It copies all files from abc to def without the "Argument list too long" warning, because cp only receives the two directory names as arguments, not an expanded file list.
Also try this if you want to delete files older than N days (+N) or newer than N days (-N); the examples below use 90 and 30 days.
Ex: to delete files older than 90 days (i.e. 91, 92, ... 100 days old):
find <path> -type f -mtime +90 -exec rm -rf {} \;
Ex: to delete only files from the latest 30 days, use (-):
find <path> -type f -mtime -30 -exec rm -rf {} \;
If you want to gzip files older than 2 days:
find <path> -type f -mtime +2 -exec gzip {} \;
If you want to see only the files from the past month, Ex:
find <path> -type f -mtime -30 -exec ls -lrt {} \;
To list only files older than 30 days, Ex:
find <path> -type f -mtime +30 -exec ls -lrt {} \;
find /opt/app/logs -type f -mtime +30 -exec ls -lrt {} \;
And another one:
cd /path/to/pdf
printf "%s\0" *.[Pp][Dd][Ff] | xargs -0 rm
printf is a shell builtin, and as far as I know it always has been. Given that printf is not an external command (but a builtin), it's not subject to the "argument list too long ..." fatal error.
So we can safely use it with shell globbing patterns such as *.[Pp][Dd][Ff], then we pipe its output to the remove (rm) command through xargs, which makes sure it fits enough file names on each command line so as not to fail the rm command, which is an external command.
The \0 in printf serves as a null separator for the file names, which are then processed by the xargs command, using it (-0) as a separator, so rm does not fail when there is whitespace or other special characters in the file names.
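The same printf | xargs pattern can handle the mv and cp cases from the question (a sketch, assuming GNU mv/cp, which accept -t to name the target directory):
printf '%s\0' *.[Pp][Dd][Ff] | xargs -0 mv -t /path/to/target/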
In shells where printf isn't a builtin, it will be subject to the same limitation.
Argument list too long
The question title asks about cp, mv and rm, but the answers stand mostly for rm.
Un*x commands
Read each command's man page carefully!
For cp and mv, there is a -t switch, for target:
find . -type f -name '*.pdf' -exec cp -ait "/path to target" {} +
and
find . -type f -name '*.pdf' -exec mv -t "/path to target" {} +
Script way
There is an overall workaround used in bash scripts:
#!/bin/bash

folder=( "/path to folder" "/path to another folder" )

if [ "$1" != "--run" ]; then
    # Re-execute this script through find, which passes the matching
    # files in batches that stay under the ARG_MAX limit.
    exec find "${folder[@]}" -type f -name '*.pdf' -exec "$0" --run {} +
    exit 0
fi

shift
for file; do
    printf "Doing something with '%s'.\n" "$file"
done
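Usage might look like this (a sketch, assuming the script is saved as bulk.sh and made executable); running it without arguments makes it re-execute itself through find, which appends the matching files after --run in ARG_MAX-sized batches:
chmod +x bulk.sh
./bulk.sh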
For someone who doesn't have time, run the following command in a terminal:
ulimit -S -s unlimited
Then perform the cp/mv/rm operation. This can work because, on Linux, the space allowed for program arguments is derived from the stack size limit (roughly a quarter of it), so raising the stack limit raises the effective ARG_MAX.
What about a shorter and more reliable one?
for i in **/*.pdf; do rm "$i"; done
(Note that ** matches recursively into subdirectories; in bash it requires shopt -s globstar.)
I had the same problem with a folder full of temporary images that was growing day by day, and this command helped me clear the folder:
find . -name "*.png" -mtime +50 -exec rm {} \;
The difference from the other commands is the mtime parameter, which takes only the files older than X days (50 days in the example).
Using that multiple times, decreasing the day range on every execution, I was able to remove all the unnecessary files.
To delete all *.pdf in a directory /path/to/dir_with_pdf_files/:
mkdir empty_dir # Create temp empty dir
rsync -avh --delete --include '*.pdf' --exclude '*' empty_dir/ /path/to/dir_with_pdf_files/
(The --exclude '*' matters: it protects all non-PDF files from --delete; without it, rsync would delete everything in the target.)
Deleting specific files via rsync with a wildcard is probably the fastest solution in case you have millions of files, and it will take care of the error you're getting.
(Optional step) DRY RUN, to check what will be deleted without deleting:
rsync -avhn --delete --include '*.pdf' --exclude '*' empty_dir/ /path/to/dir_with_pdf_files/
See rsync tips and tricks for more rsync hacks.
If you want to remove both files and directories, you can use something like:
echo /path/* | xargs rm -rf
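A null-delimited variant is safer if any names contain spaces or newlines (a sketch; printf is a builtin, so it is not itself subject to the limit):
printf '%s\0' /path/* | xargs -0 rm -rf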
I solved it with for.
I am on macOS with zsh.
I moved thousands of jpg files with mv in a one-line command.
Be sure there are no spaces or special characters in the names of the files you are trying to move:
for i in $(find ~/old -type f -name "*.jpg"); do mv $i ~/new; done
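A variant that tolerates spaces and special characters (a sketch, letting find invoke mv directly instead of word-splitting its output):
find ~/old -type f -name '*.jpg' -exec mv {} ~/new/ \;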
I only know a way around this. The idea is to export the list of pdf files you have into a file, then split that file into several parts, then remove the pdf files listed in each part.
ls | grep .pdf > list.txt
wc -l list.txt
wc -l counts how many lines list.txt contains. When you have an idea of how long it is, you can decide to split it in half, in quarters, or so, using the split -l command. For example, split it into 600 lines each:
split -l 600 list.txt
This will create a few files named xaa, xab, xac and so on, depending on how you split it. Now to "import" each list in those files into the rm command, use this:
rm $(<xaa)
rm $(<xab)
rm $(<xac)
Sorry for my bad English.
If you have a file named pdf_format_sucks.docx, this will be deleted as well... ;-) You should use a proper and accurate regular expression when grepping for the pdf files.
Still, a file named still_pdf_format_sucks.docx would get deleted. The dot . in the ".pdf" regular expression matches any character. I would suggest "[.]pdf$" instead of .pdf.
If you're going to replace xargs, you need to understand all the corner cases it handles.
I ran into this problem a few times. Many of the solutions will run the rm
command for each individual file that needs to be deleted. This is very inefficient:
find . -name "*.pdf" -print0 | xargs -0 rm -rf
I ended up writing a python script to delete the files based on the first 4 characters in the file-name:
import os

filedir = '/tmp/'  # The directory you wish to run rm on
filelist = os.listdir(filedir)  # Get a listing of all files in the specified dir
newlist = []  # Make a blank list named newlist

for i in filelist:
    if str(i[:4]) not in newlist:  # This makes sure the elements of newlist are unique
        newlist.append(i[:4])  # Take only the first 4 characters of the folder/filename and append it to newlist

for i in newlist:
    if 'tmp' in i:  # If statement to look for 'tmp' in the filename/dirname
        print('Running command rm -rf ' + str(filedir) + str(i) + '* : File Count: ' + str(len(os.listdir(filedir))))  # Print the command to be run and a total file count
        os.system('rm -rf ' + str(filedir) + str(i) + '*')  # Actual shell command

print('DONE')
This worked very well for me. I was able to clear out over 2 million temp files in a folder in about 15 minutes. I commented the tar out of the little bit of code so anyone with minimal to no python knowledge can manipulate this code.
Why use os.system() anyway? You want os.unlink() instead; then you don't have to solve the quoting problems which this fails to solve properly. But the only reason this is more efficient than find is that it doesn't recurse into subdirectories; you can do the same with printf '%s\0' /tmp/* | xargs -r0 rm -rf or of course by adding a -maxdepth 1 option to the find command.
(Actually, find will probably be quicker, because the shell will alphabetize the list of files matched by the wildcard, which could be a significant amount of work when there are a lot of matches.)
You can create a temp folder, move all the files and sub-folders you want to keep into the temp folder, then delete the old folder and rename the temp folder to the old folder. Try this example until you are confident to do it live:
mkdir testit
cd testit
mkdir big_folder tmp_folder
touch big_folder/file1.pdf
touch big_folder/file2.pdf
mv big_folder/file1.pdf tmp_folder/
rm -r big_folder
mv tmp_folder big_folder
The rm -r big_folder will remove all files in big_folder no matter how many. You just have to be super careful that you first moved all the files/folders you want to keep; in this case it was file1.pdf.
You would still need mv with a wildcard to move many files you want to keep, so I'm not confident this solves the problem.
I found that for extremely large lists of files (>1e6), these answers were too slow. Here is a solution using parallel processing in python. I know, I know, this isn't linux... but nothing else here worked.
(This saved me hours)
# delete files
import os
import glob
import multiprocessing as mp

directory = r'your/directory'
os.chdir(directory)
files_names = [i for i in glob.glob('*.{}'.format('pdf'))]

# report errors from pool
def callback_error(result):
    print('error', result)

# delete file using system command
def delete_files(file_name):
    os.system('rm -rf ' + file_name)

if __name__ == '__main__':
    pool = mp.Pool(12)
    # or use pool = mp.Pool(mp.cpu_count())
    for file_name in files_names:
        print(file_name)
        pool.apply_async(delete_files, [file_name], error_callback=callback_error)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all deletions to finish before exiting
I have faced a similar problem when there were millions of useless log files created by an application which filled up all inodes. I resorted to locate, got all the files "located" into a text file, and then removed them one by one. Took a while but did the job!
This only works if locate built its database back when you still had room on your disk.
A bit safer version than using xargs, and also not recursive:
ls -p | grep -v '/$' | grep '\.pdf$' | while read file; do rm "$file"; done
Filtering out directories here is a bit unnecessary, as rm won't delete them anyway, and it can be removed for simplicity, but why run something that will definitely return an error?
Parsing ls is a common antipattern which should definitely be avoided, and it adds a number of additional bugs here. The grep | grep is just not very elegant.
The alternatives using find are good, and well-documented here and elsewhere. See e.g. mywiki.wooledge.org for much more on this and related topics.
Using GNU parallel (sudo apt install parallel
) is super easy
It runs the commands in parallel, where '{}' is the argument passed.
E.g.:
ls /tmp/myfiles* | parallel 'rm {}'
Piping ls directly to other commands is a dangerous antipattern - that, and the fact that the expansion of the wildcard will cause the same failure when executing ls as was experienced in the original rm command.
parallel makes some folks who prefer avoiding complexity uncomfortable -- if you look under the hood, it's pretty opaque. See the mailing list thread at lists.gnu.org/archive/html/bug-parallel/2015-05/msg00005.html between Stephane (one of the Unix & Linux Stack Exchange greybeards) and Ole Tange (parallel's author). xargs -P also parallelizes, but it does it in a simpler, dumber way with fewer moving parts, making its behavior far easier to predict and reason about.
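For instance, a rough xargs -P sketch (assuming GNU xargs; the -P 4 parallelism and -n 1000 batch size are arbitrary):
find . -maxdepth 1 -name '*.pdf' -print0 | xargs -0 -P 4 -n 1000 rm -f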
To remove the first 100 files:
rm -rf $(ls | head -100)
xargs specifically splits up the list and issues several commands if necessary.
-maxdepth 1 needs to be the first argument after the path.
find has a -delete flag to delete the files it finds, and even if it didn't, it would still be considered better practice to use -exec to execute rm, rather than invoking xargs (which is now 3 processes and a pipe instead of a single process with -delete or 2 processes with -exec).
Calling xargs "dangerous (broken, exploitable, etc.)" is fairly ridiculous. Undoubtedly you should be careful when using xargs, but it is not quite eval/evil.
With -exec calling rm, the number of processes will be 1 + the number of files, although the number of concurrent processes may be 2 (maybe find would execute rm processes concurrently). Using xargs, the number of processes would be reduced dramatically to 2 + n, where n is some number of processes less than the number of files (say the number of files / 10, although likely more depending on the length of the paths). Assuming find does the deletion directly, using -delete makes find the only process that is invoked.