I often use the find command to search through source code, delete files, whatever. Annoyingly, because Subversion stores duplicates of each file in its .svn/text-base/ directories, my simple searches end up getting lots of duplicate results. For example, I want to recursively search for uint in multiple messages.h and messages.cpp files:
# find -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h: void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h: ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h: uint _scanCount;
./virus/.svn/text-base/messages.cpp.svn-base:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.cpp.svn-base:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.h.svn-base: void _progress(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base: ProgressMessage(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base: uint _scanCount;
How can I tell find to ignore the .svn directories?
Update: If you upgrade your SVN client to version 1.7 this is no longer an issue.
A key feature of the changes introduced in Subversion 1.7 is the centralization of working copy metadata storage into a single location. Instead of a .svn directory in every directory in the working copy, Subversion 1.7 working copies have just one .svn directory—in the root of the working copy. This directory includes (among other things) an SQLite-backed database which contains all of the metadata Subversion needs for that working copy.
find ... -print0 | xargs -0 egrep ... instead of find ... -exec grep ... (does not fork grep for each file, but for a bunch of files at a time). Using this form you can also prune .svn directories without using the -prune option of find, i.e. find ... -print0 | egrep -zv '/\.svn' | xargs -0 egrep ... (the -z option of GNU grep is needed so that the NUL-terminated names from -print0 are filtered one by one).
-exec with + doesn't fork grep for each file, while using it with ; does. Using -exec is actually more correct than using xargs. Please notice that commands like ls do something even if the argument list is empty, while commands like chmod give an error if there are insufficient arguments. To see what I mean, just try the following command in a directory that does not have any shell script: find /path/to/dir -name '*.sh' -print0 | xargs -0 chmod 755. Compare with this one: find /path/to/dir -name '*.sh' -exec chmod 755 '{}' '+'.
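The -exec half of that comparison can be checked without touching any permissions. This is only a sketch: the scratch directory and file names are made up, and echo stands in for chmod. (GNU xargs, by contrast, runs its command once on empty input unless -r is given.)

```shell
# Scratch directory (hypothetical) that contains no shell scripts yet.
tmp=$(mktemp -d)
touch "$tmp/notes.txt"

# No file matches, so find never runs the command and nothing is printed.
none=$(find "$tmp" -name '*.sh' -exec echo "would chmod:" {} +)
echo "no match: [$none]"

# Once a script exists, the command runs with the matching path appended.
touch "$tmp/run.sh"
some=$(find "$tmp" -name '*.sh' -exec echo "would chmod:" {} +)
echo "one match: [$some]"

rm -rf "$tmp"
```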
grep-ing out .svn is not a good idea either. While find is specialized for handling file properties, grep is not. In your example, a file named '.svn.txt' will also be filtered by your egrep command. Although you can modify your regex to '^/\.svn$', it is still not a good practice to do so. The -prune predicate of find works perfectly for filtering a file (by filename, or creation timestamp, or whatever condition you supplied). Just because you can kill a cockroach with a big sword doesn't mean it is the suggested way to do so :-).
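A small experiment makes the '.svn.txt' point concrete. It is a sketch on a made-up scratch tree, not the asker's repository; the file names are assumptions chosen to trigger the problem.

```shell
# Hypothetical tree: one real source file, one .svn copy, and a regular
# file whose name merely starts with ".svn".
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
touch "$tmp/messages.cpp" "$tmp/.svn.txt" \
      "$tmp/.svn/text-base/messages.cpp.svn-base"

# Filtering the path list with grep also drops ./.svn.txt.
grep_out=$(cd "$tmp" && find . -type f | grep -v '/\.svn' | LC_ALL=C sort)

# Pruning only skips directories named .svn, so ./.svn.txt survives.
prune_out=$(cd "$tmp" && find . -name .svn -type d -prune -o -type f -print | LC_ALL=C sort)

printf 'grep filter:\n%s\n-prune:\n%s\n' "$grep_out" "$prune_out"
rm -rf "$tmp"
```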
Why not just
find . -not -iwholename '*.svn*'
The -not predicate negates the test, so everything that has .svn anywhere in the path is excluded.
So in your case it would be
find -not -iwholename '*.svn*' -name 'messages.*' -exec grep -Iw uint {} +
As follows:
find . -path '*/.svn*' -prune -o -print
Or, alternatively based on a directory and not a path prefix:
find . -name .svn -a -type d -prune -o -print
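Applied to the search from the question, the directory-based form might look like the sketch below. The tree and file contents are invented for the demonstration, and -H is added to grep only so that file names are always printed.

```shell
# Hypothetical working copy with .svn/text-base duplicates.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base" "$tmp/virus/.svn/text-base"
printf 'uint a;\n' > "$tmp/messages.cpp"
printf 'uint a;\n' > "$tmp/.svn/text-base/messages.cpp.svn-base"
printf 'uint b;\n' > "$tmp/virus/messages.h"
printf 'uint b;\n' > "$tmp/virus/.svn/text-base/messages.h.svn-base"

# Prune every directory named .svn, then grep the remaining messages.* files.
hits=$(cd "$tmp" && find . -name .svn -type d -prune -o \
        -type f -name 'messages.*' -exec grep -HIw uint {} + | LC_ALL=C sort)

echo "$hits"
rm -rf "$tmp"
```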
find . -type d -name .svn -prune -o -print because it is a little bit faster. According to the POSIX standard, the expressions are evaluated one by one, in the order specified. If the first expression in -a is false, the second expression will not be evaluated (also called short-circuit AND evaluation).
-type d before -name .svn is theoretically more efficient. However, it is usually insignificant unless you have a very, very big directory tree.
-print as part of the last expression. Something like find . -name .git -prune -o \( -type f -name LICENSE -print \) works as expected.
find . -name .svn -prune -o -name .git -prune -o -type d -print. It might be a few milliseconds faster putting -type d before the two -name tests, but it's not worth the extra typing.
For searching, can I suggest you look at ack? It's a source-code aware find, and as such will automatically ignore many file types, including source code repository info such as the above.
ack very much, but I have found it to be substantially slower than find -type f -name "*.[ch]" | xargs grep when dealing with a large codebase.
ack billed as a better grep, not a source-aware find? Some examples of using it to replace find would make this a real answer.
To ignore .svn, .git and other hidden directories (starting with a dot), try:
find . -type f -not -path '*/\.*'
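One thing to be aware of with this pattern is that it also hides dot files, not just dot directories. The sketch below shows that on an invented tree; it spells -not as !, which is the POSIX form of the same operator.

```shell
# Hypothetical project with hidden directories and one hidden regular file.
tmp=$(mktemp -d)
mkdir -p "$tmp/.git" "$tmp/.svn" "$tmp/src"
touch "$tmp/.git/config" "$tmp/.svn/entries" "$tmp/src/main.c" "$tmp/.gitignore"

# Any path with a component starting with a dot is rejected, so .gitignore
# is dropped along with everything under .git and .svn.
visible=$(cd "$tmp" && find . -type f ! -path '*/.*' | LC_ALL=C sort)

echo "$visible"
rm -rf "$tmp"
```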
However, if the purpose of using find is searching within the files, you may try to use these commands:
git grep - specially designed command for searching patterns within the Git repository.
ripgrep - which by default ignores hidden files and files specified in .gitignore.
Related: How do I find all files containing specific text on Linux?
Here is what I would do in your case:
find . -name .svn -prune -o -name 'messages.*' -exec grep -Iw uint {} +
Emacs' rgrep built-in command ignores the .svn directory, and many more files you're probably not interested in when performing a find | grep. Here is what it uses by default:
find . \( -path \*/SCCS -o -path \*/RCS -o -path \*/CVS -o -path \*/MCVS \
-o -path \*/.svn -o -path \*/.git -o -path \*/.hg -o -path \*/.bzr \
-o -path \*/_MTN -o -path \*/_darcs -o -path \*/\{arch\} \) \
-prune -o \
\( -name .\#\* -o -name \*.o -o -name \*\~ -o -name \*.bin -o -name \*.lbin \
-o -name \*.so -o -name \*.a -o -name \*.ln -o -name \*.blg \
-o -name \*.bbl -o -name \*.elc -o -name \*.lof -o -name \*.glo \
-o -name \*.idx -o -name \*.lot -o -name \*.fmt -o -name \*.tfm \
-o -name \*.class -o -name \*.fas -o -name \*.lib -o -name \*.mem \
-o -name \*.x86f -o -name \*.sparcf -o -name \*.fasl -o -name \*.ufsl \
-o -name \*.fsl -o -name \*.dxl -o -name \*.pfsl -o -name \*.dfsl \
-o -name \*.p64fsl -o -name \*.d64fsl -o -name \*.dx64fsl -o -name \*.lo \
-o -name \*.la -o -name \*.gmo -o -name \*.mo -o -name \*.toc \
-o -name \*.aux -o -name \*.cp -o -name \*.fn -o -name \*.ky \
-o -name \*.pg -o -name \*.tp -o -name \*.vr -o -name \*.cps \
-o -name \*.fns -o -name \*.kys -o -name \*.pgs -o -name \*.tps \
-o -name \*.vrs -o -name \*.pyc -o -name \*.pyo \) \
-prune -o \
-type f \( -name pattern \) -print0 \
| xargs -0 -e grep -i -nH -e regex
It ignores directories created by most version control systems, as well as generated files for many programming languages. You could create an alias that invokes this command and replace find and grep patterns for your specific problems.
GNU find
find . ! -regex ".*[/]\.svn[/]?.*"
-type d) - this answer did. +1
I use grep for this purpose. Put this in your ~/.bashrc:
export GREP_OPTIONS="--binary-files=without-match --color=auto --devices=skip --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps --exclude-dir=.svn"
grep automatically uses these options on invocation.
GREP_OPTIONS=xxx grep "$@". This means that the GREP_OPTIONS variable is only set for instances of grep that I run manually using 'grp'. This means I never get a situation where I run a tool, and internally it calls grep, but the tool gets confused because grep isn't behaving as it expected. Also, I have a second function 'grpy', which calls 'grp', but adds --include=*.py, to just search Python files.
grep --exclude=tags --exclude-dir=.git ...etc... "$@". I like that this runs like 'ack', but I retain awareness of, and control over, what it's doing.
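A wrapper along the lines described in these comments could look like the following sketch. The function name grp comes from the comment above; the exact option list and the scratch tree are assumptions for illustration.

```shell
# Hypothetical wrapper: pass the exclusions explicitly instead of relying
# on the GREP_OPTIONS environment variable.
grp() {
    grep --binary-files=without-match \
         --exclude-dir=.svn --exclude-dir=.git --exclude-dir=CVS "$@"
}

# Made-up tree with a working file and its .svn duplicate.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
printf 'uint a;\n' > "$tmp/messages.cpp"
printf 'uint a;\n' > "$tmp/.svn/text-base/messages.cpp.svn-base"

# A recursive search now skips the .svn directory.
hits=$(cd "$tmp" && grp -rHw uint . | LC_ALL=C sort)

echo "$hits"
rm -rf "$tmp"
```

Because the options live in a function rather than in the environment, other tools that call grep internally are unaffected.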
Create a script called ~/bin/svnfind:
#!/bin/bash
#
# Attempts to behave identically to a plain `find' command while ignoring .svn/
# directories.

OPTIONS=()
PATHS=()
EXPR=()

# Leading -H, -L and -P options must stay ahead of the paths.
while [[ $1 =~ ^-[HLP]+ ]]; do
    OPTIONS+=("$1")
    shift
done

# Everything up to the first expression token is a starting path. The regex is
# kept in a variable because a quoted right-hand side of =~ is matched literally.
expr_start='^[-(),!]'
while [[ $# -gt 0 ]] && ! [[ $1 =~ $expr_start ]]; do
    PATHS+=("$1")
    shift
done

# If user's expression contains no action then we'll add the normally-implied
# `-print'.
ACTION=-print
while [[ $# -gt 0 ]]; do
    case "$1" in
        -delete|-exec|-execdir|-fls|-fprint|-fprint0|-fprintf|-ok|-print|-okdir|-print0|-printf|-prune|-quit|-ls)
            ACTION=;;
    esac
    EXPR+=("$1")
    shift
done

# Count the array elements, not the length of the first element.
if [[ ${#EXPR[@]} -eq 0 ]]; then
    EXPR=(-true)
fi

exec -a "$(basename "$0")" find "${OPTIONS[@]}" "${PATHS[@]}" -name .svn -type d -prune -o '(' "${EXPR[@]}" ')' $ACTION
This script behaves identically to a plain find command, except that it prunes out .svn directories.
Example:
# svnfind -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp: Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp: Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp: Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp: Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp: Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp: for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h: void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h: ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h: uint _scanCount;
echo to the find command and tell me what command is executed? svnfind -type f works great on my Red Hat machine.
echo find "${OPTIONS[@]}"... so it prints the find command instead of actually running it.
echo find ${OPTIONS[@]} ${PATHS[@]} -name .svn -type d -prune -o ( ${EXPR[@]} ) $ACTION. This gives me the following output: find -type f -name .svn -type d -prune -o ( -true ) -print
find . | grep -v '\.svn'
. in the .svn regexp.
| fgrep -v /.svn/ or | grep -F -v /.svn/ to exclude exactly the directory and not files with ".svn" as part of their name.
Why don't you pipe your command through grep, which is easy to understand:
your find command | grep -v '\.svn'
. in the .svn regexp.
Just thought I'd add a simple alternative to Kaleb's and others' posts (which detailed the use of the find -prune option, ack, repofind commands etc.) which is particularly applicable to the usage you have described in the question (and any other similar usages):
For performance, you should always try to use find ... -exec grep ... + (thanks Kenji for pointing this out), find ... | xargs egrep ... (portable), or find ... -print0 | xargs -0 egrep ... (GNU; works on filenames containing spaces) instead of find ... -exec grep ... \;. The find ... -exec ... + and find | xargs forms do not fork egrep for each file, but rather for a bunch of files at a time, resulting in much faster execution.
When using the find | xargs form you can also use grep to easily and quickly prune .svn (or any directories or regular expression), i.e. find ... | grep -v '/\.svn' | xargs egrep ... (useful when you need something quick and can't be bothered to remember how to set up find's -prune logic). With -print0 the filter has to read and write NUL-terminated names, which GNU grep does with -z: find ... -print0 | grep -zv '/\.svn' | xargs -0 egrep .... The find | grep | xargs approach is similar to GNU find's -regex option (see ghostdog74's post), but is more portable (it will also work on platforms where GNU find is not available).
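The difference in the number of processes is easy to observe. In this sketch (made-up files; a tiny sh -c child stands in for grep) each child prints one line, so counting lines counts invocations.

```shell
# Three hypothetical source files in a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/a.c" "$tmp/b.c" "$tmp/c.c"

# With \; the child runs once per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# With + the files are batched into a single invocation.
batched=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
rm -rf "$tmp"
```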
-exec switch in find: one is ending with ; and the other is ending with +. The one ending with + replaces {} by a list of all matching files. Besides, your regex '/\.svn' matches file names like '.svn.txt' too. Please refer to my comments to the question for more information.
In a source code repository, I generally want to do things only to the text files.
The first line is all files, excluding CVS, SVN, and GIT repository files.
The second line excludes all binary files.
find . -not \( -name .svn -prune -o -name .git -prune -o -name CVS -prune \) -type f -print0 | \
xargs -0 file -n | grep -v binary | cut -d ":" -f1
I use find with the -not -path options. I have not had good luck with prune.
find . -name "*.groovy" -not -path "./target/*" -print
will find the groovy files not in the target directory path.
To resolve this problem, you can simply use this find condition:
find \( -name 'messages.*' ! -path "*/.svn/*" \) -exec grep -Iw uint {} +
You can add more restriction like this:
find \( -name 'messages.*' ! -path "*/.svn/*" ! -path "*/CVS/*" \) -exec grep -Iw uint {} +
You can find more information about this in man page section "Operators": http://unixhelp.ed.ac.uk/CGI/man-cgi?find
Note that if you do
find . -type f -name 'messages.*'
then -print is implied when the whole expression (-type f -name 'messages.*') is true, because there is no 'action' (like -exec).
While, to stop descending into certain directories, you should use anything that matches those directories and follow it by -prune (which is intended to stop descending into directories); like so:
find . -type d -name '.svn' -prune
This evaluates to True for the .svn directories, and we can use boolean short-circuit by following this by -o (OR), after which what follows after the -o is only checked when the first part is False, hence is not a .svn directory. In other words, the following:
find . -type d -name '.svn' -prune -o -name 'message.*' -exec grep -Iw uint {} +
will only evaluate what is right of the -o, namely -name 'message.*' -exec grep -Iw uint {} +, for files NOT inside .svn directories.
Note that because .svn is likely always a directory (and not for example a file), and in this case certainly isn't matching the name 'message.*', you might as well leave out the -type d and do:
find . -name '.svn' -prune -o -name 'message.*' -exec grep -Iw uint {} +
Finally, note that if you omit any action (-exec is an action), say like so:
find . -name '.svn' -prune -o -name 'message.*'
then the -print action is implied but will apply to the WHOLE expression, including the -name '.svn' -prune -o part and thus print all .svn directories as well as the 'message.*' files, which is probably not what you want. Therefore you always should use an 'action' in the right-hand side of the boolean expression when using -prune in this way. And when that action is printing you have to explicitly add it, like so:
find . -name '.svn' -prune -o -name 'message.*' -print
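The effect of the implied -print is easy to reproduce. The following is a sketch on an invented tree (the file names are assumptions) that runs the two forms side by side.

```shell
# Hypothetical tree: one matching file and one .svn directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
touch "$tmp/message.h" "$tmp/.svn/text-base/message.h.svn-base"

# No action: -print applies to the whole expression, so ./.svn is listed too.
implicit=$(cd "$tmp" && find . -name '.svn' -prune -o -name 'message.*' | LC_ALL=C sort)

# Explicit -print on the right-hand side: only the wanted file is listed.
explicit=$(cd "$tmp" && find . -name '.svn' -prune -o -name 'message.*' -print | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$tmp"
```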
Try findrepo, which is a simple wrapper around find/grep and much faster than ack. You would use it in this case like:
findrepo uint 'messages.*'
This works for me at the Unix prompt:
gfind . \( -not -wholename '*\.svn*' \) -type f -name 'messages.*' -exec grep -Iw uint {} +
The command above lists files whose paths do not contain .svn and runs the grep you mentioned on them.
xxx.svnxxx. This is important - for example if you are using git instead of svn, you will often want to include files like .gitignore (which is not metadata, it is a regular file that is included in the repo) in the results from find.
I usually pipe the output through grep one more time to remove .svn; in my use it isn't much slower. Typical example:
find -name 'messages.*' -exec grep -Iw uint {} + | grep -Ev '\.svn|\.git|\.anythingElseIwannaIgnore'
OR
find . -type f -print0 | xargs -0 egrep messages. | grep -Ev '\.svn|\.git|\.anythingElseIwannaIgnore'
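When post-filtering like this, keep in mind that an unescaped dot in the pattern matches any character. A quick sketch on three made-up paths shows the difference between a loose pattern and one anchored on the directory name.

```shell
# Hypothetical path list; the "usvn" directory is not Subversion metadata.
paths='./messages.cpp
./tools/usvn/messages.cpp
./.svn/text-base/messages.cpp.svn-base'

# Unescaped dots: ".svn" also matches "usvn", so a real file is lost.
loose=$(printf '%s\n' "$paths" | grep -Ev '.svn|.git')

# Escaped dot plus surrounding slashes: only the metadata directory is dropped.
strict=$(printf '%s\n' "$paths" | grep -Ev '/\.(svn|git)/')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```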
'*.svn*' at first but then '*.svn'. Which is right? Do both work? I think it should probably be '*.svn*'?