
How can I count all the lines of code in a directory recursively?

We've got a PHP application and want to count all the lines of code under a specific directory and its subdirectories.

We don't need to ignore comments, as we're just trying to get a rough idea.

wc -l *.php 

That command works great for a given directory, but it ignores subdirectories. I was thinking the following command might work, but it is returning 74, which is definitely not the case...

find . -name '*.php' | wc -l

What's the correct syntax to feed in all the files from a directory recursively?


Jarl

Try:

find . -name '*.php' | xargs wc -l

or (when file names include special characters such as spaces)

find . -name '*.php' | sed 's/.*/"&"/' | xargs  wc -l

The SLOCCount tool may help as well.

It will give an accurate source lines of code count for whatever hierarchy you point it at, as well as some additional stats.

Sorted output:

find . -name '*.php' | xargs wc -l | sort -nr


cloc.sourceforge.net might be worth looking at as an alternative to sloccount (more languages but less information)
with include files also: find . -name '*.php' -o -name '*.inc' | xargs wc -l
This will print more than one number when there are many files (because wc will be run multiple times). Also, it doesn't handle many special file names.
@idober: find . -name "*.php" -not -path "./tests*" | xargs wc -l
If a directory name contains any spaces... the above command fails!!
Peter Mortensen

For another one-liner:

( find ./ -name '*.php' -print0 | xargs -0 cat ) | wc -l

It works on names with spaces and only outputs one number.


+1 ditto...searched forever...all the other "find" commands only returned the # of actual files....the -print0 stuff here got the actual line count for me!!! thanks!
Best solution I've found. I parameterized the path and filetype and added this code to a script on my path. I plan to use it frequently.
@TorbenGundtofte-Bruun - see man find .. print0 with xargs -0 lets you operate on files that have spaces or other weird characters in their name
@TorbenGundtofte-Bruun - also, the -0 in xargs corresponds to the print0, it's kind of encoding/decoding to handle the spaces.
If you need more than one name filter, I've found that (at least with the MSYSGit version of find), you need extra parens: ( find . \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 cat ) | wc -l
Peter Mortensen

You can use the cloc utility, which is built for this exact purpose. It reports the number of lines in each language, together with how many of them are comments, etc. cloc is available on Linux, Mac, and Windows.

Usage and output example:

$ cloc --exclude-lang=DTD,Lua,make,Python .
    2570 text files.
    2200 unique files.
    8654 files ignored.

http://cloc.sourceforge.net v 1.53  T=8.0 s (202.4 files/s, 99198.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                    1506          77848         212000         366495
CSS                             56           9671          20147          87695
HTML                            51           1409            151           7480
XML                              6           3088           1383           6222
-------------------------------------------------------------------------------
SUM:                          1619          92016         233681         467892
-------------------------------------------------------------------------------
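
If cloc isn't installed yet, it's packaged in most of the common package managers (a hedged sketch; package names can vary by platform and version):

sudo apt install cloc    # Debian/Ubuntu
brew install cloc        # macOS (Homebrew)
choco install cloc       # Windows (Chocolatey)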

That's a lovely tool that runs nice and quickly giving useful stats at the end. Love it.
Note that you can run Unix commands on Windows using cygwin (or other similar ports/environments). To me, having this kind of access is so extremely useful that it's a necessity. A Unix command line is magical. I especially like Perl and regular expressions.
CLOC and SLOCCount work fine on a mid-2015 MacBook. Note their numbers are close but not exactly the same for a 127k Java Android project. Also note the iOS equivalent had 2x the LoC; so, the "cost" metric in SLOCCount might be off (or maybe iOS devs make 2x what Android devs make. :-)
Would you consider editing the beginning of this question to make it clear that cloc is cross-platform since it's just a Perl script?
Just perfect, works fine in Windows bash as well of course.
Michael Wild

If using a decently recent version of Bash (or ZSH), it's much simpler:

wc -l **/*.php

In the Bash shell this requires the globstar option to be set; otherwise, the ** glob-operator is not recursive. To enable this setting, issue

shopt -s globstar

To make this permanent, add it to one of the initialization files (~/.bashrc, ~/.bash_profile etc.).
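
For example, a quick check-and-enable in the current session (a minimal sketch):

shopt globstar       # report whether globstar is currently enabled
shopt -s globstar    # enable recursive ** globbing
wc -l **/*.php       # now matches .php files at any depth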


I am upvoting this for simplicity; however, I just want to point out that it doesn't appear to search the directories recursively; it only checks the subdirectories of the current directory. This is on SL6.3.
That depends on your shell and the options you have set. Bash requires globstar to be set for this to work.
@PeterSenna, with the current 3.9.8 kernel archive, the command wc -l **/*.[ch] finds a total of 15195373 lines. Not sure whether you consider that to be a "very low value". Again, you need to make sure that you have globstar enabled in Bash. You can check with shopt globstar. To enable it explicitly, do shopt -s globstar.
@MichaelWild This is a good solution, but it will still overflow ARG_MAX if you have a large number of .php files, since wc is not builtin.
@AlbertSamuel No, you'd need to compare the list of files produced by both methods. My method has the problem of not working for large numbers of files, as mentioned by @BroSlow. The accepted answer will fail if the paths produced by find contain spaces. That could be fixed by using print0 and --null with the find and xargs calls, respectively.
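
Concretely, that print0/null fix to the accepted answer looks like this (-0 is the short form of xargs's --null; note it may still print multiple totals when xargs splits the list):

find . -name '*.php' -print0 | xargs -0 wc -l
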
Peter Mortensen

On Unix-like systems, there is a tool called cloc which provides code statistics.

I ran it on a random directory in our code base; it says:

      59 text files.
      56 unique files.
       5 files ignored.

http://cloc.sourceforge.net v 1.53  T=0.5 s (108.0 files/s, 50180.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               36           3060           1431          16359
C/C++ Header                    16            689            393           3032
make                             1             17              9             54
Teamcenter def                   1             10              0             36
-------------------------------------------------------------------------------
SUM:                            54           3776           1833          19481
-------------------------------------------------------------------------------

@moose technically simtao mentioned it specifically as a solution for windows users, not mentioning linux or unix at all.
@moose Table was edited into his answer much later than my answer, now the two indeed look similar.
I like it. cloc is really neat. But what does the name mean?
It's on Windows now too! Assuming you've got chocolatey: choco install cloc
Peter Mortensen

You didn't specify how many files there are, or what the desired output is.

This may be what you are looking for:

find . -name '*.php' | xargs wc -l

This will work as long as there are not too many files: if there are a lot of files, you will get several lines as a result (xargs will split the file list into several sub-lists)
Ah, yes. That's why I said he didn't specify how many files there are. My version is easier to remember, but Shin's version is better if you have more than a few files. I'm voting it up.
I needed to adapt this for use in a function, where single quotes are too restrictive: go () { mkdir /tmp/go; [[ -f ./"$1" ]] && mv ./"$1" /tmp/go; (find ./ -type f -name "$*" -print0 | xargs -0 cat ) | wc -l; wc -l /tmp/go/*; mv /tmp/go/* . } Results were close to sloccount for *.py, but it didn't know *.js, *.html.
Peter Mortensen

Yet another variation :)

$ find . -name '*.php' | xargs cat | wc -l

This will give the total sum, instead of file-by-file.

Add . after find to make it work.


At least in cygwin, I had better results with: $ find -name \*\.php -print0 | xargs -0 cat | wc -l
on Darwin, this just gives a grand total: find . -name '*.php' | xargs cat | wc -l ... whereas this gives file-by-file and a grand total: find . -name '*.php' | xargs wc -l
Peter Mortensen

Use find's -exec and awk. Here we go:

find . -type f -exec wc -l {} \; | awk '{ SUM += $0} END { print SUM }'

This snippet searches for all files (-type f). To filter by file extension, use -name:

find . -name '*.py' -exec wc -l '{}' \; | awk '{ SUM += $0; } END { print SUM; }'

Functionally, this works perfectly, but on a large listing (the Linux source) it is really slow, because it starts a wc process for each file instead of one wc process for all the files. I timed it at 31 seconds using this method, compared to 1.5 seconds using find . -name '*.c' -print0 | xargs -0 wc -l. That said, this faster method (at least on OS X) ends up printing "total" several times, so some additional filtering is required to get a proper total (I posted details in my answer).
This has the benefit of working for an unlimited number of files. Well done!
This is a far better solution when working with large amounts of data and files. Doing one wc on the output of a cat is slow, because the system must first process all the gigabytes to start counting the lines (tested with 200 GB of JSON in 12k files). Running wc first and then totaling the results is far faster.
@DougRichardson, you could consider this instead: find . -type f -exec wc -l {} \+ or find . -name '*.py' -type f -exec wc -l {} \+ which prints a total at the end of the output. If all you're interested in is the total, then you could go a bit further and use tail: find . -type f -exec wc -l {} \+ | tail -1 or find . -name '*.py' -type f -exec wc -l {} \+ | tail -1
Peter Mortensen

A more general and simple approach, for when you need to count files with several different extensions (say, native sources too):

wc $(find . -type f | egrep "\.(h|c|cpp|php|cc)" )

This does not do quite what you think. find . -name '*.[am]' is identical to find . -name '*.[a|m]': both will find all files that end with .m or .a.
But the second will also find files ending in .|, if any. So [h|c|cpp|php|cc] ends up being the same as [hcp|].
backticks are deprecated, prefer $()
This works under Cygwin. Of course, the "C:\" drive has to follow the cygwin convention, like for example: wc $(find /cygdrive/c//SomeWindowsFolderj/ -type f | egrep "\.(h|c|cpp|php|cc)" )
Paul Draper

POSIX

Unlike most other answers here, these work on any POSIX system, for any number of files, and with any file names (except where noted).

Lines in each file:

find . -name '*.php' -type f -exec wc -l {} \;
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} +

Lines in each file, sorted by file path

find . -name '*.php' -type f | sort | xargs -L1 wc -l
# for files with spaces or newlines, use the non-standard sort -z
find . -name '*.php' -type f -print0 | sort -z | xargs -0 -L1 wc -l

Lines in each file, sorted by number of lines, descending

find . -name '*.php' -type f -exec wc -l {} \; | sort -nr
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} + | sort -nr

Total lines in all files

find . -name '*.php' -type f -exec cat {} + | wc -l

Peter Mortensen

The tool Tokei displays statistics about code in a directory. Tokei shows the number of files, the total lines within those files, and the code, comments, and blanks grouped by language. Tokei is also available on Mac, Linux, and Windows.

An example of the output of Tokei is as follows:

$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 CSS                     2           12           12            0            0
 JavaScript              1          435          404            0           31
 JSON                    3          178          178            0            0
 Markdown                1            9            9            0            0
 Rust                   10          408          259           84           65
 TOML                    3           69           41           17           11
 YAML                    1           30           25            0            5
-------------------------------------------------------------------------------
 Total                  21         1141          928          101          112
-------------------------------------------------------------------------------

Tokei can be installed by following the instructions on the README file in the repository.
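
For instance, if you have a Rust toolchain available, one common route is via Cargo (a sketch; see the README for platform-specific packages):

cargo install tokei
tokei /path/to/project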


Great tool, thanks.
John Bachir

There is a little tool called sloccount to count the lines of code in a directory.

It should be noted that it does more than you asked for: it ignores empty lines/comments, groups the results per programming language, and calculates some statistics.
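
A minimal usage sketch, assuming the sloccount package is installed; it just takes a directory to analyze:

sloccount /path/to/project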


For Windows, LocMetrics does the job.
A repeat of the accepted answer (though posted at the same time).
Peter Mortensen

You want a simple for loop:

total_count=0
for file in $(find . -name *.php -print)
do
    count=$(wc -l $file)
    let total_count+=count
done
echo "$total_count"

isn't this overkill compared to the answers that suggest xargs?
No, Nathan. The xargs answers won't necessarily print the count as a single number. It may just print a bunch of subtotals.
what will this program do if file names contain spaces? What about newlines? ;-)
If your file names contain new lines, I'd say you have bigger problems.
@ennuikiller A number of issues with this: first of all, it will break on files with whitespace. Setting IFS=$'\n' before the loop would at least fix it for all but files with newlines in their names. Second, you're not quoting '*.php', so it will get expanded by the shell and not find, and ergo won't actually find any of the .php files in subdirectories. Also, the -print is redundant, since it's implied in the absence of other actions.
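
Putting those fixes together, here is a safer sketch of the same loop (Bash-specific; it quotes the pattern, streams NUL-delimited names, and so handles spaces and even newlines in filenames):

total_count=0
while IFS= read -r -d '' file; do
    count=$(wc -l < "$file")      # wc reads stdin, so only the number is printed
    (( total_count += count ))
done < <(find . -name '*.php' -print0)
echo "$total_count"
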
Peter Mortensen

For sources only:

wc `find`

To filter, just use grep:

wc `find | grep .php$`

Peter Mortensen

A straightforward one-liner that is fast, uses all the search/filtering power of find, does not fail when there are too many files (argument-number overflow), works fine with files that have funny symbols in their names, does not use xargs, and does not launch a uselessly high number of external commands (thanks to + in find's -exec). Here you go:

find . -name '*.php' -type f -exec cat -- {} + | wc -l

I was about to post a variant of this myself (with \; instead of +, as I wasn't aware of the latter); this answer should be the correct answer.
I did ( find . -type f -exec cat {} \; |wc -l ) then I saw this. Just wondering what '--' and '+' in this solution mean and the difference to my version regarding number of external commands.
@grenix: your version will spawn a new cat for each file found, whereas the \+ version will give all the files found to cat in one call. The -- is to mark the end of options (it's a bit unnecessary here).
What I do not understand is how this avoids the argument-number overflow. If I do 'find . -type f -exec cat -- {} + | more' and 'ps aux | grep "cat "' in another terminal, I get something like '... 66128 0.0 0.0 7940 2020 pts/10 S+ 13:45 0:00 cat -- ./file1 ./file2 ...'
Ja͢ck

I know how the question is tagged, but it seems that the problem you're trying to solve is also PHP-related.

Sebastian Bergmann wrote a tool called PHPLOC that does what you want and on top of that provides you with an overview of a project's complexity. This is an example of its report:

Size
  Lines of Code (LOC)                            29047
  Comment Lines of Code (CLOC)                   14022 (48.27%)
  Non-Comment Lines of Code (NCLOC)              15025 (51.73%)
  Logical Lines of Code (LLOC)                    3484 (11.99%)
    Classes                                       3314 (95.12%)
      Average Class Length                          29
      Average Method Length                          4
    Functions                                      153 (4.39%)
      Average Function Length                        1
    Not in classes or functions                     17 (0.49%)

Complexity
  Cyclomatic Complexity / LLOC                    0.51
  Cyclomatic Complexity / Number of Methods       3.37

As you can see, the information provided is a lot more useful from the perspective of a developer, because it can roughly tell you how complex a project is before you start working with it.
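
A minimal usage sketch, assuming phploc is installed (for example as a PHAR or via Composer) and pointed at a hypothetical src/ directory:

phploc src/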


Peter Mortensen

None of the answers so far gets at the problem of filenames with spaces.

Additionally, all answers that use xargs are liable to fail if the total length of the paths in the tree exceeds the shell environment size limit (which defaults to a few megabytes in Linux).

Here is one that fixes these problems in a pretty direct manner. The subshell takes care of files with spaces. The awk totals the stream of individual file wc outputs, so it ought never to run out of space. It also restricts the exec to files only (skipping directories):

find . -type f -name '*.php' -exec bash -c 'wc -l "$0"' {} \; | awk '{s+=$1} END {print s}'

alexis

If you want to keep it simple, cut out the middleman and just call wc with all the filenames:

wc -l `find . -name "*.php"`

Or in the modern syntax:

wc -l $(find . -name "*.php")

This works as long as there are no spaces in any of the directory names or filenames. And as long as you don't have tens of thousands of files (modern shells support really long command lines). Your project has 74 files, so you've got plenty of room to grow.
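
If you're curious how much room that actually is, you can query the kernel's argument-length limit (a quick sanity check):

getconf ARG_MAX    # maximum combined length of command-line arguments, in bytes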


I like this one! If you are in hybrid C/C++ environment: wc -l `find . -type f \( -name "*.cpp" -o -name "*.c" -o -name "*.h" \) -print`
Peter Mortensen

wc -l? Better use grep -c ^

wc -l? Wrong!

The wc command counts newline characters, not lines! When the last line in the file does not end with a newline, it will not be counted!

If you still want to count lines, use grep -c ^. Full example:

# This example prints line count for all found files
total=0
find /path -type f -name "*.php" | while read FILE; do
     # Use 'grep' instead of 'wc' here to count the lines properly
     count=$(grep -c ^ < "$FILE")
     echo "$FILE has $count lines"
     let total=total+count #in bash, you can convert this for another shell
done
echo TOTAL LINES COUNTED:  $total

Finally, watch out for the wc -l trap (it counts newlines, not lines!).
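
To see the trap for yourself, compare the two on input whose last line lacks a trailing newline (a two-line demonstration):

printf 'one\ntwo' | wc -l        # prints 1: only one newline character present
printf 'one\ntwo' | grep -c ^    # prints 2: both lines are counted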


Please read the POSIX definition of a line. With grep -c ^ you're counting the number of incomplete lines, and such incomplete lines cannot appear in a text file.
I know. In practice, only the last line can be incomplete, because it lacks a trailing newline. The idea is to count all lines, including the incomplete one. Counting only complete lines is a very frequent mistake; after counting, we wonder "why did I miss the last line???". This is the answer to why, and a recipe for how to do it properly.
Or, if you want a one liner: find -type f -name '*.php' -print0 | xargs -0 grep -ch ^ | paste -sd+ - | bc See here for alternatives to bc: stackoverflow.com/q/926069/2400328
Matt

Giving the longest files first (i.e., maybe these long files need some refactoring love?), and excluding some vendor directories:

 find . -name '*.php' | xargs wc -l | sort -nr | egrep -v "libs|tmp|tests|vendor" | less

Excluding directories is important in projects in which there's generated code or files copied during the build process.
Peter Mortensen

For Windows, an easy-and-quick tool is LocMetrics.


It's pretty unlikely OP is on Windows if they're using bash.
@VanessaMcHale neither the question title nor the description clearly requires a Unix-only solution, so a Windows-based solution is acceptable. Also, Google pointed me to this page when I was looking for a similar solution.
This comment helped me. I tried this and it works well.
Peter Mortensen

You can use a utility called codel (link). It's a simple Python module to count lines with colorful formatting.

Installation

pip install codel

Usage

To count lines of C++ files (with .cpp and .h extensions), use:

codel count -e .cpp .h

You can also ignore some files/folders with the .gitignore format:

codel count -e .py -i tests/**

It will ignore all the files in the tests/ folder.

The output looks like:

[Screenshot of the output: https://i.stack.imgur.com/dNckF.jpg]

You can also shorten the output with the -s flag. It will hide the per-file information and show only information about each extension. An example is below:

[Screenshot of the shortened output: https://i.stack.imgur.com/ctJED.jpg]


Is there a way to do this for all text files, not just specific extensions?
@AaronFranke As of now, there is no way.
Paul Pettengill

If you want your results sorted by number of lines, you can just add | sort or | sort -r (-r for descending order) to the first answer, like so:

find . -name '*.php' | xargs wc -l | sort -r

Since the output of xargs wc -l is numeric, one would actually need to use sort -n or sort -nr.
Peter Mortensen

Very simply:

find /path -type f -name "*.php" | while read FILE
do
    count=$(wc -l < $FILE)
    echo "$FILE has $count lines"
done

it will fail if there is a space or a newline in one of the filenames
Peter Mortensen

Something different:

wc -l `tree -if --noreport | grep -e'\.php$'`

This works out fine, but you need to have at least one *.php file in the current folder or one of its subfolders, or else wc stalls.


may also overflow ARG_MAX
Peter Mortensen

It’s very easy with Z shell (zsh) globs:

wc -l ./**/*.php

If you are using Bash, you just need to upgrade. There is absolutely no reason to use Bash.


Doug Richardson

On OS X at least, the find+xargs+wc commands listed in some of the other answers print "total" several times on large listings, and no complete total is given. I was able to get a single total for .c files using the following command:

find . -name '*.c' -print0 |xargs -0 wc -l|grep -v total|awk '{ sum += $1; } END { print "SUM: " sum; }'


Instead of grep -v total you can use grep total, which will sum the intermediate sums given by wc. It doesn't make sense to re-calculate intermediate sums since wc already did it.
bharath

If there are too many files, it's better to just look for the total line count.

find . -name '*.php' | xargs wc -l | grep -i ' total' | awk '{print $1}'

Peter Mortensen

If you need just the total number of lines in, let's say, your PHP files, you can use a very simple one-line command even under Windows if you have GnuWin32 installed. Like this:

cat `/gnuwin32/bin/find.exe . -name *.php` | wc -l

You need to specify exactly where find.exe is; otherwise, the Windows-provided FIND.EXE (from the old DOS-like commands) will be executed, since it probably comes before GnuWin32 in the PATH environment variable and has different parameters and results.

Please note that in the command above you should use back-quotes, not single quotes.


In the example above I'm using Bash for Windows instead of cmd.exe; that's why there are forward slashes "/" and not backslashes "\".
Peter Mortensen

While I like the scripts, I prefer this one as it also shows a per-file summary as well as a total:

wc -l `find . -name "*.php"`

Re "...as long as a total...": Don't you mean "...as well as a total..."?