
How to 'grep' a continuous stream?

Is it possible to use grep on a continuous stream?

What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.

I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.

It is highly likely the program generating the file is not flushing its output.
tail -f file works (I see the new output in real time)
Would be appropriate to unix.stackexchange.com
@Luc indeed, didn't think of that
Maybe there are no newlines in your input stream? If so, grep will not proceed.

ciobi

Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)

tail -f file | grep --line-buffered my_pattern

It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).
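
Because support for the flag differs between grep implementations, a script can probe for it before relying on it. The following is a minimal sketch, not a canonical recipe: the names GREP_LB and filtered are made up for illustration, the printf input is a finite stand-in for tail -f, and the probe only checks that the option is accepted, not how the binary buffers without it.

```shell
# Probe whether the local grep accepts --line-buffered (GNU and BSD grep do;
# some minimal implementations may not). GREP_LB is a hypothetical name.
if printf 'probe\n' | grep --line-buffered probe >/dev/null 2>&1; then
    GREP_LB='grep --line-buffered'
else
    GREP_LB='grep'
fi

# Finite stand-in for "tail -f file": three lines, two of which match.
# $GREP_LB is left unquoted on purpose so it splits into command and option.
filtered=$(printf 'keep 1\ndrop\nkeep 2\n' | $GREP_LB keep)
printf '%s\n' "$filtered"
```

In real use the printf would be replaced by tail -f file, and the fallback branch would need another remedy (such as stdbuf) if output still arrives late.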


@MichaelNiemand you could use tail -F file | grep --line-buffered my_pattern
@MichaelGoldshteyn Take it easy. People upvote it because they find this page when they google "grep line buffered" and it solves a problem for them which may not exactly be the one posed as the question.
I came here trying to grep the output of strace. Without the --line-buffered, it won't work.
@MichaelGoldshteyn (and the upvoters of his comment): I have always had this problem with tail -f | grep, and --line-buffered solves it for me (on Ubuntu 14.04, GNU grep version 2.16). Where is the "use line buffering if stdout is a tty" logic implemented? In git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c, line_buffered is set only by the argument parser.
@MichaelGoldshteyn I'm on macOS using BSD grep and without --line-buffered I get no output. However, after testing, it looks like GNU grep does what you describe. So like most things Unix, it depends on your platform's implementation. Since the question did not specify platform, your information appears to be false - after reviewing the code for BSD grep and comparing it to GNU grep, the behavior is definitely controlled by the --line-buffered option. It's just that only GNU grep flushes by default.
Irit Katriel

I use the tail -f <file> | grep <pattern> all the time.

It will wait till grep flushes, not till it finishes (I'm using Ubuntu).


Which can last quite a while, so try not to get impatient.
How long can it take approximately?
@Matthieu: Depends mainly on what you grep for, and how large the buffers are on your OS. If the grep only matches a short string every few hours, it will be days before the first flush.
Tail doesn't use output buffering - grep does.
No, grep does not do output buffering when the output is going to a tty device, as it clearly is in this answer. It does line buffering! This is the correct answer and should be the accepted answer. See my longer comment to the currently accepted (wrong) answer for more details.
XzKto

I think that your problem is that grep uses some output buffering. Try

tail -f file | stdbuf -o0 grep my_pattern

it will set output buffering mode of grep to unbuffered.
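
Since stdbuf ships with GNU coreutils and is not installed everywhere, one option is to wrap it. This is a sketch under stated assumptions: linebuf is a hypothetical helper name, it uses -oL (line buffered) rather than the -o0 shown above, and it only helps programs that rely on the default C stdio buffering.

```shell
# linebuf is a hypothetical wrapper: run a command under "stdbuf -oL" when
# stdbuf (GNU coreutils) is installed, otherwise run the command unchanged.
linebuf() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL "$@"
    else
        "$@"
    fi
}

# Finite stand-in for a live stream; grep sits in the middle of the pipeline,
# which is where its output would normally be block buffered.
matches=$(printf 'error: disk\ninfo: ok\nerror: net\n' | linebuf grep error | cat)
printf '%s\n' "$matches"
```

With a live source this would read tail -f file | linebuf grep my_pattern | next_command, where next_command is whatever consumes the matches.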


And this has the advantage that it can be used for many other commands besides grep.
However, as I've discovered after playing more with it, some commands only flush their output when connected to a tty, and for that, unbuffer (in the expect-dev package on debian) is king. So I'd use unbuffer over stdbuf.
@Peter V. Mørch Yes, you are right, unbuffer can sometimes work where stdbuf can't. But I think you are trying to find a 'magic' program that will always fix your problems instead of understanding your problem. Creating a virtual tty is an unrelated task. Stdbuf does exactly what we want (sets the standard output buffer to a given value), while unbuffer does a lot of hidden stuff that we may not want (compare interactive top with stdbuf and unbuffer). And there is really no 'magic' solution: unbuffer fails sometimes too, for example awk uses a different buffer implementation (stdbuf will fail too).
"But I think you are trying to find a 'magic' program that will always fix your problems instead of understanding your problem." - I think you're right! ;-)
Some more info about stdbuf, unbuffer, and stdio buffering at pixelbeat.org/programming/stdio_buffering
Ken Williams

If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:

tail -c +0 -f <file> | grep --line-buffered <pattern>

The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.
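
The behaviour can be tried end to end on a scratch file. The sketch below is only an illustration: every path and name in it (workdir, app.log, matches.txt, stream) is hypothetical, the sleeps are arbitrary pauses, and a FIFO replaces the plain pipe purely so that the script knows tail's PID and can stop it once the demo is over.

```shell
# All names below are hypothetical scratch files for the demo.
workdir=$(mktemp -d)
log="$workdir/app.log"
out="$workdir/matches.txt"
fifo="$workdir/stream"

printf 'old error\nold info\n' > "$log"
mkfifo "$fifo"

# Reader: grep consumes the FIFO and writes each match as soon as it sees it.
grep --line-buffered error < "$fifo" > "$out" &
grep_pid=$!

# Writer: tail prints the whole file first (-c +0), then keeps following it.
tail -c +0 -f "$log" > "$fifo" &
tail_pid=$!

sleep 1
printf 'new info\nnew error\n' >> "$log"   # simulate the log growing
sleep 2

# Stopping tail closes the FIFO, so grep sees end-of-file and exits.
kill "$tail_pid"
wait "$grep_pid"

found=$(cat "$out")
printf '%s\n' "$found"
rm -rf "$workdir"
```

Both the line that was already in the file and the one appended later end up in matches.txt, which is the point of starting tail at byte 0.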


Dale Anderson

In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.

If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:

tail -f /var/log/some.log | grep --line-buffered foo | grep bar
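
A finite version of the same pipeline makes the filtering easy to check. This is a sketch: producer is a hypothetical stand-in for tail -f /var/log/some.log, and because its input ends immediately the example shows which lines survive both filters, not the timing.

```shell
# producer is a hypothetical stand-in for "tail -f /var/log/some.log".
producer() {
    printf 'foo bar 1\nfoo only\nbar only\nfoo bar 2\n'
}

# The middle grep writes to a pipe, so it carries --line-buffered; the last
# grep writes to the final destination and is left as is.
both=$(producer | grep --line-buffered foo | grep bar)
printf '%s\n' "$both"
```

With three or more stages, every grep except the last one would typically need the flag.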

mebada

You may consider this answer an enhancement. I usually use

tail -F <fileName> | grep --line-buffered  <pattern> -A 3 -B 5

-F is better when the file is rotated (-f will not work properly after a rotation).

-A and -B are useful for getting the lines just after and just before each occurrence of the pattern. These blocks appear between dashed line separators.

But I prefer doing the following:

tail -F <file> | less

This is very useful if you want to search inside streamed logs, that is, move backward and forward and look at them closely.


grep -C 3 <pattern>, replaces -A <N> and -B <N> if N is same.
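
The context flags and the dashed separator can be seen on a small fixed input. A minimal sketch: the sample lines and the names sample, with_ab and with_c are invented for illustration, and a context of 1 is used instead of 3 and 5 to keep the output short.

```shell
# sample is a hypothetical stand-in for a few lines of a streamed log.
sample() {
    printf 'a\nb\nERROR 1\nc\nd\ne\nERROR 2\nf\n'
}

# One line of context before (-B) and after (-A) each match.
with_ab=$(sample | grep -B 1 -A 1 ERROR)

# -C 1 asks for the same amount of context on both sides.
with_c=$(sample | grep -C 1 ERROR)

printf '%s\n' "$with_ab"
```

The two matches are far enough apart that grep prints them as separate groups with a -- line between them; both commands produce identical output.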
Hans.Loven.work

Didn't see anyone offer my usual go-to for this:

less +F <file>
ctrl + c
/<search term>
<enter>
shift + f

I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.


Christian Herr

sed would be a better choice (stream editor)

tail -n0 -f <file> | sed -n '/search string/p'

and then if you wanted the tail command to exit once you found a particular string:

tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'

Obviously a bashism: $BASHPID will be the process id of the tail command. The sed command is next after tail in the pipe, so the sed process id will be $BASHPID+1.


The assumption that the next process started on the system ($BASHPID+1) will be yours is false in many situations, and this does nothing to solve the buffering problem which is probably what the OP was trying to ask about. In particular, recommending sed over grep here seems like merely a matter of (dubious) preference. (You can get p;q behavior with grep -m 1 if that's the point you are attempting to deliver.)
Works: the sed command prints each line as soon as it is ready, while the grep command with --line-buffered did not. I sincerely do not understand the minus 1.
It is heretofore established that buffering is the problem with grep. No special action is required to handle line buffering using sed, it is default behavior, hence my emphasis of the word stream. And true, there is no guarantee $BASHPID+1 will be the correct pid to follow, but since pid allocation is sequential and the piped command is assigned a pid immediately following, it is utterly probable.
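
The "print the first match, then quit" behaviour discussed in this thread can be compared on a finite input. A sketch only: stream is a hypothetical stand-in for tail -n0 -f <file>, and the sed script is written as {p;q;} because some sed implementations want a semicolon before the closing brace.

```shell
# stream is a hypothetical stand-in for "tail -n0 -f <file>".
stream() {
    printf 'noise\nsearch string one\nsearch string two\n'
}

# sed: print the first matching line, then quit.
first_sed=$(stream | sed -n '/search string/{p;q;}')

# grep: stop after one match.
first_grep=$(stream | grep -m 1 'search string')

printf '%s\n%s\n' "$first_sed" "$first_grep"
```

Both commands stop reading after the first match. In a live pipeline, tail itself may keep running until its next write to the closed pipe fails, which is the problem the --pid trick above tries to work around.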
Caleb

Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.


That's not actually correct. If grep is the last command in the pipe chain, it will act as you explain. However, if it's in the middle it will buffer around 8k output at a time.
F. Hauri - Give Up GitHub

Coming a bit late to this question: since I consider this kind of work an important part of a monitoring job, here is my (not so short) answer...

Following logs using bash

1. Command tail

This command is a little more powerful than what the already published answers show.

Difference between the follow options tail -f and tail -F, from the man page:

   -f, --follow[={name|descriptor}]
          output appended data as the file grows;
   ...
   -F     same as --follow=name --retry
   ...
   --retry
          keep trying to open a file if it is inaccessible

This means that with -F instead of -f, tail will re-open the file(s) when they are removed (on log rotation, for example). This is useful for watching a log file over many days.

Ability to follow more than one file simultaneously. I've already used:

tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
           /var/log/apache2/{,ssl_,other_vhosts_}access.log \
           /var/log/pure-ftpd/transfer.log

for following events through hundreds of files... (consider the rest of this answer to understand how to make it readable... ;)

Using the -n switch (don't use -c for line buffering!). By default tail shows the last 10 lines. This can be tuned:

tail -n 0 -F file

will follow the file, but only new lines will be printed.

tail -n +0 -F file

will print the whole file before following its progression.
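
The effect of the -n forms, and the headers tail prints when given several files, can be checked without -F on two scratch files. A sketch under assumptions: the file names a.log and b.log are invented, and -n +1 is used as the portable spelling of "from the first line" in place of the +0 form quoted above.

```shell
# Hypothetical scratch files standing in for real logs.
workdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$workdir/a.log"
printf 'b1\nb2\n' > "$workdir/b.log"

none=$(tail -n 0 "$workdir/a.log")     # none of the existing lines
whole=$(tail -n +1 "$workdir/a.log")   # start at line 1, i.e. the whole file

# With several files, tail prefixes each block with a "==> name <==" header.
multi=$(cd "$workdir" && tail -n 1 a.log b.log)

printf '%s\n' "$multi"
rm -rf "$workdir"
```

Those ==> name <== header lines are what a downstream filter can key on to tell which file a given line came from.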

2. Buffer issues when piping:

If you plan to filter the output, consider buffering! See the -u option for sed, --line-buffered for grep, or the stdbuf command:

tail -F /some/files | sed -une '/Regular Expression/p'

This is a lot more reactive than the same sed command without the -u switch (and a lot more efficient than using grep).

tail -F /some/files |
    sed -une '/Regular Expression/p' |
    stdbuf -i0 -o0 tee /some/resultfile
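
A finite run of the same pipeline shows that tee both passes the filtered lines on and saves them. This sketch assumes GNU sed (for -u); the request lines, the ^GET pattern and the resultfile path are invented, and tee is run without stdbuf to keep the example portable.

```shell
# Hypothetical scratch location for the result file.
workdir=$(mktemp -d)
result_file="$workdir/resultfile"

# Finite stand-in for "tail -F /some/files"; -u asks GNU sed for unbuffered
# output, -n suppresses default printing, and the p command prints matches.
shown=$(printf 'GET /index\nPOST /login\nGET /about\n' |
    sed -une '/^GET/p' |
    tee "$result_file")

saved=$(cat "$result_file")
printf '%s\n' "$shown"
rm -rf "$workdir"
```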

3. Recent journaling system

On recent systems, instead of tail -f /var/log/syslog you have to run journalctl -xf, in nearly the same way...

journalctl -axf | sed -une '/Regular Expression/p'

But read the man page: this tool was built for log analysis!

4. Integrating this in a bash script

Colored output of two files (or more)

Here is a sample script that watches many files, coloring the output of the 1st file differently from the others:

#!/bin/bash

tail -F "$@" |
    sed -une "
        /^==> /{h;};
        //!{
            G;
            s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
            s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
            p;}"

It works fine on my host, running:

sudo ./myColoredTail /var/log/{kern.,sys}log

Interactive script

You may be watching logs in order to react to events? Here is a little script that plays a sound when a USB device appears or disappears, but the same script could send mail, or trigger any other interaction, like powering on a coffee machine...

#!/bin/bash

exec {tailF}< <(tail -F /var/log/kern.log)
tailPid=$!

while :;do
    read -rsn 1 -t .3 keyboard
    [ "${keyboard,}" = "q" ] && break
    if read -ru $tailF -t 0 _ ;then
        read -ru $tailF line
        case $line in
            *New\ USB\ device\ found* ) play /some/sound.ogg ;;
            *USB\ disconnect* ) play /some/othersound.ogg ;;
        esac
        printf "\r%s\e[K" "$line"
    fi
done

echo
exec {tailF}<&-
kill $tailPid

You can quit by pressing the Q key.
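
The pattern-matching core of the interactive script can be exercised on its own, without a live log or a sound player. This is a simplified sketch, not the author's script: react is a hypothetical function that echoes an event name instead of calling play, and the three sample kernel messages are invented.

```shell
# react is a hypothetical handler: it maps a log line to an event name
# instead of playing a sound, so the dispatch logic can be checked anywhere.
react() {
    case $1 in
        *New\ USB\ device\ found*) echo "connected" ;;
        *USB\ disconnect*)         echo "disconnected" ;;
        *)                         return 0 ;;   # ignore everything else
    esac
}

# Finite, made-up stand-in for the output of "tail -F /var/log/kern.log".
events=$(printf '%s\n' \
    'usb 1-1: New USB device found, idVendor=0000' \
    'eth0: link up' \
    'usb 1-1: USB disconnect, device number 4' |
    while IFS= read -r line; do
        react "$line"
    done)

printf '%s\n' "$events"
```

Swapping the echo calls for play, mail or any other command gives the kind of reaction described above.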


Excellent and exhaustive answer. Thanks
user10584393

This command works for me (SUSE):

mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN  >> logins_to_mail

collecting logins to mail service


user882786

you certainly won't succeed with

tail -f /var/log/foo.log |grep --line-buffered string2search

when you use "colortail" as an alias for tail, eg. in bash

alias tail='colortail -n 30'

You can check with type tail: if it reports something like "tail is an alias of colortail -n 30", then you have your culprit :)

Solution:

remove the alias with

unalias tail

ensure that you're using the 'real' tail binary by this command

type tail

which should output something like:

tail is /usr/bin/tail

and then you can run your command

tail -f foo.log |grep --line-buffered something

Good luck.
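
An alternative to removing the alias is to bypass it for a single invocation with the command builtin. The sketch below is an illustration under assumptions: bash scripts do not expand aliases by default, so a shell function named tail stands in for the colortail alias, and the sample file is a throwaway.

```shell
# Simulate a shadowed tail with a shell function (a stand-in for the alias).
tail() { echo "not the real tail"; }

sample=$(mktemp)
printf 'first\nlast\n' > "$sample"

shadowed=$(tail -n 1 "$sample")        # calls the function defined above
real=$(command tail -n 1 "$sample")    # command skips functions and aliases

printf '%s\n%s\n' "$shadowed" "$real"
rm -f "$sample"
unset -f tail
```

At an interactive prompt the equivalent would be command tail -f foo.log | grep --line-buffered something.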


Atif

Use awk (another great Unix utility) instead of grep where you don't have the line-buffered option! It will continuously stream your data from tail.

this is how you use grep

tail -f <file> | grep pattern

This is how you would use awk

tail -f <file> | awk '/pattern/{print $0}'

This is not correct; Awk out of the box performs line buffering, just like most other standard Unix tools. (Moreover, the {print $0} is redundant, as printing is the default action when a condition passes.)
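
The answer and the comment disagree about how awk buffers its output. One way to sidestep the question, offered here as a sketch and not as a statement about any particular awk, is to flush explicitly after each matching line; fflush() is available in the common implementations (gawk, mawk, BSD awk). The sample input is invented and finite.

```shell
# Finite stand-in for "tail -f <file>"; awk sits in the middle of the
# pipeline and flushes its output after every line it prints.
matched=$(printf 'pattern one\nother\npattern two\n' |
    awk '/pattern/ { print; fflush() }' |
    cat)

printf '%s\n' "$matched"
```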