
Search and replace in bash using regular expressions

I've seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

Which follows this syntax: ${variable//pattern/replacement}

Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).

How can I search/replace a string using full regex syntax?

Found a related question here: stackoverflow.com/questions/5658085/…
FYI, \s isn't part of standard POSIX-defined regular expression syntax (neither BRE nor ERE); it's a PCRE extension, and mostly not available from shell. [[:space:]] is the more universal equivalent.
\s can be replaced by [[:space:]], by the way, . by ?, and extglob extensions to the baseline shell pattern language can be used for things like optional subgroups, repeated groups, and the like.
I use this in bash version 4.1.11 on Solaris... echo ${hello//[0-9]} Notice the lack of the final slash.
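To illustrate the substitutions suggested in the comments above, here is a minimal sketch. It assumes bash with the extglob option available; the sample string and the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
shopt -s extglob                 # must be enabled before the patterns below are parsed

s=$'a1  b22\tc333'               # sample text with two spaces and a tab

no_space=${s//[[:space:]]/}      # [[:space:]] plays the role of \s
masked=${s//?/.}                 # ? matches any one character, like regex .
runs=${s//+([0-9])/N}            # +(...) is "one or more", like regex (...)+

printf '%s\n' "$no_space" "$masked" "$runs"
```

These are still shell patterns rather than regular expressions, so they cover common cases but not everything a regex engine can do.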

Charles Duffy

Use sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

Note that the subsequent -e's are processed in order. Also, the g flag for the expression will match all occurrences in the input.

The same approach works with your favorite tool, such as perl or awk, e.g.:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

This may allow you to do more creative matches. For example, in the snippet above, the numeric replacement is not applied unless the first expression matched (because Perl's and operator short-circuits). And of course, you have the full language support of Perl to do your bidding.
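If the goal is to store the result in a variable rather than print it, one possible sketch is below. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); the variable name result is only illustrative.

```shell
#!/usr/bin/env bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day

# -E selects extended regular expressions, so + works without a backslash.
# The here-string feeds the variable to sed without a separate echo process.
result=$(sed -E 's/[0-9]+//g' <<<"$MYVAR")

echo "$result"
```

The command substitution still forks a process for sed, so this is best used once per value, not in a tight loop.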


This only does a single replace as far as I can tell. Is there a way to have it replace all occurrences of the pattern like what the code I posted does?
I've updated my answer to demonstrate multiple replacements as well as global pattern matching. Let me know if that helps.
Thanks so much! Out of curiosity, why did you switch from a one line version (in your original answer) to a two-liner?
Using sed or other external tools is expensive due to process initialization time. I especially searched for all-bash solution, because I found using bash substitutions to be more than 3x faster than calling sed for each item in my loop.
@CiroSantilli六四事件法轮功纳米比亚威视, granted, that's the common wisdom, but that doesn't make it wise. Yes, bash is slow no matter what -- but well-written bash that avoids subshells is literally orders of magnitude faster than bash that calls external tools for every tiny little task. Also, well-written shell scripts will benefit from faster interpreters (like ksh93, which has performance on par with awk), whereas for poorly-written ones there's nothing to be done.
Charles Duffy

This actually can be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields...

howareyoudoingtodday
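The loop above only deletes matches. A possible generalization that substitutes a replacement string, working left to right, is sketched below. The helper name regex_replace_all is hypothetical, the replacement is treated as literal text (no backreferences), and the sketch assumes the pattern contains no ^ or $ anchors.

```shell
#!/usr/bin/env bash
# regex_replace_all INPUT ERE REPLACEMENT  (hypothetical helper)
regex_replace_all() {
  local input=$1 re=$2 repl=$3 out= m
  while [[ -n $input && $input =~ $re ]]; do
    m=${BASH_REMATCH[0]}          # text of the leftmost match
    [[ -z $m ]] && break          # an empty match would loop forever
    out+=${input%%"$m"*}$repl     # text before the match, then the replacement
    input=${input#*"$m"}          # continue with the text after the match
  done
  printf '%s\n' "$out$input"
}

regex_replace_all 'ho02123ware38384you443d34o3434ingtod38384day' '[0-9]+' ''
regex_replace_all 'hello ugly bad world' '(bad|ugly)' 'nice'
```

Because the matched text is quoted inside the parameter expansions, it is compared literally and not as a glob pattern.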

Something tells me you will love these: stackoverflow.com/questions/5624969/… =)
=~ is the key. But it's a bit clunky, given the reassignment in the loop. @jheddings' solution from 2 years prior (calling sed or perl) is another good option.
Calling sed or perl is sensible, if using each invocation to process more than a single line of input. Invoking such a tool on the inside of a loop, as opposed to using a loop to process its output stream, is foolhardy.
FYI, in zsh, it's just $match instead of $BASH_REMATCH. (You can make it behave like bash with setopt bash_rematch.)
It's odd -- inasmuch as zsh isn't trying to be a POSIX shell, it's arguably following the letter of POSIX guidance about all-caps variables being used for POSIX-specified (shell or system-relevant) purposes and lowercase variables being reserved for application use. But inasmuch as zsh is something that runs applications, rather than an application itself, this decision to use application variable namespace rather than the system namespace seems awfully perverse.
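As a rough illustration of the comment about looping over a tool's output stream instead of invoking the tool inside the loop, the sketch below runs sed once for all items. The array names items and cleaned are made up for the example.

```shell
#!/usr/bin/env bash
items=(ab12 cd345 ef6)
cleaned=()

# One sed process handles every item; the loop only reads its output.
while IFS= read -r line; do
  cleaned+=("$line")
done < <(printf '%s\n' "${items[@]}" | sed 's/[0-9]//g')

printf '%s\n' "${cleaned[@]}"
```

This assumes the items contain no newlines, since each one is passed to sed as a separate line.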
nickl-

These examples also work in bash, with no need to use sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X} 
echo ${MYVAR//[0-9]/N}

You can also use the character class bracket expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X} 
echo ${MYVAR//[[:digit:]]/N}

output

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

What @Lanaru wanted to know, however, if I understand the question correctly, is why the "full" or PCRE shorthand classes (\s, \S, \w, \W, \d, \D, etc.) don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and may not be supported by the regular expression flavours available in the shell.

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}


#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

output with all literal "d" characters removed

ho02123ware38384you44334o3434ingto38384ay

but the following does work as expected

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

output

howareyoudoingtodday

Hope that clarifies things a bit more. If you are not confused yet, try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'

On most flavours of *nix you will only see the following output:

d
d
d
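For a portable alternative to the PCRE shorthands, POSIX bracket expressions can be used with bash's =~ operator. The sketch below is only an illustration; the variable names and the sample input are invented for it.

```shell
#!/usr/bin/env bash
# POSIX bracket expressions standing in for the PCRE shorthands
d='[[:digit:]]'       # roughly \d
s='[[:space:]]'       # roughly \s
w='[[:alnum:]_]'      # roughly \w

input='width   1024'
re="^(${w}+)${s}+(${d}+)\$"   # keeping the regex in a variable avoids quoting surprises

if [[ $input =~ $re ]]; then
  key=${BASH_REMATCH[1]}
  value=${BASH_REMATCH[2]}
  echo "$key=$value"
fi
```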

nJoy!


Pardon? ${foo//$bar/$baz} is not POSIX.2 BRE or ERE syntax -- it's fnmatch()-style pattern matching.
...so, whereas ${hello//[[:digit:]]/} works, if we wanted to filter out only digits preceded by the letter o, ${hello//o[[:digit:]]*} would have an entirely different behavior than the one expected (since in fnmatch patterns, * matches all characters, rather than modifying the immediately prior item to be 0-or-more).
See pubs.opengroup.org/onlinepubs/9699919799/utilities/… (and all that it incorporates by reference) for the full spec on fnmatch.
man bash: An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).
@aderchox you are correct, for digits you can use [0-9] or [[:digit:]]
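The difference between a glob * and a regex quantifier described in the comments can be seen in a small sketch like the one below. It assumes extglob is available; the variable names glob_star and ext_plus are made up.

```shell
#!/usr/bin/env bash
shopt -s extglob
hello=ho02123ware38384you443d34o3434ingtod38384day

# In a shell pattern, * means "any string", so everything from the
# first o+digit to the end of the value is deleted.
glob_star=${hello//o[[:digit:]]*}

# +(...) means "one or more of", which is closer to the regex o[0-9]+;
# replacing with o keeps the letter and drops only the digits after it.
ext_plus=${hello//o+([[:digit:]])/o}

printf '%s\n' "$glob_star" "$ext_plus"
```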
Josiah DeWitt

If you are making repeated calls and are concerned with performance, this test shows the pure bash method is ~15x faster than forking to sed, and likely faster than any other external process.

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X

P1=$(date +%s)

for i in {1..10000}
do
   echo $hello | sed s/X//g > /dev/null
done

P2=$(date +%s)
echo $((P2 - P1))

for i in {1..10000}
do
   echo ${hello//X/} > /dev/null
done

P3=$(date +%s)
echo $((P3 - P2))
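Since date +%s only has one-second resolution, a variation using bash's time keyword may give finer-grained numbers. This is a sketch with a small, arbitrary iteration count (n=200) so it finishes quickly; actual timings depend on the machine and are not shown here.

```shell
#!/usr/bin/env bash
hello=123456789X123456789X
n=200   # arbitrary small count, increase for steadier measurements

# Each iteration forks a subshell plus a sed process.
time for ((i = 0; i < n; i++)); do out_sed=$(sed 's/X//g' <<<"$hello"); done

# Parameter expansion runs inside the current shell, with no fork.
time for ((i = 0; i < n; i++)); do out_bash=${hello//X/}; done

echo "$out_sed" "$out_bash"
```

The time keyword reports to standard error, so the timings do not mix with the normal output.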

If you're interested in a way to reduce forks, search for the word newConnector in this answer to How to set a variable to the output of a command in Bash?
Community

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).


Dabe Murphy

I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...

#!/usr/bin/env bash

############################################
###  resub - regex substitution in bash  ###
############################################

resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

    ### tmp is reset for every input line so text from earlier lines does not leak into later output
    while IFS= read -r line; do
        tmp=""
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"

##################
###  EXAMPLES  ###
##################

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog

###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog

###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five

###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five

###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

H/T to @Charles Duffy re: (.*)$match(.*)


Tono Nam

This example searches the input hello ugly world for the regex bad|ugly and replaces the match with nice:

#!/bin/bash

# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice

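  # REGEX (POSIX ERE has no lazy quantifier such as .*?, so (.*) is greedy:
  # if the pattern occurs more than once, only the last occurrence is replaced)
  local re="(.*)($2)(.*)"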

  if [[ $1 =~ $re ]]; then
    # if there is a match
    
    # ${BASH_REMATCH[0]} = hello ugly world
    # ${BASH_REMATCH[1]} = hello 
    # ${BASH_REMATCH[2]} = ugly
    # ${BASH_REMATCH[3]} = world    

    # hello + nice + world
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"
  else    
    # if no match return original input  hello ugly world
    echo "$1"
  fi    
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
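
Capturing the result with $(...) forks a subshell. A possible variant that writes the result into a named variable with printf -v is sketched below. The function name regex_replace_var and its argument order are assumptions for illustration, and like the function above it replaces a single occurrence only.

```shell
#!/usr/bin/env bash
# regex_replace_var VARNAME INPUT ERE REPLACEMENT  (hypothetical helper)
# Stores the result in VARNAME without a command substitution.
regex_replace_var() {
  local __re="(.*)($3)(.*)"
  if [[ $2 =~ $__re ]]; then
    printf -v "$1" '%s' "${BASH_REMATCH[1]}$4${BASH_REMATCH[3]}"
  else
    printf -v "$1" '%s' "$2"    # no match: keep the original input
  fi
}

regex_replace_var x 'hello ugly world' 'bad|ugly' 'nice'
echo "output of replacement is: $x"
```

The caller should avoid naming the target variable __re, since that name is local to the function.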

Vladimir Djuricic

Set the var

hello=ho02123ware38384you443d34o3434ingtod38384day

then, echo with regex replacement on var

echo ${hello//[[:digit:]]/}

and this will print:

howareyoudoingtodday

Extra - if you'd like the opposite (to get the digit characters)

echo ${hello//[![:digit:]]/}

and this will print:

021233838444334343438384

That's pretty much the same code as the question. You're missing the part about how "the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters)." – You can't do echo ${hello//[[:digit:]\s]/} for example.
@AdamKatz yeah, no biggie, it happens. Thx
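For the case raised in the comment above, digits and whitespace together, POSIX classes can be combined inside one bracket expression in place of \s. A minimal sketch with a made-up sample value:

```shell
#!/usr/bin/env bash
hello=$'ho02 ware\t38you'                 # sample with digits, a space and a tab

# [:digit:] and [:space:] share one bracket expression
cleaned=${hello//[[:digit:][:space:]]/}

echo "$cleaned"
```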
Asclepius

You can use Python. This is not efficient, but it gets the job done with a somewhat more flexible syntax.

apply on file

The following Python script replaces "FROM" (but not "notFROM") with "TO".

regex_replace.py

import sys
import re

for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply that on a text file, like

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla

bla  notFROM FROM

bla FROM
bla bla


$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla

bla  notFROM TO

bla TO
bla bla

apply on variable

#!/bin/bash

hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello

PYTHON_CODE=$(cat <<END
import sys
import re

for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

output

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday

I'm downvoting this because I searched for "using regular expressions in Bash." Python won't help me to set my PS1 prompt (afaik).