I've seen this example:
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}
Which follows this syntax: ${variable//pattern/replacement}
Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).
How can I search/replace a string using full regex syntax?
\s isn't part of standard POSIX-defined regular expression syntax (neither BRE nor ERE); it's a PCRE extension, and mostly not available from shell. [[:space:]] is the more universal equivalent.
\s can be replaced by [[:space:]], by the way, . by ?, and extglob extensions to the baseline shell pattern language can be used for things like optional subgroups, repeated groups, and the like.
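To illustrate that last comment, here is a small sketch of the pattern-language equivalents inside a parameter expansion. The variable names and the sample string are made up for illustration, and it assumes bash with extglob enabled.

```bash
#!/usr/bin/env bash
shopt -s extglob    # enables +(...), *(...), ?(...), @(...), !(...)

# Sample input (hypothetical): runs of mixed spaces and tabs
text=$'a  \t b   c'

# +([[:space:]]) plays the role of the regex [[:space:]]+ (or \s+ in PCRE)
collapsed=${text//+([[:space:]])/ }
echo "$collapsed"    # a b c

# ? matches any single character, like the regex .
masked=${collapsed//?/x}
echo "$masked"       # xxxxx
```

Note that shopt -s extglob has to run before the line that uses the extended pattern is parsed, so it belongs near the top of the script.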
Use sed:
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
Note that the subsequent -e's are processed in order. Also, the g flag for the expression will match all occurrences in the input.
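If the result is needed back in a variable rather than on stdout, command substitution works with the same kind of sed call. A short sketch, assuming a sed that accepts -E for extended regexes (GNU and BSD sed both do); the variable name cleaned is only for illustration.

```bash
#!/usr/bin/env bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day

# -E switches sed to ERE, so + works without a backslash
cleaned=$(printf '%s\n' "$MYVAR" | sed -E 's/[0-9]+/-/g')
echo "$cleaned"    # ho-ware-you-d-o-ingtod-day
```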
You can also pick your favorite tool using this method (perl, awk, etc.), e.g.:
echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'
This may allow you to do more creative matches... For example, in the snippet above, the numeric replacement would not be used unless there was a match on the first expression (due to short-circuit evaluation of and). And of course, you have the full language support of Perl to do your bidding...
This actually can be done in pure bash:
hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"
...yields...
howareyoudoingtodday
=~ is the key, but it is a bit clunky given the reassignment in the loop. @jheddings' solution from 2 years prior (calling sed or perl) is another good option.
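The loop from this answer can be wrapped in a function to hide the reassignment. The sketch below is one possible way to do that; remove_all is a hypothetical name, and it assumes the regex contains no capture groups of its own and never matches the empty string.

```bash
#!/usr/bin/env bash
# remove_all STRING ERE  (hypothetical helper name)
# Deletes every match of ERE from STRING using only [[ =~ ]].
# Assumes ERE has no capture groups and never matches the empty
# string (otherwise the loop would not terminate).
remove_all() {
    local str=$1 re="(.*)($2)(.*)"
    while [[ $str =~ $re ]]; do
        # keep what is before and after the match; [2] is the match itself
        str=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
    done
    printf '%s\n' "$str"
}

remove_all ho02123ware38384you443d34o3434ingtod38384day '[0-9]+'
# howareyoudoingtodday
```

Because the leading (.*) is greedy, each pass removes only the tail of the last digit run, so the loop runs more often than there are runs of digits. That is fine for short strings but worth knowing for long ones.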
Calling sed or perl is sensible if each invocation processes more than a single line of input. Invoking such a tool inside a loop, as opposed to using a loop to process its output stream, is foolhardy.
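A sketch of what "using a loop to process its output stream" can look like: one sed process handles all items, and the loop only reads the results. The array names and sample items are hypothetical, and the items are assumed not to contain newlines.

```bash
#!/usr/bin/env bash
items=(a1b2 c3d4 e5f6)    # hypothetical input list
cleaned=()

# One sed process handles every item; the loop only reads its output.
while IFS= read -r line; do
    cleaned+=("$line")
done < <(printf '%s\n' "${items[@]}" | sed 's/[0-9]//g')

printf '%s\n' "${cleaned[@]}"
# ab
# cd
# ef
```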
In zsh, use $match instead of $BASH_REMATCH. (You can make it behave like bash with setopt bash_rematch.)
These examples also work in bash; there is no need to use sed:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X}
echo ${MYVAR//[0-9]/N}
You can also use the character class bracket expressions:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X}
echo ${MYVAR//[[:digit:]]/N}
output
XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
What @Lanaru wanted to know, however, if I understand the question correctly, is why the "full" or PCRE extensions \s \S \w \W \d \D etc. don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and may not be compatible with other forms of shell-based regular expressions.
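For reference, a few of those PCRE shorthands can be spelled with POSIX character classes, which bash does understand. This is a minimal sketch with a made-up sample string and variable names, not an exhaustive mapping.

```bash
#!/usr/bin/env bash
line=$'id: 42\tok_1!'    # hypothetical sample with a space and a tab

# \d+  ->  [[:digit:]]+   (ERE, inside [[ =~ ]])
re='[[:digit:]]+'
[[ $line =~ $re ]] && first_number=${BASH_REMATCH[0]}
echo "$first_number"     # 42

# \s   ->  [[:space:]]    (works in patterns and in ERE)
no_space=${line//[[:space:]]/}
echo "$no_space"         # id:42ok_1!

# \W   ->  [^[:alnum:]_]  (anything that is not a "word" character)
word_chars=${line//[^[:alnum:]_]/}
echo "$word_chars"       # id42ok_1
```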
These don't work:
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'
output with all literal "d" characters removed
ho02123ware38384you44334o3434ingto38384ay
but the following does work as expected
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'
output
howareyoudoingtodday
Hope that clarifies things a bit more. But if you are not confused yet, try this on Mac OS X, which has the REG_ENHANCED flag enabled:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'
On most flavours of *nix you will only see the following output:
d
d
d
nJoy!
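A portable spelling of that grep call uses [[:digit:]] in place of \d. The sketch below assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), and the variable name digits is only for illustration.

```bash
#!/usr/bin/env bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day

# [[:digit:]] is understood by POSIX regex engines, unlike \d
digits=$(printf '%s\n' "$MYVAR" | grep -o -E '[[:digit:]]+' | tr -d '\n')
echo "$digits"    # 021233838444334343438384
```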
${foo//$bar/$baz} is not POSIX.2 BRE or ERE syntax -- it's fnmatch()-style pattern matching.
While ${hello//[[:digit:]]/} works, if we wanted to filter out only digits preceded by the letter o, ${hello//o[[:digit:]]*} would behave entirely differently from what is expected (since in fnmatch patterns, * matches all characters, rather than modifying the immediately prior item to be 0-or-more).
[0-9] or [[:digit:]]
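The difference described in that comment can be seen directly. This sketch contrasts the plain * with the extglob form +(...), which is one way to express "one or more" in the pattern language; the variable names greedy and trimmed are made up.

```bash
#!/usr/bin/env bash
shopt -s extglob
hello=ho02123ware38384you443d34o3434ingtod38384day

# In a glob pattern, * means "any run of characters", so this deletes
# everything from the first "o<digit>" to the end of the string.
greedy=${hello//o[[:digit:]]*/}
echo "$greedy"     # h

# The extglob form +(...) is the pattern-language spelling of "one or more".
# Replacing with a literal o keeps the letter and drops only the digits after it.
trimmed=${hello//o+([[:digit:]])/o}
echo "$trimmed"    # howare38384you443d34oingtod38384day
```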
If you are making repeated calls and are concerned with performance, this test reveals the bash method is ~15x faster than forking to sed, and likely faster than any other external process.
hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X
P1=$(date +%s)
for i in {1..10000}
do
    echo "$hello" | sed 's/X//g' > /dev/null
done
P2=$(date +%s)
echo $((P2 - P1))   # seconds spent forking sed
for i in {1..10000}
do
    echo "${hello//X/}" > /dev/null
done
P3=$(date +%s)
echo $((P3 - P2))   # seconds spent on pure-bash substitution
Use [[:digit:]] (note the double brackets) as the pattern:
$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday
Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).
I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...
#!/usr/bin/env bash
############################################
### resub - regex substitution in bash ###
############################################
resub() {
local match="$1" subst="$2" tmp
if [[ -z $match ]]; then
echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
return 1
fi
### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...
### Utility function to 'single-quote' a list of strings
squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }
tmp=""
while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
subst="${BASH_REMATCH[1]}"
done
subst="$(squot "${subst}")${tmp}"
### Now start (globally) substituting
while IFS= read -r line; do
### Reset per input line so output does not accumulate across lines
tmp=""
while [[ $line =~ $match(.*) ]]; do
eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
done
echo "${tmp}${line}"
done
}
resub "$@"
##################
### EXAMPLES ###
##################
### % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
### The slow brown fox jumps slowly over the lazy dog
### % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
### The slow brown sheep jumps quickly over the lazy dog
### % animal="sheep"
### % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
### The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog
### % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
### one four three two five
### % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
### XXX two XXX four five
### % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
### XXX two YYY four five XXX six YYY seven eight
H/T to @Charles Duffy re: (.*)$match(.*)
In this example, the input is hello ugly world; the function searches for the regex bad|ugly and replaces it with nice.
#!/bin/bash
# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input Example: hello ugly world
# arg2 = search regex Example: bad|ugly
# arg3 = replace Example: nice
function regex_replace()
{
# $1 = hello ugly world
# $2 = bad|ugly
# $3 = nice
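# REGEX (POSIX ERE has no lazy quantifiers, so the leading group is a greedy .*;
# with several matches in the input, only the last one is replaced)
re="(.*)($2)(.*)"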
if [[ $1 =~ $re ]]; then
# if there is a match
# ${BASH_REMATCH[0]} = hello ugly world
# ${BASH_REMATCH[1]} = hello
# ${BASH_REMATCH[2]} = ugly
# ${BASH_REMATCH[3]} = world
# hello + nice + world
echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"
else
# if no match return original input hello ugly world
echo "$1"
fi
}
# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'
# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
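The function above replaces a single match. A possible extension that replaces every match is sketched below; regex_replace_all is a hypothetical name, and the approach assumes the regex has no ^ or $ anchors and never matches the empty string.

```bash
#!/usr/bin/env bash
# regex_replace_all INPUT ERE REPLACEMENT  (hypothetical helper name)
# Walks the input left to right, replacing each leftmost match in turn.
# Assumes ERE has no ^ or $ anchors; stops if a match is empty.
regex_replace_all() {
    local input=$1 re=$2 repl=$3 out= match prefix
    while [[ -n $input && $input =~ $re ]]; do
        match=${BASH_REMATCH[0]}
        [[ -z $match ]] && break          # avoid looping forever on empty matches
        prefix=${input%%"$match"*}        # text before the leftmost match
        out+=$prefix$repl
        input=${input#"$prefix$match"}    # continue after the match
    done
    printf '%s\n' "$out$input"
}

regex_replace_all 'bad dog, ugly cat, bad day' 'bad|ugly' 'nice'
# nice dog, nice cat, nice day
```

The replacement text is inserted literally here; backreferences such as $1 are not expanded.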
Set the var
hello=ho02123ware38384you443d34o3434ingtod38384day
Then echo it with a pattern replacement on the variable:
echo ${hello//[[:digit:]]/}
and this will print:
howareyoudoingtodday
Extra - if you'd like the opposite (to get the digit characters)
echo ${hello//[![:digit:]]/}
and this will print:
021233838444334343438384
"[...] pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters)." - You can't do echo ${hello//[[:digit:]\s]/}, for example.
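What does work for that case is putting several POSIX classes into one bracket expression. A minimal sketch, with a made-up sample string and variable names:

```bash
#!/usr/bin/env bash
sample=$'a1 b2\tc3'    # hypothetical input: digits, a space and a tab

# Several POSIX classes can share one bracket expression
stripped=${sample//[[:digit:][:space:]]/}
echo "$stripped"       # abc
```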
You can use Python. This will not be efficient, but it gets the job done with a somewhat more flexible syntax.
apply on file
The following Python script will replace "FROM" (but not "notFROM") with "TO".
regex_replace.py
import sys
import re
for line in sys.stdin:
line = re.sub(r'(?<!not)FROM', 'TO', line)
sys.stdout.write(line)
You can apply that on a text file, like
$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla
bla notFROM FROM
bla FROM
bla bla
$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla
bla notFROM TO
bla TO
bla bla
apply on variable
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello
PYTHON_CODE=$(cat <<END
import sys
import re
for line in sys.stdin:
line = re.sub(r'[0-9]', '', line)
sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"
output
ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
Calling sed or other external tools is expensive due to process initialization time. I specifically searched for an all-bash solution, because I found bash substitutions to be more than 3x faster than calling sed for each item in my loop.