How do I remove all non alphanumeric characters from a string except dash?

How do I remove all non alphanumeric characters from a string except dash and space characters?


Amarghosh

Replace [^a-zA-Z0-9 -] with an empty string.

Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");

Worth mentioning that - must be at the start or end of the character class, or escaped with a backslash, to prevent it from being read as a range operator.
@Dan set the global flag in your regex - without that, it just replaces the first match. A quick google should tell you how to set global flag in classic ASP regex. Otherwise, look for a replaceAll function instead of replace.
Here's a regex compiled version: return Regex.Replace(str, "[^a-zA-Z0-9_.]+", "", RegexOptions.Compiled); Same basic question
@MGOwen because every time you use "" you are creating a new object due to strings being immutable. When you use string.empty you are reusing the single instance required for representing an empty string which is quicker as well as being more efficient.
@BrianScott I know this is old, but was found in a search so I feel this is relevant. This actually depends on the version of .NET you are running under. > 2.0 uses "" & string.Empty exactly the same. stackoverflow.com/questions/151472/…
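The dash-placement point from the comments can be sketched in C#. This is only an illustration, not the answerer's code: the class name DashClassDemo and both method names are hypothetical. It relies on the fact that .NET's static Regex.Replace replaces every match by default, so no global flag is involved there.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class DashClassDemo
{
    // Dash placed last in the character class: it is a literal '-',
    // not a range operator.
    public static string KeepAlnumSpaceDash(string input) =>
        Regex.Replace(input, "[^a-zA-Z0-9 -]", "");

    // Same set of allowed characters, but with the dash escaped,
    // so it no longer has to sit at the end of the class.
    public static string KeepAlnumSpaceDashEscaped(string input) =>
        Regex.Replace(input, @"[^a-zA-Z0-9\- ]", "");
}
```

Both methods should give the same result for the same input; the escaped form is mainly useful when more characters are appended to the class later.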
Community

I could have used a regex. Regular expressions can provide an elegant solution, but they can cause performance issues. Here is one alternative:

char[] arr = str.ToCharArray();

arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c) 
                                  || char.IsWhiteSpace(c) 
                                  || c == '-')));
str = new string(arr);

When using the Compact Framework (which doesn't have FindAll), replace FindAll with the following (see note 1):

char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || 
                             char.IsWhiteSpace(c) || 
                             c == '-')).ToArray(); 

str = new string(arr);

Note 1: comment by ShawnFeatherly


In my testing, this technique was much faster. To be precise, it was just under 3 times faster than the Regex Replace technique.
The compact framework doesn't have FindAll, you can replace FindAll with char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')).ToArray();
has anyone tested this? That didn't work at all. --but this did for me: string str2 = new string(str.Where(c => (char.IsLetterOrDigit(c))).ToArray());
As a single line: str = string.Concat(str.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-'));
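The regex and the char-based filter in this thread are not strictly interchangeable, which a small sketch may make clearer. The class and method names below are hypothetical. The regex only accepts ASCII letters, digits, a literal space and a dash, whereas char.IsLetterOrDigit accepts any Unicode letter or digit and char.IsWhiteSpace also keeps tabs and newlines.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical comparison class, for illustration only.
public static class FilterComparison
{
    // ASCII letters and digits, a literal space, and a dash.
    public static string WithRegex(string s) =>
        Regex.Replace(s, "[^a-zA-Z0-9 -]", "");

    // Any Unicode letter or digit, any whitespace character, and a dash.
    public static string WithLinq(string s) =>
        new string(s.Where(c => char.IsLetterOrDigit(c)
                             || char.IsWhiteSpace(c)
                             || c == '-').ToArray());
}
```

For plain ASCII text with single spaces the two should agree; for input containing accented letters or tabs, WithRegex strips them and WithLinq keeps them.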
miken32

You can try:

string s1 = Regex.Replace(s, "[^A-Za-z0-9 -]", "");

Where s is your string.


OP asked for dash not underscore
This does not work as it gives a "symbol not found" error, even after importing java.util.regex.*
@DavidBandel it's C#
w.b

Using System.Linq

string withOutSpecialCharacters = new string(stringWithSpecialCharacters.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-').ToArray());

@Michael It is similar but at least this is a one liner, rather than 3 lines. I'd say that's enough to make it a different answer.
@Dymas I now agree that it is acceptable, but not because the whitespace is different. Apparently the part that is functionally equivalent (only var names differ) was edited in after this answer was written.
@ZainAli, if you make a trivial edit and ping me, I'll reverse my downvote. I apologize for any insinuation of plagiary.
True Soft

The regex is [^\w\s\-]*:

\s is better than a literal space character, because there might be a tab in the text.


unless you want to remove tabs.
...and newlines, and all other characters considered "whitespace".
This solution is far superior to the above solutions since it also supports international (non-English) characters. string s = "Mötley Crue 日本人: の氏名 and Kanji 愛 and Hiragana あい"; string r = Regex.Replace(s,"[^\\w\\s-]*",""); The above produces r with: Mötley Crue 日本人 の氏名 and Kanji 愛 and Hiragana あい
Use @ to escape \ conversion in string: @"[^\w\s-]*"
it, uhhh... doesn't remove underscores? that is considered a "word" character by regex implementation across creation, but it's not alphanumeric, dash, or space... (?)
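One way to reconcile the last two comments (international letters kept, underscores removed) is to use Unicode categories instead of \w. The sketch below is an assumption about what the commenters wanted, not code from the answer, and the class and method names are hypothetical. In .NET, \p{L} matches any Unicode letter and \p{Nd} any decimal digit. Note that \w also covers combining marks, so the category-based pattern would strip those from decomposed text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class UnicodeFilter
{
    // The pattern from the answer: \w keeps underscores as well.
    public static string KeepWordChars(string input) =>
        Regex.Replace(input, @"[^\w\s\-]", "");

    // Letters (\p{L}) and decimal digits (\p{Nd}) from any script,
    // plus whitespace and dash; underscores are removed.
    public static string KeepLettersDigitsSpaceDash(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{Nd}\s\-]", "");
}
```

With an input such as "Mötley_Crüe: 42!", the first method leaves the underscore in place and the second drops it, while both keep the accented letters.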
Ppp

Based on the answer for this question, I created a static class and added these. Thought it might be useful for some people.

public static class RegexConvert
{
    public static string ToAlphaNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z0-9]");
        return rgx.Replace(input, "");
    }

    public static string ToAlphaOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z]");
        return rgx.Replace(input, "");
    }

    public static string ToNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^0-9]");
        return rgx.Replace(input, "");
    }
}

Then the methods can be used as:

string example = "asdf1234!@#$";
string alphanumeric = example.ToAlphaNumericOnly(); // "asdf1234"
string alpha = example.ToAlphaOnly();               // "asdf"
string numeric = example.ToNumericOnly();           // "1234"

For the example that you provide it would also be useful if you provide the outcomes of each of the methods.
This solution is culture dependent.
Andreas

Want something quick?

using System;
using System.Linq; // Contains() on char[] is a LINQ extension method

public static class StringExtensions
{
    public static string ToAlphaNumeric(this string self,
                                        params char[] allowedCharacters)
    {
        return new string(Array.FindAll(self.ToCharArray(),
                                        c => char.IsLetterOrDigit(c) ||
                                        allowedCharacters.Contains(c)));
    }
}

This will allow you to specify which characters you want to allow as well.


IMHO - the best solution here.
Looks clean, but a bit hard to specify how to add white space ? I would have added another overload which allows whitespace too as this method works fine on words, but not sentences or other whitespace such as newlines or tabs. +1 anyways, good solution. public static string ToAlphaNumericWithWhitespace(this string self, params char[] allowedCharacters) { return new string(Array.FindAll(self.ToCharArray(), c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || allowedCharacters.Contains(c))); }
BjarkeCK

Here is a fast, non-regex, allocation-friendly solution, which is what I was looking for.

Unsafe edition.

public static unsafe void ToAlphaNumeric(ref string input)
{
    fixed (char* p = input)
    {
        int offset = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsLetterOrDigit(p[i]))
            {
                p[offset] = input[i];
                offset++;
            }
        }
        ((int*)p)[-1] = offset; // Changes the length of the string
        p[offset] = '\0';
    }
}

And for those who don't want to use unsafe or don't trust the string length hack.

public static string ToAlphaNumeric(string input)
{
    int j = 0;
    char[] newCharArr = new char[input.Length];

    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsLetterOrDigit(input[i]))
        {
            newCharArr[j] = input[i];
            j++;
        }
    }

    Array.Resize(ref newCharArr, j);

    return new string(newCharArr);
}

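You shouldn't alter the contents of a string in place. Strings are assumed to be immutable, and because of string pooling (interning) the change can show up in every other reference to the same string.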
th1rdey3

I've made a different solution by eliminating the control characters, which was my original problem.

It is better than listing all the "special but good" characters:

char[] arr = str.Where(c => !char.IsControl(c)).ToArray();    
str = new string(arr);

It's simpler, so I think it's better!


Aaron Hudon

Here's an extension method using @ata answer as inspiration.

"hello-world123, 456".MakeAlphaNumeric(new char[]{'-'});// yields "hello-world123456"

or if you require additional characters other than hyphen...

"hello-world123, 456!?".MakeAlphaNumeric(new char[]{'-','!'});// yields "hello-world123456!"


// Requires: using System; using System.Linq; (Contains() on char[] is a LINQ extension method)
public static class StringExtensions
{
    public static string MakeAlphaNumeric(this string input, params char[] exceptions)
    {
        var charArray = input.ToCharArray();
        var alphaNumeric = Array.FindAll<char>(charArray, (c => char.IsLetterOrDigit(c)|| exceptions?.Contains(c) == true));
        return new string(alphaNumeric);
    }
}

Jeff

If you are working in JS, here is a very terse version

myString = myString.replace(/[^A-Za-z0-9 -]/g, "");

I believe OP might have asked about C#, not JS.
Philip Johnson

I use a variation of one of the answers here. I want to replace spaces with "-" so the result is SEO-friendly, and also convert it to lower case. I also wanted to avoid referencing System.Web from my services layer.

private string MakeUrlString(string input)
{
    var array = input.ToCharArray();

    array = Array.FindAll<char>(array, c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-');

    var newString = new string(array).Replace(" ", "-").ToLower();
    return newString;
}

astef

There is a much easier way with Regex.

private string FixString(string str)
{
    return string.IsNullOrEmpty(str) ? str : Regex.Replace(str, "[\\D]", "");
}

This only removes non-numeric characters.