I want to convert a std::string
to lowercase. I am aware of the function tolower()
. However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string
would require iterating over each character.
Is there an alternative which works 100% of the time?
Adapted from Not So Frequently Asked Questions:
#include <algorithm>
#include <cctype>
#include <string>
std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c){ return std::tolower(c); });
You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.
If you really hate tolower()
, here's a specialized ASCII-only alternative that I don't recommend you use:
char asciitolower(char in) {
if (in <= 'Z' && in >= 'A')
return in - ('Z' - 'z');
return in;
}
std::transform(data.begin(), data.end(), data.begin(), asciitolower);
Be aware that tolower()
can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
Boost provides a string algorithm for this:
#include <boost/algorithm/string.hpp>
std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str
#include <boost/algorithm/string.hpp>
const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
tl;dr
Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.
First you have to answer a question: What is the encoding of your std::string
? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f
?)
If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string
as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr()
could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
As soon as you try something like std::toupper( 'ß' )
, or std::tolower( 'Σ' )
in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß
into SS
as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ
is in the middle of a word (where σ
would be correct), or at the end (ς
). Another example would be std::tolower( 'I' )
, which should yield different results depending on the locale -- virtually everywhere you would expect i
, but in Turkey ı
(LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).
So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std::
variants in existence at this time.
Then there is the point that the standard library, for what it is capable of doing, is depending on which locales are supported on the machine your software is running on... and what do you do if your target locale is among the not supported on your client's machine?
So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<>
variants.
(C++11 note: std::u16string
and std::u32string
are better, but still not perfect. C++20 brought std::u8string
, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)
While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.
And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)
So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>
#include <iostream>
int main()
{
/* "Odysseus" */
char const * someString = u8"ΟΔΥΣΣΕΥΣ";
icu::UnicodeString someUString( someString, "UTF-8" );
// Setting the locale explicitly here for completeness.
// Usually you would use the user-specified system locale,
// which *does* make a difference (see ı vs. i above).
std::cout << someUString.toLower( "el_GR" ) << "\n";
std::cout << someUString.toUpper( "el_GR" ) << "\n";
return 0;
}
Compile (with G++ in this example):
g++ -Wall example.cpp -licuuc -licuio
This gives:
ὀδυσσεύς
Note that the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>
-based solution can give you that.
Using range-based for loop of C++11 a simpler code would be :
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for(auto elem : str)
std::cout << std::tolower(elem,loc);
}
If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html
Another approach using range based for loop with reference variable
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
cout<<test<<endl;
This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform
. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.
#include <string>
#include <algorithm>
#include <iostream>
int main (int argc, char* argv[])
{
std::string sourceString = "Abc";
std::string destinationString;
// Allocate the destination space
destinationString.resize(sourceString.size());
// Convert the source string to lower case
// storing the result in destination string
std::transform(sourceString.begin(),
sourceString.end(),
destinationString.begin(),
::tolower);
// Output the result of the conversion
std::cout << sourceString
<< " -> "
<< destinationString
<< std::endl;
}
Simplest way to convert string into loweercase without bothering about std namespace is as follows
1:string with/without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
getline(cin,str);
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
2:string without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
cin>>str;
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
std::tolower
cannot work with char
, it only supports unsigned char
. So this code is UB if str
contains characters outside of 0x00-0x7F.
An alternative to Boost is POCO (pocoproject.org).
POCO provides two variants:
The first variant makes a copy without altering the original string. The second variant changes the original string in place. "In Place" versions always have "InPlace" in the name.
Both versions are demonstrated below:
#include "Poco/String.h"
using namespace Poco;
std::string hello("Stack Overflow!");
// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));
// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
I wrote this simple helper function:
#include <locale> // tolower
string to_lower(string s) {
for(char &c : s)
c = tolower(c);
return s;
}
Usage:
string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello word"
cout << to_lower(s); // won't change the original variable.
My own template functions which performs upper / lower case.
#include <string>
#include <algorithm>
//
// Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), tolower);
return s2;
}
//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), toupper);
return s2;
}
towlower
for wide characters which supports the UTF-16.
std::ctype::tolower()
from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page
#include <locale>
#include <iostream>
int main () {
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wcout << "In US English UTF-8 locale:\n";
auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring str = L"HELLo, wORLD!";
std::wcout << "Lowercase form of the string '" << str << "' is ";
f.tolower(&str[0], &str[0] + str.size());
std::wcout << "'" << str << "'\n";
}
const
? That seems to make it a bit more messy (e.g. it doesn't look like you can use f.tolower()
), since you need to put the characters in a new string. Would you use transform()
and something like std::bind1st( std::mem_fun() )
for the operator?
tolower
with locale
parameter, the implicit call to use_facet
appears to be a performance bottleneck. One of my coworkers has achieved a several 100% speed increase by replacing boost::iequals
(which has this problem) with a version where use_facet
is only called once outside of the loop.
Since none of the answers mentioned the upcoming Ranges library, which is available in the standard library since C++20, and currently separately available on GitHub as range-v3
, I would like to add a way to perform this conversion using it.
To modify the string in-place:
str |= action::transform([](unsigned char c){ return std::tolower(c); });
To generate a new string:
auto new_string = original_string
| view::transform([](unsigned char c){ return std::tolower(c); });
(Don't forget to #include <cctype>
and the required Ranges headers.)
Note: the use of unsigned char
as the argument to the lambda is inspired by cppreference, which states:
Like all other functions from
There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straight-forward. The isupper() function/macro's use of clocale.h should take care of problems relating to your location, but if not, you can always tweak the UtoL[] to your heart's content.
Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.
Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.
https://i.stack.imgur.com/DBIXl.png
The code looks like this...
#include <clocale>
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap() {
for (int i = 0; i < sizeof(UtoL); i++) {
if (isupper(i)) {
UtoL[i] = (char)(i + 32);
} else {
UtoL[i] = i;
}
}
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
char *p = szMyStr;
// do conversion in-place so as not to require a destination buffer
while (*p) { // szMyStr must be null-terminated
*p = UtoL[*p];
p++;
}
return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
time_t start;
char *Lowered, Upper[128];
InitUtoLMap();
strcpy(Upper, "Every GOOD boy does FINE!");
Lowered = LowerStr(Upper);
return 0;
}
This approach will, at the same time, allow you to remap any other characters you wish to change.
This approach has one huge advantage when running on modern processors, there is no need to do branch prediction as there are no if tests comprising branching. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.
Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
On microsoft platforms you can use the strlwr
family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx
// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
int main( void )
{
char string[100] = "The String to End All Strings!";
char * copy1 = _strdup( string ); // make two copies
char * copy2 = _strdup( string );
_strlwr( copy1 ); // C4996
_strupr( copy2 ); // C4996
printf( "Mixed: %s\n", string );
printf( "Lower: %s\n", copy1 );
printf( "Upper: %s\n", copy2 );
free( copy1 );
free( copy2 );
}
Here's a macro technique if you want something simple:
#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(), ::toupper); std::transform (x.begin()+1, x.end(), x.begin()+1,::tolower)
However, note that @AndreasSpindler's comment on this answer still is an important consideration, however, if you're working on something that isn't just ASCII characters.
void strtoupper(std::string& x) { std::transform (x.begin(), x.end(), x.begin(), ::toupper); }
x
could be a valid expression, that just happens to compile correctly but will give completely bogus results because of the macros.
Is there an alternative which works 100% of the time?
No
There are several questions you need to ask yourself before choosing a lowercasing method.
How is the string encoded? plain ASCII? UTF-8? some form of extended ASCII legacy encoding? What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the users locale? do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else? What libraries are available?
Once you have answers to those questions you can start looking for a soloution that fits your needs. There is no one size fits all that works for everyone everywhere!
// tolower example (C++)
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for (std::string::size_type i=0; i<str.length(); ++i)
std::cout << std::tolower(str[i],loc);
return 0;
}
For more information: http://www.cplusplus.com/reference/locale/tolower/
Copy because it was disallowed to improve answer. Thanks SO
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
Explanation:
for(auto& c : test)
is a range-based for loop of the kind
for (
range_declaration
:
range_expression
)
loop_statement
:
range_declaration: auto& c Here the auto specifier is used for for automatic type deduction. So the type gets deducted from the variables initializer. range_expression: test The range in this case are the characters of string test.
The characters of the string test
are available as a reference inside the for loop through identifier c
.
C++ doesn't have tolower
or toupper
methods implemented for std::string
, but it is available for char
. One can easily read each char of string, convert it into required case and put it back into string. A sample code without using any third party library:
#include<iostream>
int main(){
std::string str = std::string("How IS The Josh");
for(char &ch : str){
ch = std::tolower(ch);
}
std::cout<<str<<std::endl;
return 0;
}
For character based operation on string : For every character in string
Try this function :)
string toLowerCase(string str) {
int str_len = str.length();
string final_str = "";
for(int i=0; i<str_len; i++) {
char character = str[i];
if(character>=65 && character<=92) {
final_str += (character+32);
} else {
final_str += character;
}
}
return final_str;
}
Use fplus::to_lower_case()
from fplus library.
Search to_lower_case
in fplus API Search
Example:
fplus::to_lower_case(std::string("ABC")) == std::string("abc");
Have a look at the excellent c++17 cpp-unicodelib (GitHub). It's single-file and header-only.
#include <exception>
#include <iostream>
#include <codecvt>
// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"
using namespace std;
using namespace unicode;
// converter that allows displaying a Unicode32 string
wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;
std::u32string in = U"Je suis là!";
cout << converter.to_bytes(in) << endl;
std::u32string lc = to_lowercase(in);
cout << converter.to_bytes(lc) << endl;
Output
Je suis là!
je suis là!
Google's absl
library has absl::AsciiStrToLower
/ absl::AsciiStrToUpper
Code Snippet
#include<bits/stdc++.h>
using namespace std;
int main ()
{
ios::sync_with_stdio(false);
string str="String Convert\n";
for(int i=0; i<str.size(); i++)
{
str[i] = tolower(str[i]);
}
cout<<str<<endl;
return 0;
}
Add some optional libraries for ASCII string to_lower, both of which are production level and with micro-optimizations, which is expected to be faster than the existed answers here(TODO: add benchmark result).
Facebook's Folly:
void toLowerAscii(char* str, size_t length)
Google's Abseil:
void AsciiStrToLower(std::string* s);
I wrote a templated version that works with any string :
#include <type_traits> // std::decay
#include <ctype.h> // std::toupper & std::tolower
template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1,
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------
template<class T, class T2 =
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::toupper(*t); }
template<class T, class T2 = typename std::decay< typename
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::tolower(*t); }
Tested with gcc compiler:
#include <iostream>
#include "upove_code.h"
int main()
{
std::string str1 = "hEllo ";
char str2 [] = "wOrld";
ToUpper(str1);
ToUpper(str2);
std::cout << str1 << str2 << '\n';
Tolower(str1);
Tolower(str2);
std::cout << str1 << str2 << '\n';
return 0;
}
output:
>HELLO WORLD
>
>hello world
use this code to change case of string in c++.
#include<bits/stdc++.h>
using namespace std;
int main(){
string a = "sssAAAAAAaaaaDas";
transform(a.begin(),a.end(),a.begin(),::tolower);
cout<<a;
}
#include <bits/stdc++.h>
in an answer on Stack Overflow. You'll get downvoted.
This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string _input = "lowercasetouppercase";
#if 0
// My idea is to use the ascii value to convert
char upperA = 'A';
char lowerA = 'a';
cout << (int)upperA << endl; // ASCII value of 'A' -> 65
cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
// 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0
cout << "Input String = " << _input.c_str() << endl;
for (int i = 0; i < _input.length(); ++i)
{
_input[i] -= 32; // To convert lower to upper
#if 0
_input[i] += 32; // To convert upper to lower
#endif // 0
}
cout << "Output String = " << _input.c_str() << endl;
return 0;
}
Note: if there are special characters then need to be handled using condition check.
I tried std::transform, all i get is abominable stl criptic compilation error that only druids from 200 years ago can understand (cannot convert from to flibidi flabidi flu)
this works fine and can be easily tweaked
string LowerCase(string s)
{
int dif='a'-'A';
for(int i=0;i<s.length();i++)
{
if((s[i]>='A')&&(s[i]<='Z'))
s[i]+=dif;
}
return s;
}
string UpperCase(string s)
{
int dif='a'-'A';
for(int i=0;i<s.length();i++)
{
if((s[i]>='a')&&(s[i]<='z'))
s[i]-=dif;
}
return s;
}
Success story sharing
char
to::tolower(int)
.) You need to ensure you don't pass a negative value.::tolower
may well crash, it's UB for non-ASCII input.