There are various string formatting methods:
Python <2.6: "Hello %s" % name
Python 2.6+: "Hello {}".format(name) (uses str.format)
Python 3.6+: f"{name}" (uses f-strings)
Which is better, and for what situations?
The following methods have the same outcome, so what is the difference?
name = "Alice"
"Hello %s" % name
"Hello {0}".format(name)
f"Hello {name}"
# Using named arguments:
"Hello %(kwarg)s" % {'kwarg': name}
"Hello {kwarg}".format(kwarg=name)
f"Hello {name}"
When does string formatting run, and how do I avoid a runtime performance penalty?
% style more often, because if you do not need the improved capabilities of the format() style, the % style is often a lot more convenient.
format() formatting style and the older %-based formatting style.
To answer your first question... .format just seems more sophisticated in many ways. An annoying thing about % is also how it can either take a variable or a tuple. You'd think the following would always work:
"Hello %s" % name
yet, if name happens to be (1, 2, 3), it will throw a TypeError. To guarantee that it always prints, you'd need to do
"Hello %s" % (name,) # supply the single argument as a single-item tuple
which is just ugly. .format doesn't have those issues. Also in the second example you gave, the .format example is much cleaner looking.
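To make the tuple pitfall concrete, here is a small sketch. The helper names greet_percent and greet_format are made up for illustration; the behaviour shown is what I'd expect from any Python 3 interpreter.

```python
def greet_percent(name):
    # Wrapping in a one-item tuple makes % treat `name` as a single value.
    return "Hello %s" % (name,)


def greet_format(name):
    # str.format always treats its argument as one value.
    return "Hello {}".format(name)


name = (1, 2, 3)

try:
    "Hello %s" % name  # the tuple is unpacked into three arguments
    raised = False
except TypeError:
    raised = True

print(raised)               # True
print(greet_percent(name))  # Hello (1, 2, 3)
print(greet_format(name))   # Hello (1, 2, 3)
```

The bare `%` form only breaks when the value happens to be a tuple, which is what makes the bug easy to miss in testing.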
Only use it for backwards compatibility with Python 2.5.
To answer your second question, string formatting happens at the same time as any other operation - when the string formatting expression is evaluated. And Python, not being a lazy language, evaluates expressions before calling functions, so the expression log.debug("some debug info: %s" % some_info) will first evaluate the string to, e.g. "some debug info: roflcopters are active", then that string will be passed to log.debug().
Something that the modulo operator ( % ) can't do, afaik:
tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)
result
12 22222 45 22222 103 22222 6 22222
Very useful.
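For what it's worth, here is a sketch of how the two styles compare on repeated placeholders. Python's % has no positional index syntax, but named mapping keys can be repeated. The dictionary keys a, b and c are names I picked for the example.

```python
tu = (12, 45, 22222, 103, 6)

# str.format: positional indices can be reordered and repeated.
by_index = '{0} {2} {1} {2}'.format(*tu)

# % operator: repetition works only through named mapping keys.
keys = {'a': tu[0], 'b': tu[1], 'c': tu[2]}
by_name = '%(a)s %(c)s %(b)s %(c)s' % keys

print(by_index)  # 12 22222 45 22222
print(by_name)   # 12 22222 45 22222
```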
Another point: format(), being a function, can be used as an argument in other functions:
li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)
print
from datetime import datetime,timedelta
once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)
gen =(once_upon_a_time +x*delta for x in xrange(20))
print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))
Results in:
['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']
2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00
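The session above is Python 2 (print statement, xrange). A rough Python 3 equivalent, shortened to three items, might look like this. The only real change is that map returns an iterator, so it is wrapped in list() before printing.

```python
from datetime import datetime, timedelta

li = [12, 45, 78]
# The bound method 'the number is {}'.format is an ordinary callable.
labels = list(map('the number is {}'.format, li))

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)
gen = (once_upon_a_time + x * delta for x in range(3))
stamps = list(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

print(labels)
print('\n'.join(stamps))
```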
map just as easily as format. map('some_format_string_%s'.__mod__, some_iterable)
printf("%2$s %1$s\n", "One", "Two"); compiled with gcc -std=c99 test.c -o test, the output is Two One. But I stand corrected: It is actually a POSIX extension and not C. I cannot find it again in the C/C++ standard, where I thought I'd seen it. The code works even with 'c90' std flag. sprintf man page. This does not list it, but allows libs to implement a superset. My original argument is still valid, replacing C with Posix
% for reordering placeholders. I'd still like to not delete that first comment for the sake of comment consistency here. I apologize for having vented my anger here. It is directed against the often made statement that the old syntax per se would not allow this. Instead of creating a completely new syntax we could have introduced the std Posix extensions. We could have both.
Assuming you're using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself:
log.debug("some debug info: %s", some_info)
which avoids doing the formatting unless the logger actually logs something.
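Here is a small sketch that shows the difference. CountingRepr is a hypothetical helper that counts how often it gets converted to a string, and the logger name lazy_demo is arbitrary.

```python
import logging


class CountingRepr:
    """Hypothetical helper: counts how often it is turned into a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive value"


log = logging.getLogger("lazy_demo")
log.addHandler(logging.NullHandler())
log.setLevel(logging.INFO)  # DEBUG messages are discarded

eager = CountingRepr()
lazy = CountingRepr()

log.debug("some debug info: %s" % eager)  # formatted before debug() is called
log.debug("some debug info: %s", lazy)    # formatting deferred, then skipped

print(eager.calls, lazy.calls)  # 1 0
```

With the level set to INFO, the second call never formats anything, because the logger checks the level before building the message.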
log.debug("some debug info: %(this)s and %(that)s", dict(this='Tom', that='Jerry'))
However, you can't use the new style .format() syntax here, not even in Python 3.3, which is a shame.
As of Python 3.6 (2016) you can use f-strings to substitute variables:
>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'
Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.
See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings
PEP 3101 proposes the replacement of the % operator with the new, advanced string formatting in Python 3, where it would be the default.
.format won't replace % string formatting.
But please be careful, just now I've discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.
Just look at this Python interactive session log:
Python 2.7.2 (default, Aug 27 2012, 19:52:55)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'
s is just a string (called 'byte array' in Python3) and u is a Unicode string (called 'string' in Python3):
; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'
When you give a Unicode object as a parameter to % operator it will produce a Unicode string even if the original string wasn't Unicode:
; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)
but the .format function will raise "UnicodeEncodeError":
; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'
and it will work with a Unicode argument fine only if the original string was Unicode.
; '{}'.format(u'i')
'i'
or if argument string can be converted to a string (so called 'byte array')
format method are really needed ...
% string interpolation would ever go away.
"p1=%s p2=%d" % "abc", 2 or "p1=%s p2=%s" % (tuple_p1_p2,). You might think it's the coder's fault but I think it's just weird faulty syntax that looks nice for the quicky-scriptie but is bad for production code.
%s, %02d like "p1=%s p2=%02d".format("abc", 2). I blame those who invented and approved the curly braces formatting that needs you to escape them like {{}} and looks ugly imho.
% gives better performance than format from my test.
Test code:
Python 2.7.2:
import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")
Result:
> format: 0.470329046249
> %: 0.357107877731
Python 3.5.2
import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
Result
> format: 0.5864730989560485
> %: 0.013593495357781649
It looks like the difference is small in Python 2, whereas in Python 3, % is much faster than format.
Thanks @Chris Cogdon for the sample code.
Edit 1:
Tested again in Python 3.7.2 in July 2019.
Result:
> format: 0.86600608
> %: 0.630180146
There is not much difference. I guess Python is improving gradually.
Edit 2:
After someone mentioned python 3's f-string in comment, I did a test for the following code under python 3.7.2 :
import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))
Result:
format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943
It seems f-string is still slower than % but better than format.
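If you want to rerun this kind of comparison yourself, here is a rough sketch of a harness. It formats variables rather than literals, on the assumption that constant operands may get precomputed at compile time on some CPython versions and distort the numbers. The variable names and the loop counts are arbitrary choices of mine.

```python
import timeit

# Values are passed in through `globals` so the statements format variables.
setup_vars = {"name": "hello", "value": 1.23}

statements = {
    "%": "'%s %s' % (name, value)",
    "format": "'{} {}'.format(name, value)",
    "f-string": "f'{name} {value}'",
}

# Take the best of three repeats to reduce noise from other processes.
results = {
    label: min(timeit.repeat(stmt, globals=setup_vars, number=100_000, repeat=3))
    for label, stmt in statements.items()
}

for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{label:>8}: {seconds:.4f} s")
```

The absolute numbers depend on the machine and the interpreter version, so treat any single run as a hint rather than a verdict.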
str.format gives more functionalities (especially type-specialized formatting e.g. '{0:%Y-%m-%d}'.format(datetime.datetime.utcnow())). Performance cannot be the absolute requirement of all jobs. Use the right tool for the job.
% operator allows to reuse printf knowledge; dictionary interpolation is a very simple extension of the principle.
% is much more efficient than format() in Python 3. The code that I used can be found here: github.com/rasbt/python_efficiency_tweaks/blob/master/test_code/… and github.com/rasbt/python_efficiency_tweaks/blob/master/test_code/…
Yet another advantage of .format (which I don't see in the answers): it can take object properties.
In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:
In [13]: a = A(2,3)
In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'
Or, as a keyword argument:
In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'
This is not possible with % as far as I can tell.
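A quick sketch comparing the options. It's true that % has no attribute-access syntax, but for plain instances you can get close by passing the instance dictionary through vars(). That workaround is my addition, and it only works for objects that store their attributes in __dict__.

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = A(2, 3)

# str.format can reach into attributes directly.
new_style = 'x is {0.x}, y is {0.y}'.format(a)

# % needs a mapping; vars(a) returns the instance's attribute dict.
old_style = 'x is %(x)s, y is %(y)s' % vars(a)

# f-strings accept any expression, including attribute access.
f_string = f'x is {a.x}, y is {a.y}'

print(new_style)  # x is 2, y is 3
print(old_style)  # x is 2, y is 3
print(f_string)   # x is 2, y is 3
```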
'x is {0}, y is {1}'.format(a.x, a.y). Should only be used when the a.x operation is very costly.
'x is {a.x}, y is {a.y}'.format(a=a). More readable than both examples.
'x is {a.x}, y is {a.y}'.format(**vars())
'{foo[bar]}'.format(foo={'bar': 'baz'}).
Your order, number {order[number]} was processed at {now:%Y-%m-%d %H:%M:%S}, will be ready at about {order[eta]:%H:%M:%S} or whatever they wish. This is far cleaner than trying to offer the same functionality with the old formatter. It makes user-supplied format strings way more powerful.
As I discovered today, the old way of formatting strings via % doesn't support Decimal, Python's module for decimal fixed point and floating point arithmetic, out of the box.
Example (using Python 3.3.5):
#!/usr/bin/env python3
from decimal import *
getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard
print('%.50f' % d)
print('{0:.50f}'.format(d))
Output:
0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000
There surely might be work-arounds but you still might consider using the format() method right away.
str(d) before expanding the parameter, whereas old-style formatting probably calls float(d) first.
str(d) returns "3.12375239e-24", not "0.00000000000000000000000312375239000000000000000000"
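To check the guess about float(d), here is a small sketch. As far as I know, Decimal implements its own __format__, so format() works on the decimal value directly, while %f goes through a float conversion first. The comparison below is consistent with that.

```python
from decimal import Decimal

d = Decimal('3.12375239e-24')

old = '%.50f' % d            # % with an f conversion goes through float
new = '{0:.50f}'.format(d)   # delegates to Decimal.__format__

# The old-style result matches what you get from an explicit float().
print(old == '%.50f' % float(d))  # True

# The new-style result keeps the exact decimal digits.
print(new)
```

So the stray digits in the %-formatted output come from the binary float approximation, not from Decimal itself.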
If you are on Python 3.6 or later, the f-string (formatted string literal) is your new friend.
It is simpler and cleaner, and it performs better.
In [1]: params=['Hello', 'adam', 42]
In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
As a side note, you don't have to take a performance hit to use new style formatting with logging. You can pass any object to logging.debug, logging.info, etc. that implements the __str__ magic method. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this:
import logging

class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())
        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))

def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))
This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).
One of the advantages of using this technique, other than the fact that it's formatting-style agnostic, is that it allows for lazy values e.g. the function expensive_func above. This provides a more elegant alternative to the advice being given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.
format without the performance hit -- does it by overriding __str__ precisely as logging was designed for -- shortens the function call to a single letter (N) which feels very similar to some of the standard ways to define strings -- AND allows for lazy function calling. Thank you! +1
logging.Formatter(style='{') parameter?
One situation where % may help is when you are formatting regex expressions. For example,
'{type_names} [a-z]{2}'.format(type_names='triangle|square')
raises IndexError. In this situation, you can use:
'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}
This avoids writing the regex as '{type_names} [a-z]{{2}}'. This can be useful when you have two regexes, where one is used alone without format, but the concatenation of both is formatted.
'{type_names} [a-z]{{2}}'.format(type_names='triangle|square'). It's like saying .format() can help when using strings which already contain a percent character. Sure. You have to escape them then.
"One situation where % may help is when you are formatting regex expressions." Specifically, assume a=r"[a-z]{2}" is a regex chunk that will be used in two different final expressions (e.g. c1 = b + a and c2 = a). Assume that c1 needs to be formatted (e.g. b needs to be formatted at runtime), but c2 does not. Then you need a=r"[a-z]{2}" for c2 and a=r"[a-z]{{2}}" for c1.format(...).
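To put the regex discussion in one place, here is a sketch of three ways to build the same pattern. The third one, formatting only the variable part and concatenating the raw chunk, is my own suggestion for the shared-chunk case. The variable names are made up.

```python
import re

chunk = r"[a-z]{2}"  # usable on its own as a regex
type_names = 'triangle|square'

# 1. % leaves the regex braces alone.
via_percent = '%(type_names)s [a-z]{2}' % {'type_names': type_names}

# 2. str.format needs the regex braces doubled.
via_format = '{type_names} [a-z]{{2}}'.format(type_names=type_names)

# 3. Format only the variable part, then append the unescaped chunk.
via_concat = '{type_names} '.format(type_names=type_names) + chunk

print(via_percent == via_format == via_concat)  # True

pattern = re.compile(via_concat)
print(bool(pattern.search('square ab')))  # True
```

With the third approach the chunk stays a valid regex on its own, so the same string can serve both the formatted and the unformatted expression.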
I would add that since version 3.6, we can use f-strings like the following:
foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")
Which gives
My name is john smith
Everything is converted to strings:
mylist = ["foo", "bar"]
print(f"mylist = {mylist}")
Result:
mylist = ['foo', 'bar']
You can call functions inside the braces, as with the other formatting methods:
import time
print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')
Giving for example
Hello, here is the date : 16/04/2018
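As a small extension, f-strings also take the same format specifications as str.format after a colon. The values here are invented for the example.

```python
from datetime import datetime

price = 1234.5678
when = datetime(2018, 4, 16, 9, 30)

# A format spec follows the colon; any expression may precede it.
line = f"{price:,.2f} on {when:%d/%m/%Y} ({len('abc') * 2} items)"

print(line)  # 1,234.57 on 16/04/2018 (6 items)
```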
Python 3.6.7 comparative:
#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper

@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")

@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))

@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)

def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),)
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")

if __name__ == '__main__':
    main()
Output:
new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----
One more thing: literal curly braces in a format() template have to be escaped by doubling them, so a template with unescaped nested braces fails with format(), while % handles it unchanged.
Example:
>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
'{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
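For completeness, here is a sketch of the escaped version. Doubling the literal braces makes both format() and f-strings produce the same output as the % example.

```python
a, b = 1, 2

# '{{' and '}}' are literal braces; '{0}' and '{1}' are fields.
escaped = '{{{0}, {1}}}'.format(a, b)

# % needs no escaping because braces are not special to it.
old = '{%s, %s}' % (a, b)

# f-strings follow the same doubling rule as str.format.
f_str = f'{{{a}, {b}}}'

print(escaped, old, f_str)  # {1, 2} {1, 2} {1, 2}
```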
>>>
"%(a)s, %(a)s" % {'a':'test'}
log.debug("something: %s" % x) but not for log.debug("something: %s", x). The string formatting will be handled in the method and you won't get the performance hit if it won't be logged. As always, Python anticipates your needs =)
'{0}, {0}'.format('test').
man sprintf and learn about the $ notation inside % placeholders
printf("%2$d", 1, 3) to print out "3", that's specified in POSIX, not C99. The very man page you referenced notes, "The C99 standard does not include the style using '$'…".