How do I pretty-print a JSON file in Python?
json.loads()
and pretty print that resulting dictionary. Or just skip to the Pretty printing section of the Python documentation for json
.
<your_file.js python -mjson.tool
as in @ed's link?
The json
module already implements some basic pretty printing in the dump
and dumps
functions, with the indent
parameter that specifies how many spaces to indent by:
>>> import json
>>>
>>> your_json = '["foo", {"bar":["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4, sort_keys=True))
[
"foo",
{
"bar": [
"baz",
null,
1.0,
2
]
}
]
To parse a file, use json.load()
:
with open('filename.txt', 'r') as handle:
parsed = json.load(handle)
You can do this on the command line:
python3 -m json.tool some.json
(as already mentioned in the commentaries to the question, thanks to @Kai Petzke for the python3 suggestion).
Actually python is not my favourite tool as far as json processing on the command line is concerned. For simple pretty printing is ok, but if you want to manipulate the json it can become overcomplicated. You'd soon need to write a separate script-file, you could end up with maps whose keys are u"some-key" (python unicode), which makes selecting fields more difficult and doesn't really go in the direction of pretty-printing.
You can also use jq:
jq . some.json
and you get colors as a bonus (and way easier extendability).
Addendum: There is some confusion in the comments about using jq to process large JSON files on the one hand, and having a very large jq program on the other. For pretty-printing a file consisting of a single large JSON entity, the practical limitation is RAM. For pretty-printing a 2GB file consisting of a single array of real-world data, the "maximum resident set size" required for pretty-printing was 5GB (whether using jq 1.5 or 1.6). Note also that jq can be used from within python after pip install jq
.
jq '' < some.json
python3 -m json.tool <IN >OUT
, as this keeps the original order of the fields in JSON dicts. The python interpreter version 2 sorts the fields in alphabetically ascending order, which often is not, what you want.
You could use the built-in module pprint (https://docs.python.org/3.9/library/pprint.html).
How you can read the file with json data and print it out.
import json
import pprint
json_data = None
with open('file_name.txt', 'r') as f:
data = f.read()
json_data = json.loads(data)
print(json_data)
{"firstName": "John", "lastName": "Smith", "isAlive": "true", "age": 27, "address": {"streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100"}, 'children': []}
pprint.pprint(json_data)
{'address': {'city': 'New York',
'postalCode': '10021-3100',
'state': 'NY',
'streetAddress': '21 2nd Street'},
'age': 27,
'children': [],
'firstName': 'John',
'isAlive': True,
'lastName': 'Smith'}
The output is not a valid json, because pprint use single quotes and json specification require double quotes.
If you want to rewrite the pretty print formated json to a file, you have to use pprint.pformat.
pretty_print_json = pprint.pformat(json_data).replace("'", '"')
with open('file_name.json', 'w') as f:
f.write(pretty_print_json)
Pygmentize + Python json.tool = Pretty Print with Syntax Highlighting
Pygmentize is a killer tool. See this.
I combine python json.tool with pygmentize
echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json
See the link above for pygmentize installation instruction.
A demo of this is in the image below:
https://i.stack.imgur.com/bpTen.png
-g
is not actually working ;) Since input comes from stdin, pygmentize is not able to make a good guess. You need to specify lexer explicitly: echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json
Use this function and don't sweat having to remember if your JSON is a str
or dict
again - just look at the pretty print:
import json
def pp_json(json_thing, sort=True, indents=4):
if type(json_thing) is str:
print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
else:
print(json.dumps(json_thing, sort_keys=sort, indent=indents))
return None
pp_json(your_json_string_or_dict)
Use pprint: https://docs.python.org/3.6/library/pprint.html
import pprint
pprint.pprint(json)
print()
compared to pprint.pprint()
print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}
pprint.pprint(json)
{'bozo': 0,
'encoding': 'utf-8',
'entries': [],
'feed': {'link': 'https://www.w3schools.com',
'links': [{'href': 'https://www.w3schools.com',
'rel': 'alternate',
'type': 'text/html'}],
'subtitle': 'Free web building tutorials',
'subtitle_detail': {'base': '',
'language': None,
'type': 'text/html',
'value': 'Free web building tutorials'},
'title': 'W3Schools Home Page',
'title_detail': {'base': '',
'language': None,
'type': 'text/plain',
'value': 'W3Schools Home Page'}},
'namespaces': {},
'version': 'rss20'}
pprint
does not produce a valid JSON document.
json
module to work with the data and dictionary keys work the same with double- or single-quoted strings, but some tools, e.g. Postman and JSON Editor Online, both expect keys and values to be double-quoted (as per the JSON spec). In any case, json.org specifies the use of double quotes, which pprint
doesn't produce. E.g. pprint.pprint({"name": "Jane"})
produces {'name': 'Jane'}
.
'language': None,
in the result above, which should be "language": null
. Note the null
and the double quotes. What you do is pretty-printing a Python object.
To be able to pretty print from the command line and be able to have control over the indentation etc. you can set up an alias similar to this:
alias jsonpp="python -c 'import sys, json; print json.dumps(json.load(sys.stdin), sort_keys=True, indent=2)'"
And then use the alias in one of these ways:
cat myfile.json | jsonpp
jsonpp < myfile.json
def saveJson(date,fileToSave):
with open(fileToSave, 'w+') as fileToSave:
json.dump(date, fileToSave, ensure_ascii=True, indent=4, sort_keys=True)
It works to display or save it to a file.
Here's a simple example of pretty printing JSON to the console in a nice way in Python, without requiring the JSON to be on your computer as a local file:
import pprint
import json
from urllib.request import urlopen # (Only used to get this example)
# Getting a JSON example for this example
r = urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json")
text = r.read()
# To print it
pprint.pprint(json.loads(text))
You could try pprintjson.
Installation
$ pip3 install pprintjson
Usage
Pretty print JSON from a file using the pprintjson CLI.
$ pprintjson "./path/to/file.json"
Pretty print JSON from a stdin using the pprintjson CLI.
$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson
Pretty print JSON from a string using the pprintjson CLI.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'
Pretty print JSON from a string with an indent of 1.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1
Pretty print JSON from a string and save output to a file output.json.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json
Output
https://i.stack.imgur.com/vsdBU.png
import pprint pprint.pprint(json)
?
I think that's better to parse the json before, to avoid errors:
def format_response(response):
try:
parsed = json.loads(response.text)
except JSONDecodeError:
return response.text
return json.dumps(parsed, ensure_ascii=True, indent=4)
I had a similar requirement to dump the contents of json file for logging, something quick and easy:
print(json.dumps(json.load(open(os.path.join('<myPath>', '<myjson>'), "r")), indent = 4 ))
if you use it often then put it in a function:
def pp_json_file(path, file):
print(json.dumps(json.load(open(os.path.join(path, file), "r")), indent = 4))
Hopefully this helps someone else.
In the case when there is a error that something is not json serializable the answers above will not work. If you only want to save it so that is human readable then you need to recursively call string on all the non dictionary elements of your dictionary. If you want to load it later then save it as a pickle file then load it (e.g. torch.save(obj, f)
works fine).
This is what worked for me:
#%%
def _to_json_dict_with_strings(dictionary):
"""
Convert dict to dict with leafs only being strings. So it recursively makes keys to strings
if they are not dictionaries.
Use case:
- saving dictionary of tensors (convert the tensors to strins!)
- saving arguments from script (e.g. argparse) for it to be pretty
e.g.
"""
if type(dictionary) != dict:
return str(dictionary)
d = {k: _to_json_dict_with_strings(v) for k, v in dictionary.items()}
return d
def to_json(dic):
import types
import argparse
if type(dic) is dict:
dic = dict(dic)
else:
dic = dic.__dict__
return _to_json_dict_with_strings(dic)
def save_to_json_pretty(dic, path, mode='w', indent=4, sort_keys=True):
import json
with open(path, mode) as f:
json.dump(to_json(dic), f, indent=indent, sort_keys=sort_keys)
def my_pprint(dic):
"""
@param dic:
@return:
Note: this is not the same as pprint.
"""
import json
# make all keys strings recursively with their naitve str function
dic = to_json(dic)
# pretty print
pretty_dic = json.dumps(dic, indent=4, sort_keys=True)
print(pretty_dic)
# print(json.dumps(dic, indent=4, sort_keys=True))
# return pretty_dic
import torch
# import json # results in non serializabe errors for torch.Tensors
from pprint import pprint
dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}
my_pprint(dic)
pprint(dic)
output:
{
"rec": {
"y": "tensor([[-0.3137, 0.3138, 1.2894]])"
},
"x": "tensor([[-1.5909, 0.0516, -1.5445]])"
}
{'rec': {'y': tensor([[-0.3137, 0.3138, 1.2894]])},
'x': tensor([[-1.5909, 0.0516, -1.5445]])}
I don't know why returning the string then printing it doesn't work but it seems you have to put the dumps directly in the print statement. Note pprint
as it has been suggested already works too. Note not all objects can be converted to a dict with dict(dic)
which is why some of my code has checks on this condition.
Context:
I wanted to save pytorch strings but I kept getting the error:
TypeError: tensor is not JSON serializable
so I coded the above. Note that yes, in pytorch you use torch.save
but pickle files aren't readable. Check this related post: https://discuss.pytorch.org/t/typeerror-tensor-is-not-json-serializable/36065/3
PPrint also has indent arguments but I didn't like how it looks:
pprint(stats, indent=4, sort_dicts=True)
output:
{ 'cca': { 'all': {'avg': tensor(0.5132), 'std': tensor(0.1532)},
'avg': tensor([0.5993, 0.5571, 0.4910, 0.4053]),
'rep': {'avg': tensor(0.5491), 'std': tensor(0.0743)},
'std': tensor([0.0316, 0.0368, 0.0910, 0.2490])},
'cka': { 'all': {'avg': tensor(0.7885), 'std': tensor(0.3449)},
'avg': tensor([1.0000, 0.9840, 0.9442, 0.2260]),
'rep': {'avg': tensor(0.9761), 'std': tensor(0.0468)},
'std': tensor([5.9043e-07, 2.9688e-02, 6.3634e-02, 2.1686e-01])},
'cosine': { 'all': {'avg': tensor(0.5931), 'std': tensor(0.7158)},
'avg': tensor([ 0.9825, 0.9001, 0.7909, -0.3012]),
'rep': {'avg': tensor(0.8912), 'std': tensor(0.1571)},
'std': tensor([0.0371, 0.1232, 0.1976, 0.9536])},
'nes': { 'all': {'avg': tensor(0.6771), 'std': tensor(0.2891)},
'avg': tensor([0.9326, 0.8038, 0.6852, 0.2867]),
'rep': {'avg': tensor(0.8072), 'std': tensor(0.1596)},
'std': tensor([0.0695, 0.1266, 0.1578, 0.2339])},
'nes_output': { 'all': {'avg': None, 'std': None},
'avg': tensor(0.2975),
'rep': {'avg': None, 'std': None},
'std': tensor(0.0945)},
'query_loss': { 'all': {'avg': None, 'std': None},
'avg': tensor(12.3746),
'rep': {'avg': None, 'std': None},
'std': tensor(13.7910)}}
compare to:
{
"cca": {
"all": {
"avg": "tensor(0.5144)",
"std": "tensor(0.1553)"
},
"avg": "tensor([0.6023, 0.5612, 0.4874, 0.4066])",
"rep": {
"avg": "tensor(0.5503)",
"std": "tensor(0.0796)"
},
"std": "tensor([0.0285, 0.0367, 0.1004, 0.2493])"
},
"cka": {
"all": {
"avg": "tensor(0.7888)",
"std": "tensor(0.3444)"
},
"avg": "tensor([1.0000, 0.9840, 0.9439, 0.2271])",
"rep": {
"avg": "tensor(0.9760)",
"std": "tensor(0.0468)"
},
"std": "tensor([5.7627e-07, 2.9689e-02, 6.3541e-02, 2.1684e-01])"
},
"cosine": {
"all": {
"avg": "tensor(0.5945)",
"std": "tensor(0.7146)"
},
"avg": "tensor([ 0.9825, 0.9001, 0.7907, -0.2953])",
"rep": {
"avg": "tensor(0.8911)",
"std": "tensor(0.1571)"
},
"std": "tensor([0.0371, 0.1231, 0.1975, 0.9554])"
},
"nes": {
"all": {
"avg": "tensor(0.6773)",
"std": "tensor(0.2886)"
},
"avg": "tensor([0.9326, 0.8037, 0.6849, 0.2881])",
"rep": {
"avg": "tensor(0.8070)",
"std": "tensor(0.1595)"
},
"std": "tensor([0.0695, 0.1265, 0.1576, 0.2341])"
},
"nes_output": {
"all": {
"avg": "None",
"std": "None"
},
"avg": "tensor(0.2976)",
"rep": {
"avg": "None",
"std": "None"
},
"std": "tensor(0.0945)"
},
"query_loss": {
"all": {
"avg": "None",
"std": "None"
},
"avg": "tensor(12.3616)",
"rep": {
"avg": "None",
"std": "None"
},
"std": "tensor(13.7976)"
}
}
It's far from perfect, but it does the job.
data = data.replace(',"',',\n"')
you can improve it, add indenting and so on, but if you just want to be able to read a cleaner json, this is the way to go.
Success story sharing
print json.dumps(your_json_string, indent=4)
var str = JSON.stringify(obj, null, 4);
as discussed here stackoverflow.com/questions/4810841/…