TypeError: a bytes-like object is required, not 'str'
I get the above error while executing the Python code below, which saves HTML table data to a CSV file. I don't know how to get rid of it. Please help.
import csv
import requests
from bs4 import BeautifulSoup
url='http://www.mapsofindia.com/districts-india/'
response=requests.get(url)
html=response.content
soup=BeautifulSoup(html,'html.parser')
table=soup.find('table', attrs={'class':'tableizer-table'})
list_of_rows=[]
for row in table.findAll('tr')[1:]:
    list_of_cells=[]
    for cell in row.findAll('td'):
        list_of_cells.append(cell.text)
    list_of_rows.append(list_of_cells)
outfile=open('./immates.csv','wb')
writer=csv.writer(outfile)
writer.writerow(["SNo", "States", "Dist", "Population"])
writer.writerows(list_of_rows)
The error occurs on the last line above.
You are using Python 2 methodology instead of Python 3.
Change:
outfile=open('./immates.csv','wb')
To:
outfile=open('./immates.csv','w')
and you will get a file with the following output:
SNo,States,Dist,Population
1,Andhra Pradesh,13,49378776
2,Arunachal Pradesh,16,1382611
3,Assam,27,31169272
4,Bihar,38,103804637
5,Chhattisgarh,19,25540196
6,Goa,2,1457723
7,Gujarat,26,60383628
.....
In Python 3, the csv module expects the file to be opened in text mode, whereas in Python 2 it expected binary mode.
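To make the difference concrete, here is a minimal sketch that reproduces the error and then applies the fix, without touching the network. The temporary path and the single sample row (taken from the output above) are assumptions for illustration, and the `newline=''` and `encoding='utf-8'` arguments are additions that the original code did not have.

```python
import csv
import os
import tempfile

# Sample data standing in for the scraped rows (illustrative only).
rows = [["1", "Andhra Pradesh", "13", "49378776"]]
path = os.path.join(tempfile.mkdtemp(), "districts.csv")  # hypothetical path

# Binary mode: csv.writer hands str objects to outfile.write(),
# but a binary file only accepts bytes, so a TypeError is raised.
error_message = None
with open(path, "wb") as outfile:
    try:
        csv.writer(outfile).writerows(rows)
    except TypeError as exc:
        error_message = str(exc)

# Text mode: the same calls succeed.
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["SNo", "States", "Dist", "Population"])
    writer.writerows(rows)

# Read the file back to confirm what was written.
with open(path, newline="", encoding="utf-8") as infile:
    written = list(csv.reader(infile))

print(error_message)
print(written)
```

Using `with` here also closes the file automatically, which the original script never does explicitly.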
Edited to Add
Here is the code I ran:
import csv
import urllib.request
from bs4 import BeautifulSoup
url='http://www.mapsofindia.com/districts-india/'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)
table=soup.find('table', attrs={'class':'tableizer-table'})
list_of_rows=[]
for row in table.findAll('tr')[1:]:
    list_of_cells=[]
    for cell in row.findAll('td'):
        list_of_cells.append(cell.text)
    list_of_rows.append(list_of_cells)
outfile = open('./immates.csv','w')
writer=csv.writer(outfile)
writer.writerow(['SNo', 'States', 'Dist', 'Population'])
writer.writerows(list_of_rows)
I had the same issue with Python 3. My code was writing into io.BytesIO(). Replacing it with io.StringIO() solved the problem.
io.StringIO() is memory-hungry and can be a headache with large files.
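As a rough sketch of the in-memory case: `csv.writer` produces `str`, so it needs a text buffer. If the consumer really needs bytes, one option (an assumption about the use case, not something the answer above states) is to wrap the `io.BytesIO` in an `io.TextIOWrapper`. The sample rows are illustrative.

```python
import csv
import io

rows = [["SNo", "States"], ["1", "Andhra Pradesh"]]  # illustrative data

# Option 1: write into a text buffer, which accepts the str that csv emits.
text_buffer = io.StringIO()
csv.writer(text_buffer).writerows(rows)
csv_text = text_buffer.getvalue()

# Option 2: keep the BytesIO, but put a text layer on top that encodes for us.
byte_buffer = io.BytesIO()
wrapper = io.TextIOWrapper(byte_buffer, encoding="utf-8", newline="")
csv.writer(wrapper).writerows(rows)
wrapper.detach()  # flush and release byte_buffer so it stays open
csv_bytes = byte_buffer.getvalue()

print(repr(csv_text))
print(repr(csv_bytes))
```

Both buffers hold the whole CSV in memory, so for large outputs writing straight to a file opened in text mode is likely the safer choice.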
Just change 'wb' to 'w':
outfile=open('./immates.csv','wb')
to
outfile=open('./immates.csv','w')
You are opening the CSV file in binary mode; it should be 'w'.
import csv
# open the CSV file in write mode with UTF-8 encoding
with open('output.csv', 'w', encoding='utf-8', newline='') as w:
    fieldnames = ["SNo", "States", "Dist", "Population"]
    writer = csv.DictWriter(w, fieldnames=fieldnames)
    # write a list of dicts (list_of_dicts is assumed to be defined elsewhere)
    writer.writerows(list_of_dicts)  # use writerow(dict) to write one row at a time
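The DictWriter snippet above leaves `list_of_dicts` undefined and does not write a header row. A self-contained sketch might look like the following; the two sample dicts and the temporary path are made up for illustration, and the `writeheader()` call is an addition in case a header line is wanted.

```python
import csv
import os
import tempfile

fieldnames = ["SNo", "States", "Dist", "Population"]

# Hypothetical rows; in the real script these would come from the scraped table.
list_of_dicts = [
    {"SNo": "1", "States": "Andhra Pradesh", "Dist": "13", "Population": "49378776"},
    {"SNo": "2", "States": "Arunachal Pradesh", "Dist": "16", "Population": "1382611"},
]

path = os.path.join(tempfile.mkdtemp(), "output.csv")  # hypothetical path

with open(path, "w", encoding="utf-8", newline="") as w:
    writer = csv.DictWriter(w, fieldnames=fieldnames)
    writer.writeheader()             # DictWriter does not emit the header on its own
    writer.writerows(list_of_dicts)  # each dict's keys must match fieldnames

# Read the file back as dicts to check the round trip.
with open(path, encoding="utf-8", newline="") as r:
    round_trip = list(csv.DictReader(r))

print(round_trip)
```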
import re
# assumes soup is an existing BeautifulSoup object
file = open('parsed_data.txt', 'w')
for link in soup.findAll('a', attrs={'href': re.compile("^http")}):
    print(link)
    soup_link = str(link)
    print(soup_link)
    file.write(soup_link)
file.flush()
file.close()
In my case, I used BeautifulSoup to write a .txt with Python 3.x. It had the same issue. Just as @tsduteba said, change the 'wb' in the first line to 'w'.
With the csv module, the Python 3 open() call should also have newline='' as a parameter [ref]
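A small sketch of what `newline=''` buys you: the csv writer emits its own `\r\n` line terminator, and `newline=''` tells the file object not to translate it again (without it, Windows would turn each `\r\n` into `\r\r\n`, which shows up as blank lines between rows). The temporary path is an assumption for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.csv")  # hypothetical path

# newline='' passes the writer's "\r\n" terminator through untouched.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["SNo", "States"])

# Inspect the raw bytes to see exactly which terminator landed on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```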