import requests

urls = ['www....', 'www.....', ...]  # placeholder list of pages to crawl
for link in urls:
    linkhtml = requests.get(link)
The crawler reported the following error:
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
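The traceback shows the failing call is urllib.request.urlretrieve, so if urlretrieve is used anywhere in the crawler it can be guarded directly. A minimal sketch, assuming a hypothetical download helper (the function name is mine, not from the original code):

```python
import urllib.error
import urllib.request

def download(url, filename):
    """Try to download url to filename; return False instead of crashing."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.URLError as exc:
        # HTTPError (e.g. "HTTP Error 404: Not Found") is a subclass of URLError,
        # so this catches both unreachable hosts and HTTP error statuses.
        print(f"Skipping {url}: {exc}")
        return False
    return True
```

Returning a flag instead of letting the exception propagate lets the calling loop decide to skip the link and continue.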
Referring to a question on Stack Overflow:
Python: urllib.error.HTTPError: HTTP Error 404: Not Found - Stack Overflow
In a crawler scenario, some links simply cannot be opened, and the request naturally raises HTTP Error 404. What you have to do is skip that link and carry on crawling the remaining pages.
The fixed code:
import requests

urls = ['www....', 'www.....', ...]  # placeholder list of pages to crawl
for link in urls:
    try:
        linkhtml = requests.get(link)
    except requests.exceptions.RequestException:
        continue  # skip the broken link and move on to the next one
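One caveat: requests.get does not raise an exception for a 404 on its own; it returns a Response whose status_code is 404. To make HTTP error statuses raise (and then skip them), call response.raise_for_status() and catch requests.exceptions.RequestException, which also covers timeouts and connection errors. A sketch, with a hypothetical helper name and placeholder URLs:

```python
import requests

def fetch_pages(urls):
    """Fetch each URL, skipping any that fail instead of aborting the loop."""
    pages = {}
    for link in urls:
        try:
            response = requests.get(link, timeout=10)
            response.raise_for_status()  # turn 4xx/5xx statuses into exceptions
        except requests.exceptions.RequestException as exc:
            print(f"Skipping {link}: {exc}")
            continue  # move on to the next link
        pages[link] = response.text
    return pages
```

Catching the specific RequestException instead of a bare except keeps genuine bugs (e.g. a KeyboardInterrupt or a typo-induced NameError) from being silently swallowed.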