raise HTTPError(req.full_url, code, msg, hdrs, fp)urllib.error.HTTPError: HTTP Error 404: Not Found

created at 11-21-2021 views: 99

error description

import requests
url=['www....','www.....',...]
for i in range(0,len(url)):
     linkhtml = requests.get(url[i])

The crawler reported the following error:


  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\lenovo7\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Refer to an article on stack Overflow

Python: urllib.error.HTTPError: HTTP Error 404: Not Found-Stack Overflow

In the crawler scenario, the original link may not be opened, and it naturally prompts HTTP Error 404. What you have to do is to skip this link and then crawl the following page.

Fix the code

import requests
url=['www....','www.....',...]
for i in range(0,len(url)):
    try:
        linkhtml = requests.get(url[i])
    except:
        pass
created at:11-21-2021
edited at: 11-21-2021: