raise ContentTooShortError(urllib.error.ContentTooShortError: <urlopen error retrieval incomplete

created at 11-21-2021

Problem description

The following error occurs when a crawler downloads files in batch:

raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 0 out of 290758 bytes>

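Cause of the problem

urlretrieve ended before the whole file arrived. It raises ContentTooShortError when it receives fewer bytes than the server announced in the Content-Length header; here it got 0 of the expected 290758 bytes.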
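
To make the cause concrete, here is a simplified sketch of the kind of size check that produces this error. It is not the library's actual source; the function name copy_and_check and its parameters expected_size and block_size are hypothetical names chosen for illustration.

```python
import io
import urllib.error


def copy_and_check(response, expected_size, out, block_size=8192):
    """Copy response to out and verify that the announced size was reached.

    expected_size plays the role of the Content-Length header;
    -1 means the server did not announce a size, so no check is made.
    """
    read = 0
    while True:
        block = response.read(block_size)
        if not block:
            break  # the stream ended, possibly too early
        read += len(block)
        out.write(block)
    if expected_size >= 0 and read < expected_size:
        # Fewer bytes arrived than announced: report an incomplete retrieval
        raise urllib.error.ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, expected_size),
            None,
        )
    return read


# Usage: an empty stream stands in for a connection that closed with no data
try:
    copy_and_check(io.BytesIO(b""), 290758, io.BytesIO())
except urllib.error.ContentTooShortError as err:
    print(err)
```

The point of the sketch is that the error is detected only after the stream ends, so the file on disk may already hold a truncated copy by the time the exception is raised.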

Solution

1. Solution one

The first approach retries recursively: when urlretrieve fails with an incomplete download, the function calls itself again. The code is as follows:

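import urllib.error
import urllib.request

def auto_down(url, filename):
    # Retry recursively whenever the download comes back incomplete
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError:
        print('Network conditions are not good. Reloading.')
        auto_down(url, filename)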

However, testing showed that once a download raises ContentTooShortError, re-downloading the file takes too long. It often retries several times, or even a dozen times, and occasionally it keeps retrying without end, since nothing limits the number of recursive calls. This is very undesirable.

2. Solution two

For this reason, the second approach uses the socket module to set a timeout, which shortens each re-download attempt, and caps the number of retries to avoid an infinite loop, thereby improving efficiency.
The code is as follows:

import socket
import urllib.error
import urllib.request

# url and image_name are assumed to be set by the surrounding crawler code
# Set the timeout to 30 s so a stalled connection fails quickly
socket.setdefaulttimeout(30)
# Errors worth retrying: a stalled connection or an incomplete download
retry_errors = (socket.timeout, urllib.error.ContentTooShortError)
# Retry the download, but cap the retries to avoid an endless loop
try:
    urllib.request.urlretrieve(url, image_name)
except retry_errors:
    count = 1
    while count <= 5:
        try:
            urllib.request.urlretrieve(url, image_name)
            break
        except retry_errors:
            err_info = 'Reloading for %d time' % count if count == 1 else 'Reloading for %d times' % count
            print(err_info)
            count += 1
    if count > 5:
        print("downloading picture failed!")
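
For a batch crawler, the same capped-retry idea can be wrapped in a function so it is reusable for every URL. The following is a minimal sketch under stated assumptions, not a definitive implementation: download_with_retry, RETRY_ERRORS, and the parameters max_attempts, delay, and fetch are hypothetical names introduced here. The fetch parameter defaults to urllib.request.urlretrieve and exists only so the function can be exercised without a network. The default timeout still has to be set separately with socket.setdefaulttimeout, as above.

```python
import os
import socket
import time
import urllib.error
import urllib.request

# Errors treated as temporary: a stalled connection or an incomplete download
RETRY_ERRORS = (socket.timeout, urllib.error.ContentTooShortError)


def download_with_retry(url, filename, max_attempts=5, delay=1.0,
                        fetch=urllib.request.urlretrieve):
    """Download url to filename; return True on success, False otherwise."""
    for attempt in range(1, max_attempts + 1):
        try:
            fetch(url, filename)
            return True
        except RETRY_ERRORS as err:
            print("Attempt %d of %d failed: %s" % (attempt, max_attempts, err))
            # Remove any partial file so a truncated download is never kept
            if os.path.exists(filename):
                os.remove(filename)
            if attempt < max_attempts:
                time.sleep(delay)  # short pause before the next attempt
    return False
```

A caller would loop over its URLs and record those for which the function returns False, so one bad link does not stop the whole batch.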

edited at: 11-21-2021