The following error occurs when a crawler downloads files in batch:
    raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 0 out of 290758 bytes>
Cause of the problem: urlretrieve did not download the complete file.
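To make the cause concrete, the sketch below re-creates the kind of completeness check that produces this error: the number of bytes received is compared with the size the server announced in its Content-Length header. The helper name check_complete and its parameters are hypothetical and only illustrate the idea; this is not urllib's actual source.

```python
import urllib.error


def check_complete(bytes_read, expected_size, result=None):
    """Raise ContentTooShortError if fewer bytes arrived than were announced.

    Hypothetical helper for illustration only.
    """
    # Assumption: a negative expected_size means no Content-Length was sent,
    # so there is nothing to compare against.
    if expected_size >= 0 and bytes_read < expected_size:
        raise urllib.error.ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (bytes_read, expected_size),
            result,
        )


if __name__ == "__main__":
    try:
        check_complete(0, 290758)
    except urllib.error.ContentTooShortError as exc:
        print(exc)
```

Calling check_complete(0, 290758) raises an error that carries the same "got only 0 out of 290758 bytes" text as the message above. Because ContentTooShortError is a subclass of urllib.error.URLError, a handler for URLError also catches it.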
1. Solution one
The incomplete download from urlretrieve can be handled by retrying recursively. The code is as follows:
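import urllib.error
import urllib.request

def auto_down(url, filename):
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError:
        # The download was cut short, so call the function again to retry
        print('Network conditions are not good. Reloading.')
        auto_down(url, filename)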
However, testing showed that when a download raises ContentTooShortError, re-downloading the file takes too long. It often takes several attempts, sometimes a dozen or more, and occasionally the function falls into an infinite loop. This is very undesirable.
2. Solution two
For this reason, the socket module is used to set a timeout that limits how long each download attempt can take, and the number of retries is capped to avoid an infinite loop, which improves operating efficiency.
The following is the code:
import socket
import urllib.request

# url and image_name are expected to be defined by the surrounding crawler code

# Set the timeout period to 30s
socket.setdefaulttimeout(30)

# Solve the problem of incomplete download and avoid falling into an endless loop
try:
    urllib.request.urlretrieve(url, image_name)
except socket.timeout:
    count = 1
    # Retry at most 5 times after the first attempt times out
    while count <= 5:
        try:
            urllib.request.urlretrieve(url, image_name)
            break
        except socket.timeout:
            err_info = 'Reloading for %d time' % count if count == 1 else 'Reloading for %d times' % count
            print(err_info)
            count += 1
    if count > 5:
        print("downloading picture failed!")
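The code above retries only on socket.timeout, while the error at the top of this post is ContentTooShortError. As a possible variation, the sketch below wraps the same idea in one function that retries on either exception. The function name download_with_retry, its parameters, and the injectable retrieve argument are assumptions made for illustration and are not part of the original code.

```python
import socket
import urllib.error
import urllib.request


def download_with_retry(url, filename, max_attempts=6, timeout=30,
                        retrieve=urllib.request.urlretrieve):
    """Download url to filename; return True on success, False otherwise.

    Hypothetical helper. max_attempts=6 mirrors one initial attempt plus
    five retries, as in the code above.
    """
    # urlretrieve takes no timeout argument, so the process-wide socket
    # default is set instead (this affects all new sockets).
    socket.setdefaulttimeout(timeout)
    for attempt in range(1, max_attempts + 1):
        try:
            retrieve(url, filename)
            return True
        except (socket.timeout, urllib.error.ContentTooShortError) as exc:
            # Both a stalled and a truncated download trigger another attempt
            print('Attempt %d of %d failed: %s' % (attempt, max_attempts, exc))
    # Every attempt failed, so give up instead of looping forever
    return False
```

A crawler could call download_with_retry(url, image_name) for each file and log the URLs for which it returns False. The retrieve parameter exists only so that the retry logic can be exercised with a stand-in function, without network access.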