Halfway through the run, the program fails with a Too many open files error:
... ...
File "/home/miniconda3/envs/gpu/lib/python3.9/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
See the related issue on GitHub.
This is expected: the default file_descriptor sharing strategy uses file descriptors as shared memory handles, and the process hits the open-file limit when the DataLoader holds too many batches. To work around it, switch the sharing strategy from file_descriptor to file_system by adding the following to your script:
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
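For context, a minimal sketch of where the call fits, assuming a toy TensorDataset and a multi-worker DataLoader (both are illustrative, not from the original script); the key point is that the strategy is set once, before any worker processes are started:

import torch
import torch.multiprocessing
from torch.utils.data import DataLoader, TensorDataset

# Switch the sharing strategy before any DataLoader workers are spawned.
torch.multiprocessing.set_sharing_strategy('file_system')

# Toy dataset/loader for illustration; num_workers > 0 is what exercises
# the multiprocessing sharing path that was exhausting file descriptors.
dataset = TensorDataset(torch.randn(10000, 16), torch.randint(0, 2, (10000,)))
loader = DataLoader(dataset, batch_size=32, num_workers=4)

for features, labels in loader:
    pass  # training step goes here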