This section first covers loading and visualizing the CIFAR-10 data, and then introduces and implements the network model.
CIFAR-10 is a small data set for general object recognition compiled by Alex Krizhevsky and Ilya Sutskever, students of Geoffrey Hinton. It contains RGB color images in 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. Each image is 32×32 pixels, and the data set has a total of 50,000 training images and 10,000 test images.
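Compared with the MNIST data set, CIFAR-10 differs in several ways: its images are three-channel RGB rather than single-channel grayscale, they are 32×32 rather than 28×28 pixels, and they show real-world objects rather than handwritten digits.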
First, use torchvision to load and normalize the training and test data. Three of its modules are relevant here:
torchvision.datasets provides loaders for common data sets such as CIFAR-10.
torchvision.transforms provides image transformations.
torchvision.models contains models such as AlexNet, VGG, ResNet, and SqueezeNet.
The torchvision data sets return PIL images. ToTensor converts them to Tensors with values in [0, 1], which we then normalize to [-1, 1].
First, a transformation named transform is defined. Compose() from the transforms module mentioned above combines multiple transformations; here it chains two data augmentation steps with ToTensor and Normalize.
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
The first (0.5, 0.5, 0.5) is the mean of the three RGB channels, and the second (0.5, 0.5, 0.5) is their standard deviation. Each channel is transformed as (x - mean) / std, so values in [0, 1] map to [-1, 1]. Note that the channel order is RGB; readers who have used OpenCV will know that it reads images in BGR order. These two tuples are used to normalize the RGB image, as the name Normalize suggests. The value 0.5 is only an approximation: the true per-channel mean and standard deviation of CIFAR-10 differ from it, but for this example the impact is negligible. The exact values are obtained by computing the statistics of the R, G, and B channels separately over the data set.
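from torchvision import transforms

transform = transforms.Compose([
    # transforms.CenterCrop(224),
    transforms.RandomCrop(32, padding=4),  # Data augmentation: pad by 4 pixels, then take a random 32x32 crop
    transforms.RandomHorizontalFlip(),     # Data augmentation: random horizontal flip
    transforms.ToTensor(),                 # PIL image -> Tensor with values in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1]
])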
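The exact per-channel statistics mentioned above can be computed directly from the raw image array. The following is a minimal sketch, assuming the images are a uint8 array of shape (N, H, W, 3) like trainset.data. The helper name channel_stats and the random stand-in array are hypothetical and only there so the snippet runs without downloading anything.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std of a uint8 array of shape (N, H, W, 3), scaled to [0, 1]."""
    scaled = images.astype(np.float64) / 255.0
    # Reduce over the image, height, and width axes, leaving one value per channel.
    mean = scaled.mean(axis=(0, 1, 2))
    std = scaled.std(axis=(0, 1, 2))
    return mean, std

# Hypothetical stand-in data; with the real data set, pass trainset.data instead.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

mean, std = channel_stats(fake_images)
print(mean, std)
```

The two returned arrays could then be passed to transforms.Normalize in place of the 0.5 approximations. Note that the float64 copy of the full training set takes roughly 1.2 GB of memory.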
trainloader is the important piece here: it is what later feeds data to the network. The name trainloader is just a variable name and can be anything you like; what matters is that it is created by torch.utils.data.DataLoader(), which comes from the torch.utils.data module.
import torch
from torchvision import datasets

Batch_Size = 256
# Note: the same augmenting transform is reused for the test set here.
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=Batch_Size, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=Batch_Size, shuffle=True, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Files already downloaded and verified
Files already downloaded and verified
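To see what a DataLoader hands to the network, it can help to run one on a small synthetic data set. This is a minimal sketch, assuming random tensors in place of CIFAR-10; the names fake_images, fake_labels, fake_set, and loader are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR-10: 1000 random "images" with random labels.
fake_images = torch.rand(1000, 3, 32, 32)
fake_labels = torch.randint(0, 10, (1000,))
fake_set = TensorDataset(fake_images, fake_labels)

# num_workers=0 loads data in the main process, which is enough for a demo.
loader = DataLoader(fake_set, batch_size=256, shuffle=True, num_workers=0)

# Each iteration yields one batch of images and the matching labels.
images, labels = next(iter(loader))
print(images.shape)  # batch layout is N x C x H x W
print(labels.shape)
print(len(loader))   # number of batches per epoch; the last batch is smaller
```

With 1000 samples and a batch size of 256, one epoch consists of three full batches and one partial batch.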
First, view the class names:
classes = trainset.classes
classes
['airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck']
trainset.class_to_idx
{'airplane': 0,
'automobile': 1,
'bird': 2,
'cat': 3,
'deer': 4,
'dog': 5,
'frog': 6,
'horse': 7,
'ship': 8,
'truck': 9}
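When displaying predictions, the reverse lookup from index to class name is often needed. A minimal sketch, assuming the mapping printed above is copied by hand; idx_to_class is a hypothetical name, and with the real data set the dictionary would come from trainset.class_to_idx.

```python
# Same mapping as printed by trainset.class_to_idx, written out so the snippet is self-contained.
class_to_idx = {
    'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4,
    'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9,
}

# Invert the dictionary: label index -> class name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(idx_to_class[3])  # cat
```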
You can also inspect the training set data:
trainset.data.shape
# 50000 is the number of images, 32x32 is the image size, 3 is the number of RGB channels
(50000, 32, 32, 3)
View the data types:
print(type(trainset.data))
print(type(trainset))
<class 'numpy.ndarray'>
<class 'torchvision.datasets.cifar.CIFAR10'>
Summary
trainset.data is a standard numpy.ndarray of shape (50000, 32, 32, 3), where 50000 is the number of images, 32x32 is the image size, and 3 is the number of RGB channels.
import numpy as np
import matplotlib.pyplot as plt
plt.imshow(trainset.data[0])
# Fetch one batch of images and labels from the loader
im, label = next(iter(trainloader))
Convert np.ndarray to torch.Tensor
In deep learning, the original image has to be converted to the data format used by the framework. In PyTorch, that format is torch.Tensor.
PyTorch provides conversion interfaces between torch.Tensor and numpy.ndarray:
torch.from_numpy(xxx): converts a numpy.ndarray to a torch.Tensor
tensor1.numpy(): returns the data of the tensor1 object as a numpy.ndarray
Layout of an image batch as a torch.Tensor: N x C x H x W
Layout of an image batch as a numpy.ndarray: N x H x W x C
Therefore, you need to use numpy.transpose() when converting between the two.
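The conversion and the axis reordering described above can be sketched as a round trip. This is a minimal example, assuming a hypothetical all-zero batch; batch_nhwc, batch_nchw, and back are illustrative names. It uses Tensor.permute for the Tensor side and np.transpose for the NumPy side.

```python
import numpy as np
import torch

# Hypothetical batch in NumPy layout: N x H x W x C
batch_nhwc = np.zeros((8, 32, 32, 3), dtype=np.float32)

# NumPy -> Tensor (shares memory with the array), then reorder to N x C x H x W
batch_nchw = torch.from_numpy(batch_nhwc).permute(0, 3, 1, 2)
print(batch_nchw.shape)

# Tensor -> NumPy, then move the channel axis back to the end
back = np.transpose(batch_nchw.numpy(), (0, 2, 3, 1))
print(back.shape)
```

Neither permute nor np.transpose copies the data; both only change how the same memory is indexed.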
import torchvision

def imshow(img):
    img = img / 2 + 0.5                         # undo Normalize: [-1, 1] -> [0, 1]
    img = np.transpose(img.numpy(), (1, 2, 0))  # C x H x W -> H x W x C for matplotlib
    plt.imshow(img)

imshow(im[0])
plt.figure(figsize=(8, 12))
imshow(torchvision.utils.make_grid(im[:32]))
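The img / 2 + 0.5 step in imshow is the inverse of Normalize with mean 0.5 and standard deviation 0.5. A small numeric check, using NumPy and a few hypothetical pixel values, illustrates the round trip:

```python
import numpy as np

mean, std = 0.5, 0.5
pixels = np.array([0.0, 0.25, 0.5, 1.0])  # values in [0, 1], as produced by ToTensor

normalized = (pixels - mean) / std  # what Normalize computes per channel: -1, -0.5, 0, 1
restored = normalized / 2 + 0.5     # the step used in imshow

print(normalized)
print(restored)
```

If different mean and standard deviation values were used in Normalize, the inverse would be normalized * std + mean instead.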