The np.ascontiguousarray() function in NumPy

created at 07-15-2021

The NumPy documentation describes the function as follows:

"Return a contiguous array (ndim >= 1) in memory (C order)."

Usage

The ascontiguousarray function converts an array that is not stored contiguously in memory into one that is, which can make subsequent operations on it faster. If the input is already C contiguous, no copy is needed.

C order vs Fortran order

  • C order refers to row-major order, that is, the elements of the same row are stored next to each other in memory.
  • Fortran order refers to column-major order, that is, the elements of the same column are stored next to each other in memory.

Pascal, C, and C++ use row-major storage, as does NumPy in Python by default, while Fortran and MATLAB use column-major storage.
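
A minimal sketch of the two layouts, assuming NumPy is imported as np. The names values, c_arr and f_arr are illustrative only. Reading each buffer in memory order (ravel with order='K') shows that the same logical values are laid out differently:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

c_arr = np.array(values, order='C')  # row-major (C order) layout
f_arr = np.array(values, order='F')  # column-major (Fortran order) layout

# Both arrays hold the same logical values...
print(np.array_equal(c_arr, f_arr))  # True

# ...but the elements sit in a different order in memory.
# order='K' reads the elements in the order they occur in memory.
print(c_arr.ravel(order='K'))  # [1 2 3 4 5 6] -> rows one after another
print(f_arr.ravel(order='K'))  # [1 4 2 5 3 6] -> columns one after another
```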

Contiguous array

A contiguous array is one whose elements occupy a single unbroken block of memory addresses (note that memory addresses are actually one-dimensional).

Take the 2-dimensional array arr = np.arange(12).reshape(3, 4). Its structure is as follows:

[Figure: 2-dimensional array]

The actual storage in memory is as follows:

[Figure: actual storage in memory]

arr is in C order, which means it is stored row by row in memory. To move down one row within the same column, you need to skip 3 blocks (for example, to get from 0 to 4 you skip over 1, 2 and 3).
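
The step sizes described above can be read from the array's strides attribute. This is a small sketch. The dtype is fixed to int64 here (an assumption made only so that the byte counts are the same on every platform), and row_step and col_step are illustrative names:

```python
import numpy as np

# Fix the dtype so the byte counts below do not depend on the platform.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# strides: number of bytes to step to reach the next element along each axis.
print(arr.strides)  # (32, 8)

# Convert the byte strides into element counts.
row_step, col_step = (s // arr.itemsize for s in arr.strides)
print(row_step, col_step)  # 4 1
```

Moving down one row advances 4 elements in memory, i.e. it jumps over the 3 elements in between, while moving right one column advances just 1 element.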

When the array is transposed, arr.T is no longer C contiguous: the elements stay at the same memory addresses, so adjacent elements in the same row of arr.T are no longer adjacent in memory:

[Figure: transposed array]

arr.T is now in Fortran order, because adjacent elements within the same column are stored next to each other in memory.
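
A short sketch confirming this, again assuming an int64 dtype so that the stride values are the same on every platform. The name t is illustrative:

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
t = arr.T  # a view: no data is moved

# The transpose shares its buffer with the original array.
print(np.shares_memory(arr, t))  # True

# Only the strides are swapped, which flips the contiguity flags.
print(arr.strides, t.strides)  # (32, 8) (8, 32)
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])  # False True
```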

In terms of performance, accessing adjacent memory addresses is generally faster than accessing non-adjacent ones: when a value is read from RAM, the neighboring block of addresses is fetched along with it and kept in the cache. This means that operations that walk through an array along its contiguous direction will usually be faster.

Since arr is C contiguous, row operations tend to be faster than column operations. Usually,

np.sum(arr, axis=1) # sum by row

will be slightly faster than

np.sum(arr, axis=0) # sum by column

Similarly, on arr.T, column operations tend to be faster than row operations.
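
One way to check this on your own machine is a rough timing sketch like the one below. The array size and repeat count are arbitrary choices and the name big is illustrative. No expected timings are shown because the result depends on the hardware, the array size and the NumPy version, and the gap can be small or even reversed:

```python
import timeit

import numpy as np

# A larger array than the 3x4 example, so the timings are measurable.
big = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# Time each reduction over the same number of repetitions.
row_time = timeit.timeit(lambda: np.sum(big, axis=1), number=50)
col_time = timeit.timeit(lambda: np.sum(big, axis=0), number=50)

print(f"sum along axis=1 (by row):    {row_time:.4f} s")
print(f"sum along axis=0 (by column): {col_time:.4f} s")
```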

Use np.ascontiguousarray()

  • In NumPy, newly created arrays are C contiguous by default.
  • Some slicing operations change contiguity, and the result may be neither C contiguous nor Fortran contiguous.
  • You can check whether an array is C contiguous or Fortran contiguous through its .flags attribute:
>>> import numpy as np
>>> arr = np.arange(12).reshape(3, 4)
>>> arr.flags
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False

You can see from the output that the array arr is C contiguous.
Slicing off whole rows of arr, which leaves each remaining row intact, keeps the result C contiguous:

>>> arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> arr1 = arr[:2, :]
>>> arr1
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])
>>> arr1.flags
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False
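
Individual flags can also be read directly instead of printing the whole flags object, which is handy inside a program. A brief sketch (the name vec is illustrative):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Dictionary-style and attribute-style access give the same answer.
print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.flags.c_contiguous)     # True
print(arr.flags.f_contiguous)     # False

# A contiguous 1-D array counts as both C and Fortran contiguous.
vec = np.arange(5)
print(vec.flags.c_contiguous, vec.flags.f_contiguous)  # True True
```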

If you instead slice along the columns, keeping only part of each row, contiguity is lost and the result is neither C contiguous nor Fortran contiguous:

>>> arr1 = arr[:, 1:3]
>>> arr1.flags
    C_CONTIGUOUS : False
    F_CONTIGUOUS : False
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False
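
The strides show why this slice is not contiguous. In the sketch below the dtype is fixed to int64 (an assumption, so that the byte counts are the same on every platform), and bytes_per_row is an illustrative name:

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
arr1 = arr[:, 1:3]  # keep columns 1 and 2 of every row

# Each row of arr1 holds 2 elements, but the next row still starts
# a full original row (4 elements = 32 bytes) further along.
print(arr1.shape, arr1.strides)  # (3, 2) (32, 8)

bytes_per_row = arr1.shape[1] * arr1.itemsize
print(bytes_per_row)  # 16

# The row stride is larger than one row of data, so there are gaps
# in memory between consecutive rows of arr1.
print(arr1.strides[0] == bytes_per_row)  # False
```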

In this case, np.ascontiguousarray returns a C contiguous copy of the data (note that OWNDATA is now True):

>>> arr2 = np.ascontiguousarray(arr1)
>>> arr2.flags
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False
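
Because the result is a copy whenever the input is not already C contiguous, writing to it does not affect the original array. A small sketch of that behavior (the name same is illustrative):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr1 = arr[:, 1:3]                 # non-contiguous view of arr
arr2 = np.ascontiguousarray(arr1)  # contiguous copy of that view

# The copy has its own buffer, separate from arr and arr1.
print(np.shares_memory(arr1, arr2))  # False

# Modifying the copy leaves the original data untouched.
arr2[0, 0] = 100
print(arr[0, 1])  # 1

# An input that is already C contiguous does not need to be copied.
same = np.ascontiguousarray(arr)
print(np.shares_memory(same, arr))  # True
```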
edited at: 07-15-2021