I have a Numpy function that takes the values of an existing array X
and a size (int) k
equivalent to the columns of the array. This function does some calculations in a new array, to finally output the new array after the calculations are done.
def ff(x, k):
newarr = np.zeros((x.shape[0], 1), dtype=np.uint32)
newarr[:, 0] = x[:, 0] * 4
for i in range(1, k - 1):
newarr[:, 0] += x[:, i]
newarr[:, 0] = newarr[:, 0] * 4
newarr[:, 0] += x[:, -1]
return newarr
newarr
it's the array where the calculated data is stored, it's initialized with np.zeros
and a dtype
, and it has a shape of (y,1)
where y
is the length of the x
array.
The first step is to store in the (only) column of the newarr
the multiplication of the first column of the x
by 4.
The iteration occurs over the columns 1 to k-1
of the original array (x
).
Inside the iteration, the next step is to sum the values of the first column of x
to the column of newarr
.
Next, to multiply the column of newarr
by 4
To finally, after the iteration is complete, sum the last column of the original array x
to the column of newarr
.
I'm looking for a way (if there's any) to avoid the creation of the newarr
, and do the calculations in the original array. This', to reduce the memory usage of this function, mainly because the x
tends to be very big in execution.
The x
array looks like this.
array([[3, 3, 0, 1],
[3, 0, 1, 0],
[0, 1, 0, 2],
[1, 0, 2, 2],
[0, 2, 2, 3],
[2, 2, 3, 1],
[2, 3, 1, 3],
.
.
.
[2, 2, 0, 1]], dtype=uint8)
In this case, the k
value would be 4.
For the sake of completeness, I've attached the memory profile of my solution and the proposed dot product, for a big x
array: (248956413, 10).
My solution
Line # Mem usage Increment Line Contents
================================================
18 322.0 MiB 322.0 MiB @profile
19 def ff(x, k):
20 322.0 MiB 0.0 MiB newarr = np.zeros((x.shape[0], 1), dtype=np.uint32)
21 1271.7 MiB 949.8 MiB newarr[:, 0] = x[:, 0] * 4
22 1274.2 MiB 0.0 MiB for i in range(1, k - 1):
23 1274.2 MiB 0.0 MiB newarr[:, 0] += x[:, i]
24 1274.2 MiB 2.5 MiB newarr[:, 0] = newarr[:, 0] * 4
25 1274.2 MiB 0.0 MiB newarr[:, 0] += x[:, -1]
26 1274.2 MiB 0.0 MiB return newarr
Dot product.
Line # Mem usage Increment Line Contents
================================================
18 321.7 MiB 321.7 MiB @profile
19 def ff2(x,k):
20 2221.2 MiB 1899.5 MiB return x.dot(4 ** np.arange(k - 1, -1, -1))
I was expecting a reduction in memory usage. Clearly, I was wrong.
x.dot(4**np.arange(k-1,-1,-1))
.newarr
is only one column, or 1d, a fraction of the size ofx
(withk
columns). I wouldn't worry about memory savings. \$\endgroup\$