python - How to merge two large numpy arrays if slicing doesn't resolve a memory error?
I have two numpy arrays, container1 and container2, where container1.shape = (900, 4000) and container2.shape = (5000, 4000). Merging them using vstack results in a MemoryError. After searching through old questions posted here, I tried to merge them using slicing like this:
```python
mergedcontainer = numpy.vstack((container1, container2[:1000]))
mergedcontainer = numpy.vstack((mergedcontainer, container2[1000:2500]))
mergedcontainer = numpy.vstack((mergedcontainer, container2[2500:3000]))
```

But if I then do:

```python
mergedcontainer = numpy.vstack((mergedcontainer, container2[3000:3100]))
```

it results in a MemoryError.
I am using Python 3.4.3 (32-bit) and would like to resolve this without switching to 64-bit.
Every time you call np.vstack, NumPy has to allocate space for a brand new array. So if 1 row requires 1 unit of memory, np.vstack([container1, container2]) requires an additional 900 + 5000 units of memory. Moreover, before the assignment occurs, Python needs to hold space for the old mergedcontainer (if it exists) as well as space for the new mergedcontainer. So building mergedcontainer iteratively with slices actually requires more memory than building it with a single call to np.vstack.
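To make the accounting concrete, here is a small sketch that replays the steps with the same convention (1 unit per row) and reproduces the totals tabulated next. The function names are hypothetical, and the byte figures assume 4000 float64 columns per row (float64 is np.empty's default dtype); the question does not actually state the dtype.

```python
# Sketch of the memory bookkeeping: 1 unit = 1 row.
# Assumption: 4000 float64 columns per row (8 bytes each); adjust for your dtype.
ROW_BYTES = 4000 * 8


def totals_iterative(n1, n2, slice_lengths):
    """Units alive after each np.vstack call that grows mergedcontainer.

    container1 (n1 rows) and container2 (n2 rows) stay alive throughout, and
    the old mergedcontainer ("temp") is still referenced while the new one
    is being built.
    """
    totals = []
    merged = 0  # rows in the current mergedcontainer; 0 means it doesn't exist yet
    for k in slice_lengths:
        top = merged if merged else n1  # the first call stacks onto container1
        new_merged = top + k
        totals.append(new_merged + n1 + n2 + merged)  # old merged counts as temp
        merged = new_merged
    return totals


def total_single(n1, n2):
    """Units alive after a single np.vstack((container1, container2)) call."""
    return (n1 + n2) + n1 + n2


iterative = totals_iterative(900, 5000, [1000, 1500, 500])
single = total_single(900, 5000)
print(iterative, single)  # [7800, 11200, 13200] 11800
print(f"{max(iterative) * ROW_BYTES / 1e6:.1f} MB vs {single * ROW_BYTES / 1e6:.1f} MB")
# 422.4 MB vs 377.6 MB
```

Even with only 3900 of the 5900 rows merged, the iterative approach already needs more memory than the single call, which is why slicing alone cannot avoid the MemoryError.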
Building iteratively:
| total | mergedcontainer | container1 | container2 | temp | statement                                                             |
|-------+-----------------+------------+------------+------+-----------------------------------------------------------------------|
|  7800 |            1900 |        900 |       5000 |    0 | mergedcontainer = np.vstack((container1, container2[:1000]))          |
| 11200 |            3400 |        900 |       5000 | 1900 | mergedcontainer = np.vstack((mergedcontainer, container2[1000:2500])) |
| 13200 |            3900 |        900 |       5000 | 3400 | mergedcontainer = np.vstack((mergedcontainer, container2[2500:3000])) |

Building with a single call to np.vstack:
| total | mergedcontainer | container1 | container2 | temp | statement                                             |
|-------+-----------------+------------+------------+------+-------------------------------------------------------|
| 11800 |            5900 |        900 |       5000 |    0 | mergedcontainer = np.vstack((container1, container2)) |

We can do better, however. Instead of calling np.vstack repeatedly, allocate all the space needed once at the beginning, and write the contents of both container1 and container2 into it. In other words, avoid allocating two disparate arrays container1 and container2 if you know you want to merge them.
```python
container = np.empty((5900, 4000))
```

Note that basic slices such as container[:900] return views, and views require essentially no additional memory. So you could define container1 and container2 like this:
```python
container1 = container[:900]
container2 = container[900:]
```

and then assign values to them in place, which modifies container:
```python
container1[:] = ...
container2[:] = ...
```

Thus your memory requirement will stay around 5900 units.
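If you want to confirm that the slices really are views rather than copies, a quick check along these lines should do it. This is a sketch: it relies on np.empty's default float64 dtype, and rows_copy is just an illustrative name. It also shows that advanced (integer-array) indexing, unlike basic slicing, produces a copy, so it would not give the same memory savings.

```python
import numpy as np

# Same layout as above; np.empty defaults to float64.
container = np.empty((5900, 4000))
container1 = container[:900]   # basic slice -> view into container
container2 = container[900:]   # basic slice -> view into container

# A view does not own its data; it refers back to container's buffer.
print(container1.base is container)             # True
print(container1.flags.owndata)                 # False
print(np.shares_memory(container2, container))  # True

# In-place assignment through a view changes container itself.
container1[:] = 0.0
print(container[:900].sum())                    # 0.0

# Advanced (integer-array) indexing makes a copy with its own memory,
# so it would NOT avoid the extra allocation.
rows_copy = container[np.arange(900)]
print(np.shares_memory(rows_copy, container))   # False
```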
For example,

```python
import numpy as np

np.random.seed(2015)
container = np.empty((5, 4), dtype='int')
container1 = container[:2]
container2 = container[2:]
container1[:] = np.random.randint(10, size=(2, 4))
container2[:] = np.random.randint(1000, size=(3, 4))
print(container)
```

yields
```
[[  2   2   9   6]
 [  8   5   7   8]
 [112  70 487 124]
 [859   8 275 936]
 [317 134 393 909]]
```

while requiring space for only one array of shape (5, 4), plus temporarily used space for the random arrays.
Thus, you wouldn't have to change much in your code to save memory. Just set it up with

```python
container = np.empty((5900, 4000))
container1 = container[:900]
container2 = container[900:]
```

and then use

```python
container1[:] = ...
```

instead of

```python
container1 = ...
```

to assign values in place. (Or, of course, you could write directly into container.)
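Writing directly into container also works when the data arrives in pieces. The sketch below fills the preallocated array chunk by chunk, so apart from container itself only one chunk (here 500 × 4000 float64 values, about 16 MB) is alive at a time. produce_rows is a hypothetical placeholder for however your rows are actually produced or loaded, and the chunk size is an arbitrary choice.

```python
import numpy as np

N1, N2, NCOLS = 900, 5000, 4000
CHUNK = 500  # rows per step; arbitrary, trade speed against peak memory


def produce_rows(part, start, stop):
    """HYPOTHETICAL data source: replace with your own loader.

    Returns rows start..stop-1 of part 1 (container1) or part 2 (container2)
    as a float64 array; here it just fills them with the part number.
    """
    return np.full((stop - start, NCOLS), float(part))


container = np.empty((N1 + N2, NCOLS))  # the only large allocation

# Fill container1's region, then container2's, one chunk at a time, in place.
for part, offset, nrows in [(1, 0, N1), (2, N1, N2)]:
    for start in range(0, nrows, CHUNK):
        stop = min(start + CHUNK, nrows)
        container[offset + start:offset + stop] = produce_rows(part, start, stop)

print(container[0, 0], container[-1, -1])  # 1.0 2.0
```

Each temporary chunk is released once its assignment statement finishes, so the peak stays close to the 5900 units of container plus one chunk.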