python - How to merge two large numpy arrays if slicing doesn't resolve memory error? -


I have two numpy arrays, container1 and container2, with container1.shape = (900, 4000) and container2.shape = (5000, 4000). Merging them using vstack results in a MemoryError. After searching through old questions posted here, I tried to merge them using slicing, like this:

mergedcontainer = numpy.vstack((container1, container2[:1000]))
mergedcontainer = numpy.vstack((mergedcontainer, container2[1000:2500]))
mergedcontainer = numpy.vstack((mergedcontainer, container2[2500:3000]))

But after this, if I do:

mergedcontainer = numpy.vstack((mergedcontainer, container2[3000:3100]))

it results in a MemoryError.

I am using Python 3.4.3 (32-bit), and would like to resolve this without shifting to 64-bit.

Every time you call np.vstack, numpy has to allocate space for a brand new array. So if we say 1 row requires 1 unit of memory, then

np.vstack([container1, container2])

requires an additional 900+5000 units of memory. Moreover, before the assignment occurs, Python needs to hold space for the old mergedcontainer (if it exists) as well as space for the new mergedcontainer. So building mergedcontainer iteratively with slices actually requires more memory than trying to build it with a single call to np.vstack.

Building iteratively:

| total | mergedcontainer | container1 | container2 | temp | statement                                                             |
|-------+-----------------+------------+------------+------+-----------------------------------------------------------------------|
|  7800 |            1900 |        900 |       5000 |    0 | mergedcontainer = np.vstack((container1, container2[:1000]))          |
| 11200 |            3400 |        900 |       5000 | 1900 | mergedcontainer = np.vstack((mergedcontainer, container2[1000:2500])) |
| 13200 |            3900 |        900 |       5000 | 3400 | mergedcontainer = np.vstack((mergedcontainer, container2[2500:3000])) |

Building with a single call to np.vstack:

| total | mergedcontainer | container1 | container2 | temp | statement                                             |
|-------+-----------------+------------+------------+------+-------------------------------------------------------|
| 11800 |            5900 |        900 |       5000 |    0 | mergedcontainer = np.vstack((container1, container2)) |
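The totals in both tables are just sums of the row counts that are alive at the same moment. As a rough sketch of that bookkeeping (the helper names iterative_totals and single_call_total are made up for illustration, and "units" are rows, as above):

```python
def iterative_totals(rows1, rows2, cuts):
    """Units of memory alive during each np.vstack call when container2
    is stacked on slice by slice. cuts are the slice boundaries into
    container2, e.g. [0, 1000, 2500, 3000]."""
    totals = []
    merged = rows1  # rows in the first argument of the next vstack call
    old = 0         # rows in the previous mergedcontainer still held in memory
    for start, stop in zip(cuts[:-1], cuts[1:]):
        new = merged + (stop - start)             # rows in the freshly allocated result
        totals.append(new + rows1 + rows2 + old)  # everything alive at once
        old = merged = new                        # the result becomes the next "old" array
    return totals


def single_call_total(rows1, rows2):
    """Units alive for one np.vstack((container1, container2)) call."""
    return (rows1 + rows2) + rows1 + rows2


print(iterative_totals(900, 5000, [0, 1000, 2500, 3000]))  # [7800, 11200, 13200]
print(single_call_total(900, 5000))                        # 11800
```

By the third slice the iterative approach already needs more than the single call would, and each further slice adds to that.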

We can do better, however. Instead of calling np.vstack repeatedly, allocate all the space needed once at the beginning, and write the contents of both container1 and container2 into it. In other words, avoid allocating two disparate arrays container1 and container2 if you know you want to merge them.

container = np.empty((5900, 4000)) 

Note that basic slices such as container[:900] always return views, and views require essentially no additional memory. So you can define container1 and container2 like this:

container1 = container[:900]
container2 = container[900:]

and then assign values in place. This modifies container:

container1[:] = ...
container2[:] = ...

Thus your memory requirement would stay around 5900 units.


For example,

import numpy as np
np.random.seed(2015)

container = np.empty((5, 4), dtype='int')
container1 = container[:2]
container2 = container[2:]
container1[:] = np.random.randint(10, size=(2,4))
container2[:] = np.random.randint(1000, size=(3,4))
print(container)

yields

[[  2   2   9   6]
 [  8   5   7   8]
 [112  70 487 124]
 [859   8 275 936]
 [317 134 393 909]]

while requiring space for just one array of shape (5, 4), plus the temporarily used space for the random arrays.
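If you want to convince yourself that container1 and container2 really are views and not copies, here is a small check on the same toy shapes. It only uses the .base attribute and np.shares_memory, and it also shows what goes wrong when you rebind the name instead of assigning in place:

```python
import numpy as np

container = np.empty((5, 4), dtype='int')
container1 = container[:2]
container2 = container[2:]

# A view does not own its data; its .base is the array it was sliced from.
print(container1.base is container)             # True
print(np.shares_memory(container2, container))  # True

# Assigning in place writes straight through to container.
container1[:] = 7
print(container[0])                             # [7 7 7 7]

# Rebinding the name allocates a separate array and breaks the link.
container1 = np.zeros((2, 4), dtype='int')
print(np.shares_memory(container1, container))  # False
```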

Thus, you wouldn't have to change much in your code to save memory. Just set it up with

container = np.empty((5900, 4000))
container1 = container[:900]
container2 = container[900:]

and use

container1[:] = ... 

instead of

container1 = ... 

to assign values in-place. (Or, of course, you could just write directly to container.)
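Writing directly to container could look roughly like the sketch below. It assumes your data arrives in blocks of rows; iter_blocks is a hypothetical stand-in for wherever your rows actually come from, and the shapes are shrunk so it runs quickly:

```python
import numpy as np


def iter_blocks(n_rows, n_cols, block_rows):
    """Hypothetical data source: yields blocks of at most block_rows rows.
    Each block is filled with its starting row index so it is easy to check."""
    for start in range(0, n_rows, block_rows):
        rows = min(block_rows, n_rows - start)
        yield np.full((rows, n_cols), start, dtype=float)


def fill_preallocated(n_rows, n_cols, block_rows):
    """Allocate the merged array once, then copy each block into its slot."""
    container = np.empty((n_rows, n_cols))
    pos = 0
    for block in iter_blocks(n_rows, n_cols, block_rows):
        container[pos:pos + len(block)] = block  # in-place write, no new big array
        pos += len(block)
    return container


merged = fill_preallocated(59, 40, 10)
print(merged.shape)  # (59, 40)
```

At any moment only the merged array and one block are alive, so the extra memory is bounded by the block size rather than by the size of the result.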

