
numpy - Creating large ndarray from multiple mem-mapped arrays

I have multiple large images stored in binary (FITS) files on disk. Each array has the same shape and dtype.

I need to read in N of these images, but wish to preserve memory-mapping, as together they would swamp RAM. The easiest way to do this is, of course, to read them in as elements of a list. Ideally, however, I would like to treat them as a single numpy array (of shape [n, ny, nx]), e.g. for easy transposition.

Is this possible without reading them into RAM?
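
For concreteness, the naive list approach I have in mind looks roughly like this (using astropy for the FITS I/O as an example; my_list_of_files and the HDU index are placeholders):

from astropy.io import fits

# Keep the HDU lists open so the memory-mapped data stays valid
hduls = [fits.open(f, memmap=True) for f in my_list_of_files]
images = [h[0].data for h in hduls]  # each element is a lazily-loaded 2D array

# np.stack(images) would give the [n, ny, nx] array I want,
# but it copies everything into RAM, which is what I'm trying to avoid.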

Note: in practice, what I need is more complicated, equivalent to reading in a list of lists (e.g. an M-element list whose elements are themselves N-element lists of ndarray images), but an answer to the simple case above should hopefully be sufficient.

Thanks for any help.

question from:https://stackoverflow.com/questions/65832848/creating-large-ndarray-from-multiple-mem-mapped-arrays


1 Answer


You can either build an abstraction that presents an array-like interface over multiple files, or you can consolidate your data. The former is going to be fairly complex and probably not worth your time.

Consolidating the data, e.g. in a temporary file, is a much simpler option, which I've implemented here with the assumption that you are using astropy for your FITS I/O. You can tailor it for other libraries or other use-cases as you see fit.

from tempfile import TemporaryFile

import numpy as np
from astropy.io import fits

# my_list_of_files is your list of FITS filenames
n = 0
with TemporaryFile() as output:
    for filename in my_list_of_files:
        with fits.open(filename) as hdus:
            # If you have a single HDU that you know how to reference, get rid of the loop
            for hdu in hdus:
                if isinstance(hdu, fits.ImageHDU):
                    data = hdu.data.T
                    if n == 0:
                        # Take the first image as the reference shape and dtype
                        shape = data.shape
                        dtype = data.dtype
                    elif data.shape != shape or data.dtype != dtype:
                        # Skip anything that doesn't match the first image
                        continue
                    data.tofile(output)
                    n += 1

Now you have a single binary flatfile with all your data in row-major order, and all the metadata you need to use numpy's memmap:

    array = np.memmap(output, dtype, shape=(n,) + shape)

Do all your work inside the outer with block, since output will be deleted on close in this implementation.
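
A brief usage sketch, still inside that same with block (the reshape for the question's list-of-lists case is an assumption about the order in which the files were consolidated):

# Still inside `with TemporaryFile() as output:`, after `array` has been created:
swapped = array.transpose(0, 2, 1)   # a view; nothing is loaded or copied
one_image = array[3]                 # a single image, paged in from disk on access

# For the list-of-lists case (M groups of N images each), the same trick extends
# by reshaping the leading axis, assuming the files were written group by group
# and m divides n:
# grouped = array.reshape((m, n // m) + shape)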

