NetCDF I/O Handling in Iris
This document provides a basic account of how Iris loads and saves NetCDF files.
Under Construction
This document is still a work in progress, so it may include blank or unfinished sections. Watch this space!
Chunk Control
Default Chunking
By default, Iris optimises chunks on load, automatically choosing a chunksize for your data without any user input. The chunksize is calculated from a number of factors, including:
File Variable Chunking
Full Variable Shape
Dask Default Chunksize
Dimension Order: Earlier (outer) dimensions are split in preference to later (inner) dimensions.
>>> import iris
>>> cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.shape)
(240, 37, 49)
>>> print(cube.core_data().chunksize)
(60, 37, 49)
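The effect of the dimension-order rule can be illustrated with a simplified sketch in plain Python. This is not Iris's implementation: it ignores file chunking, and the hypothetical `optimise_chunks` function, the 4-byte item size and the 480,000-byte limit are arbitrary assumptions standing in for the variable's dtype and Dask's configured `array.chunk-size`. Starting from the full shape, it splits the outermost dimension into roughly equal pieces until one chunk fits the limit, and only moves inward if a single outer index is still too large.

```python
import math


def optimise_chunks(shape, itemsize, limit_bytes):
    """Hypothetical sketch: split dimensions, outermost first, until a chunk fits.

    Not Iris's actual algorithm; file chunking is ignored here.
    """
    chunks = list(shape)
    for dim, extent in enumerate(shape):
        # Bytes per index step along this dimension, given the current
        # chunk sizes of every other dimension.
        step_bytes = math.prod(chunks[:dim] + chunks[dim + 1:]) * itemsize
        if step_bytes * chunks[dim] <= limit_bytes:
            break  # the chunk already fits: leave inner dimensions whole
        max_here = max(1, limit_bytes // step_bytes)
        n_chunks = math.ceil(extent / max_here)
        # Spread the extent as evenly as possible across the chunks.
        chunks[dim] = math.ceil(extent / n_chunks)
    return tuple(chunks)


# Assumed values: 4-byte items and an arbitrary 480,000-byte limit.
print(optimise_chunks((240, 37, 49), itemsize=4, limit_bytes=480_000))
# (60, 37, 49)
```

Under these assumed numbers only the outer dimension is split; with a much larger limit the array would stay in a single chunk.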
For more user control, PR #5588 introduced the
iris.fileformats.netcdf.loader.CHUNK_CONTROL object.
Custom Chunking: Set
CHUNK_CONTROL provides three context managers. The most basic is
set(), which lets you specify the chunksize for each dimension and,
optionally, the var_name of the variable the chunksizes apply to.
Using -1 in place of a chunksize keeps that dimension's chunksize equal
to its full length, i.e. no optimisation occurs on that dimension.
>>> from iris.fileformats.netcdf.loader import CHUNK_CONTROL
>>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(180, 37, 25)
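Because set() is a context manager, its settings only affect loads made inside the with block. The sketch below illustrates that scoping pattern using contextlib and contextvars. The ChunkControlSketch class and its requested() method are hypothetical stand-ins, not Iris's implementation, which may store its settings differently.

```python
import contextlib
import contextvars

# Hypothetical stand-in for CHUNK_CONTROL's state: not Iris's implementation.
_settings = contextvars.ContextVar("chunk_settings", default=None)


class ChunkControlSketch:
    @contextlib.contextmanager
    def set(self, var_names=None, **dimension_chunksizes):
        """Apply per-dimension chunksizes only inside the ``with`` block."""
        if isinstance(var_names, str):
            var_names = [var_names]
        token = _settings.set((var_names, dimension_chunksizes))
        try:
            yield
        finally:
            # Restore whatever settings were active before this block.
            _settings.reset(token)

    def requested(self, var_name):
        """Return the chunksizes requested for ``var_name`` (empty if none)."""
        active = _settings.get()
        if active is None:
            return {}
        var_names, dims = active
        # No var_names given means the settings apply to every variable.
        if var_names is None or var_name in var_names:
            return dict(dims)
        return {}


control = ChunkControlSketch()
with control.set("air_temperature", time=180, latitude=-1, longitude=25):
    print(control.requested("air_temperature"))
print(control.requested("air_temperature"))
```

Resetting the context variable in the finally clause means nested blocks unwind correctly, even if loading raises an exception.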
Note that var_name is optional, and that you don’t need to specify every dimension. If you
specify only some dimensions, the rest are optimised using Iris’ default behaviour.
>>> with CHUNK_CONTROL.set(longitude=25):
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 25)
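One way a partial specification might combine with default optimisation is sketched below. This is a hypothetical illustration, not Iris's code: resolve_chunks treats -1 as "keep the whole dimension", keeps any other requested size as given, and splits only the unspecified dimensions, outermost first, until a chunk fits an assumed byte limit. The 4-byte item size and 480,000-byte limit are arbitrary assumptions.

```python
import math


def resolve_chunks(dim_names, shape, requested, itemsize, limit_bytes):
    """Hypothetical sketch: merge requested chunksizes with a simple default.

    ``requested`` maps dimension names to sizes; -1 means "whole extent".
    Not Iris's actual algorithm.
    """
    chunks = []
    free = []  # indices of dimensions the user did not specify
    for i, (name, extent) in enumerate(zip(dim_names, shape)):
        size = requested.get(name)
        if size is None:
            chunks.append(extent)  # start unsplit; may be reduced below
            free.append(i)
        elif size == -1:
            chunks.append(extent)  # -1: keep the full dimension length
        else:
            chunks.append(min(size, extent))
    # Split only the unspecified dimensions, outermost first, until a chunk fits.
    for i in free:
        step_bytes = math.prod(chunks[:i] + chunks[i + 1:]) * itemsize
        if step_bytes * chunks[i] <= limit_bytes:
            break
        n_chunks = math.ceil(shape[i] / max(1, limit_bytes // step_bytes))
        chunks[i] = math.ceil(shape[i] / n_chunks)
    return tuple(chunks)


dims = ("time", "latitude", "longitude")
print(resolve_chunks(dims, (240, 37, 49), {"longitude": 25}, 4, 480_000))
# (120, 37, 25)
```

In this sketch, fixing only longitude=25 makes each time step smaller, so more time steps fit into one chunk than the (60, 37, 49) produced when nothing is requested.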
Custom Chunking: From File
The second context manager, from_file(), takes its chunksizes from the chunking
defined in the NetCDF file. Any dimensions without specified chunks default
to Iris’ optimisation.
>>> with CHUNK_CONTROL.from_file():
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 49)
Custom Chunking: As Dask
The final context manager, as_dask(), bypasses Iris’ optimisation altogether
and takes its chunksizes from Dask’s own behaviour.
>>> with CHUNK_CONTROL.as_dask():
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(70, 37, 49)
Split Attributes
TBC
Deferred Saving
TBC
Guess Axis
TBC