NetCDF I/O Handling in Iris#

This document provides a basic account of how Iris loads and saves NetCDF files.

Under Construction

This document is still a work in progress, so it might include blank or unfinished sections. Watch this space!

Chunk Control#

Default Chunking#

By default, Iris optimises chunks on load, automatically choosing a suitable chunksize for your data without any user input. The chunksize is calculated from a number of factors, including:

File Variable Chunking
Full Variable Shape
Dask Default Chunksize
Dimension Order: Earlier (outer) dimensions are split in preference to later (inner) dimensions.

>>> import iris
>>> cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.shape)
(240, 37, 49)
>>> print(cube.core_data().chunksize)
(60, 37, 49)
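To make the dimension-order factor concrete, the sketch below shrinks a chunk shape from the outermost dimension inwards until it fits under a byte limit. It is a simplified illustration only, not Iris’ actual algorithm: the function name `split_outer_first`, the `limit_bytes` parameter and the limits used are assumptions made for this example, and the file variable chunking is ignored, so it is not expected to reproduce the chunksize shown above.

```python
import math


def split_outer_first(shape, itemsize, limit_bytes):
    """Hypothetical helper: shrink a chunk shape, outermost dimension first."""
    chunks = list(shape)
    for axis in range(len(chunks)):
        # Bytes taken by one "slice" of the dimensions inside this axis.
        inner_bytes = itemsize * math.prod(chunks[axis + 1:])
        # Largest length on this axis that still fits the limit (at least 1).
        fit = max(1, limit_bytes // inner_bytes)
        chunks[axis] = min(chunks[axis], fit)
        if itemsize * math.prod(chunks) <= limit_bytes:
            break  # small enough: inner dimensions stay whole
    return tuple(chunks)


# Same shape as the cube above, assuming 4-byte values.
moderate = split_outer_first((240, 37, 49), itemsize=4, limit_bytes=1_000_000)
tight = split_outer_first((240, 37, 49), itemsize=4, limit_bytes=5_000)
print(moderate)  # (137, 37, 49)
print(tight)     # (1, 25, 49)
```

With the larger limit only the time dimension is split. The much smaller limit reduces time to a single step and then carries the split on into latitude, while longitude, the innermost dimension, stays whole in both cases.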

For more user control, PR #5588 added the iris.fileformats.netcdf.loader.CHUNK_CONTROL object.

Custom Chunking: Set#

There are three context managers within CHUNK_CONTROL. The most basic is set(), which lets you specify the chunksize for each dimension and, optionally, the var_name of the variable you want to change.

Passing -1 in place of a chunksize keeps the chunksize equal to the full length of that dimension, i.e. no optimisation occurs on that dimension.

>>> from iris.fileformats.netcdf.loader import CHUNK_CONTROL
>>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(180, 37, 25)

Note that var_name is optional, and that you don’t need to specify every dimension. If you specify only one dimension, the rest will be optimised using Iris’ default behaviour.

>>> with CHUNK_CONTROL.set(longitude=25):
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 25)
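How the keyword arguments in the two examples above map onto a chunk shape can be sketched in plain Python. This is an illustration under stated assumptions, not Iris code: `apply_requested` is a hypothetical helper, and `None` is used here only as a marker for "left to Iris’ default optimisation".

```python
def apply_requested(shape, dim_names, **requested):
    """Hypothetical helper: resolve requested chunksizes against a shape."""
    resolved = []
    for name, length in zip(dim_names, shape):
        size = requested.get(name)
        if size is None:
            resolved.append(None)    # not requested: left to default optimisation
        elif size == -1:
            resolved.append(length)  # -1: keep the whole dimension in one chunk
        else:
            resolved.append(size)    # explicit chunksize
    return tuple(resolved)


shape = (240, 37, 49)
dims = ("time", "latitude", "longitude")

full = apply_requested(shape, dims, time=180, latitude=-1, longitude=25)
partial = apply_requested(shape, dims, longitude=25)
print(full)     # (180, 37, 25)
print(partial)  # (None, None, 25)
```

The first result matches the chunksize printed for the fully specified example. In the second, only longitude is fixed, and the `None` entries stand for the dimensions that Iris would go on to optimise itself.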

Custom Chunking: From File#

The second context manager is from_file(), which takes its chunksizes from those defined in the NetCDF file. Any dimension without chunks specified in the file falls back to Iris’ default optimisation.

>>> with CHUNK_CONTROL.from_file():
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 49)

Custom Chunking: As Dask#

The final context manager, as_dask(), bypasses Iris’ optimisation altogether and takes its chunksizes from Dask’s behaviour.

>>> with CHUNK_CONTROL.as_dask():
...     cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(70, 37, 49)
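The three context managers share one pattern: they change the loader’s chunking settings on entry and restore the previous settings on exit, so loads outside the `with` block are unaffected. The sketch below shows that pattern in isolation. It is a hypothetical stand-in, not the Iris implementation: the names `ChunkControlSketch`, `Mode` and `_temporarily`, and the `"*"` key meaning "all variables", are assumptions made for this example, and the class only records settings rather than loading any data.

```python
from contextlib import contextmanager
from enum import Enum, auto


class Mode(Enum):
    DEFAULT = auto()
    FROM_FILE = auto()
    AS_DASK = auto()


class ChunkControlSketch:
    """Hypothetical stand-in, NOT the Iris class: it only records settings."""

    def __init__(self):
        self.mode = Mode.DEFAULT
        self.var_dim_chunksizes = {}

    @contextmanager
    def _temporarily(self, mode, chunksizes):
        # Remember the current state so it can be restored on exit.
        old = (self.mode, self.var_dim_chunksizes)
        self.mode, self.var_dim_chunksizes = mode, chunksizes
        try:
            yield
        finally:
            self.mode, self.var_dim_chunksizes = old

    def set(self, var_name=None, **dimension_chunksizes):
        # "*" is this sketch's own key for "apply to all variables".
        key = var_name or "*"
        return self._temporarily(Mode.DEFAULT, {key: dimension_chunksizes})

    def from_file(self):
        return self._temporarily(Mode.FROM_FILE, {})

    def as_dask(self):
        return self._temporarily(Mode.AS_DASK, {})


control = ChunkControlSketch()
with control.set("air_temperature", time=180, latitude=-1, longitude=25):
    inside = (control.mode, control.var_dim_chunksizes)
after = (control.mode, control.var_dim_chunksizes)
```

Here `inside` holds the requested per-variable chunksizes, while `after` shows the defaults again, which is why each example above performs its load inside the `with` block.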

Split Attributes#

TBC

Deferred Saving#

TBC

Guess Axis#

TBC