ch5mpy.Dataset

class ch5mpy.Dataset(bind, *, readonly=False)[source]

A subclass of h5py.Dataset that implements pickling.

Create a new Dataset object by binding to a low-level DatasetID.

Attributes

`Dataset.attrs`	Attributes attached to this object
`Dataset.chunks`	Dataset chunks (or None)
`Dataset.compression`	Compression strategy (or None)
`Dataset.compression_opts`	Compression setting.
`Dataset.dims`	Access dimension scales attached to this dataset.
`Dataset.dtype`	Numpy dtype representing the datatype
`Dataset.external`	External file settings.
`Dataset.file`	Return a File instance associated with this object
`Dataset.fillvalue`	Fill value for this dataset (0 by default)
`Dataset.fletcher32`	Fletcher32 filter is present (T/F)
`Dataset.id`	Low-level identifier appropriate for this object
`Dataset.is_scale`	Return `True` if this dataset is also a dimension scale.
`Dataset.is_virtual`	Check if this is a virtual dataset
`Dataset.maxshape`	Shape up to which this dataset can be resized.
`Dataset.name`	Return the full name of this object.
`Dataset.nbytes`	Numpy-style attribute giving the raw dataset size as the number of bytes
`Dataset.ndim`	Numpy-style attribute giving the number of dimensions
`Dataset.parent`	Return the parent group of this object.
`Dataset.ref`	An (opaque) HDF5 reference to this object
`Dataset.regionref`	Create a region reference (Datasets only).
`Dataset.scaleoffset`	Scale/offset filter settings.
`Dataset.shape`	Numpy-style shape tuple giving dataset dimensions
`Dataset.shuffle`	Shuffle filter present (T/F)
`Dataset.size`	Numpy-style attribute giving the total dataset size

Methods

`Dataset.__init__`	Create a new Dataset object by binding to a low-level DatasetID.
`Dataset.asstr`	Get a wrapper to read string data as Python strings.
`Dataset.astype`	Get a wrapper allowing you to perform reads to a different destination type.
`Dataset.fields`	Get a wrapper to read a subset of fields from a compound data type.
`Dataset.flush`	Flush the dataset data and metadata to the file.
`Dataset.iter_chunks`	Return chunk iterator.
`Dataset.len`	The size of the first axis.
`Dataset.make_scale`	Make this dataset an HDF5 dimension scale.
`Dataset.maptype`	Return type: `AsObjectWrapper`[`Any`]
`Dataset.read_direct`	Read data directly from HDF5 into an existing NumPy array.
`Dataset.refresh`	Refresh the dataset metadata by reloading from the file.
`Dataset.resize`	Resize the dataset, or the specified axis.
`Dataset.virtual_sources`	Get a list of the data mappings for a virtual dataset
`Dataset.write_direct`	Write data directly to HDF5 from a NumPy array.

Attributes

Dataset.attrs

Attributes attached to this object

Dataset.chunks

Dataset chunks (or None)

Dataset.compression

Compression strategy (or None)

Dataset.compression_opts

Compression setting. Int(0-9) for gzip, 2-tuple for szip.
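As an illustration of how chunks, compression and compression_opts relate, the sketch below creates a gzip-compressed dataset with plain h5py and reads the three attributes back. It assumes ch5mpy.Dataset exposes the inherited h5py properties unchanged; the file and dataset names are made up for the example.

```python
import h5py
import numpy as np

# In-memory file (core driver, nothing is written to disk); the name is arbitrary.
with h5py.File("filters_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "measurements",
        data=np.zeros((100, 100), dtype="f4"),
        chunks=(10, 10),
        compression="gzip",
        compression_opts=4,  # gzip level, an int in the range 0-9
    )
    chunk_shape = dset.chunks        # chunk shape chosen at creation
    strategy = dset.compression      # name of the compression filter
    level = dset.compression_opts    # setting passed to that filter
```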

Dataset.dims

Access dimension scales attached to this dataset.

Dataset.dtype

Numpy dtype representing the datatype

Dataset.external

External file settings. Returns a list of tuples of (name, offset, size) for each external file entry, or returns None if no external files are used.

Dataset.file

Return a File instance associated with this object

Dataset.fillvalue

Fill value for this dataset (0 by default)

Dataset.fletcher32

Fletcher32 filter is present (T/F)

Dataset.id

Low-level identifier appropriate for this object

Dataset.is_scale

Return True if this dataset is also a dimension scale.

Return False otherwise.

Dataset.is_virtual

Check if this is a virtual dataset

Dataset.maxshape

Shape up to which this dataset can be resized. Axes with value None have no resize limit.

Dataset.name

Return the full name of this object. None if anonymous.

Dataset.nbytes

Numpy-style attribute giving the raw dataset size as the number of bytes

Dataset.ndim

Numpy-style attribute giving the number of dimensions

Dataset.parent

Return the parent group of this object.

This is always equivalent to obj.file[posixpath.dirname(obj.name)]. Raises ValueError if this object is anonymous.
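The path arithmetic behind that equivalence can be checked with the standard library alone. The object names below are hypothetical stand-ins for what Dataset.name would return.

```python
import posixpath

# Hypothetical full names, in the form Dataset.name reports them.
names = ["/experiment/run1/signal", "/signal"]

# The parent group lives at the directory part of the object's name.
parents = {name: posixpath.dirname(name) for name in names}

for name, parent_path in parents.items():
    print(name, "->", parent_path)
```

A dataset directly under the root group therefore has "/" as its parent path.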

Dataset.ref

An (opaque) HDF5 reference to this object

Dataset.regionref

Create a region reference (Datasets only).

The syntax is regionref[<slices>]. For example, dset.regionref[...] creates a region reference in which the whole dataset is selected.

Can also be used to determine the shape of the referenced dataset (via .shape property), or the shape of the selection (via the .selection property).
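A minimal sketch of the region reference workflow, written against plain h5py and assuming ch5mpy.Dataset inherits it unchanged. In h5py, shape and selection are called as methods that take the reference as their argument. The file and dataset names are made up.

```python
import h5py
import numpy as np

with h5py.File("regionref_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", data=np.arange(100).reshape(10, 10))

    # Reference to the first two rows of the dataset.
    ref = dset.regionref[0:2, :]

    full_shape = dset.regionref.shape(ref)          # shape of the referenced dataset
    selected_shape = dset.regionref.selection(ref)  # shape of the selected region

    # Indexing the dataset with the reference reads only the selected region.
    rows = dset[ref]
```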

Dataset.scaleoffset

Scale/offset filter settings. For integer data types, this is the number of bits stored, or 0 for auto-detected. For floating point data types, this is the number of decimal places retained. If the scale/offset filter is not in use, this is None.

Dataset.shape

Numpy-style shape tuple giving dataset dimensions

Dataset.shuffle

Shuffle filter present (T/F)

Dataset.size

Numpy-style attribute giving the total dataset size

Methods

Dataset.__init__(bind, *, readonly=False)[source]

Create a new Dataset object by binding to a low-level DatasetID.

Dataset.asstr(encoding=None, errors='strict')[source]

Get a wrapper to read string data as Python strings.

The parameters have the same meaning as in bytes.decode(). If encoding is unspecified, it will use the encoding in the HDF5 datatype (either ascii or utf-8).

Return type:

AsStrWrapper

Parameters:

encoding (Literal['ascii', 'utf-8'] | None) –

errors (Literal['backslashreplace', 'ignore', 'namereplace', 'strict', 'replace', 'xmlcharrefreplace']) –
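As a sketch of typical use, the example below stores variable-length UTF-8 strings with plain h5py (3.x assumed) and reads them back as Python str objects through asstr(). It assumes ch5mpy.Dataset behaves like its h5py parent here; the names are illustrative only.

```python
import h5py
import numpy as np

with h5py.File("asstr_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "labels",
        data=np.array(["alpha", "café"], dtype=object),
        dtype=h5py.string_dtype(),  # variable-length strings, UTF-8 by default
    )

    # No encoding given: the encoding recorded in the HDF5 datatype is used.
    labels = list(dset.asstr()[:])
    first = dset.asstr()[0]
```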
Dataset.astype(dtype)[source]

Get a wrapper allowing you to perform reads to a different destination type, e.g.:

>>> double_precision = dataset.astype('f8')[0:100:2]

Parameters:

dtype (dtype[Any] | None | type[Any] | _SupportsDType[dtype[Any]] | str | tuple[Any, int] | tuple[Any, Union[SupportsIndex, collections.abc.Sequence[SupportsIndex]]] | list[Any] | _DTypeDict | tuple[Any, Any]) –

Return type:

AsDtypeWrapper[generic]
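A self-contained variant of the read shown above, as a sketch using plain h5py (3.x assumed) and an in-memory file. The dataset is stored as 32-bit integers and converted to double precision at read time; the stored data is left unchanged.

```python
import h5py
import numpy as np

with h5py.File("astype_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("counts", data=np.arange(10, dtype="i4"))

    stored_dtype = dset.dtype                       # int32 on disk
    double_precision = dset.astype("f8")[0:10:2]    # float64 in memory
```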
Dataset.fields(names, *, _prior_dtype=None)[source]

Get a wrapper to read a subset of fields from a compound data type:

>>> coords_2d = dataset.fields(['x', 'y'])[:]

If names is a string, a single field is extracted, and the resulting arrays will have that dtype. Otherwise, it should be an iterable, and the read data will have a compound dtype.
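The difference between passing a string and passing a list can be sketched as follows, using plain h5py (3.x assumed) and a made-up compound dtype with fields x, y and z.

```python
import h5py
import numpy as np

point_dtype = np.dtype([("x", "f4"), ("y", "f4"), ("z", "f4")])
points = np.zeros(3, dtype=point_dtype)
points["x"] = [1, 2, 3]
points["y"] = [4, 5, 6]
points["z"] = [7, 8, 9]

with h5py.File("fields_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("points", data=points)

    # List of names: the result keeps a compound dtype with only those fields.
    xy = dset.fields(["x", "y"])[:]

    # Single name: the result has the plain dtype of that field.
    x_only = dset.fields("x")[:]
```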
Dataset.flush()[source]

Flush the dataset data and metadata to the file. If the dataset is chunked, raw data chunks are written to the file.

This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.

Dataset.iter_chunks(sel=None)[source]

Return chunk iterator. If set, the sel argument is a slice or tuple of slices that defines the region to be used. If not set, the entire dataspace will be used for the iterator.

For each chunk within the given region, the iterator yields a tuple of slices that gives the intersection of the given chunk with the selection area.

A TypeError will be raised if the dataset is not chunked.

A ValueError will be raised if the selection region is invalid.
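A short sketch of chunk-wise processing with plain h5py (3.x assumed): each yielded tuple of slices can be used directly to index the dataset. The dataset layout and the selection are arbitrary choices for the example.

```python
import h5py
import numpy as np

data = np.arange(10_000).reshape(100, 100)

with h5py.File("chunks_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", data=data, chunks=(50, 50))

    # No selection: iterate over every chunk of the 100 x 100 dataspace.
    all_chunks = list(dset.iter_chunks())

    # Each item is a tuple of slices, so it can index the dataset directly.
    chunkwise_total = sum(int(dset[chunk].sum()) for chunk in all_chunks)

    # With a selection, only chunks intersecting rows 0-59, columns 0-9 are visited.
    partial_chunks = list(dset.iter_chunks(np.s_[0:60, 0:10]))
```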

Dataset.len()[source]

The size of the first axis. Raises TypeError if the dataset is scalar.

Use of this method is preferred to len(dset), as Python’s built-in len() cannot handle values greater than 2**32 on 32-bit systems.

Dataset.make_scale(name='')[source]

Make this dataset an HDF5 dimension scale.

You can then attach it to dimensions of other datasets like this:

    other_ds.dims[0].attach_scale(ds)

You can optionally pass a name to associate with this scale.
Dataset.maptype(otype)[source]

Return type:

AsObjectWrapper[Any]

Parameters:

otype (type[Any]) –

Dataset.read_direct(dest, source_sel=None, dest_sel=None)[source]

Read data directly from HDF5 into an existing NumPy array.

The destination array must be C-contiguous and writable. Selections must be the output of numpy.s_[<args>].

Broadcasting is supported for simple indexing.
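As a sketch with plain h5py, the example below copies the first two rows of a dataset into rows 3 and 4 of a preallocated array, without creating an intermediate array. Shapes and names are arbitrary.

```python
import h5py
import numpy as np

data = np.arange(100, dtype="i8").reshape(10, 10)

with h5py.File("read_direct_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("source", data=data)

    # Destination must be C-contiguous and writable; np.zeros satisfies both.
    out = np.zeros((10, 10), dtype="i8")

    # Selections are built with numpy.s_: dataset rows 0-1 go to out rows 3-4.
    dset.read_direct(out, source_sel=np.s_[0:2, :], dest_sel=np.s_[3:5, :])
```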

Dataset.refresh()[source]

Refresh the dataset metadata by reloading from the file.

This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.

Dataset.resize(size, axis=None)[source]

Resize the dataset, or the specified axis.

The dataset must be stored in chunked format; it can be resized up to the “maximum shape” (keyword maxshape) specified at creation time. The rank of the dataset cannot be changed.

“Size” should be a shape tuple, or if an axis is specified, an integer.

BEWARE: This functions differently than the NumPy resize() method! The data is not “reshuffled” to fit in the new shape; each axis is grown or shrunk independently. The coordinates of existing data are fixed.
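The difference from NumPy can be sketched with a 2 x 3 dataset resized to 3 x 2, using plain h5py and assuming the default fill value of 0. Existing values keep their coordinates, the dropped column is lost, and the new row is filled; numpy.resize instead reflows the flattened data.

```python
import h5py
import numpy as np

original = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

with h5py.File("resize_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "values", data=original, chunks=(2, 3), maxshape=(None, None)
    )

    # Shape tuple form: shrink axis 1 to 2 columns, grow axis 0 to 3 rows.
    dset.resize((3, 2))
    resized = dset[:]

    # Integer form with an explicit axis: grow axis 0 to 4 rows.
    dset.resize(4, axis=0)
    final_shape = dset.shape

# NumPy reshuffles the flattened data into the new shape instead.
reshuffled = np.resize(original, (3, 2))
```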

Dataset.virtual_sources()[source]

Get a list of the data mappings for a virtual dataset

Dataset.write_direct(source, source_sel=None, dest_sel=None)[source]

Write data directly to HDF5 from a NumPy array.

The source array must be C-contiguous. Selections must be the output of numpy.s_[<args>].

Broadcasting is supported for simple indexing.

Return type:

None

Parameters:

source (ndarray[Any, dtype[Any]]) –

source_sel (tuple[Union[int, slice, Collection[int]], ...] | None) –

dest_sel (tuple[Union[int, slice, Collection[int]], ...] | None) –
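A minimal sketch with plain h5py, assuming ch5mpy.Dataset keeps the inherited behaviour: the first two rows of a NumPy array are written into the last two rows of a dataset. Shapes and names are arbitrary.

```python
import h5py
import numpy as np

source = np.arange(20, dtype="i8").reshape(4, 5)   # C-contiguous by construction

with h5py.File("write_direct_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dest", shape=(10, 5), dtype="i8")

    # Selections are built with numpy.s_: source rows 0-1 go to dataset rows 8-9.
    dset.write_direct(source, source_sel=np.s_[0:2, :], dest_sel=np.s_[8:10, :])

    written = dset[:]
```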