ch5mpy.Dataset

class ch5mpy.Dataset(bind, *, readonly=False)

A subclass of h5py.Dataset that implements pickling.

Create a new Dataset object by binding to a low-level DatasetID.

Attributes

Dataset.attrs

Attributes attached to this object

Dataset.chunks

Dataset chunks (or None)

Dataset.compression

Compression strategy (or None)

Dataset.compression_opts

Compression setting.

Dataset.dims

Access dimension scales attached to this dataset.

Dataset.dtype

Numpy dtype representing the datatype

Dataset.external

External file settings.

Dataset.file

Return a File instance associated with this object

Dataset.fillvalue

Fill value for this dataset (0 by default)

Dataset.fletcher32

Fletcher32 filter is present (T/F)

Dataset.id

Low-level identifier appropriate for this object

Dataset.is_scale

Return True if this dataset is also a dimension scale.

Dataset.is_virtual

Check if this is a virtual dataset

Dataset.maxshape

Shape up to which this dataset can be resized.

Dataset.name

Return the full name of this object.

Dataset.nbytes

Numpy-style attribute giving the raw dataset size as the number of bytes

Dataset.ndim

Numpy-style attribute giving the number of dimensions

Dataset.parent

Return the parent group of this object.

Dataset.ref

An (opaque) HDF5 reference to this object

Dataset.regionref

Create a region reference (Datasets only).

Dataset.scaleoffset

Scale/offset filter settings.

Dataset.shape

Numpy-style shape tuple giving dataset dimensions

Dataset.shuffle

Shuffle filter present (T/F)

Dataset.size

Numpy-style attribute giving the total dataset size

Methods

Dataset.__init__

Create a new Dataset object by binding to a low-level DatasetID.

Dataset.asstr

Get a wrapper to read string data as Python strings.

Dataset.astype

Get a wrapper allowing you to perform reads to a different destination type.

Dataset.fields

Get a wrapper to read a subset of fields from a compound data type.

Dataset.flush

Flush the dataset data and metadata to the file.

Dataset.iter_chunks

Return chunk iterator.

Dataset.len

The size of the first axis.

Dataset.make_scale

Make this dataset an HDF5 dimension scale.

Dataset.maptype

Return type: AsObjectWrapper[Any]

Dataset.read_direct

Read data directly from HDF5 into an existing NumPy array.

Dataset.refresh

Refresh the dataset metadata by reloading from the file.

Dataset.resize

Resize the dataset, or the specified axis.

Dataset.virtual_sources

Get a list of the data mappings for a virtual dataset

Dataset.write_direct

Write data directly to HDF5 from a NumPy array.

Attributes

Dataset.attrs

Attributes attached to this object

Dataset.chunks

Dataset chunks (or None)

Dataset.compression

Compression strategy (or None)

Dataset.compression_opts

Compression setting. Int(0-9) for gzip, 2-tuple for szip.
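
The storage attributes above can be read back after a dataset is created. The following is a minimal sketch, assuming h5py and NumPy are installed and using plain h5py, since ch5mpy.Dataset subclasses h5py.Dataset. The file name, dataset names, and chunk shape are illustrative only.

```python
import h5py
import numpy as np

# In-memory HDF5 file: nothing is written to disk, the name is arbitrary.
with h5py.File("storage_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "values",
        data=np.arange(1000, dtype="i4").reshape(100, 10),
        chunks=(25, 10),      # explicit chunk shape
        compression="gzip",   # compression strategy
        compression_opts=4,   # gzip level, an int in 0-9
    )
    chunks = dset.chunks
    compression = dset.compression
    compression_opts = dset.compression_opts

    # A contiguous, uncompressed dataset reports None for the same attributes.
    plain = f.create_dataset("plain", data=np.arange(10))
    plain_chunks = plain.chunks
    plain_compression = plain.compression

print(chunks, compression, compression_opts)
print(plain_chunks, plain_compression)
```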

Dataset.dims

Access dimension scales attached to this dataset.

Dataset.dtype

Numpy dtype representing the datatype

Dataset.external

External file settings. Returns a list of tuples of (name, offset, size) for each external file entry, or returns None if no external files are used.

Dataset.file

Return a File instance associated with this object

Dataset.fillvalue

Fill value for this dataset (0 by default)

Dataset.fletcher32

Fletcher32 filter is present (T/F)

Dataset.id

Low-level identifier appropriate for this object

Dataset.is_scale

Return True if this dataset is also a dimension scale.

Return False otherwise.

Dataset.is_virtual

Check if this is a virtual dataset

Dataset.maxshape

Shape up to which this dataset can be resized. Axes with value None have no resize limit.

Dataset.name

Return the full name of this object. None if anonymous.

Dataset.nbytes

Numpy-style attribute giving the raw dataset size as the number of bytes

Dataset.ndim

Numpy-style attribute giving the number of dimensions

Dataset.parent

Return the parent group of this object.

This is always equivalent to obj.file[posixpath.dirname(obj.name)]. Raises ValueError if this object is anonymous.
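
The equivalence stated for parent can be checked directly. This is a small sketch using plain h5py (ch5mpy.Dataset inherits these attributes from h5py.Dataset); the group and dataset names are invented for the example.

```python
import posixpath

import h5py
import numpy as np

with h5py.File("parent_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups are created automatically.
    dset = f.create_dataset("measurements/run1/temperature", data=np.zeros(3))

    name = dset.name                  # full path of the dataset
    parent_name = dset.parent.name    # path of the enclosing group
    # h5py objects compare equal when they refer to the same HDF5 object.
    same_group = dset.parent == f[posixpath.dirname(dset.name)]

print(name, parent_name, same_group)
```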

Dataset.ref

An (opaque) HDF5 reference to this object

Dataset.regionref

Create a region reference (Datasets only).

The syntax is regionref[<slices>]. For example, dset.regionref[...] creates a region reference in which the whole dataset is selected.

Can also be used to determine the shape of the referenced dataset (via .shape property), or the shape of the selection (via the .selection property).
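
A short sketch of creating and inspecting a region reference, assuming plain h5py behaviour (ch5mpy.Dataset subclasses h5py.Dataset). In h5py, shape and selection on the regionref proxy are called with the reference as their argument. The dataset name and slices are illustrative.

```python
import h5py
import numpy as np

with h5py.File("regionref_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", data=np.arange(20).reshape(4, 5))

    # Reference to rows 0-1 and columns 1-3 of the dataset.
    ref = dset.regionref[0:2, 1:4]

    # Shape of the dataset the reference points into.
    full_shape = tuple(dset.regionref.shape(ref))
    # Shape of the selected region itself.
    selected_shape = tuple(int(n) for n in dset.regionref.selection(ref))

print(full_shape, selected_shape)
```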

Dataset.scaleoffset

Scale/offset filter settings. For integer data types, this is the number of bits stored, or 0 for auto-detected. For floating point data types, this is the number of decimal places retained. If the scale/offset filter is not in use, this is None.

Dataset.shape

Numpy-style shape tuple giving dataset dimensions

Dataset.shuffle

Shuffle filter present (T/F)

Dataset.size

Numpy-style attribute giving the total dataset size

Methods

Dataset.__init__(bind, *, readonly=False)

Create a new Dataset object by binding to a low-level DatasetID.

Dataset.asstr(encoding=None, errors='strict')

Get a wrapper to read string data as Python strings.

The parameters have the same meaning as in bytes.decode(). If encoding is unspecified, it will use the encoding in the HDF5 datatype (either ascii or utf-8).

Return type:

AsStrWrapper

Parameters:
  • encoding (Literal['ascii', 'utf-8'] | None) –

  • errors (Literal['backslashreplace', 'ignore', 'namereplace', 'strict', 'replace', 'xmlcharrefreplace']) –
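
As an illustration of asstr, the sketch below stores variable-length UTF-8 strings and reads them back as Python str objects. It uses plain h5py, on the assumption that ch5mpy.Dataset behaves like its h5py.Dataset base class here; the dataset name and values are made up.

```python
import h5py
import numpy as np

with h5py.File("asstr_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "labels",
        data=np.array(["alpha", "beta", "gamma"], dtype=object),
        dtype=h5py.string_dtype(encoding="utf-8"),
    )

    raw_first = dset[0]               # without asstr(): typically bytes in h5py 3.x
    first = dset.asstr()[0]           # decoded with the encoding stored in the file
    labels = list(dset.asstr()[:])    # whole dataset as Python strings

print(type(raw_first).__name__, first, labels)
```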

Dataset.astype(dtype)

Get a wrapper allowing you to perform reads to a different destination type, e.g.:

>>> double_precision = dataset.astype('f8')[0:100:2]

Parameters:

dtype (dtype[Any] | None | type[Any] | _SupportsDType[dtype[Any]] | str | tuple[Any, int] | tuple[Any, Union[SupportsIndex, collections.abc.Sequence[SupportsIndex]]] | list[Any] | _DTypeDict | tuple[Any, Any]) –

Return type:

AsDtypeWrapper[generic]
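
A self-contained variant of the astype example, sketched with plain h5py (the base class of ch5mpy.Dataset). The dataset name and contents are illustrative.

```python
import h5py
import numpy as np

with h5py.File("astype_demo.h5", "w", driver="core", backing_store=False) as f:
    # Stored on disk as 32-bit integers.
    dset = f.create_dataset("counts", data=np.arange(10, dtype="i4"))

    # Read every second element, converted to float64 during the read.
    double_precision = dset.astype("f8")[0:10:2]

print(double_precision.dtype, double_precision)
```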

Dataset.fields(names, *, _prior_dtype=None)

Get a wrapper to read a subset of fields from a compound data type:

>>> coords_2d = dataset.fields(['x', 'y'])[:]

If names is a string, a single field is extracted, and the resulting arrays will have that dtype. Otherwise, it should be an iterable, and the read data will have a compound dtype.
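
The difference between passing a single name and passing a list of names can be sketched as follows, using plain h5py (ch5mpy.Dataset subclasses h5py.Dataset). The compound dtype and its field names are assumptions made for the example.

```python
import h5py
import numpy as np

# Hypothetical compound type with three float32 fields.
point_dtype = np.dtype([("x", "f4"), ("y", "f4"), ("z", "f4")])
points = np.zeros(3, dtype=point_dtype)
points["x"] = [1.0, 2.0, 3.0]
points["y"] = [4.0, 5.0, 6.0]
points["z"] = [7.0, 8.0, 9.0]

with h5py.File("fields_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("points", data=points)

    # A list of names gives an array with a (smaller) compound dtype.
    coords_2d = dset.fields(["x", "y"])[:]
    # A single string gives a plain array with that field's dtype.
    x_only = dset.fields("x")[:]

print(coords_2d.dtype.names, x_only.dtype)
```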

Dataset.flush()

Flush the dataset data and metadata to the file. If the dataset is chunked, raw data chunks are written to the file.

This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.

Dataset.iter_chunks(sel=None)

Return chunk iterator. If set, the sel argument is a slice or tuple of slices that defines the region to be used. If not set, the entire dataspace will be used for the iterator.

For each chunk within the given region, the iterator yields a tuple of slices that gives the intersection of the given chunk with the selection area.

A TypeError will be raised if the dataset is not chunked.

A ValueError will be raised if the selection region is invalid.
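
The iteration behaviour described above can be sketched with plain h5py (the base class of ch5mpy.Dataset). The dataset names and chunk shape are illustrative, and the selection is written with explicit start and stop values on every axis.

```python
import h5py
import numpy as np

with h5py.File("chunks_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "grid", data=np.arange(24).reshape(4, 6), chunks=(2, 3)
    )

    # Whole dataset: a 4x6 array in 2x3 chunks gives four chunks.
    all_chunks = list(dset.iter_chunks())
    # Restrict iteration to the first two rows.
    top_rows = list(dset.iter_chunks(np.s_[0:2, 0:6]))
    # Each yielded tuple of slices can be used to index the dataset.
    chunk_shapes = [dset[chunk].shape for chunk in all_chunks]

    # A contiguous (non-chunked) dataset raises TypeError.
    contiguous = f.create_dataset("contiguous", data=np.arange(4))
    try:
        list(contiguous.iter_chunks())
        raised_type_error = False
    except TypeError:
        raised_type_error = True

print(len(all_chunks), len(top_rows), chunk_shapes, raised_type_error)
```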

Dataset.len()

The size of the first axis. Raises TypeError if the dataset is scalar.

Use of this method is preferred to len(dset), as Python’s built-in len() cannot handle values greater than 2**32 on 32-bit systems.

Dataset.make_scale(name='')

Make this dataset an HDF5 dimension scale.

You can then attach it to dimensions of other datasets like this:

other_ds.dims[0].attach_scale(ds)

You can optionally pass a name to associate with this scale.

Dataset.maptype(otype)
Return type:

AsObjectWrapper[Any]

Parameters:

otype (type[Any]) –

Dataset.read_direct(dest, source_sel=None, dest_sel=None)

Read data directly from HDF5 into an existing NumPy array.

The destination array must be C-contiguous and writable. Selections must be the output of numpy.s_[<args>].

Broadcasting is supported for simple indexing.
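
The following sketch reads two rows of a dataset into a preallocated array with read_direct. It uses plain h5py (the base class of ch5mpy.Dataset); the dataset name, shapes, and selections are illustrative.

```python
import h5py
import numpy as np

with h5py.File("read_direct_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", data=np.arange(20, dtype="i8").reshape(4, 5))

    # Destination must be C-contiguous and writable.
    dest = np.zeros((2, 5), dtype="i8")

    # Copy rows 1-2 of the dataset into the whole destination array.
    dset.read_direct(dest, source_sel=np.s_[1:3, :], dest_sel=np.s_[:, :])

print(dest)
```

Reading into an existing array this way avoids allocating an intermediate array for the result.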

Dataset.refresh()

Refresh the dataset metadata by reloading from the file.

This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.

Dataset.resize(size, axis=None)

Resize the dataset, or the specified axis.

The dataset must be stored in chunked format; it can be resized up to the “maximum shape” (keyword maxshape) specified at creation time. The rank of the dataset cannot be changed.

“Size” should be a shape tuple, or if an axis is specified, an integer.

BEWARE: This functions differently than the NumPy resize() method! The data is not “reshuffled” to fit in the new shape; each axis is grown or shrunk independently. The coordinates of existing data are fixed.
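
To illustrate that existing data keeps its coordinates, the sketch below grows a dataset along one axis and then shrinks it along the other. It uses plain h5py (ch5mpy.Dataset subclasses h5py.Dataset); the dataset name, shapes, and maxshape are illustrative.

```python
import h5py
import numpy as np

with h5py.File("resize_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "table",
        shape=(2, 3),
        maxshape=(None, 3),   # axis 0 is unlimited, axis 1 is capped at 3
        chunks=True,          # resizing requires chunked storage
        dtype="i4",
    )
    dset[...] = np.arange(6).reshape(2, 3)

    # Grow axis 0 only: new rows are filled with the fill value (0 by default).
    dset.resize(4, axis=0)
    grown = dset[...]

    # Shrink axis 1 with a full shape tuple: column 2 is dropped, nothing is reflowed.
    dset.resize((4, 2))
    shrunk = dset[...]

print(grown)
print(shrunk)
```

After the shrink, the element at index [1, 0] is still 3, whereas a NumPy-style resize would have reflowed the flattened data into the new shape.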

Dataset.virtual_sources()

Get a list of the data mappings for a virtual dataset

Dataset.write_direct(source, source_sel=None, dest_sel=None)

Write data directly to HDF5 from a NumPy array.

The source array must be C-contiguous. Selections must be the output of numpy.s_[<args>].

Broadcasting is supported for simple indexing.

Return type:

None

Parameters: