ch5mpy.Dataset
- class ch5mpy.Dataset(bind, *, readonly=False)
A subclass of h5py.Dataset that implements pickling.
Create a new Dataset object by binding to a low-level DatasetID.
Attributes
- attrs: Attributes attached to this object
- chunks: Dataset chunks (or None)
- compression: Compression strategy (or None)
- compression_opts: Compression setting.
- dims: Access dimension scales attached to this dataset.
- dtype: Numpy dtype representing the datatype
- external: External file settings.
- file: Return a File instance associated with this object
- fillvalue: Fill value for this dataset (0 by default)
- fletcher32: Fletcher32 filter is present (T/F)
- id: Low-level identifier appropriate for this object
- is_scale: Return True if this dataset is also a dimension scale.
- is_virtual: Check if this is a virtual dataset
- maxshape: Shape up to which this dataset can be resized.
- name: Return the full name of this object.
- nbytes: Numpy-style attribute giving the raw dataset size as the number of bytes
- ndim: Numpy-style attribute giving the number of dimensions
- parent: Return the parent group of this object.
- ref: An (opaque) HDF5 reference to this object
- regionref: Create a region reference (Datasets only).
- scaleoffset: Scale/offset filter settings.
- shape: Numpy-style shape tuple giving dataset dimensions
- shuffle: Shuffle filter present (T/F)
- size: Numpy-style attribute giving the total dataset size
Methods
- __init__: Create a new Dataset object by binding to a low-level DatasetID.
- asstr: Get a wrapper to read string data as Python strings.
- astype: Get a wrapper allowing you to perform reads to a different destination type.
- fields: Get a wrapper to read a subset of fields from a compound data type.
- flush: Flush the dataset data and metadata to the file.
- iter_chunks: Return chunk iterator.
- len: The size of the first axis.
- make_scale: Make this dataset an HDF5 dimension scale.
- rtype:
AsObjectWrapper[Any]
- read_direct: Read data directly from HDF5 into an existing NumPy array.
- refresh: Refresh the dataset metadata by reloading from the file.
- resize: Resize the dataset, or the specified axis.
- virtual_sources: Get a list of the data mappings for a virtual dataset
- write_direct: Write data directly to HDF5 from a NumPy array.
Attributes
- Dataset.attrs
Attributes attached to this object
- Dataset.chunks
Dataset chunks (or None)
- Dataset.compression
Compression strategy (or None)
- Dataset.compression_opts
Compression setting: an integer (0-9) for gzip, or a 2-tuple for szip.
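The storage attributes above can be inspected on any dataset. The following is a minimal sketch using plain h5py, on the assumption that ch5mpy.Dataset inherits these properties unchanged from h5py.Dataset; the file and dataset names are hypothetical, and the file lives only in memory.

```python
import h5py

# In-memory HDF5 file (driver="core", nothing is written to disk).
# File and dataset names are hypothetical.
with h5py.File("filters_demo.h5", "w", driver="core", backing_store=False) as f:
    compressed = f.create_dataset(
        "measurements",
        shape=(100, 100),
        dtype="f4",
        chunks=(10, 10),        # reported by .chunks
        compression="gzip",     # reported by .compression
        compression_opts=4,     # gzip level 0-9, reported by .compression_opts
    )
    storage_info = {
        "chunks": compressed.chunks,
        "compression": compressed.compression,
        "compression_opts": compressed.compression_opts,
    }

    # A contiguous, uncompressed dataset reports None for these attributes.
    plain = f.create_dataset("plain", shape=(10,), dtype="i4")
    plain_info = (plain.chunks, plain.compression)

print(storage_info)
print(plain_info)
```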
- Dataset.dims
Access dimension scales attached to this dataset.
- Dataset.dtype
Numpy dtype representing the datatype
- Dataset.external
External file settings. Returns a list of tuples of (name, offset, size) for each external file entry, or returns None if no external files are used.
- Dataset.file
Return a File instance associated with this object
- Dataset.fillvalue
Fill value for this dataset (0 by default)
- Dataset.fletcher32
Fletcher32 filter is present (T/F)
- Dataset.id
Low-level identifier appropriate for this object
- Dataset.is_scale
Return True if this dataset is also a dimension scale. Return False otherwise.
- Dataset.is_virtual
Check if this is a virtual dataset
- Dataset.maxshape
Shape up to which this dataset can be resized. Axes with value None have no resize limit.
- Dataset.name
Return the full name of this object. None if anonymous.
- Dataset.nbytes
Numpy-style attribute giving the raw dataset size as the number of bytes
- Dataset.ndim
Numpy-style attribute giving the number of dimensions
- Dataset.parent
Return the parent group of this object.
This is always equivalent to obj.file[posixpath.dirname(obj.name)]. Raises ValueError if this object is anonymous.
- Dataset.ref
An (opaque) HDF5 reference to this object
- Dataset.regionref
Create a region reference (Datasets only).
The syntax is regionref[<slices>]. For example, dset.regionref[...] creates a region reference in which the whole dataset is selected.
It can also be used to determine the shape of the referenced dataset (via .shape), or the shape of the selection (via .selection).
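As an illustration of the region reference syntax, the sketch below uses plain h5py and assumes ch5mpy.Dataset behaves the same way. In h5py, shape and selection are called as methods that take the reference as an argument, which is what is assumed here; the file and dataset names are hypothetical.

```python
import h5py
import numpy as np

with h5py.File("regionref_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", data=np.arange(200 * 200).reshape(200, 200))

    # Reference to the top-left 10 x 5 block of the dataset.
    region = dset.regionref[0:10, 0:5]

    # Shape of the dataset the reference points into.
    full_shape = dset.regionref.shape(region)
    # Shape of the selected region itself.
    selected_shape = dset.regionref.selection(region)

    # The reference can be used as an index to read the selected data.
    subset = dset[region]

print(full_shape, selected_shape)
```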
- Dataset.scaleoffset
Scale/offset filter settings. For integer data types, this is the number of bits stored, or 0 for auto-detected. For floating point data types, this is the number of decimal places retained. If the scale/offset filter is not in use, this is None.
- Dataset.shape
Numpy-style shape tuple giving dataset dimensions
- Dataset.shuffle
Shuffle filter present (T/F)
- Dataset.size
Numpy-style attribute giving the total dataset size
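The NumPy-style and naming attributes listed above can be read together. This is a small sketch with plain h5py, assuming ch5mpy.Dataset exposes the same properties; the group and dataset names are hypothetical.

```python
import h5py

with h5py.File("attrs_demo.h5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("experiment")
    dset = group.create_dataset("values", shape=(3, 4), dtype="f8")

    summary = {
        "shape": dset.shape,            # dataset dimensions
        "ndim": dset.ndim,              # number of dimensions
        "size": int(dset.size),         # total number of elements
        "nbytes": int(dset.nbytes),     # size * itemsize (8 bytes per float64)
        "name": dset.name,              # full path inside the file
        "parent": dset.parent.name,     # path of the containing group
    }

print(summary)
```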
Methods
- Dataset.__init__(bind, *, readonly=False)
Create a new Dataset object by binding to a low-level DatasetID.
- Dataset.asstr(encoding=None, errors='strict')
Get a wrapper to read string data as Python strings.
The parameters have the same meaning as in bytes.decode(). If encoding is unspecified, it will use the encoding in the HDF5 datatype (either ascii or utf-8).
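A minimal sketch of reading strings through asstr(), written against plain h5py 3.x on the assumption that ch5mpy.Dataset inherits the method unchanged. The dataset name and contents are hypothetical.

```python
import h5py

with h5py.File("asstr_demo.h5", "w", driver="core", backing_store=False) as f:
    # Variable-length UTF-8 string dataset.
    dset = f.create_dataset(
        "labels", shape=(2,), dtype=h5py.string_dtype(encoding="utf-8")
    )
    dset[0] = "alpha"
    dset[1] = "beta"

    # Plain indexing returns the stored representation (bytes under h5py 3.x).
    raw = dset[0]
    # The asstr() wrapper decodes on read and returns Python str objects.
    decoded = dset.asstr()[0]
    all_decoded = list(dset.asstr()[:])

print(type(raw).__name__, decoded, all_decoded)
```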
- Dataset.astype(dtype)
Get a wrapper allowing you to perform reads to a different destination type, e.g.:
>>> double_precision = dataset.astype('f8')[0:100:2]
Return type: AsDtypeWrapper[generic]
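The one-line example above can be expanded into a self-contained sketch. It uses plain h5py and assumes ch5mpy.Dataset.astype behaves the same way when indexed; the dataset name is hypothetical.

```python
import h5py
import numpy as np

with h5py.File("astype_demo.h5", "w", driver="core", backing_store=False) as f:
    # Stored on disk as 32-bit floats.
    dset = f.create_dataset("samples", data=np.arange(200, dtype="f4"))

    # Read every second element of the first 100 as 64-bit floats;
    # the conversion happens during the read.
    double_precision = dset.astype("f8")[0:100:2]

print(double_precision.dtype, double_precision.shape)
```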
- Dataset.fields(names, *, _prior_dtype=None)
Get a wrapper to read a subset of fields from a compound data type:
>>> coords_2d = dataset.fields(['x', 'y'])[:]
If names is a string, a single field is extracted, and the resulting arrays will have that dtype. Otherwise, it should be an iterable, and the read data will have a compound dtype.
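The difference between passing a list of names and a single string can be seen in the following sketch. It uses plain h5py and assumes ch5mpy.Dataset.fields behaves identically; the compound dtype and the dataset name are hypothetical.

```python
import h5py
import numpy as np

# Hypothetical compound type with three float fields.
point_dtype = np.dtype([("x", "f4"), ("y", "f4"), ("z", "f4")])
points = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], dtype=point_dtype)

with h5py.File("fields_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("points", data=points)

    # A list of names yields an array with a compound dtype of those fields.
    coords_2d = dset.fields(["x", "y"])[:]
    # A single string yields a plain array with that field's dtype.
    x_only = dset.fields("x")[:]

print(coords_2d.dtype.names, x_only.dtype)
```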
- Dataset.flush()
Flush the dataset data and metadata to the file. If the dataset is chunked, raw data chunks are written to the file.
This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.
- Dataset.iter_chunks(sel=None)
Return chunk iterator. If set, the sel argument is a slice or tuple of slices that defines the region to be used. If not set, the entire dataspace will be used for the iterator.
For each chunk within the given region, the iterator yields a tuple of slices that gives the intersection of the given chunk with the selection area.
A TypeError will be raised if the dataset is not chunked.
A ValueError will be raised if the selection region is invalid.
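The sketch below shows chunk iteration over a whole dataset and over a sub-region. It uses plain h5py and assumes ch5mpy.Dataset.iter_chunks behaves the same way; the dataset name is hypothetical. Explicit slice bounds are used for the selection, since open-ended slices are not guaranteed to be accepted by every h5py version.

```python
import h5py
import numpy as np

with h5py.File("chunks_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "image", data=np.arange(100 * 100).reshape(100, 100), chunks=(50, 50)
    )

    # Each item is a tuple of slices covering one chunk.
    all_chunks = list(dset.iter_chunks())

    # Restrict iteration to the top half of the dataset.
    top_half = list(dset.iter_chunks(np.s_[0:50, 0:100]))

    # Process the dataset chunk by chunk instead of loading it at once.
    chunked_total = sum(int(dset[chunk].sum()) for chunk in all_chunks)
    direct_total = int(dset[...].sum())

print(len(all_chunks), len(top_half), chunked_total == direct_total)
```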
- Dataset.len()
The size of the first axis. Raises TypeError if the dataset is scalar.
Use of this method is preferred to len(dset), as Python’s built-in len() cannot handle values greater than 2**32 on 32-bit systems.
- Dataset.make_scale(name='')
Make this dataset an HDF5 dimension scale.
You can then attach it to dimensions of other datasets like this:
other_ds.dims[0].attach_scale(ds)
You can optionally pass a name to associate with this scale.
- Dataset.read_direct(dest, source_sel=None, dest_sel=None)
Read data directly from HDF5 into an existing NumPy array.
The destination array must be C-contiguous and writable. Selections must be the output of numpy.s_[<args>].
Broadcasting is supported for simple indexing.
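The following sketch reads part of a dataset into a slice of a preallocated array. It uses plain h5py and assumes ch5mpy.Dataset.read_direct behaves the same way; the dataset name is hypothetical.

```python
import h5py
import numpy as np

with h5py.File("read_direct_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("counts", data=np.arange(128, dtype="int64"))

    # Destination must be C-contiguous and writable.
    dest = np.zeros(100, dtype="int32")

    # Copy the first 10 elements of the dataset into dest[50:60];
    # both selections are built with numpy.s_.
    dset.read_direct(dest, source_sel=np.s_[0:10], dest_sel=np.s_[50:60])

print(dest[48:62])
```

Reading this way avoids the intermediate array that a plain dset[0:10] read would allocate.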
- Dataset.refresh()
Refresh the dataset metadata by reloading from the file.
This is part of the SWMR features and only exists when the HDF5 library version is >= 1.9.178.
- Dataset.resize(size, axis=None)
Resize the dataset, or the specified axis.
The dataset must be stored in chunked format; it can be resized up to the “maximum shape” (keyword maxshape) specified at creation time. The rank of the dataset cannot be changed.
The size argument should be a shape tuple or, if an axis is specified, an integer.
BEWARE: This functions differently than the NumPy resize() method! The data is not “reshuffled” to fit in the new shape; each axis is grown or shrunk independently. The coordinates of existing data are fixed.
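The difference from NumPy's resize can be seen in a small sketch. It uses plain h5py and assumes ch5mpy.Dataset.resize behaves identically; the dataset name is hypothetical, and the dataset uses the default fill value of 0.

```python
import h5py
import numpy as np

original = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

with h5py.File("resize_demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None, None) makes both axes resizable and enables chunking.
    dset = f.create_dataset("table", data=original, maxshape=(None, None))

    # Axis 1 shrinks from 3 to 2, axis 0 grows from 2 to 3.
    # Existing values keep their coordinates; new cells hold the fill value.
    dset.resize((3, 2))
    after_resize = dset[...].tolist()

    # Resize a single axis by passing an integer and the axis index.
    dset.resize(5, axis=0)
    shape_after_axis_resize = dset.shape

# NumPy flattens the data and refills the new shape instead.
numpy_style = np.resize(original, (3, 2)).tolist()

print(after_resize)   # [[0, 1], [3, 4], [0, 0]]
print(numpy_style)    # [[0, 1], [2, 3], [4, 5]]
```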