PrognosAIs.IO package¶

class PrognosAIs.IO.DataGenerator.Augmentor(example_sample: tensorflow.python.framework.ops.Tensor, brightness_probability: float = 0, brightness_delta: float = 0, contrast_probability: float = 0, contrast_min_factor: float = 1, contrast_max_factor: float = 1, flip_probability: float = 0, to_flip_axis: Union[int, list] = 0, crop_probability: float = 0, crop_size: list = None, rotate_probability: float = 0, max_rotate_angle: float = 0, to_rotate_axis: Union[int, list] = 0)[source]¶

Bases: object

__init__(example_sample: tensorflow.python.framework.ops.Tensor, brightness_probability: float = 0, brightness_delta: float = 0, contrast_probability: float = 0, contrast_min_factor: float = 1, contrast_max_factor: float = 1, flip_probability: float = 0, to_flip_axis: Union[int, list] = 0, crop_probability: float = 0, crop_size: list = None, rotate_probability: float = 0, max_rotate_angle: float = 0, to_rotate_axis: Union[int, list] = 0) → None[source]¶

Augmentor to randomly augment the features of a sample.

Parameters

example_sample (tf.Tensor) – Example sample from which settings for augmentation will be derived
brightness_probability (float, optional) – Probability of augmenting brightness. Defaults to 0.
brightness_delta (float, optional) – Brightness will be adjusted with value from -delta to delta. Defaults to 0.
contrast_probability (float, optional) – Probability of augmenting contrast. Defaults to 0.
contrast_min_factor (float, optional) – Minimum contrast adjustment factor. Defaults to 1.
contrast_max_factor (float, optional) – Maximum contrast adjustment factor. Defaults to 1.
flip_probability (float, optional) – Probability of a random flip. Defaults to 0.
to_flip_axis (Union[int, list], optional) – Axis to flip the feature over. Defaults to 0.
crop_probability (float, optional) – Probability of cropping the feature. Defaults to 0.
crop_size (list, optional) – Size to crop the feature to. Defaults to None.

apply_augmentation(augmentation_probability: float, seed: tensorflow.python.framework.ops.Tensor = None) → bool[source]¶

Determine whether the augmentation step should be applied, based on the given probability.

Parameters

augmentation_probability (float) – The probability with which the step should be applied
seed (tf.Tensor) – Seed to make operation repeatable. Defaults to None.

Returns

bool – Whether the step should be applied
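The decision rule can be illustrated without TensorFlow. The following is a minimal sketch, assuming the step is applied when a uniform random draw falls below augmentation_probability. The helper should_apply and the use of Python's random module are illustrative stand-ins, not the library's implementation.

```python
import random


def should_apply(augmentation_probability: float, rng: random.Random) -> bool:
    """Return True with probability `augmentation_probability` (hypothetical helper)."""
    # rng.random() is uniform on [0, 1), so 0 never applies and 1 always applies.
    return rng.random() < augmentation_probability


rng = random.Random(42)  # a fixed seed makes the sequence of decisions repeatable
decisions = [should_apply(0.3, rng) for _ in range(10_000)]
fraction_applied = sum(decisions) / len(decisions)  # close to 0.3
```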

augment_sample(sample: tensorflow.python.framework.ops.Tensor, seed=None, is_mask=False) → tensorflow.python.framework.ops.Tensor[source]¶

Apply random augmentations to the sample based on the config.

Parameters: sample (tf.Tensor) – sample to be augmented
Returns: tf.Tensor – augmented sample

get_seed() → tensorflow.python.framework.ops.Tensor[source]¶

Get a random seed that can be used to make other operations repeatable.

Returns: tf.Tensor – The seed
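A shared seed is typically useful when the same random transformation has to be applied to two tensors, for example an image and its mask. A minimal sketch of that idea, using Python's random module in place of TensorFlow's random ops; draw_crop_offset is a hypothetical helper and the integer seed stands in for the tensor returned by get_seed():

```python
import random


def draw_crop_offset(image_length: int, crop_length: int, seed: int) -> int:
    """Draw a crop start position that depends only on the seed (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.randint(0, image_length - crop_length)


seed = 1234  # stands in for the tensor returned by get_seed()
offset_for_image = draw_crop_offset(100, 32, seed)
offset_for_mask = draw_crop_offset(100, 32, seed)  # same seed, same offset
```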

pad_to_original_size(sample: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor[source]¶

Pad back a (potentially) augmented sample to its original size.

Parameters: sample (tf.Tensor) – The sample to pad
Returns: tf.Tensor – The padded sample with the same size as before any augmentation steps
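Padding matters mostly after cropping, when the sample has become smaller than the original. A minimal sketch for a 2D sample stored as nested lists, assuming zero padding that is split as evenly as possible over both sides of each axis. The padding value and placement actually used by the library are not stated here, and pad_to_size is a hypothetical helper.

```python
def pad_to_size(sample, target_rows, target_cols, fill=0):
    """Pad a 2D nested list to (target_rows, target_cols) (hypothetical helper)."""
    rows, cols = len(sample), len(sample[0])
    top = (target_rows - rows) // 2
    bottom = target_rows - rows - top
    left = (target_cols - cols) // 2
    right = target_cols - cols - left
    # Pad every existing row on the left and right first.
    padded_rows = [[fill] * left + list(row) + [fill] * right for row in sample]
    # Then add full-width rows of fill values above and below.
    above = [[fill] * target_cols for _ in range(top)]
    below = [[fill] * target_cols for _ in range(bottom)]
    return above + padded_rows + below


cropped = [[1, 2], [3, 4]]
restored = pad_to_size(cropped, 4, 5)
```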

random_brightness(sample: tensorflow.python.framework.ops.Tensor, seed: tensorflow.python.framework.ops.Tensor = None) → tensorflow.python.framework.ops.Tensor[source]¶

Randomly adjusts the brightness of a sample.

Brightness is adjusted by a constant factor over the whole image, drawn from a distribution between -delta and delta as set during the initialization of the augmentor.

Parameters

sample (tf.Tensor) – Sample for which to adjust brightness.
seed (tf.Tensor) – Seed to make operation repeatable. Defaults to None.

Returns

tf.Tensor – The augmented sample.
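A minimal sketch of the brightness adjustment, assuming the drawn value is added to every intensity (as tf.image.adjust_brightness does) and is drawn uniformly from [-delta, delta]. The sketch uses plain Python lists instead of tensors, and adjust_brightness is a hypothetical helper.

```python
import random


def adjust_brightness(sample, delta, rng):
    """Add one offset, drawn uniformly from [-delta, delta], to all intensities."""
    offset = rng.uniform(-delta, delta)
    return [value + offset for value in sample], offset


rng = random.Random(0)
sample = [0.1, 0.5, 0.9]
adjusted, offset = adjust_brightness(sample, delta=0.2, rng=rng)
```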

random_contrast(sample: tensorflow.python.framework.ops.Tensor, seed: tensorflow.python.framework.ops.Tensor = None) → tensorflow.python.framework.ops.Tensor[source]¶

Randomly adjust the contrast of a sample.

The contrast is adjusted by keeping the mean of the sample the same as for the original sample, and squeezing or expanding the distribution of the intensities around the mean. The amount of squeezing or expanding is randomly drawn from the range between the minimum and maximum contrast factors set during initialization.

Parameters

sample (tf.Tensor) – Sample for which to adjust contrast
seed (tf.Tensor) – Seed to make operation repeatable. Defaults to None.

Returns

tf.Tensor – The augmented sample
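The mean-preserving adjustment described above can be written as (value - mean) * factor + mean. A minimal sketch on a plain Python list, assuming the factor is drawn uniformly between the minimum and maximum factor; adjust_contrast is a hypothetical helper.

```python
import random
from statistics import mean


def adjust_contrast(sample, min_factor, max_factor, rng):
    """Scale intensities around the sample mean by a random factor."""
    factor = rng.uniform(min_factor, max_factor)
    sample_mean = mean(sample)
    # factor < 1 squeezes values towards the mean, factor > 1 spreads them out.
    adjusted = [(value - sample_mean) * factor + sample_mean for value in sample]
    return adjusted, factor


rng = random.Random(0)
sample = [1.0, 2.0, 3.0, 6.0]
adjusted, factor = adjust_contrast(sample, 0.5, 1.5, rng)
```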

random_cropping(sample: tensorflow.python.framework.ops.Tensor, seed: tensorflow.python.framework.ops.Tensor = None) → tensorflow.python.framework.ops.Tensor[source]¶

Randomly crop a part of the sample.

The crop will have the crop size defined upon initialization of the augmentor. The crop is applied to all channels in the same way, and no channels are cropped out. The location of the crop is randomly drawn from the whole image.

Parameters

sample (tf.Tensor) – The sample to be cropped
seed (tf.Tensor) – Seed to make operation repeatable. Defaults to None.

Returns

tf.Tensor – The augmented sample
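A minimal sketch of such a crop for a sample laid out as rows x columns x channels in nested lists, assuming the crop position is drawn uniformly over all valid positions; random_crop is a hypothetical helper.

```python
import random


def random_crop(sample, crop_size, rng):
    """Crop a (rows, cols, channels) nested list to crop_size = [rows, cols]."""
    crop_rows, crop_cols = crop_size
    row_start = rng.randint(0, len(sample) - crop_rows)
    col_start = rng.randint(0, len(sample[0]) - crop_cols)
    # Slice rows and columns only; every channel of each kept pixel is retained.
    return [
        [list(pixel) for pixel in row[col_start:col_start + crop_cols]]
        for row in sample[row_start:row_start + crop_rows]
    ]


# 4 x 5 sample with 2 channels per pixel; each pixel stores its own (row, col)
sample = [[[r, c] for c in range(5)] for r in range(4)]
crop = random_crop(sample, [2, 3], random.Random(7))
```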

random_flipping(sample: tensorflow.python.framework.ops.Tensor, seed: tensorflow.python.framework.ops.Tensor = None) → tensorflow.python.framework.ops.Tensor[source]¶

Randomly flip the sample along one or multiple axes.

Parameters

sample (tf.Tensor) – Sample for which to apply flipping
seed (tf.Tensor) – Seed to make operation repeatable. Defaults to None.

Returns

tf.Tensor – The augmented sample
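The flip itself, leaving out the random decision, can be sketched on a 2D nested list. As with to_flip_axis, the axes argument accepts either a single axis or a list of axes; flip is a hypothetical helper and only handles axes 0 and 1.

```python
def flip(sample, axes):
    """Flip a 2D nested list along axis 0 (rows), axis 1 (columns), or both."""
    if isinstance(axes, int):
        axes = [axes]
    flipped = [list(row) for row in sample]  # copy, so the input stays untouched
    if 0 in axes:
        flipped = flipped[::-1]
    if 1 in axes:
        flipped = [row[::-1] for row in flipped]
    return flipped


sample = [[1, 2, 3], [4, 5, 6]]
flipped_rows = flip(sample, 0)       # [[4, 5, 6], [1, 2, 3]]
flipped_both = flip(sample, [0, 1])  # [[6, 5, 4], [3, 2, 1]]
```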

random_rotate(feature: tensorflow.python.framework.ops.Tensor, seed: tensorflow.python.framework.ops.Tensor = None, interpolation_order: int = 3) → tensorflow.python.framework.ops.Tensor[source]¶

class PrognosAIs.IO.DataGenerator.HDF5Generator(root_folder: str, batch_size: int = 16, shuffle: bool = False, max_steps: int = - 1, drop_batch_remainder: bool = True, labels_only: bool = False)[source]¶

Bases: object

__init__(root_folder: str, batch_size: int = 16, shuffle: bool = False, max_steps: int = - 1, drop_batch_remainder: bool = True, labels_only: bool = False) → None[source]¶

Generate data from HDF5 files to be used in a TensorFlow pipeline.

This generator loads sample data from HDF5 files, and does so efficiently by making use of TensorFlow dataset functions. The inputs and outputs are dicts, which allows for easy use in a multi-input and/or multi-output model.

Parameters

root_folder (str) – Folder in which the HDF5 files are stored
batch_size (int, optional) – Batch size of the generator. Defaults to 16.
shuffle (bool, optional) – Whether the dataset should be shuffled. Defaults to False.
data_augmentation (bool, optional) – Whether data augmentation should be applied. Defaults to False.
augmentation_factor (int, optional) – Number of times dataset should be repeated for augmentation. Defaults to 5.
augmentation_settings (dict, optional) – Settings for the data augmentation. Defaults to None.
max_steps (int, optional) – Maximum number of (iteration) steps to provide. Defaults to -1, in which case all samples are provided.
drop_batch_remainder (bool, optional) – Whether to drop the remainder of the batch if it does not fit perfectly. Defaults to True.
labels_only (bool, optional) – Whether to only provide labels. Defaults to False.
feature_index (str, optional) – Name of the feature group in the HDF5 file. Defaults to “sample”.
label_index (str, optional) – Name of the label group in the HDF5 file. Defaults to “label”.
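The interaction of batch_size, drop_batch_remainder and max_steps can be sketched in plain Python. This assumes that one step corresponds to one batch; make_batches is a hypothetical helper and not part of PrognosAIs.

```python
def make_batches(samples, batch_size=16, drop_batch_remainder=True, max_steps=-1):
    """Group samples into batches, mimicking the generator settings (illustrative only)."""
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    if drop_batch_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # the last, incomplete batch is discarded
    if max_steps >= 0:
        batches = batches[:max_steps]  # -1 means "no limit"
    return batches


samples = list(range(10))
full_only = make_batches(samples, batch_size=4)  # two batches of 4
with_remainder = make_batches(samples, batch_size=4, drop_batch_remainder=False)  # 4, 4, 2
limited = make_batches(samples, batch_size=4, max_steps=1)  # one batch
```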

_get_all_dataset_attributes(h5py_object: Union[h5py._hl.files.File, h5py._hl.dataset.Dataset, h5py._hl.group.Group]) → dict[source]¶

Run through all groups and datasets to get the attributes.

Parameters: h5py_object (Union[h5py.File, h5py.Dataset, h5py.Group]) – Object for which to return the attributes
Returns: dict – Mapping between feature/label name and its attributes

_get_dataset_names(h5py_object: Union[h5py._hl.files.File, h5py._hl.dataset.Dataset, h5py._hl.group.Group]) → list[source]¶

Run through all groups and datasets to get the names.

Parameters: h5py_object (Union[h5py.File, h5py.Dataset, h5py.Group]) – Object for which to return the dataset names
Returns: list – Dataset names in object

apply_augmentation(features: dict, labels: dict) → Tuple[dict, dict][source]¶

feature_loader(sample_location: tensorflow.python.framework.ops.Tensor) → dict[source]¶

Load the features from an HDF5 sample file.

This loader only loads the features, instead of the features and labels as done by features_and_labels_loader.

Parameters: sample_location (tf.Tensor) – Location of the sample file
Returns: dict – Features loaded from the sample file

features_and_labels_loader(sample_location: tensorflow.python.framework.ops.Tensor) → Tuple[dict, dict, tensorflow.python.framework.ops.Tensor][source]¶

Load the features and labels from a hdf5 file to be used in a TensorFlow dataset pipeline.

This loader loads the features and labels from an HDF5 file using TensorFlowIO. The outputs are therefore directly cast to tensors and can be used in a TensorFlow graph. All features and labels from the file are loaded, and a dict is returned mapping the name of each feature and label to its respective value.

Parameters

sample_location (tf.Tensor) – Location of the sample file

Returns

Tuple[dict, dict] – The features (first output) and labels (second output) loaded from the sample.

fits_in_memory(used_memory: int = 0)[source]¶

get_all_dataset_attributes(sample_file: str = None) → dict[source]¶

Get the attributes of the features and labels stored in the file.

Returns: dict – Mapping of the feature/label name to its attributes

get_dataset_attribute(dataset_name: str, attribute_name: str) → Any[source]¶

Get the attribute of a specific dataset

Parameters

dataset_name (str) – Name of dataset for which to get the attribute
attribute_name (str) – Name of attribute to get

Returns

Any – The value of the attribute

get_dataset_names() → list[source]¶

Get the names of all datasets in the sample.

Returns: list – Dataset names in the sample

get_feature_attribute(attribute_name: str) → dict[source]¶

Get a specific attribute for all features.

Parameters: attribute_name (str) – Name of attribute to get
Returns: dict – Mapping between feature names and the attribute value

get_feature_dimensionality() → dict[source]¶

Get the dimensionality of each feature.

Returns: dict – Dimensionality of each feature

get_feature_metadata() → dict[source]¶

Get all metadata of all features.

Returns: dict – The metadata of all features

get_feature_metadata_from_sample(sample_location: str) → dict[source]¶

Get the feature metadata of a specific sample.

Parameters: sample_location (str) – The file location of the sample
Returns: dict – The feature metadata of the sample

get_feature_shape() → dict[source]¶

Get the shape of each feature.

Returns: dict – Shape of each feature

get_feature_size() → dict[source]¶

Get the size of each feature.

The size of a feature does not take the number of channels into account; it only represents the size of an individual channel of the feature.

Returns: dict – Size of each feature

get_label_attribute(attribute_name: str) → dict[source]¶

Get a specific attribute for all labels.

Parameters: attribute_name (str) – Name of attribute to get
Returns: dict – Mapping between label names and the attribute value

get_labels_are_one_hot() → dict[source]¶

Get whether labels are one-hot encoded.

Returns: dict – One-hot encoding status of each label

get_number_of_channels() → dict[source]¶

Get the number of feature channels.

Returns: dict – Number of channels for each feature

get_number_of_classes() → dict[source]¶

Get the number of output classes.

Returns: dict – Number of output classes for each label

get_numpy_iterator() → numpy.nditer[source]¶

Construct a numpy iterator instead of TensorFlow dataset.

The numpy iterator will provide exactly the same data as the TensorFlow dataset. However, it might be easier to inspect the data when using a numpy iterator instead of a TensorFlow dataset

Returns: np.nditer – The dataset

get_spec() → dict[source]¶

Get the TensorSpec for all input features.

Returns: dict – Maps the name of each input feature to the TensorSpec of the input.

get_tf_dataset(num_parallel_calls: int = - 1) → tensorflow.python.data.ops.dataset_ops.DatasetV2[source]¶

Construct a TensorFlow dataset.

The dataset is constructed based on the settings supplied to the DataGenerator. The dataset can then directly be used to train or evaluate a TensorFlow model

Parameters: num_parallel_calls (int) – Number of parallel processes to use. Defaults to tf.data.experimental.AUTOTUNE.
Returns: tf.data.Dataset – The constructed dataset

label_loader(sample_location: tensorflow.python.framework.ops.Tensor) → dict[source]¶

Load the labels from a hdf5 sample file.

This loader only loads the labels, instead of the features and labels as done by features_and_labels_loader

Parameters: sample_location (tf.Tensor) – Location of the sample file
Returns: dict – Labels loaded from the sample file

load_features(loaded_hdf5: tensorflow_io.core.python.ops.io_tensor.IOTensor) → dict[source]¶

Load the features from a HDF5 tensor.

Parameters: loaded_hdf5 (tfio.IOTensor) – Tensor from which to load features
Returns: dict – Mapping between feature names and features

load_labels(loaded_hdf5: tensorflow_io.core.python.ops.io_tensor.IOTensor) → dict[source]¶

Load the labels from a HDF5 tensor.

Parameters: loaded_hdf5 (tfio.IOTensor) – Tensor from which to load labels
Returns: dict – Mapping between label names and labels

setup_augmentation(augmentation_factor: int = 1, augmentation_settings: dict = {}) → None[source]¶

Set up data augmentation in the generator.

Parameters

augmentation_factor (int) – Repeat dataset this many times in augmentation. Defaults to 1.
augmentation_settings (dict) – Settings to pass to the augmentation instance. Defaults to {}.

setup_caching(cache_in_memory: Union[bool, str] = 'AUTO', used_memory: int = 0) → None[source]¶

Set up caching of the dataset in RAM.

Parameters

cache_in_memory (Union[bool, str]) – Whether dataset should be cached in memory. Defaults to PrognosAIs.Constants.AUTO, in which case the dataset will be cached in memory if it fits, otherwise it will not be cached
used_memory (int) – Amount of RAM (in bytes) that is already being used. Defaults to 0.

Raises

ValueError – If an unknown cache setting is requested
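The documented behaviour of cache_in_memory can be summarised as a small decision function. A minimal sketch, assuming the setting is True, False or the string 'AUTO', and that the dataset size and the available RAM are known in bytes; resolve_cache_setting and its arguments are hypothetical.

```python
def resolve_cache_setting(cache_in_memory, dataset_size, available_ram):
    """Decide whether to cache, following the rules described above (illustrative only)."""
    if isinstance(cache_in_memory, bool):
        return cache_in_memory
    if cache_in_memory == "AUTO":
        return dataset_size <= available_ram  # cache only if the dataset fits
    raise ValueError(f"Unknown cache setting: {cache_in_memory!r}")


fits = resolve_cache_setting("AUTO", dataset_size=2_000, available_ram=8_000)
too_big = resolve_cache_setting("AUTO", dataset_size=9_000, available_ram=8_000)
```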

setup_caching_shuffling_steps(dataset: tensorflow.python.data.ops.dataset_ops.DatasetV2) → tensorflow.python.data.ops.dataset_ops.DatasetV2[source]¶

Set up caching, shuffling and the iteration steps in the dataset pipeline.

This function helps to ensure that caching, shuffling and step limiting is done properly and efficiently, no matter where in the dataset pipeline it is included.

Parameters: dataset (tf.data.Dataset) – Dataset for which to include the steps
Returns: tf.data.Dataset – Dataset with caching, shuffling and iteration steps included

setup_sharding(n_workers: int, worker_index: int) → None[source]¶

Shard the dataset according to the number of workers and worker index

Parameters

n_workers (int) – number of workers
worker_index (int) – worker index
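A minimal sketch of sharding, assuming the same element assignment as tf.data.Dataset.shard, where a worker keeps every n_workers-th element starting at its own index. Whether PrognosAIs distributes samples exactly this way is an assumption, and shard is a hypothetical helper.

```python
def shard(sample_files, n_workers, worker_index):
    """Keep the elements whose position modulo n_workers equals worker_index."""
    return sample_files[worker_index::n_workers]


files = [f"sample_{i}.hdf5" for i in range(7)]
shards = [shard(files, n_workers=3, worker_index=i) for i in range(3)]
```

Each file ends up in exactly one shard, so the workers together still cover the full dataset.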

PrognosAIs.IO.LabelParser module¶

class PrognosAIs.IO.LabelParser.LabelLoader(label_file: str, filter_missing: bool = False, missing_value: int = - 1, make_one_hot: bool = False, new_root_path: str = None)[source]¶

Bases: object

__init__(label_file: str, filter_missing: bool = False, missing_value: int = - 1, make_one_hot: bool = False, new_root_path: str = None) → None[source]¶

Create a label loader, that can load the image paths and labels from a text file to be used for a data generator

Parameters

label_file – The label file from which to read the labels
filter_missing – Whether missing values should be masked when generating one hot labels and class weights
missing_value – If filter_missing is True, this value is used to mask
make_one_hot – Whether labels should be transformed to one hot labels
new_root_path – If you want to move the files, this will be the new root path

encode_labels_one_hot() → None[source]¶

Encode sample labels as one hot

Parameters: None
Returns: None
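For reference, one-hot encoding maps each class label to a vector with a single 1 at the position of its class. A minimal sketch, assuming the classes are ordered by sorted label value and ignoring the handling of missing values; one_hot is a hypothetical helper.

```python
def one_hot(labels):
    """One-hot encode a list of class labels, with classes in sorted order."""
    classes = sorted(set(labels))
    position_of = {label: position for position, label in enumerate(classes)}
    encoded = []
    for label in labels:
        vector = [0] * len(classes)
        vector[position_of[label]] = 1
        encoded.append(vector)
    return encoded


encoded = one_hot([0, 2, 1, 2])
# [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```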

get_class_weights(json_serializable=False) → dict[source]¶

Get class weights for unbalanced labels

Parameters

None

Returns

Scaled_weights – The weights for each class of each label category, scaled such that the total of weight * number of samples of each class approximates the total number of samples.
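One common weighting that satisfies this scaling is the "balanced" scheme, weight = N / (number of classes * class count), for which the weighted class counts sum to N. The sketch below assumes this scheme for a single label category, which the description above does not confirm; balanced_class_weights is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {label: n_samples / (len(counts) * count) for label, count in counts.items()}


labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)
# Class 0 (3 samples) gets weight 2/3, class 1 (1 sample) gets weight 2.0.
weighted_total = sum(weights[label] * count for label, count in Counter(labels).items())
```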

get_data() → dict[source]¶

Get all data from the label file

Parameters: None
Returns: data – Dictionary mapping each sample to each label

get_label_categories() → list[source]¶

Get categories of labels

Parameters: None
Returns: label_categories – Category names

get_label_category_type(category_name: str) → type[source]¶

Get the type of a label of a specific category/class

Parameters: category_name – Name of the category/class to get type of
Returns: type – Type of the labels of the category

get_label_from_sample(sample: str) → dict[source]¶

Get label from a sample

Parameters: sample – The sample from which to get the label
Returns: label – Label of the sample

get_labels() → list[source]¶

Get all labels of all samples

Parameters: None
Returns: labels – List of labels

get_labels_from_category(category_name: str) → list[source]¶

Get labels of a specific category/class

Parameters: category_name – Name of the category/class to get
Returns: list – Labels of the category

get_number_of_classes() → dict[source]¶

Get number of classes for all categories

Parameters: None
Returns: number_of_classes – The number of classes for each category

get_number_of_classes_from_category(category_name: str) → int[source]¶

Get number of classes for a label category

Parameters: category_name – Category to get number of classes for
Returns: number_of_classes – The number of classes for the category

get_number_of_samples() → int[source]¶

Get number of samples

Parameters: None
Returns: number_of_samples – The number of samples

get_original_label_category_type(category_name: str) → type[source]¶

Get the original type of a label of a specific category/class

Parameters: category_name – Name of the category/class to get type of
Returns: type – Type of the labels of the category

get_original_labels_from_category(category_name: str) → list[source]¶

Get original labels of a specific category/class

Parameters: category_name – Name of the category/class to get
Returns: list – Original labels of the category

get_samples() → list[source]¶

Get all samples

Parameters: None
Returns: samples – List of samples

replace_root_path() → None[source]¶

Replace the root path of the sample files in case they have been moved to a different directory.

Parameters: new_root_path – Path in which the files are now located
Returns: None

PrognosAIs.IO.utils module¶

PrognosAIs.IO.utils.copy_directory(original_directory, out_directory)[source]¶

PrognosAIs.IO.utils.create_directory(file_path, exist_ok=True)[source]¶

PrognosAIs.IO.utils.delete_directory(file_path)[source]¶

PrognosAIs.IO.utils.find_files_with_extension(file_path, file_extension)[source]¶

PrognosAIs.IO.utils.get_available_ram(used_memory: int = 0) → int[source]¶

Get the available RAM in bytes.

Returns: int – Available RAM in bytes

PrognosAIs.IO.utils.get_cpu_devices() → list[source]¶

PrognosAIs.IO.utils.get_dir_size(root_dir)[source]¶

Returns total size of all files in dir (and subdirs)
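A recursive directory size of this kind can be computed with the standard library. The following is a minimal sketch using os.walk and is not necessarily how get_dir_size is implemented; dir_size is a hypothetical stand-in.

```python
import os
import tempfile


def dir_size(root_dir):
    """Sum the sizes (in bytes) of all files below root_dir, including subdirectories."""
    total = 0
    for current_dir, _subdirs, file_names in os.walk(root_dir):
        for file_name in file_names:
            total += os.path.getsize(os.path.join(current_dir, file_name))
    return total


# Build a small temporary tree: 10 bytes at the top level, 5 bytes in a subdirectory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    with open(os.path.join(root, "a.bin"), "wb") as handle:
        handle.write(b"\0" * 10)
    with open(os.path.join(root, "nested", "b.bin"), "wb") as handle:
        handle.write(b"\0" * 5)
    size = dir_size(root)
```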

PrognosAIs.IO.utils.get_file_name(file_path, file_extension)[source]¶

PrognosAIs.IO.utils.get_file_name_from_full_path(file_path)[source]¶

PrognosAIs.IO.utils.get_file_path(file_path)[source]¶

PrognosAIs.IO.utils.get_gpu_compute_capability(gpu: tensorflow.python.eager.context.PhysicalDevice) → tuple[source]¶

PrognosAIs.IO.utils.get_gpu_devices() → list[source]¶

PrognosAIs.IO.utils.get_number_of_cpus()[source]¶

PrognosAIs.IO.utils.get_number_of_gpu_devices() → int[source]¶

PrognosAIs.IO.utils.get_number_of_slurm_nodes() → int[source]¶

PrognosAIs.IO.utils.get_parent_directory(file_path)[source]¶

PrognosAIs.IO.utils.get_root_name(file_path)[source]¶

PrognosAIs.IO.utils.get_subdirectories(root_dir: str) → list[source]¶

PrognosAIs.IO.utils.gpu_supports_float16(gpu: tensorflow.python.eager.context.PhysicalDevice) → bool[source]¶

PrognosAIs.IO.utils.gpu_supports_mixed_precision(gpu: tensorflow.python.eager.context.PhysicalDevice) → bool[source]¶

PrognosAIs.IO.utils.load_module_from_file(module_path)[source]¶

PrognosAIs.IO.utils.normalize_path(path)[source]¶

PrognosAIs.IO.utils.setup_logger()[source]¶


Submodules¶

PrognosAIs.IO.ConfigLoader module¶

PrognosAIs.IO.Configs module¶

PrognosAIs.IO.DataGenerator module¶

PrognosAIs.IO.LabelParser module¶

PrognosAIs.IO.utils module¶

Module contents¶