simlearner3d.processing

Processing includes image normalization and standardization, as well as augmentation routines adapted for stereo matching.

simlearner3d.processing.datamodule.hdf5

class simlearner3d.processing.datamodule.hdf5.HDF5StereoDataModule(data_dir: str, split_csv_path: str, hdf5_file_path: str, tile_width: numbers.Number = 1024, tile_height: numbers.Number = 1024, patch_size: numbers.Number = 768, sign_disp_multiplier: numbers.Number = 1, masq_divider: numbers.Number = 1, subtile_overlap_train: numbers.Number = 0, subtile_overlap_predict: numbers.Number = 0, batch_size: int = 12, num_workers: int = 1, prefetch_factor: int = 2, transforms: Optional[Dict[str, List[Callable]]] = None, **kwargs)[source]

Datamodule to feed train and validation data to the model.
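A minimal construction sketch based on the signature above; the paths are hypothetical placeholders, and the remaining values shown are the documented defaults:

from simlearner3d.processing.datamodule.hdf5 import HDF5StereoDataModule

# Hypothetical paths, for illustration only.
datamodule = HDF5StereoDataModule(
    data_dir="data/raw",
    split_csv_path="data/split.csv",
    hdf5_file_path="data/dataset.hdf5",
    tile_width=1024,
    tile_height=1024,
    patch_size=768,
    batch_size=12,
    num_workers=1,
)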

property dataset: simlearner3d.processing.dataset.hdf5.HDF5Dataset

Abstraction to ease HDF5 dataset instantiation.

Parameters

image_paths_by_split_dict (IMAGE_PATHS_BY_SPLIT_DICT_TYPE, optional) – Maps each split (val/train/test) to file paths. If specified, the HDF5 file is created at dataset initialization time. Otherwise, a precomputed HDF5 file is used directly without I/O to the HDF5 file. This is useful for multi-GPU training, where data creation is performed in the prepare_data method, and the dataset is then loaded again on each GPU in the setup method. Defaults to None.

Returns

the dataset with train, val, and test data.

Return type

HDF5Dataset

prepare_data(stage: Optional[str] = None)[source]

Prepare dataset containing train, val, test data.

setup(stage: Optional[str] = None) → None[source]

Instantiate the (already prepared) dataset (called on all GPUs).
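A usage sketch of the standard LightningDataModule flow, assuming the datamodule instance from the construction example above:

# Prepare data once (single process), then set up on every process/GPU.
datamodule.prepare_data()
datamodule.setup(stage="fit")
train_loader = datamodule.train_dataloader()
batch = next(iter(train_loader))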

test_dataloader()[source]

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the Lightning documentation.

For data processing use the following pattern: download in prepare_data(), process and split in setup().

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

train_dataloader()[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern: download in prepare_data(), process and split in setup().

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

val_dataloader()[source]

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

simlearner3d.processing.dataset.hdf5

class simlearner3d.processing.dataset.hdf5.HDF5Dataset(hdf5_file_path: str, image_paths_by_split_dict: typing.Dict[typing.Union[typing.Literal['train'], typing.Literal['val'], typing.Literal['test']], typing.List[typing.Tuple[str]]], images_pre_transform: typing.Callable = <function read_images_and_create_full_data_obj>, tile_width: numbers.Number = 1024, tile_height: numbers.Number = 1024, patch_size: numbers.Number = 768, subtile_width: numbers.Number = 50, sign_disp_multiplier: numbers.Number = 1, masq_divider: numbers.Number = 1, subtile_overlap_train: numbers.Number = 0, train_transform: typing.Optional[typing.List[typing.Callable]] = None, eval_transform: typing.Optional[typing.List[typing.Callable]] = None)[source]

Single-file HDF5 dataset for collections of large image tiles.

property samples_hdf5_paths

Index all samples in the dataset, if not already done before.
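A construction sketch for reusing a precomputed HDF5 file; per the datamodule documentation above, passing image_paths_by_split_dict=None is assumed to skip HDF5 creation, and the file path is a hypothetical placeholder:

from simlearner3d.processing.dataset.hdf5 import HDF5Dataset

# Reuse an existing HDF5 file: no split dict, so no I/O to create it.
dataset = HDF5Dataset(
    hdf5_file_path="data/dataset.hdf5",
    image_paths_by_split_dict=None,
    tile_width=1024,
    tile_height=1024,
    patch_size=768,
)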

simlearner3d.processing.dataset.hdf5.create_hdf5(image_paths_by_split_dict: dict, hdf5_file_path: str, tile_width: numbers.Number = 1024, tile_height: numbers.Number = 1024, patch_size: numbers.Number = 768, subtile_width: numbers.Number = 50, subtile_overlap_train: numbers.Number = 0, images_pre_transform: typing.Callable = <function read_images_and_create_full_data_obj>)[source]

Create an HDF5 dataset file from left images, right images, disparities, and masks (masqs).

Parameters

  • image_paths_by_split_dict (IMAGE_PATHS_BY_SPLIT_DICT_TYPE): should look like

    image_paths_by_split_dict = {'train': [('dir/left.tif', 'dir/right.tif', 'dir/disp1.tif', 'dir/msq1.tif'), …], 'test': […]}

  • hdf5_file_path (str): path to the HDF5 dataset.

  • tile_width (Number, optional): width of an image tile. 1024 by default.

  • tile_height (Number, optional): height of an image tile. 1024 by default.

  • patch_size (Number, optional): considered subtile size for training. 768 by default.

  • subtile_width (Number, optional): effective width of a subtile (i.e. receptive field). 50 by default.

  • pre_filter: function to filter out specific subtiles. pre_filter_below_n_points by default.

  • subtile_overlap_train (Number, optional): overlap for data augmentation of the train set. 0 by default.

  • images_pre_transform (Callable): function to load images and ground truth and create one Data object.
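A call sketch following the dictionary layout documented above; the .tif paths are hypothetical, and each tuple is ordered (left, right, disparity, masq):

from simlearner3d.processing.dataset.hdf5 import create_hdf5

image_paths_by_split_dict = {
    "train": [("dir/left.tif", "dir/right.tif", "dir/disp1.tif", "dir/msq1.tif")],
    "val": [("dir/left2.tif", "dir/right2.tif", "dir/disp2.tif", "dir/msq2.tif")],
    "test": [],
}

create_hdf5(
    image_paths_by_split_dict=image_paths_by_split_dict,
    hdf5_file_path="data/dataset.hdf5",  # hypothetical output path
)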

simlearner3d.processing.dataset.toy_dataset

Generation of a toy dataset for testing purposes.

class simlearner3d.processing.dataset.toy_dataset.TASK_NAMES(value)[source]

An enumeration.

simlearner3d.processing.dataset.toy_dataset.make_toy_dataset_from_test_file()[source]

Prepare a toy dataset from a single, small LAS file.

The file is first duplicated to get 2 LAS files in each split (train/val/test), and then each file is split into .data files, resulting in a training-ready dataset located in td_prepared.

Parameters
  • src_las_path (str) – input, small LAS file to generate the toy dataset from

  • split_csv (str) – path to a CSV with a basename (e.g. '123_456.las') and a split

  • prepared_data_dir (str) – where to copy files (raw subfolder) and to prepare dataset files

Returns

path to directory containing prepared dataset.

Return type

str
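A usage sketch; per the signature above, the function takes no arguments and returns the prepared dataset directory:

from simlearner3d.processing.dataset.toy_dataset import make_toy_dataset_from_test_file

# Returns the path to the directory containing the prepared toy dataset.
prepared_data_dir = make_toy_dataset_from_test_file()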

simlearner3d.processing.dataset.utils

simlearner3d.processing.dataset.utils.find_file_in_dir(data_dir: str, basename: str) → str[source]

Query files matching a basename in data_dir and its subdirectories.

Parameters

data_dir (str) – data directory

basename (str) – basename of the file to search for

Returns

first file path matching the query.

Return type

str
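A usage sketch; the directory and basename are hypothetical placeholders:

from simlearner3d.processing.dataset.utils import find_file_in_dir

# Returns the first file path matching the query.
file_path = find_file_in_dir(data_dir="data/raw", basename="123_456.las")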

simlearner3d.processing.transforms.compose

class simlearner3d.processing.transforms.compose.CustomCompose(transforms: List[Callable])[source]

Composes several transforms together. Edited to bypass downstream transforms if None is returned by a transform.

Parameters

transforms (List[Callable]) – list of transforms to compose.
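A sketch of the None-bypass behaviour, assuming transforms are callables applied in order and that the composition is itself invoked as a callable on a data object:

from simlearner3d.processing.transforms.compose import CustomCompose

def drop_negative(data):
    # Hypothetical filter: returning None bypasses all downstream transforms.
    return None if data < 0 else data

def double(data):
    # Hypothetical downstream transform.
    return data * 2

transform = CustomCompose([drop_negative, double])
transform(2.0)   # -> 4.0
transform(-1.0)  # -> None; double is bypassed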

simlearner3d.processing.transforms.transforms

class simlearner3d.processing.transforms.transforms.StandardizeIntensity[source]

Standardize gray scale image values.

standardize_channel(channel_data: torch.Tensor, clamp_sigma: int = 3)[source]

Sample-wise standardization y* = (y - y_mean) / y_std, with clamping to ignore large values.
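A minimal sketch of sample-wise standardization with sigma clamping, as an illustration of the formula above rather than the library's exact implementation:

import torch

def standardize_channel_sketch(channel_data: torch.Tensor, clamp_sigma: int = 3) -> torch.Tensor:
    # y* = (y - y_mean) / y_std
    standardized = (channel_data - channel_data.mean()) / channel_data.std()
    # Clamp to [-clamp_sigma, clamp_sigma] so extreme values are ignored.
    return standardized.clamp(-clamp_sigma, clamp_sigma)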

class simlearner3d.processing.transforms.transforms.StandardizeIntensityCenterOnZero[source]

Standardize gray scale image values.

class simlearner3d.processing.transforms.transforms.ToTensor(input: numpy.ndarray)[source]

Turn np.ndarrays into Tensors.

simlearner3d.processing.transforms.augmentations