Preparing data for training
Preprocessing functions
The loading function is lread_images_and_create_full_data_obj by default. It takes image_paths, which is a quadruplet of left, right, disparity and occlusion_mask image names, and generates a Data object (simlearner3d.utils.utils.Data), which is simply a set of torch.Tensor representations of the image set.
Creating Train/Val/Test split dataset and split.csv
To create the usual folder of subsets arranged as Train, Val and Test, we use task.task_name="prepare_dataset", which takes four files:
- left: full-path image names
- right: full-path image names
- disparity: full-path image names
- occlusion masks: full-path image names
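The four list files can be produced with any tool. As a minimal sketch, assuming each file holds one full image path per line and that line i of every file refers to the same sample (neither point is confirmed here), the helper below writes them from a list of quadruplets. The names write_list_files and my_dataset are hypothetical and not part of simlearner3d.

```python
from pathlib import Path


def write_list_files(quadruplets, out_dir, prefix="my_dataset"):
    """Write one text file per image role, one absolute path per line.

    `quadruplets` is a list of (left, right, disparity, mask) paths.
    The one-path-per-line layout is an assumption, not a documented format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    roles = ("left", "right", "disp", "masq")
    written = {}
    for index, role in enumerate(roles):
        list_path = out_dir / f"{prefix}_{role}.txt"
        # Keep the same order in every file so that lines stay aligned.
        lines = [str(Path(quad[index]).absolute()) for quad in quadruplets]
        list_path.write_text("\n".join(lines) + "\n")
        written[role] = list_path
    return written
```

The returned dictionary maps each role to the file that was written, so the paths can be passed to prepare_dataset.left, prepare_dataset.right, prepare_dataset.disp and prepare_dataset.masq.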
Here is an example of a command that creates a Stereo folder with samples distributed as Train, Val or Test:
DATASET_DIR="/path/to/images/disparities/masks/sets/of/files/"
python simlearner3d/processing/dataset/toy_dataset.py +task.task_name="prepare_dataset" \
prepare_dataset.left="${DATASET_DIR}eurosdr_vahingen_left_train.txt" \
prepare_dataset.right="${DATASET_DIR}eurosdr_vahingen_right_train.txt" \
prepare_dataset.disp="${DATASET_DIR}eurosdr_vahingen_disp_train.txt" \
prepare_dataset.masq="${DATASET_DIR}eurosdr_vahingen_masq_train.txt" \
prepare_dataset.out_dir="${DATASET_DIR}Stereo"
The resulting Stereo folder contains the following sub folders:
- train/
- val/
- test/
It additionally contains a split.csv file that tells which quadruplet of images (left, right, disparity, mask) belongs to which subset (train, val, test).
Here is an example of split.csv content:
basename_l,basename_r,disparity,masq,split
DMTrain_SDR-05_20_0003_Im1.tif,DMTrain_SDR-05_20_0003_Im2.tif,DensifyPx_DMTrain_SDR-05_20_0003_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-05_20_0003_Im1.tif,train
DMTrain_SDR-20_30_0002_Im1.tif,DMTrain_SDR-20_30_0002_Im2.tif,DensifyPx_DMTrain_SDR-20_30_0002_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-20_30_0002_Im1.tif,train
DMTrain_SDR-06_20_0001_Im1.tif,DMTrain_SDR-06_20_0001_Im2.tif,DensifyPx_DMTrain_SDR-06_20_0001_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-06_20_0001_Im1.tif,train
DMTrain_SDR-05_20_0006_Im1.tif,DMTrain_SDR-05_20_0006_Im2.tif,DensifyPx_DMTrain_SDR-05_20_0006_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-05_20_0006_Im1.tif,train
DMTrain_SDR-05_21_0014_Im1.tif,DMTrain_SDR-05_21_0014_Im2.tif,DensifyPx_DMTrain_SDR-05_21_0014_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-05_21_0014_Im1.tif,train
DMTrain_SDR-05_19_0017_Im1.tif,DMTrain_SDR-05_19_0017_Im2.tif,DensifyPx_DMTrain_SDR-05_19_0017_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-05_19_0017_Im1.tif,train
DMTrain_SDR-04_29_0005_Im1.tif,DMTrain_SDR-04_29_0005_Im2.tif,DensifyPx_DMTrain_SDR-04_29_0005_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-04_29_0005_Im1.tif,train
DMTrain_SDR-05_21_0004_Im1.tif,DMTrain_SDR-05_21_0004_Im2.tif,DensifyPx_DMTrain_SDR-05_21_0004_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-05_21_0004_Im1.tif,train
DMTrain_SDR-07_18_0011_Im1.tif,DMTrain_SDR-07_18_0011_Im2.tif,DensifyPx_DMTrain_SDR-07_18_0011_Im1.tif,Nocc_refine_DensifyPx_DMTrain_SDR-07_18_0011_Im1.tif,train
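To inspect how samples are distributed, split.csv can be read with the Python standard library. The sketch below relies only on the column names shown in the example above; group_by_split is a hypothetical helper, not part of simlearner3d.

```python
import csv
from collections import defaultdict


def group_by_split(split_csv_path):
    """Return {split name: [(left, right, disparity, mask), ...]}."""
    groups = defaultdict(list)
    with open(split_csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            quadruplet = (
                row["basename_l"],
                row["basename_r"],
                row["disparity"],
                row["masq"],
            )
            groups[row["split"]].append(quadruplet)
    return dict(groups)
```

For instance, {name: len(items) for name, items in group_by_split("split.csv").items()} gives the number of quadruplets per subset.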
If you have an additional dataset with left, right, disparity and mask files, you can append it to the previous one by re-running the same command with the new files and keeping prepare_dataset.out_dir the same, so that the new samples are split and appended to the existing subsets.
Create an HDF5 Dataset
After generating the sub splits of a dataset (train/val/test) and the accompanying split.csv file, we can generate a single hdf5 file that encapsulates the whole dataset, thus reducing training times. To do that, run the following command line, which takes the following arguments:
- datamodule.data_dir: the directory where the train, val and test folders exist
- datamodule.split_csv_path: the full path of the split.csv split file
- datamodule.hdf5_file_path: the full path of the hdf5 file to generate
Here is an example:
DATASET_DIR="/path/to/images/disparities/masks/sets/of/files/"
python run.py task.task_name=create_hdf5 \
datamodule.data_dir="${DATASET_DIR}Stereo" \
datamodule.split_csv_path="${DATASET_DIR}Stereo/split.csv" \
datamodule.hdf5_file_path="${DATASET_DIR}Stereo/eurosdr.hdf5"
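Before launching the conversion, it can help to check that every file named in split.csv is actually present under its subset folder. The following is a minimal sketch under stated assumptions: the internal layout of train/, val/ and test/ is not described here, so the check searches each subset folder recursively by file name. find_missing_files is a hypothetical helper, not part of simlearner3d.

```python
import csv
from pathlib import Path

COLUMNS = ("basename_l", "basename_r", "disparity", "masq")


def find_missing_files(data_dir, split_csv_path):
    """Return (split, file name) pairs listed in split.csv but not found."""
    data_dir = Path(data_dir)
    names_per_split = {}  # cache: split name -> file names found on disk
    missing = []
    with open(split_csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            split = row["split"]
            if split not in names_per_split:
                split_dir = data_dir / split
                # Assumption: files may sit at any depth below the subset folder.
                names_per_split[split] = (
                    {p.name for p in split_dir.rglob("*") if p.is_file()}
                    if split_dir.is_dir()
                    else set()
                )
            for column in COLUMNS:
                if row[column] not in names_per_split[split]:
                    missing.append((split, row[column]))
    return missing
```

An empty list means that every entry of split.csv was found, for example when calling find_missing_files with the same values as datamodule.data_dir and datamodule.split_csv_path.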
Getting started quickly with a toy dataset
A quadruplet of left, right, disparity and occlusion_mask images is provided as part of the test suite. It can be turned into a small, training-ready dataset to get started with the package.
To create the toy dataset, run:
python simlearner3d/processing/dataset/toy_dataset.py +task.task_name="make_hdf5"
You will see a new file: /test/data/toy_dataset.hdf5.