Creating reference CSVs for model training and inference
When you train models with solaris, it uses reference CSV files to find images and their matching labels. Let’s go through what those files are and what they should include. You’ll create up to three different reference files:
Training data: required for training
Epoch-wise validation data: optional
Inference data: required for inference
Training Data CSV
Your training data CSV must have two columns with the exact names below:
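image: defines the paths to each image file to be used during training, one path per row. You can use either the absolute path to the file or a path relative to the directory that you run code in; we recommend using the absolute path for consistency.
label: defines the path to the label (mask) file that goes with the image in the same row, again as either an absolute or a relative path.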
The image and label in each row must match! This is how solaris matches your training images to the expected outputs.
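As a minimal sketch of how such a file could be assembled, the snippet below pairs images and masks that share a filename stem and writes them out with the image and label column names. It uses only the Python standard library and is not part of solaris. The directory layout, the ".tif" extensions, and the helper names build_training_rows and write_reference_csv are assumptions made for illustration.

```python
import csv
from pathlib import Path


def build_training_rows(image_dir, label_dir, image_ext=".tif", label_ext=".tif"):
    """Pair each image with the label file that shares its filename stem.

    Assumes (hypothetically) that images and masks live in two separate
    directories and that a mask has the same stem as its image.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    rows = []
    for image_path in sorted(image_dir.glob(f"*{image_ext}")):
        label_path = label_dir / f"{image_path.stem}{label_ext}"
        if not label_path.is_file():
            # Skip images without a mask so every row is a matched pair.
            continue
        rows.append(
            {
                # Absolute paths, as recommended above.
                "image": str(image_path.resolve()),
                "label": str(label_path.resolve()),
            }
        )
    return rows


def write_reference_csv(rows, csv_path, fieldnames=("image", "label")):
    """Write rows to a CSV whose header uses the exact column names."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_reference_csv(build_training_rows("images", "masks"), "train.csv") would produce a two-column file with one matched pair per row, where "images", "masks", and "train.csv" are placeholder paths.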
If you choose to have solaris split validation data out for you, it will randomly select a fraction of the rows for validation. That fraction is defined in the config YAML file; for details on setting it, see the YAML config reference.
For more control over what data is used for training vs. validation, you can create a separate validation CSV.
Validation Data CSV
This CSV has the same format as the Training Data CSV, but specifies the images and masks to be used for epoch-wise validation. Make sure there’s no overlap between your training and validation sets: you don’t want any data leaks! If you want solaris to split the validation data out of the training data automatically, you don’t need to provide this file.
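If you prefer to create the two files yourself, one possible approach is sketched below: shuffle the rows with a fixed seed, cut them into two disjoint lists, and check that no image path appears in both. This is an illustration using the standard library, not the splitting logic solaris itself uses. The function names, the default val_frac of 0.2, and the seed are assumptions.

```python
import random


def split_reference_rows(rows, val_frac=0.2, seed=0):
    """Shuffle rows and split them into disjoint (train, validation) lists.

    val_frac and seed are hypothetical defaults chosen for this sketch.
    """
    if not 0 < val_frac < 1:
        raise ValueError("val_frac must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one row for validation.
    n_val = max(1, round(len(shuffled) * val_frac))
    return shuffled[n_val:], shuffled[:n_val]


def check_no_overlap(train_rows, val_rows):
    """Raise if any image path appears in both sets."""
    shared = {r["image"] for r in train_rows} & {r["image"] for r in val_rows}
    if shared:
        raise ValueError(f"{len(shared)} image(s) appear in both sets")
```

Each returned list can then be written to its own CSV with the image and label header. Note that this check only catches identical paths; it cannot tell whether two differently named files cover the same data.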
Inference Data CSV
This reference file points to the image files that you wish to make predictions on. It therefore only needs to contain one column: image.
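A single-column file like this can be generated by listing a directory. The sketch below is one way to do that with the standard library; the helper name write_inference_csv and the "*.tif" pattern are assumptions, not part of solaris.

```python
import csv
from pathlib import Path


def write_inference_csv(image_dir, csv_path, pattern="*.tif"):
    """Write a one-column CSV (header: image) listing every matching file."""
    image_paths = sorted(str(p.resolve()) for p in Path(image_dir).glob(pattern))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image"])  # the only column needed for inference
        writer.writerows([path] for path in image_paths)
    return image_paths
```

The function returns the list of paths it wrote, which makes it easy to confirm how many images will be predicted on.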