Using the solaris CLI to make training masks

Geospatial data labels are rarely in the form of pixel masks; however, such masks are essential for training neural networks for segmentation tasks. We’ve provided functions and a CLI to standardize training mask creation so that users can convert their geospatial-format vector labels into ML-compatible training masks. If you’d prefer the Python API implementation, there’s a tutorial here.

There are two ways to create these masks with the CLI: one file at a time, or in batch based on a reference file. We’ll start with the simpler single-file case, then describe batch processing. If you have any questions about what the different types of masks look like, check out the Python API tutorial for mask creation.

Single mask creation with the CLI

Once you have installed solaris, you will have access to the make_masks command in your command line prompt. This command has a number of possible arguments to control mask creation, described below. If you need a refresher on these within your command line, you can always run make_masks -h for usage instructions.

make_masks arguments

  • --source_file, -s: [str] The full path to a vector file to create a mask from.

  • --reference_image, -r: [str] The full path to a georegistered image in the same coordinate system (for conversion to pixels) or in the target coordinate system (for conversion to a geographic coordinate reference system).

  • --output_path, -o: [str] The full path to the output file for the generated mask image.

  • --geometry_column, -g: [str] (default: 'geometry') The column containing footprint polygons to transform.

  • --transform, -t: Use this flag if the geometries are in a georeferenced coordinate system and need to be converted to pixel coordinates.

  • --value, -v: [int] (default: 255) The value to set for labeled pixels in the mask.

  • --footprint, -f: If this flag is set, the mask will include filled-in building footprints as a channel.

  • --edge, -e: If this flag is set, the mask will include the building edges as a channel.

  • --edge_width, -ew: [int] (default: 3) The pixel thickness of the edges in the edge mask. Only has an effect if --edge or -e is used.

  • --edge_type, -et: [str] (default: 'inner') Type of edge: either 'inner' or 'outer'. Only has an effect if --edge or -e is used.

  • --contact, -c: If this flag is set, the mask will include contact points between buildings as a channel.

  • --contact_spacing, -cs: [int] (default: 10) Sets the maximum distance between two buildings, in pixel units unless --metric_widths is provided, that will be identified as a contact. Only has an effect if --contact or -c is used.

  • --metric_widths, -m: Use this flag if widths should be in metric units instead of pixel units.

  • --batch, -b: Use this flag if you wish to operate on multiple files in batch. In this case, --argument_csv must be provided. See the batch processing section below for more details.

  • --argument_csv, -a: [str] The reference file for variable values for batch processing. It must contain columns for the required arguments (source_file, reference_image, and output_path, as described in the batch processing section), and can additionally contain columns providing other arguments if you wish to define them differently for items in the batch. These columns must have the same names as the corresponding arguments. Only has an effect if --batch or -b is used.

  • --workers, -w: [int] (default: 1) The number of parallel processing workers to use for batch processing. This should not exceed the number of CPU cores available. See the batch processing section below for more details.

make_masks CLI usage examples

Assume you have a GeoTIFF, image.tif, and georegistered building footprint labels, building_labels.geojson.

Creating building footprint masks:

$ make_masks --source_file building_labels.geojson --reference_image image.tif --footprint --transform

Let’s change the burn value to 1 for the footprints instead of 255:

$ make_masks --source_file building_labels.geojson --reference_image image.tif --footprint --transform --value 1

What if your building labels are already in pixel coordinates in a CSV named building_labels.csv, and the geometries are in a column named WKT_Pix?

$ make_masks --source_file building_labels.csv --reference_image image.tif --footprint --geometry_column WKT_Pix

What if you have the same CSV file as above, but instead of making just building footprints, you want outer borders of width 10 and also contact points for anything within 10 meters?

$ make_masks --source_file building_labels.csv --reference_image image.tif --geometry_column WKT_Pix --footprint --edge --edge_type outer --edge_width 10 --contact --contact_spacing 10 --metric_widths
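If you call the CLI from a script, it can be convenient to assemble the command programmatically. The sketch below uses a hypothetical helper, build_make_masks_command, which is not part of solaris. It relies only on argument names from the list above and assumes make_masks is on your PATH when you actually run the result.

```python
import shlex


def build_make_masks_command(source_file, reference_image, output_path=None,
                             flags=(), options=None):
    """Assemble a make_masks argument list (hypothetical helper, not part of solaris).

    flags:   boolean switches without leading dashes, e.g. ("footprint", "transform")
    options: mapping of value-taking arguments, e.g. {"value": 1}
    """
    cmd = ["make_masks",
           "--source_file", str(source_file),
           "--reference_image", str(reference_image)]
    if output_path is not None:
        cmd += ["--output_path", str(output_path)]
    for flag in flags:
        cmd.append(f"--{flag}")
    for name, value in (options or {}).items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Rebuild the burn-value example from above.
cmd = build_make_masks_command(
    "building_labels.geojson", "image.tif",
    flags=("footprint", "transform"),
    options={"value": 1},
)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems when paths contain spaces.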

Batch mask creation using the solaris CLI

There’s one additional requirement for batch mask creation: a CSV specifying the location of the label files, the reference images, and optionally any other arguments that you wish to modify on a mask-by-mask basis.

Creating the argument CSV

The reference CSV has three required columns, which must be named exactly as below:

  • source_file: The paths to vector-formatted label files that you wish to transform to masks.

  • reference_image: The paths to images that correspond to the same geographies as the vector labels that you’re using.

  • output_path: The paths to save the output masks to.

The source_file and reference_image values in each row must cover the same geography, or you’ll get empty masks! In every column, we recommend using absolute paths rather than relative paths for consistency and clarity.

If you wish to use different values for the other arguments to make_masks (e.g., if you wish to have different burn values for different masks), you can provide those values in the CSV as well. Just create a column with the same name as the argument that you’re replacing, and make sure to provide a value for every row.
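One way to produce such a file is with Python’s standard csv module. The following is a minimal sketch: the directory layout, tile IDs, and output file naming are hypothetical placeholders, and the optional value column illustrates a per-row override named after the --value argument. The file is written to a temporary directory here only so the sketch has no side effects; point csv_path wherever you like.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical layout; replace with your own absolute paths.
label_dir = Path("/data/labels")
image_dir = Path("/data/images")
mask_dir = Path("/data/masks")
tile_ids = ["tile_001", "tile_002", "tile_003"]

# Required columns first, then any optional per-row overrides.
fieldnames = ["source_file", "reference_image", "output_path", "value"]

rows = [
    {
        "source_file": str(label_dir / f"{tile_id}.geojson"),
        "reference_image": str(image_dir / f"{tile_id}.tif"),
        "output_path": str(mask_dir / f"{tile_id}_mask.tif"),
        "value": 255,  # optional override; every row needs a value
    }
    for tile_id in tile_ids
]

csv_path = Path(tempfile.mkdtemp()) / "mask_reference.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(csv_path.read_text())
```

Building each row from a single tile ID keeps the label, image, and output paths aligned, which guards against mismatched geographies within a row.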

make_masks CLI batch processing examples

Assume you have a CSV, mask_reference.csv, that specifies the paths to your .geojson labels, the matching reference images, and where you want the output masks saved, as described in the last section.
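Before launching a batch run, it can be worth checking that the CSV has the required columns and no empty cells. The sketch below is one possible check using only the standard library. The helper check_argument_csv is hypothetical and not part of solaris, and it does not verify that the files exist or that geographies match.

```python
import csv
import io

REQUIRED_COLUMNS = ("source_file", "reference_image", "output_path")


def check_argument_csv(handle):
    """Return a list of problems found in an argument CSV (hypothetical helper)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing required column: {col}"
                for col in REQUIRED_COLUMNS if col not in header]
    for row in reader:
        for col, cell in row.items():
            # col is None when a row has extra cells; cell is None when it has too few.
            if col is None or cell is None or not cell.strip():
                problems.append(
                    f"line {reader.line_num}: empty or malformed value in column {col!r}")
    return problems


# In-memory examples; for a real file, pass open("mask_reference.csv", newline="").
good = ("source_file,reference_image,output_path\n"
        "/data/a.geojson,/data/a.tif,/data/a_mask.tif\n")
bad = ("source_file,reference_image,value\n"
       "/data/a.geojson,/data/a.tif,\n")

print(check_argument_csv(io.StringIO(good)))
print(check_argument_csv(io.StringIO(bad)))
```

The first call reports no problems; the second flags the missing output_path column and the empty value cell.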

Let’s create footprint masks:

$ make_masks --batch --argument_csv mask_reference.csv --footprint

What if you have a lot of masks to make and you want to parallelize over four CPUs? (Make sure you have access to four CPU cores first!)

$ make_masks --batch --argument_csv mask_reference.csv --footprint --workers 4
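If you are unsure how many cores a machine has, you can check from Python before choosing a --workers value. This is a small sketch with a hypothetical helper, pick_worker_count. Note that os.cpu_count() reports the logical CPUs on the machine, which may be more than your process is allowed to use on a shared system.

```python
import os


def pick_worker_count(requested, available=None):
    """Clamp a requested worker count to the available CPU cores (hypothetical helper)."""
    if available is None:
        available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, available))


workers = pick_worker_count(4)
print("make_masks --batch --argument_csv mask_reference.csv "
      f"--footprint --workers {workers}")
```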