solaris.data API reference
solaris.data.coco COCO label format management
solaris.data.coco.coco_categories_dict_from_df(df, category_id_col, category_name_col, supercategory_col=None)
Extract category IDs, category names, and supercategory names from df.
- Parameters
df (pandas.DataFrame) – A pandas.DataFrame of records to filter for category info.
category_id_col (str) – The name for the column in df that contains category IDs.
category_name_col (str) – The name for the column in df that contains category names.
supercategory_col (str, optional) – The name for the column in df that contains supercategory names, if one exists. If not provided, supercategory will be left out of the output.
- Returns
A list of dicts that contain category records per the COCO dataset specification.
- Return type
list
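As a rough illustration of the record shape described above, the following sketch builds category records from plain Python row dicts rather than a DataFrame. It is not solaris's implementation: the helper name `categories_from_rows`, the column names, and the sample rows are all hypothetical, and the `id`, `name`, and `supercategory` keys follow the COCO category format.

```python
def categories_from_rows(rows, category_id_col, category_name_col,
                         supercategory_col=None):
    """Collect one COCO-style category record per unique category ID.

    Hypothetical helper for illustration only; not part of solaris.
    """
    seen = {}
    for row in rows:
        cat_id = row[category_id_col]
        if cat_id in seen:
            continue  # keep the first record seen for each category ID
        record = {"id": cat_id, "name": row[category_name_col]}
        if supercategory_col is not None:
            record["supercategory"] = row[supercategory_col]
        seen[cat_id] = record
    # Sort by ID so the output order is deterministic.
    return [seen[key] for key in sorted(seen)]


# Hypothetical sample rows standing in for DataFrame records.
rows = [
    {"cat_id": 2, "cat_name": "car", "super": "vehicle"},
    {"cat_id": 1, "cat_name": "building", "super": "structure"},
    {"cat_id": 2, "cat_name": "car", "super": "vehicle"},
]
categories = categories_from_rows(rows, "cat_id", "cat_name", "super")
categories_no_super = categories_from_rows(rows, "cat_id", "cat_name")
```

Leaving out the supercategory column yields records with only `id` and `name`, which mirrors the documented behaviour of omitting supercategory from the output.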
solaris.data.coco.df_to_coco_annos(df, output_path=None, geom_col='geometry', image_id_col=None, category_col=None, score_col=None, preset_categories=None, supercategory_col=None, include_other=True, starting_id=1, verbose=0)
Extract COCO-formatted annotations from a pandas DataFrame.
This function assumes that annotations are already in pixel coordinates. If this is not the case, you can transform them using solaris.vector.polygon.geojson_to_px_gdf().
Note that this function generates annotations formatted per the COCO object detection specification. For additional information, see the COCO dataset specification.
- Parameters
df (pandas.DataFrame) – A pandas.DataFrame containing geometries to store as annos.
image_id_col (str, optional) – The column containing image IDs. If not provided, it’s assumed that all are in the same image, which will be assigned the ID of 1.
geom_col (str, optional) – The name of the column in df that contains geometries. The geometries should be either shapely.geometry.Polygons or WKT strings. Defaults to "geometry".
category_col (str, optional) – The name of the column that specifies categories for each object. If not provided, all objects will be placed in a single category named "other".
score_col (str, optional) – The name of the column that specifies the output confidence of a model. If not provided, will not be output.
preset_categories (list of dicts, optional) – A pre-set list of categories to use for labels. These categories should be formatted per the COCO category specification.
starting_id (int, optional) – The number to start numbering annotation IDs at. Defaults to 1.
verbose (int, optional) – Verbose text output. By default, none is provided; if True or 1, information-level outputs are provided; if 2, extremely verbose text is output.
- Returns
output_dict – A dictionary containing COCO-formatted annotation and category entries per the COCO dataset specification.
- Return type
dict
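To make the pixel-coordinate assumption concrete, here is a minimal sketch of how one polygon could be turned into an annotation record in the COCO object detection format. It is an illustration of the format, not solaris's code: the helper name `polygon_to_coco_annotation` and the sample square are hypothetical, and the real function may emit additional or different fields.

```python
def polygon_to_coco_annotation(coords, annotation_id, image_id, category_id):
    """Build one COCO-style annotation from (x, y) pixel-coordinate vertices.

    Hypothetical helper for illustration only; not part of solaris.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Shoelace formula for the area of a simple (non-self-intersecting) polygon.
    twice_area = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon ring as a flat [x0, y0, x1, y1, ...] list.
        "segmentation": [[value for xy in coords for value in xy]],
        # COCO bounding boxes are [x_min, y_min, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


# A hypothetical 20 x 30 pixel rectangle.
square = [(10, 20), (30, 20), (30, 50), (10, 50)]
anno = polygon_to_coco_annotation(square, annotation_id=1, image_id=1,
                                  category_id=1)
```

Because the bounding box and area are computed directly from the vertex values, geometries still in geographic coordinates would produce meaningless results, which is why the conversion to pixel coordinates has to happen first.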
solaris.data.coco.geojson2coco(image_src, label_src, output_path=None, image_ext='.tif', matching_re=None, category_attribute=None, score_attribute=None, preset_categories=None, include_other=True, info_dict=None, license_dict=None, recursive=False, override_crs=False, explode_all_multipolygons=False, remove_all_multipolygons=False, verbose=0)
Generate COCO-formatted labels from one or multiple geojsons and images.
This function ingests optionally georegistered polygon labels in geojson format alongside image(s) and generates .json files per the COCO dataset specification. Some models, like many Mask R-CNN implementations, require labels to be in this format. The function assumes you’re providing image file(s) and geojson file(s) to create the dataset. If the number of images and geojsons are both > 1 (e.g. with a SpaceNet dataset), you must provide a regex pattern to extract matching substrings to match images to label files.
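The idea of matching images to label files through a shared substring can be sketched as follows. This is only an illustration of the concept under stated assumptions: the helper `match_images_to_labels`, the file names, and the pattern are hypothetical, and solaris may apply the pattern differently internally.

```python
import re


def match_images_to_labels(image_paths, label_paths, matching_re):
    """Pair image and label paths whose first regex match is identical.

    Hypothetical helper for illustration only; not part of solaris.
    """
    pattern = re.compile(matching_re)
    labels_by_key = {}
    for path in label_paths:
        match = pattern.search(path)
        if match:
            labels_by_key[match.group(0)] = path
    pairs = {}
    for path in image_paths:
        match = pattern.search(path)
        if match and match.group(0) in labels_by_key:
            pairs[path] = labels_by_key[match.group(0)]
    return pairs


# Hypothetical file names in a SpaceNet-like naming style.
images = [
    "imgs/RGB_AOI_2_Vegas_img101.tif",
    "imgs/RGB_AOI_2_Vegas_img102.tif",
]
labels = [
    "labels/buildings_AOI_2_Vegas_img102.geojson",
    "labels/buildings_AOI_2_Vegas_img101.geojson",
]
pairs = match_images_to_labels(images, labels, r"img\d+")
```

A pattern that is too loose can match the same substring in several files, so it is worth choosing one that isolates a unique tile or image identifier.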
- Parameters
image_src (str or list or dict) – Source image(s) to use in the dataset. This can be:
1. a string path to an image,
2. the path to a directory containing a bunch of images,
3. a list of image paths,
4. a dictionary corresponding to COCO-formatted image records, or
5. a string path to a COCO JSON containing image records.
If a directory, the recursive flag will be used to determine whether or not to descend into sub-directories.
label_src (str or list) – Source labels to use in the dataset. This can be a string path to a geojson, the path to a directory containing multiple geojsons, or a list of geojson file paths. If a directory, the recursive flag will determine whether or not to descend into sub-directories.
output_path (str, optional) – The path to save the JSON-formatted COCO records to. If not provided, the records will only be returned as a dict, and not saved to file.
image_ext (str, optional) – The string to use to identify images when searching directories. Only has an effect if image_src is a directory path. Defaults to ".tif".
matching_re (str, optional) – A regular expression pattern to match filenames between image_src and label_src if both are directories of multiple files. This has no effect if those arguments do not both correspond to directories or lists of files. Will raise a ValueError if multiple files are provided for both image_src and label_src but no matching_re is provided.
category_attribute (str, optional) – The name of an attribute in the geojson that specifies which category a given instance corresponds to. If not provided, it’s assumed that only one class of object is present in the dataset, which will be termed "other" in the output json.
score_attribute (str, optional) – The name of an attribute in the geojson that specifies the prediction confidence of a model.
preset_categories (list of dicts, optional) – A pre-set list of categories to use for labels. These categories should be formatted per the COCO category specification. Example: [{'id': 1, 'name': 'Fighter Jet', 'supercategory': 'plane'}, {'id': 2, 'name': 'Military Bomber', 'supercategory': 'plane'}, …]
include_other (bool, optional) – If set to True, and preset_categories is provided, objects that don’t fall into the specified categories will not be removed from the dataset. They will instead be passed into a category named "other" with its own associated category id. If False, objects whose categories don’t match a category from preset_categories will be dropped.
info_dict (dict, optional) –
A dictionary with the following key-value pairs:
- "year": int, year of creation
- "version": str, version of the dataset
- "description": str, description of the dataset
- "contributor": str, who contributed the dataset
- "url": str, URL where the dataset can be found
- "date_created": datetime.datetime, when the dataset was created
license_dict (dict, optional) –
A dictionary containing the licensing information for the dataset, with the following key-value pairs:
- "name": str, the name of the license.
- "url": str, a link to the dataset's license.
Note: This implementation assumes that all of the data uses one license. If multiple licenses are provided, the image records will not be assigned a license ID.
recursive (bool, optional) – If image_src and/or label_src are directories, setting this flag to True will induce solaris to descend into subdirectories to find files. By default, solaris does not traverse the directory tree.
explode_all_multipolygons (bool, optional) – Explode the multipolygons into individual geometries using sol.utils.geo.split_multi_geometries. Be sure to inspect which geometries are multigeometries; the individual geometries within them may represent artifacts rather than true labels.
remove_all_multipolygons (bool, optional) – Filters MultiPolygons and GeometryCollections out of each tile geodataframe. Alternatively you can edit each polygon manually to be a polygon before converting to COCO format.
verbose (int, optional) – Verbose text output. By default, none is provided; if True or 1, information-level outputs are provided; if 2, extremely verbose text is output.
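The info_dict and license_dict arguments described above could be assembled along these lines. Every value is a placeholder, and the `missing_keys` helper is a hypothetical convenience for checking the documented keys before a call; it is not part of solaris.

```python
import datetime

# Keys documented for info_dict and license_dict.
INFO_KEYS = ("year", "version", "description", "contributor", "url",
             "date_created")
LICENSE_KEYS = ("name", "url")

# Placeholder metadata; replace every value with real dataset details.
info_dict = {
    "year": 2020,
    "version": "1.0",
    "description": "Example building footprint dataset",
    "contributor": "Example Contributor",
    "url": "https://example.com/dataset",
    "date_created": datetime.datetime(2020, 1, 1),
}
license_dict = {
    "name": "Example License",
    "url": "https://example.com/license",
}


def missing_keys(metadata, expected):
    """Return the expected keys that are absent from metadata, sorted."""
    return sorted(set(expected) - set(metadata))


info_missing = missing_keys(info_dict, INFO_KEYS)
license_missing = missing_keys(license_dict, LICENSE_KEYS)
```

Both dictionaries would then be passed through the `info_dict` and `license_dict` keyword arguments of geojson2coco.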
- Returns
coco_dataset – A dictionary following the COCO dataset specification. Depending on arguments provided, it may or may not include license and info metadata.
- Return type
dict
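The interaction between preset_categories and include_other can be sketched with plain Python. This is an assumption-laden illustration of the documented behaviour, not solaris's implementation: the helper `assign_category_ids` is hypothetical, and giving "other" the next free ID is a choice made for this sketch only.

```python
def assign_category_ids(labels, preset_categories, include_other=True):
    """Map label names to category IDs, optionally routing unknowns to "other".

    Hypothetical helper for illustration only; not part of solaris.
    """
    name_to_id = {cat["name"]: cat["id"] for cat in preset_categories}
    categories = list(preset_categories)
    other_id = None
    assigned = []
    for label in labels:
        if label in name_to_id:
            assigned.append((label, name_to_id[label]))
        elif include_other:
            if other_id is None:
                # Assumption: "other" takes the next unused category ID.
                other_id = max(name_to_id.values(), default=0) + 1
                categories.append({"id": other_id, "name": "other"})
            assigned.append((label, other_id))
        # With include_other=False, unmatched labels are dropped.
    return assigned, categories


presets = [
    {"id": 1, "name": "Fighter Jet", "supercategory": "plane"},
    {"id": 2, "name": "Military Bomber", "supercategory": "plane"},
]
labels = ["Fighter Jet", "Helicopter", "Military Bomber"]
kept, cats_with_other = assign_category_ids(labels, presets, include_other=True)
strict, cats_strict = assign_category_ids(labels, presets, include_other=False)
```

With include_other enabled all three objects survive and a third category appears; with it disabled the unmatched "Helicopter" object is dropped and the category list is unchanged.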
solaris.data.coco.make_coco_image_dict(image_ref, license_id=None)
Take a dict of image_fname: image_id pairs and make a coco dict.
Note that this creates a relatively limited version of the standard COCO image record format, which only contains the following keys:
* id (int)
* width (int)
* height (int)
* file_name (str)
* license (int), optional
- Parameters
- Returns
coco_images – A list of COCO-formatted image records ready for export to json.
- Return type
list
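The limited image record described above can be sketched as follows. This is not solaris's implementation: the helper `image_records` is hypothetical, and it takes image dimensions through an explicit `image_sizes` argument (also hypothetical) so that the example runs without any image files on disk.

```python
def image_records(image_ref, image_sizes, license_id=None):
    """Build COCO-style image records from image_fname: image_id pairs.

    Hypothetical helper for illustration only; image_sizes maps each file
    name to a (width, height) tuple and is an assumption of this sketch.
    """
    records = []
    for fname, image_id in image_ref.items():
        width, height = image_sizes[fname]
        record = {
            "id": image_id,
            "width": width,
            "height": height,
            "file_name": fname,
        }
        if license_id is not None:
            record["license"] = license_id  # optional key
        records.append(record)
    return records


# Hypothetical file names and dimensions.
image_ref = {"tile_1.tif": 1, "tile_2.tif": 2}
image_sizes = {"tile_1.tif": (900, 900), "tile_2.tif": (650, 650)}
coco_images = image_records(image_ref, image_sizes, license_id=1)
coco_images_unlicensed = image_records(image_ref, image_sizes)
```

When no license ID is supplied, the `license` key is simply left out of each record, matching its optional status in the key list.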