OpenLABEL Concept Paper

1. Introduction

1.1. Foreword

OpenLABEL is a standard for annotation formats that can be used for data streams and scenarios created during the development of automated driving features. For an automated vehicle to operate within its design domain, an understanding of its surroundings is crucial. The vehicle senses its environment using different sensors (e.g. cameras, lidar, radar). The sensor data must be interpreted to create a processable image of the world; based on this image, the vehicle can choose its action. Developing such features requires a ground truth, which is created by annotating the raw data. Within the data, the following can be annotated:

objects (traffic participants, static objects, etc.)
relations between objects (pedestrian and bicycle)
areas (e.g. free space, non-drivable areas)
…

Different methods can be used to label an object. The labeling method depends both on the data and on the use case. Labeling a vehicle in lidar data requires a three-dimensional method, while labeling the same vehicle in a video image requires only a two-dimensional method.

During the development of an automated vehicle it is very important to identify the scenarios that fit the use case and the design domain the vehicle is supposed to operate in. For that purpose it is useful to label scenarios according to their content and relevance to the project.

Developing an automated vehicle is a very complex task, and many parties are usually involved in such a project. This makes an exchangeable format very valuable.

The goal of the OpenLABEL concept project group is to create concepts for a common format that provides the industry with a standard for the annotation of sensor data and scenarios. The format will be machine-processable while remaining human-readable. The group also creates the basis for a user guide, which will explain to future users how to apply the labeling methods supported by OpenLABEL. The concept paper created in this project will serve as the basis for a follow-up standard development project.

1.2. Overview

This concept paper serves as a source for the future development of the OpenLABEL standard. The OpenLABEL Concept Project creates the basic concepts for a future annotation standard. To this end, the project has four working groups. Each working group created its own concept and aligned it within the project and with the other OpenX activities at ASAM.

The Concept Project has four work packages:

OpenLABEL Format
OpenLABEL User Guide for Labeling methods
OpenLABEL Taxonomy as Interface for OpenXOntology
OpenLABEL Scenario Labeling

1.3. Relation to other Standards

The OpenLABEL Concept has a close link to the OpenXOntology. This link will ensure that the OpenLABEL Taxonomy requirements are met in the upcoming OpenX Core Domain Model.

Relation to other Standards:

ASAM OpenDRIVE
ASAM OpenSCENARIO
ASAM OSI
ASAM OpenXOntology (standard still under development)
ASAM OpenODD
BSI PAS 1883

2. Annotation Format

2.1. Introduction to the annotation format

This section details the annotation format of OpenLABEL. The format is key to making OpenLABEL flexible enough to host different labeling use cases, ranging from simple object-level annotation in images (e.g. with bounding boxes) to complex multi-stream labeling of scene semantics (e.g. with actions, relations, contexts). The annotation format is understood as the materialization of labels in files or messages that can be stored or exchanged between machines. The format shall address a number of requirements:

Different scene elements (objects, actions, contexts, events)
Temporal description of elements (with frames and timestamps)
Hierarchical structures, with nested attributes
Semantic relations between elements (e.g. object performing an action)
Multiple source information (i.e. multi-sensor)
Preservation of identities of elements through time
Encoding mechanisms to represent different geometries (e.g. bounding boxes, cuboids, pixel-level segmentation, polygons, etc.)
Enable linkage to ontologies and knowledge repositories
Ability to update annotations in online processes (extensible)
Scalable and searchable (good traceability properties)

The annotation format shall also define which properties are mandatory and which are optional, the types of variables, and the serialization mechanisms used to create persistent content, such as files or messages.

The next sections detail all these aspects, with examples and definitions.

2.2. JSON schema

One modern approach to defining a data structure is to create a JSON schema, which is itself a JSON document that describes and constrains the structure and content of JSON files, and thereby also provides a data model.

A JSON schema can be used to validate the content of a JSON file, guaranteeing that the file follows the constraints and structure dictated by the schema. Schemas can also be used by programming languages to create object-oriented structures that facilitate manipulating, editing and accessing the information in a JSON file.

The annotation format of OpenLABEL is therefore proposed to be defined in a detailed JSON schema file; as a consequence, annotation files will be JSON files that follow the schema. The draft openlabel_schema_json-v1.0.0.json is appended to this document.

2.3. Structure of the OpenLABEL format

In OpenLABEL, a scene can be either a subset of the reality that needs to be described for further analysis, or a virtual situation that needs to be materialized. In the former case, reality is typically perceived by sensors, which get discrete measures of magnitudes from the scene at a certain frequency. In the latter, sensors can be ignored, and the scene described by its components and logical sequence.

Several concepts form the basis of the OpenLABEL format. As will be shown, these pieces constitute the foundations to create rich descriptions of scenes, either as an entire block (e.g. serialized as a file), or frame-by-frame (e.g. serialized as message strings).

Elements: objects, actions, events, contexts and relations that compose the scene, each of them with an integer unique identifier for the entire scene.
Frames: discrete containers of information of Elements and Streams for a specific time instant.
Streams: information about the discrete data sources (e.g. coming from sensors), to describe how reality is perceived at each stream (e.g. with intrinsics/extrinsics of cameras, timestamps from sensors, etc.).
Coordinate Systems: the spatial information that defines the labeled geometries refers to specific coordinate systems, which can themselves be defined and labeled within OpenLABEL. Transforms between coordinate systems determine how geometries can be projected from one reference to another (e.g. from one sensor to a static reference, or because of odometry entries). See the coordinate systems section.
Metadata: descriptive information about the format version, file version, annotator, name of the file, and any other administrative information about the annotation file.
Ontologies: pointers to knowledge repositories (URLs of ontologies) that are used in the annotation file. Elements labeled can point to concepts at these ontologies, so a consuming application can consult the element meaning or investigate additional properties.

The basic serialization of an OpenLABEL JSON string (prettified), with just administrative information and versioning is:

{
   "openlabel": {
       "metadata": {
           "annotator": "John Smith",
           "file_version": "0.1.0",
           "schema_version": "1.0.0",
           "comment": "Annotation file produced manually"
       }
   }
}
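
As a minimal sketch, the same structure can also be produced programmatically. The Python snippet below uses only the standard library; the helper name make_minimal_openlabel is hypothetical and not part of OpenLABEL.

```python
import json


def make_minimal_openlabel(annotator, file_version, schema_version, comment):
    """Build the root structure with administrative metadata only."""
    return {
        "openlabel": {
            "metadata": {
                "annotator": annotator,
                "file_version": file_version,
                "schema_version": schema_version,
                "comment": comment,
            }
        }
    }


doc = make_minimal_openlabel(
    "John Smith", "0.1.0", "1.0.0", "Annotation file produced manually"
)
# Prettified serialization, comparable to the listing above.
text = json.dumps(doc, indent=4)
# Round trip: parsing the string yields the same structure again.
restored = json.loads(text)
```

Because the result is plain JSON, any JSON library can read it back without OpenLABEL-specific tooling.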

The next subsections show how to add different concepts (Elements, Frames and Streams) and give examples for relevant use cases considered in this concept paper. For the sake of space and readability, partially collapsed JSON strings are shown.

2.3.1. Elements

Elements is the collective name for Objects, Actions, Events, Contexts and Relations, which are all treated similarly within the OpenLABEL format in terms of properties, types and hierarchies.

Elements have a name, a unique identifier, a semantic type, and an ontology identifier.

name: this is a friendly identifier of the Element. It is not unique, but it helps human users to rapidly identify Elements in the scene (e.g. "Peter").
uid: this is a unique identifier which determines the identity of the Element. It can be a simple unsigned integer (from 0 upwards, e.g. "0"), or a Universally Unique Identifier (UUID) of 32 hexadecimal digits written in hyphen-separated groups (e.g. "123e4567-e89b-12d3-a456-426614174000").
type: this is the semantic type of the Element. It determines the class to which the Element belongs (e.g. "Car", "Running").
ontology_uid: this is the identifier (in the form of an unsigned integer) of the ontology URL which contains the full description of the class referred to by the semantic type. See Ontologies.

The next subsections show the purpose of each of the Element types.

Objects

Objects are the main placeholders of information about physical entities in scenes. Examples of Objects are pedestrians, cars, the ego-vehicle, traffic signs, lane markings, buildings, trees, etc.

An Object in OpenLABEL is defined by its name and type, and is indexed inside the annotation file by an integer unique identifier:

{
   "openlabel": {
       ...
       "objects": {
           "1": {
               "name": "van1",
               "type": "Van"
           },
           "2": {
               "name": "cyclist2",
               "type": "Cyclist"
           },
           ...
           "16": {
               "name": "Ego-vehicle",
               "type": "Car"
           },
           "17": {
               "name": "road17",
               "type": "Road"
           }
       }
   }
}

When using UUIDs, the keys are replaced by UUID strings (32 hexadecimal digits in hyphen-separated groups):

{
   "openlabel": {
       ...
       "objects": {
           "c44c1fc2-ee48-4b17-a20e-829de9be1141": {
               "name": "van1",
               "type": "Van"
           }
       }
   }
}

Unique identifiers need not be sequential nor start at 0, which is useful to preserve identifiers from other label files. They only need to be unique within each element type. Each element type (action, object, event, context and relation) has its own list of unique identifiers.

name and type are mandatory fields according to the JSON schema. However, they can be left empty as they are not used to index. In general, name can be used as a friendly descriptor, while type refers to the semantic category of the element (see more about semantics in Ontologies).

JSON only permits strings as keys. Therefore, integer unique identifiers are written as strings, e.g. "0". Carefully written APIs can nevertheless parse these key strings into integers for more efficient access and numeric sorting.
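
The following Python sketch illustrates why parsing the keys matters. It assumes integer identifiers (not UUIDs); the helper int_keyed is a hypothetical name introduced only for this illustration.

```python
import json

text = (
    '{"objects": {"17": {"name": "road17", "type": "Road"},'
    ' "2": {"name": "cyclist2", "type": "Cyclist"},'
    ' "16": {"name": "Ego-vehicle", "type": "Car"}}}'
)


def int_keyed(elements):
    """Return a copy of an element dictionary whose keys are integers."""
    return {int(uid): element for uid, element in elements.items()}


raw_objects = json.loads(text)["objects"]
objects = int_keyed(raw_objects)

string_order = sorted(raw_objects)  # lexicographic order of the key strings
numeric_order = sorted(objects)     # numeric order of the identifiers
```

Sorting the raw string keys places "2" after "17", whereas the integer keys sort in the expected numeric order.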

In addition, some Objects can be defined for certain sets of frame intervals, while others are left frame-less.

{
   "openlabel": {
       ...
       "objects": {
           "1": {
               "name": "van1",
               "type": "Van",
               "frame_intervals": [{
                       "frame_start": 0,
                       "frame_end": 10
                   }
               ]
           },
           ...
           "16": {
               "name": "Ego-vehicle",
               "type": "Car"
           }
       }
   }
}

Frame intervals are represented as an array, as the same Object might appear and disappear from the scene, and thus be represented by several frame intervals.

When Objects are defined with such time information, entries of them are added to Frames.
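
A consumer of the file often needs to know whether an element exists at a given frame. The Python sketch below shows one possible check. It assumes that frame_end is inclusive and that a frame-less element is valid for every frame; the function name is_present is hypothetical.

```python
def is_present(element, frame_num):
    """Return True if frame_num falls in any frame interval of the element.

    Assumptions: frame_end is inclusive, and an element without
    "frame_intervals" (frame-less) is treated as valid for every frame.
    """
    intervals = element.get("frame_intervals")
    if not intervals:
        return True
    return any(
        fi["frame_start"] <= frame_num <= fi["frame_end"] for fi in intervals
    )


van = {
    "name": "van1",
    "type": "Van",
    "frame_intervals": [{"frame_start": 0, "frame_end": 10}],
}
ego = {"name": "Ego-vehicle", "type": "Car"}  # frame-less Object
```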

Actions, Events, Contexts

Almost completely analogous to Objects, other elements defined in OpenLABEL are Actions, Events and Contexts.

Action: a description of a semantically meaningful situation. It can be defined for several frame intervals (just like Objects). E.g. "isWalking".
Event: a single instant in time which has a semantic load, and that typically triggers other Events or Actions, e.g. "startsWalking".
Context: any other descriptive information about the scene that either has neither spatial nor temporal information, or does not fit well under the term Action or Event. For instance, Context can refer to properties of the scene (e.g. "Urban", "Highway"), weather conditions (e.g. "Sunny", "Cloudy"), general information about the location (e.g. "Germany", "Spain"), or any other relevant tag.

These elements are included into the OpenLABEL JSON structure just like Objects are:

{
   "openlabel": {
       ...
       "actions": {
           "0": {
               "name": "following1",
               "type": "following",
               "frame_intervals": [{"frame_start": 0, "frame_end": 10}]
            }
       },
       "events": {
           "0": {
               "name": "crossing1",
               "type": "startsCrossing",
               "frame_intervals": [{"frame_start": 5, "frame_end": 5}]
            }
       },
       "contexts": {
           "0": {
               "name": "",
               "type": "Urban"
            }
        }
    }
}

In the example above, the Context is defined frame-less and, as such, is assumed to exist or be valid for the entire scene. As there are other elements (e.g. actions) with defined frame intervals, this Context also appears in all defined Frames.

Contexts can have frame intervals defined, as contextual information may vary through time (e.g. a scene starts in an urban environment and then ends on a highway).

Relations

Relations are elements used to define relations between other elements. Though represented just like any other element within the OpenLABEL JSON schema, i.e. with name, type, defined with static and dynamic information, they have special features as they are foundational elements that allow advanced semantic labeling.

A Relation is defined as an RDF triple subject-predicate-object. The predicate can be seen as the edge connecting the two vertices (subject and object), if we imagine the triple as a graph.

The predicate is labeled as the Relation's type, while the subject and object are added as rdf_subjects and rdf_objects respectively. In OpenLABEL, rdf_objects and rdf_subjects are pointers to other defined elements in the scene, for instance an Object, a Context, Event or Action.

The predicate itself determines what the relation between these elements is. It is possible to define the relation with free text, using terms like "isNear", "belongsTo" or "isActorOf", but in general it is recommended practice to use the ontology_uid property to refer to relation concepts that are well defined in a domain model.

Although an RDF triple strictly defines a connection between one object and one subject, in OpenLABEL it is possible to define multiple rdf_subjects and rdf_objects for the same predicate/relation. This feature is useful for compositional relations such as "isPartOf".

{
   "openlabel": {
       ...
       "objects": {
           "0": {
               "name": "car0",
               "type": "#Car",
               "ontology_uid": 0
           },
           ...
       },
       "actions": {
           "0": {
               "name": "",
               "type": "#isWalking"
           }
       },
       "relations": {
           "0": {
               "name": "",
               "type": "isSubjectOfAction",
               "rdf_subjects": [{
                       "uid": 0,
                       "type": "object"
                   }
               ],
               "rdf_objects": [{
                       "uid": 0,
                       "type": "action"
                   }
               ]
           }
        }
    }
}

Describing scenes can be done by decomposing a high-level description into atomic triples that can then be written formally in an OpenLABEL JSON file.

Relations provide complete flexibility to represent any kind of linkage between other Elements (including Objects, Actions, Events, Contexts, and even Relations).

How to represent each particular case is left to the user of OpenLABEL. A typical and recommended practice for transitive actions is as follows: a transitive action (an action with a subject and an object) can be added using two RDF triples, one defining the subject of the action with "isSubjectOfAction", and another defining the object of the action with "isObjectOfAction".

The terms object and subject in RDF language must not be confused with the term Object in OpenLABEL.

Let’s consider the following example:

Ego-vehicle follows cyclist

It can be decomposed into two RDF triples:

Ego-vehicle isSubjectOfAction follows and cyclist isObjectOfAction follows

This pair of RDF triples is much easier to manage from an ontology point of view, and also in graph database implementations. In this way, not only the physical objects (Ego-vehicle and cyclist) but also the action itself (follows) are defined as concepts (classes) in the ontology, and can therefore have properties and be part of a hierarchy of classes. The edges (links or relations) are left as isSubjectOfAction and isObjectOfAction. Other potentially useful relations are isPartOf, sameAs, hasAttribute, and spatio-temporal relations such as isNear, happensBefore, etc. Most of this discussion is inherited from ongoing discussions in the OpenXOntology project.
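
The decomposition of "Ego-vehicle follows cyclist" can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper add_transitive_action, the free-text type "follows" and the chosen identifiers are hypothetical and not prescribed by OpenLABEL.

```python
def add_transitive_action(doc, subject_uid, action_uid, object_uid):
    """Add the two Relations that link a subject and an object to an Action."""
    relations = doc["openlabel"].setdefault("relations", {})
    # Next free integer identifier within the list of relations.
    next_uid = max((int(uid) for uid in relations), default=-1) + 1
    relations[str(next_uid)] = {
        "name": "",
        "type": "isSubjectOfAction",
        "rdf_subjects": [{"uid": subject_uid, "type": "object"}],
        "rdf_objects": [{"uid": action_uid, "type": "action"}],
    }
    relations[str(next_uid + 1)] = {
        "name": "",
        "type": "isObjectOfAction",
        "rdf_subjects": [{"uid": object_uid, "type": "object"}],
        "rdf_objects": [{"uid": action_uid, "type": "action"}],
    }
    return doc


doc = {
    "openlabel": {
        "objects": {
            "0": {"name": "Ego-vehicle", "type": "Car"},
            "1": {"name": "cyclist1", "type": "Cyclist"},
        },
        "actions": {"0": {"name": "", "type": "follows"}},
    }
}
add_transitive_action(doc, subject_uid=0, action_uid=0, object_uid=1)
relations = doc["openlabel"]["relations"]
```

Note that in both triples the Action is the RDF object, while the OpenLABEL Objects (Ego-vehicle and cyclist) are the RDF subjects.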

2.3.2. Frames

Dynamic information about elements is stored within the corresponding Frames. Each frame is indexed within the OpenLABEL file with an integer unique identifier exactly as elements are.

Each frame contains structures of elements with only the non-static information, i.e. name, type and other static structures are omitted.

{
   "openlabel": {
       ...
       "frames": {
           "0": {
               "objects": {
                   "1": {}
               }
           }
       },
       "objects": {
           "1": {
               "name": "",
               "type": "Van",
               "frame_intervals": [{"frame_start": 0, "frame_end": 10}]
           },
           ...
       }
   }
}

If the specific information of the Object for a given frame is nothing but its existence, then, the Object's information at such frame is just a pointer to its unique identifier.

When frame-specific information is added, it is enclosed inside the corresponding frame and object:

{
   "openlabel": {
       ...
       "frames": {
           "0": {
               "objects": {
                   "1": {
                       "object_data": {
                           "bbox": [{
                                   "name": "shape",
                                   "val": [12, 867, 600, 460]
                               }
                           ]
                       }
                   }
               }
           }
       }
      ...
   }
}

More information about the "object_data" structure of the example above is discussed in Element data.
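
Since static and frame-specific information are stored in different places, a reader typically combines them on access. The Python sketch below shows one way to do this; the function name object_at_frame and its return convention are assumptions made for this illustration.

```python
def object_at_frame(doc, uid, frame_num):
    """Return (static_entry, frame_object_data) for an Object at one frame.

    Returns None when the Object has no entry in that frame.
    """
    root = doc["openlabel"]
    frame = root.get("frames", {}).get(str(frame_num), {})
    dynamic = frame.get("objects", {}).get(str(uid))
    if dynamic is None:
        return None
    static = root["objects"][str(uid)]  # name, type, frame_intervals, ...
    return static, dynamic.get("object_data", {})


doc = {
    "openlabel": {
        "frames": {
            "0": {
                "objects": {
                    "1": {
                        "object_data": {
                            "bbox": [{"name": "shape", "val": [12, 867, 600, 460]}]
                        }
                    }
                }
            }
        },
        "objects": {
            "1": {
                "name": "",
                "type": "Van",
                "frame_intervals": [{"frame_start": 0, "frame_end": 10}],
            }
        },
    }
}
static, data = object_at_frame(doc, 1, 0)
```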

A Frame can also contain information about its own properties, such as its timestamp. In general terms, a Frame shall be seen as the container of information corresponding to a single instant. Synchronization information for multiple streams can also be labeled in order to precisely define which annotations correspond to what instant and from which sensor (see Streams).

{
   "openlabel": {
       ...
       "frames": {
           "0": {
               "objects": { ... },
               "frame_properties": {
                   "timestamp": "2020-04-11 12:00:01"
               }
           }
       }
    }
}

Even when only pointers are present within a Frame, this structure ensures:

Frames can be serialized independently and sent via messaging to other computers or systems online
Efficient access to static information using pointers, and avoiding repetition of static information

The union of the frame intervals of all elements in the scene defines the frame intervals of the annotation file itself:

{
   "openlabel": {
       ...
       "frame_intervals": [{
               "frame_start": 0,
               "frame_end": 150
           },{
               "frame_start": 160,
               "frame_end": 180
           }
       ],
       "frames": {
           "0": { ... },
           "1": { ... },
           ...
        }
    }
}

Then, frame_intervals define which frames exist for this annotation file.
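
The union of element frame intervals can be computed as in the following Python sketch. It assumes that frame_end is inclusive and that intervals which overlap or are directly adjacent (e.g. 0 to 10 and 11 to 20) are fused; the function name union_frame_intervals is hypothetical.

```python
def union_frame_intervals(elements):
    """Merge the frame intervals of all given elements.

    Assumptions: frame_end is inclusive, and intervals that overlap or
    touch (e.g. 0-10 and 11-20) are fused into a single interval.
    """
    spans = sorted(
        (fi["frame_start"], fi["frame_end"])
        for element in elements
        for fi in element.get("frame_intervals", [])
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the last merged interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [{"frame_start": s, "frame_end": e} for s, e in merged]


elements = [
    {"frame_intervals": [{"frame_start": 0, "frame_end": 10}]},
    {"frame_intervals": [{"frame_start": 5, "frame_end": 150}]},
    {"frame_intervals": [{"frame_start": 160, "frame_end": 180}]},
    {},  # frame-less element: contributes no interval
]
file_intervals = union_frame_intervals(elements)
```

With these inputs, the result matches the two intervals of the listing above (0 to 150 and 160 to 180).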

2.3.3. Ontologies

The OpenLABEL JSON schema defines the allowed names of the "keys" of the key-value pairs of the JSON file, as well as the expected type, structure and format of the "values" (and, in a few cases, also the allowed values, especially for strings).

However, in most cases, the meaning of the "values" is left to the annotator. For instance, one annotator might declare the type of an Object as Person, while another might choose Pedestrian if the labeling tool imposes no restrictions.

OpenLABEL provides a way to link to ontologies, as representations of the domain model of interest. This is achieved by declaring the ontologies and setting the ontology_uid of elements:

{
   "openlabel": {
       ...
       "ontologies": {
           "0": "http://www.somedomain.org/ontology",
           "1": "http://www.someotherdomain.org/ontology"
        },
       "objects": {
           "0": {
               "name": "car0",
               "type": "#Car",
               "ontology_uid": 0
           },
           "1": {
               "name": "person1",
               "type": "#Person",
               "ontology_uid": 1
            }
        }
    }
}

The object’s type can then be read as the concatenation of the URL of the ontology referenced by the ontology_uid and the type entry of the object. Labeling tools might provide the ability to parse the ontologies (either remote or local) and offer the annotator a list of options, suggestions, or translation capabilities.
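
The concatenation described above can be sketched in Python as follows. The function name resolve_type is hypothetical, and falling back to the bare type when no ontology_uid is present is an assumption made for this illustration.

```python
def resolve_type(doc, element):
    """Return the full concept identifier of an element.

    Assumption: when the element has no ontology_uid, the bare type
    string is returned unchanged.
    """
    ontologies = doc["openlabel"].get("ontologies", {})
    uid = element.get("ontology_uid")
    if uid is None:
        return element["type"]
    # Ontology keys are strings in JSON, ontology_uid is an integer.
    return ontologies[str(uid)] + element["type"]


doc = {
    "openlabel": {
        "ontologies": {
            "0": "http://www.somedomain.org/ontology",
            "1": "http://www.someotherdomain.org/ontology",
        },
        "objects": {
            "0": {"name": "car0", "type": "#Car", "ontology_uid": 0},
            "1": {"name": "person1", "type": "#Person", "ontology_uid": 1},
        },
    }
}
car_concept = resolve_type(doc, doc["openlabel"]["objects"]["0"])
person_concept = resolve_type(doc, doc["openlabel"]["objects"]["1"])
```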

Also, the numbers used to describe some geometries, such as cuboid, require that a common consensus and common criteria are maintained and guaranteed by the standard. As a consequence, a default ontology for OpenLABEL, aligned with the OpenXOntology project, is assumed to exist, in which all the terms used in the OpenLABEL JSON schema are defined.

2.3.4. Relations

A Relation is defined as an RDF triple subject-predicate-object. The predicate can be seen as the edge connecting the two vertices (subject and object), if we imagine the triple as a graph.

{
   "openlabel": {
       ...
       "objects": {
           "0": {
               "name": "car0",
               "type": "#Car",
               "ontology_uid": 0
           },
           ...
       },
       "actions": {
           "0": {
               "name": "",
               "type": "#isWalking"
           }
       },
       "relations": {
           "0": {
               "name": "",
               "type": "isSubjectOfAction",
               "rdf_subjects": [{
                       "uid": 0,
                       "type": "object"
                   }
               ],
               "rdf_objects": [{
                       "uid": 0,
                       "type": "action"
                   }
               ]
           }
       }
   }
}

Describing scenes can be done by decomposing a high-level description into atomic triples that can then be written formally in an OpenLABEL JSON file.

TODO: add examples of transitive maneuvers

The terms object and subject in RDF language must not be confused with the term Object in OpenLABEL.

2.3.5. Element data

The OpenLABEL JSON schema defines the possibility to nest element data within elements. For instance, object_data can be embedded inside Objects, action_data inside Actions, and event_data and context_data inside Events and Contexts, respectively.

This makes it possible to describe any aspect of the elements in great detail. On the one hand, element-level descriptions, such as those defined in the sections above, describe intrinsic, high-level information about objects, actions, etc. On the other hand, element data-level information can be used to add how the elements are perceived by sensors, or details about their geometry or any other relevant aspect.

Since the general structure defines equivalent hierarchies for elements at the root and inside each frame, element data can then be naturally defined statically (time-less), or dynamically (for specific frame intervals).

The OpenLABEL JSON schema defines a comprehensive list of primitives that can be used to encode element data information. Some of them are completely generic, such as text, num or boolean, while others are specific to geometric magnitudes, like poly2d, cuboid, etc.

The list of currently supported element data is:

boolean: true or false
num: a number (integer or floating-point)
text: a string of characters
vec: a vector or array of numbers
bbox: a 2D bounding box
rbbox: a 2D rotated bounding box
binary: a binary content stringified to base64
cuboid: a 3D cuboid
image: an image payload encoded and stringified to base64
mat: a NxM matrix
point2d: a point in 2D space
point3d: a point in 3D space
poly2d: a 2D polygon defined by a sequence of 2D points
poly3d: a 3D polygon defined by a sequence of 3D points
area_reference: a reference to an area
line_reference: a reference to a line
mesh: a 3D mesh of points, vertices and areas

See the OpenLABEL JSON schema for details on each of them.

One interesting distinction is that the first four element data types of the list (boolean, num, text, vec) are defined as non-geometric, and can therefore themselves be nested within geometric element data.

{
   "openlabel": {
       ...
       "frame_intervals": [ ... ],
       "frames": {
           "0": {
               "objects": {
                   "0": {
                       "object_data": {
                           "bbox": [{
                                   "name": "shape",
                                   "val": [300, 200, 50, 100],
                                   "attributes": {
                                       "boolean": [{
                                               "name": "visible",
                                               "val": true
                                           }
                                       ]
                                   }
                               },{
                                   "name": "shadow",
                                   "val": [250, 200, 100, 200]
                               }
                           ]
                       }
                   }
               }
           }
        },
       ...
       "objects": {
           "0": {
               "name": "car0",
               "type": "car",
               "frame_intervals": [{"frame_start": 0, "frame_end": 0}],
               "object_data": {
                   "text": [{
                           "name": "color",
                           "val": "blue"
                       }
                   ]
                }
            }
        }
    }
}

The same concept applies to action_data, event_data and context_data, with the main difference that they cannot contain geometric element data (e.g. bbox, cuboid, etc.), but only non-geometric types such as text, vec, num and boolean.

Full detail of the inner structure of each of these element types is provided in Element Data types.

Since element data is not indexed by integer unique identifiers as elements are, the structure defines a mechanism to index the element data of each element by adding element data pointers. For instance, object_data_pointers within an Object contain key-value pairs that identify which object_data names are used and what their associated frame_intervals are.

{
   "openlabel": {
       ...
       "objects": {
           "0": {
               "name": "car0",
               "type": "car",
               "frame_intervals": [{"frame_start": 0, "frame_end": 0}],
               "object_data": {
                   "text": [{
                           "name": "color",
                           "val": "blue"
                       }
                   ]
                },
                "object_data_pointers": {
                    "color": {
                        "type": "text"
                    },
                    "shape": {
                        "type": "bbox",
                        "frame_intervals": [{"frame_start": 0, "frame_end": 0}],
                        "attributes": {
                            "visible": "boolean"
                        }
                    }
                }
            }
        }
    }
}

As can be seen from the example above, the pointers refer to both static (frame-less) and dynamic (frame-specific) object_data, and also contain information about the nested attributes. In practice, this feature is extremely useful for fast retrieval of element data information from the JSON file, without the need to explore the entire set of frames.
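
The following Python sketch illustrates how the pointers can answer "in which frames does this object_data exist?" without scanning the frames. The function name frames_with_data is hypothetical, and it assumes that frame_end is inclusive and that a pointer without frame_intervals denotes static data.

```python
def frames_with_data(obj, data_name):
    """List the frame numbers where the named object_data is defined.

    Returns None for static (frame-less) data. Assumes frame_end is
    inclusive and a pointer without "frame_intervals" is static.
    """
    pointer = obj["object_data_pointers"][data_name]
    intervals = pointer.get("frame_intervals")
    if intervals is None:
        return None
    return [
        frame
        for fi in intervals
        for frame in range(fi["frame_start"], fi["frame_end"] + 1)
    ]


car0 = {
    "name": "car0",
    "type": "car",
    "object_data_pointers": {
        "color": {"type": "text"},
        "shape": {
            "type": "bbox",
            "frame_intervals": [{"frame_start": 0, "frame_end": 0}],
            "attributes": {"visible": "boolean"},
        },
    },
}
shape_frames = frames_with_data(car0, "shape")
color_frames = frames_with_data(car0, "color")
```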

2.3.6. Streams

Complex scenes may be observed by several sensing devices, thus producing multiple streams of data. Each of these streams might have different properties, intrinsic and extrinsic information, and frequency. The OpenLABEL JSON schema makes it possible to specify such information for a multi-sensor (and thus multi-stream) set-up, by allocating space for such metadata descriptions and by providing the ability to specify, for each labeled element, which stream it corresponds to.

{
   "openlabel": {
       ...
       "metadata": {
           "streams": {
               "Camera1": {
                   "type": "camera",
                   "uri": "./somePathORIPAddress/someVideo.mp4",
                   "description": "Frontal camera from vendor X",
                   "stream_properties": {
                       "intrinsics_pinhole": {
                           "camera_matrix_3x4": [ 1000.0,    0.0, 500.0, 0.0,
                                                     0.0, 1000.0, 500.0, 0.0,
                                                     0.0,    0.0,   0.0, 1.0],
                            "distortion_coeffs_1xN": [],
                            "height_px": 480,
                            "width_px": 640
                       }
                   }
               }
           }
       },
       ...
       "frame_properties": {
           "streams": {
               "Camera1": {
                   "stream_properties": {
                       "intrinsics_pinhole": {
                           "camera_matrix_3x4": [ 1000.0,    0.0, 500.0, 0.0,
                                                     0.0, 1000.0, 500.0, 0.0,
                                                     0.0,    0.0,   0.0, 1.0],
                            "distortion_coeffs_1xN": [],
                            "height_px": 480,
                            "width_px": 640
                       },
                       "sync": {
                           "frame_stream": 1,
                           "timestamp": "2020-04-11 12:00:02"
                       }
                   }
               }
           },
           "timestamp": "2020-04-11 12:00:01"
       }
   }
}

As shown in the example, stream_properties can be defined either within the static part (i.e. inside the "metadata/streams" field), or frame-specific, inside the "streams" field of a given frame.

The sync field within stream_properties can define the frame number of this stream that corresponds to this frame, along with timestamping information if needed. This feature is extremely handy for annotating multiple cameras which might not be perfectly aligned. In such a case, frame 0 of the annotation file corresponds to frame 0 of the first stream to occur. In general, frame_stream identifies which frame of this stream corresponds to the frame in which it is enclosed.

To specify that certain object data corresponds to a certain stream, the OpenLABEL JSON schema defines the property stream for both elements and element data:

{
   "openlabel": {
       "frames": {
           "0": {
               "objects": {
                   "0": {
                       "object_data": {
                           "bbox": [{
                                    "name": "head",
                                    "val": [600, 500, 100, 200],
                                    "stream": "Camera1"
                               }
                           ]
                       }
                   }
               }
           }
       },
       ...
       "objects": {
           "0": {
               "name": "",
               "type": "Person",
               "stream": "Camera1"
           }
       }
   }
}
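
A reading application might use the stream property to select only the labels that belong to one sensor. The following is a minimal sketch, assuming the document has already been parsed into Python dictionaries and that every object_data type holds a list of entries, as in the example above. The function name object_data_for_stream is hypothetical and not part of OpenLABEL.

```python
def object_data_for_stream(openlabel_doc, stream_name):
    """Yield (frame_index, object_id, data_type, entry) for every
    frame-level object_data entry tagged with the given stream name."""
    frames = openlabel_doc.get("openlabel", {}).get("frames", {})
    for frame_index, frame in frames.items():
        for object_id, obj in frame.get("objects", {}).items():
            for data_type, entries in obj.get("object_data", {}).items():
                for entry in entries:
                    if entry.get("stream") == stream_name:
                        yield frame_index, object_id, data_type, entry


# Same structure as the example above, with the elided parts left out.
doc = {
    "openlabel": {
        "frames": {
            "0": {
                "objects": {
                    "0": {
                        "object_data": {
                            "bbox": [
                                {
                                    "name": "head",
                                    "val": [600, 500, 100, 200],
                                    "stream": "Camera1",
                                }
                            ]
                        }
                    }
                }
            }
        },
        "objects": {"0": {"name": "", "type": "Person", "stream": "Camera1"}},
    }
}

matches = list(object_data_for_stream(doc, "Camera1"))
for frame_index, object_id, data_type, entry in matches:
    print(frame_index, object_id, data_type, entry["name"])
```

The sketch only inspects entries inside frames; whether an entry without its own stream property should inherit the stream of its parent object is not stated here, so no such fallback is applied.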

2.3.7. Coordinate Systems and Transforms

As described in the coordinate system section, labels can be defined as relative to specific coordinate systems. This is particularly necessary for geometric labels, such as polygons, cuboids or bounding boxes, which define magnitudes under a certain coordinate system. For instance, a 2D line can be defined within the coordinate system of an image frame, and a 3D cuboid inside a 3D Cartesian coordinate system.

In addition, as multiple coordinate systems can be defined, it is also necessary to define mechanisms that declare how to convert values of magnitudes from one coordinate system to another. Therefore, transforms between two coordinate systems are also defined.

Coordinate systems can be declared with a friendly name, used as index, and in the form of parent-child links, to establish their hierarchy:

type: the type of coordinate system is defined so that reading applications have a simplified view of the hierarchy. It can be scene_cs (a static coordinate system), local_cs (a coordinate system of a rigid body that carries sensors), sensor_cs (a coordinate system attached to a sensor) or custom_cs (any other coordinate system defined by the user).
parent: regardless of the type of coordinate system, each coordinate system can declare its parent coordinate system in the hierarchy.
pose_wrt_parent: a default or static pose of this coordinate system with respect to the declared parent. Can be set in the form of a 4x4 matrix enclosing a 3D rotation and 3D translation.
children: the list of children for this coordinate system.

{
   "openlabel": {
       ...
       "coordinate_systems": {
            "odom": {
                "type": "scene_cs",
                "parent": "",
                "pose_wrt_parent": [],
                "children": [
                    "vehicle-iso8855"
                ]
            },
            "vehicle-iso8855": {
                "type": "local_cs",
                "parent": "odom",
                "pose_wrt_parent": [1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0],
                "children": [
                    "Camera1",
                    "Camera2"
                ]
            },
            "Camera1": {
                "type": "sensor_cs",
                "parent": "vehicle-iso8855",
                "pose_wrt_parent": [1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0],
                "children": []
            },
            "Camera2": {
                "type": "sensor_cs",
                "parent": "vehicle-iso8855",
                "pose_wrt_parent": [1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0],
                "children": []
            }
        },
       ...
   }
}
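
The parent links make it possible to chain the static poses from any coordinate system up to the root of the hierarchy. The sketch below illustrates this under several assumptions that this section does not confirm: the 16 numbers are a row-major 4x4 matrix, pose_wrt_parent maps a point expressed in the child coordinate system into the parent coordinate system, and an empty list stands for the identity. The helper names are hypothetical.

```python
def as_matrix(flat):
    """Convert a flat list of 16 numbers into a 4x4 nested list.
    Row-major ordering is an assumption; an empty list means identity."""
    if not flat:
        return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    if len(flat) != 16:
        raise ValueError("expected 16 numbers for a 4x4 matrix")
    return [list(flat[r * 4:(r + 1) * 4]) for r in range(4)]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def pose_wrt_root(coordinate_systems, name):
    """Compose pose_wrt_parent matrices from `name` up to the root."""
    pose = as_matrix([])
    visited = set()
    while name:  # the root declares an empty parent name
        if name in visited:
            raise ValueError("cycle in coordinate system hierarchy")
        visited.add(name)
        cs = coordinate_systems[name]
        # Left-multiply: the parent's pose is applied after the child's.
        pose = matmul(as_matrix(cs["pose_wrt_parent"]), pose)
        name = cs["parent"]
    return pose


# Illustrative values (not from the example above): pure translations.
coordinate_systems = {
    "odom": {"type": "scene_cs", "parent": "", "pose_wrt_parent": []},
    "vehicle-iso8855": {
        "type": "local_cs", "parent": "odom",
        "pose_wrt_parent": [1, 0, 0, 10.0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    },
    "Camera1": {
        "type": "sensor_cs", "parent": "vehicle-iso8855",
        "pose_wrt_parent": [1, 0, 0, 2.0, 0, 1, 0, 0, 0, 0, 1, 1.5, 0, 0, 0, 1],
    },
}

pose = pose_wrt_root(coordinate_systems, "Camera1")
translation = [row[3] for row in pose[:3]]
print(translation)  # [12.0, 0.0, 1.5]
```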

Of course, the transforms between coordinate systems can also be defined for each frame, overriding the default-static pose defined above. Transforms are defined with a friendly name used as index, and the following properties:

src: the name of the source coordinate system, whose magnitudes the transform converts into destination coordinate system. This must be the name of a valid (declared) coordinate system.
dst: the destination coordinate system in which the source magnitudes are converted after applying the transform. This must be the name of a valid (declared) coordinate system.
transform_src_to_dst: this is the transform expressed in algebraic form, for instance as a 4x4 matrix enclosing a 3D rotation and a 3D translation between the coordinate systems.
additional properties: as with most elements in the OpenLABEL format, it is also possible to add customized content as additional properties.

{
   "openlabel": {
       ...
       "frames": {
           "2": {
               "frame_properties": {
                   "transforms": {
                       "vehicle-iso8855_to_Camera1": {
                           "src": "vehicle-iso8855",
                           "dst": "Camera1",
                           "transform_src_to_dst": [1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0]
                       }
                   }
               }
           }
       }
       ...
   }
}
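
Since src and dst must name declared coordinate systems, a consumer may want to check this before using a file. The following is a rough sketch of such a check on a parsed document; the function name undeclared_transform_endpoints is hypothetical, and the example data (including the undeclared "Camera3") is invented for illustration.

```python
def undeclared_transform_endpoints(openlabel_doc):
    """Return (frame_index, transform_name, key, value) for every src/dst
    that does not name a declared coordinate system."""
    root = openlabel_doc["openlabel"]
    declared = set(root.get("coordinate_systems", {}))
    problems = []
    for frame_index, frame in root.get("frames", {}).items():
        transforms = frame.get("frame_properties", {}).get("transforms", {})
        for name, transform in transforms.items():
            # Skip additional properties that are not transform objects.
            if not isinstance(transform, dict):
                continue
            for key in ("src", "dst"):
                if transform.get(key) not in declared:
                    problems.append((frame_index, name, key, transform.get(key)))
    return problems


identity = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
doc = {
    "openlabel": {
        "coordinate_systems": {
            "vehicle-iso8855": {"type": "local_cs", "parent": "",
                                "pose_wrt_parent": [], "children": ["Camera1"]},
            "Camera1": {"type": "sensor_cs", "parent": "vehicle-iso8855",
                        "pose_wrt_parent": identity, "children": []},
        },
        "frames": {
            "2": {
                "frame_properties": {
                    "transforms": {
                        "vehicle-iso8855_to_Camera1": {
                            "src": "vehicle-iso8855", "dst": "Camera1",
                            "transform_src_to_dst": identity,
                        },
                        "vehicle-iso8855_to_Camera3": {
                            "src": "vehicle-iso8855", "dst": "Camera3",
                            "transform_src_to_dst": identity,
                        },
                    }
                }
            }
        },
    }
}

problems = undeclared_transform_endpoints(doc)
print(problems)
```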

For non-3D coordinate systems, or for non-linear transforms, the format still applies, as long as transform_src_to_dst is an array of numbers which contains all the necessary parameters to express the transform.

In the example above, the destination coordinate system of the transform is Camera1, which is also the friendly name of a Stream. Indeed, Streams, which typically describe a sensor such as a camera or a LIDAR, should have associated coordinate systems to define their extrinsics, i.e. their pose with respect to other coordinate systems such as the ego-vehicle ISO8855 origin. Sensor-internal properties, such as intrinsic parameters or distortion coefficients (for pinhole or fisheye cameras), are defined inside the Stream fields as shown in Streams.

With this structure, it is possible to describe particular and typical transform cases, such as odometry entries:

{
   "openlabel": {
       ...
       "frames": {
           "0": {
               "frame_properties": {
                   "transforms": {
                       "odom_to_vehicle-iso8855": {
                           "src": "odom",
                           "dst": "vehicle-iso8855",
                           "transform_src_to_dst": [1.0, 3.7088687554289227e-17, ...]
                       },
                       "raw_gps_data": [49.011212804408,8.4228850417969, ...],
                       "status": "interpolated"
                   }
               }
           }
       }
       ...
   }
}

Using additional properties it is possible to embed detailed and customized information about the transforms, such as additional non-linear coefficients (in the example above, the raw GPS entries are included for completeness).

2.4. Data types

This section provides details on the object_data primitives defined in the OpenLABEL annotation format. Most of them are self-explanatory, as they represent primitive types such as string, num (a single number with floating-point precision), vec (array of numbers) and bool (boolean).

Geometric types are more complex. The following sub-sections describe their format.

2.4.1. Bounding box: `bbox`

The 2D bounding box is defined as in section 2. It is an array of 4 floating point numbers that define the center of the rectangle, and its width and height.

Thus, in the JSON schema file, a bounding box is defined as:

An example bounding box entry serialized in JSON is:

"bbox": {
    "name": "head",
    "val": [400, 200, 100, 120]
},

Which means the center of the rectangle is the point (x, y)=(400, 200), while its dimensions are width=100, and height=120.
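
Many tools expect corner coordinates rather than a center and size. The conversion follows directly from the definition above; the sketch below shows it, with the function name bbox_to_corners chosen for illustration only.

```python
def bbox_to_corners(val):
    """Convert [x_center, y_center, width, height] into
    (x_min, y_min, x_max, y_max)."""
    x_center, y_center, width, height = val
    x_min = x_center - width / 2
    y_min = y_center - height / 2
    x_max = x_center + width / 2
    y_max = y_center + height / 2
    return (x_min, y_min, x_max, y_max)


corners = bbox_to_corners([400, 200, 100, 120])
print(corners)  # (350.0, 140.0, 450.0, 260.0)
```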

For complex set-ups it is possible to define the coordinate_system with respect to which these magnitudes are expressed. It is also possible to embed non-geometric object data inside:

"bbox": {
    "name": "head",
    "val": [400, 200, 100, 120],
    "coordinate_system": "Camera1",
    "attributes" : {
        "boolean" : [{
                "name" : "visible",
                "val" : false
            }, {
                "name" : "occluded",
                "val" : false
            }
        ]
    }
},

2.4.2. Cuboid: `cuboid`

The 3D bounding box or cuboid is defined as in the bounding box section. It is an array of 10 floating point numbers that define the center of the cuboid (x, y, z), its pose as a quaternion vector (a, b, c, d), and its dimensions vector (sx, sy, sz).

An example cuboid:

"cuboid": {
    "name": "shape",
    "val": [12.0, 20.0, 0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 2.0, 1.5]
},
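
To make the layout of the 10 numbers explicit, the following sketch splits a cuboid val into its three parts. The names Cuboid, parse_cuboid and quaternion_norm are hypothetical. The sketch makes no assumption about which quaternion component is the scalar part; it only reports the norm, since rotation quaternions are conventionally unit length and a reader may want to normalize before use.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Cuboid(NamedTuple):
    center: Tuple[float, float, float]             # (x, y, z)
    quaternion: Tuple[float, float, float, float]  # (a, b, c, d)
    size: Tuple[float, float, float]               # (sx, sy, sz)


def parse_cuboid(val: Sequence[float]) -> Cuboid:
    """Split the flat 10-number cuboid array into center, pose and size."""
    if len(val) != 10:
        raise ValueError("a cuboid needs exactly 10 numbers")
    values = tuple(float(v) for v in val)
    return Cuboid(values[0:3], values[3:7], values[7:10])


def quaternion_norm(q: Sequence[float]) -> float:
    """Euclidean norm of the quaternion components."""
    return math.sqrt(sum(c * c for c in q))


cuboid = parse_cuboid([12.0, 20.0, 0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 2.0, 1.5])
print(cuboid.center, cuboid.size, quaternion_norm(cuboid.quaternion))
```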

2.4.3. Semantic segmentation: `image` and `poly2d`

Semantic segmentation responds to the need to define one or more labels per pixel of a given image (see semantic segmentation for details about the different possible use cases).

In terms of data format, such dense information can be tackled with different approaches, each of them having different purposes or responding to different needs:

Separate images: historically, semantic segmentation information has been stored as separate images, usually formatted as PNG images (lossless). This is possibly the simplest approach, and the one offering the smallest storage footprint, at the cost of having to manage separate files in the file system. The main OpenLABEL JSON file may therefore contain the URLs/URIs of these images (one or many):

"objects": {
    "0": {
        "name": "",
        "type": "",
        "object_data": {
            "string": [
                {
                    "name": "semantic mask uri - dictionary 1",
                    "val": "/someURLorURI/someImageName1.png"
                },{
                    "name": "semantic mask uri - dictionary 2",
                    "val": "/someURLorURI/someImageName2.png"
                }
            ]
        }
    }
},

Embedded images: image content can be expressed in base64 and then embedded within the JSON file. This approach creates larger JSON files (base64 encoding inflates the data to roughly 4/3 of its original size) but removes the need to manage multiple files:

"objects": {
    "0": {
        "name": "",
        "type": "",
        "object_data": {
            "image": [
                {
                    "name": "semantic mask - dictionary 1",
                    "val": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAKACAIAAADLqjwFAAAKu0lEQVR42u3dPW7VYBCGUSe6JWW6NCyDErEvKvaFKFkGDR0lfYiEkABFN8n9+fzOzDkFDRLgAT0a2Z/NtgEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABc0uevb25MASCwzo8/CjRAYp0FGiC0zgINEFpngQYIrbNAA4TWWaABQuss0AChdRZogMQ0CzRAbp0FGiC0zgINEFpngQYIrbNAA4TWWaABQuv84d1PgQZIrLMNGiC0zrmBfv/2o7/U0r58+2QIcE6dH90aH0BgnQUaILTOjw6GCLBvmp+ssw0aILTOAg0QWmeBBgits0ADhNZ585AQYH2dn02zDRogt85VN+jjb6nt9RbizY/vD3f3/rGCOl+kzptbHCdU+LSf1W5Q51fVWaDPLfLJv45egzoL9M5dfvbXV2pQZ4FOSfOTv51MQ9c0n1xngd4zzTIN6izQ0WmWaVBngY5Os0yDOgt0dJplGtT5b0PfJAyvc7k/J6jzxetcdYM+513BcsmzSkOhOl8qzRM36LoLqVUaptV5VqCrN06jYVSdBwW6R900GubUeUqgO3VNo2FInUcEul/RNBom1Hlrfw66a8t8exp2T/O169x8g+69adqjoXedOwd6Qr80GhrXuW2g55RLo6FrnXsGelqzNBpa1nnzNTtAnQPT3HODnrlOWqKhX527BXpypzQamtV5G/u5UUCdw+vcKtBWSBOATnW2QQPqHFrnPoG2PJoDNKvz5pgdIM2ZdW6yQVsbTQP61XlzDxpQ58w6dwi0hdFMoGWdbdCAOofWeav+kNCqeGQyvuiPOtdNsw0aUOfcOgs0oM65Cgfa/Q3zgcZ1tkED6izQAOr8Sl71BgaluVCdbdCAOgv0pXkCZkqoc+8626ABdRZoAHUWaECdG9R5c4oDaFznumm2QQPqLNAA6izQgDr3qLNAA+os0Bfl/QuzQp3b13kreorj4e5ed14+K0NgQpr71XlziwNQZ4EGUGeBBtRZoAHU+Xq86g2UrHPvNNugAXUWaAB1FmhAnQV6Z96/MCXUWaAB1HkfTnEA6WmeWWcbNKDOAg2gznMC7QmY+aDOAg2gzgINqLM6/1H7FIcv9x+ZjCFQtM7SbIMG1FmgrYpmgjqrsw0aUGeBtjCaBqizQAPqPFWTb3E4zmF9pmKa1dkGDaizQFseTQB1VmeBBtRZoK2Qrh3UeR/dPtg/82mhOlOlztI8d4MG1FmgrZOuF3VWZ4HWLFeKOgu0crlGUGeB1i9XhzozO9CNK6bOqPMEh/ZX2O/gnTqTn2Z1tkFPLJo6o84CrdGuAtRZoNVNnVFnhge6dOPUGXUe6DDtgn+XrtBjQ2mmRJ2l2QY9rnrqjDrboOc2OnaVlmbUmcPw6w/MtDSjzgh0XKalGXVGoOMyLc2oMwIdl2lpplya1VmgIzJ9vVLrMuqMQF+4pCf3WpFRZwR6aa//a7cKo85ckP80dkW7QZ0RaECd+3CLA9RZmm3QgDoj0IA6CzSgzgg0oM4CDagzCZziAGlWZxs0oM4INKizOgs0oM4INKDOw3hICHPrLM02aECdEWhQZ3UWaECdEWhAnQUaUGcEGlBnnuWYHfRPszrboAF1RqBBndVZoAF1RqABdeYfHhJCwzpLsw0aUGcEGtRZnQUaUGcEGlBnBBrUmYKc4oDaaVZnGzSgzgg0qLM6I9Cgzgg0oM4INKgzXTjFAZXqLM02aECd2d+NEYA6I9CAOiPQoM4INKDOCDTMSrM6I9Cgzgg0qLM6I9Cgzgg0oM4INHSvszQj0KDOCDSoszoj0KDOCDSgzgg0qDMCDSxOszoj0KDOCDSoszoj0KDOCDSgzgg0qDMINKyvszQj0KDOCDSoszoj0KDOCDSgzgg0qDMINCxOszoj0KDOCDSoszoj
0KDOINCgzgg0dK+zNCPQoM4INKizOiPQoM4g0KDOCDSoMwg0qDMCDdKszgg0qDMINOqszgg0qDMINKgzAg3t6yzNCDSoMwg06qzOCDSoMwg0qDMCDeoMAg2L06zOCDSoMwg06qzOCDSoMwg0qDMCDeoMAg3r6yzNCDSoMwg06qzOCDSoMwg0qDMCDeoMAg2L06zOCDSoMwg06qzOCDSoMwg0qDMINN3rLM0INKgzCDTqrM4INKgzCDSoMwg06gwCDeoMAo00qzMCDeoMAo06qzMINOoMAg3qDAJN+zpLMwIN6gwCjTqrMwg06gwCDeoMAo06g0DD4jSrMwg06gwCjTqrMwg06gwCDeoMAo06g0DD+jpLMwg06gwCjTqrMwg06gwCDeoMAo06AwLN4jSrMwg06gwCjTqrMwg06gwCDeoMAk33OkszCDTqDAKNOqszCDTqDAIN6gwCjToDAs3iNKszCDTqDAKNOqszCDTqDAg06gwCjToDAs36OkszCDTqDAKNOqszCDTqDAg06gwCjToDAs3iNKszCDTqDAi0OqszCDTqDAg06gwCTfc6SzMINOoMCLQ6qzMINOoMCDTqDAKNOgMCjToDAi3N6gwCjToDAq3O6gwCjToDAo06g0DTvs7SDAKNOgMCrc7qDAKNOgMCjToDAq3OgECzOM3qDAKNOgMCrc7qDAKNOgMCjToDAq3OgECzvs7SDAKNOgMCrc7qDAJtBOoMCDTqDAi0OgMCzeI0qzMINOoMCLQ6qzMg0OoMCDTqDAh09zpLMwg06gwItDqrMyDQ6gwINOoMCLQ6AwKNOgMCLc3qDAi0OgMCrc7qDAi0OgMCjToDAt2+ztIMCLQ6AwKtzuoMCLQ6AwKNOgMCrc4AAr04zeoMCLQ6AwKtzuoMCLQ6AwKNOgMCrc4AAr2+ztIMCLQ6AwKtzuoMCLQ6Awi0OgMCrc4AAr04zeoMCLQ6AwKtzuoMCLQ6Awi0OgMC3b3O0gwItDoDAq3O6gwItDoDCLQ6AwKtzgACvTjN6gwItDoDTA20OgMCrc4AAq3OgECrM4BA71lnaQYEWp0BpgZanQGBVmcAgVZnQKDVGUCgd0uzOgMCrc4AUwOtzoBAqzOAQKszQN1AO7ABCLQ6Awi0OgPUDbQ6AwKtzgACrc4AdQOtzoBAl0+zOgMCrc4AUwOtzgCJgVZngMRAqzNAYqAd2ABIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzNAXKAdpwNIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzNAYqAdpwNIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzPAmW4T/hDqDJAYaHUGeNJVbnG8/P6GOgMsDfQLG63OAEfsdotDnQGOO0gzQKbV56DVGSDICd+xAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAqOcXcO/DOJCe2z8AAAAASUVORK5CYII=",
                    "mime_type": "image/png",
                    "encoding": "base64"
                }
            ]
        }
    }
},
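
The size cost of embedding can be checked with Python's standard base64 module. The sketch below builds an image entry with the same fields as the example above and decodes it again; the helper names are hypothetical, and the payload is arbitrary stand-in bytes rather than a real PNG.

```python
import base64


def embed_image_bytes(name, raw_bytes, mime_type="image/png"):
    """Build an `image` object_data entry holding base64-encoded bytes."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {
        "name": name,
        "val": encoded,
        "mime_type": mime_type,
        "encoding": "base64",
    }


def extract_image_bytes(entry):
    """Recover the raw bytes from an embedded image entry."""
    if entry.get("encoding") != "base64":
        raise ValueError("unsupported encoding: %r" % entry.get("encoding"))
    return base64.b64decode(entry["val"])


payload = bytes(range(256)) * 3  # 768 bytes standing in for PNG content
entry = embed_image_bytes("semantic mask - dictionary 1", payload)
print(len(payload), len(entry["val"]))  # 768 1024
```

Every 3 input bytes become 4 output characters, which is where the 4/3 ratio comes from.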

Polygons: another option is to decompose the entire semantic segmentation mask into its constituent pieces, corresponding to the different classes or object instances. This approach has the benefit of identifying individual objects directly within the JSON file, so a user application can read specific objects without loading the PNG image and searching for the object of interest. The drawback is the increased JSON size. 2D polygons can be expressed directly as lists of (x,y) coordinates, but this may create very large and redundant information. Lossless compression mechanisms (e.g. RLE or Chain Code algorithms) can be applied to convert the (possibly long) list of (x,y) coordinates into smaller strings. The example below uses the SRF6DCC algorithm; a reference implementation of this and other algorithms will be provided during the standardisation project of OpenLABEL:

"objects": {
    "0": {
        "name": "car1",
        "type": "#Car",
        "object_data": {
            "poly2d": [
                {
                    "name": "poly1",
                    "val": ["5","5","1","mBIIOIII"],
                    "mode": "MODE_POLY2D_SRF6DCC",
                    "closed": false
                }, {
                    "name": "poly2",
                    "val": [5,5,10,5,11,6,11,8,9,10,5,10,3,8,3,6,4,5],
                    "mode": "MODE_POLY2D_ABSOLUTE",
                    "closed": false
                }
            ]
        }
    }
}
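
For the uncompressed mode, the flat val array can be regrouped into coordinate pairs. The following sketch assumes that MODE_POLY2D_ABSOLUTE stores the points as x0, y0, x1, y1, and so on, which matches the description of lists of (x,y) coordinates above. It does not attempt to decode MODE_POLY2D_SRF6DCC, and the function name is hypothetical.

```python
def poly2d_absolute_points(entry):
    """Return the polygon vertices of an absolute-mode poly2d entry
    as a list of (x, y) tuples."""
    if entry.get("mode") != "MODE_POLY2D_ABSOLUTE":
        raise ValueError("only MODE_POLY2D_ABSOLUTE is handled in this sketch")
    val = entry["val"]
    if len(val) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    # Pair every even-indexed value (x) with the following one (y).
    return list(zip(val[0::2], val[1::2]))


poly2 = {
    "name": "poly2",
    "val": [5, 5, 10, 5, 11, 6, 11, 8, 9, 10, 5, 10, 3, 8, 3, 6, 4, 5],
    "mode": "MODE_POLY2D_ABSOLUTE",
    "closed": False,
}

points = poly2d_absolute_points(poly2)
print(len(points), points[0], points[-1])  # 9 (5, 5) (4, 5)
```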

NOTE: Using polygons implies that labels are created at object level rather than image level, which might be extremely useful for search applications that are interested in locating, for example, all objects of type `car`.

NOTE: Using PNG masks (either as separate files or embedded inside the JSON file) is definitely the preferred way to store labels for machine learning applications, which don't search inside the masks but rather feed them directly into training pipelines.

2.5. Frames and Streams Synchronization

This section provides detail on the synchronization of multiple streams and the timing information of their frames.

Labels can be produced to be related to specific streams (e.g. cameras, LIDAR). When multiple such streams are present and labels need to be produced for several of them (e.g. bounding boxes for images of the camera, and cuboids for the point clouds of the LIDAR), then, a synchronization and matching strategy is needed.

Determining the synchronization of the data streams (e.g. images and point clouds) is the responsibility of the data source set-up, not of the annotation stage. For example, the data container may contain precise HW timestamps for images and point clouds and, in addition, the correspondence between frame indexes of multiple cameras (e.g. Frame 45 of camera 1 corresponds, because of proximity in time, to Frame 23 of camera 2, perhaps because they have different frequencies or started with some delay).

Therefore, when producing labels for such different streams, the annotation format needs to allocate space and structure for this timing information, so that all labels are precisely and easily associated with their corresponding data and time.

The JSON schema defines the frame data containers, which correspond to "Master Frame Indexes".

2.5.1. One stream

In many cases, there is a single stream of data (e.g. an image sequence) that needs to be labeled.

Simple case

This is the simplest case, where nothing needs to be specified (sensor names, timestamps, etc.). Frame indexes are integers, starting from 0. The Master Frame index coincides with the Stream-specific frame index (thus, the stream-specific frame index is not labeled).

{
    "openlabel": {
        "frames": {
            "0": { ... },
            "1": { ... }
        }
    }
}

Stream Frame index not coincident with Master Frame index

It is also possible, though, to define a specific frame numbering for Stream-specific frames inside the Master Frame Index (which always starts from 0). These counts are then non-coincident and can reflect that the stream index is discontinuous or starts at a certain value.

{
    "openlabel": {
        "frames": {
            "5": {
                "frame_properties": {
                    "timestamp": "2020-04-11 12:00:01",
                    "streams": {
                        "Camera1": {
                            "stream_properties": {
                                "sync": { "frame_stream": 91}
                            }
                        }
                    }
                }
            }
        }
    }
}
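
A reader can resolve the stream-specific frame for any Master Frame by looking up sync.frame_stream and, when it is absent, falling back to the Master Frame index itself (the simple case described earlier). This is a minimal sketch on a parsed document; stream_frame_index is a hypothetical helper name, and frame "6" is added only to show the fallback.

```python
def stream_frame_index(openlabel_doc, master_index, stream_name):
    """Return the stream-specific frame index for a Master Frame.
    Falls back to the Master Frame index when no sync entry is present."""
    frame = openlabel_doc["openlabel"]["frames"][str(master_index)]
    sync = (
        frame.get("frame_properties", {})
        .get("streams", {})
        .get(stream_name, {})
        .get("stream_properties", {})
        .get("sync", {})
    )
    return sync.get("frame_stream", master_index)


doc = {
    "openlabel": {
        "frames": {
            "5": {
                "frame_properties": {
                    "timestamp": "2020-04-11 12:00:01",
                    "streams": {
                        "Camera1": {
                            "stream_properties": {"sync": {"frame_stream": 91}}
                        }
                    },
                }
            },
            "6": {},  # no sync information: indexes coincide
        }
    }
}

print(stream_frame_index(doc, 5, "Camera1"))  # 91
print(stream_frame_index(doc, 6, "Camera1"))  # 6
```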

Other properties such as timestamps can be added for detailed timing information of each stream frame.

{
    "openlabel": {
        "frames": {
            "0": {
                "frame_properties": {
                    "timestamp": "2020-04-11 12:00:01",
                    "aperture_time_us": "56"
                }
            }
        }
    }
}

2.5.2. Multiple streams

Complex labeling set-ups include multiple streams (e.g. labels that need to be defined for different sensors).

Same frequency, same start and indexes

This is the fully synchronized case, where the Master Index coincides with each of the Stream indexes.

Same frequency, different start and indexes

However, it is possible to have Stream indexes defined independently, to reflect for instance that one stream is delayed one frame (but still synced).

{
    "openlabel": {
        "frames": {
            "1": {
                "frame_properties": {
                    "timestamp": "2020-04-11 12:00:01",
                    "streams": {
                        "Camera1": {
                            "stream_properties": {
                                "sync": { "frame_stream": 1}
                            }
                        },
                        "Camera2": {
                            "stream_properties": {
                                "sync": { "frame_stream": 0}
                            }
                        }
                    }
                }
            }
        }
    }
}

Other possible differences in syncing can be labeled, for instance jitter, by embedding timestamping information for each stream frame.

Same frequency, constant shift

If the frame shift is known to be constant, a more compact representation is possible by specifying the shift at root stream_properties rather than on each frame (as in the previous examples):

{
    "openlabel": {
        "streams": {
            "Camera1": {
                "stream_properties": {
                    "sync": { "frame_stream": 1}
                }
            },
            "Camera2": {
                "stream_properties": {
                    "sync": { "frame_stream": 0}
                }
            }
        }
    }
}

Different frequency

Streams might represent data coming from sensors with different capturing frequency (e.g. a Camera at 30 Hz, and a LIDAR at 10 Hz). Following previous examples, it is possible to embed Stream frames inside Master frames so the frequency information is also included.

The following figures show typical configurations, where the Master Frame Index follows the "fastest" Stream (e.g. the "Camera1" Stream in the first figure), or the "slowest" (e.g. the "Lidar1" Stream in the second figure).
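
As a rough illustration of the "fastest stream" configuration, the sketch below generates per-frame sync entries for a 30 Hz camera and a 10 Hz LIDAR. It rests on assumptions that are not part of this document: both streams start at the same instant, the rates are integers in Hz, and each Master Frame refers to the most recent stream frame captured at or before it. The function name build_sync_frames is hypothetical.

```python
def build_sync_frames(stream_rates_hz, num_master_frames):
    """Build a `frames` dictionary whose Master Frame Index follows the
    fastest stream; slower streams repeat their latest frame index."""
    master_rate = max(stream_rates_hz.values())
    frames = {}
    for master_index in range(num_master_frames):
        streams = {}
        for name, rate in stream_rates_hz.items():
            # Integer arithmetic: latest stream frame at or before this one.
            frame_stream = (master_index * rate) // master_rate
            streams[name] = {
                "stream_properties": {"sync": {"frame_stream": frame_stream}}
            }
        frames[str(master_index)] = {"frame_properties": {"streams": streams}}
    return frames


frames = build_sync_frames({"Camera1": 30, "Lidar1": 10}, 6)
for index, frame in frames.items():
    streams = frame["frame_properties"]["streams"]
    print(index,
          streams["Camera1"]["stream_properties"]["sync"]["frame_stream"],
          streams["Lidar1"]["stream_properties"]["sync"]["frame_stream"])
```

With these rates, three consecutive Master Frames share each LIDAR frame. In practice the mapping would come from the recorded timestamps rather than from nominal rates.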

2.5.3. Specifying "coordinate_system" for each label

After defining the coordinate systems (see Coordinate Systems and Transforms), and the timing information as in the examples above, labels for Elements and Element data can be declared for specific coordinate systems.

Coordinate systems of specific Streams can be defined as well. This way, at each frame, the information about labels, timing, and coordinate systems is specified all together.

{
    "openlabel": {
        "frames": {
            "0": {
                "objects": {
                    "0": {
                        "object_data": {
                            "bbox": [
                                {
                                    "name": "shape2D",
                                    "val": [600, 500, 100, 200],
                                    "coordinate_system": "Camera1"
                                }
                            ],
                            "cuboid": [
                                {
                                    "name": "shape3D",
                                    "val": [...],
                                    "coordinate_system": "Lidar1"
                                }
                            ]
                        }
                    }
                },
                "frame_properties": {
                    "streams": {
                        "Camera1": {
                            "stream_properties": {
                                "sync": { "frame_stream": 1, "timestamp": "2020-04-11 12:00:07"}
                            }
                        },
                        "Lidar1": {
                            "stream_properties": {
                                "sync": { "frame_stream": 0, "timestamp": "2020-04-11 12:00:10"}
                            }
                        }
                    }
                }
            }
        },
        "objects": {
            "0": {
                "name": "",
                "type": "car",
                "coordinate_system": "Camera1"
            }
        }
    }
}

3. Labeling Methods

In this chapter the labeling methods for object and scenario labeling are explained. This chapter and all its content will be transferred to the user guide for OpenLABEL and to the standard itself.

Having a single labeling approach, either for objects or scenarios, is very important to ensure that datasets can be exchanged between parties. As of now, many datasets use different labeling methods, which makes exchanging or extending datasets very time-intensive and expensive.

3.1. Coordinate Systems

An OpenLABEL format data stream may contain sensor data from multiple sensors and multiple styles of label information associated with that sensor data. It needs to be clear how data from different sensors is related, how the label data relates to the sensor data and how the sensor data relates to the real world. Multiple coordinate systems with transforms between them are used to achieve this.

For example, a data stream may contain images from two forward facing cameras arranged as a stereo pair together with LIDAR return data from a single top mounted mechanically rotating LIDAR, all on a moving vehicle also containing a GNSS/INS system.

Note that the given example is more complex than an ADAS L2 application would be, but much simpler than an L4 AV application would be.

In this example it is necessary to understand the positions and orientations of the 4 sensors (two cameras, one LIDAR, one GNSS/INS) with respect to each other and with respect to the vehicle.

The data stream may also, for example, contain label data for 2D object bounding boxes for the left camera plus 3D object bounding boxes which are a human/algorithm annotator’s best estimate of where in the world objects are, based on both the stereo camera information and the LIDAR information. It is necessary to understand how the 2D and 3D labels relate to each other.

It is also necessary in many cases to understand how the sensor data and the labels relate to the real world as represented by the absolute positions and orientations the GNSS/INS system can provide, or as represented by map data. For example, it is often necessary to understand where objects are in relation to the road structure as provided by a map.

More specifically, for a camera it is desirable to understand the position of the ray of light that generated the value of each pixel in the image. This requires knowledge of the position and orientation of the camera in the world together with the positions of objects/road/etc in the world. Any distortion introduced by the camera lens also needs to be understood. For a LIDAR system, again, it is desirable to understand the point in the world which each LIDAR return is generated from. This requires knowledge of the position of the LIDAR device in the world, and the position of objects hit by the LIDAR in the world. Also since in this example the LIDAR is a relatively slow scanning mechanical device it is necessary to understand the motion of the LIDAR device through the world in order to be able to understand the positions of all the LIDAR returns relative to each other (and the world).

In general it is convenient to have a coordinate system for each sensor, together with a coordinate system for the vehicle and also a coordinate system for the world.

Sensor data is naturally stored in the coordinate system for the sensor that produced it. Labels that are specific to a sensor, e.g. 2D bounding boxes for a camera, are also naturally stored in the coordinate system of the relevant sensor. Labels that are related to world coordinates, like the fused camera and LIDAR 3D bounding box in this example are most naturally stored in the world coordinate system (that the human did the annotation in).

It may be thought that providing a fixed small number of coordinate frames, e.g. sensor(s), vehicle, world would be sufficient, however this is not the case. Different sensor set ups and different system level choices can lead to many more coordinate systems being needed with significant variation between systems.

For example, with a stereo camera system, it is usual to have a pre-processing stage that takes the ‘raw’ camera images and undistorts and rectifies them to produce a new pair of ‘rectified’ images. This removes the lens distortion and aligns the horizontal lines of both cameras (so that the rectified images fit an ideal pinhole camera model and searching for the same point in the scene in both cameras can be performed by just searching along a line). These rectified images appear to come from virtual cameras with slightly different orientations and focal lengths than the physical cameras. This difference can be represented by having an ‘image sensor’ coordinate system and a ‘(virtual) camera’ coordinate system and a transform between them. Some systems will record the ‘raw’ images and the transforms necessary to generate the rectified images (and may or may not also record the rectified images). Some systems will record only the rectified images.

For example, with a LIDAR system, it is usual to have a pre-processing stage that takes one rotation of raw LIDAR return data and converts it into a point cloud. This removes the rotating over time nature of the LIDAR scan by taking into account how the LIDAR sensor has moved (i.e. how the vehicle has moved) over the time taken for the LIDAR to make one rotation. This ‘untwisting’ process generates a point cloud in a coordinate system that is fixed in the world. In some systems the point cloud is in a coordinate system that is locked to where the LIDAR was at the start (or end) of the scan; in other systems it is in a coordinate system that is fixed in the world but can drift over time relative to a map (an ‘odom’ type coordinate system); and in some systems it is in a coordinate system that is absolutely fixed in the world (a GNSS or map type coordinate system).
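
The ‘untwisting’ idea can be illustrated with a deliberately simplified sketch. It assumes that the sensor moves with constant velocity and without rotation during one scan, that each return carries its time offset from the start of the scan, and that the output coordinate system is the one locked to where the LIDAR was at the start of the scan. Real systems also compensate for rotation; the function name and data layout are hypothetical.

```python
def untwist_scan(returns, velocity_mps):
    """Move each LIDAR return into the sensor coordinate system at scan start.

    returns: list of (dt_seconds, (x, y, z)), with the point expressed in the
             sensor coordinate system at the moment it was captured.
    velocity_mps: (vx, vy, vz) of the sensor, expressed in the scan-start
                  coordinate system; assumed constant with no rotation.
    """
    vx, vy, vz = velocity_mps
    untwisted = []
    for dt, (x, y, z) in returns:
        # The sensor has moved by velocity * dt since the scan started,
        # so the same point lies that much further away in the start frame.
        untwisted.append((x + vx * dt, y + vy * dt, z + vz * dt))
    return untwisted


# Three returns measured at the same sensor-relative position while the
# vehicle drives forward at 10 m/s during a 0.1 s rotation.
scan = [(0.0, (5.0, 0.0, 0.0)), (0.05, (5.0, 0.0, 0.0)), (0.1, (5.0, 0.0, 0.0))]
points = untwist_scan(scan, (10.0, 0.0, 0.0))
print(points)
```

Although all three returns were measured at the same offset from the sensor, they correspond to different points in the world, one metre apart between the first and the last.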

Another reason different systems may use different coordinate systems is that the maps they use may be different. For example, some systems may use GNSS type coordinates (an elliptical coordinate system); some systems may use UTM coordinates (rectangular coordinates that are appropriate for local areas of the earth); and some systems may use country specific mapping systems (such as the UK Ordnance Survey map coordinates). Other systems may use proprietary maps such as those created by dense LIDAR surveys of an area together with a LIDAR localisation method.

Yet another reason that different systems may use different coordinate systems is that they may have made different system level choices for how to represent motion of the vehicle through the world. For example, many systems use the model that the ROS (Robot Operating System) middleware uses where the chain of transforms includes the ‘odom’ coordinate system which is a coordinate system that is approximately fixed in the world but can drift over time. In this model the transform sequence is typically: sensor → vehicle → odom → map → earth. In this case the system generates its own idea of motion (odometry) through the world using local sensor data like camera or inertial sensors, then this ‘odom’ coordinate system is localised by placing it on a map using some global sensor data like GNSS. The benefit of this system is that it can gracefully handle situations where there is a loss of global sensor data such as when a vehicle enters a tunnel and loses GNSS signal. In this case the vehicle will be able to detect its movement through the tunnel, but only approximately and over time the vehicle’s idea of its position on a map will drift from its actual position on a map, then when the vehicle emerges from the tunnel it will regain GNSS signal and be able to correct its idea of position on the map. This is represented by the vehicle motion being smooth in the odom coordinate system, but there being discontinuous changes of the transformation between the odom and map coordinate systems.

Yet another system level choice for different systems using different coordinate systems might be choices like whether to compensate for tilt of the road. Some systems may have an additional coordinate system (at some point in the transform sequence) that is referenced to ‘down’ as detected by using accelerometers (with compensation for acceleration due to motion).

For all these reasons the OpenLABEL standard provides a method to describe an arbitrary number of coordinate systems and a method to describe the transforms between those coordinate systems.

However, despite the ability to describe an arbitrary set of coordinate systems, there are some coordinate systems that are commonly used in many systems and so are defined by the standard. The coordinate systems with fixed definitions include: “vehicle-iso8855”, “odom”, “map-UTM”, “geographic-wgs84”. Whenever these names are used for a coordinate system, they shall have the meaning defined in the standard.

It is also important to note that the transformations between coordinate systems can vary over time. In the example above the odom to map transform varies as the vehicle emerges from the tunnel. Even transforms that might appear fixed, because they are rigidly connected like the transformation from camera to vehicle, can in fact vary over time. For example, a camera system may have a dynamic re-calibration system which may change the transformation between the camera sensor coordinate system and the (virtual) camera coordinate system, thus changing the transform between the camera and the vehicle.

The OpenLABEL standard therefore provides a method to describe transforms which are fixed for all time, which vary occasionally (at a specific frame), and which vary continuously (every frame).

Concepts

The key concepts are:

Coordinate-system – a way of using one or more numbers, ‘coordinates’, to specify the location of points in some space. E.g. the 2D position of a pixel within an image or the 3D position of a LIDAR return point in the world relative to the vehicle’s rear axle. Coordinate systems are often, but by no means always, 3D right-handed cartesian systems.
Transform – a transformation allowing the coordinates in one coordinate-system to be converted into coordinates in another coordinate-system such that they represent the same point in space. A transform always has two coordinate-systems associated with it, a source and a target.

Each coordinate-system has a textual name and a uid that is used to reference it within the OpenLABEL JSON data stream. A small number of names are reserved and refer to pre-defined coordinate systems specified in the standard. All other names are user defined. Each coordinate-system is defined by either being associated with a sensor, or by being the source or target of a transform, where there is a sequence of such transforms that ends in either a sensor or a pre-defined coordinate system.

Each transform is one of a fixed number of types. The types supported are:

camera-transform – a projective transform describing how a point in the real world is translated into a pixel on the camera sensor. Usually split into several components: intrinsics, distortion coefficients and extrinsics. [TODO, should we support multiple types of camera-transform with different complexities of distortion model?]
cartesian-transform – a 3D to 3D transform offering a change of origin, scale and rotation. Represented as a matrix and a quaternion.
geospatial-transform – a transform from a 3D Cartesian coordinate system into an ellipsoidal GNSS style coordinate system. E.g. from map-UTM to WGS84 latitude, longitude, altitude.

[ Note that the current JSON schema does not seem to allow for arbitrary numbers of coordinate systems each having their own name. It seems to assume the existence of just vehicle and world coordinate systems and then describes transforms from sensors to these, and a transform between vehicle and world coordinates. I believe we should change this to allow an arbitrary number of coordinate systems with transforms between them. I suggest a coordinate-system in the JSON should be a string and there should be a way to associate a coordinate system with a sensor. I suggest a transform in the JSON stream should include: source-coordinate-system, target-coordinate-system, transform-type, transform parameters (as arrays). ]
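The idea of named coordinate systems linked by source/target transforms can be sketched as follows. This is only an illustration, not the OpenLABEL schema: the `Transform` record, the helper names and the sensor name "lidar" are assumptions, every transform is a plain 4x4 homogeneous matrix (translations only, for brevity), and each coordinate system is assumed to have at most one outgoing transform.

```python
from dataclasses import dataclass

Matrix = list[list[float]]           # 4x4 homogeneous matrix, row-major
Point = tuple[float, float, float]


@dataclass(frozen=True)
class Transform:
    """Hypothetical record: converts coordinates in `source` into `target`."""
    source: str
    target: str
    matrix: Matrix


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Homogeneous matrix for a pure change of origin."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def apply(matrix: Matrix, point: Point) -> Point:
    """Apply a homogeneous matrix to a 3D point."""
    h = (point[0], point[1], point[2], 1.0)
    out = [sum(row[i] * h[i] for i in range(4)) for row in matrix]
    return (out[0], out[1], out[2])


def convert(point: Point, start: str, goal: str, transforms: list[Transform]) -> Point:
    """Follow source -> target links from `start` until `goal` is reached."""
    by_source = {t.source: t for t in transforms}
    current, visited = start, {start}
    while current != goal:
        step = by_source.get(current)
        if step is None or step.target in visited:
            raise ValueError(f"no transform path from {start!r} to {goal!r}")
        point = apply(step.matrix, point)
        current = step.target
        visited.add(current)
    return point


# Example chain sensor -> vehicle -> odom -> map with made-up offsets.
CHAIN = [
    Transform("lidar", "vehicle-iso8855", translation(1.5, 0.0, 1.8)),
    Transform("vehicle-iso8855", "odom", translation(10.0, 2.0, 0.0)),
    Transform("odom", "map-UTM", translation(500000.0, 4000000.0, 0.0)),
]
```

Calling `convert((1.0, 0.0, 0.0), "lidar", "odom", CHAIN)` walks two links of the chain. A time-varying transform would replace the single matrix with a per-frame lookup.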

Pre-defined coordinate systems

The following coordinate system names are predefined, and wherever used have the following meaning:

Figure 1. coordinate systems with heading, pitch and roll

“vehicle-iso8855” – a right-handed coordinate system with origin at the centre of the rear axle projected down to ground level. Note the origin is attached to the rigid body of the vehicle (not to an actual axle, which has suspension components between it and the vehicle body). It is at ground level with the vehicle nominally loaded; depending on the actual loading it may in fact be above or below ground level. Similarly, the axis pointing forward may actually point slightly upwards or downwards relative to ground level depending on the front to back loading of the vehicle. The x axis is forward, the y axis to the left and the z axis upwards. See the ISO 8855 specification.

Figure 2. Vehicle coordinate system, ISO 8855

“odom” – a 3D cartesian coordinate system that is approximately fixed in the world. The transform between the vehicle-iso8855 coordinate system and this one is guaranteed to be continuous (i.e. will vary smoothly over time). Note that the transform between odom and map-UTM may be discontinuous (i.e. there may be sudden jumps in the value of the transform). The odom origin is often the starting point of the vehicle at the time the system is switched on. See the ROS documentation.
“map-UTM” – a 3D cartesian coordinate system useful for mapping moderately sized regions of the earth. It is locked to the earth and is a set of slices of flat coordinates that cover the earth. See the UTM specification.
“geospatial-wgs84” – a 3D ellipsoidal coordinate system used for GNSS systems, i.e. latitude, longitude, altitude. It is fixed to the earth (ignoring continental drift etc.) and covers the entire earth. See the various GPS specifications.

Typical transform trees

There are several sets of coordinate systems (blue boxes) and transforms (blue lines) between them that are commonly used.

For example, a ROS based system with the sensors described in the example system in the introduction might have the following transform tree:

A set of data captured from a dash-cam (single camera plus GPS) might look like:

A single camera with no other data, with the movement of the camera deduced by structure from motion, might look like:

3.2. Geometries for labeling

When labelling objects within data streams, different geometries are necessary. Depending on the type of sensor stream, either 2D or 3D geometries are needed. Therefore the OpenLABEL standard will provide a set of primitives that can be used to label objects and areas in the sensor streams.

The format described in chapter 1 supports the following geometry types :

bbox: a 2D bounding box
rbbox: a 2D rotated bounding box
cuboid: a 3D cuboid
point2d: a point in 2D space
point3d: a point in 3D space
poly2d: a 2D polygon defined by a sequence of 2D points
poly3d: a 3D polygon defined by a sequence of 3D points
area_reference: a reference to an area
line_reference: a reference to a line
mesh: a 3D mesh of points, vertex and areas

3.2.1. Point

A point in two-dimensional space has two coordinates, x and y. In three-dimensional space a point is defined by three coordinates, x, y and z.

In the below example a 2D point is defined at the coordinates x=100 and y=100.

"point2d": {
    "name": "2D_point",
    "val": [100,100]
}

In the below example a 3D point is defined at the coordinates x=100 and y=100 and z=50.

"point3d": {
    "name": "3D_point",
    "val": [100,100,50]
}

3.2.2. Line

A line is a basic element which is defined by two points. Defining a line by two points reduces differences when the line is computed on different systems with different implementations. When a line is instead defined by a starting point and a length, the endpoint of the line has to be calculated, and depending on the system the results can differ slightly.

"line_reference": {
    "name": "line",
    "val": ["TODO: definition of the line"]
}

3.2.3. Boxes

Boxes are a very basic tool to label objects. Usually a box is placed around an object and encloses it completely. For many use cases the exact outline of an object is not necessary and would therefore only cost computational power; boxes avoid this. For example, when labeling a parked car, the exact outline of the car is not needed, as the vehicle would not pass so close to the parked car that the exact outline would benefit the path calculation. In OpenLABEL there are two kinds of boxes:

2D Boxes
3D Boxes also called cuboids

Example of a 2D bounding box:

"bbox" : [{
            "name" : "",
            "stream" : "CAM_LEFT",
            "val" : [296.74, 161.75, 158.48, 130.62]
        }
    ],

Example of a 3D bounding box or cuboid:

"cuboid" : [{
            "name" : "",
            "val" : [14.44, 4.55, -0.2, 0, 0, -2.11, 1.82, 4.43, 2.0]
        }
    ]

ASAM OSI definition for a 2d or 3d box: Allowed number of referenced points: 2 or 3

Allowed number of referenced points = 2: first and third corner of the box. The box is aligned horizontally and vertically, i.e. its edges are parallel to the axes.

Allowed number of referenced points = 3: first, second and third corner of the box. The fourth corner is calculated as first + third - second corner.
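The corner rule above can be sketched in a few lines. The function names are hypothetical, corners are assumed to be listed consecutively around the box, and the ordering of the derived corners in the two-point form is an assumption made for this illustration.

```python
def fourth_corner(first, second, third):
    """Three-point form: the missing corner is first + third - second."""
    return tuple(a + c - b for a, b, c in zip(first, second, third))


def axis_aligned_corners(first, third):
    """Two-point form: `first` and `third` are opposite corners of an
    axis-aligned 2D box; the other two corners share their coordinates."""
    (x1, y1), (x3, y3) = first, third
    return [(x1, y1), (x3, y1), (x3, y3), (x1, y3)]
```

Because the fourth corner is obtained by addition and subtraction only, the same helper works for rotated boxes and for points with three coordinates.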

"OSI Box documentation"

3.2.4. Polygons

A more complex primitive is the polygon. Polygons can be very useful to label more complex outlines when necessary. In OpenLABEL there are two types of polygons:

2D Polygons
3D Polygons

ASAM OSI definition for a polygon

Allowed number of referenced points: 3 .. n

The polygon is defined by its first, second, third and subsequent points. The polygon shape is closed (last and first point are different).

"OSI Polygon documentation"

"poly2d" :[{
    "name" :"2d-polygon",
    "val" : ["a","b","c","d"]
}]

It is also possible to create 3D polygons for three-dimensional data, e.g. lidar point clouds:

"poly3d" :[{
    "name" :"3d-polygon",
    "val" : ["..."]
}]

3.3. Spatial Rotation

There are several ways to describe spatial rotations, each of which has its own advantages and disadvantages. This chapter discusses these methods as well as their differences.

3.3.1. Rotation matrices

A rotation matrix is a 3x3 matrix whose columns are mutually orthogonal unit vectors, i.e. it is an orthonormal matrix (with determinant +1).

The multiplication of rotation matrices corresponds to the concatenation of rotations and thus yields rotation matrices again. However, because of floating point errors, the resulting matrix needs to be re-orthonormalized. The Gram-Schmidt process can be applied for this purpose.
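As a minimal sketch of this re-orthonormalization, the classical Gram-Schmidt process can be applied to the columns of a slightly drifted matrix. The function name and the list-of-rows layout are assumptions; the sketch does not guard against degenerate input (a zero or linearly dependent column would cause a division by zero).

```python
import math


def orthonormalize(m):
    """Gram-Schmidt on the columns of a 3x3 matrix given as a list of rows."""
    columns = [[m[r][c] for r in range(3)] for c in range(3)]
    basis = []
    for v in columns:
        # Remove the components along the already accepted basis vectors.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        # Scale the remainder to unit length.
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    # Convert the list of columns back into a list of rows.
    return [[basis[c][r] for c in range(3)] for r in range(3)]
```

The first column only gets rescaled, while later columns are also made perpendicular to the earlier ones, so accumulated error is pushed towards the last column.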

3.3.2. Euler Angles

A rotation can be described using Euler Angles (roll, pitch, yaw).

3.3.3. Quaternions

To understand quaternions, it might help to remember complex numbers, their properties and the reason why they describe rotations in two dimensional space.

Complex numbers

Complex numbers introduce the imaginary unit \(i\) with the property \(i^2=-1\). A complex number \(a+bi\) can be represented as a point in a 2D coordinate system.

With \(i^2=-1\), the multiplication of two complex numbers yields a rotation around the origin (and a multiplication of their radii).
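A short sketch using Python's built-in complex type illustrates this: multiplying by a complex number of radius 1 rotates a point around the origin without changing its distance from it. The helper name `rotate2d` is an assumption made for this example.

```python
import cmath
import math


def rotate2d(point, angle):
    """Rotate a 2D point around the origin by `angle` radians."""
    # cmath.rect(1.0, angle) is the complex number with radius 1 and the given angle.
    z = complex(point[0], point[1]) * cmath.rect(1.0, angle)
    return (z.real, z.imag)


# Rotating (1, 0) by 90 degrees gives (0, 1) up to floating point error.
x, y = rotate2d((1.0, 0.0), math.pi / 2)
```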

Quaternions

Quaternions use a similar mathematical concept as complex numbers. Therefore, we introduce three imaginary values \(i, j, k\) and the following axioms in form of a multiplication table:

×	1	i	j	k
1	1	i	j	k
i	i	-1	k	-j
j	j	-k	-1	i
k	k	j	-i	-1

\(i^2=j^2=k^2=-1\)
\(i*j=k, \quad\quad j*k=i, \quad\quad k*i=j\)

Considering a 3D coordinate system where each imaginary value is associated with a unique dimension, these axioms already yield rotations around certain axes. The following picture shows a primitive example. To rotate vectors by a quaternion, vectors are represented as quaternions by a linear combination of \(i\), \(j\), and \(k\) with the same coefficients as the corresponding unit vectors. To rotate the \(i\)-vector around the \(j\)-vector, simply multiply it with \(j\). The result is the \(k\)-vector. Hence, \(i\) was rotated around \(j\) by 90 degrees.

In general, quaternions can be thought of as rotations around a certain axis with a certain angle.
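The axioms translate directly into a product of two quaternions (the Hamilton product). The following sketch stores a quaternion as a `(w, x, y, z)` tuple; the function name `qmul` and the constants are assumptions made for this illustration.

```python
def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
```

With these definitions `qmul(I, J)` gives `K`, while `qmul(J, I)` gives the negative of `K`, which shows that the order of the factors matters.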

The math behind quaternions

A quaternion \(q=w+xi+yj+zk\) is normalized if its norm \(n(q)\) equals \(1\), where

\[n(q) = \sqrt{w^2 + x^2 + y^2 + z^2}.\]

The conjugate of \(q\) lets the "rotation axis" point into the other direction.

\[\bar q = w-xi-yj-zk\]

Quaternions form a division ring (a skew field): addition, subtraction, multiplication, and division are defined, but multiplication is not commutative.
- The neutral element regarding multiplication is \(\quad 1 = 1 + 0i + 0j + 0k.\)
- The inverse regarding multiplication is

\[q^{-1} = \frac{\bar q}{n(q)^2}.\]

A vector \(p=(p_1, p_2, p_3)\) is rotated by a unit quaternion \(q\) via \(p' = q*(0 + p_1i + p_2j + p_3k)*\bar q\); the rotated vector is given by the \(i\), \(j\), \(k\) coefficients of \(p'\).
The unit quaternion which performs the rotation around a unit vector \(u=(u_1, u_2, u_3)\), considered as axis, by an angle \(a\) is constructed via

\[q = \cos\left(\frac a 2\right) + \sin\left(\frac a 2\right) u_1i + \sin\left(\frac a 2\right) u_2j + \sin\left(\frac a 2\right) u_3k\]
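A minimal sketch of both steps, building a unit quaternion from an axis and an angle and rotating a vector by sandwiching it between the quaternion and its conjugate, could look as follows. Quaternions are `(w, x, y, z)` tuples and all function names are assumptions; the axis is normalized inside the helper, so it does not need to have unit length.

```python
import math


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def conjugate(q):
    """Conjugate: same real part, negated imaginary parts."""
    return (q[0], -q[1], -q[2], -q[3])


def from_axis_angle(axis, angle):
    """Unit quaternion for a rotation by `angle` radians around `axis`."""
    length = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / length
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def rotate(q, p):
    """Rotate the 3D vector p by the unit quaternion q."""
    _, x, y, z = qmul(qmul(q, (0.0, p[0], p[1], p[2])), conjugate(q))
    return (x, y, z)
```

For example, `rotate(from_axis_angle((0, 0, 1), math.pi / 2), (1, 0, 0))` turns the x unit vector into the y unit vector, up to floating point error.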

Comparison between rotation representations

Euler angles have the advantage of being easy to understand and explain. However, they are the most difficult representation for computers since they first have to be transformed to rotation matrices or quaternions before being applied. Euler angles also suffer from the gimbal lock, which is a state of rotation where all further rotation degenerates into having only two degrees of freedom instead of three. However, Euler angles don’t need to be normalized as do quaternions and rotation matrices.

For these reasons, the following table only compares the computation time and storage requirements of quaternions and rotation matrices.

	Quaternion	Rotation Matrix
Storage	4(3)	9
Operations for chaining rotation	24	45
Operations for vector rotation	30	15
Normalization	cheap (normalize)	expensive (Gram Schmidt process)

The fourth component of a unit quaternion can always be derived from the other three (up to its sign), i.e. quaternions need as little storage as Euler angles. However, this would always need an extra computation step.

The normalization of a quaternion is quite simple since it just needs to be divided by its norm. On the other hand, the normalization of rotation matrices is quite complex, since it is not just the length of all column vectors which needs to be normalized, but also the right angle between all column vectors. This can be achieved with the Gram-Schmidt process, which requires many more computation steps than a single normalization.

For the reasons described above, we consider unit quaternions stored with four floating point values to describe rotations of objects such as bounding boxes or frames of coordinate systems.

Helper functions for quaternions

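The following function returns the angle by which a quaternion rotates a vector around its rotation axis.

function angle( Quaternion q )
  // normalize first so that q.w is the cosine of half the rotation angle
  q = q / n( q )
  if( q.z >= 0 )
    return 2*acos( q.w )
  else
    // the axis is flipped to point up, so the angle changes its sign
    return -2*acos( q.w )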

The axis of a quaternion can be retrieved as well. If the rotation angle is (very close to) zero, the rotation axis is ambiguous since it could be any rotation axis. In this case the function returns a default axis (e.g. the last unit vector).

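function axis( Quaternion q )
  // length of the vector part; it vanishes when the rotation angle is zero
  v = sqrt( q.x*q.x + q.y*q.y + q.z*q.z )
  if( v < epsilon )
    return ( 0, 0, 1 )
  else if( q.z >= 0 )
    return 1/v * ( q.x, q.y, q.z )
  else
    return -1/v * ( q.x, q.y, q.z )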

Please note that the quaternions \(q\) and \(-q\) represent the same rotation. This is easy to see when imagining the quaternion as a rotation around an axis: negating the axis simply switches the rotation direction, and negating the rotation angle as well cancels out that change.

Also, the rotation axis is normalized in such a way that it always points up (i.e. it has a non-negative \(z\)-component).
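The two helper functions can be sketched in Python as follows, assuming quaternions are stored as `(w, x, y, z)` tuples. The threshold `eps` and the clamping of the cosine before `acos` are choices made for this illustration; the axis is scaled by the length of the vector part so that it has unit length.

```python
import math


def angle(q):
    """Signed rotation angle of q around its upward-pointing axis."""
    n = math.sqrt(sum(c * c for c in q))
    # Clamp to [-1, 1] to protect acos against rounding errors.
    w = max(-1.0, min(1.0, q[0] / n))
    a = 2.0 * math.acos(w)
    return a if q[3] >= 0 else -a


def axis(q, eps=1e-9):
    """Unit rotation axis of q with a non-negative z component."""
    _, x, y, z = q
    v = math.sqrt(x * x + y * y + z * z)
    if v < eps:
        # No rotation: any axis would do, so return the default axis.
        return (0.0, 0.0, 1.0)
    sign = 1.0 if z >= 0 else -1.0
    return (sign * x / v, sign * y / v, sign * z / v)
```

A quaternion describing a quarter turn around the negative z-axis is thus reported as a rotation by minus a quarter turn around the positive z-axis.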

Transformation between rotation representations

Quaternion to rotation matrix:

Let \(q=w+xi+yj+zk\) be a normalized quaternion. Then, the corresponding rotation matrix is

\[M(q) = \begin{pmatrix} x^2-y^2-z^2 + w^2 & 2*(x*y - z*w) & 2*(x*z + y*w) \\ 2*(x*y + z*w) & -x^2 + y^2 - z^2 + w^2 & 2*(y*z-x*w) \\ 2*(x*z - y*w) & 2*(y*z + x*w) & -x^2 -y^2 + z^2 + w^2 \end{pmatrix}\]

Rotation matrix to quaternion:

On the other hand, a rotation matrix \(m\) can be transformed into a quaternion:

\[Q(m) = \frac 1 2 \sqrt{1 + trace(m)} + \frac{(m_{2, 1} - m_{1, 2})}{ 2 \sqrt{1 + trace(m)} }i + \frac{(m_{0, 2} - m_{2, 0})}{ 2 \sqrt{1 + trace(m)} }j + \frac{(m_{1, 0} - m_{0, 1})}{ 2 \sqrt{1 + trace(m)} }k\]

with \(trace(m) = m_{0,0} + m_{1,1} + m_{2,2}.\)
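Both conversions can be sketched as follows, with quaternions as `(w, x, y, z)` tuples and matrices as lists of rows using zero-based indices. The function names are assumptions. The matrix-to-quaternion formula divides by \(\sqrt{1 + trace(m)}\), so the sketch rejects rotations by (nearly) 180 degrees, where this term vanishes; a complete implementation would need a separate branch for that case.

```python
import math


def quat_to_matrix(q):
    """Rotation matrix of a normalized quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [w * w + x * x - y * y - z * z, 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), w * w - x * x + y * y - z * z, 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), w * w - x * x - y * y + z * z],
    ]


def matrix_to_quat(m):
    """Quaternion (w, x, y, z) of a rotation matrix with trace > -1."""
    trace = m[0][0] + m[1][1] + m[2][2]
    if 1.0 + trace < 1e-12:
        raise ValueError("trace too close to -1 for this formula")
    s = 2.0 * math.sqrt(1.0 + trace)  # equals 4 * w
    return (
        s / 4.0,
        (m[2][1] - m[1][2]) / s,
        (m[0][2] - m[2][0]) / s,
        (m[1][0] - m[0][1]) / s,
    )
```

Converting a quaternion to a matrix and back returns the original quaternion as long as its real part is positive.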

Quaternion to Euler angles:

The Euler angles roll (\(\gamma\)), pitch (\(\beta\)), and yaw (\(\alpha\)) can be retrieved from a quaternion \(q\) as follows:

\[\alpha(q) = \text{atan2}\left( 2*q.y*q.w - 2*q.x*q.z, 1-2*q.y^2-2*q.z^2 \right) \\ \beta(q) = \text{asin}\left( 2*q.x*q.y + 2*q.z*q.w \right)\\ \gamma(q) = \text{atan2}\left( 2*q.x*q.w - 2*q.y*q.z, 1-2*q.x^2-2*q.z^2 \right)\]

However, there are two exceptions at the poles: If \(q.x*q.y+q.z*q.w = 0.5\), then

\[\alpha(q) = 2*\text{atan2}( q.x, q.w ) \\ \beta(q) = \frac\pi 2 \\ \gamma(q) = 0\]

and if \(q.x*q.y+q.z*q.w = -0.5\), then

\[\alpha(q) = -2*\text{atan2}( q.x, q.w ) \\ \beta(q) = -\frac\pi 2 \\ \gamma(q) = 0\]

Euler angle to quaternion:

A different order of application of roll, pitch, and yaw yields different overall rotations. We suppose that yaw (\(\alpha\)), pitch (\(\beta\)), and roll (\(\gamma\)) are applied in the order \(\alpha, \beta, \gamma\), using the same symbols as in the quaternion to Euler angle conversion. The formula corresponds to yaw about the \(y\)-axis, pitch about the \(z\)-axis and roll about the \(x\)-axis. Then, the overall rotation is represented by the following quaternion:

\[Q(\alpha, \beta, \gamma) = \cos\frac\alpha 2 \cos\frac\beta 2 \cos\frac\gamma 2 - \sin\frac\alpha 2 \sin\frac\beta 2 \sin\frac\gamma 2 \\ +\left(\sin\frac\alpha 2 \sin\frac\beta 2 \cos\frac\gamma 2 + \cos\frac\alpha 2 \cos\frac\beta 2 \sin\frac\gamma 2 \right)i \\ +\left(\sin\frac\alpha 2 \cos\frac\beta 2 \cos\frac\gamma 2 + \cos\frac\alpha 2 \sin\frac\beta 2 \sin\frac\gamma 2 \right)j \\ +\left(\cos\frac\alpha 2 \sin\frac\beta 2 \cos\frac\gamma 2 - \sin\frac\alpha 2 \cos\frac\beta 2 \sin\frac\gamma 2 \right)k\]
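The two conversions of this subsection can be checked against each other with a small round-trip sketch. It uses plain parameter names (`yaw`, `pitch`, `roll`) instead of Greek symbols, stores quaternions as `(w, x, y, z)` tuples, and assumes the axis assignment implied by the quaternion to Euler angle formulas: yaw about the y-axis, pitch about the z-axis, roll about the x-axis. The quaternion is the product of the three single-axis quaternions in that order, written out term by term (note the minus sign in the last component). Function names and the pole tolerance are assumptions.

```python
import math


def euler_to_quat(yaw, pitch, roll):
    """Quaternion (w, x, y, z) for yaw, then pitch, then roll."""
    c1, s1 = math.cos(yaw / 2), math.sin(yaw / 2)
    c2, s2 = math.cos(pitch / 2), math.sin(pitch / 2)
    c3, s3 = math.cos(roll / 2), math.sin(roll / 2)
    return (
        c1 * c2 * c3 - s1 * s2 * s3,
        s1 * s2 * c3 + c1 * c2 * s3,
        s1 * c2 * c3 + c1 * s2 * s3,
        c1 * s2 * c3 - s1 * c2 * s3,
    )


def quat_to_euler(q, tol=1e-9):
    """(yaw, pitch, roll) of a unit quaternion, with the two pole cases."""
    w, x, y, z = q
    t = x * y + z * w
    if t > 0.5 - tol:      # pitch of +90 degrees
        return (2 * math.atan2(x, w), math.pi / 2, 0.0)
    if t < -0.5 + tol:     # pitch of -90 degrees
        return (-2 * math.atan2(x, w), -math.pi / 2, 0.0)
    yaw = math.atan2(2 * y * w - 2 * x * z, 1 - 2 * y * y - 2 * z * z)
    pitch = math.asin(2 * t)
    roll = math.atan2(2 * x * w - 2 * y * z, 1 - 2 * x * x - 2 * z * z)
    return (yaw, pitch, roll)
```

Away from the poles, `quat_to_euler(euler_to_quat(yaw, pitch, roll))` reproduces the input angles; at the poles the roll is reported as zero and absorbed into the yaw.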

Standards

Rotations of bounding boxes, point clouds, or other three-dimensional objects shall be represented using quaternions. A quaternion shall be notated using four floating point numbers \(q=(w,x,y,z)\) of double precision where the first component shall represent the real part of the quaternion (TODO: reference to the user guide). Although rotations are represented by normalized quaternions which only have three degrees of freedom, the quaternion shall be written using four numbers such that the norm acts as a checksum.

Whenever the \(z\)-component of a quaternion is negative, the quaternion should be negated (i. e. consider \(-q\)) in order to avoid ambiguities. The angle of rotation should always be considered in the interval \([-\pi, \pi]\).

The rotation axis of a quaternion \(q\) representing a rotation with no angle (i. e. it corresponds to the identity function) should be the \(z\)-axis, i. e. \(axis(q)=(0, 0, 1)\).

3.4. Bounding Boxes

Description:

Bounding boxes are used to label objects and entities detected by sensors mounted e.g. on a vehicle. There are two types of bounding boxes: 2D and 3D.
Usually the primary sensor for 2D bounding boxes is the camera.
Usually the primary sensor for 3D bounding boxes is the lidar or radar.

Using bounding boxes is a cost and time efficient way to label data. It is easier to draw boxes than to "paint" areas of the data in detail. Data sets that are labeled with bounding boxes are also cheaper to process in terms of processing power and use less space in storage.

Depending on the target machine learning network, datasets labeled with bounding boxes are mandatory.

3.4.1. 3D Bounding Boxes / Cuboids

Description: A 3D bounding box provides a rough size estimation of an object in height, width and length, along with its position and rotation in 3D space. A 3D bounding box is defined as a rectangular cuboid (from now on, cuboid), having 9 degrees of freedom:

3 for position (x,y,z)
3 for rotation (rx, ry, rz) = (roll, pitch, yaw)
3 for size (sx, sy, sz) = (length, width, height)

To have an unambiguous representation of the cuboid in 3D space, it is necessary to declare the convention that provides meaning to these rotation and size magnitudes. Therefore, a cuboid is defined as a 9-dimensions vector:

\(c=(x, y, z, r_x, r_y, r_z, s_x, s_y, s_z)\)

\((x, y, z)\) is the position of the center point of the cuboid, in meters;
\((r_x, r_y, r_z)\) are the (improper) Euler angles associated to the x, y and z-axes, in radians. These angles shall be expressed as intrinsic, active (alibi) rotations so that a transformation built with these angles and position can be used to change the cuboid as a rigid body with respect to a certain coordinate system. The convention is that \(r_x\) =roll, \(r_y\) = pitch, and \(r_z\) = yaw, to follow usual industrial standards. Rotations shall be applied Z→Y'→X''.
\((s_x, s_y, s_z)\) are the dimensions of the cuboid, in meters. Note \(s_x\) expresses "length", \(s_y\) "width", and \(s_z\) "height", although these terms are only meaningful depending on the observer coordinate system and conventions. In this document, the ISO 8855 is taken as example, where x-axis is the longitudinal axis (thus x="length"), y-axis is the transversal axis (y="width"), and the z-axis is the vertical axis (z="height"), so a ground plane in reality will coincide with z=0.

The order of rotations shall be Z→Y'→X'' and must be followed; otherwise the cuboid rotation will be different, because several sets of Euler angles can express the same rotation and, consequently, applying the same angles in a different order produces a different rotation.

The rotations on axes Z, Y, and X must be intrinsic. That means that after each rotation, the next rotation is applied on the new axes of the object, which have been rotated after each step. This is commonly notated as Z→Y'→X'' contrary to Z→Y→X which assumes that all rotations are expressed with respect to the first/original axes.
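To make the convention concrete, the following sketch computes the eight corners of a cuboid from its 9-dimensional vector. The intrinsic sequence Z→Y'→X'' corresponds to the matrix product Rz(r_z) · Ry(r_y) · Rx(r_x), which is written out explicitly. Function names and the corner ordering are assumptions made for this illustration.

```python
import math
from itertools import product


def rotation_zyx(rx, ry, rz):
    """Rotation matrix Rz(rz) * Ry(ry) * Rx(rx) as a list of rows."""
    cr, sr = math.cos(rx), math.sin(rx)
    cp, sp = math.cos(ry), math.sin(ry)
    cy, sy = math.cos(rz), math.sin(rz)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def cuboid_corners(cuboid):
    """Eight corner points of c = (x, y, z, rx, ry, rz, sx, sy, sz)."""
    x, y, z, rx, ry, rz, sx, sy, sz = cuboid
    rot = rotation_zyx(rx, ry, rz)
    corners = []
    for dx, dy, dz in product((-0.5, 0.5), repeat=3):
        # Corner in the cuboid's own frame, relative to its center.
        local = (dx * sx, dy * sy, dz * sz)
        corners.append(tuple(
            center + sum(rot[i][j] * local[j] for j in range(3))
            for i, center in enumerate((x, y, z))
        ))
    return corners
```

For a cuboid with only a yaw of 90 degrees, the corner that lies at half the length along the local x-axis ends up along the positive y-axis of the reference coordinate system.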

Examples: Simple examples of the described concepts can be seen in the following images.

In practical terms, labeling a full 9-DoF cuboid could be more cost intensive than simpler 2D bounding box labeling. In some cases, a cuboid could be simplified to have \(r_x=r_y=0\), that is, being always parallel to a flat ground plane (z=0), showing only yaw rotation.

An alternative to (improper) intrinsic Euler angles is to use quaternions, which provide an unambiguous expression of a rotation. A quaternion is a 4-dimensional vector which encodes the degree of rotation around a 3D vector, \(q=(w, x, y, z)\). However, quaternions are more difficult for human beings to interpret, and many software packages are already written to work with Euler angles and 3x3 rotation matrices.

Conversion from quaternions into the proposed (improper) intrinsic Z→Y'→X'' angles is possible:

\[q=(w, x, y, z) \\ t_0 = 2(wx + yz) \\ t_1 = 1 - 2(x^2 + y^2) \\ r_x = \text{atan2}(t_0, t_1) \\ t_2 = 2(wy - zx) \\ r_y = \text{asin}(t_2) \\ t_3 = 2(wz + xy) \\ t_4 = 1 - 2(y^2 + z^2) \\ r_z = \text{atan2}(t_3, t_4)\]
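The conversion can be sketched in Python as follows. The inverse direction (angles to quaternion) is not given in the text; the sketch uses the commonly used formula for the Z→Y'→X'' sequence so that the round trip can be checked. Function names are assumptions, and the argument of `asin` is clamped to guard against rounding errors.

```python
import math


def quat_to_euler_zyx(q):
    """(rx, ry, rz) = (roll, pitch, yaw) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    t0 = 2.0 * (w * x + y * z)
    t1 = 1.0 - 2.0 * (x * x + y * y)
    rx = math.atan2(t0, t1)
    t2 = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    ry = math.asin(t2)
    t3 = 2.0 * (w * z + x * y)
    t4 = 1.0 - 2.0 * (y * y + z * z)
    rz = math.atan2(t3, t4)
    return (rx, ry, rz)


def euler_zyx_to_quat(rx, ry, rz):
    """Unit quaternion (w, x, y, z) for intrinsic Z -> Y' -> X'' angles."""
    cr, sr = math.cos(rx / 2), math.sin(rx / 2)
    cp, sp = math.cos(ry / 2), math.sin(ry / 2)
    cy, sy = math.cos(rz / 2), math.sin(rz / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )
```

For a cuboid that is parallel to the ground plane, the quaternion has only `w` and `z` components and the conversion returns zero roll and pitch.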

3.4.2. 2D Bounding Boxes

Description: A 2D bounding box only provides a rough size estimation of an object in height and width. A 2D bounding box is defined as a rectangle, having 4 degrees of freedom:

2 for position (x,y)
2 for size (height, width)

Why and when to use 2D Bounding Boxes:

Example:

Figure 3. Example of 2D bounding boxes

In the above image two objects are labeled using 2D bounding boxes. The 2D bounding box only roughly covers the outline of the object.

2D Bounding Box specific Rules:

TODO add rules on what is allowed when using 2D Bounding Boxes

3.5. Semantic Segmentation

General Description:

Semantic image segmentation, also called pixel-level classification, is the task of clustering parts of an image together which belong to the same object class. Technically, it means assigning to each pixel a value/code corresponding to a certain class of interest (object/entity category).

For convenient human visual consumption, this is typically achieved through the designation of a color code for each class. The information of a certain pixel belonging to a certain category is thus expressed by assigning to that pixel a specific RGB value, which visually represents that category.

Figure 1: Semantic segmentation in Cityscapes dataset

It is important to notice how the semantic segmentation task treats objects as stuff, which is amorphous and uncountable. Multiple objects of the same class are treated as a single entity. Thus, no information exists about specific instances of a class. Cars are all assigned a blue color code (for example) and are treated as being part of the same amorphous "car stuff".

Semantic segmentation annotations follow the form of the objects and have no fixed shape. Manually, this is usually achieved by drawing refined polygons around the regions of interest, or by painting the region of interest through a paintbrush-like feature. The result is a precise mask that isolates only the object of interest and no surrounding pixels.

In the 2D Annotation Space this method provides the highest accuracy of the objects. However, this comes at increased costs in comparison to other annotation methods. Furthermore, segmentations take up more time during the labeling process than other 2D annotation methods and thus have lower throughput.

Formal Definition

Formally, Semantic Segmentation can be defined as follows: Let \(P=\{p_{1}, p_{2}, ... p_{n}\}\) be the set of all the pixels in a given frame (image).

Then the cardinality \(|P|\) is equal to the number of pixels in such a frame.

Let \(C=\{c_{1}, c_{2}, ... c_{m}\}\) be the set of all the classes that are defined for a labeling task (for example \(c_1=car, c_2=pedestrian\), and so on). Then the cardinality \(|C|\) is equal to the number of classes that are defined for such a task.

Performing semantic segmentation labeling on an image means establishing a relation that is valid when a pixel \(p_{x}\) represents a portion of an object belonging to one of the defined classes \(c_{y}\).

We can define \(R_{seg}\) as a relation between the sets \(P\) and \(C\). Formally, this means to define a subset of the cartesian product \(R_{seg} \subset P \times C\), where \(P \times C = \{ (p_{1},c_{1}), (p_{1},c_{2}), ... (p_{n},c_{m}) \}\)

Let \(D \subseteq P\) be the domain of the Semantic Segmentation relation \(R_{seg}\), we have then the following taxonomy:

Description:

Semantic segmentation Taxonomy

Partial scene Segmentation when \(\exists p_{x} \in P, \forall c_{y} \in C: (p_{x}, c_{y}) \notin R_{seg}\). There are some pixels that have no classes associated with them. In this case \(D \subset P\).
Full scene Segmentation when \(\forall p_{x} \in P, \exists c_{y} \in C : (p_{x},c_{y}) \in R_{seg}\). All the pixels have a class associated. In this case \(D\) coincides with \(P\). Notice that in the case we use the class "unlabeled", or "other" to indicate all the pixels outside the real classes of interest, we are still performing a form of full scene segmentation.
Single-class per pixel Segmentation when \(\forall p_{x} \in D, \exists! c_{y} \in C: (p_{x},c_{y}) \in R_{seg}\). This is the case when each labeled pixel is associated with exactly one class.
Multi-class per pixel Segmentation when \(\exists p_{x} \in D, \exists c_{1}, c_{2}... c_{k} \in C: (p_{x},c_{1}), (p_{x},c_{2}), ...(p_x,c_{k}) \in R_{seg}\) with \(k \geq 2\) distinct classes. This is the case when at least one labeled pixel is associated with more than one class.
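The taxonomy can be illustrated with a small sketch that represents \(R_{seg}\) as a Python set of `(pixel, class)` pairs and derives both properties from it. The function name, the returned strings and the use of integers as pixel identifiers are assumptions made for this example.

```python
def classify_segmentation(pixels: set, relation: set) -> tuple:
    """Return (coverage, multiplicity) for a relation of (pixel, class) pairs."""
    classes_per_pixel = {}
    for pixel, cls in relation:
        classes_per_pixel.setdefault(pixel, set()).add(cls)

    # The domain D is the set of pixels that carry at least one class.
    domain = set(classes_per_pixel)
    coverage = "full" if domain == pixels else "partial"

    # Multi-class as soon as one labeled pixel carries two or more classes.
    has_multi = any(len(classes) > 1 for classes in classes_per_pixel.values())
    multiplicity = "multi-class" if has_multi else "single-class"
    return (coverage, multiplicity)


# Three pixels, one of them unlabeled: partial, single-class per pixel.
EXAMPLE = classify_segmentation({0, 1, 2}, {(0, "car"), (1, "road")})
```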

3.5.2. Instance Segmentation

Instance segmentation enriches the semantic segmentation information by adding a separation among specific different instances of objects belonging to a class. This method allows stuff to be separated into individual, countable things. In contrast with the semantic segmentation task, where each pixel belongs to a set of predefined classes, in instance segmentation the number of instances is not known a priori.

Formal Definition

Formally, instance segmentation can be defined as an extension of Semantic Segmentation as follows: Let \(I=\{i_{1}, i_{2}, ...i_{l}\}\) be the set of all the instances of countable objects in the scene (image). Then the cardinality of the set \(|I|\) is equal to the total number of object instances that populate the scene. Performing instance segmentation labeling on an image means establishing a ternary relation \(I_{seg} \subseteq P \times C \times I\) that is valid when a pixel \(p_{x}\) represents a portion of an object belonging to one of the defined classes \(c_{y}\), and to a specific object instance \(i_{z}\). \(P \times C \times I = \{ (p_{1},c_{1},i_{1}), (p_{1},c_{1},i_{2}), ... (p_{n},c_{m},i_{l}) \}\)

Notice how Instance awareness can be added to any kind of semantic segmentation described before, just by extending the relation to an additional "instance" set.

Let \(D_{in} \subseteq P\) be the domain of the Instance Segmentation relation \(I_{seg}\).

Instance-unique segmentation \(\forall p_{x} \in D_{in}, \exists! c_{y} \in C, \exists! i_{z} \in I: (p_{x},c_{y},i_{z}) \in I_{seg}\). This is the case when each labeled pixel is associated with exactly one class and exactly one instance of that class.
Multi-class Multi-Instance segmentation \(\exists p_{x} \in D_{in}, \exists c_{1},c_{2}, ... c_{c} \in C, \exists i_{1},i_{2},... i_{i} \in I : (p_{x},c_{1},i_{1}),(p_{x},c_{1},i_{2})... (p_{x},c_{c},i_{i}) \in I_{seg}\). This is the case when a labeled pixel can be associated with more than one class and with more than one instance of those classes.

Notice how these general definitions cover all the possible particular cases, i.e. all the ways to construct semantic and instance segmentation labeling.

Figure 2: Instance segmentation in Cityscapes dataset

Description:

4. Taxonomy

The taxonomy working group is closely related to the OpenXOntology working group. In the OpenLABEL Concept Project this workpackage will come up with a sample Taxonomy that will help the OpenLABEL Concept project to proceed with the concept paper. This taxonomy will also serve as requirements for the OpenXOntology Domain model.

The elements of the taxonomy shall be used as labels within OpenLABEL. The taxonomy can be found within the OpenXOntology.

5. Scenario Labels

5.1. Introduction to Scenario Labels

Current methods of scenario definition focus on the representation of scenarios for the purpose of scenario execution (either in simulation or real-world) or for training Machine Learning systems.

When working with scenarios it is often found that additional data beyond that which can be represented in a scenario definition is necessary to facilitate their use, discoverability, management, and portability. Similarly, there are some types of data which are not always easy or efficient to extract from a scenario, and for some types of data extraction is impossible due to the limitations of the scenario definition language being used.

Two types of scenario generation methods exist: bottom-up approach and top-down approach.

Top-down scenario generation methods using predefined languages:

Scenarios created manually, e.g. following guidelines
Scenarios synthesized from statistical analysis of non-sensor data
Scenarios generated using formal methods of systems analysis of vehicle control systems
Scenarios generated using machine-learning

Bottom-up scenario generation methods:

Scenarios generated from the annotation of sensor data captured from real-world driving
Scenarios generated from the annotation of sensor data captured from simulation

5.2. Goals

Identify a set of scenario labels to assist and promote the use, discoverability, management, and portability of scenarios.
Facilitate the association of additional data with scenario definitions.
Facilitate the efficient searching for relevant scenarios where performing real-time inspection of scenario definitions, while searching, would not be performant.
Facilitate the sharing of scenarios between systems where the systems may not have the ability to inspect the scenario definition or underlying scenario data.
Facilitate scenario storage systems which are independent of scenario definition representation, i.e. enable scenario databases to store scenarios defined using more than one scenario definition language and future proof such systems against new versions of scenario definition standards.
Enhance Machine Learning training and validation data sets with additional information to identify scenarios / events / actions.

To illustrate the goals for the scenario labels, the image below demonstrates the workflow. All available labels are gathered in the left bucket; this includes higher-level labels, scenario-content-related labels, ODD labels, etc. As soon as a scenario needs to be labeled, the applicable labels are selected from the "bucket". Depending on the purpose or objective of the scenario, the selected labels might differ. The scenario (with all its necessary files) is then stored along with the annotation data (labels).

One use for the labels, shown in the image below, is to form search queries that return the corresponding scenarios. In the example, the scenario label searched for is "school zone", and all matching scenarios are returned. This example illustrates only a very simple use case; relations between labels might also be part of a search query, e.g. a right turn at a playground. How the relation between the two labels [right-turn] and [play-ground] is realized needs to be determined in the follow-up project.

Figure 4. scenario labeling, saving and searching use case
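
The store-and-search workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class StoredScenario, the function find_scenarios, the file names and the label strings are all hypothetical and are not defined by OpenLABEL. The combined query is a plain conjunction of labels; it does not model a relation between [right-turn] and [play-ground], which is left to the follow-up project.

```python
from dataclasses import dataclass, field


@dataclass
class StoredScenario:
    """A scenario stored together with its annotation data (labels)."""
    name: str
    files: list[str]
    labels: set[str] = field(default_factory=set)


def find_scenarios(database, required_labels):
    """Return every stored scenario that carries all of the required labels."""
    required = set(required_labels)
    return [s for s in database if required <= s.labels]


# Hypothetical database content; label strings are placeholders.
database = [
    StoredScenario("scenario_a", ["a.xosc"], {"school-zone", "right-turn"}),
    StoredScenario("scenario_b", ["b.xosc"], {"motorway", "cut-in"}),
    StoredScenario("scenario_c", ["c.xosc"], {"play-ground", "right-turn"}),
]

# Simple query: a single label.
school_zone_hits = find_scenarios(database, ["school-zone"])

# Conjunction of two labels (not a true relation between them).
combined_hits = find_scenarios(database, ["right-turn", "play-ground"])

print([s.name for s in school_zone_hits])
print([s.name for s in combined_hits])
```

Because the search only inspects the stored labels, it never needs to open or parse the scenario files themselves.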

One important task for the standardization project is to determine what the behavior and maneuver labels will look like. The user guide will help future users get started labeling scenarios and implement applications using the scenario labels (e.g. search engine).

5.3. Definition of Scenario in the OpenLABEL Context

NOTE: The scenario labelling concept was started late in the OpenLABEL concept project runtime. Certain aspects of the concept paper are still considered by the project group as open issues and need further reflection and development as part of the standardization project. Still the project group decided to publish the content to show what is being discussed in the working group and gather wider inputs. The clear goal is to finalize and implement these concepts in the OpenLABEL Standard which will be developed in the follow-up project.

Multiple definitions and meanings of the term “scenario” exist today within the context of Automated Driving. It is not the intention of the current document to shed light on the best definition, nor is it to increase the body of knowledge around scenarios. Our aim is to provide a high-level definition of the term “Scenario” that is relevant to the current concept document for the OpenLABEL standard.

The concept of scenario we are interested in “involves telling a ‘story’ about a more or less detailed sequence of events, including at least one situation within a scene, its scenery and dynamic elements, and ongoing activities of one or more actors.” [Geyer et al., 2014]

This “story” unfolds in a specific domain of discourse which is relevant to us: the one of road traffic. It is then useful to decompose this domain into its main elements in order to provide a conceptual mental model that will prove to be a useful tool of orientation throughout this and the other OpenX standards.

For Scenarios we need the following 3 conceptual constructs + 1 additional introduced later for clarity:

Table 1. Scenario Conceptional Elements
Dynamic Agents:	Refers to Dynamic agents and their behavior such as motor vehicles and vulnerable road users and their relations and events/actions evolving over time
Environment:	Refers to environmental variables that have a relevance: rain, illumination etc.
Scenery:	Refers to road topology and geometry (curvature, etc.), including semantics: number of lanes, lane markings etc. and the traffic infrastructure: barriers, signs, traffic lights and their position, etc.

Conceptually, a scenario is nothing more than a description of a time interval detailing a set of events and activities performed by a relevant set of dynamic agents. Such a description is grounded in its environment and scenery, and may include the goals of one or more of the participants.

Operationally, scenarios can be described and substantiated at many levels of abstraction and semantics, either quantitatively through rigorous mathematical definitions and state-space variables, qualitatively, through the use of natural language, or a mixture of both, through the use of human and machine-readable languages like SDLs (Scenario Description Languages) that usually have both a procedural and declarative nature.

So, while “scenarios” usually refer to the same conceptual construct and domain of discourse (that of sequences of events happening in a certain road environment), they can take shape in many different “artifacts” such as specific file formats (e.g. OpenSCENARIO 1.x), Domain Specific Language files (OSC 2.0, M-SDL, etc.), written natural-language sentences, or even raw sensor recordings enriched with additional metadata.

In sum, it is important to distinguish between “scenario” as a concept with its domain of discourse, which is unique, and the embodiment of that domain in a specific artifact, which can take many forms for different purposes.

Usually, as of today, in the Automotive domain, such an embodiment process - which we call “Scenario Generation” - happens along two main axes: top-down and bottom-up.

5.4. Scenario Artifacts Generation

It is useful to distinguish two streams in Scenario Artifacts Generation: top-down and bottom-up.

5.4.1. Bottom-up

Bottom-up scenario generation refers to the generation of scenario objects from the direct observation of events in the world, be it real or simulated. This process entails the presence of sensors (real or simulated) that can extract information from the environment; this information is usually then enriched with metadata (labels) that describe the content of the scenario captured by the sensors.

Examples of Bottom-up scenario generation methods:

Scenarios generated from the annotation of sensor data captured from real-world driving
Scenarios generated from the annotation of sensor data captured from simulation

5.4.2. Top-down

Top-down scenario generation usually refers to the process in which scenario objects are produced from pre-existing bodies of knowledge rather than from the direct observation of events. This knowledge can live in a database of accidents, in existing computer programs, or in human brains.

Examples of Top-down scenario generation methods using predefined languages:

Scenarios created manually, e.g. following guidelines
Scenarios synthesized from statistical analysis of non-sensor data
Scenarios generated using formal methods of systems analysis of vehicle control systems
Scenarios generated using machine-learning

5.5. Scenario Artifacts and OpenLABEL

Regardless of the generation method, it becomes clear that many different objects and artifacts exist that deal in some way with the concept of a scenario and the 3 constructs of the domain of discourse introduced previously.

The common artifacts that embody a description of scenario in a human and machine readable manner are often referred to as Scenario Description Languages (SDLs), and these are frequently used in the context of simulating scenarios. The ASAM OpenSCENARIO SDL is particularly relevant as it sits within the same family of OpenX standards as OpenLABEL. OpenSCENARIO 1.x is a relatively concrete SDL, whereas OpenSCENARIO 2.x is broader, and can describe both concrete scenarios and more abstract scenarios.

OpenLABEL attempts to be independent of the specific SDL used to represent scenarios; this means that it should provide a consistent set of labels to be queried, regardless of the details of the representation capabilities of the SDL being used. Given this independence from SDLs, the labels in OpenLABEL may capture a certain level of detail about the content of the scenario - related to the 3 constructs - (for example, characteristics of the location where it occurs, the vehicles involved, and perhaps the maneuvers they execute).

It then becomes useful to introduce another construct in our conceptual model, “Additional Labels”, which does not deal with the purely ontological concepts of the scenario domain of discourse - its specific content or narrative -, but rather introduces the aforementioned additional concepts and information.

Table 2. Scenario Conceptional Elements
Dynamic Agents:	Refers to Dynamic agents and their behavior such as motor vehicles and vulnerable road users and their relations and events/actions evolving over time
Environment:	Refers to environmental variables that have a relevance: rain, illumination etc. Some may impact other layers as well (rain impacts road friction coefficient etc.)
Scenery:	Refers to road topology and geometry (curvature, etc.), including semantics: number of lanes, lane markings etc. and the traffic infrastructure: barriers, signs, traffic lights and their position, etc.
Additional labels:	Information indirectly related to the content (criticality, NCAP relevance, exposure, etc.). Information that is not content-related but is related to the scenario artifact (author, version, revisions, ADAS features under test, etc.).

The above table summarizes our thinking and provides a bird's-eye view of the conceptual mental model of the domain of discourse we introduced, and its interaction with multiple scenario artifacts. OpenSCENARIO 1.x files cover concepts related to the constructs of “Dynamic Agents” and possibly “Environment” of our mental model in a fairly concrete manner, and may point to OpenDRIVE files, which concretely describe elements of “Scenery”. OpenSCENARIO 2.x, while sharing some similarities with OpenSCENARIO 1.x, provides a broader spectrum of abstraction and covers more use cases. In fact, it can be considered a superset of OpenSCENARIO 1.x.

Figure 5. relation between OpenLABEL and other OpenX Standards

5.6. Interaction with Scenario Data

Scenario data can be either recorded real-world data, or an abstracted representation of a scenario intended for use in scenario recreations. In either case it is assumed that the scenario data is stored in file(s) in a common format, and in addition to the core scenario data these files may themselves contain labels about the scenario. Therefore the source files for the scenario must contain the scenario data (which describes the observable features and evolution of the scenario), but (depending on the format) may also contain labels.

For abstracted scenario representations, the common file formats are often referred to as Scenario Description Languages (SDLs), and these are frequently used in the context of simulating scenarios. OpenLABEL attempts to be independent of the specific SDL used to represent scenarios; this means that OpenLABEL should provide a consistent set of data to be queried, regardless of the details of the representation capabilities of the SDL being used.

Given this independence from SDLs, the scenario labels in OpenLABEL should capture a certain level of detail about the scenario (for example, characteristics of the location where it occurs, the vehicles involved, and perhaps the maneuvers they execute). In some ways it can therefore resemble a highly-abstract SDL itself. This is both unavoidable and desirable — if OpenLABEL held no information about the scenario data, it would not be of any use for finding scenarios according to core observable characteristics.

5.7. Illustrating the role of OpenLABEL using the structures of a book

To further clarify the role of OpenLABEL and its relation to other ASAM projects such as OpenSCENARIO and OpenXOntology, the basic structure of a book is used as an illustration.

A book has a contents section, a preface, and individual chapters. In this analogy OpenLABEL would be the contents section: it provides a clear link to each individual chapter. The preface can be considered a high-level scenario description language that describes the chapters in a very compact format. Each individual chapter can be considered a detailed, low-level scenario description language. The quality or the logic of the actual written content can be related to the ontology.

5.8. User Stories and Use Cases for Scenario Labeling

5.8.1. Actors for the Scenario Labeling

Scenario Developer
Test Engineer (ADAS Developer)
Perception Engineer
Technical Authority / Regulatory Parties
Development Engineer
Scenario Database Provider
AV/ADAS Lawyer
Regulator

5.8.2. Scenario Developer

As a Scenario Developer, I want to be able to label scenarios using a standardized label set to bring about consistency to labelling and so that they can easily be consumed by other systems and users.
As a Scenario Developer, I want to be able to keep track of different versions of a scenario.
As a Scenario Developer, I want to be able to add descriptive detail to a scenario to help users of the scenario understand what it represents.
As a Scenario Developer, I want to be able to add my own labels for situations where there is no suitable standard label.

5.8.3. Test Engineer

As a Test Engineer, I want to be able to search a database for scenarios which match my ODD using a standardized set of labels.
As a Test Engineer, I want to be able to identify scenarios suitable for testing in a specified testing environment.
As a Test Engineer, I want to be able to identify scenarios suitable for testing a specified vehicle type.
As a Test Engineer, I want to be able to identify scenarios suitable for testing vehicles of a specified autonomy level.
As a Test Engineer, I want to be able to identify scenarios suitable for testing a specified ADAS feature.
As a Test Engineer, I want to be able to identify scenarios that can be used to test for a specified certification standard.
As a Test Engineer, I want to be able to identify scenarios based on the level of hazard they represent.
As a Test Engineer, I want to be able to see a visualization of the scenario so that I can quickly judge whether it suits the test requirements.
As a Test Engineer, I want to be able to search for scenarios at the functional, logical, and concrete abstraction levels, to match my testing methodology.
As a Test Engineer, I want to be able to identify scenarios which have a scenario definition which is compatible with my existing testing toolchain.

5.8.4. Perception Engineer

As a Perception Engineer, I want to be able to label not only objects, but also specific maneuvers, actions, events, and relations, so that my perception models can be trained to recognize all of the above.
As a Perception Engineer, I want to be able to look for scenarios in a database that contain some specific events/actors/environments to efficiently tackle model training.
As a Perception/Control Engineer, I want to be able to smoothly replay real-world events in a simulator without losing semantics and portability.

5.8.5. Scenario Database Provider

As a Scenario Database Provider, I want to be able to import and export scenarios and their associated data in a standardized file format.
As a Scenario Database Provider, I want to be able to import a scenario without having to analyse the scenario definition in order to construct a set of labels which represents the scenario content.
As a Scenario Database Provider, I want to be able to uniquely identify scenarios to avoid duplication.
As a Scenario Database Provider, I want to provide users with a performant way to search on scenario content.
As a Scenario Database Provider, I want to futureproof my database against changes to scenario definition languages by making it scenario definition language agnostic.

5.8.6. AV/ADAS Lawyer

As an AV/ADAS Lawyer, I want to have evidence for the provenance of a scenario and the terms under which the author has licensed the scenario for use, to ensure that a scenario can be used for product development without any copyright/IP infringement.

5.8.7. Regulator

As a regulator, I want to be able to endorse scenarios to indicate that they may or must be used as part of a regulatory test.
As a regulator, I want there to be a robust method of tracking changes and approvals to scenarios.
As a regulator, I want to be able to extract a set of scenarios conforming to a supplied specification from a large database. This specification will correspond to the ODD of the system-under-test together with (potentially) other regulatory requirements. It is critical to be able to extract all scenarios that match this specification, while not including any non-matching scenarios.
As a regulator, I want core functionality (e.g. a core set of labels) to be consistent across implementations.
As a regulator, I want access to contextual information about a scenario that makes it easier to assess the performance of the subject vehicle (for example, location-specific regulations that would not be represented in road network files, or whether any non-subject actor vehicles were violating regulations during the scenario).

5.8.8. Focus Use case: Scenario Database Storage and Retrieval

The use case of database searching proved to be relevant to all of the main actors and deserves further attention.

When working with scenarios it is often found that additional data, namely “Additional Metadata” in our mental model, is necessary beyond what can be represented in a scenario definition. This additional data would facilitate the use, discoverability, management, and portability of scenario artifacts.

6. Meta Model for Scenario Labels

6.1. Meta-Model Concept

The complexity of the information associated with scenarios poses a challenge for scenario labelling. At a high level, the information that requires labelling can be divided into derived data and non-derived data. Derived data represents domain-specific information that can be derived directly from the scenarios, whereas non-derived data corresponds to externally added information that cannot be derived directly from the scenarios. Examples of derived data include road layout, vehicle manoeuvre, traffic signs, etc. Examples of non-derived data include authorship, ADAS feature under test, licenses, etc.

While the elements within the non-derived data normally do not contain domain-specific relationships or class hierarchies, the elements within the derived data resemble the domain model and contain complex relationships and hierarchies; the derived data therefore needs to be divided further into subclasses. In this meta-model for scenario labelling, the non-derived data falls within the Admin Labels category.

The Operational Design Domain (ODD) refers to the operating conditions under which an Automated Driving System (ADS) can perform safely. It covers the non-behavior attributes of a scenario, and it is highly desirable to enable ODD-based scenario labelling and searching. Defining the ODD is usually the first step of the development process of an ADS: it allows a clear definition of the capabilities and limitations of the ADS, and it enables clear communication to the end user. The ODD is also crucial for testing and validating an ADS, since test engineers create test scenarios that are tailored towards the ODD boundaries. Even during runtime (either in simulation or in the real world), being able to monitor whether the ADS is operating within its ODD is of key interest to various stakeholders such as insurance providers, ADS developers and end users. Therefore the second category of the meta-model for scenario labelling will be ODD Labels.

Within a scenario, the ODD elements cover the non-behavior content. Behavior-related content such as vehicle manoeuvres or event triggers is not inside the operational design domain; therefore the third category of the meta-model for scenario labelling needs to cover Behavior Labels.

For OpenLABEL to accommodate the requirements of various users, the meta-model shall include custom labels. These are labels that are unique to individual users and should not be part of the standard. The organization of these custom labels, i.e. whether they will be embedded into the three categories at specific hierarchical positions or placed under a separate custom label category, will be left for the standardization project.

Labelling a scenario is potentially labour-intensive, so OpenLABEL should not mandate that every label has a value for every scenario. However, many use cases will rely on certain labels being present; therefore, the concept of LabelSets should be used to allow users to specify which labels are mandatory for them. The LabelSet mechanism needs to be developed by the standardization project (see the open issue Optional labels and purpose bound LabelSets).
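
One possible reading of the LabelSet idea is a named set of mandatory label names against which a scenario's labels are checked. The sketch below only illustrates that reading. The function missing_labels, the label names and the example LabelSet are hypothetical, since the actual mechanism is still to be defined by the standardization project.

```python
def missing_labels(scenario_labels, label_set):
    """Return the mandatory label names that have no value for this scenario.

    scenario_labels: mapping of label name to value (assumed representation).
    label_set: set of label names that a given user treats as mandatory.
    """
    return sorted(
        name for name in label_set
        if scenario_labels.get(name) in (None, "", [])
    )


# Hypothetical LabelSet for one purpose; other users may require other labels.
regulatory_label_set = {"scenario_type", "licence", "target_autonomy_level"}

scenario_labels = {
    "scenario_type": "Concrete",
    "licence": "MIT License",
    "hazard_level": "Near miss",  # present, but not mandatory in this LabelSet
}

print(missing_labels(scenario_labels, regulatory_label_set))
```

Labels outside the LabelSet stay optional, so a scenario can be partially labelled and still be valid for users who do not need the missing labels.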

TODO: introduce a mechanism to reference labels from external standards to differentiate between OpenLABEL labels and external labels.
TODO: add more examples to the topic to fully understand how the meta model shall be used.
TODO: The user guide needs to clearly explain how to label scenarios, so they can be found using search engines.

6.2. Label Value Definitions

OpenLABEL should include definitions for the values of each label. For all labels that map to OpenXOntology concepts, these definitions should align closely with the OpenXOntology definitions. A toy example could be:

Label	Value	Definition
Relative manoeuver	Cut-in	A challenger vehicle that was not previously in the ego vehicle’s lane moves into the ego vehicle’s lane. The challenger vehicle must be moving faster than the ego vehicle, and the challenger’s bounding-box must first enter the ego vehicle’s lane between 0m and (2v)m in front of the ego-vehicle’s centerline, where v is the ego velocity in m/s.
Relative manoeuver	Cut-out	A challenger vehicle ….
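
A precise value definition such as the toy cut-in above can be turned directly into a machine-checkable predicate. The following sketch shows this under assumptions: the function name is_cut_in and its parameters are hypothetical, the distance bounds are treated as inclusive (the toy definition does not say), and the entry distance is assumed to be already measured in metres ahead of the ego vehicle.

```python
def is_cut_in(was_in_ego_lane, challenger_speed, ego_speed, entry_distance):
    """Check the toy cut-in definition for one lane-entry event.

    was_in_ego_lane:  True if the challenger was already in the ego lane.
    challenger_speed: challenger speed in m/s.
    ego_speed:        ego velocity v in m/s.
    entry_distance:   distance in metres ahead of the ego vehicle at which the
                      challenger's bounding box first enters the ego lane.
    """
    if was_in_ego_lane:
        return False  # the challenger must come from another lane
    if challenger_speed <= ego_speed:
        return False  # the challenger must be moving faster than the ego
    # Entry must happen between 0 m and 2v m ahead (bounds assumed inclusive).
    return 0.0 <= entry_distance <= 2.0 * ego_speed


# Ego at 20 m/s, so the entry window is 0 m to 40 m ahead.
print(is_cut_in(False, 25.0, 20.0, 30.0))
print(is_cut_in(False, 25.0, 20.0, 45.0))
```

A different stakeholder could swap in another threshold or speed condition, which is exactly why a single agreed definition per value is hard to reach.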

However, in many cases it will be hard to find a single definition for a value that meets the needs of all users. In this Concept paper, high-level descriptions are given for each suggested label, but the final OpenLABEL standard should additionally provide definitions for values.

Complexities with providing unambiguous definitions include:

Different interpretations may make sense to different stakeholders. For example, rainfall may be measured differently by weather forecasters to how it is reported by vehicle sensors.
Different definitions may make sense in different regions. For example, driving conventions differ on what counts as an aggressive manoeuver.
Further complexity arises because labels may not have a single 'correct' value for a scenario, as discussed in Semantics. This could happen because the value changes during the course of a scenario, because the scenario is non-concrete and allows a range of values for relevant variables, or because the behavior of one or more actors is not completely defined within the scenario.

6.3. Top Level Label Classes

There is a need to identify and, where needed, define and standardize a set of labels that can robustly serve this use case. We identified three main axes that will serve the purpose of organizing the storage, search and retrieval of scenario artifacts within a database:

Admin Labels
- Contextual (e.g. exposure, ADAS feature tested)
- Organizational (e.g. version)
ODD Labels
- Scenery element
- Environment element
- Dynamic element (traffic properties, subject vehicle designated speed)
Behavior Labels (Actions)
- Vehicle manoeuvre (e.g. TurnRight_CutIn)
- Event triggers (e.g. distance, time, activities)

While the ODD labels and Behavior labels are derived, the contextual/admin metadata is mainly non-derived; the details will depend on the format of the underlying scenario data.
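
As a rough illustration of how the top-level label classes could be held together for one scenario, consider the sketch below. The class ScenarioLabels, its field names and the example label keys are hypothetical. Placing custom labels in a separate category is only one of the two options the concept paper leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioLabels:
    """Labels for one scenario, grouped by top-level label class."""
    admin: dict = field(default_factory=dict)     # mainly non-derived
    odd: dict = field(default_factory=dict)       # derived from scenario content
    behavior: dict = field(default_factory=dict)  # derived from scenario content
    custom: dict = field(default_factory=dict)    # user-specific, outside the standard

    def derived(self):
        """Return the labels that can be derived from the scenario data."""
        return {**self.odd, **self.behavior}


# Example values are placeholders, not standardized label names.
labels = ScenarioLabels(
    admin={"version": "2.1.5", "adas_feature_tested": "Adaptive cruise control"},
    odd={"scenery.zone": "school zone", "environment.rainfall": "light"},
    behavior={"vehicle_manoeuvre": "TurnRight_CutIn"},
)

print(sorted(labels.derived()))
```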

6.3.1. Admin Labels

Admin Labels refer to concepts and information neither directly nor indirectly related to the content of a scenario. In relation to the conceptual mental model, they belong to “Additional Metadata”. These labels have to come from some source external to the data in the file and provide information such as: Author, Version, Revisions, Safety Test Case, Hazard level, and so on.

The rationale for choosing this as one of the main axes for organizing the storage, search and retrieval of scenario artifacts is that these types of labels will need to be queried in many use cases.

Examples of this are:

Authorship: we may wish to record who created the scenario

Version: we may wish to record a version number in situations where we want to keep track of different revisions of a scenario

The table below lists some of the attributes identified so far under Admin Labels:

Attributes

Element	Attribute of (object)	Description	Example
Owning standard or specification	Scenario	Reference to external standard or specification which defines the scenario (e.g. a UNECE document). Should contain sufficient information to identify the exact scenario	EURO-NCAP, C-ECAP, …
Overall scenario name & description	Scenario	Human readable descriptions of scenario content; a summary description and a verbose description
Visualization	File	This is a link to a static image or animation of the scenario to allow users to easily see what the scenario represents.	URL
Test purpose	Scenario	Human readable description of test purpose
Target autonomy level	Scenario	SAE J3016 level of autonomy which the scenario is intended to test	0 to 5
ADAS feature under test	Vehicle	If scenario represents a test of an ADAS feature, name of the feature - there could be multiple which are applicable.	Lane keeping system, Automated parking, Adaptive cruise control, …
Dynamic content source type	Scenario	This identifies the source data that was used to create the scenario	UK STATS-19 Accident Data
Static content source type	Scenario	This defines the type of source from which the scenario was created.	Real-world, Simulation, Hand crafted, Machine learning
Associated Files	File	The set of files associated with the OpenLabel file	For each file; location, format (e.g. OpenDRIVE), type (e.g. road network logical description), version number (1.2.3.4), license, human-readable summary text
Source/owning organization(s)/individual	File	Email, URL, digital signature of person or organization owning the file
Licence	File, Scenario	This defines the license that governs allowed usage of the scenario	MIT License, OGL, …
Scenario type	Scenario	This identifies the abstraction level of the scenario.	Functional, Logical or Concrete
Scenario Unique Reference	Scenario	This is a universally unique identifier (UUID) assigned to the scenario which allows the scenario to be identified.	{123e4567-e89b-12d3-a456-426614174000}
Parent scenario	Scenario	This is a universally unique identifier (UUID) which identifies the scenario which this one is a derivative of	{893e4AA7-e89b-12d3-a456-426614174000}
Approvals	File, Scenario	Identity of approving organization, organization specific extensible fields
Versioning	File, Scenario	Version history of scenario with version numbers, creation and modification dates and times	{Major: 2, Minor: 1, Revision: 5, Build: 1000} 11/09/2020 09:11:00 UTC
Exposure	Scenario	Information relating to how often the scenario occurs in the real-world
Type/quality of data for perception systems	File	Description of data types and quality included (for each file/object?)
Target vehicle type	Vehicle	Intended classification of vehicle under test	Car, Bus, Truck, …
Safety Test	Scenario	This defines whether the scenario represents a safety test case.	Yes/No
Geographic Reference	Scenario	This identifies the geographic area where the scenario was recorded or which it represents.	GeoJson data
Hazard Level	Scenario	This represents the level of hazard present in the scenario.	None, Near miss, Minor collision, Major collision, Fatal collision
Digital Signature	File	This would allow the scenario to be verified to see if it had been modified or tampered with.
Test Environment	Scenario	Describes the environment(s) suitable for conducting testing with the scenario, e.g. NCAP testing not suitable for public roads.	Public road, Simulator, Test laboratory, Test track
Accident Cause	Scenario	If this scenario represents a real-world accident then what was the cause.	Human, AI, Animal, Act of God
Rule/advice broken	Vehicle/actor	Rules or advisory rules broken by actors during the scenario	Stop at red light, Keep to left/right unless overtaking
Rule set	Scenario	Name of rule set applicable (e.g. country specific highway code), noting countries may have multiple rule sets. May include URL	UK (https://www.gov.uk/guidance/the-highway-code)
Location specific regulations	Zone (within ODD)	Location specific rules contained within the scenario. Should link to road network element which represents that rule (e.g. double white lines) where applicable.	Speed Limit, No Overtaking, Congestion Charging
Road designation and name	Road network	Labels for road (name/number as known to human drivers)	M25 (London Orbital Motorway), A203 (Newshire High Street)
Feature requiring interaction	Road network	Description of any features requiring manual interaction of the type typically performed by a human	Toll booth, Gate
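
To make a few of the admin attributes above more tangible, the sketch below builds a small admin-label record using the example values from the table. It is an illustration only: the dictionary keys and the helper parse_scenario_reference are hypothetical, and the concept paper does not prescribe a serialization. Python's standard uuid module is used because it accepts the brace-wrapped UUID notation shown in the examples.

```python
import uuid

ALLOWED_SCENARIO_TYPES = {"Functional", "Logical", "Concrete"}


def parse_scenario_reference(text):
    """Parse a UUID string such as '{123e4567-...}' into a uuid.UUID.

    Raises ValueError if the text is not a well-formed UUID.
    """
    return uuid.UUID(text)


scenario_id = parse_scenario_reference("{123e4567-e89b-12d3-a456-426614174000}")
parent_id = parse_scenario_reference("{893e4AA7-e89b-12d3-a456-426614174000}")

# Hypothetical record layout; keys are placeholders, values come from the table.
admin_labels = {
    "scenario_unique_reference": scenario_id,
    "parent_scenario": parent_id,
    "scenario_type": "Concrete",
    "versioning": {"Major": 2, "Minor": 1, "Revision": 5, "Build": 1000},
}

# Basic consistency checks a database might run on import.
type_is_valid = admin_labels["scenario_type"] in ALLOWED_SCENARIO_TYPES
is_own_parent = admin_labels["parent_scenario"] == admin_labels["scenario_unique_reference"]

# Version numbers compare naturally as tuples (major, minor, revision, build).
v = admin_labels["versioning"]
version_key = (v["Major"], v["Minor"], v["Revision"], v["Build"])

print(str(parent_id), type_is_valid, is_own_parent, version_key)
```

Normalizing identifiers on import (here, to the lowercase canonical form) helps a database detect duplicates even when files write the same UUID with different casing.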

6.3.2. ODD Labels

SAE J3016 defines an Operational Design Domain (ODD) as “operating conditions under which a given driving automation system or feature thereof is specifically designed to function, including, but not limited to, environmental, geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or roadway characteristics.”

Defining the ODD is usually the first step of the development process of an ADS: it allows a clear definition of the capabilities and limitations of the ADS, and it enables clear communication to the end user. The ODD is also crucial for testing and validating an ADS; test engineers create test scenarios that are tailored towards the ODD boundaries. Even during runtime (either in simulation or in the real world), being able to monitor whether the ADS is operating within its ODD is of key interest to various stakeholders such as insurance providers, ADS developers and end users.

BSI PAS 1883 is a recent standard that provides requirements for the minimum hierarchical taxonomy for specifying an Operational Design Domain (ODD) to enable the safe deployment of an automated driving system (ADS). This taxonomy is organized in a tree structure according to three main elements.

Scenery Elements
consist of the non-movable elements of the operating environment, e.g. road layout or traffic lights.
Environmental Conditions
consist of weather and atmospheric conditions.
Dynamic Elements
consist of the movable elements of the ODD, e.g. traffic or subject vehicle.

ODD as an artifact is not the same as a Scenario artifact, as it serves different purposes, and uses. Nonetheless, an ODD covers the same domain of discourse of scenarios, and it nicely fits within our conceptual mental model. In fact, an ODD can be seen as an instrument to put constraints on some aspects of scenarios that an ADS can tackle by design. To clarify its pertinence to the domain of discourse, we’ll use figure y and map it into our conceptual model, keeping in mind the substantial difference - Represented in the oval shape vs rectangular - in use and nature between scenario artifacts and an ODD definition. Where the latter supersedes and clarifies constraints on aspects of a set of the former.

Very often, users across multiple actors and use cases want to look for scenarios within a certain ODD. It follows naturally that the ODD and its taxonomy should be the second relevant axis along which we organize the storage, search and retrieval of Scenario artifacts. Moreover, the fact that a standardized taxonomy of concepts (labels) already exists for ODDs, namely BSI PAS 1883, increases the practicality and usefulness of such a choice.

Figure 1 - Top level ODD attributes


Scenery Elements

As shown in Figure 1, scenery elements can be further divided into zones, drivable area, junctions, special structures, fixed road structures and temporary road structures.

Zones

Zones include special road configurations which may differ from typical conditions for driving, or areas with specific driving regulations or environmental conditions. Zones can be further divided into: a) geo-fenced areas, b) traffic management zones, c) school zones, d) regions or states, e) interference zones (e.g. dense foliage or loss of GPS due to tall buildings).

Figure 2 - ODD elements//scenery//zones


Drivable area

Drivable area defines the road type, geometry, conditions, lane markings and road signs. For any individual section of a road, the Drivable area includes the elements that are directly related to the vehicle’s maneuverability.

Figure 3 - ODD elements//scenery//drivable area


Within the Drivable area category, the road type element can be further divided into five main types: a) motorways, b) radial roads, c) distributor roads, d) minor roads and e) parking. Motorways are high traffic roads where non-motorized vehicles and pedestrians are prohibited. Radial roads or A-roads are high density traffic roads that connect the motorway to distributor roads. Distributor roads or B-roads connect A-roads with minor or local roads; they generally have low to moderate capacity. Minor roads or local roads provide access to residential areas and other local developments. Furthermore, motorways are classified as being with or without active traffic management.

The geometry defines the shape of the road layout in three-dimensional space in terms of a) the horizontal plane, b) the vertical plane and c) the transverse plane. The horizontal alignment can be visualized as the road projected onto the horizontal plane; in this plane the road layout can be classified as a straight or curved road. The vertical alignment can be visualized as the elevation of the road centre line; it can be classified as up-slope, down-slope or level plane. The transverse plane can be visualized as the cross section of the road layout; it consists of the road edge, lanes, lane markings, etc.

To define the lane specification, the following parameters need to be specified: 1) lane dimensions, such as lane width; 2) lane marking (broken line, solid line, etc.); 3) lane type, which can be further categorized into bus lane, traffic lane, cycle lane, tram lane, emergency lane or other special lane; 4) number of lanes in the road; 5) traffic direction (left-hand or right-hand).

All road signs are classified into three main groups based on their functionality: information signs, regulatory signs and warning signs. Furthermore, they can be variable (such as electronic signs) or uniform, and full-time or temporary. When specifying road signs, the individual sign name needs to be indicated alongside the categories and sign properties.

The roadway edge describes the details at the outer boundary of the road element. Possible roadway edge attributes include line markers, road shoulder (paved, gravel or grass), roadway edge barriers (grating, rail, curb, cones, etc.) and temporary line markers.

The road surface describes the condition of the road, the road surface features and the surface type. Road surface conditions can be weather induced, such as icy roads, flooded roadways, mirages, snow, standing water or wet road. Road surface features may include damage caused by traffic, such as cracks, potholes, ruts and swells. The road surface type may be a loose surface, or a segmented or uniform surface.

Junction

Junctions are areas on the map where two or more roads meet. They can be divided into two groups: 1) intersections and 2) roundabouts.

As shown in Figure 4, intersections may be further classified into T-junction, staggered, Y-junction, grade separated, traffic manoeuvre, cross roads etc. Roundabouts include normal, large compact, double and mini roundabouts. Furthermore, for both intersections and roundabouts it needs to be specified whether they are signalized or non-signalized.

Figure 4 - ODD elements//scenery//junction


Special structures, fixed road structures, temporary road structures

Figure 5 - ODD elements//scenery//special structures, fixed road structures, temporary road structures


Special structures, fixed road structures and temporary road structures are displayed in Figure 5. Temporary road structures may be placed on the road due to local requirements or accidents; they include temporary emergency signage that obstructs or affects normal driving.

Environmental Conditions

Many of the environmental elements that impact the ADS show a high degree of variability over time and distance. Traditional meteorological reports of weather parameters therefore require some degree of interpretation to be truly applicable to the ADS. The following environmental attributes represent many of those with the highest expected impact. As shown in Figure 1, Environmental Conditions can be further divided into weather, particulates, illumination and connectivity.

Weather

The detailed categorization of the weather attributes can be found in Figure 6.

Wind speed shall be specified in units of m/s. It shall be characterized as an average over a specified time interval (recommended 2 min to 10 min) and a gust value in m/s, which is the peak value of a 3 s rolling mean wind speed.

Rainfall intensity shall be specified in units of mm/h. The interval and spatial scale over which the intensity has been defined shall also be stated. In addition to the average rainfall intensity, the type of rainfall may also be categorized to indicate the degree of spatial variability and the rate of onset, as well as the relative abundance of smaller or larger drop sizes. Rainfall may be described as: 1) dynamic (commonly ‘frontal’), associated with large scale weather systems; 2) convective, typically showery and potentially very intense; 3) orographic (commonly ‘relief’), associated with hilly or mountainous terrain.

Snowfall intensity shall be determined by human-inferred visibility, where it is clear that the visibility is affected by snow alone.
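
The wind speed characterization above can be illustrated with a short sketch. It assumes wind speed samples in m/s recorded at 1 Hz; the sampling rate and the function name `wind_summary` are assumptions made for illustration and are not part of the taxonomy.

```python
from statistics import mean


def wind_summary(samples_mps, sample_rate_hz=1.0, gust_window_s=3.0):
    """Return (average, gust) in m/s for one averaging interval.

    samples_mps covers the chosen interval (2 min to 10 min is the
    recommended range). The gust is the peak value of a rolling mean
    over gust_window_s seconds. The 1 Hz default is an assumption.
    """
    window = max(1, round(gust_window_s * sample_rate_hz))
    if len(samples_mps) < window:
        raise ValueError("not enough samples for one gust window")
    rolling_means = [
        mean(samples_mps[i:i + window])
        for i in range(len(samples_mps) - window + 1)
    ]
    return mean(samples_mps), max(rolling_means)


# Two minutes of steady 5 m/s wind with a 3 s burst of 11 m/s.
samples = [5.0] * 120
samples[60:63] = [11.0, 11.0, 11.0]
average, gust = wind_summary(samples)
```

The average stays close to the steady wind speed while the gust value captures the short burst, which is why both values are needed to describe the wind condition.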

Figure 6 - ODD elements//environmental condition//weather


Particulates

The impact of small airborne particulates on sensory perception is commonly expressed in terms of ‘visibility’. As visibility is related to human perception it is only directly applicable to sensors operating at human-visible wavelengths. The degree of obscuration will be dependent on the amount of particulate matter, the sensor wavelength and also the composition and size distribution of the particles in question. Figure 7 illustrates the attributes within particulates.

Figure 7 - ODD elements//environmental condition//particulates


Illumination

Illumination impacts can be either beneficial (e.g. improving the visibility of targets) or detrimental (e.g. due to rapid changes in shadowing or glare). The attributes for illumination can be found in Figure 8. Daytime is a condition where the ambient illuminance is greater than 2000 lx. Night time is a condition where the ambient illuminance is less than 1 lx. A low-ambient lighting condition is one where the ambient light is between daytime and night time. Cloud cover is the amount of sky covered by cloud and can affect the illumination at any time of the day or night. Artificial illumination can be streetlights or oncoming vehicle lights. Other weather attributes, such as temperature, humidity, air pressure, surface temperature, hail, freezing rain or solar flares, may be taken into account as part of the ODD definition.
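
The illuminance thresholds above map directly onto a small classification function. The following is a minimal sketch; the function name and the returned label strings are hypothetical, and the handling of the exact boundary values is an assumption because the text does not assign them to a category.

```python
def classify_ambient_light(illuminance_lx):
    """Map ambient illuminance in lux to one of three lighting conditions.

    Daytime is > 2000 lx and night time is < 1 lx. Treating exactly 1 lx
    and exactly 2000 lx as low-ambient lighting is an assumption of this
    sketch.
    """
    if illuminance_lx < 0:
        raise ValueError("illuminance cannot be negative")
    if illuminance_lx > 2000:
        return "daytime"
    if illuminance_lx < 1:
        return "night_time"
    return "low_ambient_lighting"
```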

Figure 8 - ODD elements//environmental condition//illumination


Connectivity

Connectivity indicates the ability of a vehicle to receive data from and/or transmit data to an external system to determine positioning or to communicate with other vehicles and the wider infrastructure, which is viewed as a key enabler for autonomy. Figure 9 lists the attributes within connectivity.

Figure 9 - ODD elements//environmental condition//connectivity


Dynamic Elements

Dynamic elements within the context of an ODD shall include information on the macroscopic traffic properties and the speed of the subject vehicle. Traffic may include vehicles, two-wheelers or bicycles; the traffic agent type shall include vulnerable road users and animals.

Figure 10 - ODD elements//dynamic elements


6.3.3. Behavior Labels

Behavior labels should include information that contributes to the dynamic elements of the actors and the subject vehicle, for example manoeuvres, event phases and triggers. Generating a detailed list of behavior labels with the correct hierarchical structure is left to the standardization project. Some of the behavior label contents are illustrated below.

Actors

Actors may include on-road vehicles, pedestrians and animals. The Safety Pool Taxonomy for dynamic objects could be a good source, as it provides a detailed classification of dynamic objects with definitions. Figure 11 lists a high level classification of the dynamic actors.

Figure 11 - Behavior labels//actors


Manoeuvre

This section presents a concept for defining manoeuvres using both absolute and relative manoeuvres. These manoeuvre types fit into the RDF triple format described in the Annotation Format section.

Traditional terms such as 'Drive', 'Turn right' and 'Change lane left' are all considered absolute manoeuvres of the actor that performs them. Absolute manoeuvres are independent of other actors within the scenario. Relative manoeuvres are terms such as 'Cut in', 'Cut out' and 'Towards', which describe behavior with reference to other actors within the scenario. It may be of interest to the project to define manoeuvres using absolute terms as well as their relationship to other actors. Such relationships could prevent ambiguities in the behavior labels and could also be used for applications such as reasoning or inference.

On-road vehicle manoeuvre types

The following absolute and relative manoeuvres can be used to describe on-road vehicles.

Absolute manoeuvres: Drive, Stop, Lane Change Right, Lane Change Left, Turn Right, Turn Left, Reverse.

Relative manoeuvres: Cut In, Cut Out, Towards, Away.

By combining absolute and relative manoeuvres, additional details of the vehicle behavior can be captured. For example, LaneChangeRight + CutOut → LaneChgRight_CutOut

Pedestrian / Animal manoeuvre types

For pedestrian / animal manoeuvres, the following absolute and relative manoeuvres can be used:

Absolute manoeuvres: Stop, Walk Forward, Turn Right, Turn Left, Turn Backward, Run, Slide.

Relative manoeuvres: Towards agent, Cross agent’s lane, Away from agent.

The absolute and relative manoeuvres for pedestrians / animals can also be combined to provide additional details, such as Walk Forward + Towards agent → WalkForward_MovT
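
The combination of absolute and relative manoeuvres for both actor groups could be sketched as follows. The enum names and the `combine` function are hypothetical. The sketch joins the unabbreviated names with an underscore; it does not reproduce abbreviated forms such as LaneChgRight_CutOut or WalkForward_MovT, because no abbreviation rule is defined here.

```python
from enum import Enum


class VehicleAbsolute(Enum):
    DRIVE = "Drive"
    STOP = "Stop"
    LANE_CHANGE_RIGHT = "LaneChangeRight"
    LANE_CHANGE_LEFT = "LaneChangeLeft"
    TURN_RIGHT = "TurnRight"
    TURN_LEFT = "TurnLeft"
    REVERSE = "Reverse"


class VehicleRelative(Enum):
    CUT_IN = "CutIn"
    CUT_OUT = "CutOut"
    TOWARDS = "Towards"
    AWAY = "Away"


class PedestrianAbsolute(Enum):
    STOP = "Stop"
    WALK_FORWARD = "WalkForward"
    TURN_RIGHT = "TurnRight"
    TURN_LEFT = "TurnLeft"
    TURN_BACKWARD = "TurnBackward"
    RUN = "Run"
    SLIDE = "Slide"


class PedestrianRelative(Enum):
    TOWARDS_AGENT = "TowardsAgent"
    CROSS_AGENTS_LANE = "CrossAgentsLane"
    AWAY_FROM_AGENT = "AwayFromAgent"


# Which relative manoeuvre type may accompany which absolute type.
ALLOWED_PAIRS = {
    VehicleAbsolute: VehicleRelative,
    PedestrianAbsolute: PedestrianRelative,
}


def combine(absolute, relative):
    """Join an absolute and a relative manoeuvre into one label string."""
    expected = ALLOWED_PAIRS.get(type(absolute))
    if expected is None or not isinstance(relative, expected):
        raise TypeError("absolute and relative manoeuvre types do not match")
    return f"{absolute.value}_{relative.value}"


label = combine(VehicleAbsolute.LANE_CHANGE_RIGHT, VehicleRelative.CUT_OUT)
```

Keeping the two actor groups in separate enums lets the sketch reject mixed combinations, such as a vehicle manoeuvre paired with a pedestrian relation.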

High level manoeuvre types

The following are high level vehicle manoeuvres. The list is not limited to these three and shall be determined during the standardization phase. High level manoeuvre types consist of sequences of basic manoeuvre types. This enables users to search or label scenarios based on common high level manoeuvres rather than searching for a sequence of basic manoeuvres.

No	Manoeuvre types
1	Overtake
2	Parking
3	Platooning
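
To illustrate how a high level manoeuvre could be derived from a sequence of basic manoeuvres, the sketch below searches a list of basic labels for a contiguous pattern. The decomposition of Overtake used here is purely hypothetical (and assumes right-hand traffic); the actual definitions are left to the standardization phase.

```python
# Hypothetical decomposition: which basic manoeuvres make up an overtake
# is not defined in this paper, so this pattern is only an illustration.
HIGH_LEVEL_PATTERNS = {
    "Overtake": ["LaneChangeLeft", "Drive", "LaneChangeRight"],
}


def contains_sequence(basic, pattern):
    """True if pattern occurs as a contiguous run inside basic."""
    n = len(pattern)
    return any(basic[i:i + n] == pattern for i in range(len(basic) - n + 1))


def high_level_labels(basic):
    """Return the high level manoeuvre names whose pattern is present."""
    return [
        name
        for name, pattern in HIGH_LEVEL_PATTERNS.items()
        if contains_sequence(basic, pattern)
    ]


found = high_level_labels(
    ["Drive", "LaneChangeLeft", "Drive", "LaneChangeRight", "Drive"]
)
```

Once such a label is stored, a user can search for "Overtake" directly instead of expressing the underlying sequence in every query.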

Event Triggers

Time based [Example: when timer >= 5 s]
Actor’s dynamic behavior [Example: when subject vehicle reaches 30 mph]
Actor’s location on map [Example: when subject vehicle is 2-3 meters from Junction]
Map element’s event [Example: when Traffic light is Green]
Environment element’s event [Example: when precipitation is heavy rain]
Or a combination of one or more of the types listed above.

The triggers listed here only indicate whether such triggers exist in the scenario. For example, if the weather condition changes during a scenario (e.g. no rain → rain), this will be labelled as the 'Environment element’s event' trigger.
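
Because the trigger labels only record presence and may be combined, they can be modelled as a set of flags. The following is a minimal sketch; the class name, member names and the per-frame precipitation list are assumptions made for illustration.

```python
from enum import Flag, auto


class EventTrigger(Flag):
    NONE = 0
    TIME_BASED = auto()
    ACTOR_DYNAMIC_BEHAVIOR = auto()
    ACTOR_LOCATION_ON_MAP = auto()
    MAP_ELEMENT_EVENT = auto()
    ENVIRONMENT_ELEMENT_EVENT = auto()


def environment_trigger(precipitation_per_frame):
    """Return the environment flag if precipitation changes between frames."""
    changed = any(
        before != after
        for before, after in zip(precipitation_per_frame,
                                 precipitation_per_frame[1:])
    )
    return EventTrigger.ENVIRONMENT_ELEMENT_EVENT if changed else EventTrigger.NONE


# A scenario with a timer based trigger in which it also starts to rain.
labels = EventTrigger.TIME_BASED | environment_trigger(
    ["no_rain", "no_rain", "rain"]
)
```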

6.4. File Format extension for Scenario Labels

Proposal:

{
    "openlabel": {
        "ontologies": {},
        "metadata": {
            "scenario_metadata": {},
            "frame_metadata": {}
        },
        "frame_intervals": {},
        "frames": {},
        "objects": {},
        "actions": {},
        "events": {},
        "relations": {}
    }
}

6.5. Open Issues

TODO: introduce a mechanism to reference labels from external standards to differentiate between OpenLABEL labels and external labels.
TODO: add more examples to the topic to fully understand how the meta model shall be used.
TODO: The user guide needs to clearly explain how to label scenarios, so they can be found using search engines

6.5.1. Semantics

The final OpenLABEL standard should define clear rules and semantics for labelling scenarios, in order to facilitate consistent searches across scenario sets which were labelled by different organizations or generated for different purposes. For example, for a label of DrivableArea.LaneSpecification.NumberOfLanes, semantics are needed because a scenario may include roads with differing numbers of lanes. It must be clear whether a particular NumberOfLanes value represents the minimum encountered in a scenario, the maximum, the number for every road in the scenario, or the number for just one of the roads.
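
The ambiguity of a single NumberOfLanes value can be made concrete with a small sketch. The road identifiers and the function name are hypothetical; the point is only that the same scenario yields different answers depending on which reading the semantics prescribe.

```python
def summarise_number_of_lanes(lanes_per_road):
    """Show the candidate readings of a single NumberOfLanes label.

    lanes_per_road maps a hypothetical road identifier to its lane count.
    """
    counts = list(lanes_per_road.values())
    return {
        "minimum": min(counts),
        "maximum": max(counts),
        "all_roads_equal": len(set(counts)) == 1,
        "distinct_values": sorted(set(counts)),
    }


summary = summarise_number_of_lanes({"road_a": 2, "road_b": 4, "road_c": 2})
```

A query for scenarios with two lanes matches this scenario under the minimum reading but not under the maximum reading, so the standard has to fix one interpretation.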

Clear semantics means that the results of a query of a set of OpenLABEL files are well defined. Some of the issues that must be resolved to achieve a clear set of semantics are discussed in this section and in Open Issues with the need for further investigation.

The semantics of OpenLABEL does not define or dictate the query language to use when searching scenarios with the aid of OpenLABEL; for example, you could use a language that doesn’t support the AND or NOT operators.

TODO create some abstract concept for the Semantics as a starting point for the standard development project

While the scenario labels available in OpenLABEL are clearly laid out in Top Level Label Classes, semantics will allow clarity about:

Labels that could have different values for different items within a scenario
Labels where the value varies spatially or temporally during a scenario
Relationships between different label values (spatial, temporal, or other). For example, the semantics could allow a NumberOfLanes value to be associated with a RoadType value to indicate they occur together.
Associating label values with item instances within the scenario (e.g. a particular vehicle)

6.5.2. Custom labels

6.5.3. Relationship with OpenODD

The ASAM OpenODD standard will define a format for representing ODDs, using a taxonomy closely aligned with BSI PAS 1883. Using the labels derived from the ODD, it shall be possible to separate any set of scenarios into those within the ODD and those outside it.
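
A minimal sketch of such a separation is shown below. It assumes a simplified ODD that maps each label name to a set of permitted values; this representation, the label names and the values are illustrative assumptions and do not reflect the OpenODD format.

```python
def within_odd(scenario_labels, odd):
    """A scenario is inside the ODD if every constrained label is permitted.

    Treating a missing label as outside the ODD is a conservative
    assumption of this sketch.
    """
    return all(
        scenario_labels.get(name) in permitted
        for name, permitted in odd.items()
    )


def partition(scenarios, odd):
    """Split scenario IDs into (inside, outside) with respect to the ODD."""
    inside, outside = [], []
    for scenario_id, labels in scenarios.items():
        (inside if within_odd(labels, odd) else outside).append(scenario_id)
    return inside, outside


odd = {"RoadType": {"Motorway", "DistributorRoad"}, "Rainfall": {"None", "Light"}}
scenarios = {
    "s1": {"RoadType": "Motorway", "Rainfall": "Light"},
    "s2": {"RoadType": "MinorRoad", "Rainfall": "None"},
    "s3": {"RoadType": "Motorway"},  # Rainfall was not labelled
}
inside, outside = partition(scenarios, odd)
```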

Therefore, OpenLABEL should support using labels derived from OpenODD. The potential differences between OpenODD specifications and OpenLABEL searches are:

An OpenLABEL search could (and frequently will) include non-ODD attributes, such as the scenario type (logical, concrete, etc.) or the ego vehicle manoeuvre. However, it would make sense to be able to represent these attributes using the same taxonomy as OpenODD.
OpenODD may make it possible to construct complex ODDs with detailed criteria (e.g. 2-lane roundabouts are in the ODD, unless they occur on a road with a speed limit greater than 80 km/h). As discussed in Representative power versus data duplication, it may not make sense to encode enough information in OpenLABEL to resolve this query, so such queries may be resolved by examining both OpenLABEL and the underlying scenario files.

6.5.4. Optional labels and purpose bound LabelSets

It is probably not reasonable to expect everything to be labelled by every user. This means that each possible feature has two states: labelled and not labelled. If a feature is labelled, it can be labelled as present, not present or unknown.

labelled
- present
    - values according to label definition
- not present
- unknown
not labelled

In cases where OpenLABEL is used to select scenarios for safety or regulatory purposes, it is vital to distinguish between these states clearly, so key scenarios are not missed from test and certification processes.

For example, a database may contain some scenarios where all vehicles are labelled with a type, and some where they are not. A user wishes to search for scenarios which do not contain bicycles. They need a mechanism to exclude scenarios where road user types have not been labelled. Another complexity is that it is hard to know in advance what level of information will be present in the scenario data, and therefore what labels are possible.
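
The bicycle example can be sketched as follows. The enum and the database layout are hypothetical; a feature that is absent from a scenario's dictionary stands for "not labelled".

```python
from enum import Enum


class LabelState(Enum):
    PRESENT = "present"
    NOT_PRESENT = "not present"
    UNKNOWN = "unknown"


def scenarios_without(feature, database):
    """Return IDs of scenarios explicitly labelled as not containing feature.

    database maps a scenario ID to {feature name: LabelState}. Scenarios
    where the feature is unknown or not labelled at all are excluded,
    because their content cannot be guaranteed.
    """
    return [
        scenario_id
        for scenario_id, labels in database.items()
        if labels.get(feature) is LabelState.NOT_PRESENT
    ]


database = {
    "s1": {"Bicycle": LabelState.PRESENT},
    "s2": {"Bicycle": LabelState.NOT_PRESENT},
    "s3": {},  # road user types were not labelled
    "s4": {"Bicycle": LabelState.UNKNOWN},
}
no_bicycles = scenarios_without("Bicycle", database)
```

A naive search for "Bicycle is not present in the labels" would also return s3 and s4, which is exactly the failure mode described above.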

However, the value of OpenLABEL is that the labels for the search query are standardized; if not all scenarios have that label present, it means search results can no longer be guaranteed to include all the scenarios the user desired.

Introducing two new features into OpenLABEL may help resolve this issue:

Any property can be labelled as present, not present or unknown. The file should specify how it should be interpreted when none of these states are defined.
The standard can provide the capability of creating purpose bound LabelSets, where each LabelSet states which labels cannot be left as unknown and/or must be defined by reference to an external file. Certain use cases (e.g. regulatory testing) could require scenarios labelled to a particular LabelSet standard.
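
A purpose bound LabelSet could be checked with a sketch like the one below. It covers only the first aspect, labels that cannot be left as unknown, and treats a label that is not labelled at all as a violation as well; both the function name and this treatment are assumptions.

```python
def labelset_violations(labels, required):
    """Return required label names that are not labelled or left unknown.

    labels maps a label name to "present", "not present" or "unknown".
    """
    return sorted(
        name for name in required
        if labels.get(name, "unknown") == "unknown"
    )


scenario_labels = {"RoadType": "present", "Bicycle": "unknown"}
regulatory_labelset = {"RoadType", "Bicycle", "Rainfall"}
violations = labelset_violations(scenario_labels, regulatory_labelset)
```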

6.6. Open Issues with the need for further investigation

6.6.1. Representative power versus data duplication

A key issue for the standardization project in relation to the Semantics is how to balance the representative power of OpenLABEL against duplication of data in the underlying scenario files.

The scenario labels are intended to provide contextual or summarized data about a scenario. If OpenLABEL were to replicate too much of the information in the underlying files, then:

In any reasonable size database, it is likely that some OpenLABEL files will become out-of-sync with their underlying scenario data
OpenLABEL becomes an SDL instead of an annotation format. There is little point to the labelling process: the value of labels comes from them summarizing or abstracting data available in the underlying files

On the other hand, if not enough of the scenario information is included in OpenLABEL, the value of labelling is again degraded: not enough data will be available in OpenLABEL to support common search queries.

For simple cases, searching is easy: it just has to be checked whether the applicable label is present in the scenario. However, it should also be possible for users to construct complex searches which consider spatial (and maybe temporal) relations. For example, a user might want to exclude scenarios with a left turn at a roundabout. It should be possible to do this without filtering out all scenarios which include both a left turn and a roundabout but in different places.
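
The roundabout example can be sketched by attaching a location reference to each label. The `(label, location_id)` pairs and the identifiers are hypothetical; they only illustrate why presence labels alone are not enough for this query.

```python
def has_left_turn_at_roundabout(labels):
    """labels is a list of (label, location_id) pairs for one scenario.

    location_id is a hypothetical reference to a place in the scenario.
    """
    roundabout_locations = {
        location for label, location in labels if label == "Roundabout"
    }
    return any(
        label == "TurnLeft" and location in roundabout_locations
        for label, location in labels
    )


scenario_a = [("Roundabout", "j1"), ("TurnLeft", "j1")]
scenario_b = [("Roundabout", "j1"), ("TurnLeft", "j2")]
kept = [
    name
    for name, labels in {"a": scenario_a, "b": scenario_b}.items()
    if not has_left_turn_at_roundabout(labels)
]
```

Both scenarios contain a roundabout and a left turn, but only scenario a is filtered out, because only there do the two labels share a location.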

A potential resolution to these conflicting requirements is to store enough data in OpenLABEL for the expected common use cases, while allowing OpenLABEL searches to work in conjunction with a search of the underlying scenario files for the complex edge-cases. Given that OpenLABEL is a standard for annotating scenario files, it is expected that the underlying files will be both present and machine-readable and so amenable to being searched.

6.6.2. Support for query languages

While the Semantics for scenario labels in OpenLABEL are not explicitly a scenario query language, they have some of the features of one. A rough analogy is a relational database: OpenLABEL defines the tables, columns, and column values, whereas a query language could be more like SQL with many powerful filtering features.
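
The relational database analogy can be made concrete with Python's built-in `sqlite3` module. The table layout and label values below are assumptions chosen for illustration: the labels play the role of the table contents, while SQL stands in for a separate query language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (scenario, label, value).
conn.execute("CREATE TABLE labels (scenario_id TEXT, label TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?)",
    [
        ("s1", "RoadType", "Motorway"),
        ("s1", "NumberOfLanes", "3"),
        ("s2", "RoadType", "MinorRoad"),
        ("s2", "NumberOfLanes", "1"),
    ],
)
# Query: motorway scenarios with at least two lanes.
rows = conn.execute(
    """
    SELECT a.scenario_id
    FROM labels AS a JOIN labels AS b ON a.scenario_id = b.scenario_id
    WHERE a.label = 'RoadType' AND a.value = 'Motorway'
      AND b.label = 'NumberOfLanes' AND CAST(b.value AS INTEGER) >= 2
    ORDER BY a.scenario_id
    """
).fetchall()
motorway_multilane = [row[0] for row in rows]
conn.close()
```

The same label data could be queried with a different language offering fewer or more filtering features, which is the separation of concerns described above.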

The standardization project will have to decide whether it is appropriate to define a query language as part of OpenLABEL, or to leave it to the end users and/or other standards. In either case, it should be possible to use OpenLABEL with different query languages. As well as providing different filtering features, different query languages might for example support a subset or a superset of the labels available in OpenLABEL.

Scenario data references

To support searching the underlying scenario files, it would be valuable for OpenLABEL to store information which ties label values to the underlying scenario format. This would allow for use cases where more detailed information was required, while still ensuring that OpenLABEL does not become an SDL. For example, a search such as "a vehicle overtakes a bicycle just before a roundabout" could be resolved much more easily if the Overtake event in OpenLABEL included data on the road ID/location in the underlying road network file where it takes place.
