Disclaimer
This document is the copyrighted property of ASAM e. V. In alteration to the regular license terms, ASAM allows unrestricted distribution of this standard. §2 (1) of ASAM’s regular license terms is therefore substituted by the following clause: "The licensor grants everyone a basic, non-exclusive and unlimited license to use the standard ASAM OpenLABEL".
1. Foreword
ASAM e. V. (Association for Standardization of Automation and Measuring Systems) is a non-profit organization that promotes standardization of tool chains in automotive development and testing. Our members are international car manufacturers, suppliers, tool vendors, engineering service providers and research institutes. ASAM standards are developed by experts from our member companies and are thus based on real use cases. They enable the easy exchange of data or tools within a tool chain. ASAM is the legal owner of these standards and responsible for their distribution and marketing.
ASAM standards define file formats, data models, protocols and interfaces for tools that are used for the development, test and validation of electronic control units (ECUs) and of the entire vehicle. The standards enable easy exchange of data and tools within a tool chain. They are applied worldwide.
2. Introduction
2.1. Overview
ASAM OpenLABEL standardizes the annotation format and the labeling methods for multi-sensor data streams and scenario files. Using a standardized format helps cut costs and save resources used in creating, converting, and transferring annotated and tagged data. ASAM OpenLABEL is represented in a JSON format and can therefore be easily parsed by tools and applications.
ASAM OpenLABEL specifies the different labeling methods that can be applied to multi-sensor data streams, for example, 2D bounding boxes for image data. With ASAM OpenLABEL, several labeling methods are provided which enable users to label common data streams, such as images or point clouds. Besides adding labels to multi-sensor data streams (labeling), ASAM OpenLABEL also provides methods to add tags to scenarios (tagging). These tags can be used to categorize scenarios and make them searchable in large databases. They can also provide additional information about the individual scenario, such as who captured or created the scenario and with what setup it was captured.
ASAM OpenLABEL provides a common data structure for organizing annotations for labeling multi-sensor data streams and tagging simulation and test scenarios.
2.1.1. Multi-sensor data labeling
For the development, testing, and validation of highly automated driving functions, the industry makes extensive use of Machine Learning (ML), especially for realizing perception and prediction tasks. Machine learning requires significant amounts of training data. The data has to be annotated and enriched with metadata to be useful in the training and validation phases.
The lack of an industry standard aligning the structure and organization of these annotations creates several difficulties:
- It limits the reuse of annotated datasets.
- It poses challenges regarding the maintenance and updating of the annotations.
- It limits the sharing of datasets across the industry and between industry and academia.
- It has a negative impact on the quality of annotations.
The goals of the multi-sensor data labeling use case in ASAM OpenLABEL are as follows:
- Enable efficient sharing of annotated perception datasets and object lists.
- Increase the overall quality of annotations by providing a common data structure for annotations.
- Improve the maintainability and reuse of annotated datasets.
The multi-sensor data labeling use case in ASAM OpenLABEL fulfills the requirements of the following main target groups:
- Perception/computer-vision engineers
- Machine-learning engineers
- Perception/computer-vision research scientists
- Machine-learning research scientists
- Data-annotation engineers
- Data-annotation analysts
- Test engineers
2.1.2. Scenario tagging
Scenario databases storing multi-sensor data, annotated multi-sensor data, simulation scenarios, and test scenarios can be very extensive. The sensor data and scenarios stored in these databases must be organized and tagged using semantic, meaningful tags. These tags refer, for example, to the content of the data, its ODD, the high-level behavior of the dynamic agents, and administrative information. Extracting the information required for the tags from scenario artifacts can be difficult and inefficient, and for some types of data it is impossible, because the expressiveness of the scenario definition language is limited. Scenario tagging based on ASAM OpenLABEL addresses these issues.
The goals of the scenario tagging use case in ASAM OpenLABEL are as follows:
- Enable standardized clustering of test scenarios in scenario databases.
- Facilitate scenario storage systems that are separate from the scenario definition representation.
- Enable efficient search and filtering of test scenarios in scenario databases.
- Enable sharing of information on test scenario categories and clusters between different databases or owners.
- Facilitate the sharing of scenarios between systems that may not have the ability to inspect the scenario definition or underlying scenario data.
- Improve maintainability and reuse of test scenarios and scenario data.
- Enhance machine-learning training and validation datasets with additional information to organize the datasets.
- Enable specific machine-learning classification tasks to be performed on scenario data.
The scenario tagging use case in ASAM OpenLABEL fulfills the requirements of the following main target groups:
- Systems engineers
- Validation and verification engineers
- Functional-safety engineers
- Simulation specialists
2.1.3. Deliverables
- ASAM OpenLABEL specification (this document)
- ASAM OpenLABEL annotation schema, provided in the openlabel_json_schema.json file
- ASAM OpenLABEL standardized set of tags for the scenario tagging use case, provided in the openlabel_ontology_scenario_tags.ttl file
- ASAM OpenLABEL JSON examples, provided at the openlabel.asam.net website
2.2. Conventions and notation
2.2.1. Naming conventions
The following conventions apply in this document:
- Element names should be meaningful names with defined semantics.
- Element names should be written in camel case as ASCII strings.
- The first character shall be a letter, an underscore, or a dollar sign ($).
- Subsequent characters may be a letter, a digit, an underscore, or a dollar sign.
- Reserved JavaScript keywords should be avoided.
- All element names should be uniquely defined in one ontology.
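For illustration, the following hypothetical keys all satisfy these conventions (camel case, legal first characters, no reserved JavaScript keywords):
JSON example
{
    "boundingBoxWidth": 12.5,
    "_internalCounter": 7,
    "$schemaReference": "openlabel"
}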
2.2.2. Units
Unless stated otherwise, all numeric values within this specification are in SI units. Table 1 represents details of the units used.
| Unit of | Unit | Symbol |
|---|---|---|
| Length | Meter | m |
| Duration, (relative) time | Second | s |
| Speed | Meters per second | m/s |
| Mass | Kilogram | kg |
| Angle | Radian | rad |
| Light intensity | Lux | lx |
| Image coordinate | Pixel | px |
Timestamp
The timestamp used in labeling depends on the raw sensor data. Different sensors sample data with various timestamp formats:
- UT (Universal Time): UT is derived from the rotation of the Earth. With the improvement of measurement, UT has several versions: UT0, UT1, UT2. The UT time scale is irregular, since the rotation rate of the Earth is not constant.
- TAI (Temps Atomique International): TAI is the international atomic time scale based on a continuous counting of the SI second. It is provided by several laboratories around the world. The instruments "producing" TAI are ensembles of atomic frequency standards, such as rubidium oscillators, cesium oscillators, and hydrogen masers. TAI was set to coincide exactly with UT1 (Universal Time version 1) at 0 hours of 1 January 1958.
- UTC (Universal Time Coordinated): UTC was introduced for the purpose of having a time with a constant scale that does not deviate too much from UT1. UTC runs at the same rate as TAI. A leap second is introduced into UTC once the difference between UT1 and UTC exceeds 0.9 s. The time references of many GNSS (Global Navigation Satellite System) systems are based on the time scales of UTC and TAI with a specific constant offset [1].
- GPST (GPS Time) [2]: GPST is based on TAI as provided by the frequency standards of the GPS control center. It was introduced at 0 hours on 6 January 1980 (UTC) and always has a constant offset of -19 s to TAI.
- GST (Galileo System Time): GST is a continuous time scale maintained by the Galileo Central Segment and synchronized with TAI. GST started at 0 hours on 22 August 1999 (UTC), and the offset between GST and TAI is -19 seconds.
- GLONASST (GLONASS Time) [3]: GLONASST is generated by the GLONASS Central Synchroniser and is synchronized with UTC (SU). The constant offset between GLONASST and UTC (SU) is three hours.
- BDT (BeiDou Time): BDT is a continuous time scale starting at 0 hours on 1 January 2006 (UTC). It is synchronized with UTC (BSNC). The constant offset to TAI is -33 seconds.
The following overview shows how different timestamp standards can be transformed, where LS denotes the accumulated leap seconds (LS = TAI - UTC):
- UTC = TAI - LS
- GPST = UTC(USNO) + LS - 19 s
- GST = TAI - 19 s
- GLONASST = UTC(SU) + 3 h
- BDT = UTC(BSNC) + LS - 33 s
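As a worked example, with \(LS = 37\,s\) (the leap-second count in effect since 1 January 2017), these transforms give \(GPST = UTC + 37\,s - 19\,s = UTC + 18\,s\) and \(BDT = UTC + 37\,s - 33\,s = UTC + 4\,s\), so a GPS timestamp is currently 18 s ahead of UTC.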
Figure 1 shows the relationship between GNSS time systems and UTC. It was derived from Timescales [4].
Unix time is widely used in operating systems. It is the number of seconds that have elapsed since the Unix epoch, not counting UTC leap seconds. The Unix epoch started at 00:00:00 UTC on 1 January 1970. Every day is treated as if it contains exactly 86,400 seconds. Due to its handling of leap seconds, it is not a linear representation of UTC.
Representation of date and time format
The representation of the date and time format is specified by the ISO 8601 standard [5]. The following format pattern is used:
yyyy-MM-ddTHH:mm:ss.FFFZ
Here, T is used as the time designator, and . is used as the separator for the millisecond portion. An explanation is given in the table below:
| Specifiers | Meaning | Example |
|---|---|---|
| yyyy | Year (four digits) | 2021 |
| M,MM | Month in year (without/with leading zero) | 9, 09 |
| d,dd | Day in month (without/with leading zero) | 3, 03 |
| H,HH | Hours, 0-23 count (without/with leading zero) | 7, 07 |
| m,mm | Minutes (without/with leading zero) | 2, 02 |
| s,ss | Seconds (without/with leading zero) | 4, 04 |
| F,FF,FFF | Milliseconds (without/with leading zeros) | 357, 04, 002 |
| Z | RFC 822 time zone shifted to GMT | Z, +0100 |
If the time is in UTC, add a Z character directly after the time without a space. Z is the zone designator for the zero UTC offset. For example, 11:45 UTC is represented as 11:45Z or T1145Z. If the time is in a time zone other than UTC, the UTC offset is appended to the time in the same way that Z was above, in the form ±[hh]:[mm], ±[hh][mm], or ±[hh].
At a given date and time of 2021-09-03 11:23:56 in the Central European Time zone (CET), the following standard-format output is produced:
2021-09-03T11:23:56.000+0100
2.2.3. Modal verbs
To ensure compliance with the ASAM OpenLABEL standard, users need to be able to distinguish between mandatory requirements, recommendations, and permissions, as well as possibilities and capabilities.
The following rules for using modal verbs apply:
| Provision | Verbal form |
|---|---|
| Requirements | shall |
| Recommendations | should |
| Permissions | may |
| Possibilities and capabilities | can |
| Obligations and necessities | must |
2.2.4. Typographic conventions
This documentation uses the following typographical conventions:
| Mark-up | Definition |
|---|---|
| code | This format is used for code elements, such as technical names of classes and attributes, as well as attribute values. |
| Terms | This format is used to introduce glossary terms, new terms and to emphasize terms. |
2.2.5. Use of IDs
The following rules apply to the use of IDs in ASAM OpenLABEL:
- IDs shall be unique within a class.
3. Scope
ASAM OpenLABEL establishes the basic principles and methods for annotating multi-sensor data streams and for tagging test scenarios for automated driving development, validation, and verification.
The ASAM OpenLABEL standard
- specifies the annotation schema to which valid ASAM OpenLABEL annotation instances shall conform.
- represents the annotation schema for ASAM OpenLABEL in JSON schema. The JSON schema defines the structure, sequence, elements, and values of ASAM OpenLABEL.
- explains relationships between different elements in the ASAM OpenLABEL annotation schema, for example, actions, objects, events, contexts, relations, frames, tags.
- gives guidelines for using ASAM OpenLABEL.
This version of ASAM OpenLABEL does not address annotation quality or provide quality criteria related to annotations. Future versions of ASAM OpenLABEL may deal with this issue.
3.1. Multi-sensor data labeling
The ASAM OpenLABEL standard
- defines and organizes the annotation data structures, including geometries, coordinate systems and transforms, and other concepts relevant to spatiotemporal annotations for multi-sensor data labeling.
- does not provide a taxonomy/ontology of physical/abstract entities relevant to the road traffic domain. Instead, it specifies mechanisms to include external knowledge repositories/ontologies and recommends the use of ASAM OpenXOntology as the ontology of reference.
- does not provide rules, specifications, or guidelines on how to annotate entities for multi-sensor data labeling. Nor does it provide any recommendations as to what elements of a physical entity should be included or not included in a geometry.
| An ASAM OpenLABEL multi-sensor data labeling instance shall follow the provided multi-sensor data labeling schema to be considered valid and compliant with ASAM OpenLABEL. |
3.2. Scenario tagging
The ASAM OpenLABEL standard
- defines and organizes the annotation data structure for test scenario tagging.
- defines the set of ASAM OpenLABEL tags, their relationships, and the mechanisms to include the ASAM OpenLABEL set of scenario tags in valid annotation instances of test scenarios.
- does not define a language or format to describe test scenarios.
| An ASAM OpenLABEL scenario tagging instance shall use the tagging schema and the set of tags provided in ASAM OpenLABEL to be considered valid and compliant with ASAM OpenLABEL. |
4. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes some of the requirements set out in this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
5. Terms and definitions
- AD (Autonomous Driving)
Non-abbreviated form: Autonomous Driving
- ADAS (Advanced Driver Assistance System)
Non-abbreviated form: Advanced Driver Assistance System
- Annotation (process)
Process of enriching raw data, for example, test scenario artifacts or data streams from multiple sensors, such as cameras, LiDARs, and radars, with metadata. This metadata describes the content of the raw data, for example, static or dynamic objects populating a video, actions that are performed, or environmental conditions. Additional information regarding the data may also be included. Data that has already been enriched can be enriched further.
- Annotation instance
Enriches raw data with metadata required for the specific task. Annotation instances are usually serialized in a text-based file format, for example, JSON. Annotation instances have to conform to a pre-defined annotation schema.
- Annotation instance format
File format for serialization and storage of annotation instances. ASAM OpenLABEL uses JSON as annotation instance format.
- Annotation schema
Provides structure and constraints for annotation instances. Annotation instances shall adhere to the schema to be considered well-formed and valid. The definition of an annotation schema is the core of ASAM OpenLABEL.
- Annotation schema format
File format for serialization and storage of an annotation schema. ASAM OpenLABEL uses JSON schema as annotation schema format.
- Knowledge repository
Database that stores, organizes, and categorizes knowledge. In the context of ASAM OpenLABEL, knowledge repositories organize, structure, and define domain concepts relevant to the annotation task, for example, the road traffic domain. Knowledge repositories may be defined, for example, as free texts, structured taxonomies, or formal ontologies.
- Labeling
Process for generating spatiotemporal descriptions for data, using labeling geometries and other constructs to provide richer information compared to tags.
| Labeling is a specialization of annotation. |
- Labeling geometries
Spatiotemporal constructs used to identify, isolate, and localize specific semantic concepts to be annotated in the raw data, for example, bounding boxes, cuboids, and others.
- LiDAR (Light Detection and Ranging)
Restricted term: LIDAR
Method for measuring distances by illuminating the target with laser light and measuring the reflection with a sensor.
- ODD (Operational Design Domain)
Source: SAE J3016 (2021) [12]
Operating conditions under which a given driving automation system or feature thereof is specifically designed to function, including, but not limited to, environmental, geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or roadway characteristics.
- Ontology
Formal, explicit specification of a shared conceptualization. Ontologies may be defined in formal knowledge representation languages. In the context of ASAM OpenLABEL, an ontology is a machine-readable artifact that organizes and defines semantic concepts relevant to the labeling tasks.
- Radar (Radio Detection and Ranging)
Restricted term: RADAR
Device or system that consists of a synchronized radio transmitter and receiver that emits radio waves and processes their reflections for display. A radar is used especially for detecting and locating objects.
- Raw data
Data that can be enriched with metadata. Raw data may take many forms, for example, individual files, file streams, or test scenario artifacts. Relevant examples of raw data for ASAM OpenLABEL are png images, frames in a video sequence, pcd point clouds, OpenSCENARIO files, and OpenLABEL files themselves.
- Tagging
Process for adding simple and complex semantic tags to any information container, such as images, videos, or test scenarios. Tagging is a specialization of the annotation process.
- Test scenario
Scenario intended for testing and assessment of Advanced Driver Assistance Systems (ADAS) or another system under test.
6. Conceptual overview
6.1. Data annotation in ASAM OpenLABEL
Data annotation is the process of enriching raw data, for example, data streams from multiple sensors, such as cameras, LiDAR, and radar, or test scenario artifacts, with additional metadata. This metadata relates to the content of the raw data, for example, static or dynamic objects populating a video, actions they are performing, or environmental conditions. Additional information regarding the data may also be included.
Figure 2 shows the concept and terms related to data annotation.
Raw data is data that can be enriched with metadata. Raw data can take many forms, for example, individual files, file streams, or test scenario artifacts. Relevant examples of raw data for ASAM OpenLABEL are png images, frames in a video sequence, pcd point clouds, or OpenSCENARIO files.
Annotation instances enrich raw data with metadata required for the specific task. Annotation instances are usually serialized in a text-based file, for example, JSON. JSON is the format used for ASAM OpenLABEL. Annotation instances shall conform to a predefined annotation schema.
The annotation schema provides the specific structure and set of constraints that the annotation instances need to follow to be considered well-formed and valid. The definition of an annotation schema is the core of ASAM OpenLABEL. The annotation schema for ASAM OpenLABEL is represented as a JSON schema.
For applications with heavy semantic load, such as the use cases relevant for ASAM OpenLABEL, it is advisable to refer to external knowledge repositories, for example, ontologies or vocabularies. An annotation schema regulates the data validity of annotation instances by providing their data model. Knowledge repositories add value on top of this: they provide information about the content of the annotations and support analyzing the validity of that content. Such external resources organize, structure, and define the semantics of the entities that annotations refer to. Ontologies additionally define the relationships between the entities. ASAM OpenLABEL assumes the use of external knowledge repositories to organize the semantic content of annotations.
ASAM OpenLABEL defines annotation schemas that are valid for specific use cases with specific raw data to be annotated. The two primary use cases considered for ASAM OpenLABEL are multi-sensor data labeling and scenario tagging.
6.1.1. Multi-sensor data labeling
Figure 3 shows the concepts related to data annotation as representation for multi-sensor data labeling. ASAM OpenLABEL covers the definition of the annotation schema for multi-sensor data labeling.
Multi-sensor data labeling use cases focus on raw data that is the output of multiple sensors, for example, cameras, LiDAR, or radar. These sensors equip typical advanced driver assistance systems (ADAS) and autonomous driving (AD) systems. Such raw data is often stored in pcd, png, or other common image, point cloud, or video formats.
For this type of raw data, a large amount of semantic content has to be annotated. The annotations require geometries, for example, bounding boxes, polygons, or other primitives, to isolate and localize relevant semantic concepts within the raw data. Semantically, labels usually refer to agent type identification, agent relations, actions the agents are performing, and the contexts in which these actions or agents take place or exist.
Additional information included in this annotation use case encompasses details about spatial calibration across sensors, temporal synchronization, coordinate transforms, and consistent entity IDs across frames and sensor streams.
Example
Figure 4 shows an example using ASAM OpenLABEL for multi-sensor data labeling.
The example.pcd and example.png files contain raw sensor data streams from multiple sensors that are annotated according to the ASAM OpenLABEL annotation schema.
The ASAM OpenLABEL annotation schema is contained in the openlabel_json_schema.json file.
The example.json file contains annotations of the example.pcd and example.png files.
The annotations in the example.json file contain references to an external ontology in the example.owl file.
The example.json file can be validated using the openlabel_json_schema.json file.
The example.owl file is used to semantically enrich the annotations in the example.json file.
6.1.2. Scenario tagging
Figure 5 shows the concepts related to data annotation as representation of scenario tagging. ASAM OpenLABEL covers the definition of the annotation schema for scenario tagging and an ontology for tags.
Scenario tagging use cases focus on raw data that is used in the development, testing, and validation process of ADAS and AD functions, for example, test scenarios or simulation scenarios. Often such raw data is stored in OpenSCENARIO, GeoScenario, M-SDL, or other domain-specific languages or formats used to describe and store simulation and test scenarios.
| In addition to the raw data types mentioned above, videos, natural language descriptions, or any other data that contains a visualization or a description of a driving situation evolving through time, and so even valid OpenLABEL annotation instances for multi-sensor data labeling, can be treated as relevant raw data for the scenario tagging use case. |
Annotations for this type of data are usually not semantically dense and consist of a set of tags that are associated with a specific scenario instance or set of scenario instances. Semantically, tags usually refer to elements related to the content of the scenario, such as its ODD, or the behavior of some agents.
Additional information included in this annotation use case encompasses details about authorship, versioning, and other high-level administrative information related to the scenario.
Example
Figure 6 shows an example using ASAM OpenLABEL for tagging scenario files.
The example.xosc file contains a scenario description that was annotated following the ASAM OpenLABEL annotation schema.
The ASAM OpenLABEL annotation schema is contained in the openlabel_json_schema.json file.
The annotations of the example.xosc file are contained in the example.json file.
The annotations in the example.json file contain references to an external ontology in the openlabel_ontology_scenario_tags.ttl file.
The example.json file can be validated using the openlabel_json_schema.json file.
The openlabel_ontology_scenario_tags.ttl file is used to semantically enrich the annotations in the example.json file.
6.2. Annotation schema and its format
The annotation schema defines the structure of annotations, data types, and conventions needed to unambiguously interpret the annotations. It also specifies how the annotation data is encoded for storage into computer files.
The annotation schema of ASAM OpenLABEL is designed to be flexible enough to tackle annotation tasks, ranging from simple object-level labeling in single images, using, for example, bounding boxes or semantic segmentation, to complex multi-sensor data labeling tasks, involving, for example, cuboids, odometry, coordinate systems, and transforms. The annotation schema and its format (JSON schema) are also designed to facilitate serialization of labels in files or messages that can be stored and exchanged between computers while staying readable for humans.
6.2.1. Annotation schema (JSON schema)
The annotation schema is described and formatted as a JavaScript Object Notation schema (JSON schema). It defines the shape to which valid JSON annotation instances shall conform. The structure of the ASAM OpenLABEL annotation schema is serialized in the ASAM OpenLABEL JSON schema file. The annotation schema itself conforms to the JSON schema Draft 7 specification [13].
There are several software packages in different programming languages that can be used to validate a JSON payload against the JSON schema. A JSON schema validation asserts constraints on the structure of the instance JSON data.
The JSON schema validation only inspects the structure and type of the key-value pairs. A JSON schema does not validate the semantics behind the content of key-value pairs. A certain level of semantic validation can be achieved by using external resources, such as the ontologies of ASAM OpenXOntology, reasoning engines, and validation scripts.
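To illustrate the difference, the following hypothetical instance is structurally valid against the schema, yet the misspelled type value carr could only be caught by checking the content against an external ontology:
JSON example
{
    "openlabel": {
        "metadata": {
            "schema_version": "1.0.0"
        },
        "objects": {
            "0": {
                "name": "object1",
                "type": "carr"
            }
        }
    }
}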
The annotation schema data structure of ASAM OpenLABEL represents annotations as a dictionary. Therefore, all data is represented as key-value pairs. These key-value pairs are sometimes referred to as items in certain programming languages. Keys are strings, that is, arrays of characters. Values can be the following:
- Primitives (string, number, and Boolean)
- Arrays of primitives
- Dictionaries
- null (a special type to denote that the key exists but has no value)
Keys, as strings, encode either keywords defined in the JSON schema (for example, object, coordinate_system, name, type) or identifiers.
Identifiers can be numerical, for example, 0 or 5, strings, for example, CAM or ODOM, or unique identifiers, for example, 123e4567-e89b-12d3-a456-426614174000.
The JSON schema determines which patterns keys shall follow for different types of items, for example, regular expressions requiring that keys be string representations of numbers composed of the digits 0 to 9.
This data structure matches with the syntax of JSON data formatting. As a consequence, ASAM OpenLABEL content can be expressed as JSON strings and made persistent as JSON files.
JSON payloads and files
Any ASAM OpenLABEL annotation instance can be expressed as JSON string payloads. That means the actual data pack that contains the key-value pairs is expressed as a string.
A JSON file, for example, openlabel_annotation.json, can be created by storing the JSON string payload using any computer programming language that serializes it into a text file.
In ASAM OpenLABEL, UTF-8 (8-bit Unicode Transformation Format) shall be used as the encoding format of characters.
JSON example
{
"openlabel" : {
"metadata" : {
"schema_version" : "1.0.0"
},
"objects" : {
"0" : {
"name" : "object1",
"type" : "car"
},
"1" : {
"name" : "object2",
"type" : "pedestrian"
}
}
}
}
| JSON data can be shown clearly arranged using indentation and line breaks. Nevertheless, other representations are equally valid and are preferred for reducing the size of JSON files. The code above can, for example, be minified as: {"openlabel":{"metadata":{"schema_version":"1.0.0"},"objects":{"0":{"name":"object1","type":"car"},"1":{"name":"object2","type":"pedestrian"}}}} |
JSON parsers
Any JSON parser application, package, and programming language can be used to interpret (parse) the content.
Languages with libraries that support reading and writing JSON data and validating against the JSON schema include, for example, Python, TypeScript/JavaScript, and C++.
It is out of the scope of this standard to define reference implementations of parsers to load and save JSON data compliant with the JSON schema.
Other encoding formats
The ASAM OpenLABEL format matches the syntax of JSON. It was originally developed using the JSON schema as the main pillar for defining the structure. Therefore, this version of ASAM OpenLABEL enforces the use of JSON as the annotation and file format.
Nevertheless, other encoding formats may be considered for future versions of ASAM OpenLABEL as long as they satisfy the same structure, type, and constraints requirements defined by the JSON schema.
6.2.2. Structure
Figure 7 shows the high-level structure of the ASAM OpenLABEL annotation schema. ASAM OpenLABEL can be used for labeling and tagging.
Labeling focuses on producing spatiotemporal descriptive information for data, such as images. Objects, actions, events, contexts, and relations provide the flexibility to express complex labels.
Tagging aims to provide mechanisms to add simple and complex tags to any content, such as images, data files, or scenarios.
Additional structures provide details for metadata, ontologies, frames, and coordinate systems.
The following list shows all objects used in ASAM OpenLABEL.
JSON schema
{
"openlabel" : {
"properties": {
"actions": {...},
"contexts": {...},
"coordinate_systems": {...},
"events": {...},
"frame_intervals": {...},
"frames": {...},
"metadata": {...},
"objects": {...},
"ontologies": {...},
"relations": {...},
"resources": {...},
"streams": {...},
"tags": {...}
}
}
}
The annotation schema format is represented in the ASAM OpenLABEL JSON schema.
The main object is the openlabel object.
It contains the basic objects used in ASAM OpenLABEL.
Some objects are utilized in both multi-sensor data labeling and scenario tagging use cases, for example, the metadata and ontologies objects.
Other objects are used exclusively in one of the two use cases.
The following list shows all objects used in the domain of multi-sensor data labeling.
JSON schema
{
"openlabel" : {
"properties": {
"actions": {...},
"contexts": {...},
"coordinate_systems": {...},
"events": {...},
"frame_intervals": {...},
"frames": {...},
"metadata": {...},
"objects": {...},
"ontologies": {...},
"relations": {...},
"resources": {...},
"streams": {...}
}
}
}
The following list shows all objects used in the domain of scenario tagging.
JSON schema
{
"openlabel" : {
"properties": {
"metadata": {...},
"ontologies": {...},
"tags": {...}
}
}
}
The specific annotation schema for multi-sensor data labeling and scenario tagging, including detailed descriptions of each object, can be found in each corresponding section.
6.3. Metadata
In ASAM OpenLABEL, metadata is understood as additional information about the labels and the content to be labeled. Examples for metadata are the ASAM OpenLABEL version used, file version, authorship, or any other custom information.
The information inside metadata shall be used for informative purposes by applications or humans managing ASAM OpenLABEL files.
Class
metadata
This JSON object contains information, that is, metadata, about the annotation file itself.
Type: object
Additional properties: true
| Name | Type | Required | Description |
|---|---|---|---|
| annotator | string | | Name or description of the annotator that created the annotations. |
| comment | string | | Additional information or description about the annotation content. |
| file_version | string | | Version number of the OpenLABEL annotation content. |
| name | string | | Name of the OpenLABEL annotation content. |
| schema_version | string | true | Version number of the OpenLABEL schema this annotation JSON object follows. |
| tagged_file | string | | File name or URI of the data file being tagged. |
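For illustration, a metadata block populated with the properties listed above might look as follows (all values are placeholders):
JSON example
{
    "openlabel": {
        "metadata": {
            "schema_version": "1.0.0",
            "file_version": "0.2.0",
            "name": "highway_sequence_01",
            "annotator": "Annotation Team A",
            "comment": "Reviewed annotations for the highway sequence.",
            "tagged_file": "example.xosc"
        }
    }
}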
6.4. Coordinate systems
| This section contains concepts that are relevant for multi-sensor data labeling use cases. |
A coordinate system is a system of numbers designed to uniquely determine the position of points over a manifold, such as the Euclidean space, for example, the 2D position of a pixel within an image, or the 3D position of a LiDAR return point in the world relative to the rear axle of the vehicle.
A coordinate transform or coordinate transformation is a relation that expresses the mapping from coordinates in one coordinate system to coordinates in another coordinate system. A coordinate transform always involves two coordinate systems: the source and the target coordinate system.
Raw data to be annotated with ASAM OpenLABEL may contain multiple streams of sensor data coming from different exteroceptive and interoceptive sensors. This triggers the need to define multiple coordinate systems and several transforms that express the following:
- How data from different sensors are spatiotemporally related.
- How the labels relate to the sensor data.
- How the sensor data relates to the real world.
ASAM OpenLABEL defines mechanisms to represent information about coordinate systems and transforms in the annotation schema.
More specifically, coordinate systems and their transforms fulfill the need to express spatiotemporal relations for the following, non-exhaustive, set of use cases:
- Express how the labeled objects of interest are spatially located with respect to a GNSS/INS system, to map data, or to other sensors.
- Express how light rays generated intensity values.
- Express how LiDAR points are geolocated with respect to the world coordinates, vehicle coordinates, etc.
- Express the intrinsic calibration parameters of a camera sensor.
- Express the distortion coefficients from fish-eye camera lenses to rectified images.
To accommodate for all these and more potential use cases, the ASAM OpenLABEL standard provides a method to describe an arbitrary number of coordinate systems and a method to describe the transforms between those coordinate systems.
In addition, the ASAM OpenLABEL standard provides a way to describe transforms that are fixed over time, transforms that vary occasionally at specific time instants, frames, and transforms that vary continuously.
As specified in section Coordinate systems, users may define arbitrary names for coordinate systems. However, despite the ability to describe an arbitrary set of coordinate systems, a small set of names is reserved and refers to pre-defined coordinate systems specified in ASAM OpenLABEL, as some coordinate systems are commonly used in many systems and are standardized. The coordinate systems with standardized names are:
- vehicle-iso8855
- odom
- map-UTM
- geospatial-wgs84
Whenever these names are used for a coordinate system, they shall have the meaning defined in the related standard.
- vehicle-iso8855: A right-handed coordinate system with the origin at the center of the rear axle projected down to ground level. Note that the origin is attached to the rigid body of the vehicle and not to an axle suspended between it and the body. It is at ground level when the vehicle is nominally loaded but it may be above or below ground level, depending on the actual load. Similarly, the axis pointing forward may point slightly upwards or downwards relative to ground level depending on the front-to-back loading of the vehicle. The x-axis is forward, the y-axis to the left, and the z-axis upwards. See also the ISO 8855 specification [11].
- odom: A 3D cartesian coordinate system that is approximately fixed in the world. The transform between the vehicle-iso8855 coordinate system and odom is guaranteed to be continuous so that it varies smoothly over time.
| The transform between odom and map-UTM may be discontinuous. That means there may be sudden jumps in the value of the transform. The odom origin is often the starting point of the vehicle at the time the system is switched on. See the ROS documentation [14]. |
- map-UTM: A 3D cartesian coordinate system useful for mapping moderately sized regions of the Earth. It is locked to the Earth and is a set of slices of flat coordinates that cover the Earth. See the UTM specification [15].
- geospatial-wgs84: A 3D ellipsoidal coordinate system used for GNSS systems, meaning latitude, longitude, and altitude. It is fixed to the Earth, which means that it ignores, for example, continental drift, and covers the entire Earth.
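As a minimal sketch of how such coordinate systems might be declared in an annotation instance (the exact entry structure is defined by the schema in section Coordinate systems; the type values and the parent/children keys shown here are illustrative assumptions):
JSON example
{
    "openlabel": {
        "metadata": {
            "schema_version": "1.0.0"
        },
        "coordinate_systems": {
            "odom": {
                "type": "scene_cs",
                "parent": "",
                "children": ["vehicle-iso8855"]
            },
            "vehicle-iso8855": {
                "type": "local_cs",
                "parent": "odom",
                "children": []
            }
        }
    }
}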
For common use cases, there may be several sets of coordinate systems (blue boxes) and transforms between them that are commonly used, as the following diagrams show.
Figure 11 shows an example of a Robot Operating System (ROS) based system.
The sensors described in the example system in the introduction might have the following coordinate systems and transform tree.
Figure 12 shows how a set of data captured from a dash-cam, a single camera including a GPS, might look.
Figure 13 shows how a single camera with no other data, with the movement of the camera deduced by structure from motion, might look.
6.5. Semantic segmentation
Semantic image segmentation, also called pixel-level classification, is the task of clustering together those parts of an image that belong to the same object class. Technically, it means assigning to each pixel a value/code corresponding to a certain class of interest (object/entity category).
The semantic segmentation task treats objects as stuff, which is amorphous and uncountable. Multiple objects of the same class are treated as a single entity. Thus, no information exists about specific instances of a class. Cars are all assigned a color code, for example blue, and are treated as being part of the same amorphous "car stuff".
Semantic segmentation annotations follow the form of the objects and have no fixed shape. Manually, this is usually achieved by drawing refined polygons around the regions of interest, or by painting the region of interest through a paintbrush-like feature. The result is a precise mask that isolates only the object of interest and no surrounding pixels.
In the 2D annotation space, this method provides the highest accuracy in delineating objects. However, this comes at an increased cost in comparison with other annotation methods. Furthermore, segmentations take more time during the labeling process than other 2D annotation methods and thus have lower throughput.
| This section contains concepts that are relevant for multi-sensor data labeling use cases. |
6.5.1. Formal definition
Formally, semantic segmentation can be defined as follows:
Let \(P=\{p_{1}, p_{2}, \ldots, p_{n}\}\) be the set of all the pixels in a given frame, for example, an image. Then, the cardinality \(|P|\) is equal to the number of pixels in such a frame.
Let \(C=\{c_{1}, c_{2}, \ldots, c_{m}\}\) be the set of all the classes that are defined for a labeling task, for example, \(c_{1}=car, c_{2}=pedestrian\). Then, the cardinality \(|C|\) is equal to the number of classes that are defined for such a task.
Performing semantic segmentation labeling on an image means establishing a relation that is valid when a pixel \(p_{x}\) represents a portion of an object belonging to one of the defined classes \(c_{y}\).
\(R_{seg}\) can be defined as a relation between the sets \(P\) and \(C\). Formally, this means defining a subset of the cartesian product, \(R_{seg} \subset P \times C\), where \(P \times C = \{ (p_{1},c_{1}), (p_{1},c_{2}), \ldots, (p_{n},c_{m}) \}\).
Let \(D \subseteq P\) be the domain of the semantic segmentation relation \(R_{seg}\). The following taxonomy is produced:
Semantic segmentation taxonomy
- Partial scene segmentation when \(\exists p_{x} \in P, \forall c_{y} \in C: (p_{x}, c_{y}) \notin R_{seg}\). Some pixels have no class associated with them. In this case \(D \subset P\).
- Full scene segmentation when \(\forall p_{x} \in P, \exists c_{y} \in C : (p_{x},c_{y}) \in R_{seg}\). All pixels have a class associated. In this case \(D\) coincides with \(P\). Note that if a class such as unlabeled or other is used to mark all pixels outside the real classes of interest, a form of full scene segmentation is still performed.
- Single-class per pixel segmentation when \(\forall p_{x} \in D, \exists! c_{y} \in C: (p_{x},c_{y}) \in R_{seg}\). This is the case when each labeled pixel is associated with exactly one class.
- Multi-class per pixel segmentation when \(\exists p_{x} \in D, \exists c_{1}, c_{2}, \ldots, c_{k} \in C: (p_{x},c_{1}), (p_{x},c_{2}), \ldots, (p_{x},c_{k}) \in R_{seg}\). This is the case when at least one labeled pixel is associated with more than one class.
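As a tiny concrete example, let \(P=\{p_{1}, p_{2}, p_{3}\}\), \(C=\{c_{1}=car, c_{2}=road\}\), and \(R_{seg}=\{(p_{1},c_{1}), (p_{2},c_{2})\}\). Pixel \(p_{3}\) carries no class, so this is a partial scene segmentation with \(D=\{p_{1},p_{2}\} \subset P\), and since each labeled pixel carries exactly one class, it is also a single-class per pixel segmentation.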
6.5.2. Instance segmentation
Instance segmentation enriches the semantic segmentation information by adding a separation between the different instances of objects belonging to a class. This method is used to separate stuff into individual, countable things. Semantic classes can be either things (objects with a well-defined shape, for example a car or a person) or stuff (amorphous background regions, for example grass or sky). In contrast to the semantic segmentation task, where each pixel belongs to a set of predefined classes, in instance segmentation the number of instances is not known beforehand.
Formal definition
Formally, instance segmentation can be defined as an extension of semantic segmentation as follows:
- Let \(I=\{i_{1}, i_{2}, \ldots, i_{l}\}\) be the set of all the instances of countable objects in the scene (image).
- Then the cardinality of the set \(|I|\) is equal to the total number of object instances that populate the scene.
- Performing instance segmentation labeling on an image means establishing a ternary relation \(I_{seg} \subset P \times C \times I\) that is valid when a pixel \(p_{x}\) represents a portion of an object belonging to one of the defined classes \(c_{y}\) and to a specific object instance \(i_{z}\), where \(P \times C \times I = \{ (p_{1},c_{1},i_{1}), (p_{1},c_{1},i_{2}), \ldots, (p_{n},c_{m},i_{l}) \}\).
| Instance awareness may be added to any kind of semantic segmentation described before by extending the relation with an additional instance set. |
Let \(D_{in} \subseteq P\) be the domain of the instance segmentation relation \(I_{seg}\).
- Instance unique segmentation when \(\forall p_{x} \in D_{in}, \exists! c_{y} \in C, \exists! i_{z} \in I: (p_{x},c_{y},i_{z}) \in I_{seg}\). This is the case when each labeled pixel is associated with exactly one class and exactly one instance of that class.
- Multi-class multi-instance segmentation when \(\exists p_{x} \in D_{in}, \exists c_{1}, c_{2}, \ldots, c_{k} \in C, \exists i_{1}, i_{2}, \ldots, i_{j} \in I : (p_{x},c_{1},i_{1}), (p_{x},c_{1},i_{2}), \ldots, (p_{x},c_{k},i_{j}) \in I_{seg}\). This is the case when at least one labeled pixel is associated with more than one class and with more than one instance of those classes.
| Starting from this general definition, all possible particular cases, permutations, or ways to construct semantic and instance segmentation labeling can be covered. |
7. Multi-sensor data labeling
7.1. Introduction
Multi-sensor data labeling is the process of enriching data streams with information on the location and the characteristics of labeled objects or the entire scenario at a given point in time.
Labels summarize relevant semantic entities and show their spatiotemporal location within the data through spatiotemporal constructs, such as labeling geometries. There are different types of labeling geometries. Each type provides a suitable input representation for specific computer vision and machine-learning tasks.
This chapter covers multi-sensor data labeling in detail, including the following topics:
- List the raw data considered relevant for the multi-sensor data labeling use case.
- Introduce and describe in detail the annotation schema, its structure, elements, and the different ways of expressing labeling geometries, coordinate systems, transforms, and other information relevant for multi-sensor data labeling.
- Describe the mechanisms that govern the reference to external knowledge repositories, such as ontologies, that organize and define the semantics of the labels.
- Describe the supported data types and their representation.
- Provide examples that show how to utilize the schema to produce valid annotation instances in relevant specific cases.
7.1.1. Raw data sources for multi-sensor data labeling
Examples for raw data sources:
- Images
- Videos
- Point clouds
7.2. Annotation schema
The annotation schema defines the structure of annotations, data types, and conventions needed to unambiguously interpret the annotations. The annotation data format specifies how the annotation data is encoded for storage in computer files.
The annotation schema is described and formatted as a JSON schema. It defines the shape to which valid JSON annotation instances shall conform. The structure of the ASAM OpenLABEL annotation schema is serialized in the ASAM OpenLABEL JSON schema file. The annotation schema itself conforms to the JSON schema Draft 7 specification [13].
The annotation schema of ASAM OpenLABEL addresses the following general features related to multi-sensor data labeling:
- Labeling different spatiotemporal objects.
- Static and dynamic (time) properties of objects.
- Geometric and non-geometric attributes for objects.
- Nested attributes.
- Management of coordinate systems, odometry, and sensor configuration.
- Multi-source (sensor) annotations for objects.
- Persistent identities of objects through time.
- Linkage to ontologies and external resources.
- Relations between elements, for example, object performs action.
- Different types of elements: objects, actions, events, and contexts.
- Customizable and optional fields.
The annotation schema defines three main characteristic aspects of annotation data:
- Structure: How data is organized, using hierarchies and key-value dictionaries.
- Types: Primitive data types for key-value items.
- Conventions: Documented interpretation of data values.
The annotation schema for multi-sensor data labeling follows the same principles of the annotation schema for scenario tagging, meaning JSON and JSON schema, as described in chapter Scenario tagging.
7.3. Structure
The ASAM OpenLABEL annotation schema for multi-sensor data labeling is structured as a dictionary and can be described from top to bottom. This section contains diagrams intended to visualize the structure. The details of the structure can all be consulted at the ASAM OpenLABEL JSON schema file.
Any ASAM OpenLABEL JSON data shall have a root key named openlabel.
Its value is a dictionary containing the rest of the structure as described in the next sections.
The version of the schema shall be defined inside the metadata structure, using the key schema_version.
All other entries are optional.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
}
}
}
The following example shows a JSON payload corresponding to the first level items inside the root openlabel value, which are related to multi-sensor data labeling.
JSON example
{
"openlabel": {
"objects": { ... },
"actions": { ... },
"events": { ... },
"contexts": { ... },
"relations": { ... },
"frames": { ... },
"frame_intervals": { ... },
"metadata": { ... },
"ontologies": { ... },
"resources": { ... },
"coordinate_systems": { ... },
"streams": { ... }
}
}
For multi-sensor data labeling, the ASAM OpenLABEL structure defines dictionaries for the elements, meaning objects, actions, events, contexts, and relations. Each entry of the dictionary is a key-value pair where the key is a unique identifier of the element, for example, an object. The value is the container of static information.
Supporting structures define the following:
- ontologies that are used.
- External resources to enable linked data.
- coordinate_systems to explicitly specify how to transform data.
- streams, which contain information on the data being labeled, for example, sensor information, such as intrinsic calibration parameters of cameras.
If time information is needed, for example, for labeling video sequences, the frames item contains a dictionary of containers at frame level. frame_intervals summarizes the frame intervals that contain information for this ASAM OpenLABEL annotation file.
Figure 14 shows the ASAM OpenLABEL data structure for multi-sensor data labeling.
Figure 15 shows the structure of the frame value. Its structure is similar to the openlabel value, as it contains dictionaries for the elements, meaning objects, actions, events, contexts, and relations. Only the dynamic information inside them is detailed. In addition, frame_properties may contain information about timestamping details, transforms of specific coordinate systems, and other stream properties.
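For instance, a frame carrying such frame-level information might look as follows (a minimal sketch; the exact frame_properties content is defined by the schema, and the timestamp value here is illustrative):
JSON example
{
    "openlabel": {
        "metadata": {
            "schema_version": "1.0.0"
        },
        "frames": {
            "0": {
                "frame_properties": {
                    "timestamp": "2021-09-03T11:23:56.000+0100"
                }
            }
        }
    }
}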
Annotation data is stored as element data, for example, object_data. Each element may contain element data in the form of arrays of structures, organized per data type.
Figure 16 shows the structure of generic attributes (see Data types (generic)).
Figure 17 shows the structure of the geometric attributes (see Data types (geometric)).
7.4. Elements
objects, actions, events, contexts, and relations are elements. These structures share similar properties in terms of attributes, types, and hierarchies.
- objects: A structure to represent information about physical entities in scenes. Examples of objects are pedestrians, cars, the ego-vehicle, traffic signs, lane markings, buildings, and trees.
- actions: A description of semantically meaningful acts being done. They may be defined for several frame intervals, similar to objects, for example, isWalking.
- events: Instants in time which have semantic load. events may trigger other events or actions, for example, startsWalking.
- contexts: Other descriptive information about the scene that contains no spatial or temporal information and therefore is not targeted by actions or events, for example:
  - properties of the scene, such as Urban or Highway.
  - weather conditions, such as Sunny or Cloudy.
  - general information about the location, such as Germany or Spain.
Attributes
- uid: A unique identifier that determines the identity of the element. It can be a simple unsigned integer (from 0 upwards, for example, 0) or a Universal Unique Identifier (UUID) of 32 hexadecimal characters, for example, 123e4567-e89b-12d3-a456-426614174000. uid values do not need to be sequential nor start at 0, which is useful for preserving identifiers from other label files.
- name: A friendly identifier of the element. It is not unique, but is employed by human users to rapidly identify elements in the scene, for example, Peter.
- type: The semantic type of the element. It determines which class the element belongs to, for example, Car or Running, see Ontologies.
Optionally, elements may also have the following items (a second example below shows them in use):
- ontology_uid: A string identifier of the ontology which contains the definition of the type of the element (see Ontologies).
- Element data, for example, object_data: Container of static information about the object (see Data types (geometric)).
- Element data pointers, for example, object_data_pointers: Pointers to element data at frames (see Frames).
- frame_intervals: An array of frame intervals where the element exists.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"objects": {
"0": {
"name": "car1",
"type": "Car"
}
}
}
}
The example shows a sample object with the mandatory items name and type.
| JSON only permits keys to be strings. Therefore, the integer unique identifiers shall be stringified, for example, "0". However, carefully written APIs can parse such JSON strings into integers for better access efficiency and sorting capabilities. |
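A sample object carrying the optional items listed above might look as follows (the ontologies entry is sketched as a simple identifier-to-URI mapping, and all values are illustrative):
JSON example
{
    "openlabel": {
        "metadata": {
            "schema_version": "1.0.0"
        },
        "ontologies": {
            "0": "http://www.example.com/ontologies/traffic"
        },
        "objects": {
            "0": {
                "name": "car1",
                "type": "Car",
                "ontology_uid": "0",
                "frame_intervals": [{"frame_start": 0, "frame_end": 10}]
            }
        }
    }
}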
Rules
- All elements shall have a uid as key.
- The uid shall be unique for each element type.
- Each element type (action, object, event, context, and relation) may have its own list of unique identifiers.
- All elements shall have a type.
- All elements shall have a name. The entry can be left empty, as names are not used to index the elements.
7.4.1. Element data
The main mechanism to add information about an element is to define element data, using the data types defined in Data types (geometric). Element data can be added statically or dynamically.
Rules
- Static element data shall be added at the element value, under the corresponding key, for example, object_data. Static element data specifies the type of data used, for example, bbox or vec, which becomes the key for an array of such data items, so that one or more items of that type can be stored.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"objects": {
"0": {
"name": "pedestrian1",
"type": "Pedestrian",
"object_data": {
"bbox" : [{
"name" : "body",
"val" : [303.73, 935.58, 135.62, 330.88]
}, {
"name" : "head",
"val" : [289.93, 814.08, 38.20, 39.96]
}
]
}
}
}
}
}
The example shows a single object of type Pedestrian with two bbox items, one to describe the body and the other for the head.
Rules
- Dynamic element data shall be added similarly, but inside the corresponding frame (see Frames).
- Element data may be nested inside other element data as attributes.
| Only non-geometrical element data types can be nested (see Data types (geometric)). |
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"objects": {
"0": {
"name": "car1",
"type": "Car",
"object_data": {
"bbox" : [{
"name" : "shape",
"val" : [100, 100, 500, 300],
"attributes": {
"boolean": [{
"name": "visible",
"val": true
},
{
"name": "interpolated",
"val": false
}
]
}
}
]
}
}
}
}
}
The example shows boolean attributes added to a bbox.
| Attributes are nested just like any other element data and therefore can contain arrays of element data, indexed by type. |
7.4.2. Universal Unique Identifiers (UUID)
UUIDs in this specification are derived using RFC 4122 [16]. When using UUIDs, the keys are substituted by strings of 32 hexadecimal characters.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"objects": {
"c44c1fc2-ee48-4b17-a20e-829de9be1141": {
"name": "van1",
"type": "Van"
}
}
}
}
The example shows that the key identifier of an object is a string containing 32 hexadecimal characters following the UUID convention.
7.5. Frames
All dynamic (temporal) information of the annotations shall be specified at frame level, inside frames. Each frame is indexed within the ASAM OpenLABEL JSON data with an integer number.
The frame number is an ASAM OpenLABEL identifier of a certain instant in time. Properties of the frame can be specified to match specific timestamps or frame indexes in video sequences (see Frame properties).
| In multi-stream annotation data, a frame may represent several time instants as sensor data might not be perfectly aligned (see Synchronization). |
Class
frame
A frame is a container of dynamic, timewise, information.
Type: object
Additional properties: false
| Name | Type | Additional properties | Reference | Description |
|---|---|---|---|---|
| actions | object | false | #/definitions/action_data | This is a JSON object that contains dynamic information on OpenLABEL actions. Action keys are strings containing numerical UIDs or 32 bytes UUIDs. Action values may contain an "action_data" JSON object. |
| contexts | object | false | #/definitions/context_data | This is a JSON object that contains dynamic information on OpenLABEL contexts. Context keys are strings containing numerical UIDs or 32 bytes UUIDs. Context values may contain a "context_data" JSON object. |
| events | object | false | #/definitions/event_data | This is a JSON object that contains dynamic information on OpenLABEL events. Event keys are strings containing numerical UIDs or 32 bytes UUIDs. Event values may contain an "event_data" JSON object. |
| frame_properties | object | true | #/definitions/stream | This is a JSON object which contains information about this frame. |
| objects | object | false | #/definitions/object_data | This is a JSON object that contains dynamic information on OpenLABEL objects. Object keys are strings containing numerical UIDs or 32 bytes UUIDs. Object values may contain an "object_data" JSON object. |
| relations | object | false | | This is a JSON object that contains dynamic information of OpenLABEL relations. Relation keys are strings containing numerical UIDs or 32 bytes UUIDs. Relation values are empty. The presence of a key-value relation pair indicates the specified relation exists in this frame. |
7.5.1. Frame intervals
The frame_intervals key defines the array of frame intervals for which the ASAM OpenLABEL JSON data contains information.
Class
frame_interval
A frame interval defines a starting and ending frame number as a closed interval. That means the interval includes the limit frame numbers.
Additional properties: |
false |
Type: |
object |
| Name | Type | Description |
|---|---|---|
| frame_end | integer | Ending frame number of the interval. |
| frame_start | integer | Initial frame number of the interval. |
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"frame_intervals": [{
"frame_start": 0, "frame_end": 1
}, {
"frame_start": 5, "frame_end": 7
}
],
"frames": {
"0": { ... },
"1": { ... },
"5": { ... },
"6": { ... },
"7": { ... }
}
}
}
The example shows frames indexed as 0, 1, 5, 6, and 7.
The frame_intervals show the corresponding two intervals.
| Frame intervals are also properties of elements, specifying the periods of time where the element exists or has data. Using several frame intervals makes it possible to explicitly declare time gaps where the element disappears or does not exist, while maintaining the same uid. |
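A reading application can exploit these intervals, for example, to test whether an element exists at a given frame without scanning the whole frames dictionary. A minimal sketch (the function name is illustrative):
Python example (informative)
def exists_at(frame_intervals, frame_num):
    # Intervals are closed: [frame_start, frame_end] includes both limits.
    return any(fi["frame_start"] <= frame_num <= fi["frame_end"]
               for fi in frame_intervals)

intervals = [{"frame_start": 0, "frame_end": 1}, {"frame_start": 5, "frame_end": 7}]
print(exists_at(intervals, 6))  # True
print(exists_at(intervals, 3))  # False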
Inside each frame, dynamic information about elements may be included, using the same structure defined for elements.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"frames": {
"0": {
"objects": {
"1": {}
}
},
"1": {
"objects": {
"1": {}
}
}
},
"objects": {
"1": {
"name": "van1",
"type": "Van",
"frame_intervals": [{"frame_start": 0, "frame_end": 1}]
}
}
}
}
The example shows an object which exists in frames 0 and 1 but has no specific information at those frames.
If the specific information of the object for a given frame is nothing but its existence, then the object information at such frame is just a pointer to its unique identifier, as shown in the example above.
When frame-specific information is added, it is enclosed as object_data inside the corresponding frame and object (see Element data).
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"frames": {
"0": {
"objects": {
"1": {
"object_data": {
"bbox": [{
"name": "shape",
"val": [12, 867, 600, 460]
}
]
}
}
}
},
"1": { ... }
},
"objects": {
"1": {
"name": "van1",
"type": "Van",
"frame_intervals": [{"frame_start": 0, "frame_end": 1}]
}
}
}
}
The example shows an object which exists in frames 0 and 1. The object has specific geometric information, for example, a bbox named shape at frame 0.
7.5.2. Element data pointers
Since element data is not indexed by integer unique identifiers, as elements are, the structure defines a mechanism to index element data by adding element data pointers.
For example, object_data_pointers within an object contain key-value pairs to identify which object_data names are used and which are their associated frame_intervals.
Class
element_data_pointers
This is a JSON object which contains OpenLABEL element data pointers. Element data pointer keys shall be the "name" of the element data this pointer points to.
Additional properties: |
false |
Type: |
object |
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"objects": {
"0": {
"name": "car0",
"type": "car",
"frame_intervals": [{"frame_start": 0, "frame_end": 10}],
"object_data": {
"text": [{
"name": "color",
"val": "blue"
}
]
},
"object_data_pointers": {
"color": {
"type": "text"
},
"shape": {
"type": "bbox",
"frame_intervals": [{"frame_start": 0, "frame_end": 10}],
"attributes": {
"visible": "boolean"
}
}
}
}
},
"frames": {
"0": { ... },
...
"10": { ... }
}
...
}
}
The example shows that the pointers may refer to static (frame-less, the color attribute) and dynamic (frame-specific, the shape attribute) object_data, and also contain information about nested attributes (the visible attribute of shape).
This feature is useful for rapidly retrieving element data information from the ASAM OpenLABEL JSON data, without the need to explore the entire set of frames.
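For example, a query such as "in which frames does the bbox named shape of object 0 have data?" can be answered from the pointers alone. A minimal sketch, assuming the JSON above has been parsed into a Python dictionary named data:
Python example (informative)
obj = data["openlabel"]["objects"]["0"]
pointer = obj["object_data_pointers"]["shape"]

# The pointer reveals the data type and the frame intervals with data,
# without touching the "frames" dictionary at all.
print(pointer["type"])             # "bbox"
print(pointer["frame_intervals"])  # [{"frame_start": 0, "frame_end": 10}]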
7.5.3. Frame properties
Frame properties may include three types of details about the frame:
-
timestamp: A relative or absolute time reference that specifies the time instant this frame corresponds to.
-
streams (see Streams): Sensors may have dynamic properties for a certain specific instant, such as intrinsic calibration data or sync details (see Synchronization).
-
transforms: Coordinate systems may have changed their relative position with respect to parent coordinate systems for specific frames (see Coordinate Systems and Transforms).
JSON example
{
"openlabel": {
"frames": {
"0": {
"frame_properties": {
"timestamp": "2020-04-11 12:00:01",
"streams": {
"Camera1": {
"stream_properties": {
"intrinsics_pinhole": {
"camera_matrix_3x4": [ 1000.0, 0.0, 500.0, 0.0,
0.0, 1000.0, 500.0, 0.0,
0.0, 0.0, 0.0, 1.0],
"distortion_coeffs_1xN": [],
"height_px": 480,
"width_px": 640
},
"sync": {
"frame_stream": 1,
"timestamp": "2020-04-11 12:00:02"
}
}
}
}
}
}
}
}
}
The example shows frame_properties of frame 0, containing information about a timestamp and some properties specific for frame 0 corresponding to stream Camera1.
The sync field within stream_properties defines the frame number of the stream that corresponds to this frame, along with timestamping information, if needed.
This feature is useful for annotating multiple cameras which might not be perfectly aligned.
In such cases, frame 0 of the ASAM OpenLABEL JSON data corresponds to frame 0 of the first stream to occur.
In this way, frame_stream shall identify which frame of this stream corresponds to the frame in which it is enclosed.
|
7.5.4. Synchronization
This section provides detail on the synchronization of multiple streams and their timing information.
Labels can be produced to be related to specific streams, for example, cameras and LiDAR. When multiple streams of this type are present and labels need to be produced for several of them, for example, bounding boxes for images of the camera and cuboids for the point clouds of the LiDAR, a synchronization and matching strategy is needed.
The synchronization of the data streams, for example, images and point clouds, corresponds to the data source set-up and not to the annotation stage.
That means that the data container may contain precise HW timestamps for images and point clouds.
In addition, the correspondence between frame indexes of multiple cameras, for example, frame 45 of camera 1 corresponding by proximity in time to frame 23 of camera 2, may be due to the different frequencies they use or because they started with some delay.
Therefore, when producing labels for such different frames, the annotation format needs to allocate space and structure for such timing information. This shall be done in a way that all labels are easily associated with their corresponding data and time.
The JSON schema defines the frame data containers, which correspond to master frame indexes.
One stream
In many cases, there is a single stream of data that needs to be labeled, for example, an image sequence.
Simple case
The simplest use-case for a stream:
-
Nothing needs to be specified, for example, sensor names or timestamps.
-
Frame indexes are integers, starting from 0.
-
The master frame index coincides with the stream-specific frame index. This means the stream-specific frame index does not need to be labeled.
Figure 22 shows a simple timeline where frames represent discrete samples of time and are indexed using a master frame index.
JSON example
{
"openlabel": {
"frames": {
"0": { ... },
"1": { ... }
}
}
}
The example shows the indexing approach in ASAM OpenLABEL where frames are indexed using an ordered numeric string, for example, 0 and 1.
Stream frame index not coincident with master frame index
It is possible to define a specific frame numbering for stream-specific frames inside the master frame index, which itself always starts from 0. These counts are then non-coincident, reflecting the fact that the stream indexes are discontinuous or start at a certain value.
Figure 23 shows a simple timeline where the master frame index starts at 0 and corresponds to a specific frame index of a stream, starting at 45.
JSON example
{
"openlabel": {
"frames": {
"5": {
"frame_properties": {
"timestamp": "2020-04-11ย 12:00:01",
"streams": {
"Camera1": {
"stream_properties": {
"sync": { "frame_stream": 91}
}
}
}
}
}
}
}
}
The example shows how the master frame index, for example, 5, can be linked to a stream-specific frame index, for example, 91, using stream_properties inside frame_properties.
Other properties, such as timestamps, may be added for detailed timing information of each stream frame.
Figure 24 shows a simple timeline with defined frames which span over a certain period of time, for example, corresponding to the exposure time of a camera.
JSON example
{
"openlabel": {
"frames": {
"0": {
"frame_properties": {
"timestamp": "2020-04-11ย 12:00:01",
"aperture_time_us": "56"
}
}
}
}
}
The example shows how a certain frame may have customized frame_properties, such as aperture_time_us, to define the exposure time in microseconds.
Multiple streams
Complex labeling examples may include multiple streams, for example, labels that need to be defined for different sensors.
Same frequency and same start and indexes
The master frame index coincides with each of the stream indexes. It is fully synchronized.
Figure 25 shows two timelines corresponding to two streams, Camera1 and Camera2, with stream-specific frame indexes coinciding with the master frame index.
Same frequency and different start and indexes
It is possible to define stream indexes independently to reflect, for example, that one stream is delayed by one frame but still synchronized.
Figure 26 shows how two different timelines corresponding to two different streams can be shifted so the stream-specific frame indexes do not match with the master frame index.
In the example, the master frame index = 1 corresponds to Camera1 in frame 1 and Camera2 in frame 80.
Note that in this example, for master frame = 0, there is no information about Camera2, representing that this stream started producing information after the stream of Camera1.
JSON example
{
"openlabel": {
"frames": {
"1": {
"frame_properties": {
"timestamp": "2020-04-11ย 12:00:01",
"streams": {
"Camera1": {
"stream_properties": {
"sync": { "frame_stream": 1}
}
},
"Camera2": {
"stream_properties": {
"sync": { "frame_stream": 0}
}
}
}
}
}
}
}
}
The example shows how different stream-specific frame indexes can be defined for a certain master frame index as frame_properties.
Other possible differences in synchronization, for example jitter, may be labeled by embedding timestamping information for each stream frame.
Figure 27 shows another use-case where frames do not follow a perfectly periodic sampling rate.
This feature can be labeled by adding a jitter variable to frame_properties.
Same frequency and constant shift
If the frame shift is constant, a more compact representation is possible by specifying the shift at root stream_properties rather than on each frame, as was shown in the previous examples:
Figure 28 shows a specific case where the time shift between two streams (Camera1 and Camera2) is constant and kept fixed for the entire scene.
JSON example
{
"openlabel": {
"streams": {
"Camera1": {
"stream_properties": {
"sync": { "frame_stream": 0}
}
},
"Camera2": {
"stream_properties": {
"sync": { "frame_stream": 1}
}
}
}
}
}
The example shows how to represent a fixed time shift between a certain stream and the master frame index as stream_properties instead of as frame_properties.
In the example, Camera2 is shifted one frame ahead of the master frame index, while Camera1 has shift 0.
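Under this convention, a reader can recover the stream-specific frame index for any master frame by adding the constant shift declared at the root. A minimal sketch of that arithmetic (the function name is illustrative):
Python example (informative)
def stream_frame(master_frame, shift):
    # Map a master frame index to a stream-specific frame index,
    # given a constant shift declared in root-level stream_properties.
    return master_frame + shift

# Camera1 has shift 0; Camera2 is shifted one frame ahead.
shifts = {"Camera1": 0, "Camera2": 1}
print(stream_frame(4, shifts["Camera2"]))  # 5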
Different frequency
Streams might represent data coming from sensors with different capturing frequency, for example, a camera at 30 Hz and a LiDAR at 10 Hz. Following the previous examples, it is possible to embed stream frames inside master frames so the frequency information is also included.
Figure 29 shows a typical configuration where the master frame index follows the fastest stream, in this case the Camera1 stream.
Figure 30 shows a typical configuration where the master frame index follows the slowest stream, in this case the Lidar1 stream.
Specifying coordinate system for each label
After defining the coordinate systems (see Coordinate Systems and Transforms) and the timing information, as shown in the examples above, labels for elements and element data may be declared for specific coordinate systems.
Coordinate systems of specific streams can be defined as well. In this way, for each image, the information about labels, timings, and coordinate systems is given together.
JSON example
{
"openlabel": {
"frames": {
"0": {
"objects": {
"0": {
"object_data": {
"bbox": [
{
"name": "shape2D",
"val": [600, 500, 100, 200],
"coordinate_system": "Camera1"
}
],
"cuboid": [
{
"name": "shape3D",
"val": [ ... ],
"coordinate_system": "Lidar1"
}
]
}
}
},
"frame_properties": {
"streams": {
"Camera1": {
"stream_properties": {
"sync": { "frame_stream": 1, "timestamp": "2020-04-11 12:00:07"},
}
},
"Lidar1": {
"stream_properties": {
"sync": { "frame_stream": 0, "timestamp": "2020-04-11 12:00:10"}
}
}
}
}
}
},
"objects": {
"0": {
"name": "car1",
"type": "car",
"coordinate_system": "Camera1",
...
}
}
}
}
The example shows that objects may be expressed with respect to a specific coordinate_system.
For example, the bounding box named shape2D of object 0 is expressed with respect to the Camera1 coordinate system.
The cuboid with name shape3D is expressed with respect to the Lidar1 coordinate system.
7.6. Streams
Complex scenes may be observed by several sensing devices, which produce multiple streams of data. Each of these streams might have different properties, for example, intrinsic calibration parameters and frequency. The ASAM OpenLABEL JSON schema defines the option to specify such information for a multi-sensor, and thus multi-stream, set-up by allocating space for such stream-specific descriptions. In addition, it offers the ability to specify, for each labeled element, which stream it corresponds to.
Class
streams
This is a JSON object which contains OpenLABEL streams. Stream keys can be any string, for example, a friendly stream name.
Additional properties: |
false |
Type: |
object |
stream
A stream describes the source of a data sequence, usually a sensor.
Additional properties: |
false |
Type: |
object |
| Name | Type | Reference | Description |
|---|---|---|---|
| description | string | | Description of the stream. |
| stream_properties | object | #/definitions/stream_properties | Additional properties of the stream. |
| type | string | | A string encoding the type of the stream. |
| uri | string | | A string encoding the URI, for example, a URL, or file name, for example, a video file name, the stream corresponds to. |
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"streams": {
"Camera1": {
"type": "camera",
"uri": "./some_path/some_video.mp4",
"description": "Frontal camera",
"stream_properties": {
"intrinsics_pinhole": {
"camera_matrix_3x4": [ 1000.0, 0.0, 500.0, 0.0,
0.0, 1000.0, 500.0, 0.0,
0.0, 0.0, 0.0, 1.0],
"distortion_coeffs_1xN": [],
"height_px": 480,
"width_px": 640
}
}
}
}
}
}
The example shows the item streams, which contains information about the streams that contain the data to be labeled.
In the example, a stream with name Camera1 is defined to be of type camera and to have some stream_properties, such as intrinsic calibration parameters.
7.7. Coordinate systems
A coordinate system is a numerical system to specify the coordinates of points and other geometric elements in a given space.
ASAM OpenLABEL defines mechanisms to represent labels which are often related to numerical properties of objects, such as position, size, or other physical magnitudes. Different coordinate systems may exist in arbitrary scenes that contain objects. Therefore, labels that represent numerical magnitudes of the objects need to be specified with respect to specific coordinate systems.
ASAM OpenLABEL has been devised to consider scenes as Euclidean spaces and right-handed Cartesian coordinate systems, where coordinates specify the distance from the origin along the specified axis. 2D and 3D coordinate systems are considered.
Points and other geometries expressed with respect to a particular coordinate system can be expressed with respect to another coordinate system using transformations between the coordinate systems.
Labels may be defined as relative to specific coordinate systems. This is particularly necessary for geometric labels, such as polygons, cuboids, or bounding boxes, which define magnitudes under a certain coordinate system. For example, a 2D line may be defined within the coordinate system of an image frame, and a 3D cuboid inside a 3D Cartesian coordinate system.
Coordinate systems shall be declared with a friendly name, used as an index, and in the form of parent-child links to establish their hierarchy:
-
type: The type of coordinate system is defined so reading applications have a simplified view of the hierarchy:
-
scene_cs: corresponds to static coordinate systems.
-
local_cs: the coordinate system of a rigid body, such as a vehicle, which carries the sensors with it.
-
sensor_cs: a coordinate system attached to a sensor.
-
custom_cs: any other coordinate system defined by the user.
-
type does not restrict the definition of complex coordinate system hierarchies.
It is only intended to give a hint for parsing applications.
|
-
parent: Each coordinate system can declare its parent coordinate system in the hierarchy.
-
pose_wrt_parent: A default or static pose of this coordinate system with respect to the declared parent. It may be defined in several ways:
-
4x4 homogeneous matrix
-
quaternion and translation
-
Euler angles and translation
-
| If not defined, the coordinate system is assumed to be exactly the same as its parent coordinate system. |
-
children: The list of children for this coordinate system.
In addition, as multiple coordinate systems may be defined, it is necessary to define mechanisms to declare how to convert values of magnitudes from one coordinate system to another. Therefore, transforms between two coordinate systems are also defined.
Class
coordinate_systems
This is a JSON object which contains OpenLABEL coordinate systems. Coordinate system keys can be any string, for example, a friendly coordinate system name.
Additional properties: |
false |
Type: |
object |
coordinate_system
A coordinate system is a 3D reference frame. Spatial information on objects and their properties can be defined with respect to coordinate systems.
Additional properties: |
true |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| children | array | | | List of children of this coordinate system. |
| parent | string | true | | This is the string UID of the parent coordinate system this coordinate system is referring to. |
| pose_wrt_parent | object | | #/definitions/transform_data | JSON object containing the transform data. |
| type | string | true | | This is a string that describes the type of the coordinate system, for example, "local" or "geo". |
7.8. Transforms
A transform is a mathematical expression which determines how a coordinate system relates to another. In ASAM OpenLABEL, transforms are composed of a rotation and a translation component in 3D Euclidean space. Transforms are understood as passive and are thus equivalent to poses between coordinate systems. Different alternatives are supported, as illustrated by the sketch after the following list:
-
Quaternion and translation vector
-
4x4 Homogeneous matrix
-
Vector of Euler angles with sequence code, and translation vector
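Since the same pose may be serialized in any of these forms, reading applications typically normalize them to one internal representation. The following NumPy sketch builds a 4x4 homogeneous matrix from ZYX Euler angles and a translation; the ordering of the three Euler values as (yaw, pitch, roll) is an assumption consistent with the CAM_1/CAM_2 example further below, not normative pseudocode:
Python example (informative)
import numpy as np

def pose_to_matrix(euler_zyx, translation):
    # Assumed value order: (yaw, pitch, roll) in radians, applied in ZYX order.
    yaw, pitch, roll = euler_zyx
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    m = np.eye(4)
    m[:3, :3] = rz @ ry @ rx  # R = Rz * Ry * Rx
    m[:3, 3] = translation
    return m

# Reproduces the CAM_2 pose of the example below (rotation of 10 deg around y).
print(pose_to_matrix([0.0, 0.17453292519943295, 0.0], [2.3, 0.0, 1.3]))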
Class
transform
This is a JSON object with information about this transform.
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| dst | string | true | | The string UID, that is, the name, of the destination coordinate system for geometric data converted with this transform. |
| src | string | true | | The string UID, that is, the name, of the source coordinate system of geometrical data this transform converts. |
| transform_src_to_dst | object | true | #/definitions/transform_data | JSON object containing the transform data. |
transform_data
JSON object containing the transform data.
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"coordinate_systems": {
"odom": {
"type": "scene_cs",
"parent": "",
"children": [
"vehicle-iso8855"
]
},
"vehicle-iso8855": {
"type": "local_cs",
"parent": "odom",
"children": [
"CAM_1",
"CAM_2"
]
},
"CAM_1" : {
"type" : "sensor_cs",
"parent" : "base",
"children" : [],
"pose_wrt_parent" : {
"matrix4x4" : [0.984807753012208, 0.0, 0.17364817766693033, 2.3, 0.0, 1.0, 0.0, 0.0, -0.17364817766693033, 0.0, 0.984807753012208, 1.3, 0.0, 0.0, 0.0, 1.0]
}
},
"CAM_2" : {
"type" : "sensor_cs",
"parent" : "base",
"children" : [],
"pose_wrt_parent" : {
"euler_angles" : [0.0, 0.17453292519943295, 0.0],
"translation" : [2.3, 0.0, 1.3],
"sequence" : "ZYX"
}
}
},
...
}
}
The example shows the coordinate_systems item having several coordinate systems defined, including coordinate systems specific for the cameras (CAM_1 and CAM_2) and other coordinate systems for the local and scene-level frameworks.
The transforms between coordinate systems may also be defined for each frame, overriding the default static pose defined above.
Transforms are defined with a friendly name used as index and the following properties:
-
src: The name of the source coordinate system. This shall be the name of a valid (declared) coordinate system.
-
dst: The destination coordinate system. This shall be the name of a valid (declared) coordinate system.
-
transform_src_to_dst: This is the transform expressed in algebraic form, for example, as a 4x4 matrix enclosing a 3D rotation and a 3D translation between the coordinate systems.
JSON example
{
"openlabel" : {
"metadata" : {
"schema_version" : "1.0.0"
},
"coordinate_systems" : {
"base" : {
"type" : "local_cs",
"parent" : "",
"children" : []
},
"world" : {
"type" : "scene_cs",
"parent" : "",
"children" : []
}
},
"frames" : {
"10" : {
"frame_properties" : {
"transforms" : {
"base_to_world" : {
"src" : "base",
"dst" : "world",
"transform_src_to_dst" : {
"matrix4x4" : [1.0, 0.0, 0.0, 0.1, 0.0, 1.0, 0.0, 0.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
}
}
}
}
},
"11" : {
"frame_properties" : {
"transforms" : {
"base_to_world" : {
"src" : "base",
"dst" : "world",
"transform_src_to_dst" : {
"euler_angles" : [0.0, 0.0, 0.0],
"translation" : [1.0, 1.0, 0.0],
"sequence" : "ZYX"
},
"custom_property1" : 0.9,
"custom_property2" : "Some tag"
}
}
}
}
}
}
}
The example shows that the relationship between coordinate systems can be defined with transforms which can be defined for specific frames inside frame_properties.
In the example, the transform between base and world coordinate systems is defined for frames 10 and 11.
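Once the transform for a frame is available, converting geometric data between the two coordinate systems is a single matrix multiplication in homogeneous coordinates. A minimal NumPy sketch, assuming the matrix4x4 values are stored row by row as in the examples above:
Python example (informative)
import numpy as np

# transform_src_to_dst of frame 10 from the example (row-major 4x4).
m = np.array([1.0, 0.0, 0.0, 0.1,
              0.0, 1.0, 0.0, 0.1,
              0.0, 0.0, 1.0, 0.0,
              0.0, 0.0, 0.0, 1.0]).reshape(4, 4)

point_base = np.array([5.0, 2.0, 0.0, 1.0])  # homogeneous point in "base"
point_world = m @ point_base                 # same point expressed in "world"
print(point_world[:3])                       # [5.1 2.1 0. ]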
In general, coordinate systems associated with sensors may have the same name as the corresponding streams.
For instance, Camera1 can be the name of a coordinate system and also the name of a stream.
In this way, a sensor, such as a camera or a LiDAR, has its internal data, for example intrinsics, defined at streams.
Its external set-up with respect to other sensors is defined at coordinate_systems, or as transforms at frame level.
|
With this structure, it is possible to describe particular and typical transformation cases, such as odometry poses of a vehicle with respect to a certain scene coordinate system:
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"frames": {
"0": {
"frame_properties": {
"transforms": {
"odom_to_vehicle-iso8855": {
"src": "odom",
"dst": "vehicle-iso8855",
"transform_src_to_dst": {
"matrix4x4": [1.0, 3.7088687554289227e-17, ...]
}
},
"raw_gps_data": [49.011212804408,8.4228850417969, ...],
"status": "interpolated"
}
}
}
}
...
}
}
The example shows a typical use case where the transforms encode the odometry, that is, the accumulated relative pose between a fixed coordinate system (in the example odom) and a moving coordinate system.
In the example, vehicle-iso8855 represents the usual coordinate system of a moving vehicle located in the rear axle, following the ISO 8855 convention, specified in [11].
By using additional properties, it is possible to embed detailed and customized information about the transforms, such as additional non-linear coefficients. In the example, the entries for raw_gps_data are only exemplary.
|
7.9. Ontologies
The ontologies item shall contain pointers to knowledge repositories, for example, URLs of ontologies that are used in the ASAM OpenLABEL JSON data to define the semantic type of elements.
Elements can then point to concepts in these ontologies, so an application may consult an element’s meaning or investigate additional properties.
The format of the pointers shall use a key-value structure, where the key is a non-constrained string as a unique identifier, and the value may be the URL of the ontology or knowledge repository.
Class
ontologies
This is the JSON object of OpenLABEL ontologies. Ontology keys are strings containing numerical UIDs or 32-byte UUIDs. Ontology values may be strings, for example, encoding a URI, or JSON objects containing a URI string and optional lists of included and excluded terms.
Additional properties: |
false |
Type: |
object |
JSON example
{
"openlabel": {
"metadata": {
"schema_version": "1.0.0"
},
"ontologies": {
"0": "https://www.somedomain.org/ontology",
"1": "https://www.someotherdomain.org/ontology"
},
"objects": {
"0": {
"name": "car1",
"type": "Car",
"ontology_uid": 0
},
"1": {
"name": "person1",
"type": "Person",
"ontology_uid": 0
},
"2": {
"name": "mobile_phone1",
"type": "MobilePhone",
"ontology_uid": 1
}
}
}
}
The example shows that the objects car1 and person1 are of types Car and Person.
The definition of these types can be found at the ontology with ontology_uid = 0.
The definition of object mobile_phone1 can be found at the ontology with ontology_uid = 1.
7.10. Data types (geometric)
ASAM OpenLABEL defines geometric and non-geometric (generic) data types, which all together add the needed flexibility to represent any kind of information of labels or tags.
This section provides details about geometric data types for the multi-sensor data labeling use case.
Examples of object_data are used, but the ASAM OpenLABEL JSON schema also includes definitions of action_data, event_data, and context_data.
The difference is that only object_data can be of both geometric and non-geometric types.
Geometric object_data types are more complex and have specific fields.
Also, these types may contain generic object_data as attributes.
Rules
-
objects shall have a unique identifier.
-
object_data shall have a unique name.
7.10.1. Bounding boxes
Bounding boxes are geometric entities which enclose the shape of an object in Cartesian coordinates. Bounding boxes define minimum and maximum limits at each dimension so the entire object lies within the specified limits.
Bounding boxes are used to label objects and entities in 2D and 3D data representations, such as images or point clouds. Bounding boxes are useful as the most basic and compact representation of the position and size of an object. Bounding boxes have become the most popular labeling type for computer vision and machine learning because of their simplicity and good alignment with matrix operations in programming languages and hardware architectures.
There are three main bounding box types supported by ASAM OpenLABEL:
-
2D bounding box
-
2D rotated bounding box
-
3D bounding box (cuboid)
2D bounding box (bbox)
A 2D bounding box is defined as a rectangle by an array of four floating point numbers:
| Attribute | Unit | Description |
|---|---|---|
| x | px | Specifies the x-coordinate of the center of the rectangle. |
| y | px | Specifies the y-coordinate of the center of the rectangle. |
| w | px | Specifies the width of the rectangle in the x/y-coordinate system. |
| h | px | Specifies the height of the rectangle in the x/y-coordinate system. |
Table 12 shows the available attributes of a 2D bounding box.
Figure 38 shows a 2D bounding box on an image, enclosing an entire object defined by its center position (in pixels) and its width and height.
Class
bbox
A 2D bounding box is defined as a 4-dimensional vector [x, y, w, h], where [x, y] is the center of the bounding box and [w, h] represent the width (horizontal, x-coordinate dimension) and height (vertical, y-coordinate dimension), respectively.
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | object | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | The array of 4 values that define the [x, y, w, h] values of the bbox. |
JSON example
"bbox": [{
"name": "head",
"val": [400, 200, 100, 120]
}]
The example shows a 2D bounding box serialized in JSON.
The center of the rectangle is specified by the points x=400 and y=200.
The dimensions of the rectangle are specified by width=100 and height=120.
For complex set-ups, it is possible to define the coordinate_system in which these magnitudes are expressed.
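Since many tools expect corner-based rectangles rather than the center-based [x, y, w, h] form defined here, a small conversion is often needed. A minimal sketch (the function name is illustrative):
Python example (informative)
def bbox_center_to_corners(val):
    # OpenLABEL bbox [x, y, w, h]: center plus size, in pixels.
    x, y, w, h = val
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)

# (x_min, y_min, x_max, y_max) of the example above.
print(bbox_center_to_corners([400, 200, 100, 120]))  # (350.0, 140.0, 450.0, 260.0)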
It is also possible to embed non-geometric object data.
JSON example
"bbox": {
"name": "head",
"val": [400, 200, 100, 120],
"coordinate_system": "Camera1",
"attributes" : {
"boolean" : [{
"name" : "visible",
"val" : false
}, {
"name" : "occluded",
"val" : false
}
]
}
}]
The example shows non-geometric object data, such as visible and occluded, embedded in a bounding box.
An object can contain multiple bbox entries, for example, to represent the body, head, and arms of a human.
The same applies to all other object_data.
|
2D rotated bounding box (rbbox)
A 2D rotated bounding box is defined as a 5-dimensional vector of five numbers:
| Attribute | Unit | Description |
|---|---|---|
| x | px | Specifies the x-coordinate of the center of the rectangle. |
| y | px | Specifies the y-coordinate of the center of the rectangle. |
| w | px | Specifies the width of the rectangle in the x/y-coordinate system (horizontal, x-coordinate dimension). |
| h | px | Specifies the height of the rectangle in the x/y-coordinate system (vertical, y-coordinate dimension). |
| alpha | rad | Specifies the rotation of the rotated bounding box. It is defined as a right-handed rotation, that is, positive from the x-axis to the y-axis. The origin of rotation is placed at the center of the bounding box, that is, [x, y]. |
Table 14 shows the available attributes of a 2D rotated bounding box.
Figure 40 shows a 2D rotated bounding box on an image, enclosing an entire object defined by its center position (in pixels), its width and height, and the rotation angle.
Class
rbbox
A 2D rotated bounding box is defined as a 5-dimensional vector [x, y, w, h, alpha], where [x, y] is the center of the bounding box and [w, h] represent the width (horizontal, x-coordinate dimension) and height (vertical, y-coordinate dimension), respectively. The angle alpha, in radians, represents the rotation of the rotated bounding box, and is defined as a right-handed rotation, that is, positive from x to y axes, and with the origin of rotation placed at the center of the bounding box (that is, [x, y]).
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | object | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | The array of 5 values that define the [x, y, w, h, alpha] values of the rbbox. |
JSON example
"rbbox": [{
"name": "outline",
"val": [400, 200, 100, 120, 0.785]
}]
The example shows a 2D rotated bounding box serialized in JSON.
The center of the 2D rotated bounding box is specified by the points x=400 and y=200.
The dimensions of the 2D rotated bounding box are specified by the width=100 and height=120.
The rotation of the 2D rotated bounding box is specified by alpha=0.785.
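The four corners of a rotated bounding box can be recovered by rotating the axis-aligned corner offsets by alpha around the center [x, y]. A minimal sketch using the right-handed rotation defined above:
Python example (informative)
import math

def rbbox_corners(val):
    # OpenLABEL rbbox [x, y, w, h, alpha]; alpha rotates right-handed
    # (positive from the x-axis to the y-axis) around the center.
    x, y, w, h, alpha = val
    c, s = math.cos(alpha), math.sin(alpha)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]

print(rbbox_corners([400, 200, 100, 120, 0.785]))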
3D bounding box (cuboid)
A 3D bounding box is a cuboid in 3D Euclidean space. It is defined by position, rotation, and size. Position and size are defined as 3-vectors, while rotation can be expressed in two alternative forms, using 4-vector quaternion notation or 3-vector Euler notation (to be applied in ZYX order equivalent to yaw-pitch-roll order).
One option is that the cuboid is defined as (x, y, z, qa, qb, qc, qd, sx, sy, and sz), where:
| Attribute | Unit | Description |
|---|---|---|
| x | m | Specifies the x-coordinate of the 3D position of the center of the cuboid. |
| y | m | Specifies the y-coordinate of the 3D position of the center of the cuboid. |
| z | m | Specifies the z-coordinate of the 3D position of the center of the cuboid. |
| qa | | Specifies the first (x) value of the quaternion in non-unit form (x, y, z, w), as in the SciPy convention. |
| qb | | Specifies the second (y) value of the quaternion in non-unit form (x, y, z, w), as in the SciPy convention. |
| qc | | Specifies the third (z) value of the quaternion in non-unit form (x, y, z, w), as in the SciPy convention. |
| qd | | Specifies the fourth (w) value of the quaternion in non-unit form (x, y, z, w), as in the SciPy convention. |
| sx | m | Specifies the dimension of the cuboid along the x-coordinate. |
| sy | m | Specifies the dimension of the cuboid along the y-coordinate. |
| sz | m | Specifies the dimension of the cuboid along the z-coordinate. |
Table 16 shows the available attributes of a 3D bounding box (cuboid) using quaternion. The quaternions conform to the SciPy convention [17].
Table 16 shows the available attributes of a 3D bounding box (cuboid) using quaternion. The quaternions conform to the SciPy convention [17].
Another option is that the cuboid is defined as (x, y, z, rx, ry, rz, sx, sy, and sz), where:
| Attribute | Unit | Description |
|---|---|---|
| x | m | Specifies the x-coordinate of the 3D position of the center of the cuboid. |
| y | m | Specifies the y-coordinate of the 3D position of the center of the cuboid. |
| z | m | Specifies the z-coordinate of the 3D position of the center of the cuboid. |
| rz | rad | Specifies the Euler angle rz (yaw). |
| ry | rad | Specifies the Euler angle ry (pitch). |
| rx | rad | Specifies the Euler angle rx (roll). |
| sx | m | Specifies the dimension of the cuboid along the x-coordinate. |
| sy | m | Specifies the dimension of the cuboid along the y-coordinate. |
| sz | m | Specifies the dimension of the cuboid along the z-coordinate. |
Table 17 shows the available attributes of a 3D bounding box (cuboid) using Euler angles.
Figure 42 shows a 3D bounding box (cuboid) on 3D space plot. The same cuboid can be expressed using the two defined alternatives: using Euler angles in ZYX order, or with a Quaternion. Note the center of the cuboid is used as origin of the cuboid coordinate system.
Class
cuboid
A cuboid or 3D bounding box. It is defined by the position of its center, the rotation in 3D, and its dimensions.
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | object | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | List of values encoding the position, rotation and dimensions. Two options are supported, using 9 or 10 values. If 9 values are used, the format is (x, y, z, rx, ry, rz, sx, sy, sz), where (x, y, z) encodes the position, (rx, ry, rz) encodes the Euler angles of the rotation, and (sx, sy, sz) are the dimensions of the cuboid in its object coordinate system. If 10 values are used, the format is (x, y, z, qx, qy, qz, qw, sx, sy, sz), with the only difference being the rotation values, which are the 4 values of a quaternion. |
JSON example
"cuboid": [{
"name": "shape",
"val": [12.0, 20.0, 0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 2.0, 1.5]
}]
An alternative is defined by nine numbers, substituting the quaternion vector with 3 Euler angles (rx, ry, rz), respectively defining the rotation of the object coordinate system around the x-, y-, and z-axes.
The rotation is assumed to be applied in ZYX order.
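Because the quaternions follow the SciPy convention [17], SciPy's Rotation class can convert between the two cuboid encodings. A minimal sketch, assuming the ZYX Euler angles are intrinsic rotations, consistent with the yaw-pitch-roll description above:
Python example (informative)
from scipy.spatial.transform import Rotation

# 10-value cuboid: (x, y, z, qx, qy, qz, qw, sx, sy, sz), from the example above.
cuboid_q = [12.0, 20.0, 0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 2.0, 1.5]

# SciPy accepts non-unit quaternions in (x, y, z, w) order and normalizes them.
rot = Rotation.from_quat(cuboid_q[3:7])

# Convert to the 9-value form with Euler angles applied in ZYX order.
rz, ry, rx = rot.as_euler("ZYX")
cuboid_e = cuboid_q[:3] + [rx, ry, rz] + cuboid_q[7:]
print(cuboid_e)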
7.10.2. Semantic segmentation: image and poly2d
Semantic segmentation responds to the need for more detailed annotations by defining one or more labels per pixel of a given image (for details about the different possible use cases and semantic segmentation taxonomy, see concept Semantic segmentation and example Semantic segmentation).
To facilitate visual perception, a color code for each class may be specified. The information on a certain pixel belonging to a certain category is expressed by assigning a specific RGB value to that pixel, which visually represents that category.
In terms of the data format, such dense information can be tackled with different approaches. Each of them has different purposes or responds to different needs:
-
Separate images: Historically, semantic segmentation information has been stored as separate images, usually formatted as PNG images (lossless). This is the simplest approach and the one offering the smallest storage footprint. However, there are many separate files in the file system. Therefore, the main ASAM OpenLABEL JSON file may contain one or more URLs/URIs of these images.
JSON example
"objects": {
"0": {
"name": "",
"type": "",
"object_data": {
"string": [
{
"name": "semantic mask uri - dictionary 1",
"val": "/someURLorURI/someImageName1.png"
},{
"name": "semantic mask uri - dictionary 2",
"val": "/someURLorURI/someImageName2.png"
}
]
}
}
}
-
Embedded images: Image content can be encoded with any image processing software, expressed as a base64 string, and embedded within the JSON file. This approach creates large JSON files (base64 adds 4/3 overhead) but mitigates the need to manage multiple files:
JSON example
"objects": {
"0": {
"name": "",
"type": "",
"object_data": {
"image": [
{
"name": "semantic mask - dictionary 1",
"val": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAKACAIAAADLqjwFAAAKu0lEQVR42u3dPW7VYBCGUSe6JWW6NCyDErEvKvaFKFkGDR0lfYiEkABFN8n9+fzOzDkFDRLgAT0a2Z/NtgEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABc0uevb25MASCwzo8/CjRAYp0FGiC0zgINEFpngQYIrbNAA4TWWaABQuss0AChdRZogMQ0CzRAbp0FGiC0zgINEFpngQYIrbNAA4TWWaABQuv84d1PgQZIrLMNGiC0zrmBfv/2o7/U0r58+2QIcE6dH90aH0BgnQUaILTOjw6GCLBvmp+ssw0aILTOAg0QWmeBBgits0ADhNZ585AQYH2dn02zDRogt85VN+jjb6nt9RbizY/vD3f3/rGCOl+kzptbHCdU+LSf1W5Q51fVWaDPLfLJv45egzoL9M5dfvbXV2pQZ4FOSfOTv51MQ9c0n1xngd4zzTIN6izQ0WmWaVBngY5Os0yDOgt0dJplGtT5b0PfJAyvc7k/J6jzxetcdYM+513BcsmzSkOhOl8qzRM36LoLqVUaptV5VqCrN06jYVSdBwW6R900GubUeUqgO3VNo2FInUcEul/RNBom1Hlrfw66a8t8exp2T/O169x8g+69adqjoXedOwd6Qr80GhrXuW2g55RLo6FrnXsGelqzNBpa1nnzNTtAnQPT3HODnrlOWqKhX527BXpypzQamtV5G/u5UUCdw+vcKtBWSBOATnW2QQPqHFrnPoG2PJoDNKvz5pgdIM2ZdW6yQVsbTQP61XlzDxpQ58w6dwi0hdFMoGWdbdCAOofWeav+kNCqeGQyvuiPOtdNsw0aUOfcOgs0oM65Cgfa/Q3zgcZ1tkED6izQAOr8Sl71BgaluVCdbdCAOgv0pXkCZkqoc+8626ABdRZoAHUWaECdG9R5c4oDaFznumm2QQPqLNAA6izQgDr3qLNAA+os0Bfl/QuzQp3b13kreorj4e5ed14+K0NgQpr71XlziwNQZ4EGUGeBBtRZoAHU+Xq86g2UrHPvNNugAXUWaAB1FmhAnQV6Z96/MCXUWaAB1HkfTnEA6WmeWWcbNKDOAg2gznMC7QmY+aDOAg2gzgINqLM6/1H7FIcv9x+ZjCFQtM7SbIMG1FmgrYpmgjqrsw0aUGeBtjCaBqizQAPqPFWTb3E4zmF9pmKa1dkGDaizQFseTQB1VmeBBtRZoK2Qrh3UeR/dPtg/82mhOlOlztI8d4MG1FmgrZOuF3VWZ4HWLFeKOgu0crlGUGeB1i9XhzozO9CNK6bOqPMEh/ZX2O/gnTqTn2Z1tkFPLJo6o84CrdGuAtRZoNVNnVFnhge6dOPUGXUe6DDtgn+XrtBjQ2mmRJ2l2QY9rnrqjDrboOc2OnaVlmbUmcPw6w/MtDSjzgh0XKalGXVGoOMyLc2oMwIdl2lpplya1VmgIzJ9vVLrMuqMQF+4pCf3WpFRZwR6aa//a7cKo85ckP80dkW7QZ0RaECd+3CLA9RZmm3QgDoj0IA6CzSgzgg0oM4CDagzCZziAGlWZxs0oM4INKizOgs0oM4INKDOw3hICHPrLM02aECdEWhQZ3UWaECdEWhAnQUaUGcEGlBnnuWYHfRPszrboAF1RqBBndVZoAF1RqABdeYfHhJCwzpLsw0aUGcEGtRZnQUaUGcEGlBnBBrUmYKc4oDaaVZnGzSgzgg0qLM6I9Cgzgg0oM4INKgzXTjFAZXqLM02aECd2d+NEYA6I9CAOiPQoM4INKDOCDTMSrM6I9Cgzgg0qLM6I9Cgzgg0oM4INHSvszQj0KDOCDSoszoj0KDOCDSgzgg0qDMCDSxOszoj0KDOCDSoszoj0KDOCDSgzgg0qDMINKyvszQj0KDOCDSoszoj0KDOCDSgzgg0qDMINCxOszoj0KDOCDSoszoj0KDOINCgzgg0dK+zNCPQoM4INKizOiPQoM4g0KDOCDSoMwg0qDMCDdKszgg0qDMINOqszgg0qDMINKgzAg3t6yzNCDSoMwg06qzOCDSoMwg0qDMCDeoMAg2L06zOCDSoMwg06qzOCDSoMwg0qDMCDeoMAg3r6yzNCDSoMwg06qzOCDSoMwg0qDMCDeoMAg2L06zOCDSoMwg06qzOCDSoMwg0qDMINN3rLM0INKgzCDTqrM4INKgzCDSoMwg06gwCDeoMAo00qzMCDeoMAo06qzMINOoMAg3qDAJN+zpLMwIN6gwCjTqrMwg06gwCDeoMAo06g0DD4jSrMwg06gwCjTqrMwg06gwCDeoMAo06g0DD+jpLMwg06gwCjTqrMwg06gwCDeoMAo06AwLN4jSrMwg06gwCjTqrMwg06gwCDeoMAk33OkszCDTqDAKNOqszCDTqDAIN6gwCjToDAs3iNKszCDTqDAKNOqszCDTqDAg06gwCjToDAs36OkszCDTqDAKNOqszCDTqDAg06gwCjToDAs3iNKszCDTqDAi0OqszCDTqDAg06gwCTfc6SzMINOoMCLQ6qzMINOoMCDTqDAKNOgMCjToDAi3N6gwCjToDAq3O6gwCjToDAo06g0DTvs7SDAKNOgMCrc7qDAKNOgMCjToDAq3OgECzOM3qDAKNOgMCrc7qDAKNOgMCjToDAq3OgECzvs7SDAKNOgMCrc7qDAJtBOoMCDTqDAi0OgMCzeI0qzMINOoMCLQ6qzMg0OoMCDTqDAh09zpLMwg06gwItDqrMyDQ6gwINOoMCLQ6AwKNOgMCLc3qDAi0OgMCrc7qDAi0OgMCjToDAt2+ztIMCLQ6AwKtzuoMCLQ6AwKNOgMCrc4AAr04zeoMCLQ6AwKtzuoMCLQ6AwKNOgMCrc4AAr2+ztIMCLQ6AwKtzuoMCLQ6Awi0OgMCrc4AAr04zeoMCLQ6AwKtzuoMCLQ6Awi0OgMC3b3O0gwItDoDAq3O6gwItDoDCLQ6AwKtzgACvTjN6gwItDoDTA20OgMCrc4AAq3OgECrM4BA71lnaQYEWp0BpgZanQGBVmcAgVZnQKDVGUCgd0uzOgMCrc4AUwOtzoBAqzOAQKszQN1AO7ABCLQ6Awi0OgPUDbQ6AwKtzgACrc4AdQOtzoBAl0+zOgMCrc4AUwOtzgCJgVZngMRAqzNAYqAd2ABIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzNAXKAdpwNIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzNAYqAdpwNIDLQ6AyQGWp0BEgOtzgCJgVZngMRAqzPAmW4T/hDqDJAYaHUGeNJVbnG8/P6GOgMsDfQLG63OAEfsdotDnQGOO0gzQKbV56DVGSDICd+xAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAqOcXcO/DOJCe2z8AAAAASUVORK5CYII=",
"mime_type": "image/png",
"encoding": "base64"
}
]
}
}
}
-
Polygons: Another option is to decompose the entire semantic segmentation mask into different classes or object instances. This approach has the benefit of identifying individual objects directly within the JSON file. Thus, a user application can directly read specific objects, without the need to load the PNG image and find the object of interest. The counterpart is an increased JSON size. Polygons (2D) can be expressed directly as lists of x,y-coordinates, using
MODE_POLY2D_ABSOLUTE. However, this may create very large and redundant information. Lossless compression mechanisms can be applied to convert the, possibly long, list of x,y-coordinates into smaller strings:
JSON example
"objects": {
"0": {
"name": "car1",
"type": "#Car",
"object_data": {
"poly2d": [
{
"name": "poly1",
"val": ["5","5","1","mBIIOIII"],
"mode": "MODE_POLY2D_SRF6DCC",
"closed": false
}, {
"name": "poly2",
"val": [5,5,10,5,11,6,11,8,9,10,5,10,3,8,3,6,4,5],
"mode": "MODE_POLY2D_ABSOLUTE",
"closed": false
}
]
}
}
}
The example shows the following:
-
RLE or Chain Code algorithms can losslessly compress a sequence of x,y-coordinates. The poly2d.py script is used for polyline poly1, as specified by the mode MODE_POLY2D_SRF6DCC. Polyline poly2 is encoded with no compression, and thus the specified mode is MODE_POLY2D_ABSOLUTE.
-
Using polygons implies that labels are created at object-level, rather than image-level. This might be useful, for example, for searching applications that locate all objects of type
car.
| Using PNG masks, either as separate files or embedded inside the JSON file, is the preferred way to store labels for machine-learning applications. Such applications do not search inside the masks, but rather move them directly into training pipelines. |
7.10.3. Poly3d
A poly3d is an object_data that represents a polygon in 3D space.
It is defined as a list of 3D points.
The array is a concatenation of x,y,z-values, corresponding to the x,y,z-coordinates of each point with respect to the defined coordinate system.
Therefore, the array shall always have a number of values that is a multiple of 3.
Class
poly3d
A 3D polyline defined as a sequence of 3D points.
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | object | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| closed | boolean | true | | A boolean that defines whether the polyline is closed or not. In case it is closed, it is assumed that the last point of the sequence is connected with the first one. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | List of numerical values of the polyline, according to its mode. |
JSON example
"poly3D" : [{
"closed" : false,
"coordinate_system" : "vehicle_iso8855",
"name" : "lane_marking",
"val" : [557.02, 29.69, -1.63, 562.51, 29.97, -1.59, 568.00, 30.36, -1.58, 571.98, 30.76, -1.57]
}]
The example shows a poly3D object_data specified to have four points, and thus 4 x 3 = 12 values.
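Since val is a flat concatenation of x, y, z values, reading applications typically regroup it into 3D points. A minimal sketch (the function name is illustrative):
Python example (informative)
def poly3d_points(val):
    # Split the flat [x0, y0, z0, x1, y1, z1, ...] list into (x, y, z) tuples.
    if len(val) % 3 != 0:
        raise ValueError("poly3d value count shall be a multiple of 3")
    return [tuple(val[i:i + 3]) for i in range(0, len(val), 3)]

val = [557.02, 29.69, -1.63, 562.51, 29.97, -1.59,
       568.00, 30.36, -1.58, 571.98, 30.76, -1.57]
print(poly3d_points(val))  # 4 points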
7.10.4. Mesh
mesh is a special type of object_data, which describes a complex structure with point-line-area hierarchies.
It is intended to represent 3D meshes, where points, lines, and areas compose the mesh by defining their interrelations.
The elements point, line, and area may have their own properties, just like any other object_data.
Class
mesh
A mesh encodes a point-line-area structure. It is intended to represent flat 3D meshes, such as several connected parking lots, where points, lines and areas composing the mesh are interrelated and can have their own properties.
Additional properties: |
true |
Type: |
object |
| Name | Type | Additional properties | Reference | Description |
|---|---|---|---|---|
| area_reference | object | false | #/definitions/area_reference | This is the JSON object for the areas defined for this mesh. Area keys are strings containing numerical UIDs. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| line_reference | object | false | #/definitions/line_reference | This is the JSON object for the 3D lines defined for this mesh. Line reference keys are strings containing numerical UIDs. |
| name | string | | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| point3d | object | false | #/definitions/point3d | This is the JSON object for the 3D points defined for this mesh. Point3d keys are strings containing numerical UIDs. |
JSON example
"mesh" : [{
"name" : "parkslot1",
"point3d" : {
"0" : {
"name" : "Vertex0",
"val" : [25, 25, 0],
},
"1" : {
"name" : "Vertex1",
"val" : [26, 25, 0],
},
"2" : {
"name" : "Vertex2",
"val" : [26, 26, 0],
},
"3" : {
"name" : "Vertex3",
"val" : [25, 26, 0],
},
"4" : {
"name" : "Vertex4",
"val" : [27, 25, 0],
},
"5" : {
"name" : "Vertex5",
"val" : [27, 26, 0],
}
},
"line_reference" : {
"0" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [0, 1],
},
"1" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [1, 2],
},
"2" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [2, 3],
},
"3" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [3, 0],
},
"4" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [1, 4],
},
"5" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [4, 5],
},
"6" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [5, 2],
}
},
"area_reference" : {
"0" : {
"name" : "Slot",
"reference_type" : "line_reference",
"val" : [0, 1, 2, 3],
},
"1" : {
"name" : "Slot",
"reference_type" : "line_reference",
"val" : [4, 5, 6, 1],
}
}
}]
The example shows an ideal object_data to describe complex parking areas, where parking lots can share lines and points.
Properties of areas may define whether the parking lot is empty or used.
Mesh contains a dictionary of point3d.
Their keys may be used to specify lines as a line_reference.
This line_reference is also stored as a dictionary, so their keys may be used to specify areas as area_reference.
The elements point3d, line_reference, and area_reference are object_data.
They may have attributes of non-geometric type, that is, boolean, text, num and vec.
This gives them full flexibility to describe complex meshes.
JSON example
"6" : {
"name" : "Edge",
"reference_type" : "point3d",
"val" : [5, 2],
"attributes" : {
"text" : [{
"name" : "line_type",
"val" : "dashed"
}, {
"name" : "line_color",
"val" : "yellow"
}
]
}
}
The example shows a line_reference with attributes.
A line_reference shall have only two reference points, as a line is defined by two points.
An area_reference may have as many line references as desired as it may represent a complex polyline.
|
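To reconstruct the geometry of an area, an application follows the reference chain area_reference → line_reference → point3d. A minimal sketch, assuming the first entry of the mesh array from the example above has been parsed into a Python dictionary named mesh:
Python example (informative)
def area_points(mesh, area_uid):
    # Collect the 3D vertices of an area by resolving its line references.
    area = mesh["area_reference"][area_uid]
    points = []
    for line_uid in area["val"]:                     # line UIDs forming the area
        line = mesh["line_reference"][str(line_uid)]
        for point_uid in line["val"]:                # the two endpoints of the line
            xyz = mesh["point3d"][str(point_uid)]["val"]
            if xyz not in points:                    # skip shared endpoints
                points.append(xyz)
    return points

print(area_points(mesh, "0"))  # vertices of parking slot "0"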
7.10.5. Mat and binary
Matrices and binary data are a special form of data and may be expressed using types mat and bin object_data.
-
Matrices are defined by the number of rows, columns, and channels. The numerical values are stored as a list of numbers.
-
Binary data may be defined by an encoding format and data type.
mat is useful to define lists of points, such as a 3xN array of N 3D points in homogeneous coordinates, which may be points from a point cloud file.
|
Class
mat
A matrix.
Additional properties: |
true |
Type: |
object |
| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | object | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| channels | number | true | | Number of channels of the matrix. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| data_type | string | true | | This is a string that declares the type of the numerical values of the matrix, for example, "float". |
| height | number | true | | Height of the matrix. Expressed in number of rows. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | Flattened list of values of the matrix. |
| width | number | true | | Width of the matrix. Expressed in number of columns. |
binary
A binary payload.
Additional properties: true
Type: object

| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| data_type | string | true | | This is a string that declares the type of the values of the binary object. |
| encoding | string | true | | This is a string that declares the encoding type of the bytes for this binary payload, for example, "base64". |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | string | true | | A string with the encoded bytes of this binary payload. |
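Similarly, a non-normative sketch of a binary element with a base64-encoded payload; the element name "mask", the data type, and the byte content ("AAECAwQF" encodes the six bytes 0x00 to 0x05) are illustrative assumptions.
JSON example
"object_data" : {
    "binary" : [{
        "name" : "mask",
        "data_type" : "uint8",
        "encoding" : "base64",
        "val" : "AAECAwQF"
    }]
}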
7.10.6. Point2d and Point3d
Point2d and Point3d are basic structures to define individual points in 2D and 3D space.
They are object_data.
point2d and point3d are defined by their val property, a list of two or three floating-point numbers, respectively.
In addition, point2d and point3d may have an id attribute as a numerical identifier.
This may be used to integrate them into larger structures, for example, a mesh.
Class
point2d
A 2D point.
Additional properties: true
Type: object

| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| id | integer | | | This is an integer identifier of the point in the context of a set of points. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | List of two coordinates to define the point, for example, x, y. |
point3d
A 3D point.
Additional properties: true
Type: object

| Name | Type | Required | Reference | Description |
|---|---|---|---|---|
| attributes | | | #/definitions/attributes | Attributes is the alias of element data that can be nested inside geometric object data. For example, a certain bounding box can have attributes related to its score, visibility, etc. These values can be nested inside the bounding box as attributes. |
| coordinate_system | string | | | Name of the coordinate system in respect of which this object data is expressed. |
| id | integer | | | This is an integer identifier of the point in the context of a set of points. |
| name | string | true | | This is a string encoding the name of this object data. It is used as index inside the corresponding object data pointers. |
| val | array | true | | List of three coordinates to define the point, for example, x, y, z. |
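As a non-normative sketch, assuming that point3d appears as a list inside object_data in the same way as other geometric types, the following shows a point with an id that allows referencing it from a larger structure such as a mesh. The name and coordinates are illustrative assumptions.
JSON example
"object_data" : {
    "point3d" : [{
        "name" : "Vertex",
        "id" : 0,
        "val" : [2.5, -1.0, 0.0]
    }]
}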
7.11. Resources
The resources item shall contain pointers to external resources, such as files or databases, which may contain additional information about elements labeled in the ASAM OpenLABEL data.
Inside each resource, a unique identifier of the element shall be used to create the link.
An example is a lane marking labeling task.
If a high-definition map exists in the form of an ASAM OpenDRIVE file, road or lane elements labeled in ASAM OpenLABEL may already exist in that map.
A link to the matched road or lane can then be created using a resource_uid together with the id of the element inside the resource.
Class
resources
This is the JSON object of OpenLABEL resources. Resource keys are strings containing numerical UIDs or 32-byte UUIDs. Resource values are strings that describe an external resource, for example, file names or URLs, that may be used to link data of the OpenLABEL annotation content with existing external content.
Additional properties: false
Type: object
JSON example
{
"openlabel" : {
"metadata" : {
"schema_version" : "1.0.0"
},
"resources" : {
"0" : "../resources/xodr/multi_intersections.xodr"
},
"objects" : {
"0" : {
"name" : "road1",
"type" : "road",
"resource_uid" : {
"0" : "217"
}
},
"1" : {
"name" : "lane1",
"type" : "lane",
"resource_uid" : {
"0" : "3"
}
}
}
}
}
The example shows that lane1 is labeled as an object of type lane.
lane1 exists in resource 0 with resource_uid 3.
This means that the id of the lane inside the resource is 3.
7.12. Use cases
The following sections provide practical use cases for ASAM OpenLABEL.
7.12.1. 2D bounding boxes
This use case shows object labeling with 2D bounding boxes in images.
Single images and sequences of images are presented separately to show the differences between static and dynamic labeling, for example, the use of a persistent ID for tracked objects.
Single image
The single image approach aims at adding bounding boxes to define the position and size of objects in a single image. Variants of this labeling task may include adding other properties of the object or attributes to the bounding boxes, for example, confidence values.
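A minimal non-normative sketch of such a single-image annotation follows, assuming that the val of a bbox holds the center coordinates, width, and height of the box in pixels; the object name, type, and numerical values are illustrative assumptions.
JSON example
{
    "openlabel" : {
        "metadata" : {
            "schema_version" : "1.0.0"
        },
        "objects" : {
            "0" : {
                "name" : "car0",
                "type" : "car",
                "object_data" : {
                    "bbox" : [{
                        "name" : "shape",
                        "val" : [410.0, 280.0, 160.0, 90.0]
                    }]
                }
            }
        }
    }
}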
Figure 51 shows an exemplary traffic situation.