Unverified Commit 3d823817 authored by Nikolaos Passalis, committed by GitHub

Documentation for performance evaluation (#239)


* Pose estimation speed documentation

* Added pose estimation results

* Formatting fix

* Update lightweight-open-pose.md

* Apply suggestions from code review

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update lightweight-open-pose.md

* Add performance evaluation for EfficientPS

* Added evaluation metrics

* evaluation documentation

* evaluation metrics added
Skeleton-based HAR, Landmark-based FER

* Add activity recognition results

* performance documentation for mxnet-based detectors

* Update human-model-generation.md

* Update human-model-generation.md

* Update human-model-generation.md

* Update human-model-generation.md

* mobileRL performance metrics

* Update semantic-segmentation.md

Add performance evaluation for BiseNet.

* Update semantic-segmentation.md

Add performance evaluation for BiseNet.

* Add 2d tracking evaluation results

* Add 3d tracking evaluation results

* Add 3d object detection evaluation results

* docs(gem.md): add performance evaluation tables

* docs(eagerx.md): add performance evaluation tables

* docs(hyperparameter_tuner.md): add performance evaluation tables

* upload end-to-end planning docs

* Add performance evaluation metrics

* Update docs/reference/single-demonstration-grasping.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/single-demonstration-grasping.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Delete end-to-end-planning.md

* Update docs/reference/synthetic_facial_image_generator.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/synthetic_facial_image_generator.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/synthetic_facial_image_generator.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/human-model-generation.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/human-model-generation.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/human-model-generation.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/human-model-generation.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/activity-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/activity-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/activity-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/activity-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/face-detection-2d-retinaface.md

* Update docs/reference/single-demonstration-grasping.md

Co-authored-by: thomaspeyrucain <87322480+thomaspeyrucain@users.noreply.github.com>

* Update docs/reference/face-detection-2d-retinaface.md

* Update docs/reference/face-detection-2d-retinaface.md

* Update docs/reference/voxel-object-detection-3d.md

* Update docs/reference/voxel-object-detection-3d.md

* Update docs/reference/voxel-object-detection-3d.md

* Update docs/reference/face-recognition.md

* Update docs/reference/landmark-based-facial-expression-recognition.md

* Update docs/reference/landmark-based-facial-expression-recognition.md

* Update docs/reference/landmark-based-facial-expression-recognition.md

* Update docs/reference/object-tracking-2d-fair-mot.md

* Apply suggestions from code review

* Update docs/reference/mobile-manipulation.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/object-tracking-3d-ab3dmot.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/efficient-ps.md

* Update docs/reference/skeleton-based-action-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/skeleton-based-action-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

* Update docs/reference/skeleton-based-action-recognition.md

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>

Co-authored-by: ad-daniel <44834743+ad-daniel@users.noreply.github.com>
Co-authored-by: Niclas Vödisch <voedisch@cs.uni-freiburg.de>
Co-authored-by: pavlos <ptosidis@gmail.com>
Co-authored-by: ekakalet <63847549+ekakalet@users.noreply.github.com>
Co-authored-by: Negar <negar.heidari@eng.au.dk>
Co-authored-by: LukasHedegaard <lh@eng.au.dk>
Co-authored-by: Vivi <vivinousi@gmail.com>
Co-authored-by: charsyme <63857415+charsyme@users.noreply.github.com>
Co-authored-by: Daniel Honerkamp <daniel.honerkamp@gmail.com>
Co-authored-by: Maria Tzelepi <mtzelepi@users.noreply.github.com>
Co-authored-by: Illia Oleksiienko <io@ece.au.dk>
Co-authored-by: Jelle Luijkx <j.d.luijkx@tudelft.nl>
Co-authored-by: halil93ibrahim <halil@ece.au.dk>
Co-authored-by: Alexandre Angleraud <alexandre.angleraud@gmail.com>
Co-authored-by: Lukas Hedegaard <lukasxhedegaard@gmail.com>
Co-authored-by: Niclas <49001036+vniclas@users.noreply.github.com>
Co-authored-by: thomaspeyrucain <87322480+thomaspeyrucain@users.noreply.github.com>
Co-authored-by: Olivier Michel <Olivier.Michel@cyberbotics.com>
Showing 870 additions and 7 deletions
@@ -398,6 +398,75 @@ Inherited from [X3DLearner](/src/opendr/perception/activity_recognition/x3d/x3d_
```
#### Performance Evaluation
TABLE-1: Input shapes, prediction accuracy on Kinetics 400, floating point operations (FLOPs), parameter count and maximum allocated memory of activity recognition learners at inference.
| Model   | Input shape (T×S²) | Acc. (%) | FLOPs (G) | Params (M) | Mem. (MB) |
| ------- | ------------------ | -------- | --------- | ---------- | --------- |
| X3D-L   | 16×312²            | 69.29    | 19.17     | 6.15       | 240.66    |
| X3D-M   | 16×224²            | 67.24    | 4.97      | 4.97       | 126.29    |
| X3D-S   | 13×160²            | 64.71    | 2.06      | 3.79       | 61.29     |
| X3D-XS  | 4×160²             | 59.37    | 0.64      | 3.79       | 28.79     |
| CoX3D-L | 1×312²             | 71.61    | 1.54      | 6.15       | 184.37    |
| CoX3D-M | 1×224²             | 71.03    | 0.40      | 4.97       | 68.96     |
| CoX3D-S | 1×160²             | 67.33    | 0.21      | 3.79       | 41.99     |
TABLE-2: Speed (evaluations/second) of activity recognition learner inference on various computational devices.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.18 | 1.26 | 3.55 |
| X3D-M | 0.75 | 0.69 | 4.50 | 6.94 |
| X3D-S | 2.06 | 0.95 | 9.55 | 7.12 |
| X3D-XS | 6.51 | 1.14 | 12.23 | 7.99 |
| CoX3D-L | 2.00 | 0.30 | 4.69 | 4.62 |
| CoX3D-M | 6.65 | 1.12 | 9.76 | 10.12 |
| CoX3D-S | 11.60 | 1.16 | 9.36 | 9.84 |
TABLE-3: Throughput (evaluations/second) of activity recognition learner inference on various computational devices.
The largest fitting power of two was used as batch size for each device.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.21 | 1.73 | 3.55 |
| X3D-M | 0.75 | 1.10 | 6.20 | 11.22 |
| X3D-S | 2.06 | 2.47 | 7.83 | 29.51 |
| X3D-XS | 6.51 | 6.50 | 38.27 | 78.75 |
| CoX3D-L | 2.00 | 0.62 | 10.40 | 14.47 |
| CoX3D-M | 6.65 | 4.32 | 44.07 | 105.64 |
| CoX3D-S | 11.60 | 8.22 | 64.91 | 196.54 |
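For reference, the "largest fitting power of two" batch size noted above TABLE-3 can be found empirically; the sketch below halves the batch size on CUDA out-of-memory errors. It is only an illustration, and `model` and `make_batch` are placeholders rather than toolkit functions.
```python
# Hedged sketch: find the largest power-of-two batch size that fits in GPU memory.
# `model` and `make_batch` are placeholders, not part of the OpenDR API.
import torch

def largest_pow2_batch(model, make_batch, start=256, device="cuda"):
    batch_size = start
    while batch_size >= 1:
        try:
            with torch.no_grad():
                _ = model(make_batch(batch_size).to(device))
            return batch_size                  # largest batch size that fits
        except RuntimeError as err:            # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(err):
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    return 1
```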
TABLE-4: Energy (Joules) of activity recognition learner inference on embedded devices.
| Model | TX2 | Xavier |
| ------- | ------ | ------ |
| X3D-L | 187.89 | 23.54 |
| X3D-M | 56.50 | 5.49 |
| X3D-S | 33.58 | 2.00 |
| X3D-XS | 26.15 | 1.45 |
| CoX3D-L | 117.34 | 5.27 |
| CoX3D-M | 24.53 | 1.74 |
| CoX3D-S | 22.79 | 2.07 |
TABLE-5: Human Activity Recognition platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------ |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass\* |
| NVIDIA Jetson Xavier AGX | Pass\* |
*On NVIDIA Jetson devices, the Kinetics-400 dataset loader (the dataset associated with the available pretrained models) is not supported.
While the import triggers an error in version 1.0 of the toolkit, a patch has been submitted that avoids the import error in the upcoming version.
Model inference works as expected.
#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[1]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).
@@ -35,7 +35,7 @@ Documentation is available online: [https://eagerx.readthedocs.io](https://eager
Instead of using low-dimensional angular observations, the environment now produces pixel images of the pendulum.
In order to speed-up learning, we use a pre-trained classifier to convert these pixel images to estimated angular observations.
Then, the agent uses these estimated angular observations, similarly to 'demo_2_pid', to successfully swing up the pendulum.
Example usage:
```bash
cd $OPENDR_HOME/projects/control/eagerx/demos
@@ -48,4 +48,40 @@ Setting `--device cpu` performs training and inference on CPU.
Setting `--name example` sets the name of the environment.
Setting `--eps 200` sets the number of training episodes.
Setting `--eval-eps 10` sets the number of evaluation episodes.
Adding `--render` enables rendering of the environment.
### Performance Evaluation
In this subsection, we attempt to quantify the computational overhead that the communication protocol of EAGERx introduces.
Ultimately, an EAGERx environment consists of nodes (e.g. sensors, actuators, classifiers, controllers, etc…) that communicate with each other via the EAGERx’s reactive communication protocol to ensure I/O synchronisation.
We create an experimental setup where we interconnect a set of the same nodes in series and let every node run at the same simulated rate (1 Hz).
The nodes perform no significant computation in their callback and use messages that are small in size (ROS message type `std_msgs.msg/UInt64`).
Hence, the rate at which this environment can be simulated is mostly determined by the computational overhead of the protocol and the hardware used during the experiment (8 core - Intel Core i9-10980HK Processor).
We will record the real-time rate (Hz) at which we are able to simulate the environment for a varying number of interconnected nodes, synchronisation modes (sync vs. async), and concurrency mode (multi-threaded vs multi-process).
In async mode, every node will produce outputs at the set simulated rate (1 Hz) times a provided real-time factor (i.e. real-time rate = real-time factor * simulated rate).
This real-time factor is set experimentally to the highest value at which every node can still keep up with its simulated rate.
In sync mode, nodes are connected reactively, meaning that every node will wait for an input from the preceding node, before producing an output message to the node that succeeds it.
This means that we do not need to set a real-time factor.
Instead, nodes will run as fast as possible, while adhering to this simple rule.
The recorded rate provides an indication of the computational overhead that the communication protocol of EAGERx introduces.
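For illustration, a pass-through node of the kind used in this setup could look as follows in plain rospy; this is only a sketch of the "no significant computation" callback (topic names are placeholders) and does not use the EAGERx node API itself.
```python
# Minimal pass-through node: subscribe to a UInt64 counter and republish it unchanged.
import rospy
from std_msgs.msg import UInt64

rospy.init_node("passthrough")
pub = rospy.Publisher("out", UInt64, queue_size=1)
rospy.Subscriber("in", UInt64, lambda msg: pub.publish(msg))  # the callback does no real work
rospy.spin()
```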
The results are presented in the table below.
| # of nodes | Multi-threaded, Sync (Hz) | Multi-threaded, Async (Hz) | Multi-process, Sync (Hz) | Multi-process, Async (Hz) |
|------------|:-------------------------:|:--------------------------:|:------------------------:|:-------------------------:|
| 4          | 800                       | 458                        | 700                      | 1800                      |
| 5          | 668                       | 390                        | 596                      | 1772                      |
| 6          | 576                       | 341                        | 501                      | 1770                      |
| 7          | 535                       | 307                        | 450                      | 1691                      |
| 12         | 354                       | 200                        | 279                      | 1290                      |
The platform compatibility evaluation is also reported below:
| Platform | Test results |
|----------------------------------------------|:------------:|
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |
@@ -17,7 +17,7 @@ Bases: `engine.learners.Learner`
The *EfficientPsLearner* class is a wrapper around the EfficientPS implementation of the original author's repository adding the OpenDR interface.
The [EfficientPsLearner](/src/opendr/perception/panoptic_segmentation/efficient_ps/efficient_ps_learner.py) class has the following public methods:
#### `EfficientPsLearner` constructor
```python
EfficientPsLearner(lr, iters, batch_size, optimizer, lr_schedule, momentum, weight_decay, optimizer_config, checkpoint_after_iter, temp_path, device, num_workers, seed, config_file)
@@ -174,3 +174,41 @@ Parameters:
The size of the figure in inches. Only used for the detailed version. Otherwise, the size of the input data is used.
- **detailed**: *bool, default=False*\
If True, the generated figure will be a compilation of the input color image, the semantic segmentation map, a contours plot showing the individual objects, and a combined panoptic segmentation overlay on the color image. Otherwise, only the latter will be shown.
#### Performance Evaluation
The speed (fps) is evaluated for the Cityscapes dataset (2048x1024 pixels):
| Dataset | GeForce GTX 980 | GeForce GTX TITAN X | TITAN RTX | Xavier AGX |
|------------|-----------------|---------------------|-----------|------------|
| Cityscapes | 1.3 | 1.1 | 3.2 | 1.7 |
The memory and energy usage is evaluated for different datasets.
An NVIDIA Jetson Xavier AGX was used as the reference platform for energy measurements.
Note that the exact number for the memory depends on the image resolution and the number of instances in an image.
The reported memory is the max number seen during evaluation on the respective validation set.
The energy is measured during the evaluation.
| Dataset | Memory (MB) | Energy (Joules) - Total per inference AGX |
|------------------------|-------------|-------------------------------------------|
| Cityscapes (2048x1024) | 11812 | 39.3 |
| Kitti (1280x384) | 3328 | 15.1 |
The performance is evaluated using three different metrics, namely Panoptic Quality (PQ), Segmentation Quality (SQ), and Recognition Quality (RQ).
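For reference, these metrics follow the standard panoptic segmentation definitions, where TP, FP and FN denote matched segment pairs, unmatched predicted segments and unmatched ground-truth segments, respectively:
```latex
PQ = \frac{\sum_{(p,g) \in TP} \mathrm{IoU}(p,g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}, \qquad
SQ = \frac{\sum_{(p,g) \in TP} \mathrm{IoU}(p,g)}{|TP|}, \qquad
RQ = \frac{|TP|}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}, \qquad
PQ = SQ \times RQ
```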
| Dataset | PQ | SQ | RQ |
|------------|------|------|------|
| Cityscapes | 64.4 | 81.8 | 77.7 |
| Kitti | 42.6 | 77.2 | 53.1 |
EfficientPS is compatible with the following platforms:
| Platform | Compatibility |
|----------------------------------------------|---------------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | ❌ |
| x86 - Ubuntu 20.04 (bare installation - GPU) | ✔️ |
| x86 - Ubuntu 20.04 (pip installation) | ❌ |
| x86 - Ubuntu 20.04 (CPU docker) | ❌ |
| x86 - Ubuntu 20.04 (GPU docker) | ✔️ |
| NVIDIA Jetson Xavier AGX | ✔️ |
@@ -209,6 +209,41 @@ Parameters:
img = draw_bounding_boxes(img.opencv(), bounding_boxes, learner.classes, show=True)
```
#### Performance Evaluation
In terms of speed, the performance of RetinaFace is summarized in the table below (in FPS).
| Variant | RTX 2070 | TX2 | AGX |
|---------|----------|-----|-----|
| RetinaFace | 47 | 3 | 8 |
| RetinaFace-MobileNet | 114 | 13 | 18 |
Apart from the inference speed, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
The measurement was made on a Jetson TX2 module.
| Variant | Memory (MB) | Energy (Joules) - Total per inference |
|-------------------|---------|-------|
| RetinaFace | 4443 | 21.83 |
| RetinaFace-MobileNet | 4262 | 8.73 |
Finally, we measure a recall of 87.83% on the WIDER Face validation subset.
Note that RetinaFace can make use of image pyramids and horizontal flipping to achieve even better recall at the cost of additional computations.
For the MobileNet version, recall drops to 77.81%.
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
#### References
<a name="retinaface-1" href="https://arxiv.org/abs/1905.00641">[1]</a> RetinaFace: Single-stage Dense Face Localisation in the Wild,
[arXiv](https://arxiv.org/abs/1905.00641).
@@ -371,3 +371,45 @@ cap.release()
cv2.destroyAllWindows()
```
#### Performance Evaluation
The performance evaluation results of the *FaceRecognitionLearner* are reported in the Table below:
| Backbone | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) | Xavier AGX (FPS) |
|-------------------|--------|----------------|------------------|-----------------|------------------|
| MobileFaceNet | 137.83 | 224.26 | 29.84 | 28.85 | 37.17 |
| IR-50 | 25.40 | 176.25 | 17.99 | 17.18 | 19.58 |
Apart from the inference speed, which is reported in FPS, we also report the memory usage, as well as energy consumption on a reference platform in the Table below:
| Backbone | Memory (MB) | Energy (Joules) - Total per inference |
|---------------|------------|----------------------------------------|
| MobileFaceNet | 949.75 | 0.41 |
| IR-50 | 1315.75 | 1.15 |
NVIDIA Jetson AGX was used as the reference platform for measuring energy requirements for these experiments.
We calculated the average metrics of 100 runs.
The accuracy on Labeled Faces in the Wild (LFW), Celebrities in Frontal-Profile in the Wild, both frontal to frontal (CFP-FF) and frontal to profile (CFP-FP) setups, AgeDB-30 and VGGFace2 datasets is also reported in the Table below:
| Backbone | LFW | CFP-FF | CFP-FP | AgeDB-30 | VGGFace2 |
|-----------------|--------|--------|--------|----------|----------|
| MobileFaceNet | 99.46% | 99.27% | 93.62% | 95.49% | 93.24% |
| IR-50 | 99.84% | 99.67% | 98.11% | 97.73% | 95.3% |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
@@ -269,6 +269,47 @@ bounding_box_list, w_sensor1, _ = learner.infer(m1_img, m2_img)
cv2.imshow('Detections', draw(m1_img.opencv(), bounding_box_list, w_sensor1))
cv2.waitKey(0)
```
#### Performance Evaluation
The performance evaluation results of the *GemLearner* are reported in the Table below.
These tests have been performed on several platforms, namely a laptop CPU, a laptop GPU and the Jetson TX2.
Also, the results for two different backbones are shown, namely ResNet-50 and MobileNet-v2.
Inference was performed on a 1280x720 RGB image and a 1280x720 infrared image.
| Method | CPU i7-10870H (FPS) | RTX 3080 Laptop (FPS) | Jetson TX2 (FPS) |
|--------------|:--------------------:|:---------------------:|:----------------:|
| ResNet-50 | 0.87 | 11.83 | 0.83 |
| MobileNet-v2 | 2.44 | 18.70 | 2.19 |
Apart from the inference speed, which is reported in FPS, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
Again, inference was performed on a 1280x720 RGB image and a 1280x720 infrared image.
| Method | Memory (MB) RTX 3080 Laptop | Energy (Joules) - Total per inference Jetson TX2 |
|--------------|:---------------------------:|:-------------------------------------------------:|
| ResNet-50 | 1853.4 | 28.2 |
| MobileNet-v2 | 931.2 | 8.4 |
Below, the performance of GEM in terms of accuracy is presented.
These results were obtained on the L515-Indoor dataset, which can also be downloaded using the GemLearner class from OpenDR.
| Method | Mean Average Precision | Energy (Joules) - Total per inference Jetson TX2 |
|--------------|:----------------------:|:-------------------------------------------------:|
| Resnet-50 | 0.982 | 28.2 |
| MobileNet-v2 | 0.833 | 8.4 |
The platform compatibility evaluation is also reported below:
| Platform | Test results |
|----------------------------------------------|:------------:|
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |
#### References
<a name="detr-paper" href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers">[1]</a> Carion N., Massa F., Synnaeve G., Usunier N., Kirillov A., Zagoruyko S. (2020) End-to-End Object Detection with Transformers. In: Vedaldi A., Bischof H., Brox T., Frahm JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. [doi](https://doi.org/10.1007/978-3-030-58452-8_13),
[arXiv](https://arxiv.org/abs/2005.12872).
@@ -103,6 +103,28 @@ A demo in the form of a Jupyter Notebook is available
model_3D = model_generator.infer(imgs_rgb=[rgb_img], imgs_msk=[msk_img], extract_pose=False)
```
#### Performance Evaluation
TABLE-1: OpenDR 3D human model generation speed evaluation.
| Method | CPU i7-9700K (ms) | RTX 2070 (ms) |
| ----------------------------------------------- | ----------------- | ------------- |
| Human Model Generation only | 488.2 | 212.3 |
| Human Model Generation + 3D pose approximation | 679.8 | 531.6 |
TABLE-2: 3D Human Model Generation platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------ |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass* |
| x86 - Ubuntu 20.04 (GPU docker) | Pass* |
| NVIDIA Jetson TX2 | Not tested |
| NVIDIA Jetson Xavier NX | Not tested |
*On Docker installations, the skeleton approximation of the 3D human models is not available.
#### References
<a name="pifu-paper" href="https://shunsukesaito.github.io/PIFu/">[1]</a>
@@ -191,7 +191,7 @@ Also, a tutorial in the form of a Jupyter Notebook is available
# Initialize learner with the tuned hyperparameters
learner = DetrLearner(**best_parameters)
```
* **Hyperparameter tuning example using the [DetrLearner](detr.md) with a custom study**
This example shows how to tune a selection of the hyperparameters of the *DetrLearner* and
@@ -201,7 +201,7 @@ Also, a tutorial in the form of a Jupyter Notebook is available
from opendr.utils.hyperparameter_tuner import HyperparameterTuner
from opendr.perception.object_detection_2d import DetrLearner
from opendr.engine.datasets import ExternalDataset
import optuna
# Create a coco dataset, containing training and evaluation data
@@ -225,7 +225,7 @@ Also, a tutorial in the form of a Jupyter Notebook is available
# Specify timeout such that optimization is performed for 4 hours
timeout = 14400
# Create custom Study
sampler = optuna.samplers.CmaEsSampler()
study = optuna.create_study(study_name='detr_cma', sampler=sampler)
@@ -246,6 +246,51 @@ Also, a tutorial in the form of a Jupyter Notebook is available
learner = DetrLearner(**best_parameters)
```
#### Performance Evaluation
In this section, we will present the performance evaluation of this tool.
This tool is not evaluated quantitatively, since hyperparameter tuning is very problem-specific.
Also, the tool provides an interface to the existing Optuna framework; therefore, evaluating the performance of the hyperparameter tuning tool amounts to evaluating the performance of Optuna.
Quantitative results for Optuna on the Street View House Numbers (SVHN) dataset can be found in [[1]](#optuna-paper).
Rather than providing quantitative results, we will here present an evaluation of the tool in terms of support, features and compatibility.
Below, the supported learner base classes and supported hyperparameter types are presented.
Here it is shown that the hyperparameter tuning tool supports all learners that are present in the OpenDR toolkit.
Also, the hyperparameter types that are supported by Optuna are supported by the tool.
More information on these types can be found [here](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html).
| Supported Types |
|---------------------------------------------------------|
| All OpenDR Learners (Learner, LearnerRL, LearnerActive) |
| Categorical Hyperparameters |
| Discrete Hyperparameters |
| Floating Point Hyperparameters |
| Integer Hyperparameters |
| Loguniform/Uniform Continuous Hyperparameters             |
Below, the sampling algorithms that are available in the tool are shown.
These include both single and multi-objective algorithms.
| Available Sampling Algorithms |
|-------------------------------------------------------------------------------------------------------------|
| Grid Sampling |
| Independent Sampling |
| Tree-structured Parzen Estimator (TPE) Sampling |
| Covariance Matrix Adaptation - Evolution Strategy (CMA-ES) Sampling |
| Partially Fixed Sampling |
| Nondominated Sorting Genetic Algorithm II (NSGA-II) Sampling |
| Multiobjective Tree-Structured Parzen Estimator for Computationally Expensive Optimization (MTSPE) Sampling |
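As a minimal illustration of switching between these samplers through the standard Optuna API (the objective below is a toy example, not part of the toolkit), a multi-objective NSGA-II study can be created as follows:
```python
import optuna
from optuna.samplers import NSGAIISampler

def objective(trial):
    # Toy bi-objective problem standing in for, e.g., accuracy vs. model size.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    depth = trial.suggest_int("depth", 1, 8)
    return (lr - 1e-3) ** 2, float(depth)  # both objectives are minimized

study = optuna.create_study(directions=["minimize", "minimize"], sampler=NSGAIISampler())
study.optimize(objective, n_trials=50)
print(len(study.best_trials))  # number of Pareto-optimal trials
```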
The platform compatibility evaluation is also reported below:
| Platform | Test results |
|----------------------------------------------|:------------:|
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |
#### References
<a name="optuna-paper" href="https://dl.acm.org/doi/10.1145/3292500.3330701">[1]</a>
Optuna: A Next-generation Hyperparameter Optimization Framework.,
@@ -455,6 +455,57 @@ Parameters:
```
#### Performance Evaluation
The tests were conducted on the following computational devices:
- Intel(R) Xeon(R) Gold 6230R CPU on server
- Nvidia Jetson TX2
- Nvidia Jetson Xavier AGX
- Nvidia RTX 2080 Ti GPU on server with Intel Xeon Gold processors
Inference time is measured as the time taken to transfer the input to the model (e.g., from CPU to GPU), run inference using the algorithm, and return results to CPU.
The PST-BLN model is implemented in *ProgressiveSpatioTemporalBLNLearner*.
Note that the model receives each input sample as a sequence of 150 graphs, with facial landmarks as nodes and the connections between them as edges.
The facial landmarks are extracted with the Dlib library as a preprocessing step, and the landmark extraction process is not involved in this benchmarking.
The model is evaluated on the AFEW (Acted Facial Expressions in the Wild) dataset, which contains video clips captured from movies.
We report the speed (single sample per inference) of the optimized ST-BLN model found by the PST-BLN algorithm as the mean of 100 runs.
The noted memory is the maximum allocated memory on GPU during inference.
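As a hedged sketch of the Dlib preprocessing step mentioned above (it is not part of the benchmarked inference), landmark extraction could look like this; the predictor file is the standard Dlib 68-point model, which has to be downloaded separately:
```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(gray_frame):
    """Return a (68, 2) array of landmark coordinates for the first detected face."""
    faces = detector(gray_frame, 1)
    if not faces:
        return np.zeros((68, 2), dtype=np.float32)
    shape = predictor(gray_frame, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
```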
Prediction accuracy on AFEW dataset, parameter count and maximum allocated memory of learner's inference are reported in the following table:
| Method | Acc. (%) | Params (M) | Mem. (MB) |
|---------|----------|------------|-----------|
| PST-BLN | 33.33 | 0.01 | 76.41 |
The speed (evaluations/second) of the learner's inference on various computational devices is:
| Method | CPU | Jetson TX2 | Jetson Xavier | RTX 2080 Ti |
|---------|-------|------------|----------------|-------------|
| PST-BLN | 8.05 | 3.81 | 14.27 | 125.17 |
Energy (Joules) of the learner's inference on embedded devices is shown in the following:
| Method | Jetson TX2 | Jetson Xavier |
|---------|------------|---------------|
| PST-BLN | 5.33 | 1.12 |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|--------------------------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
## References
<a id="1">[1]</a>
@@ -298,6 +298,68 @@ Parameters:
pose_estimator.save('./parent_dir/optimized_model')
```
#### Performance Evaluation
The performance evaluation results of the *LightweightOpenPoseLearner* are reported in the Table below:
| Method | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) | Xavier AGX (FPS) |
|-------------------|-------|-------|-----|-----|-------|
| OpenDR - Baseline | 13.6 | 50.1 | 5.6 | 7.4 | 11.2 |
| OpenDR - Half | 13.5 | 47.5 | 5.4 | 9.5 | 12.9 |
| OpenDR - Stride | 31.0 | 72.1 | 12.2| 13.8| 15.5 |
| OpenDR - Stages | 19.3 | 59.8 | 7.2 | 10.2| 15.7 |
| OpenDR - H+S | 30.9 | 68.4 | 12.2| 12.9| 18.4 |
| OpenDR - Full | 47.4 | 98.2 | 17.1| 18.8| 25.9 |
We have evaluated the effect of using different inference settings, namely:
- *OpenDR - Baseline*, which refers to directly using the Lightweight OpenPose method adapted to OpenDR with no additional optimizations,
- *OpenDR - Half*, which refers to enabling inference in half (FP16) precision,
- *OpenDR - Stride*, which refers to increasing stride by two in the input layer of the model,
- *OpenDR - Stages*, which refers to removing the refinement stages,
- *OpenDR - H+S*, which uses both half precision and increased stride, and
- *OpenDR - Full*, which refers to combining all three available optimizations (a sketch of combining them is given below).
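A minimal sketch of combining these settings is shown below; the argument names (`half_precision`, `stride`, `num_refinement_stages`) are assumptions, so check the constructor reference earlier in this document for the exact names and accepted values.
```python
# Hedged sketch of the configuration evaluated above as "OpenDR - Full".
# The argument names are assumptions; see the constructor reference for the exact API.
from opendr.perception.pose_estimation import LightweightOpenPoseLearner

pose_estimator = LightweightOpenPoseLearner(
    device="cuda",
    half_precision=True,      # "Half": FP16 inference
    stride=True,              # "Stride": increased stride in the input layer
    num_refinement_stages=0,  # "Stages": refinement stages removed
)
# Loading a pretrained model and calling infer() then follows the examples earlier in this file.
```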
Apart from the inference speed, which is reported in FPS, we also report the memory usage, as well as energy consumption on a reference platform in the Table below:
| Method | Memory (MB) | Energy (Joules) - Total per inference |
|-------------------|---------|-------|
| OpenDR - Baseline | 1187.75 | 1.65 |
| OpenDR - Half | 1085.75 | 1.17 |
| OpenDR - Stride | 1037.5 | 1.40 |
| OpenDR - Stages | 1119.75 | 1.15 |
| OpenDR - H+S | 1013.75 | 0.70 |
| OpenDR - Full | 999.75 | 0.52 |
NVIDIA Jetson AGX was used as the reference platform for measuring energy requirements for these experiments.
We calculated the average metrics of 100 runs, while an image with resolution of 640×425 pixels was used as input to the models.
The average precision and average recall on the COCO evaluation split is also reported in the Table below:
| Method | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
|-------------------|-------|-------|
| OpenDR - Baseline | 0.557 | 0.598 |
| OpenDR - Half | 0.558 | 0.594 |
| OpenDR - Stride | 0.239 | 0.283 |
| OpenDR - Stages | 0.481 | 0.527 |
| OpenDR - H+S | 0.240 | 0.281 |
| OpenDR - Full | 0.203 | 0.245 |
For measuring the precision and recall we used the standard approach proposed for COCO, using an Intersection over Union (IoU) threshold of 0.5.
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
#### Notes
For the metrics of the algorithm the COCO dataset evaluation scores are used as explained [here](
@@ -257,6 +257,40 @@ Then source the catkin workspace and run the launch file as described in the `RO
The trained agent and environment can also be directly executed in the real world or in the Gazebo simulator. For this, first start the appropriate ROS nodes for your robot. Then pass `world_type='world'` for real-world execution or `world_type='gazebo'` for Gazebo to the `evaluate_on_task()` function.
#### Performance Evaluation
Note that test-time inference can be performed directly on a standard CPU.
As this already achieves very high control frequencies, we do not expect any benefit from using accelerators (GPUs).
TABLE-1: Control frequency in Hertz.
| Model | AMD Ryzen 9 5900X (Hz) |
| -------- | ---------------------- |
| MobileRL | 2200 |
TABLE-2: Success rates in percent.
| Model | GoalReaching | Pick&Place | Door Opening | Drawer Opening |
| -------- | ------------ | ---------- | ------------ | -------------- |
| PR2 | 90.2% | 97.0% | 94.2% | 95.4% |
| Tiago | 71.6% | 91.4% | 95.3% | 94.9% |
| HSR | 75.2% | 93.4% | 91.2% | 90.6% |
TABLE-3: Platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------- |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Not supported |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Not tested |
| NVIDIA Jetson Xavier AGX | Not tested |
#### Notes
##### HSR
@@ -225,6 +225,50 @@ Parameters:
boxes = centernet.infer(img)
draw_bounding_boxes(img.opencv(), boxes, class_names=centernet.classes, show=True)
```
#### Performance Evaluation
In terms of speed, the performance of CenterNet is summarized in the table below (in FPS).
| Method | RTX 2070 | TX2 | AGX |
|---------|----------|-----|-----|
| CenterNet | 88 | 19 | 14 |
Apart from the inference speed, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
The measurement was made on a Jetson TX2 module.
| Method | Memory (MB) | Energy (Joules) - Total per inference |
|-------------------|---------|-------|
| CenterNet | 4784 | 12.01 |
Finally, we measure the performance on the COCO dataset, using the corresponding metrics.
| Metric | CenterNet |
|---------|-----------|
| mAP | 7.5 |
| AP@0.5 | 24.5 |
| AP@0.75 | 1.0 |
| mAP (S) | 0.9 |
| mAP (M) | 5.4 |
| mAP (L) | 17.0 |
| AR | 14.9 |
| AR (S) | 2.9 |
| AR (M) | 12.9 |
| AR (L) | 30.3 |
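The metrics above are the standard COCO detection metrics; as a minimal sketch (annotation and result file names are placeholders), they can be reproduced with pycocotools:
```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")        # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")    # detections in the COCO results format
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints mAP, AP@0.5, AP@0.75 and the S/M/L and AR entries shown above
```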
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
#### References
<a name="centernet-1" href="https://arxiv.org/abs/1904.08189">[1]</a> CenterNet: Keypoint Triplets for Object Detection,
@@ -210,6 +210,51 @@ Parameters:
draw_bounding_boxes(img.opencv(), boxes, class_names=ssd.classes, show=True)
```
#### Performance Evaluation
In terms of speed, the performance of SSD is summarized in the table below (in FPS).
| Method | RTX 2070 | TX2 | AGX |
|---------|----------|-----|-----|
| SSD | 85 | 16 | 27 |
Apart from the inference speed, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
The measurement was made on a Jetson TX2 module.
| Method | Memory (MB) | Energy (Joules) - Total per inference |
|-------------------|---------|-------|
| SSD | 4583 | 2.47 |
Finally, we measure the performance on the COCO dataset, using the corresponding metrics.
| Metric | SSD |
|---------|-----------|
| mAP | 27.4 |
| AP@0.5 | 45.6 |
| AP@0.75 | 28.9 |
| mAP (S) | 2.8 |
| mAP (M) | 25.8 |
| mAP (L) | 42.9 |
| AR | 36.3 |
| AR (S) | 4.5 |
| AR (M) | 37.5 |
| AR (L) | 53.7 |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
#### References
<a name="ssd-1" href="https://arxiv.org/abs/1512.02325">[1]</a> SSD: Single Shot MultiBox Detector,
[arXiv](https://arxiv.org/abs/1512.02325).
@@ -225,6 +225,51 @@ Parameters:
draw_bounding_boxes(img.opencv(), boxes, class_names=yolo.classes, show=True)
```
#### Performance Evaluation
In terms of speed, the performance of YOLOv3 is summarized in the table below (in FPS).
| Method | RTX 2070 | TX2 | AGX |
|---------|----------|-----|-----|
| YOLOv3 | 50 | 9 | 16 |
Apart from the inference speed, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
The measurement was made on a Jetson TX2 module.
| Method | Memory (MB) | Energy (Joules) - Total per inference |
|-------------------|---------|-------|
| YOLOv3 | 5219 | 11.88 |
Finally, we measure the performance on the COCO dataset, using the corresponding metrics.
| Metric | YOLOv3 |
|---------|-----------|
| mAP | 36.0 |
| AP@0.5 | 57.2 |
| AP@0.75 | 38.7 |
| mAP (S) | 17.3 |
| mAP (M) | 38.8 |
| mAP (L) | 52.3 |
| AR | 44.5 |
| AR (S) | 23.6 |
| AR (M) | 47.2 |
| AR (L) | 62.5 |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |
#### References
<a name="yolo-1" href="https://arxiv.org/abs/1804.02767">[1]</a> YOLOv3: An Incremental Improvement,
[arXiv](https://arxiv.org/abs/1804.02767).
@@ -357,6 +357,36 @@ Parameters:
print(result)
```
#### Performance Evaluation
The tests were conducted on the following computational devices:
- Intel(R) Xeon(R) Gold 6230R CPU on server
- Nvidia Jetson TX2
- Nvidia Jetson Xavier AGX
- Nvidia RTX 2080 Ti GPU on server with Intel Xeon Gold processors
Inference time is measured as the time taken to transfer the input to the model (e.g., from CPU to GPU), run inference using the algorithm, and return results to CPU.
Inner FPS refers to the speed of the model when the data is ready.
We report FPS (single sample per inference) as the mean of 100 runs.
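As a hedged sketch of how the full vs. inner timings described above can be measured in a PyTorch-style setup (`model` and `sample` are placeholders, not toolkit objects):
```python
import time
import torch

def measure_fps(model, sample, runs=100, device="cuda"):
    """Return (full FPS, inner FPS) averaged over `runs` single-sample inferences."""
    model = model.to(device).eval()
    full_time, inner_time = 0.0, 0.0
    with torch.no_grad():
        for _ in range(runs):
            t0 = time.perf_counter()
            x = sample.to(device)                 # host-to-device transfer ("full" time only)
            t1 = time.perf_counter()
            out = model(x)
            torch.cuda.synchronize()              # wait for the GPU before stopping the clock
            t2 = time.perf_counter()
            _ = out.cpu() if torch.is_tensor(out) else out  # copy results back ("full" time only)
            t3 = time.perf_counter()
            full_time += t3 - t0
            inner_time += t2 - t1
    return runs / full_time, runs / inner_time
```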
Full FPS Evaluation of DeepSORT and FairMOT on MOT20 dataset
| Model | TX2 (FPS) | Xavier (FPS) | RTX 2080 Ti (FPS) |
| -------- | --------- | ------------ | ----------------- |
| DeepSORT | 2.71 | 6.36 | 16.07 |
| FairMOT | 0.79 | 2.36 | 10.42 |
Inner FPS Evaluation (model only) of DeepSORT and FairMOT on MOT20 dataset.
| Model | TX2 (FPS) | Xavier (FPS) | RTX 2080 Ti (FPS) |
| -------- | --------- | ------------ | ----------------- |
| DeepSORT | 2.71 | 6.36 | 16.07 |
| FairMOT | 0.79 | 2.36 | 17.16 |
Energy (Joules) of DeepSORT and FairMOT on embedded devices.
| Model | TX2 (Joules) | Xavier (Joules) |
| -------- | ------------ | --------------- |
| DeepSORT | 11.27 | 3.72 |
| FairMOT | 41.24 | 12.85 |
#### References
<a name="#object-tracking-2d-1" href="https://arxiv.org/abs/1703.07402">[1]</a> Simple Online and Realtime Tracking with a Deep Association Metric,
[arXiv](https://arxiv.org/abs/1703.07402).
@@ -415,6 +415,36 @@ Parameters:
print(result)
```
#### Performance Evaluation
The tests were conducted on the following computational devices:
- Intel(R) Xeon(R) Gold 6230R CPU on server
- Nvidia Jetson TX2
- Nvidia Jetson Xavier AGX
- Nvidia RTX 2080 Ti GPU on server with Intel Xeon Gold processors
Inference time is measured as the time taken to transfer the input to the model (e.g., from CPU to GPU), run inference using the algorithm, and return results to CPU.
Inner FPS refers to the speed of the model when the data is ready.
We report FPS (single sample per inference) as the mean of 100 runs.
Full FPS Evaluation of DeepSORT and FairMOT on MOT20 dataset
| Model | TX2 (FPS) | Xavier (FPS) | RTX 2080 Ti (FPS) |
| -------- | --------- | ------------ | ----------------- |
| DeepSORT | 2.71 | 6.36 | 16.07 |
| FairMOT | 0.79 | 2.36 | 10.42 |
Inner FPS Evaluation (model only) of DeepSORT and FairMOT on MOT20 dataset.
| Model | TX2 (FPS) | Xavier (FPS) | RTX 2080 Ti (FPS) |
| -------- | --------- | ------------ | ----------------- |
| DeepSORT | 2.71 | 6.36 | 16.07 |
| FairMOT | 0.79 | 2.36 | 17.16 |
Energy (Joules) of DeepSORT and FairMOT on embedded devices.
| Model | TX2 (Joules) | Xavier (Joules) |
| -------- | ------------ | --------------- |
| DeepSORT | 11.27 | 3.72 |
| FairMOT | 41.24 | 12.85 |
#### References
<a name="#object-tracking-2d-1" href="https://arxiv.org/abs/2004.01888">[1]</a> FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking,
[arXiv](https://arxiv.org/abs/2004.01888).
@@ -116,6 +116,39 @@ Parameters:
```
#### Performance Evaluation
The tests were conducted on the following computational devices:
- Intel(R) Xeon(R) Gold 6230R CPU on server
- Nvidia Jetson TX2
- Nvidia Jetson Xavier AGX
- Nvidia RTX 2080 Ti GPU on server with Intel Xeon Gold processors
Inference time is measured as the time taken to transfer the input to the model (e.g., from CPU to GPU), run inference using the algorithm, and return results to CPU.
Inner FPS refers to the speed of the model when the data is ready.
We report FPS (single sample per inference) as the mean of 100 runs.
Full FPS Evaluation of AB3DMOT for classes Car, Pedestrian, Cyclist on KITTI dataset.
| Model | Object Class | TX2 (FPS) | Xavier (FPS) | RTX 2080 Ti (FPS) |
| ------------ | -------------------- | --------- | ------------ | ----------------- |
| AB3DMOT | All | 101.26 | 175.25 | 344.84 |
Energy (Joules) of AB3DMOT on embedded devices.
| Model | Object Class | TX2 (Joules) | Xavier (Joules) |
| ------------ | -------------------- | --------- | ------------ |
| AB3DMOT | All | 0.18 | 0.07 |
AB3DMOT platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------ |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |
| NVIDIA Jetson Xavier AGX | Pass |
#### References
<a name="#object-tracking-3d-1" href="https://arxiv.org/abs/2008.08063">[1]</a> AB3DMOT: A Baseline for 3D Multi-Object Tracking and New Evaluation Metrics,
@@ -223,7 +223,55 @@ Parameters:
cv2.waitKey(-1)
```
#### Performance Evaluation
In terms of speed, the performance of BiseNet for different input sizes is summarized in the table below (in FPS).
| Input Size | RTX 2070 | TX2 | NX | AGX |
|------------|----------|-----|-----|-----|
|512x512 |170.43 |11.25|21.43|39.06|
|512x1024 |93.84 |5.92 |11.14|20.83|
|1024x1024 |49.11 |3.03 |5.78 |11.02|
|1024x2048   |25.07     |1.50 |2.77 |5.44 |
Apart from the inference speed, we also report the memory usage, as well as energy consumption on a reference platform in the Table below.
The measurement was made on a Jetson TX2 module.
| Method | Memory (MB) | Energy (Joules) |
|---------|-------------|-----------------|
| BiseNet | 1113 | 48.208 |
Finally, we measure the performance of BiseNet on the CamVid dataset, using IoU.
| Class | IOU (%) |
|------------|----------|
| Bicyclist |60.0 |
| Building |80.3 |
| Car |87.1 |
| Column Pole|33.3 |
| Fence |42.7 |
| Pedestrian |55.2 |
| Road |90.8 |
| Sidewalk |85.5 |
| Sign Symbol|20.9 |
| Sky |91.2 |
| Tree |73.5 |
| Mean |65.5 |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
#### References
<a name="bisenetp" href="https://arxiv.org/abs/1808.00897">[1]</a> BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation,
[arXiv](https://arxiv.org/abs/1808.00897).
@@ -161,4 +161,57 @@ simply run:
5. $ ./single_demo_inference.py
```
## Performance Evaluation
TABLE-1: OpenDR Single Demonstration Grasping platform inference speeds.
| Platform | Inference speed (FPS) |
| --------------------- | ---------------------- |
| Nvidia GTX 1080 ti | 20 |
| Nvidia Geforce 940mx | 2.5 |
| Jetson Xavier NX | 4 |
| CPU | 0.4 |
The energy consumption of the detection model during inference was also measured on Xavier NX and reported accordingly.
It is worth mentioning that inference on the first iteration requires more energy due to initialization, as can be seen in TABLE-2.
TABLE-2: OpenDR Single Demonstration Grasping energy consumptions and memory usage.
| Stage | Energy (Joules) |
| --------------------------- | ---------------- |
| First step (initialization) | 12 |
| Normal | 3.4 |
TABLE-3: OpenDR Single Demonstration Grasping training.
| Model | Dataset size | Training Time <br> (hr:min:sec) | Model size (MB) |
|--------------- |---------------------------------- |-------------------------------- |------------------------------ |
| A | Faster R-CNN: 1500 <br> CNN: 5000 | 00:14:00 <br> 00:02:00 | Faster R-CNN: 300 <br> CNN: 8 |
| B | 1500 | 00:07:30 | 450 |
| C (simulation) | 1500 | 00:07:00 | 450 |
TABLE-4: OpenDR Single Demonstration Grasping inferences success evaluation.
| Model | Success rate |
|--------------- |-------------- |
| A | 0.913 |
| B | 0.825 |
| C (simulation) | 0.935 |
Finally, we evaluated the ability of the provided tool to run on different platforms.
The tool has been verified to run correctly on the platforms reported in TABLE-5.
TABLE-5: OpenDR Single Demonstration Grasping platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ---------------------- |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Not supported |
| x86 - Ubuntu 20.04 (CPU docker) | Pass* |
| x86 - Ubuntu 20.04 (GPU docker) | Pass* |
| NVIDIA Jetson TX2 | Not tested |
| NVIDIA Jetson Xavier AGX | Not tested |
| NVIDIA Jetson Xavier NX | Pass** |
\* The installation only covers the learner class; running the simulation requires extra steps. \*\* The installation script did not include the detectron2 module or the Webots installation; these had to be installed manually with slight modifications, building detectron2 from source since there was no prebuilt wheel for the aarch64 architecture.
@@ -809,6 +809,66 @@ Parameters:
#### Performance Evaluation
The tests were conducted on the following computational devices:
- Intel(R) Xeon(R) Gold 6230R CPU on server
- Nvidia Jetson TX2
- Nvidia Jetson Xavier AGX
- Nvidia RTX 2080 Ti GPU on server with Intel Xeon Gold processors
Inference time is measured as the time taken to transfer the input to the model (e.g., from CPU to GPU), run inference using the algorithm, and return results to CPU.
The ST-GCN, TA-GCN and ST-BLN models are implemented in *SpatioTemporalGCNLearner* and the PST-GCN model is implemented in *ProgressiveSpatioTemporalGCNLearner*.
Note that the models receive each input sample as a sequence of 300 skeletons, and the pose estimation process is not involved in this benchmarking.
The skeletal data is from the NTU-RGBD dataset. We report speed (single sample per inference) as the mean of 100 runs.
The noted memory is the maximum allocated memory on GPU during inference.
The performance evaluation results of the *SpatioTemporalGCNLearner* and *ProgressiveSpatioTemporalGCNLearner* in terms of prediction accuracy on NTU-RGBD-60, parameter count and maximum allocated memory are reported in the following Tables.
The performance of TA-GCN is reported when it selects 100 frames out of 300 (T=100). PST-GCN finds different architectures for the two dataset settings, cross-view (CV) and cross-subject (CS), which leads to different classification accuracies, parameter counts and memory allocations.
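As a hedged sketch of how the maximum allocated GPU memory can be recorded with PyTorch (`learner` and `sample` stand in for the corresponding OpenDR learner object and input):
```python
import torch

def peak_gpu_memory_mb(learner, sample):
    """Run a single inference and return the peak allocated GPU memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    learner.infer(sample)  # single-sample inference, as in the benchmark above
    return torch.cuda.max_memory_allocated() / (1024 ** 2)
```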
| Method | Acc. (%) | Params (M) | Mem. (MB) |
|-------------------|----------|------------|-----------|
| ST-GCN | 88.3 | 3.12 | 47.37 |
| TA-GCN (T=100) | 94.2 | 2.24 | 42.65 |
| ST-BLN | 93.8 | 5.3 | 55.77 |
| PST-GCN (CV) | 94.33 | 0.63 | 31.65 |
| PST-GCN (CS) | 87.9 | 0.92 | 32.2 |
The inference speed (evaluations/second) of both learners on various computational devices are as follows:
| Method | CPU | Jetson TX2 | Jetson Xavier | RTX 2080 Ti |
|----------------|-------|------------|---------------|-------------|
| ST-GCN | 13.26 | 4.89 | 15.27 | 63.32 |
| TA-GCN (T=100) | 20.47 | 10.6 | 25.43 | 93.33 |
| ST-BLN | 7.69 | 3.57 | 12.56 | 55.98 |
| PST-GCN (CV) | 15.38 | 6.57 | 20.25 | 83.10 |
| PST-GCN (CS) | 13.07 | 5.53 | 19.41 | 77.57 |
Energy (Joules) of both learners’ inference on embedded devices is shown in the following:
| Method | Jetson TX2 | Jetson Xavier |
|-------------------|-------------|----------------|
| ST-GCN | 6.07 | 1.38 |
| TA-GCN (T=100) | 2.23 | 0.59 |
| ST-BLN | 9.26 | 2.01 |
| PST-GCN (CV) | 4.13 | 1.00 |
| PST-GCN (CS) | 5.54 | 1.12 |
The platform compatibility evaluation is also reported below:
| Platform | Compatibility Evaluation |
| ----------------------------------------------|--------------------------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
## References
<a id="1">[1]</a>