Elixir YOLO v0.2.0: YOLOX Support, Custom Models and Performance Boost

https://github.com/poeticoding/yolo_elixir/releases/tag/0.2.0

In the four months since my daughter was born, I’ve been balancing baby bottles by night and code commits by day. Finally managed to ship this release, sleep-deprived, but very proud!

This isn’t just a version bump; it’s a major release introducing three key enhancements:

  • YOLOX support – thanks @aspett (Andrew Pett)!
  • Model-agnostic postprocessing.
    Yeah, I know, terrible naming, but I couldn’t find a better name for this. Stick with me, though, because this enables the library to load a whole new set of models, even models trained on custom datasets.
  • Major post-processing performance improvement.

This release also brings other improvements, like fully updated documentation and new and updated Livebooks.

YOLOX support

Ultralytics models are a great and easy way to explore the library: they are fast, and Ultralytics also provides great tooling to train your own models. But I never intended to build this library around Ultralytics models alone; one of my goals has always been an extensible library built around the YOLO.Model behaviour. So if you need to support a new set of models, you simply implement the preprocess/3 and postprocess/4 callbacks, and the library takes care of the rest.
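To make this more concrete, here’s a minimal sketch of what a custom implementation could look like. The module name and the callback bodies are hypothetical; only the callback names and arities come from the YOLO.Model behaviour, so check its docs for the actual argument and return types.

defmodule MyApp.MyCustomModel do
  # Sketch only: the arities match the YOLO.Model callbacks,
  # but the bodies below are placeholders, not the real contract.
  @behaviour YOLO.Model

  @impl true
  def preprocess(_model, image, _opts) do
    # Resize/pad the image to the resolution the ONNX model expects,
    # scale pixel values, and keep any metadata that postprocess/4
    # will need to map boxes back to the original image size.
    {image, %{}}
  end

  @impl true
  def postprocess(_model, model_output, _preprocess_meta, _opts) do
    # Decode the raw output tensor into detection rows
    # (bbox coordinates, class probability, class index) and run NMS.
    model_output
  end
end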

But honestly, one of the main reasons I’m excited to add YOLOX support to the library is the increased licensing freedom. Ultralytics models come with licensing restrictions: they’re released under the AGPL license with an enterprise option. While AGPL technically allows commercial use, it requires that you release the full source code (including any modifications) and also expose the code if you run the model as a service, which can be a dealbreaker for many commercial applications (to learn more, see the Ultralytics license page). YOLOX, on the other hand, is released under the Apache License 2.0!

This is the official YOLOX repo where you also find links to model weights and paper: https://github.com/Megvii-BaseDetection/YOLOX

And from here https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/ONNXRuntime you can directly download the ONNX models trained on COCO dataset.

Using these new models is simple and works pretty much like before; you just need to explicitly set the model_impl option.

model = YOLO.load(
  model_path: "models/yolox_m.onnx", 
  model_impl: YOLO.Models.YOLOX,
  classes_path: "models/coco_classes.json"
)

model
|> YOLO.detect(image)
|> YOLO.to_detected_objects(model.classes)

If you want to try out YOLOX yourself on your machine, you can run the examples/yolox.livemd.

Model-agnostic postprocessing

Before this release, the library only worked with models outputting a tensor with a fixed 8400x84 shape, meaning 8400 detection candidates and 80 object classes (+4 columns for bounding box coordinates)… meaning we could only run Ultralytics models trained on the COCO dataset (with 80 classes). No models trained on our own data!

By removing this constraint we get freedom! We can now use models that output any number of detections and, most importantly, any number of classes; meaning: custom-trained models!

We can now load and run an Ultralytics yolov8x model trained on the OpenImages V7 dataset (which has more than 600 object classes). Try it yourself by playing with the examples/yolo_oiv7.livemd livebook.

Ok, ok… cool! But this OIV7 is another generic model. What about my data? What if I want to train and run a model that detects just certain objects?

Sure, you can now do it 😊
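For example, once a custom-trained model is exported to ONNX, loading and running it looks exactly like the COCO example above; the file paths below are just hypothetical placeholders for your own model and classes file.

# hypothetical paths: your own ONNX model and its classes JSON file
model = YOLO.load(
  model_path: "models/my_pcb_detector.onnx",
  classes_path: "models/my_pcb_classes.json"
)

model
|> YOLO.detect(image)
|> YOLO.to_detected_objects(model.classes)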

In the next few weeks, I’ll publish some videos explaining how to train these models on your own data; like building a custom PCB components detector, or a car license plate detector + OCR pipeline.

If you have a real-world use case in mind and want to share it, feel free to reach out! I’d love to hear about it!

Better Performance

While removing the fixed {8400, 84} constraint, I also rewrote part of the postprocessing logic inside YOLO.NMS, using Nx.Defn where possible. The changes aren’t about NMS itself; they focus on the filtering step that precedes it. The result is a 100x speedup: postprocessing now runs in ~4ms on my MacBook Air M3, down from ~400ms, largely thanks to Nx.Defn.
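To give an idea of the kind of work Nx.Defn takes over, here’s a tiny, purely illustrative defn (not the library’s actual code) that computes each candidate’s best class score and index as a single compiled tensor pass, instead of iterating row by row in Elixir:

defmodule NmsFilterSketch do
  import Nx.Defn

  # Illustrative only: given a {num_candidates, 4 + num_classes}
  # predictions tensor, drop the 4 bbox columns and compute each
  # candidate's best class probability and class index.
  defn best_class(predictions, opts \\ []) do
    opts = keyword!(opts, num_classes: 80)
    class_probs = Nx.slice_along_axis(predictions, 4, opts[:num_classes], axis: 1)
    {Nx.reduce_max(class_probs, axes: [1]), Nx.argmax(class_probs, axis: 1)}
  end
end

Moving this kind of filtering into a compiled tensor program, rather than per-row Elixir iteration, is roughly where a speedup of this magnitude comes from.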

Previously, running YOLO in real time pretty much required the FastNMS Rust NIF, since YOLO.NMS was too slow. Now the Elixir version performs competitively and is viable for real-time use. FastNMS has also been updated to version 0.2.0 to be model-agnostic; it was originally tied to the {8400, 84} output shape.

To see the difference between YOLO.NMS and FastNMS on your machine, simply run benchmarks/nms.exs.

What’s next?

For sure I’ll soon publish some screencasts on training custom models and using them with this library. Stay tuned!

Object Tracking. Currently, we detect objects in individual frames, but there’s no way to know if an object in one frame is the same as in the next. With object tracking, we can assign a unique identity to each object (like a car or a person) and follow their movement across frames, allowing us to trace their path over time.

Multi-framework support. Right now, we use Ortex to load and run models. It would be great to support additional frameworks like Axon or NxHailo. This would allow us, for example, to skip compiling Ortex on a Raspberry Pi 5 and instead leverage NxHailo to take advantage of the Pi 5 + Hailo-8 accelerator combo, while running everything directly on Nerves!