Success Story
Accelerating Vectorized Computing on AURIX™ TC4x
Customer/Partner: TASKING
Context
The AURIX™ TC4x Parallel Processing Unit (PPU) enables vector-accelerated computation for data-intensive embedded workloads.
The project's goal was to generate and verify PPU-optimized functions efficiently: functions were to be generated automatically and tested both in simulation environments and directly on target hardware.
Challenge
As PPU-optimized software scaled across variants, several technical challenges emerged around scalability, flexibility, and verification.
- Flexible implementation for data arrays of various sizes
  The software needed to handle different data dimensions efficiently while maintaining performance and scalability.
- Support for multiple PPU configurations
  Different vector widths, such as 256-bit and 512-bit, had to be supported to ensure optimal performance across hardware variants.
- High numerical accuracy
  Maintaining numerical precision was critical to guarantee reliable and consistent results across all computation scenarios.
- Comprehensive testing and verification
  Simulation and hardware testing were required to validate functionality and robustness, and to enable timing and coverage analysis.
Solution
To address these requirements, emmtrix developed a tool-assisted workflow that streamlines the generation, optimization, and verification of PPU-ready functions.
The automated approach enables efficient generation of functions for different data types and automatically adapts them to arrays of varying dimensions and PPU configurations. In addition, the workflow generates matching test cases that can be executed both in simulators and on physical target hardware.
This allows correctness checks, coverage analysis, and detailed timing measurements to be performed consistently as part of the same workflow.
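The story does not say how numerical accuracy is judged in the generated tests. One common approach, shown here purely as an assumption, is to compare each optimized single-precision result against a higher-precision reference using a combined absolute and relative tolerance. The function `within_tolerance` and its tolerance parameters are hypothetical, not part of the delivered test suite.

```c
#include <stddef.h>

/* Absolute value of a double without pulling in <math.h>. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical acceptance check: returns 1 if every float result lies within
   abs_tol + rel_tol * |reference| of its double-precision reference, else 0.
   The tolerances are caller-supplied placeholders, not project values. */
int within_tolerance(size_t n, const float *actual, const double *reference,
                     double abs_tol, double rel_tol)
{
    for (size_t i = 0; i < n; ++i) {
        double err = abs_d((double)actual[i] - reference[i]);
        double limit = abs_tol + rel_tol * abs_d(reference[i]);
        /* Written as !(err <= limit) so that a NaN result also fails. */
        if (!(err <= limit)) {
            return 0;
        }
    }
    return 1;
}
```

Applying the same check to results captured in a simulator and on target hardware would give both environments a single pass/fail criterion.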
Implementation Highlights
- Tool-assisted generation of PPU-optimized functions for multiple data types
- Automated handling of data arrays with varying sizes
- Support for different vector-width configurations, including 256-bit and 512-bit
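The story does not show any generated code. As a rough illustration of what adapting one function to arbitrary array lengths and to 256-bit or 512-bit vector widths can involve, this sketch uses strip-mining: it processes full vector-width chunks, then finishes the leftover elements in a scalar tail loop. It is plain, portable C; the kernel `saxpy_strip_mined` and the `VEC_BITS`/`LANES` macros are hypothetical and represent neither the PPU instruction set nor emmtrix's generated output.

```c
#include <stddef.h>

/* Hypothetical build-time setting: vector register width in bits.
   256 and 512 mirror the configurations named above; the macro name is
   illustrative, not a TASKING or Infineon option. */
#ifndef VEC_BITS
#define VEC_BITS 512
#endif

/* float32 lanes per vector step: 256 / 32 = 8, 512 / 32 = 16. */
#define LANES ((size_t)(VEC_BITS / 32))
_Static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");
_Static_assert(VEC_BITS == 256 || VEC_BITS == 512, "unsupported vector width");

/* Hypothetical example kernel: y[i] += a * x[i] for any length n. */
void saxpy_strip_mined(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;

    /* Main body: whole LANES-wide chunks. A real PPU build would map the
       inner loop to vector instructions; here it is portable C. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; ++l) {
            y[i + l] += a * x[i + l];
        }
    }

    /* Tail: the remaining n % LANES elements, one at a time. */
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Compiling with `-DVEC_BITS=256` switches the same source to 8-lane chunks. A generator could, for instance, emit one such variant per data type and vector-width configuration.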
- Automated test generation for simulation and hardware execution
Results
As a result, the customer gained a growing library of functions optimized for use on the PPU, created through a scalable and repeatable process.
The workflow established a framework for the automated generation of new functions while maintaining performance and verification depth across simulation and hardware environments. In addition, a comprehensive test suite was developed to check correctness, coverage, and numerical accuracy.
The project was completed with full documentation covering project requirements, software design, implementation details, and testing.
Partner Feedback
TASKING reported that the framework enabled seamless integration into their development and testing environment. They highlighted the close collaboration, technical support, and code quality throughout the project, as well as the early identification and resolution of emerging dependencies. The project feedback also contributed to further compiler improvements.
“Working with emmtrix was very productive. Their expertise in automated testing and compiler optimization helped us accelerate integration of new functionality while ensuring high reliability.”
If you would like to discuss similar requirements in automated optimization, testing, and verification for embedded software, feel free to contact us via our contact form or get in touch directly.
Rainer Heim

