Your Solution to Estimate the Performance of Your Application
The emmtrix tools support different ways to acquire the duration of the tasks of an application. These vary in accuracy, runtime and additional software or hardware requirements. Static code analysis provides basic information without the need for hardware or special software. More accurate numbers can be collected with interfaces to simulators or the hardware. Depending on the requirements, the methods can be combined as desired.
In general, the execution time of a task or block can be modelled as:
$$t_{\mathrm{exec}} = \mathrm{execution\_frequency}_{\mathrm{block}} \cdot \mathrm{single\_duration}_{\mathrm{block}}$$
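As a worked example with purely hypothetical numbers, assume a loop body that costs 11 cycles per iteration, executes 100 times, and runs on a core clocked at an assumed 200 MHz:

$$t_{\mathrm{exec}} = 100 \cdot 11\,\text{cycles} = 1100\,\text{cycles}, \qquad \frac{1100\,\text{cycles}}{200 \times 10^{6}\,\text{cycles/s}} = 5.5\,\mu\text{s}$$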
The C-code-based static analysis derives the execution frequency of each block by determining loop bounds through constant folding. For the duration, each instruction in the code is modelled as a number of cycles on an abstract hardware model of the processor, and these cycles are summed up. To refine these basic numbers, the LLVM compiler framework can be used to perform source-level profiling, which yields more accurate execution frequencies, and to compile the C code to assembler. Based on the assembler code, a more accurate hardware model of the processor's pipelines is used to calculate the duration of the blocks. Users can also provide custom execution frequencies for blocks in special configuration files to bring their knowledge about the algorithm into the tool. This allows them to specify typical cases for constructs such as queues or waiting tasks and to obtain more realistic numbers for the average execution time.
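The C-based estimation described above can be sketched in a few lines. This is a minimal illustration, not the emmtrix implementation: the instruction classes, their cycle costs and the "override" parameter standing in for a user configuration file are all hypothetical. It derives a trip count from constant loop bounds, sums abstract cycle costs into a block duration, and multiplies both as in the formula for t_exec.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes of an abstract processor model. */
enum op_class { OP_ALU, OP_MUL, OP_LOAD, OP_STORE, OP_BRANCH, OP_CLASS_COUNT };

/* Hypothetical cycle cost per instruction class (illustrative values only). */
static const unsigned cycles_per_op[OP_CLASS_COUNT] = {
    [OP_ALU] = 1, [OP_MUL] = 3, [OP_LOAD] = 2, [OP_STORE] = 1, [OP_BRANCH] = 2
};

/* Trip count of `for (i = start; i < end; i += step)` once start, end and
   step have been reduced to constants, e.g. by constant folding. */
unsigned long trip_count(long start, long end, long step)
{
    assert(step > 0);
    if (end <= start)
        return 0;
    return (unsigned long)((end - start + step - 1) / step);
}

/* single_duration of a block: sum of the abstract cycle costs of its instructions. */
unsigned long block_duration(const unsigned op_counts[OP_CLASS_COUNT])
{
    unsigned long cycles = 0;
    for (size_t c = 0; c < OP_CLASS_COUNT; ++c)
        cycles += (unsigned long)op_counts[c] * cycles_per_op[c];
    return cycles;
}

/* t_exec = execution_frequency * single_duration. A user-supplied frequency
   (override_freq > 0, hypothetical stand-in for a configuration file entry)
   replaces the statically derived one. */
unsigned long block_exec_cycles(unsigned long static_freq, unsigned long override_freq,
                                const unsigned op_counts[OP_CLASS_COUNT])
{
    unsigned long freq = override_freq > 0 ? override_freq : static_freq;
    return freq * block_duration(op_counts);
}
```

For a dot-product body with two loads, one multiply, two ALU operations and one branch, `block_duration` yields 11 cycles; with `trip_count(0, 100, 1)` the block costs 1100 cycles, or 110 cycles if a user overrides the frequency to 10 typical iterations.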
To obtain even more accurate numbers from simulators or directly from the hardware, the emmtrix tools support automated instrumentation of the C code, which adds the required measuring points to the program, and automatically import the measurement results into the performance estimation.
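To illustrate what measuring points inserted into C code might look like, here is a hedged sketch. The macros `MEASURE_BEGIN`/`MEASURE_END`, the `struct measurement` record, the example task `scale` and the CSV export are all hypothetical and do not reflect the actual format of the emmtrix tools. Time is read with the C11 function `timespec_get`; on a real embedded target, a hardware cycle counter would typically be used instead.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical record collected by one measuring point. */
struct measurement {
    const char *task;
    unsigned long count;   /* number of executions of the task */
    double total_ns;       /* accumulated duration in nanoseconds */
};

/* Current time in nanoseconds via C11 timespec_get (wall clock). */
static double now_ns(void)
{
    struct timespec ts;
    int ok = timespec_get(&ts, TIME_UTC);
    assert(ok == TIME_UTC);
    (void)ok;
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical measuring points an instrumentation step could insert. */
#define MEASURE_BEGIN(m) const double m##_start = now_ns()
#define MEASURE_END(m) \
    do { (m).total_ns += now_ns() - m##_start; (m).count++; } while (0)

static struct measurement scale_meas = { "scale", 0, 0.0 };

/* Hypothetical example task after instrumentation. */
void scale(double *data, int n, double factor)
{
    MEASURE_BEGIN(scale_meas);
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
    MEASURE_END(scale_meas);
}

/* Export one record as a CSV line (task,count,total_ns) for a later import step. */
void dump_measurements(FILE *out, const struct measurement *m)
{
    fprintf(out, "%s,%lu,%.0f\n", m->task, m->count, m->total_ns);
}
```

Dividing `total_ns` by `count` gives the measured average duration per execution, which is the kind of number that can replace a static estimate.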
Static Code Analysis:
- Automatic generation of reports and visualization for more detailed information
- Confidence levels for classification of results
- Easy integration into the development workflow
- Static performance estimation without the need to run the code
- Fast evaluation for different target platforms
- Static performance estimation based on C code
- Static performance estimation based on assembly code
- Integration of simulators or hardware profiling into your workflow
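The "fast evaluation for different target platforms" point can be illustrated by evaluating one instruction mix against several abstract target models. This is a sketch under assumptions: the three instruction classes, the per-class cycle costs and the clock frequencies are hypothetical and only show the principle that a target is characterized by a cost table plus a clock rate.

```c
#include <assert.h>
#include <stddef.h>

#define N_CLASSES 3 /* hypothetical classes: ALU, MUL, LOAD/STORE */

/* Hypothetical abstract target model: cycle cost per class and clock rate. */
struct target_model {
    const char *name;
    unsigned cycles[N_CLASSES];
    double clock_mhz;
};

/* Estimated time in microseconds of an instruction mix on one target. */
double estimate_us(const struct target_model *t, const unsigned long mix[N_CLASSES])
{
    unsigned long total = 0;
    for (size_t c = 0; c < N_CLASSES; ++c)
        total += mix[c] * t->cycles[c];
    return (double)total / t->clock_mhz; /* cycles / (cycles per microsecond) */
}

/* Index of the target with the lowest estimated time. */
size_t fastest_target(const struct target_model *targets, size_t n,
                      const unsigned long mix[N_CLASSES])
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (estimate_us(&targets[i], mix) < estimate_us(&targets[best], mix))
            best = i;
    return best;
}
```

For a mix of 1000 ALU, 200 MUL and 500 memory instructions, a hypothetical target with costs {1, 4, 2} at 200 MHz needs 14 µs, while one with costs {1, 2, 3} at 100 MHz needs 29 µs, even though its multiplier is faster.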
Performance Estimation in emmtrix Parallel Studio
In this video, we compare the different performance estimation methods using static code analysis based on C and assembler code as well as a simulator. We show the differences in accuracy and how the results are visualized in emmtrix Parallel Studio.
- Performance estimation early in the development process
- Continuous monitoring of performance changes during the development
- Comparison of performance for different or heterogeneous target platforms
- Detailed information to better understand the timing behavior of your application
- Detect high-runners or critical parts of your software application
- Estimate the core utilization to optimize runnable-/task-to-core mappings
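Core utilization follows directly from estimated execution times: for periodic tasks, each task contributes its execution time divided by its period to the core it is mapped to. The sketch below assumes periodic tasks and uses a simple greedy "least-loaded core first" heuristic to build a mapping; the heuristic, `struct task` and `MAX_CORES` are illustrative assumptions, not the mapping strategy of the emmtrix tools.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CORES 8

/* Hypothetical periodic task with an estimated execution time. */
struct task {
    double exec_us;   /* estimated execution time per activation */
    double period_us; /* activation period */
};

/* Utilization per core for a given task-to-core mapping (values above 1.0 mean overload). */
void core_utilization(const struct task *tasks, const int *core_of, size_t n_tasks,
                      double *util, int n_cores)
{
    assert(n_cores > 0);
    for (int c = 0; c < n_cores; ++c)
        util[c] = 0.0;
    for (size_t i = 0; i < n_tasks; ++i) {
        assert(core_of[i] >= 0 && core_of[i] < n_cores);
        util[core_of[i]] += tasks[i].exec_us / tasks[i].period_us;
    }
}

/* Greedy heuristic: assign each task, in order, to the currently least-loaded core. */
void greedy_mapping(const struct task *tasks, size_t n_tasks, int *core_of, int n_cores)
{
    assert(n_cores > 0 && n_cores <= MAX_CORES);
    double load[MAX_CORES] = {0};
    for (size_t i = 0; i < n_tasks; ++i) {
        int best = 0;
        for (int c = 1; c < n_cores; ++c)
            if (load[c] < load[best])
                best = c;
        core_of[i] = best;
        load[best] += tasks[i].exec_us / tasks[i].period_us;
    }
}
```

With four tasks of utilization 0.2, 0.3, 0.2 and 0.05 on two cores, the heuristic places them alternately and yields core utilizations of 0.4 and 0.35.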
The results of the performance estimation can be visualized in our interactive, zoomable hierarchical program view. The x-axis represents time, so the width of each block corresponds to its actual duration. The y-axis shows the control structure of the program: additional levels are added for structures such as function calls, loops or conditions.
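The layout rule of such a view (x = start time, width = duration, y = nesting depth) can be expressed as a small recursive tree walk. This is a minimal sketch with a hypothetical `struct node`; it assumes that an inner node's duration is simply the sum of its children, ignoring any overhead of the control structure itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node of the program hierarchy (function, loop, condition, block). */
struct node {
    const char *label;
    double duration;         /* leaves: estimated time; inner nodes: computed */
    struct node **children;
    size_t n_children;
    double x, width;         /* layout result on the time axis */
    int depth;               /* layout result on the y-axis */
};

/* Inner nodes take the sum of their children's durations (simplifying assumption). */
double compute_duration(struct node *n)
{
    if (n->n_children == 0)
        return n->duration;
    double sum = 0.0;
    for (size_t i = 0; i < n->n_children; ++i)
        sum += compute_duration(n->children[i]);
    n->duration = sum;
    return sum;
}

/* Place a node at start time x and nesting level depth; children follow one another in time. */
void layout(struct node *n, double x, int depth)
{
    n->x = x;
    n->width = n->duration;
    n->depth = depth;
    double cursor = x;
    for (size_t i = 0; i < n->n_children; ++i) {
        layout(n->children[i], cursor, depth + 1);
        cursor += n->children[i]->duration;
    }
}
```

A function containing an initialization block of 2 time units followed by a loop of 6 units is drawn 8 units wide at level 0, with the loop starting at x = 2 on level 1.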
To enhance our static performance estimation, the emmtrix tools take applied compiler optimizations into account by analyzing the assembler code. Together with a model of the processor pipeline, this allows the actual timing behavior of the program on the selected hardware to be predicted with significantly higher accuracy. The advanced mapping between C and assembler code can be accessed directly from the GUI for further inspection.
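Why a pipeline model is more accurate than summing per-instruction costs can be shown with a very simple in-order, single-issue timing model. This is a sketch under strong assumptions (one instruction issues per cycle, an instruction waits until its source registers are ready, fixed result latencies, hypothetical `struct insn` encoding); real processor models, including those used by the emmtrix tools, are considerably more detailed.

```c
#include <assert.h>
#include <stddef.h>

#define N_REGS 16
#define NO_REG (-1)

/* Hypothetical assembler instruction: destination, up to two sources, result latency. */
struct insn {
    int dst, src1, src2;   /* register numbers or NO_REG */
    unsigned latency;      /* cycles until the result can be used */
};

/* In-order, single-issue model: at most one instruction issues per cycle and
   not before its source registers are ready. Returns the cycle at which all
   results are available. */
unsigned long pipeline_cycles(const struct insn *code, size_t n)
{
    unsigned long ready[N_REGS] = {0}; /* cycle at which each register is ready */
    unsigned long issue = 0;           /* earliest issue cycle of the next instruction */
    unsigned long finish = 0;
    for (size_t i = 0; i < n; ++i) {
        const struct insn *in = &code[i];
        assert(in->dst < N_REGS && in->src1 < N_REGS && in->src2 < N_REGS);
        unsigned long t = issue;
        if (in->src1 != NO_REG && ready[in->src1] > t) t = ready[in->src1];
        if (in->src2 != NO_REG && ready[in->src2] > t) t = ready[in->src2];
        unsigned long done = t + in->latency;
        if (in->dst != NO_REG) ready[in->dst] = done;
        if (done > finish) finish = done;
        issue = t + 1;
    }
    return finish;
}
```

For two independent loads (latency 3), an add of their results and a dependent shift (latency 1 each), the model reports 6 cycles, whereas simply summing the latencies gives 8, because the two loads overlap in the pipeline.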
Some Supported Platforms
The performance estimation already supports a wide range of target platforms, from general-purpose processors (e.g. ARM Cortex-A series or x86) to special-purpose microcontrollers (e.g. the Infineon Aurix family). In general, the performance estimation can easily be adapted and customized to provide basic support for new processor architectures. More complex and accurate hardware models of the processor pipeline can be supported on demand.