Timo.stripf (talk | contribs) No edit summary |
Timo.stripf (talk | contribs) No edit summary |
Line 3: | Line 3: | ||
== Partial and Full Loop Unrolling == | == Partial and Full Loop Unrolling == | ||
Loop unrolling can be either partial or full. Partial | Loop unrolling can be either partial or full, depending on how many iterations are combined into a single loop body. | ||
* **Partial Unrolling:** This technique reduces the number of iterations by a factor of N, known as the unroll factor. For example, with an unroll factor of 4, a loop that originally runs 16 times would now run only 4 times, processing 4 iterations' worth of data in each pass. The remaining iterations (if the iteration count is not perfectly divisible by the unroll factor) are handled in a separate, smaller loop known as a cleanup loop. Partial unrolling balances reduced control overhead with manageable code size. | |||
* **Full Unrolling:** In full unrolling, the loop is eliminated entirely, and each iteration is explicitly written out as a separate block of code. This maximizes reduction in loop overhead and allows for aggressive compiler optimizations, such as instruction reordering and parallelism. Full unrolling is typically feasible only for small, fixed-size loops where the number of iterations is known at compile time. While this can lead to significant speed improvements, it also increases code size (code bloat), which can negatively impact instruction cache performance in larger programs. | |||
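The partial case described above can be sketched by hand in C. This is a minimal illustration, not compiler output: the function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are assumptions chosen for the example.

```c
#include <stddef.h>

/* Reference version: one element per iteration. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Hand-unrolled by a factor of 4 (hypothetical example function). */
long sum_unrolled4(const int *data, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: each pass performs four original iterations. */
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    /* Cleanup loop: handles the at most three leftover iterations. */
    for (; i < n; ++i)
        total += data[i];

    return total;
}
```

With `n = 16` the main loop runs 4 times and the cleanup loop not at all; with `n = 10` the main loop runs twice and the cleanup loop handles the last two elements.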
== Loop Unrolling in C/C++ Compilers == | == Loop Unrolling in C/C++ Compilers == | ||
Most C/C++ compilers | Most modern C/C++ compilers apply automatic loop unrolling when aggressive optimization levels (such as `-O2` or `-O3`) are enabled. The decision to unroll a loop depends on factors such as loop trip count, loop body complexity, and potential performance gains. Compilers analyze loops to identify cases where unrolling reduces overhead or exposes further optimization opportunities, such as vectorization. | ||
In addition to automatic unrolling, developers can explicitly influence unrolling behavior through compiler-specific pragmas. These pragmas allow developers to: | |||
- Force a specific unroll factor. | |||
- Disable unrolling for performance or code size reasons. | |||
- Request full unrolling for small loops. | |||
Compiler pragmas for unrolling provide fine-grained control over how loops are transformed, which is useful when compiler heuristics do not align with application-specific performance goals. For example, manually unrolling cache-sensitive loops can improve data locality, while avoiding unrolling in some cases can reduce code bloat. | |||
=== Loop unroll pragmas === | === Loop unroll pragmas === | ||
Pragmas are compiler directives that influence how a compiler processes specific sections of code, such as loops. They offer direct control over transformations like loop unrolling, bypassing the compiler's default heuristics. This allows developers to optimize for performance or code size, depending on the application requirements. Below are pragma options available for different compilers. | |||
**Generic Pragmas (Applicable across multiple compilers)** | |||
* #pragma | * `#pragma unroll(n)` — Requests the compiler to unroll the loop by a factor of `n`. | ||
* `#pragma nounroll` — Explicitly disables unrolling for the annotated loop. | |||
* #pragma | |||
**Clang Pragmas** | |||
* #pragma | * `#pragma clang loop unroll(enable)` — Enables loop unrolling. | ||
* #pragma | * `#pragma clang loop unroll(disable)` — Disables loop unrolling. | ||
* #pragma | * `#pragma clang loop unroll(full)` — Requests full unrolling of the loop. | ||
* `#pragma clang loop unroll_count(4)` — Specifies an unroll factor of 4. | |||
**GCC Pragmas** | |||
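* #pragma omp unroll | * `#pragma GCC unroll n`: Requests that the following loop be unrolled `n` times, where `n` is an integer constant expression. The argument is required. | ||
* #pragma omp unroll full | * `#pragma GCC unroll 1`: Prevents any unrolling (values of 0 and 1 both block it). GCC has no separate `nounroll` pragma. | ||
* #pragma omp unroll partial | * `-funroll-loops` (a command-line option, not a pragma): Enables automatic unrolling of loops whose iteration count can be determined at compile time or on loop entry. | ||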
* #pragma omp unroll partial(3) | |||
**OpenMP Pragmas** | |||
OpenMP 5.1 introduced the `unroll` construct, which allows explicit loop unrolling in OpenMP programs: | |||
* `#pragma omp unroll` — Enables unrolling with default heuristics. | |||
* `#pragma omp unroll full` — Requests full unrolling. | |||
* `#pragma omp unroll partial` — Enables partial unrolling. | |||
* `#pragma omp unroll partial(3)` — Specifies an unroll factor of 3. | |||
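As a rough sketch of how the OpenMP form is applied, the directive is placed directly before the loop. The function name `add_arrays` and the factor of 4 are assumptions for illustration. The example also assumes a compiler that implements the `unroll` construct and has OpenMP enabled (for example via `-fopenmp`); without OpenMP the pragma is typically ignored and the loop behaves as written.

```c
/* Hypothetical example function: element-wise addition of two arrays. */
void add_arrays(int *out, const int *a, const int *b, int n)
{
    /* Ask the OpenMP implementation to unroll the next loop by 4.
       The result is the same whether or not the hint is honored. */
    #pragma omp unroll partial(4)
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The directive only changes how the loop is compiled; any remainder iterations are handled by the compiler.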
These pragmas give developers flexibility to tailor loop transformations based on hardware characteristics (e.g., cache size, vector register width) or software constraints (e.g., real-time requirements or binary size limits). | |||
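Because the spelling differs between compilers, one common approach is to hide the pragma behind a macro. The following is a sketch under stated assumptions: the macro name `UNROLL_BY_4` and the function `scale` are hypothetical, and only the Clang and GCC spellings are covered.

```c
#include <stddef.h>

/* Hypothetical helper macro: picks an unroll-by-4 hint per compiler.
   __clang__ is tested first because Clang also defines __GNUC__. */
#if defined(__clang__)
#  define UNROLL_BY_4 _Pragma("clang loop unroll_count(4)")
#elif defined(__GNUC__)
#  define UNROLL_BY_4 _Pragma("GCC unroll 4")
#else
#  define UNROLL_BY_4 /* no hint for other compilers */
#endif

/* Multiplies each element of src by factor and stores it in dst. */
void scale(float *dst, const float *src, float factor, size_t n)
{
    UNROLL_BY_4
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}
```

The hint is a request, not a guarantee, so it is worth inspecting the generated assembly or the compiler's optimization reports when the unroll factor matters.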
==Loop Unrolling Transformation in emmtrix Studio== | ==Loop Unrolling Transformation in emmtrix Studio== | ||
emmtrix Studio implements loop unrolling using #pragma directives or via the GUI. Unrolling will reduce the iteration count and increase the body of the loop, processing statements from multiple iteration steps in a single iteration. | emmtrix Studio implements loop unrolling using #pragma directives or via the GUI. Unrolling will reduce the iteration count and increase the body of the loop, processing statements from multiple iteration steps in a single iteration. | ||
===Typical Usage and Benefits=== | ===Typical Usage and Benefits=== | ||
Loop unrolling is used to reduce loop overhead and to expose parallelism at a coarser granularity. | Loop unrolling is used to reduce loop overhead and to expose parallelism at a coarser granularity. | ||
===Example=== | ===Example=== | ||
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
Line 76: | Line 101: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
|} | |} | ||
===Parameters=== | ===Parameters=== | ||
The following parameters can be set (each description is followed by its keyword in pragma syntax and its default value): | The following parameters can be set (each description is followed by its keyword in pragma syntax and its default value): | ||
{| class="wikitable" | {| class="wikitable" | ||
|+ | |+ | ||
Line 86: | Line 114: | ||
|<code>unrollfactor</code> | |<code>unrollfactor</code> | ||
|max_unrollfactor | |max_unrollfactor | ||
|'''Unroll factor''' - divide iteration count & multiply iterating variable. If equal to total number of iterations, loop-construct will be removed from code. If not integer divisor of total number of iterations, additional loop | |'''Unroll factor''' - divide iteration count & multiply iterating variable. If equal to total number of iterations, loop-construct will be removed from code. If not integer divisor of total number of iterations, additional loop processing last iterations will be added | ||
processing last iterations will be added | |||
|} | |} | ||
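The case where the unroll factor equals the total number of iterations can be illustrated in plain C. This is a hand-written sketch of the resulting code shape, not actual emmtrix Studio output or pragma syntax: the trip count `N` and the function names `axpy_loop` and `axpy_full` are assumptions chosen for the example.

```c
#define N 4 /* illustrative trip count, assumed for this sketch */

/* Original form: a loop with N iterations. */
void axpy_loop(int y[N], const int x[N], int a)
{
    for (int i = 0; i < N; ++i)
        y[i] += a * x[i];
}

/* Unroll factor equal to N: the loop construct disappears and each
   iteration becomes a straight-line statement. */
void axpy_full(int y[N], const int x[N], int a)
{
    y[0] += a * x[0];
    y[1] += a * x[1];
    y[2] += a * x[2];
    y[3] += a * x[3];
}
```

With a factor of 2 instead, the loop would remain, run `N / 2` times, and advance `i` by 2 per pass. No extra loop would be needed because 2 divides 4 evenly.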