== Introduction ==
'''Inline transformation''', also known as ''function inlining'' or ''inline expansion'', is a compiler optimization that replaces a function call with the body of the called function <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. Instead of executing a separate call instruction and incurring the overhead of passing arguments and returning a result, the compiler inserts the function’s code directly at each call site. This is conceptually similar to preprocessor macro expansion, but it is performed by the compiler on the intermediate code without altering the original source text <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. The primary goal of inlining is to improve performance by eliminating function-call overhead and enabling further optimizations. Modern compilers can automatically inline functions they deem profitable, and languages like C/C++ provide an <code>inline</code> keyword to ''suggest'' inlining (though the compiler is free to ignore it) <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. Inline expansion can occur at compile time or even later – for instance, '''link-time optimization (LTO)''' allows inlining across object files, and Just-In-Time (JIT) runtimes (like the Java HotSpot VM) perform inlining at runtime using profiling information <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
== Mechanism of Inline Transformation ==
When a function call is inlined, the compiler treats the function much like a code snippet to be substituted into the caller. It will evaluate and assign the function’s arguments to local temporaries (as it would for a normal call), then insert the entire function body at the call site, adjusting variable references and control flow as needed <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. This means that the normal process of jumping to the function’s code and then returning is bypassed. Ordinarily, a function call requires a branch (transfer of control) to the function, plus setup and teardown instructions (such as saving registers, pushing arguments, and later restoring registers on return) – with inlining, these steps are eliminated so that execution “drops through” directly into the inlined code without a call or return instruction <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
''Figure: function call overhead.'' A normal function call introduces extra instructions for saving state, passing arguments, the call itself, and restoring state after returning, all separate from the function’s useful computation <ref>C Programming Techniques: Function Call Inlining - Fabien Le Mentec https://www.embeddedrelated.com/showarticle/172.php</ref>.

''Figure: inline expansion removes overhead.'' After inlining, the function’s code is substituted at the call site, so the call and its prologue/epilogue overhead are removed <ref>C Programming Techniques: Function Call Inlining - Fabien Le Mentec https://www.embeddedrelated.com/showarticle/172.php</ref>. The program continues executing the inlined body as if it were part of the caller <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
Because the function body is now part of the caller, the compiler can optimize across what was once a call boundary. For example, if certain arguments are constants at the call site, those constant values may propagate into the inlined function, allowing the compiler to simplify calculations or remove branches inside the function <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>. In effect, inlining can make two separate functions behave as one larger function, which often enables additional compiler optimizations that would not be possible otherwise.
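The substitution and the follow-on simplification described above can be pictured by writing the result out by hand. The following is only a sketch: the names <code>scale</code>, <code>caller_with_call</code>, <code>caller_after_inlining</code> and <code>caller_after_folding</code> are hypothetical, and a real compiler performs these steps on its intermediate representation rather than on source code.

```cpp
// Hypothetical helper: doubles x when `doubled` is true, otherwise returns x.
int scale(int x, bool doubled) {
    if (doubled) {
        return x * 2;
    }
    return x;
}

// Original caller: an ordinary call whose second argument is a constant.
int caller_with_call(int value) {
    return scale(value, true) + 1;
}

// Step 1 (written by hand here): the body of scale() is substituted at the
// call site, with the arguments bound to local temporaries.
int caller_after_inlining(int value) {
    const int x = value;
    const bool doubled = true;
    int scaled;
    if (doubled) {
        scaled = x * 2;
    } else {
        scaled = x;
    }
    return scaled + 1;
}

// Step 2: `doubled` is a known constant, so the branch can be removed and
// the temporaries folded away.
int caller_after_folding(int value) {
    return value * 2 + 1;
}
```

All three callers compute the same value for any input that does not overflow; only the amount of work left for run time differs.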
== Advantages and Disadvantages ==
'''Advantages:'''
* '''Eliminating call overhead:''' Inlining avoids the runtime cost of pushing function arguments, jumping to the function, and returning from it <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. This can make execution faster, especially for very small or frequently called functions where the call overhead is a significant portion of the runtime cost.
* '''Enabling further optimizations:''' By merging the function code into the caller, the compiler gains a wider scope for optimization. It can perform constant propagation, common subexpression elimination, loop optimizations, and other transformations across what used to be a function boundary <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref> <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. For instance, inlined code might expose that a condition is always true in a particular context, allowing dead code elimination in the combined code.
* '''Potentially improved performance and smaller code for tiny functions:''' If a function is very simple (e.g., just returns a calculation or a field) and is called often, inlining it might not only speed up execution but could ''reduce'' code size by removing the call/return sequence. (In some cases, the overhead of a call is larger than the function body itself.) Inlining such small functions can both save time and avoid repeated call setup code, yielding faster and ''sometimes'' smaller binaries <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>.
* '''Less work for the instruction pipeline and branch predictor:''' Removing a call also removes the call and return branches at that site, so there is nothing for the branch predictor to mispredict there and the instruction stream stays contiguous. This benefit is context-dependent, since direct calls and returns are usually predicted well on modern CPUs.
'''Disadvantages:'''
* '''Code size increase (code bloat):''' The biggest drawback of inlining is that it duplicates the function’s code at every call site. If a function is inlined in N places, there will be N copies of its body in the final program <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. This can dramatically increase the compiled code size, especially for larger functions or many call sites. A larger binary can negatively impact instruction cache usage and paging.
* '''Instruction cache pressure:''' Excessive inlining can hurt performance by filling up the CPU’s instruction cache with repeated code <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. When code size grows too large, it may no longer fit in fast instruction caches, leading to more cache misses and slower execution. In other words, beyond a certain point, the lost I-cache efficiency outweighs the saved function call overhead. As a rule of thumb, some inlining improves speed at a minor cost in space, but too much inlining can ''reduce'' speed due to cache effects and increased binary size <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
* '''Diminishing returns for large functions:''' Inlining a large function embeds a lot of code into each caller, which is rarely worth it just to save the small constant overhead of a call. Compilers often refuse to inline functions that are “too large” because the benefit doesn’t justify the cost. In fact, most compilers ignore an <code>inline</code> request if the function’s size exceeds certain heuristics <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. Inlining recursive functions is also usually avoided (or limited to a few levels of expansion), because full expansion would blow up code size and is impossible when the recursion depth is not known at compile time.
* '''Compilation time and memory:''' While not usually a major concern, inlining can increase compile time and memory usage in the compiler, since the optimizer now has to work with larger functions after inlining. Extremely aggressive inlining (especially via compiler flags) might slow down compilation and produce larger intermediate code for the compiler to process.
* '''Debugging and profiling complexity:''' When a function is inlined, it no longer exists as a separate entity in the compiled output, which can complicate debugging. For example, setting breakpoints or getting stack traces for inlined functions is harder because they don’t have their own stack frame. Similarly, performance profilers might attribute time spent in an inlined function to the caller, which can be confusing. (Modern debuggers and profilers do have support for inlined code, but it can still be less straightforward than with regular function calls.)
In general, inlining is most beneficial for '''small, frequently called functions''' (such as simple getters or arithmetic functions) and in performance-critical code paths <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. It is usually counterproductive for large or infrequently called functions. Compilers use sophisticated heuristics to decide an optimal balance (discussed below), and they may ignore a programmer’s inline suggestion if it would lead to worse results <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
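The size trade-off described in this section can be made concrete with a deliberately simple model. This is an illustrative assumption, not any compiler’s real cost model: it pretends that an out-of-line function costs one body plus one call sequence per call site, and that an inlined function costs one body per call site. The function names and byte counts below are invented for the example.

```cpp
#include <cstddef>

// Toy model (illustrative assumption only):
//   body_bytes - size of the function body
//   call_bytes - size of the call sequence at each call site
//   call_sites - number of places the function is called

// Out-of-line: one copy of the body plus one call sequence per call site.
constexpr std::size_t size_out_of_line(std::size_t body_bytes,
                                       std::size_t call_bytes,
                                       std::size_t call_sites) {
    return body_bytes + call_sites * call_bytes;
}

// Fully inlined: one copy of the body per call site, no call sequences.
constexpr std::size_t size_inlined(std::size_t body_bytes,
                                   std::size_t call_sites) {
    return call_sites * body_bytes;
}

// True when inlining everywhere yields the smaller total under this model.
constexpr bool inlining_shrinks_code(std::size_t body_bytes,
                                     std::size_t call_bytes,
                                     std::size_t call_sites) {
    return size_inlined(body_bytes, call_sites) <
           size_out_of_line(body_bytes, call_bytes, call_sites);
}

// Tiny getter: 4-byte body, 12-byte call sequence, 10 call sites.
static_assert(size_out_of_line(4, 12, 10) == 124, "4 + 10 * 12");
static_assert(size_inlined(4, 10) == 40, "10 * 4");
static_assert(inlining_shrinks_code(4, 12, 10), "tiny body: inlining is smaller");

// Larger function: 200-byte body, same call sequence and call-site count.
static_assert(size_out_of_line(200, 12, 10) == 320, "200 + 10 * 12");
static_assert(size_inlined(200, 10) == 2000, "10 * 200");
static_assert(!inlining_shrinks_code(200, 12, 10), "large body: inlining bloats");
```

The model ignores two effects mentioned above: an inlined body may shrink further once constants propagate into it, and an out-of-line copy may still be emitted alongside the inlined ones.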
== Implementation in Popular Compilers ==
Modern compilers provide various ways to control or hint at inline transformation, including language keywords, special attributes, and optimization flags. Notably, '''the decision to inline is ultimately made by the compiler’s optimizer''', which might inline functions even without explicit hints or skip inlining when hints are provided, based on its own analysis. Below is how inline expansion is handled in a few popular C/C++ compilers:
=== GCC (GNU Compiler Collection) ===
In GCC, the <code>inline</code> keyword in C and C++ is a hint that the function’s code should be integrated into callers to avoid call overhead <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>. For example, writing <code>inline int f(int x) { return x*2; }</code> suggests to the compiler that calls to <code>f</code> can be replaced with <code>x*2</code> directly. In practice, GCC will consider inlining such functions when optimization is enabled, but '''will not inline at all under <code>-O0</code> (no optimizations)''' unless explicitly forced <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>. At higher optimization levels (<code>-O2</code>, <code>-O3</code>), GCC’s optimizer will inline functions it deems suitable.
By default, GCC applies some inlining at <code>-O2</code> for functions marked <code>inline</code> (and certain trivial functions), and becomes more aggressive at <code>-O3</code>. In fact, the flag <code>-finline-functions</code> (enabled as part of <code>-O3</code> in the GCC release documented here; GCC 10 and later also enable it at <code>-O2</code>) tells GCC to attempt inlining of ''any'' “simple enough” functions, even those not marked with the <code>inline</code> keyword <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>. This means at <code>-O3</code> the compiler will use its heuristics to inline more liberally across the codebase, within limits designed to control code bloat. (These limits can be tweaked via internal parameters like the maximum permitted inline instruction growth <ref>Optimize Options (Using the GNU Compiler Collection (GCC)) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</ref>, but such tuning is rarely needed.) The result is that <code>-O3</code> can inline many small or medium-sized functions automatically, while <code>-O2</code> is more conservative (focusing mostly on functions that are explicitly declared inline or very small).
GCC also provides the '''<code>always_inline</code> attribute''' to force inlining. A function declared with <code>__attribute__((always_inline))</code> (and usually also marked <code>inline</code>) will be inlined regardless of the compiler’s normal heuristics '''and even if optimizations are off''' <ref>Function Attributes - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html</ref>. In other words, this attribute directs GCC to bypass any cost-benefit analysis for that function. According to GCC’s documentation and source, <code>always_inline</code> causes the compiler to ignore even options like <code>-fno-inline</code> and to inline the function without regard to size limits (it will even inline functions using constructs like <code>alloca</code>, which ordinary inlining might not allow) <ref>c - what “inline __attribute__((always_inline))” means in the function? - Stack Overflow https://stackoverflow.com/questions/22767523/what-inline-attribute-always-inline-means-in-the-function</ref>. This attribute is useful for cases where the programmer is certain that inlining is critical (for example, a performance-sensitive function that must not have call overhead, or functions that must be inlined for correctness in some low-level code). However, misuse of <code>always_inline</code> can lead to the aforementioned problems of code bloat and cache issues if applied indiscriminately. (If for some reason the compiler cannot inline a function marked <code>always_inline</code> – e.g., a recursive call or other unavoidable situation – GCC will emit an error or warning, since it '''must''' honor the attribute’s contract.)
It’s worth noting that in C++ programs, GCC automatically treats any function defined ''inside a class definition'' as inline (this is mandated by the C++ standard). GCC will attempt to inline such functions even without the <code>inline</code> keyword <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>. Also, if a function is declared <code>inline</code>, GCC still emits a standalone function definition for it ''unless'' it can prove that every call was inlined and no external reference is needed. This means an inline function might not actually be inlined everywhere, but the one-definition rule is respected by outputting one copy if needed. Conversely, the <code>-fkeep-inline-functions</code> flag forces GCC to emit such functions even when every call has been inlined <ref>Inline - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html</ref>.
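The in-class rule mentioned above can be seen in a small sketch. The class <code>Counter</code> and its members are hypothetical names chosen for illustration; the point is only which definitions are implicitly inline and which need the keyword.

```cpp
// Hypothetical class used only for illustration.
class Counter {
public:
    // Defined inside the class body: implicitly inline per the C++ standard,
    // so no `inline` keyword is needed.
    int value() const { return count_; }
    void increment() { ++count_; }

    // Only declared here; the definition follows outside the class.
    int doubled() const;

private:
    int count_ = 0;
};

// Out-of-class definition: the explicit `inline` makes it an inline function,
// which also keeps it safe to place in a header included by several files.
inline int Counter::doubled() const { return count_ * 2; }
```

Whether any of these calls is actually expanded at a call site is still up to the optimizer and the optimization level in use.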
'''Summary (GCC):''' Use the <code>inline</code> keyword to hint inlining, and <code>__attribute__((always_inline))</code> to strongly force it (typically combined with <code>inline</code> in the definition). At <code>-O3</code> or with <code>-finline-functions</code>, GCC becomes more aggressive about inlining automatically. The compiler will ignore inline hints for overly large functions or if other constraints prevent inlining <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>.
=== Clang/LLVM ===
Clang (the C/C++ frontend to LLVM) handles inlining in a manner very similar to GCC. It supports the C++ <code>inline</code> keyword in the same way – as a hint with linkage implications – and it implements GCC-style attributes like <code>always_inline</code>. In practice, Clang’s optimizer will inline functions under optimization levels based on LLVM’s inlining heuristics. Like GCC, at <code>-O0</code> Clang does not perform any inlining (unless forced via <code>always_inline</code>). At <code>-O1</code> and above, it will inline certain calls that it decides are profitable. Clang also recognizes the <code>-finline-functions</code> flag (and enables it at <code>-O3</code>), which allows more aggressive inlining of functions even if they are not marked inline <ref>Clang command line argument reference - Clang 21.0.0git documentation https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. It additionally supports an option <code>-finline-hint-functions</code> which restricts automatic inlining to only those functions that are declared <code>inline</code> (this is analogous to MSVC’s strategy under <code>/Ob1</code>) <ref>Clang command line argument reference - Clang 21.0.0git documentation https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. In practice, Clang’s default at <code>-O2</code> is to inline functions it thinks are worthwhile (whether or not they were marked inline), and at <code>-O3</code> it increases the aggressiveness similar to GCC.
For forcing inline, Clang honors <code>__attribute__((always_inline))</code> on functions just like GCC. If a function is marked <code>always_inline</code>, Clang will inline it whenever possible and may diagnose cases where it cannot, so that the function does not silently end up out-of-line. There is no distinct Clang-specific keyword for this, but Clang in MSVC compatibility mode (with Microsoft extensions enabled) accepts <code>__forceinline</code> as well. Under the hood, both GCC and Clang attach an internal “always inline” property to such functions in the intermediate representation, which the optimizer’s inline pass will obey strictly. As with GCC, this power should be used judiciously: overusing forced inlining can result in larger code with little benefit, as with any other compiler.
One practical difference is that Clang’s optimization remarks can help explain inlining decisions. For example, the flags <code>-Rpass=inline</code> and <code>-Rpass-missed=inline</code> make Clang report, at compile time, which calls were inlined, which were not, and why. This can be useful when tuning code for inlining with Clang. The heuristics themselves (function size thresholds, etc.) are continuously refined in LLVM’s development, but generally align with the goal of balancing performance gain against code growth.
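As a rough sketch of how the remark flags named above might be tried out, the file below contains one cheap callee with two call sites. The file name <code>remarks.cpp</code> and the functions <code>clamp_to_byte</code> and <code>brighten</code> are hypothetical; the command line in the comment uses only the flags already mentioned, and the remark text is not reproduced because its wording depends on the Clang version.

```cpp
// remarks.cpp (hypothetical file name)
// One possible way to request inlining remarks with the flags named above:
//   clang++ -O2 -Rpass=inline -Rpass-missed=inline -c remarks.cpp

// Small callee that an optimizer would typically consider cheap to inline.
static int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

// Caller with two call sites; each one is a separate inlining decision,
// so each may get its own remark.
int brighten(int pixel, int amount) {
    return clamp_to_byte(clamp_to_byte(pixel) + amount);
}
```

Compiling the same file at <code>-O0</code> and comparing the remarks is one way to see the optimization-level dependence described above.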
'''Summary (Clang):''' Clang uses the same mechanisms as GCC for inlining – the <code>inline</code> keyword, the <code>always_inline</code> attribute for forcing, and optimization-level-dependent heuristics. At <code>-O3</code> it inlines more aggressively (<code>-finline-functions</code>), while at lower levels it inlines more conservatively or only inline-marked functions <ref>Clang command line argument reference - Clang 21.0.0git documentation https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. Its behavior is largely consistent with GCC’s in this area, given that both use similar inline expansion strategies.
=== MSVC (Microsoft Visual C++) ===
MSVC’s approach to inlining in C++ relies on both language keywords and compiler settings. In MSVC, the <code>inline</code> keyword (or its synonym <code>__inline</code>) is likewise a hint that a function should be inlined. However, as with other compilers, this is not a command – MSVC will perform inline expansion only if it judges the optimization to be worthwhile <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. The MSVC compiler evaluates the size and complexity of the function, and certain usage patterns, before deciding to inline. It will not inline functions in some cases (for example, if a function’s address is taken, or if the function is too large or has varargs). MSVC’s optimization settings control how much inlining is done:
* '''<code>/Ob0</code>''' – ''No inlining''. This is the default in debug builds (<code>/Od</code>). The compiler does not inline any function, regardless of the inline keyword. This setting is used to make debugging easier and ensure the binary closely follows the written code structure.
* '''<code>/Ob1</code>''' – ''Inline only if marked inline''. With this setting, the compiler will expand functions inline '''only''' if they are explicitly declared <code>inline</code> (or <code>__inline</code> or <code>__forceinline</code>), or if they are C++ member functions defined inside class definitions <ref>/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. In other words, it respects the inline hints but does not consider other functions for inlining. This is a moderate level used when some inlining is desired but not aggressive auto-inlining.
* '''<code>/Ob2</code>''' – ''Auto-inlining''. This is the default in release builds (<code>/O1</code> or <code>/O2</code>) and it allows MSVC to inline any function at its discretion <ref>/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. The compiler will inline functions marked <code>inline</code> or <code>__forceinline</code>, and '''may also inline other functions''' even if they aren’t marked, whenever its heuristics indicate there is a benefit and it’s safe to do so. Essentially, <code>/Ob2</code> gives the optimizer freedom to do inlining beyond the programmer’s annotations (similar to GCC’s <code>-finline-functions</code>). Most MSVC optimized builds use this level by default.
* '''<code>/Ob3</code>''' – ''Aggressive inlining''. Available starting in Visual Studio 2019, <code>/Ob3</code> goes beyond <code>/Ob2</code> in aggressiveness <ref>/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. It uses the same inlining criteria but increases the compiler’s willingness to inline, so it may inline larger functions or more call sites than <code>/Ob2</code> would. (It is not offered as a choice in the IDE project settings and must be added manually as an additional compiler option.)
For MSVC-specific inline control, the '''<code>__forceinline</code>''' keyword is provided. This keyword attempts to '''override the compiler’s cost analysis''' and force the function to be inlined wherever possible <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. A function declared <code>__forceinline</code> in MSVC is treated with a much stronger inlining preference than a normal <code>inline</code>. In effect, it tells the compiler “I, the programmer, am sure that inlining this is critical, so do it even if your heuristics disagree.” MSVC will make a very strong effort to inline such a function. However – importantly – MSVC still might not inline a <code>__forceinline</code> function in certain situations. The documentation explicitly states that there is ''no guarantee'' a function will be inlined, even with <code>__forceinline</code> <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. For example, if you compile with <code>/Ob0</code> (inlining disabled), even <code>__forceinline</code> functions won’t be inlined. Similarly, if a function cannot physically be inlined (e.g., it’s recursive without a clear depth limit, or its address is used somewhere, or it has incompatible exception handling settings), MSVC will emit it out-of-line despite the <code>__forceinline</code>. What <code>__forceinline</code> really does is ''lower the threshold'' for inlining decisions dramatically and bypass some checks, but the compiler can still balk if inlining would break the program or if it’s disallowed by global settings.
MSVC’s strategy is therefore to treat <code>inline</code> (and methods defined in-class) as suggestions, and <code>__forceinline</code> as a stronger suggestion, but ultimately to rely on its internal heuristics and the <code>/Ob</code> setting. By default, in a release build (<code>/O2</code>, which implies <code>/Ob2</code>), MSVC will inline many small functions automatically. It will produce warnings if a <code>__forceinline</code> function cannot be inlined (so the developer knows the hint was not honored). The heuristics consider factors like the function’s size in IL instructions, the complexity and nesting of inlined code, etc., similar to other compilers. As an example, MSVC might inline a small getter or simple math function even if not marked inline, but it might refuse to inline an <code>inline</code>-marked function that contains a large loop or heavy logic.
Additionally, MSVC supports '''Link-Time Code Generation (LTCG)''', enabled with <code>/GL</code> (compile for whole-program optimization) and <code>/LTCG</code> (link with whole-program optimization). When using LTCG, MSVC’s linker can inline functions across module boundaries (object files) because it has access to the entire program’s intermediate code. This is analogous to GCC/Clang’s LTO. In fact, MSVC will perform cross-module inlining under LTCG even for functions not marked inline, if profitable <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. This allows inlining of functions from libraries or other translation units that would not be visible to the compiler otherwise.
'''Summary (MSVC):''' The <code>inline</code> and <code>__inline</code> keywords are hints, and C++ member functions defined in-class are considered as well, but the compiler decides based on the <code>/Ob</code> level and its own analysis <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. Use <code>__forceinline</code> to push the compiler harder to inline a function (with the understanding it’s still not absolute). The <code>/Ob2</code> setting (on by default in <code>/O2</code> builds) allows automatic inlining of suitable functions, while <code>/Ob1</code> requires explicit inline, and <code>/Ob0</code> turns off inlining. Whole-program optimization (<code>/GL</code> with <code>/LTCG</code>) can inline across compilation units at link time.
== Code Examples ==
Below are simple C++ examples illustrating inline transformation in code for different compilers. In each case, the goal is to show how to declare a function so that the compiler will inline it, and how the call is then replaced by the function’s body.
<syntaxhighlight lang="cpp">// Example 1: Using inline and always_inline in GCC/Clang
#include <cstdio>

// Hint to compiler: inline, and force inline even if not optimizing
inline __attribute__((always_inline)) int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 5, y = 7;
    // The call to add() will be expanded inline (no actual function call at runtime)
    int result = add(x, y);
    std::printf("result = %d\n", result);
}</syntaxhighlight>
In the above code, <code>add</code> is declared with both <code>inline</code> and the GCC/Clang-specific <code>__attribute__((always_inline))</code>. Under GCC or Clang, this ensures that whenever <code>add(x,y)</code> is used, the compiler will substitute the <code>return a + b;</code> logic directly at the call site. For instance, at optimization level <code>-O2</code> or higher, the generated machine code for the call in <code>main</code> will just compute <code>x + y</code> and store it in <code>result</code>, with no function call overhead. (If you compile this with <code>gcc -O2 -S</code> to assembly, you would see that it contains no call instruction for <code>add</code> – the code is inlined.) The <code>always_inline</code> attribute here forces inlining even without <code>-O2</code>. The attribute is normally paired with the <code>inline</code> keyword, as here: GCC can warn that an <code>always_inline</code> function might not be inlinable when <code>inline</code> is missing, and a function defined in a header needs <code>inline</code> anyway to avoid multiple-definition errors at link time. In practice, you would typically use <code>always_inline</code> for functions in headers or those that you ''must'' inline for performance or correctness reasons.
<syntaxhighlight lang="cpp">// Example 2: Using __forceinline in MSVC
#include <cstdio>

__forceinline int multiply(int a, int b) {
    return a * b;
}

int main() {
    int p = 3, q = 4;
    // The call to multiply() will be inlined by MSVC in an optimized build (/O2)
    int prod = multiply(p, q);
    std::printf("prod = %d\n", prod);
}</syntaxhighlight>
In this MSVC example, the function <code>multiply</code> is declared with <code>__forceinline</code>. In an optimized compilation (with <code>/O2</code>, which implies <code>/Ob2</code>), the MSVC compiler will inline the body of <code>multiply</code> at the call site in <code>main</code>. That means the compiled code for <code>main</code> will effectively just compute <code>3 * 4</code> and print the result, with no actual call/return for <code>multiply</code>. If compiled in a debug configuration (<code>/Od</code>, which implies <code>/Ob0</code>), the compiler would ignore the <code>__forceinline</code> and generate a regular function call (since inline expansion is globally disabled in that case). This example demonstrates the syntax for forcing inline in MSVC. When examining the assembly output in an optimized build, you would see no call to <code>multiply</code>; instead, the instructions for the multiplication are directly inside <code>main</code>.
These examples are simplistic, but they show the syntax and effect of inline expansion. In real-world usage, you might inline small math functions, accessor functions, or other performance-critical routines. All major compilers will remove the function-call overhead and embed the logic directly, as shown above, when inlining is applied.
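Since the two examples above use different compiler-specific spellings, code that must build with several compilers often hides them behind a macro. The sketch below is one possible way to do that, not a standard facility: the macro name <code>FORCE_INLINE</code> and the functions <code>add_saturating_u8</code> and <code>blend</code> are hypothetical, while <code>_MSC_VER</code>, <code>__GNUC__</code> and <code>__clang__</code> are the compilers’ own predefined macros.

```cpp
// Hypothetical macro name; pick the strongest inlining request the
// current compiler understands.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: plain hint only
#endif

// Small helper of the kind that is commonly force-inlined:
// adds two values and caps the result at 255.
FORCE_INLINE int add_saturating_u8(int a, int b) {
    int sum = a + b;
    return sum > 255 ? 255 : sum;
}

// Ordinary caller; the helper's body is expected to be expanded here,
// subject to the per-compiler caveats described above.
int blend(int a, int b) {
    return add_saturating_u8(a, b);
}
```

The caveats from the compiler sections still apply behind the macro: for example, MSVC ignores the request under <code>/Ob0</code>.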
== Challenges and Limitations ==
Deciding ''when'' to inline a function is a complex problem, and compilers use sophisticated heuristics to make this decision. Inlining provides a trade-off between speed and size, and the “right” amount of inlining can depend on the target CPU, the overall program structure, and runtime behavior of the code. Some of the challenges and considerations include:
* '''Predicting performance impact is non-trivial:''' While removing a function call generally improves execution speed, the net effect on a large program is not always positive. Inlining can ''increase'' or ''decrease'' performance depending on many factors <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. For example, inlining might speed up one part of the code but cause another part to slow down due to cache misses. The compiler has to predict whether inlining a particular function at a particular call site will be beneficial overall, and that prediction cannot be made with perfect accuracy. ''No compiler can always make the optimal inlining decision'', because it lacks full knowledge of runtime execution patterns and hardware microarchitectural effects <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. The instruction cache behavior is especially critical: a program that fit in cache before might overflow it after inlining one too many functions, causing performance to drop. These complex interactions mean that inlining decisions are essentially heuristic guesses aimed at a balance.
* '''Compiler heuristics:''' Modern compilers treat the inlining decision as an optimization problem. They often set a '''“budget” for code growth''' and try to inline the most beneficial calls without exceeding that budget <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. This is sometimes modeled as a knapsack problem: choose the set of function calls to inline that maximizes the estimated performance gain for a given allowable increase in code size <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. The heuristics use metrics such as the size of the function (measured in instructions of the compiler's intermediate representation), the number of call sites, and the estimated frequency of each call. For instance, a call inside a loop that runs thousands of times is more profitable to inline than a call in a one-off initialization function. Compilers also consider whether inlining will enable ''subsequent optimizations'': if inlining exposes a constant or a branch that simplifies the code, the call site is given more weight. These factors are combined into a cost/benefit analysis for each call site. If the estimated benefit (e.g., saved cycles) outweighs the cost (e.g., added instructions and bytes of code), the call is inlined; otherwise it is left as a regular call <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. Different compilers (and even different versions of the same compiler) use different formulas and thresholds. For example, GCC and Clang assign a “cost” to the function based on its IR instruction count and adjust it if the function is marked <code>inline</code> (which acts as a hint in favor of inlining) or if the call lies on a hot path. MSVC similarly has internal thresholds and inlines more aggressively at <code>/Ob2</code> than at <code>/Ob1</code>. These heuristics are continually refined to produce good results across typical programs.
* '''Profile-guided inlining:''' One way to improve inlining decisions is ''profile-guided optimization (PGO)''. PGO involves compiling the program, running it on sample workloads to gather the actual execution frequencies of functions and branches, and then feeding that profile data into a second compilation. With PGO, the compiler knows which functions are actually hot (called frequently in practice) and which call sites are executed often. This information can greatly improve the inlining heuristics: for example, the compiler might inline a function it knows is called millions of times per second, but not inline a rarely used function of similar size. Using PGO, compilers can be bolder about inlining hot paths while avoiding code bloat on cold paths. That said, the gains from PGO-based inlining, while real, are often in the single-digit percentages of performance <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. PGO helps the compiler make better-informed decisions, but it does not eliminate the underlying trade-offs. In some cases PGO can even cause slight regressions if the profile data misleads the heuristics (e.g., if production usage differs from the training run). Still, PGO is a valuable tool for gaining extra performance by tuning inlining and other optimizations to actual usage.
* '''Limitations and overrides:''' There are practical limits to inlining. In certain scenarios compilers will not inline a function, or cannot do so completely. A recursive function usually cannot be fully inlined (that would lead to unbounded code expansion), although some compilers will expand the recursion a fixed number of levels. If a function’s address is taken (meaning a pointer to the function is used), most compilers have to generate an out-of-line function body for it, and they might not inline every call either, because the function now needs to exist independently <ref>Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. Virtual function calls in C++ cannot be inlined unless the compiler can deduce the exact target (e.g., the object’s dynamic type is known, so the call can be devirtualized); inlining across a polymorphic call therefore often requires whole-program analysis. Additionally, as mentioned earlier, compilers impose limits to avoid ''pathological code expansion'': GCC, for instance, has parameters such as '''<code>inline-unit-growth</code>''', which caps the overall growth of a compilation unit caused by inlining, and '''<code>max-inline-insns-single</code>''', which caps the size of a function considered for inlining <ref>Optimize Options (Using the GNU Compiler Collection (GCC)) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</ref>. These ensure that even under <code>-O3</code>, the compiler does not inline everything blindly and stops once the code would grow too large.
* '''Link-time optimization (LTO):''' Traditional compilation limits inlining to a single source file (translation unit), because the compiler sees only one .c/.cpp file at a time. '''Link-time optimization''' lifts this restriction by allowing inlining (and other optimizations) to occur across translation-unit boundaries at link time. With LTO enabled (for example, <code>gcc -flto</code> or MSVC’s <code>/LTCG</code>), the compiler effectively sees the entire program or library, so it can inline functions from one module into another <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. This means that even if you did not mark a function <code>inline</code> or put it in a header, LTO might inline it if that is beneficial. For instance, a small utility function defined in one source file and called in another can be inlined during LTO, whereas without LTO that call would remain a regular function call (because the compiler would not have seen the function’s body while compiling the caller). LTO thus widens the scope of inlining and can yield significant performance improvements for codebases split across many files. One common use of LTO-driven inlining is library code: the compiler might inline library functions if LTO is enabled and the library’s code is available to the optimizer. The downside is that LTO can lengthen compile or link times and increase memory usage during the build, due to the larger optimization scope. The same caution also applies: even with whole-program visibility, the compiler still uses heuristics to decide what to inline <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. Having more opportunities to inline does not mean the compiler will inline everything; it still must choose carefully to avoid excessive code bloat or slowdowns from cache misses.
* '''Alternate strategies:''' In cases where inlining is not beneficial or possible, other optimizations may be preferable. For example, compilers might use ''outline'' strategies (the opposite of inlining) to reduce code size, i.e., they might decide ''not'' to inline in order to keep code small (especially at <code>-Os</code> or in constrained environments). Another strategy is '''partial inlining''', where the compiler extracts and inlines only a portion of a function. GCC implements something along these lines (sometimes called “IPA-split” or partial inlining): it tries to inline the hot parts of a function into callers and keep the cold parts out of line, as a compromise. This is an advanced, largely automatic optimization, but it shows that inlining does not have to be all-or-nothing.
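The budget-based cost/benefit selection described in the list above can be illustrated with a minimal sketch. It is not the algorithm of any particular compiler: the <code>CallSite</code> structure, its fields, and the greedy benefit-per-size ordering are assumptions chosen for illustration, and real inliners use considerably richer cost models.

```c
#include <stdlib.h>

/* Hypothetical description of one call site; all fields are illustrative. */
typedef struct {
    const char *name;  /* label for the call site */
    int size_cost;     /* estimated code growth if inlined (IR instructions) */
    long call_count;   /* estimated or profiled execution frequency */
    int cycles_saved;  /* estimated cycles saved per call when inlined */
    int inlined;       /* output: set to 1 if the call site is selected */
} CallSite;

/* Estimated total benefit of inlining one call site. */
static long benefit(const CallSite *c) {
    return c->call_count * c->cycles_saved;
}

/* qsort comparator: highest benefit per unit of code growth first.
   Ratios are compared by cross-multiplication to avoid division. */
static int by_ratio_desc(const void *a, const void *b) {
    const CallSite *x = a, *y = b;
    long lhs = benefit(x) * y->size_cost;
    long rhs = benefit(y) * x->size_cost;
    return (lhs < rhs) - (lhs > rhs);
}

/* Greedy knapsack approximation: walk the call sites in order of
   benefit/size ratio and inline each one that still fits the budget.
   Reorders the array and returns the code growth actually spent. */
int select_inlines(CallSite *sites, size_t n, int budget) {
    int spent = 0;
    qsort(sites, n, sizeof sites[0], by_ratio_desc);
    for (size_t i = 0; i < n; i++) {
        if (spent + sites[i].size_cost <= budget) {
            sites[i].inlined = 1;
            spent += sites[i].size_cost;
        }
    }
    return spent;
}
```

With a budget of 15 instructions, a 10-instruction callee invoked 10,000 times from a loop would be selected ahead of an equally small callee invoked once from initialization code, and a 100-instruction callee would be rejected regardless of how hot it is. A greedy ordering is only an approximation of the knapsack optimum, which is one reason such decisions remain heuristic.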
In summary, inline transformation is a powerful optimization, but it must be applied with care. Compilers provide keywords and options to guide inlining, but they also employ their own cost models to decide when inlining makes sense. As a developer, a good practice is to trust the compiler for general decisions and to force inlining only where you have clear evidence (from profiling or knowledge of the code) that the compiler’s heuristic is missing an opportunity. Tools such as optimization reports and profile-guided optimization can assist in making those decisions. Ultimately, inline transformation is one of many tools in the compiler’s toolbox, and its effectiveness varies: some code speeds up dramatically with inlining, while in other cases excessive inlining degrades performance <ref>Inline expansion - Wikipedia https://en.wikipedia.org/wiki/Inline_expansion</ref>. The key is balancing those effects, a task that modern compilers handle through continual refinement of their inlining algorithms.
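Where profiling does justify overriding the compiler, the override is typically expressed with compiler-specific attributes. The following sketch wraps the GCC/Clang attributes and the MSVC keywords behind two macros; the macro names <code>FORCE_INLINE</code> and <code>NO_INLINE</code> and the example functions are hypothetical and chosen only for illustration.

```c
/* Hypothetical portability macros; FORCE_INLINE and NO_INLINE are
   illustrative names, not part of any standard. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE static inline  /* fallback: plain inline hint */
#  define NO_INLINE
#endif

/* Small, hot helper: a candidate for forced inlining once profiling
   shows that the call overhead matters. */
FORCE_INLINE int clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Rarely executed path: kept out of line so that it does not
   enlarge the code of its callers. */
NO_INLINE int count_out_of_range(const int *data, int n, int lo, int hi) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] < lo || data[i] > hi) count++;
    }
    return count;
}

/* Hot loop: clamp() is expanded in place on compilers that honor
   the forced-inline request. */
int sum_clamped(const int *data, int n, int lo, int hi) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += clamp(data[i], lo, hi);
    }
    return sum;
}
```

Such overrides bypass the compiler's cost model, so they are best limited to call sites where measurements show a benefit.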
==Procedure Inline Transformation in emmtrix Studio==
In emmtrix Studio, procedure inlining can be applied using #pragma directives or via the GUI. The transformation replaces function calls with the body of the called function.
===Typical Usage and Benefits===
The transformation is used to reduce the overhead caused by function calls and to increase the potential for optimization. In particular, inlined code is easier to parallelize and makes it more likely that other transformations can be applied.
===Example===
{| class="wikitable"
</syntaxhighlight>
|}
[[Category:Code Transformation]]