|
|
| Line 29: |
Line 29: |
|
| |
|
| Modern compilers provide various ways to control or hint at inline transformation, including language keywords, special attributes, and optimization flags. Notably, '''the decision to inline is ultimately made by the compiler’s optimizer''', which considers factors such as function size, complexity, and the chosen optimization level. Depending on the optimization settings (e.g., -O0 vs. -O3), inlining behavior can vary significantly. Below is how inline expansion is handled in a few popular C/C++ compilers: | | Modern compilers provide various ways to control or hint at inline transformation, including language keywords, special attributes, and optimization flags. Notably, '''the decision to inline is ultimately made by the compiler’s optimizer''', which considers factors such as function size, complexity, and the chosen optimization level. Depending on the optimization settings (e.g., -O0 vs. -O3), inlining behavior can vary significantly. Below is how inline expansion is handled in a few popular C/C++ compilers: |
| | |
| | === Heuristics for Inlining === |
| | Compilers use heuristics to decide when inlining is beneficial. These heuristics balance performance improvements against potential code bloat. Factors considered include: |
| | |
| | * '''Function size:''' Small functions are more likely to be inlined, while large functions may be rejected due to code growth concerns. |
| | * '''Call frequency:''' Frequently called functions are strong candidates for inlining to reduce call overhead. |
| | * '''Control flow complexity:''' Functions with loops, branches, or recursion are less likely to be inlined unless explicitly forced. |
| | * '''Compiler optimization level:''' Higher optimization levels (e.g., <code>-O3</code>) increase the aggressiveness of inlining, while <code>-O0</code> typically disables it. |
| | * '''Interprocedural analysis:''' Some compilers perform whole-program analysis (e.g., GCC with LTO, MSVC with LTCG, Intel ICC/ICX) to determine profitable inlining across translation units. |
| | |
| | Each compiler has its own implementation of these heuristics, which evolve over time to balance performance and maintainability. |
|
| |
|
| === Always Inline Attribute === | | === Always Inline Attribute === |
| Line 34: |
Line 45: |
| Compilers offer mechanisms to enforce inlining of functions even when regular inlining is disabled (e.g. by <code>-O0</code>). | | Compilers offer mechanisms to enforce inlining of functions even when regular inlining is disabled (e.g. by <code>-O0</code>). |
|
| |
|
| * '''GCC & Clang/LLVM''': Both compilers support <code>__attribute__((always_inline))</code>, ensuring that a function is inlined even when optimizations are disabled. If inlining is not possible (e.g., due to recursion or taking the function’s address), an error or warning is emitted. | | * '''GCC & Clang/LLVM''': Both compilers support <code>__attribute__((always_inline))</code>, ensuring that a function is inlined even when optimizations are disabled. If inlining is not possible (e.g., due to recursion or taking the function’s address), an error or warning is emitted.<ref name="gcc_func">Function Attributes - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html</ref> |
| * '''MSVC''': Offers the __forceinline keyword, which strongly suggests inlining but does not guarantee it. If inlining is not feasible, the function remains out-of-line, and the compiler may issue a warning. | | * '''MSVC''': Offers the <code>__forceinline</code> keyword, which strongly suggests inlining but does not guarantee it. If inlining is not feasible, the function remains out-of-line, and the compiler may issue a warning. |
|
| |
|
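The two spellings above are often hidden behind a single macro so that one source file builds with all three compilers. The following is a minimal sketch, not a canonical implementation: <code>FORCE_INLINE</code> and <code>rotl32</code> are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// FORCE_INLINE is a hypothetical project-local macro, not a compiler built-in.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // unknown compiler: fall back to the plain hint
#endif

// Small hot-path helper: rotate a 32-bit value left by n bits (n is taken modulo 32).
FORCE_INLINE std::uint32_t rotl32(std::uint32_t value, unsigned n) {
    n &= 31u;
    // The second shift count is masked so that n == 0 never shifts by 32.
    return (value << n) | (value >> ((32u - n) & 31u));
}
```

Even with the macro, MSVC may still leave the function out of line, so code should not rely on the call being removed.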
| === No Inline === | | === No Inline === |
| Line 43: |
Line 54: |
| * '''MSVC''': Provides the <code>__declspec(noinline)</code> attribute to prevent function inlining. The compiler option <code>/Ob0</code> disables all inlining, even for functions marked as <code>inline</code>. | | * '''MSVC''': Provides the <code>__declspec(noinline)</code> attribute to prevent function inlining. The compiler option <code>/Ob0</code> disables all inlining, even for functions marked as <code>inline</code>. |
|
| |
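A common use of the no-inline mechanisms is to keep a rarely executed error path out of line so that its callers stay small. The sketch below assumes GCC and Clang's <code>__attribute__((noinline))</code> alongside the MSVC spelling mentioned above; <code>NO_INLINE</code>, <code>report_failure</code> and <code>checked_divide</code> are hypothetical names.

```cpp
// NO_INLINE is a hypothetical project-local macro, not a compiler built-in.
#if defined(_MSC_VER)
#  define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define NO_INLINE __attribute__((noinline))
#else
#  define NO_INLINE  // unknown compiler: no portable way to forbid inlining
#endif

// Cold error path: kept as a real call so it does not bloat every caller.
NO_INLINE int report_failure(int code) {
    return -code;
}

// Returns 0 and stores the quotient in *out, or a negative error code when b is zero.
int checked_divide(int a, int b, int* out) {
    if (b == 0) {
        return report_failure(1);
    }
    *out = a / b;
    return 0;
}
```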
|
| === GCC (GNU Compiler Collection) ===
| | === Clang/LLVM Inline Implementation === |
| | |
| In GCC, the <code>inline</code> keyword in C and C++ is a hint that the function’s code should be integrated into callers to avoid call overhead <ref name="gcc"/>. For example, writing <code>inline int f(int x) { return x*2; }</code> suggests to the compiler that calls to <code>f</code> can be replaced with <code>x*2</code> directly. In practice, GCC will consider inlining such functions when optimization is enabled, but '''will not inline at all under <code>-O0</code> (no optimizations)''' unless explicitly forced <ref name="gcc"/>. At higher optimization levels (<code>-O2</code>, <code>-O3</code>), GCC’s optimizer will inline functions it deems suitable.
| |
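The effect of the optimization level can be observed on a small test file. The sketch below is illustrative only: <code>twice</code> and <code>sum_of_doubles</code> are hypothetical names, and the exact assembly depends on the GCC version and target.

```cpp
// The inline keyword is only a hint; whether the call disappears
// depends on the optimization level and the compiler's heuristics.
inline int twice(int x) { return x * 2; }

// Sums 2 * values[i] over the first count elements.
int sum_of_doubles(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += twice(values[i]);  // candidate call site for inlining
    }
    return total;
}
```

Comparing the output of <code>g++ -O0 -S</code> with that of <code>g++ -O2 -S</code> on such a file is one way to check whether the call to <code>twice</code> remains in the loop.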
| | |
| By default, GCC applies some inlining at <code>-O2</code> for functions marked <code>inline</code> (and certain trivial functions), and becomes more aggressive at <code>-O3</code>. The flag <code>-finline-functions</code> (enabled at <code>-O3</code>, and in recent GCC releases also at <code>-O2</code> and <code>-Os</code>) tells GCC to consider ''all'' functions for inlining, even those not marked with the <code>inline</code> keyword <ref name="gcc"/>. This means that at <code>-O3</code> the compiler will use its heuristics to inline more liberally across the codebase, within limits designed to control code bloat. (These limits can be tweaked via internal parameters like the maximum permitted inline instruction growth <ref name="gcc_opt">Optimize Options (Using the GNU Compiler Collection (GCC)) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</ref>, but such tuning is rarely needed.) The result is that <code>-O3</code> can inline many small or medium-sized functions automatically, while <code>-O2</code> remains more conservative (focusing mostly on functions that are explicitly declared inline or very small).
| |
| | |
| GCC also provides the '''<code>always_inline</code> attribute''' to force inlining. A function declared with <code>__attribute__((always_inline))</code> (and usually also marked <code>inline</code>) will be inlined regardless of the compiler’s normal heuristics '''and even if optimizations are off''' <ref name="gcc_func">Function Attributes - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html</ref>. In other words, this attribute directs GCC to bypass any cost-benefit analysis for that function. According to GCC’s documentation and source, <code>always_inline</code> causes the compiler to ignore even commands like <code>-fno-inline</code> and to inline the function without regard to size limits (it will even inline functions using constructs like alloca, which ordinary inlining might not allow) <ref>c - what “inline __attribute__((always_inline))” means in the function? - Stack Overflow https://stackoverflow.com/questions/22767523/what-inline-attribute-always-inline-means-in-the-function</ref>. This attribute is useful for cases where the programmer is certain that inlining is critical (for example, a performance-sensitive function that must not have call overhead, or functions that must be inlined for correctness in some low-level code). However, misuse of <code>always_inline</code> can lead to the aforementioned problems of code bloat and cache issues if applied indiscriminately. (If for some reason the compiler cannot inline a function marked <code>always_inline</code> – e.g., a recursive call or other unavoidable situation – GCC will emit an error or warning, since it '''must''' honor the attribute’s contract.)
| |
| | |
| It’s worth noting that in C++ programs, GCC automatically treats any function defined ''inside a class definition'' as inline (this is mandated by the C++ standard). GCC will attempt to inline such functions even without the <code>inline</code> keyword <ref name="gcc"/>. Also, if a function is declared <code>inline</code>, GCC still emits a standalone function definition for it ''unless'' it can prove that every call was inlined and no external reference is needed. This means an inline function might not actually be inlined everywhere, but the one-definition rule is respected by emitting a single out-of-line copy where one is needed. (Conversely, the <code>-fkeep-inline-functions</code> flag forces GCC to emit such functions even when every call has been inlined <ref name="gcc"/>.)
| |
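The in-class rule can be illustrated with a small class. This is a minimal sketch with a hypothetical <code>Counter</code> type: the member defined in the class body is implicitly inline, while the one defined outside it needs the keyword if the definition lives in a header.

```cpp
class Counter {
public:
    // Defined inside the class body: implicitly an inline function.
    void increment() { ++count_; }

    // Declared here, defined below outside the class body.
    int value() const;

private:
    int count_ = 0;
};

// Defined outside the class: the explicit inline keyword is needed
// when this definition is placed in a header included by several files.
inline int Counter::value() const { return count_; }
```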
| | |
| === Clang/LLVM === | |
| | |
| Clang (the C/C++ frontend to LLVM) handles inlining in a manner very similar to GCC. It supports the C++ <code>inline</code> keyword in the same way – as a hint with linkage implications – and it implements GCC-style attributes like <code>always_inline</code>. In practice, Clang’s optimizer will inline functions under optimization levels based on LLVM’s inlining heuristics. Like GCC, at <code>-O0</code> Clang does not perform any inlining (unless forced via always_inline). At <code>-O1</code> and above, it will inline certain calls that it decides are profitable. Clang also recognizes the <code>-finline-functions</code> flag (and enables it at <code>-O3</code>), which allows more aggressive inlining of functions even if they are not marked inline <ref name="clang_cli">Clang command line argument reference https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. It additionally supports an option <code>-finline-hint-functions</code> which restricts automatic inlining to only those functions that are declared <code>inline</code> (this is analogous to MSVC’s strategy under <code>/Ob1</code>) <ref name="clang_cli">Clang command line argument reference https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. In practice, Clang’s default at <code>-O2</code> is to inline functions it thinks are worthwhile (whether or not they were marked inline), and at <code>-O3</code> it increases the aggressiveness similar to GCC.
| |
| | |
| For forcing inline, Clang honors <code>__attribute__((always_inline))</code> on functions just like GCC. If a function is marked <code>always_inline</code>, Clang will emit it inline whenever possible and will issue an error if it cannot (to ensure the function doesn’t end up out-of-line). There is no distinct Clang-specific keyword for this, but when Microsoft extensions are enabled (as in MSVC compatibility mode) Clang also accepts <code>__forceinline</code>. Under the hood, both GCC and Clang attach an internal “always inline” property to such functions in the intermediate representation, which the optimizer’s inline pass will obey strictly. As with GCC, this power should be used judiciously: overusing forced inlining can result in larger code with little benefit, as with any other compiler.
| |
|
| |
|
| One difference to mention is that Clang’s diagnostics and reports can help understand inlining decisions. For example, Clang has flags like <code>-Rpass=inline</code> and <code>-Rpass-missed=inline</code> which, at compile time, can report which functions were inlined or not inlined and why. This can be useful to tune code for inlining with Clang. The heuristics themselves (function size thresholds, etc.) are continuously refined in LLVM’s development, but generally align with the goal of balancing performance gain against code growth. | | One difference to mention is that Clang’s diagnostics and reports can help understand inlining decisions. For example, Clang has flags like <code>-Rpass=inline</code> and <code>-Rpass-missed=inline</code> which, at compile time, can report which functions were inlined or not inlined and why. This can be useful to tune code for inlining with Clang. The heuristics themselves (function size thresholds, etc.) are continuously refined in LLVM’s development, but generally align with the goal of balancing performance gain against code growth. |
|
| |
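The remark flags can be tried on a file such as the following. This is a sketch under stated assumptions: <code>remarks_demo.cpp</code>, <code>square</code>, <code>cube</code> and <code>poly</code> are hypothetical names, and the wording and number of remarks vary between Clang versions.

```cpp
// Build with, for example:
//   clang++ -O2 -Rpass=inline -Rpass-missed=inline -c remarks_demo.cpp
// (remarks_demo.cpp is a hypothetical file name for this sketch.)

// Small and simple: a likely candidate for a -Rpass=inline remark.
static int square(int x) { return x * x; }

// Marked noinline, so the inliner has to skip it; such a call site
// may be reported under -Rpass-missed=inline.
__attribute__((noinline)) static int cube(int x) { return x * x * x; }

// Caller containing one call site of each kind.
int poly(int x) { return square(x) + cube(x); }
```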
| === MSVC (Microsoft Visual C++) ===
| |
|
| |
| MSVC’s approach to inlining in C++ relies on both language keywords and compiler settings. In MSVC, the <code>inline</code> keyword (or its synonym <code>__inline</code>) is also a hint to suggest that a function be inlined. However, as with other compilers, this is not a command – MSVC will perform inline expansion only if it judges the optimization to be worthwhile <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. The MSVC compiler evaluates the size and complexity of the function, and certain usage patterns, before deciding to inline. It will not inline functions in some cases (for example, if a function’s address is taken or if the function is too large or has varargs, it won’t be inlined). By default, MSVC’s optimization settings control how much inlining is done:
| |
|
| |
| * '''<code>/Ob0</code>''' – ''No inlining''. This is the default in debug builds (<code>/Od</code>). The compiler does not inline any function, regardless of the inline keyword. This setting is used to make debugging easier and ensure the binary closely follows the written code structure.
| |
|
| |
| * '''<code>/Ob1</code>''' – ''Inline only if marked inline''. With this setting, the compiler will expand functions inline '''only''' if they are explicitly declared <code>inline</code> (or <code>__inline</code> or <code>__forceinline</code>), or if they are C++ member functions defined inside class definitions <ref name="ms_ob">/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. In other words, it respects the inline hints but does not consider other functions for inlining. This is a moderate level used when some inlining is desired but not aggressive auto-inlining.
| |
|
| |
| * '''<code>/Ob2</code>''' – ''Auto-inlining''. This is the default in release builds (<code>/O1</code> or <code>/O2</code>) and it allows MSVC to inline any function it wants to, at its discretion <ref name="ms_ob">/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. The compiler will inline functions marked inline or <code>__forceinline</code>, and '''may also inline other functions''' even if they aren’t marked, whenever its heuristics indicate there is a benefit and it’s safe to do so <ref name="ms_ob">/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. Essentially, <code>/Ob2</code> gives the optimizer freedom to do inlining beyond the programmer’s annotations (similar to GCC’s <code>-finline-functions</code>). Most MSVC optimized builds use this level by default.
| |
|
| |
| * '''<code>/Ob3</code>''' – ''Aggressive inlining''. Introduced in Visual Studio 2019, <code>/Ob3</code> goes beyond <code>/Ob2</code> in aggressiveness <ref name="ms_ob">/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. It uses the same inlining criteria but increases the compiler’s willingness to inline, so it might inline larger functions or more call sites than <code>/Ob2</code> would. (It’s not available directly in the IDE project settings; it must be set manually, and it’s considered experimental.)
| |
|
| |
| For MSVC-specific inline control, the '''<code>__forceinline</code>''' keyword is provided. This keyword attempts to '''override the compiler’s cost analysis''' and force the function to be inlined wherever possible <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. A function declared <code>__forceinline</code> in MSVC is treated with a much stronger inlining preference than a normal <code>inline</code>. In effect, it tells the compiler “I, the programmer, am sure that inlining this is critical, so do it even if your heuristics disagree.” MSVC will make a very strong effort to inline such a function. However – importantly – MSVC still might not inline a <code>__forceinline</code> function in certain situations. The documentation explicitly states that there is ''no guarantee'' a function will be inlined, even with <code>__forceinline</code> <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. For example, if you compile with <code>/Ob0</code> (inlining disabled), even <code>__forceinline</code> functions won’t be inlined <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. Similarly, if a function cannot physically be inlined (e.g., it’s recursive without a clear depth limit, or its address is used somewhere, or it has incompatible exception handling settings <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>), MSVC will emit it out-of-line despite the <code>__forceinline</code>. 
What <code>__forceinline</code> really does is ''lower the threshold'' for inlining decisions dramatically and bypass some checks, but the compiler can still balk if inlining would break the program or if it’s disallowed by global settings.
| |
|
| |
| MSVC’s strategy is therefore to treat <code>inline</code> (and methods defined in-class) as suggestions, and <code>__forceinline</code> as a stronger suggestion, but ultimately to rely on its internal heuristics and the <code>/Ob</code> setting. By default, in a release build (<code>/O2</code>, which implies <code>/Ob2</code>), MSVC will inline many small functions automatically. It will produce warnings if a <code>__forceinline</code> function cannot be inlined (so the developer knows the hint was not honored). The heuristics consider factors like the function’s size in IL instructions and the complexity and nesting of inlined code, similar to other compilers. As an example, MSVC might inline a small getter or simple math function even if not marked inline, but it might refuse to inline an <code>inline</code>-marked function that contains a large loop or heavy logic.
| |
|
| |
| Additionally, MSVC supports '''Link-Time Code Generation (LTCG)''', enabled with <code>/GL</code> (compile for whole-program optimization) and <code>/LTCG</code> (link with whole-program optimization). When using LTCG, MSVC’s linker can inline functions across module boundaries (object files) because it has access to the entire program’s intermediate code. This is analogous to GCC/Clang’s LTO. In fact, MSVC will perform cross-module inlining under LTCG even for functions not marked inline, if profitable <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. This allows inlining of functions from libraries or other translation units that would not be visible to the compiler otherwise.
| |
|
| |
|
| == Challenges and Limitations == | | == Challenges and Limitations == |