By default, GCC applies some inlining at <code>-O2</code> for functions marked <code>inline</code> (and certain trivial functions), and becomes more aggressive at <code>-O3</code>. The flag <code>-finline-functions</code> (enabled as part of <code>-O3</code>, and in GCC 10 and later also at <code>-O2</code> with more conservative limits) tells GCC to consider ''any'' “simple enough” function for inlining, even those not marked with the <code>inline</code> keyword <ref name="gcc"/>. At <code>-O3</code> the compiler therefore uses its heuristics to inline more liberally across the codebase, within limits designed to control code bloat. (These limits can be tweaked via internal parameters such as the maximum permitted inline instruction growth <ref name="gcc_opt">Optimize Options (Using the GNU Compiler Collection (GCC)) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</ref>, but such tuning is rarely needed.)
The result is that <code>-O3</code> can inline many small or medium-sized functions automatically, while <code>-O2</code> is more conservative (focusing mostly on functions that are explicitly declared inline or very small).
GCC also provides the '''<code>always_inline</code> attribute''' to force inlining. A function declared with <code>__attribute__((always_inline))</code> (and usually also marked <code>inline</code>) will be inlined regardless of the compiler’s normal heuristics '''and even if optimizations are off''' <ref name="gcc_func">Function Attributes - Using the GNU Compiler Collection (GCC) https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html</ref>. In other words, this attribute directs GCC to bypass any cost-benefit analysis for that function. According to GCC’s documentation and source, <code>always_inline</code> causes the compiler to ignore even options like <code>-fno-inline</code> and to inline the function without regard to size limits (it will even inline functions using constructs like <code>alloca</code>, which ordinary inlining might not allow) <ref>c - what “inline __attribute__((always_inline))” means in the function? - Stack Overflow https://stackoverflow.com/questions/22767523/what-inline-attribute-always-inline-means-in-the-function</ref>. This attribute is useful when the programmer is certain that inlining is critical (for example, a performance-sensitive function that must not have call overhead, or functions that must be inlined for correctness in some low-level code). However, applying <code>always_inline</code> indiscriminately can lead to the aforementioned problems of code bloat and cache issues. (If the compiler cannot inline a function marked <code>always_inline</code> – e.g., a recursive call or other unavoidable situation – GCC will emit an error or warning, since it '''must''' honor the attribute’s contract.)
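As a minimal sketch of the attribute described above, assuming GCC or Clang (the spelling below is not accepted by MSVC), a small helper can be marked so that direct calls to it are expanded in place. The function names <code>clamp_to_byte</code> and <code>brighten</code> are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// GCC/Clang only: `always_inline` asks the compiler to expand every direct
// call to this function, independent of size heuristics and of -O level.
inline __attribute__((always_inline)) int clamp_to_byte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// Caller: with the attribute in effect, the body of clamp_to_byte is expected
// to be merged into this function instead of being reached through a call.
int brighten(int pixel, int delta) {
    return clamp_to_byte(pixel + delta);
}
```

Inspecting the generated assembly (for example with <code>-S</code>) is one way to confirm that no call to the helper remains in <code>brighten</code>.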
It’s worth noting that in C++ programs, GCC automatically treats any function defined ''inside a class definition'' as inline (this is mandated by the C++ standard). GCC will attempt to inline such functions even without the <code>inline</code> keyword <ref name="gcc"/>. Also, if a function is declared <code>inline</code>, GCC still emits a standalone function definition for it ''unless'' it can prove that every call was inlined and no external reference is needed. This means an inline function might not actually be inlined everywhere, but the one-definition rule is respected by outputting one copy if needed (conversely, the <code>-fkeep-inline-functions</code> flag forces GCC to emit out-of-line copies of inline functions even when every call has been inlined <ref name="gcc"/>).
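The in-class rule can be illustrated with a short sketch. The class <code>Counter</code> is a hypothetical example, not taken from any cited source: member functions defined inside the class body are implicitly inline, while a member defined outside the class needs the keyword if its definition lives in a header.

```cpp
#include <cassert>

class Counter {
public:
    // Defined inside the class body: implicitly inline, no keyword required.
    void increment() { ++count_; }
    int value() const { return count_; }

    // Only declared here; the definition follows outside the class.
    void reset();

private:
    int count_ = 0;
};

// Defined outside the class: `inline` is written explicitly so that the
// definition may appear in a header included by several translation units.
inline void Counter::reset() { count_ = 0; }

// Small driver used to exercise the class.
int count_twice_then_reset() {
    Counter c;
    c.increment();
    c.increment();
    int before = c.value();  // 2 at this point
    c.reset();
    return before + c.value();
}
```

In both cases the keyword (explicit or implied) only makes the functions candidates; whether a given call is actually expanded remains the compiler's decision.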
=== Clang/LLVM ===
Clang (the C/C++ frontend to LLVM) handles inlining in a manner very similar to GCC. It supports the C++ <code>inline</code> keyword in the same way – as a hint with linkage implications – and it implements GCC-style attributes like <code>always_inline</code>. In practice, Clang’s optimizer will inline functions under optimization levels based on LLVM’s inlining heuristics. Like GCC, at <code>-O0</code> Clang does not perform any inlining (unless forced via <code>always_inline</code>). At <code>-O1</code> and above, it will inline certain calls that it decides are profitable. Clang also recognizes the <code>-finline-functions</code> flag (and enables it at <code>-O3</code>), which allows more aggressive inlining of functions even if they are not marked inline <ref name="clang_cli">Clang command line argument reference https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>.
It additionally supports an option <code>-finline-hint-functions</code> which restricts automatic inlining to only those functions that are declared <code>inline</code> (this is analogous to MSVC’s strategy under <code>/Ob1</code>) <ref name="clang_cli"/>. In practice, Clang’s default at <code>-O2</code> is to inline functions it thinks are worthwhile (whether or not they were marked inline), and at <code>-O3</code> it increases the aggressiveness similar to GCC.
For forcing inline, Clang honors <code>__attribute__((always_inline))</code> on functions just like GCC. If a function is marked <code>always_inline</code>, Clang will emit it inline whenever possible and may issue a diagnostic if it cannot (to ensure the function doesn’t silently end up out-of-line). There is no distinct Clang-specific keyword for this, but Clang in MSVC compatibility mode (for example, when invoked as <code>clang-cl</code>) also accepts <code>__forceinline</code>. Under the hood, both GCC and Clang attach an internal “always inline” property to such functions in the intermediate representation, which the optimizer’s inline pass will obey strictly. As with GCC, this power should be used judiciously – overusing forced inlining can result in larger code with little benefit, as with any other compiler.
One difference to mention is that Clang’s diagnostics and reports can help understand inlining decisions. For example, Clang has flags like <code>-Rpass=inline</code> and <code>-Rpass-missed=inline</code> which, at compile time, can report which functions were inlined or not inlined and why. This can be useful to tune code for inlining with Clang. The heuristics themselves (function size thresholds, etc.) are continuously refined in LLVM’s development, but generally align with the goal of balancing performance gain against code growth.
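To try the remark flags mentioned above, a small test file along the following lines may help. This is only a sketch: the function names and the file name in the comment are placeholders, and the exact wording of the remarks depends on the Clang version.

```cpp
#include <cassert>

// Small leaf function: a likely inlining candidate once optimization is on.
static int square(int x) { return x * x; }

// Explicitly excluded from inlining, so it is a natural candidate for a
// "missed" remark. The noinline attribute is understood by GCC and Clang.
__attribute__((noinline)) static int slow_path(int x) { return x - 1; }

int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += square(i);
    // Negative input takes the rarely used, non-inlined path.
    return n < 0 ? slow_path(total) : total;
}

// Possible invocation (the file name is a placeholder):
//   clang++ -O2 -Rpass=inline -Rpass-missed=inline -c remarks_demo.cpp
```

Comparing the remarks before and after a source change shows whether the change actually affected the inliner's decision.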
'''Summary (Clang):''' Clang uses the same mechanisms as GCC for inlining – the <code>inline</code> keyword, the <code>always_inline</code> attribute for forcing, and optimization-level-dependent heuristics. At <code>-O3</code> it inlines more aggressively (<code>-finline-functions</code>), while at lower levels it inlines more conservatively or only inline-marked functions <ref name="clang_cli">Clang command line argument reference https://clang.llvm.org/docs/ClangCommandLineReference.html</ref>. Its behavior is largely consistent with GCC’s in this area, given that both use similar inline expansion strategies.
=== MSVC (Microsoft Visual C++) ===
MSVC’s approach to inlining in C++ relies on both language keywords and compiler settings. In MSVC, the <code>inline</code> keyword (or its synonym <code>__inline</code>) is also a hint to suggest that a function be inlined. However, as with other compilers, this is not a command – MSVC will perform inline expansion only if it judges the optimization to be worthwhile <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. The MSVC compiler evaluates the size and complexity of the function, and certain usage patterns, before deciding to inline. It will not inline functions in some cases (for example, if a function’s address is taken or if the function is too large or has varargs, it won’t be inlined). By default, MSVC’s optimization settings control how much inlining is done:
* '''<code>/Ob0</code>''' – ''No inlining''. This is the default in debug builds (<code>/Od</code>). The compiler does not inline any function, regardless of the inline keyword. This setting is used to make debugging easier and ensure the binary closely follows the written code structure.
* '''<code>/Ob3</code>''' – ''Aggressive inlining''. Introduced in Visual Studio 2019, <code>/Ob3</code> is a setting that goes beyond <code>/Ob2</code> in aggressiveness <ref>/Ob (Inline Function Expansion) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170</ref>. It uses the same inlining criteria but increases the compiler’s willingness to inline, so it might inline larger functions or more call sites than <code>/Ob2</code> would. (It is not exposed directly in the IDE project settings and must be added manually as an additional compiler option.)
For MSVC-specific inline control, the '''<code>__forceinline</code>''' keyword is provided. This keyword attempts to '''override the compiler’s cost analysis''' and force the function to be inlined wherever possible <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. A function declared <code>__forceinline</code> in MSVC is treated with a much stronger inlining preference than a normal <code>inline</code>. In effect, it tells the compiler “I, the programmer, am sure that inlining this is critical, so do it even if your heuristics disagree.” MSVC will make a very strong effort to inline such a function. However – importantly – MSVC still might not inline a <code>__forceinline</code> function in certain situations. The documentation explicitly states that there is ''no guarantee'' a function will be inlined, even with <code>__forceinline</code> <ref name="ms_inline"/>. For example, if you compile with <code>/Ob0</code> (inlining disabled), even <code>__forceinline</code> functions won’t be inlined <ref name="ms_inline"/>. Similarly, if a function cannot physically be inlined (e.g., it’s recursive without a clear depth limit, or its address is used somewhere, or it has incompatible exception handling settings <ref name="ms_inline"/>), MSVC will emit it out-of-line despite the <code>__forceinline</code>.
What <code>__forceinline</code> really does is ''lower the threshold'' for inlining decisions dramatically and bypass some checks, but the compiler can still balk if inlining would break the program or if it’s disallowed by global settings.
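Because the forcing keyword differs between toolchains, codebases often hide it behind a macro. The following is a sketch of that common pattern, not something prescribed by the cited documentation. The macro name <code>FORCE_INLINE</code> and the functions <code>rotate_left</code> and <code>mix</code> are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pick the strongest inlining request the current compiler understands.
// _MSC_VER is tested first, so clang-cl takes the __forceinline branch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: plain hint only
#endif

// Tiny bit-manipulation helper of the kind usually worth forcing inline.
FORCE_INLINE std::uint32_t rotate_left(std::uint32_t x, std::uint32_t n) {
    n &= 31u;  // keep the shift count in range
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Caller that benefits from having the rotate expanded in place.
std::uint32_t mix(std::uint32_t a, std::uint32_t b) {
    return rotate_left(a, 5) ^ b;
}
```

On MSVC the request is still subject to the conditions described above, so building with <code>/Ob0</code> leaves <code>rotate_left</code> as an ordinary call.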
MSVC’s strategy is therefore to treat <code>inline</code> (and methods defined in-class) as suggestions, and <code>__forceinline</code> as a stronger suggestion, but ultimately to rely on its internal heuristics and the <code>/Ob</code> setting. By default, in a release build (<code>/O2</code>, which implies <code>/Ob2</code>), MSVC will inline many small functions automatically. It will produce warnings if a <code>__forceinline</code> function cannot be inlined (so the developer knows the hint was not honored). The heuristics consider factors like the function’s size in IL instructions, the complexity and nesting of inlined code, etc., similar to other compilers. As an example, MSVC might inline a small getter or simple math function even if not marked inline, but it might refuse to inline an <code>inline</code>-marked function that contains a large loop or heavy logic.
Additionally, MSVC supports '''Link-Time Code Generation (LTCG)''', enabled with <code>/GL</code> (compile for whole-program optimization) and <code>/LTCG</code> (link with whole-program optimization). When using LTCG, MSVC’s linker can inline functions across module boundaries (object files) because it has access to the entire program’s intermediate code. This is analogous to GCC/Clang’s LTO. In fact, MSVC will perform cross-module inlining under LTCG even for functions not marked inline, if profitable <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. This allows inlining of functions from libraries or other translation units that would not be visible to the compiler otherwise.
'''Summary (MSVC):''' The <code>inline</code> keyword and <code>__inline</code> are hints, and C++ inline methods are considered, but the compiler decides based on <code>/Ob</code> level and its analysis <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. Use <code>__forceinline</code> to push the compiler harder to inline a function (with the understanding it’s still not absolute) <ref name="ms_inline"/>. The <code>/Ob2</code> setting (on by default in <code>/O2</code> builds) allows automatic inlining of suitable functions, while <code>/Ob1</code> requires explicit inline, and <code>/Ob0</code> turns off inlining. Whole-program optimization (<code>/LTCG</code>) can inline across compilation units at link time.
* '''Compiler heuristics:''' Modern compilers treat the inlining decision as an optimization problem. They often set a '''“budget” for code growth''' and try to inline the most beneficial calls without exceeding that budget. This is sometimes modeled like a knapsack problem – choosing which function calls to inline to maximize estimated performance gain for a given allowable increase in code size. The heuristics involve metrics such as the size of the function (in internal intermediate representation instructions), the number of call sites, and the estimated frequency of each call. For instance, a call inside a loop that runs thousands of times is more profitable to inline than a call in a one-off initialization function. Compilers also consider whether inlining a function will enable ''subsequent optimizations'': if inlining a function exposes a constant or a branch that can simplify the code, the compiler gives that more weight. These factors are combined into a cost/benefit analysis for each call site. If the estimated benefit (e.g., saved cycles) outweighs the cost (e.g., added instructions and bytes of code), the call is inlined – otherwise it’s left as a regular call <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. Different compilers (and even different versions of the same compiler) use different formulas and thresholds for this. For example, GCC and Clang assign a certain “cost” to the function based on IR instruction count and adjust it if the function is marked <code>inline</code> (which gives a hint benefit) or if the call is in a hot path, etc. MSVC similarly has internal thresholds and will inline more aggressively at <code>/Ob2</code> than at <code>/Ob1</code>. These heuristics are continually refined to produce good results across typical programs.
* '''Profile-guided inlining:''' One way to improve inlining decisions is to use ''profile-guided optimization (PGO)''. PGO involves compiling the program, running it on sample workloads to gather actual execution frequencies of functions and branches, and then feeding that profile data back into a second compilation. With PGO, the compiler knows which functions are actually hot (called frequently in practice) and which call sites are executed often. This information can greatly inform the inlining heuristics – for example, the compiler might inline a function it knows is called millions of times a second, but not inline another function that is rarely used, even if they are similar in size. Using PGO, compilers can be bolder about inlining hot paths and avoid code bloat on cold paths. That said, the gains from PGO-based inlining, while real, are often in the single-digit percentages of performance <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. It helps the compiler make more informed decisions, but it doesn’t fundamentally eliminate the trade-offs. In some cases PGO might even cause slight regressions if the profile data misleads the heuristics (e.g., if the runtime usage differs from the training run). Still, PGO is a valuable tool for squeezing out extra performance by fine-tuning inlining and other optimizations based on actual usage.
* '''Limitations and overrides:''' There are practical limits to inlining. In certain scenarios compilers cannot or will not inline a function regardless of hints. A recursive function usually cannot be fully inlined (doing so would expand the code without end), although some compilers will expand the recursion to a fixed depth. If a function’s address is taken (meaning a pointer to the function is used), most compilers have to generate an out-of-line function body for it, and they might not inline every call either, because the function now needs to exist independently <ref name="ms_inline">Inline Functions (C++) | Microsoft Learn https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170</ref>. Virtual function calls in C++ cannot be inlined unless the compiler can deduce the exact target, that is, unless the call is devirtualized (e.g., the object’s dynamic type is known, or the class or method is declared <code>final</code>); inlining across a polymorphic call therefore often requires whole-program analysis. Additionally, as mentioned earlier, compilers impose limits to avoid ''pathological code expansion'': GCC, for instance, has parameters such as '''<code>inline-unit-growth</code>''' (which caps the overall growth of a compilation unit caused by inlining) and '''<code>max-inline-insns-single</code>''' (which caps the size of a single function considered for inlining) <ref name="gcc_opt">Optimize Options (Using the GNU Compiler Collection (GCC)) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</ref>. These ensure that even under <code>-O3</code> the compiler does not inline everything blindly and stops once the code would grow too large.
* '''Link-time optimization (LTO):''' Traditional compilation limits inlining to a single translation unit (a source file plus the headers it includes), because the compiler sees only one .c/.cpp file at a time. '''Link-time optimization''' lifts this restriction by allowing inlining (and other optimizations) to occur across translation unit boundaries at link time. With LTO enabled (for example, <code>gcc -flto</code> or MSVC’s <code>/LTCG</code>), the compiler effectively sees the entire program or library, so it can inline functions from one module into another. This means that even if you did not mark a function <code>inline</code> or put it in a header, LTO might inline it if that looks beneficial. For instance, a small utility function defined in one source file and called in another could be inlined during LTO, whereas without LTO that call would remain a regular function call (because the compiler would not have seen the function’s body while compiling the caller). LTO thus widens the scope of inlining and can yield significant performance improvements for codebases split across many files. One common use of LTO-driven inlining is for library functions: the compiler can inline calls into a library if the library itself was built with LTO, so that its code is available to the optimizer at link time. The downside is that LTO lengthens build times (mostly at the link step) and increases memory usage during the build, because of the larger optimization scope. The same caution as before applies: even with whole-program visibility, the compiler still uses heuristics to decide what to inline <ref>c++ - Link-time optimization and inline - Stack Overflow https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline</ref>. Having more opportunities to inline (thanks to LTO) does not mean the compiler will inline everything; it still must choose carefully to avoid excessive code bloat or slowdowns from instruction-cache misses.
* '''Alternate strategies:''' In cases where inlining is not beneficial or possible, other optimizations may be preferable. For example, compilers may deliberately keep code ''out of line'' (the opposite of inlining) to reduce code size, i.e., they decide ''not'' to inline in order to keep the binary small (especially at <code>-Os</code> or in constrained environments). Another strategy is '''partial inlining''', where the compiler inlines only a portion of a function. GCC implements this (the pass is sometimes called “IPA-split”, and the optimization is controlled by <code>-fpartial-inlining</code>): it tries to inline the hot part of a function, such as an early-exit check, into callers and keeps the cold part out of line as a compromise. This is an advanced optimization that the compiler applies automatically rather than something the programmer directs per function, but it shows that inlining does not have to be all-or-nothing.
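
The budget-based selection described above can be illustrated with a small greedy approximation of the knapsack formulation. This is only a sketch under stated assumptions, not the algorithm used by GCC, Clang, or MSVC: the <code>CallSite</code> structure, its fields, and the <code>choose_inlines</code> function are hypothetical names invented for this example, and real compilers use far more elaborate cost models.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one call site; all fields are illustrative.
struct CallSite {
    std::string name;       // label for the call site
    int callee_size;        // estimated callee size in IR instructions (cost)
    double call_frequency;  // estimated number of executions of this call
    double saved_per_call;  // estimated cycles saved per call if inlined
};

// Estimated total benefit of inlining this call site.
double benefit(const CallSite& cs) {
    return cs.call_frequency * cs.saved_per_call;
}

// Greedy knapsack approximation: visit call sites in order of benefit per
// instruction of code growth, and inline each one that still fits into the
// remaining growth budget.
std::vector<std::string> choose_inlines(std::vector<CallSite> sites,
                                        int growth_budget) {
    for (const CallSite& cs : sites) {
        assert(cs.callee_size > 0);  // sizes must be positive for the ratio
    }
    std::sort(sites.begin(), sites.end(),
              [](const CallSite& a, const CallSite& b) {
                  return benefit(a) / a.callee_size >
                         benefit(b) / b.callee_size;
              });
    std::vector<std::string> chosen;
    int used = 0;
    for (const CallSite& cs : sites) {
        if (used + cs.callee_size <= growth_budget) {
            used += cs.callee_size;
            chosen.push_back(cs.name);
        }
    }
    return chosen;
}
```

Given a hot call in a loop, a cold initialization call of the same size, and a large moderately warm call, a small budget selects the hot call first and skips the large callee because it does not fit. Profile data (PGO) would, in this picture, simply supply better values for <code>call_frequency</code>.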
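
The limitations listed above (recursion, address-taken functions, and virtual calls) can be seen side by side in a short C++ example. All names here are hypothetical and chosen for illustration; the comments describe what a compiler is ''typically'' able to do, and the actual decisions depend on the compiler, its version, and the optimization level.

```cpp
#include <cassert>

// Recursive: cannot be fully inlined into a caller, because the expansion
// would not terminate for an unknown argument.
int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

// Small leaf function: a typical inlining candidate.
inline int square(int x) { return x * x; }

// Calls through a function pointer. Passing &square here takes its address,
// so the compiler generally has to emit an out-of-line body for square,
// even if direct calls to it elsewhere are still inlined.
int apply(int (*fn)(int), int v) { return fn(v); }

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// 'final' tells the compiler that no further overrides can exist.
struct Square final : Shape {
    explicit Square(int s) : side(s) {}
    int area() const override { return square(side); }
    int side;
};

// The dynamic type is unknown here: the call is dispatched virtually unless
// the compiler can prove the exact target.
int area_of(const Shape& s) { return s.area(); }

// The static type is a final class: the call can be devirtualized and is
// then an ordinary inlining candidate.
int area_of_square(const Square& s) { return s.area(); }
```

Both <code>area_of</code> and <code>area_of_square</code> return the same value for a <code>Square</code>; the difference lies only in how much the compiler knows about the call target at compile time.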