Make LLVM fast again
The front page of the LLVM website proudly claims that:
Clang is an “LLVM native” C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles […]
I’m not sure whether this has been true in the past, but it certainly isn’t true now. Each LLVM release is a few percent slower than the last. LLVM 10 put some extra effort into this area, and somehow managed to make Rust compilation a whole 10% slower, for as yet unknown reasons.
One might argue that this is expected, as the optimization pipeline is continuously being improved, and more aggressive optimizations have higher compile-time requirements. While that may be true, I don’t think it is a desirable trend: For the most part, optimization is already “good enough”, and additional optimizations have an unfortunate tendency to trade large compile-time increases for very minor (and/or very rare) improvements to run-time performance.
The larger problem is that LLVM simply does not track compile-time regressions. While LNT tracks run-time performance over time, the same is not being done for compile-time or memory usage. The end result is that patches introduce unintentional compile-time regressions that go unnoticed and can no longer be easily identified by the time the next release rolls out.
Tracking LLVM compile-time performance
The first priority then is to make sure that we can identify regressions accurately and in a timely manner. Rust does this by running a set of benchmarks on every merge, with the data available on perf.rust-lang.org. Additionally, it is possible to run benchmarks against pull requests using the @rust-timer bot. This helps evaluate changes that are intended to improve compile-time performance, or are suspected of having non-trivial compile-time cost.
I have set up a similar service for LLVM, with the results viewable at llvm-compile-time-tracker.com. Probably the most interesting parts are the relative instructions and max-rss graphs, which show the percentage change relative to a baseline. I want to briefly describe the setup here.
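To make "percentage change relative to a baseline" concrete, here is a minimal sketch of how such a number can be computed for one benchmark, and one common way (a geometric mean of per-benchmark ratios) to summarize several benchmarks at once. The function names are hypothetical, and whether the tracker aggregates exactly this way is an assumption, not something stated above.

```python
import math


def relative_change(baseline: float, current: float) -> float:
    """Percentage change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100.0


def geomean_change(baselines: list[float], currents: list[float]) -> float:
    """Summarize per-benchmark ratios with a geometric mean, as a percentage.

    Hypothetical aggregation: each benchmark contributes current/baseline, so a
    large program cannot dominate the result just by having bigger raw counts.
    """
    if len(baselines) != len(currents) or not baselines:
        raise ValueError("need matching, non-empty measurement lists")
    log_sum = sum(math.log(c / b) for b, c in zip(baselines, currents))
    return (math.exp(log_sum / len(baselines)) - 1.0) * 100.0
```

For example, a commit that moves one benchmark from 1,000,000 to 1,020,000 instructions yields `relative_change(1_000_000, 1_020_000)`, roughly a 2% regression.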
The measurements are based on CTMark, which is a collection of some larger programs that are part of the LLVM test suite. These were added as part of a previous attempt to track compile-time.
For every tested commit, the programs are compiled in three different configurations: O3, ReleaseThinLTO and ReleaseLTO-g. All of these use -O3 in three different LTO configurations (none, thin and fat), with the last one also enabling debuginfo generation.
Compilation and linking statistics are gathered using perf (most of them), GNU time (max-rss and wall-time) and size (binary size). The following statistics are available:
instructions (stable and useful)
max-rss (stable and useful)
task-clock (way too noisy)
cycles (noisy)
branches (stable)
branch-misses (noisy)
wall-time (way too noisy)
size-total (completely stable)
size-text (completely stable)
size-data (completely stable)
size-bss (completely stable)
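As an illustration of how the raw tool output can be turned into these statistics, here is a minimal sketch of two parsers: one for the CSV output of `perf stat -x,` (assuming the layout where the counter value is the first field and the event name the third) and one for the "Maximum resident set size (kbytes)" line printed by GNU `time -v`. The function names are hypothetical and this is not the tracker's actual code.

```python
def parse_perf_csv(output: str) -> dict[str, int]:
    """Parse `perf stat -x,` output into {event: integer count}.

    Assumed layout: value,unit,event,... per line. Comment lines, uncounted
    events and non-integer values (such as task-clock in msec) are skipped.
    """
    counters = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # e.g. "<not counted>" or a float like "12.5"
        event = fields[2].split(":")[0]  # drop modifiers such as ":u"
        counters[event] = int(fields[0])
    return counters


def parse_max_rss_kb(time_output: str) -> int:
    """Extract max-rss (in kbytes) from GNU `time -v` output."""
    prefix = "Maximum resident set size (kbytes):"
    for line in time_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return int(line[len(prefix):])
    raise ValueError("no max-rss line found")
```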
The most useful statistics are instructions, max-rss and size-total/size-text, and these are the only ones I really look at. “instructions” is a stable proxy metric for compile-time. Instructions retired is not a perfect metric, because it discounts issues like cache/memory latency, branch misprediction and ILP, but most of the performance problems affecting LLVM tend to be simpler than that.
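One simple way to put numbers on labels like "stable" and "noisy" is to repeat the same measurement several times and look at the coefficient of variation (standard deviation relative to the mean). The sketch below assumes that approach; the 0.5% threshold is a hypothetical cut-off chosen for illustration, not one used by the tracker.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative (population) standard deviation of repeated runs, in percent."""
    mean = statistics.mean(samples)
    return statistics.pstdev(samples) / mean * 100.0


def is_noisy(samples: list[float], threshold_pct: float = 0.5) -> bool:
    """Flag a metric as noisy if its run-to-run variation exceeds the threshold.

    threshold_pct is a hypothetical cut-off for illustration only.
    """
    return coefficient_of_variation(samples) > threshold_pct
```

With such a measure, a metric whose run-to-run variation is larger than the regressions one wants to detect (often well under 1%) is of little use for per-commit tracking, regardless of how closely it reflects actual compile time.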
The actual time metrics task-clock and wall-time are too noisy to be useful and also undergo “seasonal variation”. This could be mitigated by running benchmarks