Misconceptions About Loops, or: Static Code Analysis is Hard


New member

When thinking about loops in programming languages, they often get simplified down to a conditions section and a body, but this belies the dizzying complexity that emerges when considering loop edge cases within the context of static analysis. A paper titled Misconceptions about Loops in C by [Martin Brain] and colleagues as presented to SOAP 2024 conference goes through a whole list of false assumptions when it comes to loops, including for languages other than C. Perhaps most interesting is the conclusion that these ‘edge cases’ are in fact a lot more common than generally assumed, courtesy of how creative languages and their users can be when writing their code, with or without dragging in the meta-language of C’s preprocessor.

Assumptions like loop equivalence can fall apart when considering the CFG ( control flow graph) interpretation versus a parse tree one where the former may e.g. merge loops. There are also doozies like assuming that the loop body will always exist, that the first instruction(s) in a loop are always the entry point, and the horrors of estimating loop exits in the context of labels, inlined functions and more. Some languages have specific loop control flow features that differ from C (e.g. Python’s for/else and Ada’s loop), all of which affect a static analysis.

Ultimately, writing a good static analysis tool is hard, and there are plenty of cases where it’s likely to trip up and give an invalid result. A language which avoids ambiguity (e.g. Ada) helps immensely here, but for other languages it helps to write your code as straightforward as possible to give the static analysis tool a fighting chance, or just get really good at recognizing confused static analysis tool noises.

(Heading image: Control flow merges can create multiple loop entry
edges (Credit: Martin Brand, et al., SOAP 2024) )