Bit-Precise Verification of Neural Networks

Edoardo Manino

Deploying neural network components within safety-critical systems may increase the risk of catastrophic failure. Indeed, neural networks are often seen as black boxes that deliver great accuracy on average, but may fail unpredictably in some corner cases (adversarial examples, hallucinations). While the latter may be acceptable in non-critical applications, they are a cause for concern in the aerospace, medical, and transportation domains.

Neural Network Verification

A principled solution for certifying the absence of failure modes is neural network verification [2]. In a nutshell, a neural network verifier is a tool that can take a neural network f and a safety specification P as inputs, and mathematically prove that f satisfies P in all possible scenarios. While safety specifications may vary, a classic example is robustness to adversarial attacks, where we want to ensure that

P := ∀x: ||x - c|| < d → f(x) == f(c)

or, in other terms: for any perturbed input x that is close to the reference c, the output class of the neural network stays the same.
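As a minimal illustration, the property P can be probed empirically by sampling inputs around c. The toy two-class linear model below and all its weights are hypothetical; note that sampling can only falsify P, whereas a verifier must prove it for every x in the ball.

```python
import random

# A toy 2-input, 2-class linear model standing in for f (hypothetical weights).
W = [[2.0, -1.0],
     [-1.0, 2.0]]
b = [0.1, -0.1]

def f(x):
    """Return the output class of the toy network on input x."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    return logits.index(max(logits))

def falsify_robustness(c, d, n_samples=1000, seed=0):
    """Search for an x with ||x - c||_inf < d and f(x) != f(c).

    Sampling can only *falsify* P: finding no counterexample here does
    not prove safety, which is exactly why verifiers are needed.
    """
    rng = random.Random(seed)
    ref = f(c)
    for _ in range(n_samples):
        x = [ci + rng.uniform(-d, d) for ci in c]
        if f(x) != ref:
            return x      # counterexample: P is violated
    return None           # no counterexample found: P *might* hold

print(falsify_robustness([1.0, 0.0], d=0.1))  # None: no flip in the small ball
print(falsify_robustness([1.0, 0.0], d=2.0))  # a counterexample in the large ball
```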

Modern neural network verification tools such as Marabou and αβ-CROWN can scale to neural networks with up to several million parameters. Furthermore, research is ongoing on verification tools that can efficiently handle a larger variety of specifications, such as VeryDiff for equivalence checking and FOSSIL for closed-loop stability.

The Challenge

However, what the aforementioned tools do not do is check the safety of the actual implementation of the neural network. Indeed, most tools pretend that a neural network is an idealised function f : R^n → R^m over vector spaces of real numbers. While this assumption is convenient for defining gradient descent and other machine learning algorithms, it contains a critical flaw:

real-world hardware can only compute in finite precision

As such, proving that the safety specification P holds for an idealised neural network f : R^n → R^m does not tell us much about whether the actual implementation g ≈ f is still safe. In fact, it has been repeatedly shown that the small numerical noise introduced by floating-point computation or integer quantisation (for low-power applications) can accumulate and invalidate any safety proof [3].
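The following sketch illustrates the phenomenon on a hypothetical two-weight example: symmetric 8-bit integer quantisation of the weights is enough to flip a sign-based decision that the real-valued model gets right. All values are chosen purely for illustration.

```python
# Symmetric per-tensor quantisation of weights to signed 8-bit integers.
# Hypothetical single "layer" whose decision rule is: class A iff w · x > 0.

BITS = 8
QMAX = 2 ** (BITS - 1) - 1   # 127 for int8

def quantize(w, scale):
    """Map a real weight to an integer code in [-QMAX, QMAX]."""
    return max(-QMAX, min(QMAX, round(w / scale)))

w = [1.0, -0.999]            # real-valued weights (illustrative)
x = [1.0, 1.0]               # input

scale = max(abs(v) for v in w) / QMAX
q = [quantize(v, scale) for v in w]                     # [127, -127]

y_real  = sum(wi * xi for wi, xi in zip(w, x))          # 0.001 > 0
y_quant = sum(qi * xi for qi, xi in zip(q, x)) * scale  # 0 * scale = 0.0

print(y_real > 0, y_quant > 0)   # True False: quantisation flips the decision
```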

Worse than that, finite-precision effects are highly non-monotonic. That is, we cannot fix all issues we find by just increasing the precision. Multiple experiments have shown that increasing the number of bits used during the computation may, in fact, make things considerably worse [4].

The Solution

Reasoning on the safety of neural networks requires modelling the semantics of finite-precision computation. This is as complicated as it sounds: we need to take into account every floating-point rounding, every potential integer overflow, every detail of the software and hardware infrastructure we are using. To make a concrete example, take the following dot-product operation, which is at the core of the matrix multiplication algorithms for linear layers:

y = w1 * x1 + w2 * x2 + w3 * x3 + w4 * x4

If we were to execute it on a modern CPU, it may be a good idea to exploit the parallelism offered by SIMD instructions, and compute two halves of the addition above simultaneously. To do so, we take advantage of the fact that addition is associative and reorder our expression as follows:

y' = (w1 * x1 + w2 * x2) + (w3 * x3 + w4 * x4)

Unfortunately, floating-point addition is not associative. Hence, while computing y' may be more efficient than the longer chain of additions in y, the two results may differ slightly, i.e. y != y'.
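We can observe the discrepancy directly. The four products below are hypothetical values, chosen so that the two small terms are absorbed one at a time in the sequential sum, but survive when grouped together first (in IEEE 754 double precision):

```python
# Four dot-product terms p_i = w_i * x_i, summed in two different orders.
p1, p2, p3, p4 = 1.0, 0.0, 1e-16, 1e-16

y       = ((p1 + p2) + p3) + p4    # sequential, left-to-right
y_prime = (p1 + p2) + (p3 + p4)    # pairwise, SIMD-style grouping

print(y)             # 1.0: each 1e-16 is below half an ulp of 1.0 and is lost
print(y_prime)       # 1.0000000000000002: 2e-16 is above half an ulp and survives
print(y == y_prime)  # False
```

The same effect appears, at larger scale, in every reordered reduction inside a matrix multiplication kernel.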

Luckily, we already have a precise model of finite-precision semantics: software implementations! Indeed, low-level code already specifies most of the information we need, from the numerical type of variables to the order in which operations are performed [1]. We only need to trust the compiler and underlying hardware to not perform too many unsafe optimisations…

Open Research Directions

Extracting an accurate model of the neural network implementation is challenging. For safety-critical applications, it may be more practical to synthesise a "safe-by-design" software implementation from scratch. Doing so efficiently is a challenge in its own right [3].

Once we have an accurate model, verifying its safety is still not easy. Reasoning about finite-precision computation is inherently more complex than its real-valued counterpart [4], and existing verification tools may not be entirely trustworthy [1]. Better decision procedures and abstraction-refinement algorithms are needed.

References

[1] Manino, E., Farias, B., Menezes, R.S., Shmarov, F., Cordeiro, L. C., 2025. Floating-Point Neural Network Verification at the Software Level. https://arxiv.org/abs/2510.23389

[2] Cordeiro, L.C., Daggitt, M.L., Girard-Satabin, J., Isac, O., Johnson, T.T., Katz, G., Komendantskaya, E., Lemesle, A., Manino, E., Šinkarovs, A. and Wu, H., 2024. Neural Network Verification is a Programming Language Challenge. In 34th European Symposium on Programming.

[3] Matos, J.B.P., de Lima Filho, E.B., Bessa, I., Manino, E., Song, X. and Cordeiro, L.C., 2023. Counterexample guided neural network quantization refinement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(4), pp.1121-1134.

[4] Giacobbe, M., Henzinger, T.A. and Lechner, M., 2020. How many bits does it take to quantize your neural network?. In Tools and Algorithms for the Construction and Analysis of Systems: 26th International Conference, TACAS 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020, Proceedings, Part II 26 (pp. 79-97). Springer International Publishing.