Personally, I'm thinking Z3/Nim integration could also be a huge boon for certain embedded applications. An open source integrated SMT solver could even be a "killer app" for Nim in embedded contexts.
There tends to be a lot of asynchronous communication with peripherals on embedded devices; they're more akin to a distributed system than a regular application. A linear theorem solver could let you write models that verify whether you correctly handle all failure conditions, or that the application/firmware logic always satisfies a certain set of conditions. A good example would be a control system for an industrial motor. Another would be PLC logic, which is normally programmed in non-Turing-complete logic languages. Both of these must ensure certain constraints are always maintained; e.g., could you use Z3 to ensure, given an (annotated) program, that a certain control line is always toggled after every control feedback loop?
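To make the kind of check I'm imagining concrete, here's a rough sketch using Z3's Python API (pip install z3-solver). The tiny firmware model and all the names are made up, and this isn't DrNim syntax; it just shows the shape of the question an SMT solver can answer:

```python
# Rough sketch only: one iteration of an invented firmware loop modelled
# with Z3's Python API.  Nothing here is DrNim syntax.
from z3 import Bool, Bools, Solver, Xor, unsat

line_before, line_after = Bools("line_before line_after")
feedback_ran = Bool("feedback_ran")

# Invented model of the handler: the control line is flipped exactly when
# the feedback loop ran during this iteration.
firmware = line_after == Xor(line_before, feedback_ran)

# Property: after an iteration in which the feedback loop ran, the control
# line must have been toggled.  We ask Z3 for a counterexample; `unsat`
# means none exists, i.e. the property holds for this little model.
s = Solver()
s.add(firmware, feedback_ran, line_after == line_before)
print("property holds" if s.check() == unsat else s.model())
```

A real check would of course derive the transition relation from the (annotated) program rather than hand-writing it like this.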
I'm not sure of the details, or how exactly Z3 integrates into Nim and if it's user accessible. It's more of an inkling. Does anyone else think the ideas in my previous paragraph could be possible? Or how would one go about setting up such checks?
but in practical terms could it apply to general application logic?
Oh you bet, definitely. Haven't heard of ZetZ before but https://frama-c.com/ and Ada Spark are the obvious inspirations for DrNim.
Frama-C is a PITA both in terms of building/installing and usage (beyond ridiculously minimal usage). ZZ is barely alpha, largely written in Rust and C++ (a big minus), and seems as yet undecided which formal route to take (e.g. symbolic execution vs. a solver). About the only property I like is that it's not limited to Z3.
Ada/Spark, OTOH, is a route that will serve us well for orientation.
That makes sense. The 'sort' example given for DrNim seems to be a useful approach, though I'm still only partly sure how it could be used for larger projects. If Nim's DrNim annotations could be made to have similar power/usage to Ada/Spark, that'd be compelling. To be clear, I've never used Ada/Spark, so I'm not sure how it'd be used overall, just that it seems great for proving certain properties in embedded software contexts (e.g. avoiding integer overflows in rocket control systems).
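Purely as an illustration of the kind of obligation such annotations would discharge (this is not DrNim or SPARK syntax, and the variable names and ranges are made up), here's an overflow question posed directly to Z3 through its Python API:

```python
# Illustration only: can `thrust + correction` overflow a signed 32-bit int,
# given some assumed preconditions?  Names and bounds are invented.
from z3 import BitVec, Solver, BVAddNoOverflow, Not, sat

thrust = BitVec("thrust", 32)
correction = BitVec("correction", 32)

s = Solver()
# Assumed preconditions (the kind of facts annotations would establish).
s.add(thrust >= 0, thrust <= 100000)
s.add(correction >= -1000, correction <= 1000)
# Counterexample query: values satisfying the preconditions for which the
# signed 32-bit addition overflows toward the positive side.
# (BVAddNoUnderflow would cover the negative direction.)
s.add(Not(BVAddNoOverflow(thrust, correction, signed=True)))
print("overflow possible:", s.check() == sat)
```

The appeal of DrNim/SPARK-style annotations is that obligations like this get generated and discharged for you instead of being written by hand.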
Btw, north of 1000 bugs have been found in Z3, and other solvers are unlikely to be much better.
Maybe we need a solver written in Ada, F*, or at least in Nim ...
Btw, north of 1000 bugs have been found in Z3, and other solvers are unlikely to be much better.
Maybe we need a solver written in Ada, F*, or at least in Nim ...
That's only really an issue if the bugs compound. Bugs in Z3 as a prover should mostly be independent of bugs in the Nim code, since the annotations produce a separate proof rather than touching the code itself. Say there's a 1/100 chance of a bug occurring in some amount of Nim code; multiply that by a 1/100 likelihood of a bug in the Z3 checker falsely verifying that bug as ok, and you get a ~1/10,000 chance of the bug in the Nim code making it through undetected. That's opposed to a ~2/100 chance if the Z3/DrNim annotations modified the code or could induce bugs in it. Those numbers aren't accurate and miss a lot of subtleties in how you'd calculate the real odds, but they serve to show how bug probabilities in the two systems interact, based on ballpark numbers.
It's not going to be 100% accurate, but a reduction on the order of 100x or more in the chances of bugs making it through is still a fantastic improvement, especially for the hard-to-spot overflow, index-bounds, etc. errors that'd cause an issue in an embedded system (e.g. a rocket exploding due to an integer overflow).
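Here's the back-of-the-envelope version of that argument; the probabilities are placeholders and real failures are rarely fully independent:

```python
# Back-of-the-envelope only: placeholder probabilities, assuming the prover's
# false-accepts are independent of the bugs in the code it checks.
p_bug_in_code = 1 / 100    # chance a chunk of Nim code contains a bug
p_prover_misses = 1 / 100  # chance the solver wrongly verifies that bug as ok

# Independent tools: both must go wrong for the bug to slip through.
p_slips_through = p_bug_in_code * p_prover_misses   # 1e-4

# If the verifier could itself inject bugs into the code, the failure
# chances would roughly add instead of multiply.
p_coupled = p_bug_in_code + p_prover_misses          # ~2e-2

print(p_slips_through, p_coupled)
```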
Related RFC: Correct-by-Construction Nim programs
https://github.com/nim-lang/RFCs/issues/222
Motivating examples
You are designing software for one of the following scenarios:
But that approach invariably ends tragically, because it turns a principled, binary question into a quantitative one; then, sooner rather than later, "negotiations" start, and a while later someone argues "it's good enough".
I'm glad you state your opinions very frankly, but nevertheless you haven't convinced me of anything. The quest for mathematical-style perfection is great, but at the end of the day you have to build real systems with real limitations in resources. It's highly unlikely any prover system is itself perfect, even in the Ada/Spark world. To wait for one that is means you'll probably end up never using one at all (if it's even theoretically possible; I'd invoke Gödel's theorem here as a caution). Every highly reliable real-world system in the end has its "good enough" point; more accurately, a point where a high degree of confidence has been achieved by thorough investigation.
I've done work in multiple engineering disciplines, and taking a quantitative approach as opposed to a binary one is what's used to verify real-world systems. Take "explosion proof rated" electronics, for example: the ISO standard requires that a circuit under consideration for certification only be proofed against 1 fault, or 2 faults at a time, depending on the desired certification level. In theory, you could have 3 faults that lead to an explosive ignition, but the standards body and insurers consider 2 simultaneous faults to be the maximum worth guarding against, based on the likelihood of that many faults occurring at the same time, and deem that sufficient proof.
There's a quote from the AWS paper I link in the next paragraph that shows that norm of thinking in engineering systems: “To a first approximation, we can say that accidents are almost always the result of incorrect estimates of the likelihood of one or more things.” -C. Michael Holloway, NASA. He is a member of NASA's "Safety Critical Avionics Systems Branch" and appears to have dozens of publications discussing safety critical systems (here's one using probabilities to consider formal verification methods for real time controllers).
Explanation: 'correct' has a clear meaning, namely a) congruency to the spec, and b) absence of errors. To put it in an image: a ruler is worthless if it shows correct measurements in only 99% of uses.
A prover - the very core of verification - having bugs, and even plenty of them, is a tragedy and turns certainty into "highly likely". Utterly unacceptable.
As a counter-example to this reasoning: AWS is known for using TLA+ for verifying their distributed systems, and they claim it's part of their tremendous success. The TLA+ proof tools can use Z3 as a backend, so AWS has likely used Z3 (bugs and all) to great effect to build one of the most reliable compute systems ever built. Even a ruler that's only accurate in 99% of one's use cases is still vastly better than eyeballing it and being only accurate enough in 40% or even 80% of those cases. Especially if that 99.99% ruler will take years to get here.
To be clear, I'm talking about designing high-reliability embedded distributed systems, which means using the language that safety-critical fields employ, not mathematical definitions in computer-science research terms or a mathematical proof. Even with a perfect prover and perfect program logic, you still need to design algorithms for handling processor failures, bit flips, etc. If a Z3 SMT solver can help me spot logic bugs in my algorithms for handling those failures, that's that many fewer bugs that could cause issues. I'll still need to verify the system with several other methods even if it's proofed with a perfect prover, because the assumptions fed into the theorem/specification could be completely incorrect for a real situation. As such, I'm interested in pragmatic tools that can be used now (well, or in six months).
It's highly unlikely any prover system is itself perfect, even in the Ada/Spark world.
It's not.
But, AdaCore does care a great deal about fixing bugs in SPARK. I reported a bug once (I think I crashed the solver) and as I recall they fixed it quickly. I'm hardly a hard-core Ada/SPARK user. I'd like to be, though.
the fact that Ada is somewhat cumbersome, has a bad reputation, and/or is perceived as complex and hard to learn and use is a major reason for formal methods being used less than is desirable.
My impression, perhaps mistaken, is that almost no one has heard of Ada, let alone SPARK, and even fewer use it, due in no small part to a campaign 30-35 years ago to brand Ada 83 as either too complex or the domain of "crooks". (Verbatim from a CS professor at my univ. who preferred C++.) By the mid 90s Ada was fading and now C++ and Java have become more popular, more used, and far more complex than Ada 83 ever was. -- Oh, and far less safe! Yet every time I ask about Ada to the average Comp Sci researcher, s/he rolls his eyes and calls it either too complex or too dead.
On the other hand, my impression is that formal methods are considered less than desirable for the same reason that most undergraduate math majors regard pure mathematics with distaste: you actually have to think for a change. I was told 20 years ago that in many cases verifying correctness on a very large, standard set of desirable example problems is easier than and preferable to formal proof of correctness. It stuck in my mind because it really bothered me.
I don't want to hijack this thread, so if you can point me to something that would correct this misimpression (assuming it is a misimpression) then please do!
In fact, there are only 252 soundness issues in total.
but nevertheless you haven't convinced me of anything.
I should have been informed that convincing you of anything was a major target.
... but at the end of the day you have to build real systems with real limitations in resources.
Hmm, that (not at all) good old story ... Well, fact is that debugging, finding and fixing bugs, preferably without introducing new problems, tends to cost considerably more both in time and in money than developing properly in the first place. Or to word it for you: Using formal methods is cheaper.
It's highly unlikely any prover system is itself perfect, even in the Ada/Spark world.
Nice attempt, but there is a lot of room between "perfect" and "shitty". (Also related to your last point:) how about creating tools that are not 100% but 99.90% correct and reliable?
Every highly reliable real-world system in the end has its "good enough" point; more accurately, a point where a high degree of confidence has been achieved by thorough investigation.
OK, let's look at data centers. There "good enough" means 99.95% availability. I'd accept that as a reasonable average quality standard - but you'll have a hard time showing me any software done in C or C++, let alone javascript and similar toys, that offers that level of availability. With Pascal there is a chance, and with Ada there is even a good chance.
There's a quote from the AWS paper I link in the next paragraph that shows that norm of thinking in engineering systems: “To a first approximation, we can say that accidents are almost always the result of incorrect estimates of the likelihood of one or more things.” -C. Michael Holloway, NASA. He is a member of NASA's "Safety Critical Avionics Systems Branch" and appears to have dozens of publications discussing safety critical systems (here's one using probabilities to consider formal verification methods for real time controllers).
Funny. We talk about correctness and you quote from some paper from some guy saying "To a first approximation, we can say...". Thanks for the laugh.
And btw, NASA may be a name that impresses many, for whatever reason, but I can tell you that that name carries very little weight in terms of correct software.
As a counter-example to this reasoning: AWS is known for using TLA+ for verifying their distributed systems, and they claim it's part of their tremendous success. The TLA+ proof tools can use Z3 as a backend, so AWS has likely used Z3 (bugs and all) to great effect to build one of the most reliable compute systems ever built.
"counter example"? Hahaha.
Even a ruler that's only accurate in 99% of one's use cases is still vastly better than eyeballing it and being only accurate enough in 40% or even 80% of those cases. Especially if that 99.99% ruler will take years to get here.
And what if that ruler were measuring the number in your bank account? Or if it were in the medical equipment on which your mother's life depended?
To be clear, I'm talking about designing high-reliability embedded distributed systems, which means using the language that safety-critical fields employ, not mathematical definitions in computer-science research terms or a mathematical proof. Even with a perfect prover and perfect program logic, you still need to design algorithms for handling processor failures, bit flips, etc. If a Z3 SMT solver can help ...
In all friendliness: I'm under the impression that you are out of your depth but let me also state one thing clearly: I'm not against you nor against you choosing whatever route you like. But my interest here is to have a language and tools that credibly support those among us who want or even need to create provably correct software.
1. This number -- 1000 -- mostly means that it's a big project that has been used by plenty of people. Currently there are 180 open bugs out of which only 28 are "soundness" issues:
Sorry, no. It means that they coded sloppily, thankfully creating relatively few serious problems and mostly not critical ones.
In fact, there are only 252 soundness issues in total.
For a project that is all about enabling people to check the soundness of their models, code, etc. that's a tragically high number.
2. Just because the "proof engine" has bugs doesn't mean that the produced proof has bugs. And there is no alternative anyway, hand crafted proofs in math frequently have bugs too.
A triple no. First, it boils down to "you can't rely on what Z3 hands you back". Second, the "over 1000 bugs" figure comes from a project that (granted, with "evil intent") specifically tested the proving logic, so we are not talking about an "it also accepts '-T' as a parameter instead of only '-'" kind of problem. Third, one of the very raisons d'être of prover software is that machines' strong side is humans' weak side: precision, and stubbornly carrying out thousands or millions of small steps. If Z3 (or any other prover) cannot deliver that, it's all but worthless.
To be clear: I immensely value the work and money Microsoft and others have poured into the Z3 project, and my interest certainly isn't to bash the project but to create awareness of the real state of things.
@cantanima
almost no one has heard of Ada, let alone SPARK, and even fewer use it
That's sadly true although I think the situation is a bit less grave.
brand Ada 83 as either too complex or the domain of "crooks". (Verbatim from a CS professor at my univ. who preferred C++.)
I'll probably be called names again, but frankly I do not care what a prof who evidently doesn't understand his own field "thinks". Moreover, Ada 83 might be called somewhat limited, but it certainly wasn't complex; really big names in the field thought very differently. There actually was a reason to dislike Ada back then, though, namely the fact that it was seen as a Pentagon project (albeit one with good intentions).
C++ and Java have become more popular, more used, and far more complex than Ada 83 ever was. -- Oh, and far less safe!
It's a widespread and common problem to think that "popular" and "more used" carry any real weight. In fact, it's about as mindless as considering Lady Gaga a greater musician than Callas or Bach because more people like her sound.
Yet every time I ask about Ada to the average Comp Sci researcher, s/he rolls his eyes and calls it either too complex or too dead. ...
... and such clearly demonstrates a gross lack of knowledge and understanding. According to realistic criteria Ada is more alive than quite a few languages considered to be very alive.
I was told 20 years ago that in many cases verifying correctness on a very large, standard set of desirable example problems is easier than and preferable to formal proof of correctness.
It must have been idiots telling you that, because (properly) verifying correctness necessarily amounts to a formal proof of correctness (or of the contrary).
Re. your last paragraph: read Dijkstra. He clearly - and correctly - said that testing cannot prove correctness (and trust me, next to Dijkstra, Hoare, and Brinch Hansen, all the professors who told you nonsense are insignificant nobodies).
Fun fact: I have looked at and worked with quite a few formal tools (over many years) and (Ada) Spark is in fact one of the easiest to use for developers. Also my productivity in Ada/Spark is considerably higher than in most other languages + formal tools (if available at all).
It means that they coded sloppily, thankfully creating relatively few serious problems and mostly not critical ones.
No, it doesn't mean that, and your whole way of arguing is barely acceptable. Yeah, yeah, everybody except Dijkstra is an "idiot" or "sloppy". It's convenient to argue this way but it's futile.
And also, so what's the lesson here? "Don't use Z3, use X instead, which has fewer users and hence fewer reported bugs" -- sorry, but "number of bugs" is a terrible criterion for choosing a technology. You at least have to weigh it against "number of users", and even then it's not sufficient: not all bugs are equally bad and not all users are equally demanding.
No, it doesn't mean that, and your whole way of arguing is barely acceptable. Yeah, yeah, everybody except Dijkstra is an "idiot" or "sloppy". It's convenient to argue this way but it's futile.
@Araq, agreed, and I believe there isn't much to be gained by continuing this thread. I'm still interested in DrNim and seeing what it'd take to get it up and running. My interest is not in an entire system being provable, but in subsections of it. Overall, TLA+ might be a better fit for some of the projects I'm considering, since the reliability of a single system will matter less than that of the distributed system, but having DrNim available in Nim could be a boon to the many embedded developers still using plain ol' C.
No, it doesn't mean that, and your whole way of arguing is barely acceptable. Yeah, yeah, everybody except Dijkstra is an "idiot" or "sloppy". It's convenient to argue this way but it's futile.
That is not what I said.
And also, so what's the lesson here? "Don't use Z3, use X instead ...
That also is not what I said. In fact I did not even mention any Z3 alternative - and I commended their goal and their efforts.
But no worries, I get it: this is about who seems more likeable and who does not. You are the boss around here; I accept that and won't bother you any more with my opinions (and knowledge) on formal methods, static analysis, etc.
Z3 be praised! (though most here do not even know why it became so widely used ...)
Have a nice weekend