
June 13, 2006


Wesner Moise

A point that I failed to mention is that Erlang assumes that software will have errors in it. The trick is to handle such errors reliably.

The traditional approach is to find and eliminate all such errors, because any one of them could be ruinous to the application. However, an alternative approach is to make the application more resilient to these types of "ruinous" errors.


Hm, very interesting, Wes.

Ok, so the improvement here isn't fewer bugs, but more reliable software that doesn't crash when it hits bugs.

I think that's a great improvement, but what about getting fewer bugs in software? At work, the bugs in our software -- probably 99% of the time -- are coding errors. When our software fails, it's almost always due to programming error. In the Erlang way, it would just retry my component that failed, then the parent component, and so on until the whole thing restarts. That's a bad system for us! With 99% of failures being programming errors, restarting components isn't going to help in the least. In the places where non-programming errors occur (such as network timeouts), a simple try/catch with an error message is enough. We've designed our software to retry in such scenarios as well before issuing any errors. And even the error messages provide an easy way for users to retry such operations.
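To make the retry-then-escalate idea above concrete, here is a minimal Python sketch. It is not Erlang and not how Erlang's supervisors are actually implemented; `supervise`, `Escalate`, `make_flaky`, and `buggy` are hypothetical names invented for illustration. It only shows the behavior being discussed: restarts recover from a transient fault, while a deterministic coding error fails the same way on every restart until the whole tree gives up.

```python
class Escalate(Exception):
    """Raised when a supervisor gives up and hands the failure to its parent."""


def supervise(name, task, max_restarts, log):
    """Run task(); on any exception restart it, up to max_restarts times.

    When the restarts are used up, escalate so that a parent supervisor
    (if there is one) can restart this whole subtree.
    """
    for attempt in range(max_restarts + 1):
        try:
            return task()
        except Exception as exc:
            log.append(f"{name}: attempt {attempt + 1} failed ({exc})")
    raise Escalate(f"{name} gave up after {max_restarts + 1} attempts")


def make_flaky(failures):
    """Return a task that fails `failures` times, then succeeds (a transient fault)."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("network timeout")
        return "ok"

    return task


def buggy():
    """A deterministic coding error: fails identically on every restart."""
    return 1 // 0


log = []

# Transient fault: one restart is enough to recover.
transient_result = supervise("child", make_flaky(1), max_restarts=2, log=log)

# Coding error: the child is restarted, then the parent restarts the child
# subtree, and in the end the failure escalates out of the whole tree.
try:
    supervise("parent", lambda: supervise("child", buggy, 1, log), 1, log)
    bug_result = "recovered"
except Escalate:
    bug_result = "whole tree gave up"
```

Under these assumptions the transient case recovers and the buggy case never does, which is the distinction the comment is drawing.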

All that said, this is the first time hearing of Erlang, so I'm no expert. But after reading his blog posts on it I think it's clear Erlang does not attempt to address the "fewer bugs" problem that plagues most software, or at least, plagues our real-world software.

That's one thing I'm hoping your tool will help us with, Wes. I want fewer programming errors in our software. So hurry up with that beta! :)

Wesner Moise

I guess the main issue is fault tolerance for certain types of applications -- in this particular case, ATM machines. So we must assume there will be bugs, but the application can resurrect itself in the face of them...

One compiler company got in trouble for leaving asserts on in released code. We don't see debug asserts now, because they are turned off, but it pretty much confirms my belief that, in reality, software that appears to be performing perfectly well may in fact be encountering frequent bugs. In such a case, the fault tolerance of the application is just as important as minimizing the number of defects in the software.


I suppose you're right there, at least for the ATM instance. While I agree fault tolerance is important for software, most of our time as developers here at work is spent fixing coding errors. We can pretty well keep an app running so long as things like out of memory errors aren't happening (but hey, if the system's out of memory, things beyond our control are going wrong).

Erwyn van der Meer

Hi Wesner, you might be interested in Juval Löwy's MSDN Magazine article titled "Volatile Resource Managers in .NET Bring Transactions to the Common Type" (http://msdn.microsoft.com/msdnmag/issues/05/12/transactions/default.aspx).

He introduces an approach that uses transactions to keep in-memory objects in a consistent state.
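As a rough illustration of the general idea (not the article's implementation, and not the .NET System.Transactions API), the Python sketch below snapshots an object's state and restores it if an operation fails partway through. The names `transactional`, `Account`, and `withdraw` are hypothetical and exist only for this example.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transactional(obj):
    """Snapshot obj's attributes and restore them if the block raises."""
    snapshot = copy.deepcopy(obj.__dict__)
    try:
        yield obj
    except Exception:
        # Roll back: discard partial changes and restore the snapshot.
        obj.__dict__.clear()
        obj.__dict__.update(snapshot)
        raise


class Account:
    """Hypothetical in-memory object whose state must stay consistent."""

    def __init__(self, balance):
        self.balance = balance
        self.history = []


def withdraw(account, amount):
    """Mutates the account first, then validates, so a failure leaves it half-updated."""
    account.history.append(("withdraw", amount))
    account.balance -= amount
    if account.balance < 0:
        raise ValueError("insufficient funds")


account = Account(100)
try:
    with transactional(account):
        withdraw(account, 30)   # succeeds
        withdraw(account, 500)  # fails after mutating the account
except ValueError:
    pass  # both withdrawals have been rolled back
```

Without the rollback, the failed withdrawal would leave a negative balance and two history entries behind; with it, the object returns to the state it had before the block started.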

Wesner Moise

I am actually familiar with .NET transactions for volatile data, as I attended a lecture by a program manager on the transactions team in 2004.
