I have recently had a lot of success programming in a certain nontraditional style at a small scale. Programming in this style has numerous advantages, such as better support for multithreading, less bookkeeping, simpler code.
This programming style, though, is completely impractical at a global level. Or is it? Functional programming languages are actually based on this style.
The idea is that all document data structures are immutable; data is instead copied or composed. Copying data in an immutable world is very cheap, because deep copies are equivalent to shallow copies since all references are sharable. Equality tests are also cheap, due to a couple of quick comparison tests on identical references and cached hashcodes.
The simplicity of this approach carries over to advanced algorithms built on top of the immutable data. For example, a trivial example of an undo implementation is to stash away the former copy of the document.
While immutable data solves or alleviates a lot of problems, it introduces one huge problem (reminiscent of trade-offs introduced by garbage collection) by eliminating the efficient assignment from the programmer’s toolbox. I do think that this problem is solvable, and that the collective advantages of immutable data may actually outweigh the inability to directly modify data in-place.
You say the problem is with assignment, but can you overload the "=" operator?
Posted by: Ben Reichelt | May 01, 2006 at 06:37 AM
It's a more functional style of programming, in which the assignment is not necessary. Changes to data structures are made through composition, mapping, transforms, etc.
The main advantages are
... the ability to use more general algorithms to manipulate data
... composable operations
... no longer have to precisely sequence operations since order of operations no longer matter
... cheaper copies
I mentioned undo as one example of simplificiation. Traditionally, document-based applications maintained an undo stack, in which each operation knew how to perform its own undo. Since copies are expensive in a mutable world, there's often additional optimizations with higher level routines which would turn off undo of lower level routines.
With immutable data structures, I no longer need to maintain an undo context--in which every operation is logged. I can utilize and combine various general methods for undo including snapshots, differencing, merging and patching. There's more consistency with less testing needed and less chance for data corruption.
I can better make sense of what has changed. Without logging, changes to documents can occur very quickly. A minimal undo record is created when I merge my changes unto the main document and differences are detected. I can also unify my AI data structures with my document framework.
A lot of hard features come become easy:
* Versions
* Comparisons
* Merges
* Simultaneous editing
* Out of order undo
The last two are important for me, since my software makes numerous changes to the document in the background, simultaneously with the user. A user would just be able to undo his own changes.
To make this work, I use transient structures that do have side-effects and mirror the immutable document tree. When the document changes, the mirror structures are invalidated, after which they are recreated or rearranged to match the changes.
Posted by: Wesner Moise | May 01, 2006 at 07:10 AM
Traditional memory management force us to think about allocation and deallocation of memory.
Imperative programming forces us to properly sequence, initialize and cleanup operations--all of which are unnecessary.
Posted by: Wesner Moise | May 01, 2006 at 07:14 AM
The biggest problem I see with this approach is the efficient handling of large lists. Data should be structured as hierarchical as possible.
Large lists should be probably be organized as trees of shareable pieces, that are extendable at the edges. In addition, "views" and other types of virtual lists are very helpful.
Posted by: Wesner Moise | May 01, 2006 at 09:26 AM
At what point does a data structure become immutable in this paradigm?
Posted by: Rubikzube* | May 04, 2006 at 08:14 PM
I'll talk more about this in a future post. Data is always immutable; this is after all a functional style of programming.
To make changing the document efficient without excessive copying, the whole document tree is mirrored by nodes. (This is analogous to calling a recursive function on the tree except the function call occurs in "space" rather than in "time.")
Changing a subtree occurs by changing the mirroring node to point to the new subtree. The parent (and its ancestors) of the node is dirtied and is only copied on demand, so that many nodes can be changed before the whole document is copied.
Posted by: Wesner Moise | May 04, 2006 at 08:39 PM