There’s a conversation going on in the Alt.Net discussion list about code solubility. The term (as applied to code) seems to have been coined by Scott Bellaware as a measure of how easy it is for someone reading a code base to absorb the information in it. In essence, soluble code is a refinement of simple code and readable code. While individual blocks of code can be judged as simple or readable, Scott argues that solubility only applies to an entire code base, and is a more stringent requirement than readability: “code can be readable without being soluble” (although soluble code has to be readable, of course).

The reason this came up on the discussion list is that someone asked if there was an example in the open source world of a highly soluble code base … Like everyone else, I’m having a hard time coming up with an example of an open source project whose code I would consider eminently grokable and tractable, but I’m intrigued by the discussion because it has turned into a couple of threads about the usefulness of “solubility” as a metric:

  1. Is solubility actually a metric we can use?
  2. How important is solubility?

As for solubility as a metric, I think it’s true that the skill level of the reader is going to be the primary factor. Scott said, in a comment on his own post, that “Some writers come as close as possible to being objectively soluble,” and added the idea that a developer should be “intent on ensuring the communication of understanding.” I still think that there’s no such thing as “objectively soluble” without specifying a level of expertise. Suppose a coder has just started on our project, and there are pieces of code in our code base which I would naturally expect any software engineering or computer science graduate to understand at first sight … but this guy does not understand them without having them explained (or spending time reading them on his own). Does that necessarily mean that the code isn’t soluble? Maybe it just means that he isn’t literate enough.

I really think it comes down to reading level. Authors for the mass market may strive to write their texts at a 6th or 8th grade reading level, but authors of programming books do well to get down to a high-school reading level. In the same way, we have to assume some level of skill and knowledge in order to communicate effectively; otherwise you end up unable to use entire features of the language. The obvious thing is that we expect some comprehension of math and computer logic, but as a specific example, take the yield statement in C# 2.0. Most C# programmers seem to be unaware of its existence, and have to go look it up to understand what it does. Sometimes I’m not sure they get it, even after reading about it, without seeing a few examples in execution. Unless they know about the yield statement, and have a good grasp of IEnumerable as well, they will certainly not understand something as simple as how this method works when called from a foreach loop:


public IEnumerator GetEnumerator() {
   yield return 4;
   yield return 3;
   yield return 2;
   yield return 1;
   // Even though it's a dumb idea to have side effects in an enumeration ...
   // let's pretend that something happens here, just to give the example some meat
   // (Debug is System.Diagnostics.Debug)
   Debug.WriteLine( "Blast Off" );
   yield return 0;
   // and here ...
   Debug.WriteLine( "We're airborne" );
}
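
To make the execution order concrete, here is a minimal sketch of roughly what a foreach loop does with that method. The wrapper class Countdown and the Program class are names I made up for illustration, and I've swapped the debug output for Console.WriteLine so everything lands in one stream. The real foreach expansion also disposes the enumerator when it can, which I've left out.

```csharp
using System;
using System.Collections;

// Hypothetical wrapper class; the snippet above shows only the method.
class Countdown : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        yield return 4;
        yield return 3;
        yield return 2;
        yield return 1;
        Console.WriteLine("Blast Off");
        yield return 0;
        Console.WriteLine("We're airborne");
    }
}

class Program
{
    static void Main()
    {
        // Roughly what `foreach (object n in new Countdown())` does:
        // each MoveNext() call runs the method body up to the next
        // `yield return`, then pauses there until the next call.
        IEnumerator e = new Countdown().GetEnumerator();
        while (e.MoveNext())
        {
            Console.WriteLine(e.Current);
        }
        // Prints 4, 3, 2, 1, "Blast Off", 0, "We're airborne", one per line.
    }
}
```

The point to notice is that "Blast Off" appears between 1 and 0, and "We're airborne" is printed by the final MoveNext call, the one that returns false. Nothing in the method body runs until the caller asks for the next value.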

 

If it’s true that “most C# programmers” don’t understand the yield syntax, does that mean that I should avoid the use of the yield statement? Bear in mind that I would presumably have to replace it with a full C# 1.0 implementation of the IEnumerator interface, complete with an internal counter variable, MoveNext and Reset methods, and a Current property. That change would actually make the implementation far less readable and, of course, less soluble as well, except for that group of people who understand the concept of IEnumerable but are not familiar with the yield statement…
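
For comparison, here is a rough sketch of what that hand-written version might look like. The names ManualCountdown and CountdownEnumerator are my own inventions, and I've used Console.WriteLine for the side effects. Treat it as one possible way to write it rather than what the compiler actually generates.

```csharp
using System;
using System.Collections;

// Hypothetical hand-written equivalent of the countdown, without `yield`.
class ManualCountdown : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        return new CountdownEnumerator();
    }

    private class CountdownEnumerator : IEnumerator
    {
        // -1 means "before the first element"; 5 means "past the end".
        private int position = -1;

        public bool MoveNext()
        {
            if (position >= 5) return false;   // already finished
            position++;
            if (position == 4) Console.WriteLine("Blast Off");
            if (position == 5)
            {
                Console.WriteLine("We're airborne");
                return false;
            }
            return true;
        }

        public object Current
        {
            get
            {
                if (position < 0 || position > 4)
                    throw new InvalidOperationException();
                return 4 - position;           // 4, 3, 2, 1, 0
            }
        }

        public void Reset()
        {
            position = -1;
        }
    }
}

class Program
{
    static void Main()
    {
        foreach (object n in new ManualCountdown())
        {
            Console.WriteLine(n);
        }
    }
}
```

The reader now has to reconstruct the sequence from the arithmetic on position, and the side effects are tucked into MoveNext instead of sitting in line with the values. One difference worth knowing: this version supports Reset, whereas the enumerator the compiler builds from a yield method throws NotSupportedException if you call it.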

To go back to the reading level analogy: If your books are intentionally written at a graduate reading level, there’s no point in trying to sell them to high school kids, but that doesn’t necessarily mean they’re badly written, unless, of course, you’re trying to explain something that should be at their level. An algebra book written at a post-grad reading level is worse than useless, but a book on particle physics written at the 3rd grade reading level is probably equally so.

Anyway, that’s probably enough stretching of the analogy … but I started thinking about that second point: the importance of this metric. I just thought I’d throw this out there.

Someone on the list made a comment with regard to the Quake source code released by Id (and assumed to be fairly complicated): “while the … code might do what it’s supposed to do very well, that alone is not necessarily an indication of the quality of the source code”.

Sometimes I think discussions about code quality can go a little too far in the theoretical direction. It’s certainly true that maintainability is a key component of code quality (in all of its aspects: both the fragility of the code and its readability and “solubility” for new developers). But I think that for the source code of a reusable game engine, the primary metric is its performance, not its readability.

Given two engines with similar APIs and performance, you would want to use the engine which you judged to be the most maintainable. But if you had the same API on two engines, and one were clearly more maintainable but gave your game a graphics performance of 30 FPS instead of 60 FPS, you would probably pick the fast one (assuming some level of confidence that it works, and especially if you’re not allowed to make changes to the engine anyway).

I guess all I’m saying is that even in the modern age, there remain areas where performance still outranks solubility, but it’s probably safe to assume, for the sake of discussion, that as a .Net developer you’re more concerned with maintainability than performance, and that code solubility should have a fairly high rating.

In fact, in some industries we already consider the ability of a programmer to read and understand your code (does that make the programmer a solvent?) as a measuring stick of his literacy level, and as a factor in hiring. At the same time, we might want to consider their ability to explain something at a lower reading level than the one at which it was expressed to them, even if it takes more code or results in a performance hit. In other words: you might want your new hires to have a certain level of code literacy, but you also want to be sure they can express themselves well at that lower reading level, in code that is more “universally soluble”, because that is what they should ideally be coding most of the time.