Thoughts on Software Evolution (the articles discussed in this paper are listed in section C)

Software evolution is a topic that, to me, seems to have little importance placed upon it in the initial design of a product. Only after a few releases and some maintenance do we realize how important it is to write software with consideration not only for the present but also for the future, and generally we make these realizations too late. Along those lines, this paper presents abstracts of four articles on the subject, along with some personal insights into each, in the hopes of making the reader more familiar with the topic.

Starting the series of papers on software evolution, Hausi Müller et al. present the reader with an idea of how to better address reverse engineering. After a brief introduction to software maintenance, Müller spends the remainder of the paper trying to impress upon the reader the benefits of a bottom-up approach to reverse engineering, promoting Rigi (a tool developed by the authors to facilitate reverse engineering) throughout the work.

Beginning with a discussion of software maintenance, the authors note that ideally design documents and well-documented source are available to whoever undertakes this task. Unfortunately, the source code is often the only reliable means of understanding a system. This is the premise the authors build their paper around: using automation (in this case Rigi) to help the developer perform some of the arduous tasks of decomposing a system. The authors are quick to note, however, that the developer still plays a central role throughout the entire process, subscribing to the motto "automate as much as possible, but never fully automate."

The most common way to reverse engineer a system is to extract design artifacts directly out of the source and use them to build abstractions that are less implementation-oriented than the source, which is where the strength of Rigi lies. Using what the authors term a "subsystem composition methodology," Rigi constructs resource-flow graphs, subsystem hierarchies, and ultimately (k, 2)-partite graphs, which provide the developer with an abstract model of the initial system. Rigi does this using two primary models to represent and manage software structure: the unit interconnection model and the syntactic interconnection model. The difference between the two is that the unit model is concerned with how files, subsystems, and classes relate, while the syntactic model is more closely tied to the name table of the given language and how procedures, functions, and variables relate. The graphs that result from applying both methods and combining the results are the aforementioned (k, 2)-partite graphs.
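To make the bottom-up composition idea concrete, here is a minimal sketch in Python of collapsing a set of components into a single subsystem node on a resource-flow graph. The node names and the function are invented for illustration; this is not Rigi's actual implementation.

```python
# Hypothetical sketch of bottom-up subsystem composition on a
# resource-flow graph (invented example, not Rigi's implementation).

def collapse(edges, subsystem, name):
    """Replace every node in `subsystem` with the single node `name`,
    keeping only the resource flows that cross the subsystem boundary."""
    new_edges = set()
    for src, dst in edges:
        src = name if src in subsystem else src
        dst = name if dst in subsystem else dst
        if src != dst:  # drop flows that are internal to the subsystem
            new_edges.add((src, dst))
    return new_edges

# Flat resource-flow graph extracted from source: (user, used-resource) pairs.
edges = {("parser", "lexer"), ("parser", "symtab"),
         ("codegen", "symtab"), ("codegen", "parser")}

# Compose a "frontend" subsystem out of the lexer and parser.
edges = collapse(edges, {"lexer", "parser"}, "frontend")
print(sorted(edges))
```

Repeating this collapse step on the collapsed graph is what builds the subsystem hierarchy bottom-up: each pass produces a smaller, more abstract view of the same flows.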

The rest of the paper is more or less a selling point for the Rigi editor. The authors also present a case study in which Rigi was used to reverse engineer a large ray-tracing rendering system written in C. Based upon the success of this case study and the features available in Rigi, the authors conclude that providing a tool such as Rigi to a developer undertaking a reverse engineering task would decrease the amount of time spent understanding the system.

I found the idea this paper presents interesting, although I think some of the discussion of the Rigi editor could have been removed. While I have yet to tackle a large reverse engineering project (at present, the reverse engineering I have done consists of code bases of no more than a few thousand lines), I can understand the value of Rigi. Presented with a large system you know nothing about, Rigi can reveal the workings of the system through its graphs of system components and connection models in a shorter amount of time than debugging alone. While Rigi facilitates an easier understanding of a system, I would not see much additional use for it, for two reasons. First, Rigi does not support C++, which is used extensively by our group. In the article's defense, it was written in 1993, so it is possible that Rigi now supports C++. If it does not, however, I would see this as a major impediment to widespread use. Second, and most importantly, what Rigi provides in understanding it lacks in functionality. If I wanted to reverse engineer and make numerous changes to a particular system, I feel I would be limited, as I am removed from the source directly. Having an idea of how a system works is valuable, but ultimately you need to touch the source to make changes, and Rigi seems to move away from that.

Continuing with the topic of reverse engineering, Robert Bowdidge and William Griswold introduce the idea of automated support for encapsulating abstract data types through the star diagram. While there are similarities between the star diagram and Müller's Rigi editor, in that each gives the user an abstract graphical model of source code, Bowdidge and Griswold's method allows the user to manipulate the source directly.

Unanticipated modifications to any system are rarely easy, and as these modifications add up, maintenance becomes next to impossible as the original design is lost amongst the changes. Proposing a solution to this problem, the authors introduce the reader to the star diagram, which is like "a data flow graph—with nodes as operations and edges as data flows between them—but showing only the data flows generated by references to the data structure." Simply put, the star diagram merges all references to a variable into a single node, leaving the user with a tree showing all items that refer (directly or indirectly) to a particular data structure.
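The merging idea can be sketched in a few lines of Python. The program names below are invented, and this is only an illustration of the concept, not the authors' tool: starting from the merged node for the target data structure, each successive level of the tree holds the code that reaches it through one more layer of references.

```python
# Illustrative sketch of star-diagram levels (invented example).
# `refers_to` maps each function to the names it references,
# as might be extracted from a parse of the source.
refers_to = {
    "push": ["stack"], "pop": ["stack"],
    "parse": ["push", "pop"], "main": ["parse"],
}

def star_levels(target, refers_to):
    """Level 0 is the single merged node for `target`; level n holds
    everything that refers to it through a chain of n references."""
    levels, frontier, seen = [{target}], {target}, {target}
    while True:
        nxt = {f for f, uses in refers_to.items()
               if any(u in frontier for u in uses)} - seen
        if not nxt:
            return levels
        levels.append(nxt)
        seen |= nxt
        frontier = nxt

print(star_levels("stack", refers_to))
```

For this toy input the levels come out as the stack itself, then its direct users push and pop, then parse, then main, which is exactly the tree of direct and indirect referrers the paper describes.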

The real value of the star diagram is in the manipulations that can be performed upon it and the corresponding source code. While some nice features of the editor are mentioned, the authors draw the reader's attention to six restructuring operations that can be performed on the star diagram. A function can be either extracted or made inline; the same two operations can be performed on function parameters; and functions can be moved either into or out of an interface. The star diagram also checks against the program to make sure that meaning is preserved when performing one of the aforementioned transformations, with the star diagram updated to reflect the changes.

The authors then describe two examples in which the star diagram was used to perform restructuring operations: the KWIC program and a transportation simulation. The important takeaway from these examples is that the authors were not only able to perform the restructuring operations on each program, but in doing so came to understand the operation they were attempting and a little about the program itself.

Following discussions of how the star diagram is generated and of coupling and cohesion, the authors conclude that tool-assisted program restructuring can help lower the costs of excessive maintenance. Acknowledging that there are already text-based tools that provide restructuring help, the authors point out how the star diagram gives the user a visual representation, allowing programs to be manipulated more easily through views other than the source code. The authors leave the reader with three design principles used in the design of the star diagram, suggesting them as guidelines for developing similar tools: limit the problem domain, keep views derivable from the source code, and ensure that manipulations have predictable effects.

What really struck me was how the star diagram remains so closely tied to the source code of a program. Where Rigi provided no real way to dig down and make changes, the star diagram provides six common operations to perform on source code. Unfortunately, I do not quite see how the authors were able to better understand a given program as a whole. I can certainly see how you could understand a specific area of the source, and I would without question use this method over Rigi when making software modifications on large systems. I still feel, however, that Rigi provides developers with a better overall picture of how a given program works. If my job presented me with the task of making changes to a large system that I had no prior knowledge of, I could see myself using both Rigi and the star diagram to accomplish the task. It would again be interesting to know whether this tool is available for C++ (the authors did not mention this one way or the other), as I spend the majority of my time in that language over any other.

Keeping with the concept of reengineering, Gail Murphy and David Notkin present the software reflexion model, which, like the aforementioned Rigi and star diagram tools, enables developers to "rapidly and cost-effectively gain task-specific knowledge about a system's source code." Where the authors differ from the other approaches is in how the source model is constructed: combining a top-down and a bottom-up approach simultaneously rather than selecting one or the other.

After describing some of the motivation behind the reflexion model, the authors focus on the steps required to make it work, which, quite surprisingly, number only five. The user begins by constructing a high-level model to help understand the task at hand (for example, by reviewing source code or talking to experts on the system). The user then applies a tool such as a call graph or file dependency extractor to build the source model, which will be compared to the high-level model constructed in step one. Next, the user defines a map to describe how the source and high-level models relate. With these three steps completed, the user then uses a suite of programs provided by the authors to build the software reflexion model, allowing the user to "see interactions in the source code from the viewpoint of the high-level model." The fifth and final step involves investigating and computing successive reflexion models until enough architectural information about the task at hand has been acquired.
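The comparison at the heart of these steps can be sketched briefly. The file names, posited interactions, and map below are invented for illustration, and the terminology (convergence, divergence, absence) follows the reflexion-model literature rather than anything specific to this summary: each extracted source-level interaction is lifted through the map into the high-level model, and the two edge sets are then compared.

```python
# Minimal sketch of the reflexion comparison (invented example).
high_level = {("UI", "Engine"), ("Engine", "Storage")}  # posited interactions
source_calls = {("dlg.c", "calc.c"), ("calc.c", "file.c"),
                ("dlg.c", "file.c")}                    # extracted call graph
mapping = {"dlg.c": "UI", "calc.c": "Engine", "file.c": "Storage"}

# Lift each source-level call into the high-level model via the map.
lifted = {(mapping[a], mapping[b]) for a, b in source_calls}

convergences = high_level & lifted   # posited and found in the source
divergences  = lifted - high_level   # found in the source, never posited
absences     = high_level - lifted   # posited, never found in the source
print(convergences, divergences, absences)
```

Here the unexpected call from dlg.c to file.c shows up as a divergence: exactly the kind of gap between the engineer's mental model and the actual source that the fifth, iterative step is meant to drive out.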

The authors then describe a case study involving the source code of Microsoft Excel, undertaken in the hopes of seeing how their tool would handle a large system. Using the five steps mentioned previously, the authors describe how the developer went about breaking down and understanding the Excel source. From the results of the study, the authors concluded that both graphical and textual interfaces are needed (the developer primarily used the textual models, calling into question the importance of graphical ones), and that their tool had some performance difficulties on such a large code base, which they corrected.

Based upon the success of the Excel example, the authors conclude that their technique has practical value in the software industry. This conclusion rests on the developer's feedback that he was able to accomplish his task in only a few months' time, versus the two years he estimated it would have taken without applying any reengineering tools. The authors also point out that since Excel was written in C (a commonly used language in the industry), they could reasonably generalize from this one example.

Of the three papers that presented a tool to make reengineering easier, I believe this one offers the most value when applied to the software industry. My reasoning is twofold: first, this method, in my opinion, offers the best of both worlds, providing a model that not only gives you a high-level view of how a particular program works but also keeps you closely tied to the source. In the previous two examples, each tool offered one or the other but never seemed to successfully integrate the two. Second, unlike Rigi or the star diagram, this method has been proven on a large project. I do feel there is a downside, however: it appears to require more work than the other two. If I wanted just a quick understanding of how a given program works, or just to make a quick code change, I might use Rigi or the star diagram respectively. However, if I were presented with as daunting a task as understanding Excel, I would opt for the reflexion model without hesitation. Also, like the previous two tools, I would be interested to know whether this model is available for C++ code as well.

Last but not least, David Parnas' "Software Aging" proved to me the most interesting. Simply stated, if the goal of Parnas' paper were achieved, the previous papers on tools that aid in reengineering would never have needed to be written, as software would have been designed with long-term health in mind.

"Software aging" does not imply bit decay—it instead occurs when existing software no longer meets the owners needs or changes made to an existing system make it even harder to maintain. Three negative aspects of aging software (as Parnas describes) are first, owners of aging software have an increasingly difficult time keeping up with market demands. Second, aging software often loses performance over time. Third, aging software becomes unstable as new changes are continuously introduced into the system.

To combat the effects of software aging, Parnas addresses the notion of software engineers who consider a program done after one successful iteration, when in fact that is just the beginning. As the author puts it, one needs to "design for change." Spend time estimating the types of changes, then organize the software so that the items "most likely to be changed are confined to a small amount of code, so that if those things do change, only a small amount of code would be affected." When designing software, be sure to look beyond the first release, to the time when the software is old.
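A toy example may make the principle concrete. The scenario is invented, and this is only one way to apply the idea: suppose the on-disk settings format is judged likely to change; confining that decision to one small class means a future format change touches only this code.

```python
# Toy illustration of "design for change": the storage format is a
# likely-to-change decision, so it is hidden inside one small class.
import json

class SettingsStore:
    """Everything that knows the storage format lives here; if the
    format changes later, only this class needs to change."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def dump(self):
        # The hidden, changeable decision: JSON today, perhaps
        # something else in a future release.
        return json.dumps(self._data)

store = SettingsStore()
store.set("volume", 7)
print(store.dump())
```

Callers only ever use set, get, and dump, so swapping JSON for another format later would leave the rest of the system untouched: a small-scale version of confining likely changes to a small amount of code.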

Software aging, Parnas argues, is inevitable. To combat this, he describes steps that should be taken to plan ahead and deal with the eventuality. We, as a profession, must not "let today's pressure result in a crippled product next year." We must also introduce documentation into the design process: all programs and changes must be documented and reviewed constantly. We should also plan for the day when our software must be replaced, making sure the appropriate funds and people are available.

The author concludes by addressing four attitudes that, in his opinion, prevent us from addressing this topic. The notion of a "software crisis" must be removed, as it leads to short-term thinking and aging software. Communication between industries must improve so that information can be shared and problems addressed quickly. The notion that "anyone can code" is inappropriate and must be changed. Finally, researchers must expand their audience beyond their colleagues.

Parnas continues to amaze me: what he says rings true years after it was written. There were numerous times while reading through the problems and effects of software aging that I came up with examples of applications in my group that I could identify as undergoing some form of aging. The concepts Parnas argues for could absolutely apply (and in some instances already do) to my profession. The most telling example is designing software with the future in mind, as I would agree that we too often design only for the present. If we can change these notions now, perhaps I will not cringe so much when asked to update software later, nor will it take me so long to do so.

Designing software is not easy; adding consideration for how that software behaves both now and in the future is even more difficult. Of the four articles discussed, three dealt with techniques for reengineering a system that had been poorly designed, lacked documentation, or had degraded from constant maintenance, while the fourth discussed measures to prevent this from happening. If nothing else, I hope you can take away from this article how difficult reengineering can be, and how software design as a whole would improve if we all spent a little more time considering our software beyond the present.
