A Reading List about Code Health

In this post I would like to give an overview of the books about code health that have influenced me most as a software engineer. They are also listed in my post A Reading List for Software Engineers (which is still unfinished), but are covered here in more detail.

The books are presented in no particular order, except for the first entry (actually a series of books), which comes first because it had a profound effect on me: it made the whole field of code health and sustainable software development accessible to me.

But before we dive into the books, let me say a few words about why I believe code health is such an important topic: it is essential for sustainable development at a high pace, and it is a determining factor in the long-term success of projects.

Why Code Health?

Code health is very difficult to define and even more difficult to measure. We have indicators that can reveal potential issues, but to my knowledge there is no indicator that proves a codebase is healthy. In other words, it is easier to spot problems than to prove quality. In my opinion, this is because code health is to a large extent determined by the higher-level structures and abstractions of a system, and not so much by local, low-level properties. This leaves us in a complicated situation and makes it quite difficult to approach the topic from a scientific point of view.

Nevertheless, we have gathered plenty of anecdotal evidence over the last decades, and I have collected my own experiences during my professional career. In a high-quality codebase I am able to deliver plenty of value per unit of time while maintaining that quality. In a big ball of mud, by contrast, I feel I am not even worth the money I am being paid. And the difference in productivity I am talking about here is not a factor of 2 or 3; it is orders of magnitude. And yes, the plural orders is not a typo. We are talking about factors above 100 here.

So why do we so often work in low-quality codebases? This is particularly puzzling since fresh projects start out healthy by default. I have written about this in another post (Code Health is not a Trade-off) and do not want to go into too much detail here, but I think it is a combination of multiple factors, such as a poor understanding of the requirements and the problem domain, inexperienced developers, time pressure and companies setting the wrong incentives.

My personal solution to all of this is to apply the knowledge I extracted from the books listed below (and from many other books that are not directly related to code health but still had a great impact) and to shape the parts of the codebase I work with into an environment that allows me (and other developers) to perform at the highest level of productivity. This is not always easy, and too often external factors complicate it further. But starting with the books listed in this post is a first step in making it happen.

The Clean Code Series

This series consists of four books. These are (in chronological order) Clean Code: A Handbook of Agile Software Craftsmanship, The Clean Coder: A Code of Conduct for Professional Programmers, Clean Architecture: A Craftsman’s Guide to Software Structure and Design and Clean Agile: Back to Basics. The first and third books are all about code health, while the second is more about software craftsmanship and being a professional software engineer, and the fourth gives a sort of historical perspective on the Agile movement. I list them all here for completeness: the second book is also relevant with respect to code health, the fourth one maybe less so. But overall, I think they are all worth a read.

In any case, right after I made the transition from academia into professional software development, I read Clean Code, The Clean Coder and Clean Architecture (if I recall correctly, in that order). Those books had a tremendous impact on me and on my view of writing software. Reading them made me realize that the software I wrote during my time in research was of pretty low quality. I was always very confident about the code I produced, but as a scientist I did not know how much better it could have been. Looking back, it frightens me a bit. I am still sure that my results were correct, and there is a lot of empirical evidence to back that up, but how could I have been so unprofessional? Well, my main focus was on the scientific results; writing software was just one of the tools I used to obtain them. And I used that tool quite badly.

Coming back to the books: Clean Code: A Handbook of Agile Software Craftsmanship focuses on what I would now call low-level readability, that is, smaller chunks of code being expressive and readable. In some sense this is the foundation on which we can build larger components and whole systems that all adhere to high standards of quality. And that is exactly the topic of Clean Architecture: A Craftsman’s Guide to Software Structure and Design, where the focus is on higher-level structures and abstractions. The principles explained in this book also apply to the design and architecture of whole systems or services, but the focus is more on components and the relations between them: exactly the structures that to a large extent determine overall code health. This is not to diminish the value of low-level readability, but while I value low-level readability, I value clean structures and abstractions even more.

The Clean Coder: A Code of Conduct for Professional Programmers complements the other two books by providing guidelines for professional behavior as a software engineer. It definitely does not have the same strong focus on code health, but it lays out why professional conduct is important for doing our job efficiently (and for being able to push for code health and the other things we as developers know are important). Clean Agile: Back to Basics focuses even less on code health, although for those who understand the Agile movement as a strong force for promoting code health and clean software development practices, the connection is clearly there. In any case, I would recommend this book for the historical perspective it provides on the Agile movement and more recent developments.

All in all, Clean Code, Clean Architecture and The Clean Coder are must-reads, and I would recommend reading them in exactly that order. Clean Agile is a nice addition, but it would not be wrong to put it into your personal book backlog for the moment. Here are a few links to the books. Please note that there is a special offer for the electronic versions of Clean Code and The Clean Coder, bundled together.

Expanding Your Toolkit

The books in this section are Design Patterns: Elements of Reusable Object-Oriented Software, Test-Driven Development: By Example, Working Effectively with Legacy Code and Refactoring: Improving the Design of Existing Code. Let me start by saying that they are all must-reads. Each of them enriches your personal toolkit as a professional software engineer with important insights and generic techniques. I am pretty sure I use some of the knowledge from these books on a day-to-day basis.

Let’s start with Design Patterns: Elements of Reusable Object-Oriented Software. During my time as a university student, a friend introduced me to this book and I read it front to back in only a few days. I did not fully understand all of it, but I found it fascinating to think in patterns while developing software. Unfortunately, I was not able to grasp the full value of this book at that time. That happened years later, after I made the transition from academia into software engineering and read the book a second time. I don’t know what changed, but it was much easier to comprehend and suddenly all the patterns were easily accessible. Maybe the little experience I had gained writing production software played a role here. In any case, this book presents a catalog of generic design patterns in a very structured way. These patterns make it easier to write, communicate and reason about software. They mostly address lower-level design, just where software engineering gets interesting and where decisions have a lot of impact on code health.
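
To give a flavor of what thinking in patterns looks like in practice, here is a minimal sketch of the Strategy pattern, one of the patterns cataloged in the book. The class and method names are my own illustrative choices, not taken from the book:

```python
from abc import ABC, abstractmethod
import zlib


class CompressionStrategy(ABC):
    """Common interface for all concrete compression strategies."""

    @abstractmethod
    def compress(self, data: bytes) -> bytes:
        ...


class ZlibCompression(CompressionStrategy):
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)


class NoCompression(CompressionStrategy):
    def compress(self, data: bytes) -> bytes:
        return data


class Archiver:
    """Configured with a strategy instead of hard-coding an algorithm."""

    def __init__(self, strategy: CompressionStrategy):
        self._strategy = strategy

    def archive(self, data: bytes) -> bytes:
        return self._strategy.compress(data)


# Swapping the behavior is now a one-line change at the call site.
archiver = Archiver(ZlibCompression())
compressed = archiver.archive(b"some payload")
```

Once both sides of a conversation know the pattern, "just use a Strategy here" communicates this whole structure in a single sentence.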

The next one is Test-Driven Development: By Example. This book is all about writing tests first in order to drive the implementation of production code, and it had a profound effect on me. For a period of around two months I practiced test-driven development (TDD). It felt slow and awkward at the beginning, but I got used to it quite fast. The insight I gathered from that period is what makes code and design testable and how testability can be achieved without sacrificing any other code or design properties. Think about it for a moment; this is huge: making the code testable without being invasive. I stopped doing strict TDD after those two months and currently practice something I would call pseudo-TDD (I guess I will have to write about that in the future). By the way, this is quite typical of how I read books: I try to absorb the knowledge within them and use it to improve, complement and adapt my personal practices and views as a software engineer, while usually not fully buying in to the more dogmatic parts. That does not mean there is anything wrong with strict TDD; quite the contrary, I can only recommend that everyone read this book and practice TDD, at least for a while. For me personally, trying it for a bit was sufficient to extract the underlying principles and incorporate them into my own practices. So, saying that my current style is influenced by TDD would definitely be an understatement.
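
To illustrate the basic red-green rhythm of TDD, here is a minimal, hypothetical Python example; the function and its behavior are invented for illustration. In strict TDD the test class would be written (and seen failing) before the implementation exists:

```python
import unittest


# Step 2 (green): the simplest implementation that makes the tests pass.
def pad_invoice_number(n: int, width: int) -> str:
    return str(n).zfill(width)


# Step 1 (red): the tests come first and specify the desired behavior.
class PadInvoiceNumberTest(unittest.TestCase):
    def test_pads_short_numbers_with_zeros(self):
        self.assertEqual(pad_invoice_number(7, 3), "007")

    def test_leaves_wide_numbers_untouched(self):
        self.assertEqual(pad_invoice_number(1234, 3), "1234")


if __name__ == "__main__":
    unittest.main()
```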

The last two books in this section, Working Effectively with Legacy Code and Refactoring: Improving the Design of Existing Code, are related to each other. Both focus on non-functional changes, so-called refactorings, but in different contexts. The first focuses on how we can regain control over legacy code. It contains a large number of specific techniques for incrementally transforming legacy code into a healthier state. If you have ever despaired while trying to add tests to existing code, break up a large method or separate the responsibilities of a single huge class, this is the right book for you. The second book approaches refactoring from a different angle. It coined the term code smell: a suspicious structure in the code that could cause issues in the long term. Many such code smells are presented in the book, together with the techniques that can be used to remove them. In my opinion, these two books complement each other nicely, and both should be part of every professional software engineer’s library.
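
As a small taste of both books, here is a hypothetical Python example of the long-method smell and the Extract Function refactoring that removes it; the scenario and all names are mine, not taken from either book:

```python
# Before: parsing, validation and formatting are tangled in one function
# (a mild case of the "long method" code smell).
def report(raw: str) -> str:
    values = [float(x) for x in raw.split(",") if x.strip()]
    if not values:
        raise ValueError("no data")
    average = sum(values) / len(values)
    return f"avg={average:.2f} n={len(values)}"


# After: the same observable behavior, but each responsibility lives in
# its own well-named function (the "Extract Function" refactoring).
def parse_values(raw: str) -> list[float]:
    return [float(x) for x in raw.split(",") if x.strip()]


def average_of(values: list[float]) -> float:
    if not values:
        raise ValueError("no data")
    return sum(values) / len(values)


def report_refactored(raw: str) -> str:
    values = parse_values(raw)
    return f"avg={average_of(values):.2f} n={len(values)}"
```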

Again, here are some links to the books discussed in this section.

More Perspectives

In this section I would like to introduce a few books that I read after the ones discussed earlier. Given the knowledge I already had when reading them, it is difficult for me to judge how much value they would provide in isolation. All I can say is that they were very good reads: they refreshed my memory on many aspects of code health, provided different perspectives on certain topics and offered new insights that made me adapt and improve my personal practices even further. For those insights it is not clear how many were actually completely new and how many I had simply missed when reading the other books. This is also why I tend to re-read books years later, with deeper background knowledge and in a completely different context. In any case, the following books were definitely worth my time: The Art of Readable Code: Simple and Practical Techniques for Writing Better Code, A Philosophy of Software Design and Understanding Software: Max Kanat-Alexander on simplicity, coding, and how to suck less as a programmer.

I would recommend all of these to anyone who wants to dig deeper into code health and who has already consumed most of the books mentioned earlier. As usual, here are the links to the books.

Epilogue

I hope this list provides some value to fellow software engineers, in particular those at the very beginning of their careers. If it does, please feel free to share it with your peers. And please let me know in the comments what you think about this list and whether anything is missing.

A Reading List for Software Engineers

IMPORTANT: This post is not yet in its final form. The ratings (which probably need a finer granularity) and the categories still need some work, and a lot of descriptions are missing.

In this post I would like to assemble a list of resources that I have found helpful in my work as a software engineer. Since my views slowly change over time, I will try to keep this post up to date and continuously add, remove and adapt content. To make the list easier to consume, I will put the books into different categories, with some books appearing in multiple categories at the same time. I will also give each book one or more ratings from the categories essential, useful, optional and classic.

As usual, a short disclaimer: Everything I present in this post (and in this blog in general) is my personal view and does not represent the view of my current employer, my former employers or any future employers.

Code Health

Refactoring: Improving the Design of Existing Code
[Rating: essential]

Design Patterns: Elements of Reusable Object-Oriented Software
[Rating: essential]

Working Effectively with Legacy Code
[Rating: essential]

Test-Driven Development: By Example
[Rating: essential]

Clean Code: A Handbook of Agile Software Craftsmanship
[Rating: essential]

Clean Architecture: A Craftsman’s Guide to Software Structure and Design
[Rating: essential]

The Art of Readable Code: Simple and Practical Techniques for Writing Better Code
[Rating: useful]

A Philosophy of Software Design
[Rating: useful]

Understanding Software: Max Kanat-Alexander on simplicity, coding, and how to suck less as a programmer
[Rating: essential]

Algorithms and Data Structures

Introduction to Algorithms
[Rating: essential]

This book is essential for building a solid foundation in algorithms, data structures and, to some extent, computational complexity.

I first stumbled upon an earlier edition of this book during my computer science studies and really enjoyed reading it. The book itself is quite verbose, covers a large number of topics and contains a lot of illustrations. Overall, it is a very good introductory book as well as a valuable reference later on.

On top of that, the book can be very useful for interview preparation.

Algorithms
[Rating: useful]

Not quite as good as Introduction to Algorithms for my taste, but still a very good book.

The focus is more on practical aspects of algorithms, while Introduction to Algorithms covers the practical as well as the theoretical side. For this reason it might actually be the preferred choice for some engineers.

This book can also be very useful for interview preparation.

The Art of Computer Programming
[Rating: classic]

These books are definitely among the most famous books about computer programming of all time. Having started the project in 1962, Donald Knuth published the first three volumes between 1968 and 1973. Volume 4A followed in 2011, further parts of volume 4 have since appeared as fascicles, and volumes 5 to 7 may conclude the series at some point in the future.

I personally bought volumes 1-4 at the beginning of 2017. So far I have read only a few parts. But ever since I added these books to my personal library, my productivity has miraculously increased by 2-3%.

Not an easy read, though.

Programming Languages

C++

Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14
[Rating: essential]

Effective C++: 55 Specific Ways to Improve Your Programs and Designs
[Rating: useful]

The C++ Standard Library: A Tutorial and Reference
[Rating: useful]

Modern C++ Programming with Test-Driven Development: Code Better, Sleep Better
[Rating: useful]

Python

Fluent Python: Clear, Concise, and Effective Programming
[Rating: essential]

Effective Python: 59 Specific Ways to Write Better Python
[Rating: useful]

Java

Java: The Complete Reference
[Rating: useful]

Go

The Go Programming Language
[Rating: essential]

Tools

Practical Vim: Edit Text at the Speed of Thought
[Rating: essential]

The Linux Command Line: A Complete Introduction
[Rating: useful]

Docker Deep Dive
[Rating: useful]

Shell Scripting: How to Automate Command Line Tasks Using Bash Scripting and Shell Programming
[Rating: useful]

Pro Git
[Rating: useful]

Mercurial: The Definitive Guide
[Rating: useful]

Software Craftsmanship

Coders at Work: Reflections on the Craft of Programming
[Rating: useful]

97 Things Every Programmer Should Know: Collective Wisdom from the Experts
[Rating: useful]

97 Things Every Software Architect Should Know: Collective Wisdom from the Experts
[Rating: useful]

The Passionate Programmer: Creating a Remarkable Career in Software Development
[Rating: useful]

Effective Programming: More Than Writing Code by Jeff Atwood
[Rating: useful]

How to Stop Sucking and Be Awesome Instead
[Rating: useful]

A Programmer’s Rantings: On Programming-Language Religions, Code Philosophies, Google Work Culture, and Other Stuff
[Rating: useful]

Software Developer Life: Career, Learning, Coding, Daily Life, Stories
[Rating: useful]

The Pragmatic Programmer: your journey to mastery
[Rating: useful]

Processes

Clean Agile: Back to Basics
[Rating: useful]

The Clean Coder: A Code of Conduct for Professional Programmers
[Rating: useful]

Becoming a Technical Leader: An Organic Problem-Solving Approach
[Rating: useful]

Debugging Teams: Better Productivity through Collaboration
[Rating: useful]

The Dip: A Little Book That Teaches You When to Quit (and When to Stick)
[Rating: useful]

So Good They Can’t Ignore You: Why Skills Trump Passion in the Quest for Work You Love
[Rating: useful]

Crucial Conversations: Tools for Talking When Stakes Are High, Second Edition
[Rating: useful]

Crucial Accountability: Tools for Resolving Violated Expectations, Broken Commitments, and Bad Behavior
[Rating: useful]

The Healthy Programmer: Get Fit, Feel Better, and Keep Coding (Pragmatic Programmers)
[Rating: useful]

The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
[Rating: useful]

Getting Things Done: The Art of Stress-Free Productivity
[Rating: useful]

Great at Work: The Hidden Habits of Top Performers
[Rating: useful]

Classics

Code Complete: A Practical Handbook of Software Construction, Second Edition
[Rating: classic]

The Mythical Man-Month: Essays on Software Engineering
[Rating: classic]

Code Health is not a Trade-off

It is a popular perception in software engineering that there is a trade-off between code health and delivering software fast. This is a huge misconception, fueled by the fact that we very often maneuver ourselves into situations where we suddenly have to decide between improving the quality of our codebase and delivering new features.

Yes, this trade-off is real, and I have experienced it myself more often than I would have liked. But the underlying problem is that we allow this trade-off to materialize in the first place, without actually getting anything in return.

In the remainder of this post I will look into some of the reasons that create such trade-off situations, how we could avoid them and what we could do to get out of them.

Before going more into detail, a short disclaimer: Everything I present in this post (and in this blog in general) is my personal view and does not represent the view of my current employer, my former employers or any future employers.

Accumulating Technical Debt

Having a large amount of technical debt is the reason why we experience this trade-off between code health and delivering features fast. So why are we getting into this vortex of technical debt? There are three reasons for that.

First, we very often incorrectly perceive that by going into technical debt we gain short-term advantages that potentially outweigh all the long-term downsides. But in software development this is just not possible. Alright, there are a few notable exceptions, like the one-off script, the small prototype or the feasibility study. But even in those cases: how often did we need our one-off script more than once? How often did our small prototype grow over time until it made it into production? How often did our feasibility study require time-consuming debugging until it gave correct results? I do not want to claim that these exceptions do not exist, but they are extremely rare. In all other cases, there are no significant short-term benefits to going into technical debt. Technical debt hits us much faster than we expect, even in single-person short-term projects, and even more so in multi-developer projects with a long lifetime.

Second, we sometimes just do not know how to do better. Each developer has a specific set of skills and experiences, and even when doing our best, we might introduce additional technical debt right away. This is one of the reasons I personally do not like the term technical debt: while in some cases (the first point) we go into technical debt consciously, albeit based on false premises, we mostly go into it accidentally (this point and the next one). And that is not how one typically gets into other forms of debt.

And third, even in cases where we do know how to implement things perfectly, we get it wrong. Actually, we get it wrong all the time; not because we lack skills or experience, but because we are not able to predict the future. We implement something in software (and not hardware) precisely because we expect the implementation to change over time. Even if our current implementation is perfect (whatever that means, but that is a different story), it might be far from ideal a month or half a year from now. That means we accumulate additional technical debt simply because the context (e.g. constraints, requirements, scale) changes over time. Again, the term debt does not seem appropriate here.

How to avoid Technical Debt?

Looking at the three different ways in which we add technical debt, how can we best avoid it? First of all, let's not get lured into consciously adding technical debt by the false promise of short-term gains. In most cases that is not what we get in return. So what about the other two cases, in which we add technical debt accidentally?

We can reduce the technical debt that is added because we were unable to find a better way of implementing or designing a feature by getting better ourselves. Continuous learning and improvement is something I am very passionate about. It takes time, but there are plenty of resources (books, videos, courses, etc.) that can help us avoid repeating the mistakes we (or other developers) made in the past. Think of techniques like test-driven development, pair programming, code reviews, refactoring, design patterns and many more. I have always wanted to compile a list of useful resources and might be able to add one in the coming days, so you have some starting points.

What about the technical debt that is added because the context of our project has changed? Well, we could easily avoid it by getting better at predicting the future. Just kidding. While we might get better over time at making reasonable predictions, that will never be sufficient to avoid this issue altogether. What we can do is accept that we will get things wrong and design the system with that in mind. This is not easy at all and, in my opinion, one of the most difficult things in software development. But by designing the system for change, by deferring decisions and by keeping the codebase simple and modular, we can mitigate a lot of technical debt as soon as it occurs.
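
To make deferring decisions a bit more concrete, here is a minimal Python sketch under assumptions of my own choosing: the core logic depends only on a small interface, so the decision about the concrete storage technology can be deferred or revised without touching it:

```python
from typing import Protocol


class OrderStore(Protocol):
    """The only thing the core logic knows about persistence."""

    def save(self, order_id: str, payload: dict) -> None:
        ...


class InMemoryStore:
    """Good enough today; a real database can be swapped in later."""

    def __init__(self) -> None:
        self._orders: dict[str, dict] = {}

    def save(self, order_id: str, payload: dict) -> None:
        self._orders[order_id] = payload


def place_order(store: OrderStore, order_id: str, payload: dict) -> None:
    # The business rule does not know or care how orders are persisted.
    store.save(order_id, payload)


place_order(InMemoryStore(), "42", {"item": "book"})
```

When the context changes (a database, a message queue, a remote service), only a new implementation of the small interface is needed.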

Whatever we do, we will sooner or later accumulate technical debt. Even a team of the most experienced and skilled engineers will have to deal with it. The important thing is to deal with it continuously and to reduce it as soon as it occurs. As long as we keep technical debt below a critical threshold (and this threshold is not very large), we do not get into the situation where we face a trade-off between code health and feature development. This is what we should aim for. But what if we are already above that threshold?

How to reduce Technical Debt?

Working in a project with a large amount of technical debt is not a pleasant situation to be in. As I said before, this is something we should try to avoid from the very beginning. Don’t get into such a situation.

But what if it is too late, or you have just joined a project that is in an unhealthy state? It can be extremely stressful to work in such an environment. If that is the case for you and the situation is not improving, I would recommend looking for other opportunities rather than remaining in that situation for too long.

In any case, if we are in such a situation, we basically have to deal with the uncomfortable trade-off between code health and feature development. That means there has to be someone in the decision chain who supports reducing technical debt. If there is not, I doubt such an endeavour can succeed, and I refer you to what I said in the previous paragraph. If, on the other hand, reducing technical debt is a priority, it will still be a very difficult task, but there are a few things we can (at least try to) do.

Replacing the whole system is most of the time very risky and something I would generally advise against. Instead, I would try to reduce technical debt more locally. I would start by improving testability and test coverage in order to increase confidence in the codebase and make further changes easier. Being able to change code with confidence might allow us to modularize some components and decouple them from the rest of the system. Replacing such components with better implementations does not carry the same risks as replacing the whole system, and it is something that has worked very well for me in the past. Apart from that, as a general rule we can try to simplify any code we touch for maintenance or feature development. This reduces technical debt while developing new features and might slowly move the whole project into a healthier state (where other changes become feasible).
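
One concrete technique for that first step, borrowed from Working Effectively with Legacy Code, is the characterization test: before changing anything, we pin down what the code currently does. A hypothetical Python sketch (the legacy function and the recorded values are invented for illustration):

```python
import unittest


def legacy_price(quantity: int, unit: float) -> float:
    # Stands in for an untested function we inherited.
    total = quantity * unit
    if quantity > 10:
        total = total * 0.9  # undocumented bulk discount we discovered
    return round(total, 2)


class CharacterizationTest(unittest.TestCase):
    # The expected values were obtained by running the code and recording
    # its output -- they document what it DOES, not what it should do.
    def test_small_order(self):
        self.assertEqual(legacy_price(2, 3.5), 7.0)

    def test_bulk_discount_kicks_in(self):
        self.assertEqual(legacy_price(11, 1.0), 9.9)


if __name__ == "__main__":
    unittest.main()
```

With this safety net in place, refactoring the function becomes a much less risky affair.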

All in all, this is not easy. I am not sure if I mentioned it before, but the whole point of this post is to avoid getting here in the first place. And if you end up here anyway: good luck.

Conclusions

There is no fundamental trade-off between code health and delivering features fast. We experience this trade-off because of a large amount of technical debt. Being in such a situation is very unpleasant and significantly slows down the progress of projects. In the extreme, it makes projects fail.

And this is why we should be mindful about technical debt: why we accumulate it, how to reduce it and, most importantly, how to avoid adding it in the first place.

This is easier said than done and I hope I will find the time to get into more concrete examples here in this blog. Please let me know if there are any specific topics you would be interested in.

Scrum is fragile, not Agile

As the title suggests, this post is about two different aspects of Scrum. The first part deals with Scrum not being Agile and the second part is about Scrum being fragile.

Before going more into detail, a short disclaimer: Everything I present in this post (and in this blog in general) is my personal view and does not represent the view of my current employer, my former employers or any future employers.

Scrum is not Agile

I guess a typical reaction to this heading would be something like: “How is this possible? Scrum is not Agile? Isn’t Scrum the number one Agile software development process?”. The short answer is that Scrum claims to be an Agile process, but the sad reality is that it is quite far from being Agile. Let me show you why.

Let's have a quick look at the Agile Manifesto. It states that it values “Individuals and interactions over processes and tools”. Let's also have a quick look at the meaning of the word agile. According to the Oxford Dictionary, agile means “Able to move quickly and easily”. It is no coincidence that this term was chosen to represent the high-level ideas of the Agile Manifesto. In fact, one major point behind Agile is that in many software projects it is extremely difficult to move quickly and easily. This is not the case for a completely new project, but over time many projects get into a situation where sustainable development is simply no longer possible. To prevent this (and other issues), the Agile Manifesto and the Principles behind the Agile Manifesto provide several high-level guidelines. These guidelines are not specific, well-defined processes or tools, and they allow for many different implementations. I suspect that both of these properties (being high-level and allowing different implementations) were fully intended. The goal was not to present a silver bullet, but to help peers avoid many of the pitfalls in software development which the authors of the Agile Manifesto had experienced first-hand and which fall into exactly these categories.

Now let's have a look at the Scrum Guide (written by two of the authors of the Agile Manifesto). In contrast to the Agile Manifesto and the Agile Principles, this guide is quite lengthy. Surprisingly, it does not mention Agile a single time. I am not sure whether this was always the case, but if the authors of the Scrum Guide do not claim that Scrum is Agile, then we would already be done with the first part of this post. I assume that is not their position, so let's move on. The Scrum Guide describes a framework which contains “roles, events, artifacts, and the rules that bind them together”. In other words, it is a very specific and well-defined process. That does not sound agile, and it also does not sound Agile (remember: “Individuals and interactions over processes and tools”). The irony is quite obvious. And this is where the whole Scrum movement should have stopped. But it did not, and instead it frustrates an increasing number of software developers all around the world. And whenever a Scrum project fails, it is not because of Scrum's potential flaws, but because Scrum was not implemented correctly. Which makes a nice transition into the second part of this post.

Scrum is fragile

This part is very short. I thought the wordplay (Scrum being agile / fragile) was kind of funny, and apart from that it perfectly describes one of the things that really bothers me about Scrum: whenever a Scrum project fails, the explanation is that Scrum was not implemented correctly. And you can read about a vast number of such projects. What does it mean if a large number of intelligent software developers are not able to implement Scrum correctly? It means the whole framework is fragile. And this is another major argument against using Scrum. What is a framework good for if it is so difficult to use?

Well, it seems that with the help of expensive consulting and coaching, as well as training and certificates, Scrum might in fact provide value. But it is not clear whether this is value for the companies developing software and the hard-working software developers, or for those who offer services in and around the Scrum ecosystem.

Personal View

I would like to finish this post with a bit of my personal view on software development, Agile and Scrum. To me it seems that one very important part of high-quality software development is maintaining a simple priority queue of tasks, where the weight of a task combines the value it provides for the customers and developers with the estimated effort of implementing it. For some developers this comes naturally. For teams and companies where it does not, Scrum offers a rather expensive and inefficient implementation of a priority queue.
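
Here is a toy Python sketch of that priority queue; the value/effort weighting is an assumption chosen for illustration, not a prescription:

```python
import heapq

# (name, value for customers/developers, estimated effort)
tasks = [
    ("fix login bug", 8.0, 1.0),
    ("rewrite backend", 9.0, 30.0),
    ("add export button", 3.0, 2.0),
]

# heapq is a min-heap, so we push the negated value/effort ratio.
queue = [(-value / effort, name) for name, value, effort in tasks]
heapq.heapify(queue)

while queue:
    weight, name = heapq.heappop(queue)
    print(f"next: {name} (weight {-weight:.2f})")
# -> fix login bug, add export button, rewrite backend
```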

And let's be honest: software development is very difficult and complex work. Are we really surprised that so many projects fail? The field is still very young and we have a lot to learn. And this is crucial: we need to learn from past experiences, be they failures or success stories. And here we collectively fail. It is not that we use the wrong processes or implement the right processes in the wrong way. We are simply caught in a rat race, unable to take a short break to look at and learn from all the things that have happened around us, maybe even before our time. It is our duty to extract the knowledge, the experience and the wisdom from the many resources that are so easily available to us: the many, many books, articles and videos about software development and, last but not least, the Agile Manifesto.

Inherent Limitations for Benchmarking Problems beyond NP

Today I would like to talk about some difficulties in benchmarking hard problems. By hard problems I mean problems that are believed to be harder than NP (or related classes) from a computational complexity point of view. This is a very interesting topic on which I am currently working in collaboration with Cassio P. de Campos from Queen’s University Belfast. The motivation for this work comes from the fact that we are both confronted with very difficult computational problems in our research. In the past I have worked a lot with stochastic combinatorial optimization problems, which can be extremely difficult to solve or even to approximate. For example, the Probabilistic Traveling Salesman Problem with Deadlines is #P-hard, which implies NP-hardness and is a much stronger hardness result. In fact, already the evaluation of the objective function for this problem is #P-hard. The question is: how can we empirically assess the performance of algorithms for difficult problems such as the Probabilistic Traveling Salesman Problem with Deadlines?

The situation is not too bad if we are able to compute the value of an optimal solution and to evaluate candidate solutions in a reasonable amount of time. But since the problem is #P-hard, this is only possible for very small instances. Instances of practical relevance might not fall into that category. What can we do in that case?

If we create a benchmark by sampling from the set of relevant instances in a fair way, we run into two problems: in general we are neither able to compute an optimal solution (or the value of an optimal solution) for the benchmark instances, nor able to evaluate the solutions returned by algorithms on these instances (in many cases not even approximately). How can we empirically assess the performance of algorithms in such a setting? It seems we cannot do much here: an algorithm returns a solution, and we are not able to evaluate, or even estimate, its quality.

How can we circumvent this problem?

Are there any ways to circumvent this problem? Well, we could sample and select instances of relevant size for which, for one reason or another, the above problems do not apply, or we could use an instance generator to create instances together with certain information about them, for example the value of an optimal solution. Both approaches are commonly used, and I will therefore discuss them more thoroughly in the following.

The problem with the first approach is that we are no longer creating a benchmark which is representative of the set of relevant instances. Let me give you an example. Consider the following procedure: we sample a certain number of instances of relevant size and apply an exact solver to them with a very generous time limit. We then keep only those instances which the exact solver could solve within that limit. The problem is that we are no longer sampling from the set of relevant instances, but from the set of instances which can be solved by an exact solver within a given time limit. This introduces a bias, and there is a serious threat (in fact it might be quite likely) that the resulting instances are not representative of the set of relevant instances.
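
To illustrate this selection effect, here is a small Python simulation under toy assumptions: every instance has a hidden hardness, and the exact solver finishes within the time limit only on the easier ones:

```python
import random

random.seed(0)
TIME_LIMIT = 1.0

# Hidden "hardness" of each sampled instance (toy lognormal model).
population = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# Keep only the instances the exact solver finished within the time limit.
kept = [h for h in population if h <= TIME_LIMIT]

print(f"mean hardness, all sampled instances: {sum(population) / len(population):.2f}")
print(f"mean hardness, kept benchmark:        {sum(kept) / len(kept):.2f}")
# The kept benchmark is systematically easier than the set it was drawn from.
```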

The second approach runs into a theoretical limitation. Whenever we use an instance generator to create instances together with certain additional information about them, this additional information can also be retrieved by an algorithm in NP or a related complexity class (roughly speaking: guess the generator's random bits, rerun the generator, and verify that it reproduces the instance together with the information). If this information is useful for the benchmarking process, then it is very likely also useful for the given computational task. That means that for problems beyond NP such an instance generator cannot exist, unless the corresponding complexity classes collapse. Or in other words: any attempt to generate instances in such a way leads to a bias in the benchmark, because it is not possible to create instances from the set of relevant instances together with certain additional information without introducing some kind of bias.

Conclusions

Neither approach is very satisfactory; both introduce a certain kind of bias into the benchmarking process. It seems that we are not able to empirically assess the performance of algorithms for hard problems on instances which are representative of the set of relevant instances. We can only hope that the results on our biased benchmark instances generalize to the set of relevant instances, but that does not seem very scientific to me. Computational problems are becoming more and more complex nowadays, and there is a huge number of problems with a computational complexity beyond NP. We are not able to solve them exactly, and we are also not able to empirically compare algorithms for such problems. I think what we ultimately can do is attack these difficult problems from a more theoretical point of view, for example by finding approximation algorithms for related computational tasks or by identifying important special cases which are tractable.