A Reading List for Software Engineers

IMPORTANT: This post is not yet in its final form. The ratings (which probably need a finer granularity) and the categories still need some work, and many descriptions are missing.

In this post I would like to assemble a list of resources that I have found helpful in my work as a software engineer. Since my views slowly change over time, I will try to keep this post up to date and continuously add, remove, and adapt content. To make the list easier to consume, I have grouped the books into categories, with some books appearing in multiple categories at the same time. For each book I also provide one or more ratings from the scale essential, useful, optional, and classic.

As usual, a short disclaimer: Everything I present in this post (and in this blog in general) is my personal view and does not represent the view of my current employer, my former employers or any future employers.

Overview


Code Health

Refactoring: Improving the Design of Existing Code
[Rating: essential]

Design Patterns: Elements of Reusable Object-Oriented Software
[Rating: essential]

Working Effectively with Legacy Code
[Rating: essential]

Test Driven Development: By Example
[Rating: essential]

Clean Code: A Handbook of Agile Software Craftsmanship
[Rating: essential]

Clean Architecture: A Craftsman’s Guide to Software Structure and Design
[Rating: essential]

The Art of Readable Code: Simple and Practical Techniques for Writing Better Code
[Rating: useful]

A Philosophy of Software Design
[Rating: useful]

Understanding Software: Max Kanat-Alexander on simplicity, coding, and how to suck less as a programmer
[Rating: essential]

Algorithms and Data Structures

Introduction to Algorithms
[Rating: essential]

This book is essential for building a solid foundation regarding algorithms, data structures and to some extent also computational complexity.

I first stumbled upon an earlier edition of this book during my studies in computer science and really enjoyed reading it. The book itself is quite verbose, covers a large number of topics and contains a lot of illustrations. Overall, it is a very good introductory book, as well as a valuable reference later on.

On top of that, the book can be very useful for interview preparation.

Algorithms
[Rating: useful]

Not quite as good as Introduction to Algorithms for my taste, but still a very good book.

The focus is more on the practical aspects of algorithms, while Introduction to Algorithms covers both the practical and the theoretical side. For this reason, it might actually be the preferred choice for some engineers.

This book can also be very useful for interview preparation.

The Art of Computer Programming
[Rating: classic]

These books are definitely among the most famous books about computer programming of all time. Having started the project in 1962, Donald Knuth published the first three volumes between 1968 and 1973. Volume 4A followed in 2011, further parts of Volume 4 have since appeared as fascicles, and additional volumes are planned to conclude the series at some point in the future.

I personally bought volumes 1-4 at the beginning of 2017. So far I have only read a few parts, but ever since I added these books to my personal library, my productivity has miraculously increased by 2-3%.

Not an easy read, though.

Programming Languages

C++

Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14
[Rating: essential]

Effective C++: 55 Specific Ways to Improve Your Programs and Designs
[Rating: useful]

The C++ Standard Library: A Tutorial and Reference
[Rating: useful]

Modern C++ Programming with Test-Driven Development: Code Better, Sleep Better
[Rating: useful]

Python

Fluent Python: Clear, Concise, and Effective Programming
[Rating: essential]

Effective Python: 59 Specific Ways to Write Better Python
[Rating: useful]

Java

Java: The Complete Reference
[Rating: useful]

Go

The Go Programming Language
[Rating: essential]

Tools

Practical Vim: Edit Text at the Speed of Thought
[Rating: essential]

The Linux Command Line: A Complete Introduction
[Rating: useful]

Docker Deep Dive
[Rating: useful]

Shell Scripting: How to Automate Command Line Tasks Using Bash Scripting and Shell Programming
[Rating: useful]

Pro Git
[Rating: useful]

Mercurial: The Definitive Guide
[Rating: useful]

Software Craftsmanship

Coders at Work: Reflections on the Craft of Programming
[Rating: useful]

97 Things Every Programmer Should Know: Collective Wisdom from the Experts
[Rating: useful]

97 Things Every Software Architect Should Know: Collective Wisdom from the Experts
[Rating: useful]

The Passionate Programmer: Creating a Remarkable Career in Software Development
[Rating: useful]

Effective Programming: More Than Writing Code by Jeff Atwood
[Rating: useful]

How to Stop Sucking and Be Awesome Instead
[Rating: useful]

A Programmer’s Rantings: On Programming-Language Religions, Code Philosophies, Google Work Culture, and Other Stuff
[Rating: useful]

Software Developer Life: Career, Learning, Coding, Daily Life, Stories
[Rating: useful]

The Pragmatic Programmer: your journey to mastery
[Rating: useful]

Processes

Clean Agile: Back to Basics
[Rating: useful]

The Clean Coder: A Code of Conduct for Professional Programmers
[Rating: useful]

Becoming a Technical Leader: An Organic Problem-Solving Approach
[Rating: useful]

Debugging Teams: Better Productivity through Collaboration
[Rating: useful]

The Dip: A Little Book That Teaches You When to Quit (and When to Stick)
[Rating: useful]

So Good They Can’t Ignore You: Why Skills Trump Passion in the Quest for Work You Love
[Rating: useful]

Crucial Conversations Tools for Talking When Stakes Are High, Second Edition
[Rating: useful]

Crucial Accountability: Tools for Resolving Violated Expectations, Broken Commitments, and Bad Behavior
[Rating: useful]

The Healthy Programmer: Get Fit, Feel Better, and Keep Coding (Pragmatic Programmers)
[Rating: useful]

The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
[Rating: useful]

Getting Things Done: The Art of Stress-Free Productivity
[Rating: useful]

Great at Work: The Hidden Habits of Top Performers
[Rating: useful]

Classics

Code Complete: A Practical Handbook of Software Construction, Second Edition
[Rating: classic]

The Mythical Man-Month: Essays on Software Engineering
[Rating: classic]

Code Health is not a Trade-off

It is a popular perception in software engineering that there is a trade-off between code health and delivering software fast. This is a huge misconception, fueled by the fact that we very often maneuver ourselves into situations where we suddenly have to decide between improving the quality of our codebase and delivering new features.

Yes, this trade-off is real and I have experienced it myself more often than I would have liked, but the underlying problem is that we allow this trade-off to materialize in the first place, without actually getting anything in return.

In the remainder of this post I will look into some of the reasons that create such trade-off situations, how we can avoid them, and what we can do to get out of them.

Before going more into detail, a short disclaimer: Everything I present in this post (and in this blog in general) is my personal view and does not represent the view of my current employer, my former employers or any future employers.

Accumulating Technical Debt

Having a large amount of technical debt is the reason why we experience this trade-off between code health and delivering features fast. So why do we get into this vortex of technical debt? There are three main reasons.

First, we very often incorrectly perceive that by going into technical debt we gain short-term advantages which potentially outweigh all the long-term downsides. In software development this is just not the case. Alright, there are a few notable exceptions, like the one-off script, the small prototype or the feasibility study. But even then: how often did we need our one-off script more than once? How often did our small prototype grow over time until it made it into production? How often did our feasibility study require time-consuming debugging until it gave correct results? I do not want to claim that these exceptions do not exist, but they are extremely rare. In all other cases, there are no significant short-term benefits from going into technical debt. Technical debt hits us much faster than we expect, even for single-person short-term projects, and all the more for multi-developer projects with a long lifetime.

Second, we sometimes just do not know how to do better. Each developer has a specific set of skills and experience, and even when doing our best, we might still introduce additional technical debt. This is one of the reasons why I personally do not like the term technical debt. In some cases (the first point) we go into technical debt consciously, albeit based on false premises, but mostly we go into it accidentally (this point and the next one). That is not how one typically gets into other forms of debt.

And third, even in cases where we do know how to implement things perfectly, we get it wrong. Actually, we get it wrong all the time. Not because we lack skills or experience, but because we are not able to predict the future. We implement something in software (and not hardware) precisely because we expect the implementation to change over time. Even if our current implementation is perfect (whatever that means, but this is a different story), it might be far from ideal a month or half a year from now. That means we accumulate additional technical debt simply because the context (e.g. constraints, requirements, scale) changes over time. Again, the term debt does not seem appropriate here.

How to avoid Technical Debt?

Looking at the three different ways in which we add technical debt, how can we best avoid it? First of all, let's not be lured into consciously adding technical debt by the false promise of short-term gains. In most cases, that is not what we get in return. So what about the other two cases, in which we add technical debt accidentally?

We can reduce the amount of technical debt that is added because we were not able to find a better way of implementing or designing a feature by getting better ourselves. Continuous learning and improvement is something I am very passionate about. It takes time, but there are plenty of resources (books, videos, courses, etc.) that can help us avoid repeating the mistakes we (or other developers) made in the past. These include techniques like test-driven development, pair programming, code reviews, refactoring, design patterns and many more. I have always wanted to compile a list of useful resources and might add one in the coming days, so you have some starting points.
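
As a tiny illustration of the test-driven style mentioned above, here is a minimal sketch in Python (the slugify function and its behavior are invented purely for illustration):

```python
def slugify(title):
    """Turn an article title into a URL slug (hypothetical toy example)."""
    # Step 2: the simplest implementation that makes the tests below pass.
    words = title.strip().lower().split()
    return "-".join(words)


# Step 1: in test-driven development these checks are written first,
# fail ("red"), and then drive the implementation above ("green").
assert slugify("Clean Code") == "clean-code"
assert slugify("  Refactoring  ") == "refactoring"
```

The point is not the function itself, but the order of work: the tests state the desired behavior before any implementation exists, which keeps the code small and immediately verified.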

What about the technical debt that is added because the context of our project has changed? Well, we can easily avoid this by getting better at predicting the future. Just kidding. While we might get better over time at making reasonable predictions, this will never be sufficient to avoid the issue altogether. What we can do is accept that we will get things wrong and design the system with that in mind. This is not easy at all and, in my opinion, one of the most difficult things in software development. But by designing the system for change, by deferring decisions and by keeping the codebase simple and modular, we can mitigate a lot of technical debt as soon as it occurs.
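
To make "designing for change" and "deferring decisions" a bit more concrete, here is a minimal sketch in Python of hiding a likely-to-change decision behind a small interface (all names are made up for illustration):

```python
from typing import Protocol


class Storage(Protocol):
    """The rest of the system depends only on this small interface,
    not on a concrete database or file format."""
    def save(self, key: str, value: str) -> None: ...
    def load(self, key: str) -> str: ...


class InMemoryStorage:
    """Today's simple choice; it can be swapped for a database later
    without touching any caller."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


def archive_report(storage: Storage, report_id: str, body: str) -> None:
    # Application code is written against the interface, so the
    # storage decision stays deferred and cheap to revise.
    storage.save(report_id, body)
```

When the context changes (say, reports must now survive restarts), only a new Storage implementation is added; the application code and its tests stay untouched.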

Whatever we do, we will sooner or later accumulate technical debt. Even a team of the most experienced and skilled engineers will have to deal with it. The important thing is to deal with it continuously and to reduce it as soon as it occurs. As long as we keep technical debt below a critical threshold (and this threshold is not very large), we do not get into the situation where we face a trade-off between code health and feature development. This is what we should aim for. But what if we are above that threshold?

How to reduce Technical Debt?

Working on a project with a large amount of technical debt is not a pleasant situation to be in. As I said before, this is something we should try to avoid from the very beginning. Don't get into such a situation.

But what if it is too late, or if you just joined a new project which is in an unhealthy state? It can be extremely stressful to work in such an environment. If that is the case for you and the situation is not improving, I would recommend looking for other opportunities rather than remaining in that situation for too long.

In any case, if we are in such a situation, we basically have to deal with the uncomfortable trade-off between code health and feature development. That means there has to be someone in the decision chain who supports reducing technical debt. If that is not the case, I doubt that such an endeavour can be successful, and I would refer back to what I said in the previous paragraph. If, on the other hand, reducing technical debt is a priority, it will still be a very difficult task, but there are a few things we can (at least try to) do.

Replacing the whole system is most of the time very risky and something I would generally advise against. Instead, I would try to reduce technical debt more locally. I would start by improving testability and test coverage in order to increase confidence in the codebase and make further changes easier. Being able to change code with confidence might then allow us to modularize some components and decouple them from the rest of the system. Replacing such components with better implementations does not carry the same risks as replacing the whole system, and is something that has worked very well for me in the past. Apart from that, as a general rule we can try to simplify any code we touch for maintenance or feature development. This allows us to reduce technical debt while developing new features and might slowly move the whole project into a healthier state (where other changes become feasible).
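
The first step described above, building confidence through tests before changing anything, is often done with so-called characterization tests: tests that simply pin down what the existing code currently does, so that a later replacement can be verified against them. A hypothetical sketch in Python:

```python
def legacy_price(quantity, unit_price):
    """Stand-in for old, poorly understood code we want to replace."""
    total = quantity * unit_price
    if total > 100:
        total = total * 0.9  # some undocumented bulk discount
    return total


# Characterization tests: we record the current behavior, whatever it is,
# without judging whether it is "right". A new implementation must
# reproduce exactly these results before it may replace the old one.
assert legacy_price(1, 50) == 50
assert legacy_price(3, 50) == 135.0  # 150 with the 10% discount applied
```

Once such tests are in place, the component can be refactored or rewritten with much lower risk, because any behavioral change is caught immediately.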

All in all, this is not easy. I am not sure if I mentioned it before, but the whole point of this post is to avoid getting here in the first place. And if you do end up here, then good luck.

Conclusions

There is no fundamental trade-off between code health and delivering features fast. We experience this trade-off only once we have accumulated a large amount of technical debt. Being in such a situation is very unpleasant and significantly slows down a project's progress. In the extreme, it makes projects fail.

This is why we should be mindful about technical debt: the reasons why we accumulate it, how to reduce it and, most importantly, how to avoid adding it in the first place.

This is easier said than done and I hope I will find the time to get into more concrete examples here in this blog. Please let me know if there are any specific topics you would be interested in.

Inherent Limitations for Benchmarking Problems beyond NP

Today I would like to talk about some difficulties regarding the benchmarking of hard problems. By hard problems I mean problems which are supposed to be harder than NP, or related classes, from a computational complexity point of view. This is a very interesting topic on which I am currently working in collaboration with Cassio P. de Campos from Queen’s University Belfast. The motivation for this work comes from the fact that we are both confronted with very difficult computational problems in our research. In the past I have worked a lot with stochastic combinatorial optimization problems, which can be extremely difficult to solve or even to approximate. For example, the Probabilistic Traveling Salesman Problem with Deadlines is #P-hard, which implies NP-hardness and is a much stronger hardness result. In fact, already the evaluation of the objective function for this problem is #P-hard. The question is: how can we empirically assess the performance of algorithms for difficult problems such as the Probabilistic Traveling Salesman Problem with Deadlines?

The situation is not too bad if we are able to compute the value of an optimal solution in a reasonable amount of time and if we are able to evaluate solutions in a reasonable amount of time. But since the problem is #P-hard this is only possible for very small instances. Instances of practical relevance might not belong to that category. What can we do in that case?

If we create a benchmark by sampling from the set of relevant instances in a fair way, we run into the following two problems: We are in general not able to compute an optimal solution (or the value of an optimal solution) for the benchmark instances and we are in general not able to evaluate solutions provided by algorithms on the benchmark instances (in many cases not even approximately). How can we empirically assess the performance of algorithms in such a setting? It seems that we cannot do a lot here. An algorithm returns a solution and we are not able to evaluate, or even estimate, the quality of such a solution.

How can we circumvent this problem?

Are there any ways to circumvent this problem? Well, we could sample and select instances of relevant size for which the above problems do not apply for some reason, or we could use an instance generator to create instances together with certain information about them, for example the value of an optimal solution. These two approaches are commonly used, so I will discuss them in more detail in the following.

The problem with the first approach is that we are not really creating a benchmark that is representative of the set of relevant instances. Let me give you an example. Consider the following approach: we sample a certain number of instances of relevant size, apply an exact solver to them with a fairly generous time limit, and then keep only those instances which the exact solver could solve within that limit. The problem is that we are no longer sampling from the set of relevant instances, but from the set of instances which can be solved by an exact solver within a given time limit. This introduces a selection bias, and there is a serious risk (in fact, it is quite likely) that the resulting instances are not representative of the set of relevant instances.
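
This selection bias can be illustrated with a toy simulation: assume each instance has some difficulty, and the exact solver only finishes within the time limit on the easier ones. The surviving benchmark is then systematically easier than the population it was sampled from (the difficulty model below is invented purely for illustration):

```python
import random

random.seed(0)

# Toy model: each sampled instance has a "difficulty"; the exact solver
# only finishes within the time limit if the difficulty is below a cutoff.
population = [random.uniform(0, 10) for _ in range(10_000)]
TIME_LIMIT_CUTOFF = 4.0

# Keeping only the instances the solver could handle changes the sample.
solved = [d for d in population if d < TIME_LIMIT_CUTOFF]


def mean(xs):
    return sum(xs) / len(xs)


# The filtered benchmark is systematically easier than the population
# of relevant instances it was originally sampled from.
print(f"mean difficulty, all sampled instances: {mean(population):.2f}")
print(f"mean difficulty, solver-filtered benchmark: {mean(solved):.2f}")
```

Of course, real instance hardness is not a single scalar, but the mechanism is the same: conditioning on "solvable within the time limit" shifts the distribution of the benchmark away from the distribution of relevant instances.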

The second approach runs into theoretical limitations. Whenever we use an instance generator to create instances together with certain additional information about them, that additional information can also be retrieved by an algorithm in NP or some related complexity class. If this information is useful for the benchmarking process, then it is very likely also useful for the computational task itself. That means that for problems beyond NP such an instance generator cannot exist, unless the corresponding complexity classes collapse. In other words: any attempt to generate instances in this way introduces a bias into the benchmark, because it is not possible to create instances from the set of relevant instances together with such additional information without introducing some kind of bias.

Conclusions

Neither approach is very satisfactory, and both introduce a certain kind of bias into the benchmarking process. It seems that we are not able to empirically assess the performance of algorithms for hard problems on instances that are representative of the set of relevant instances. We can only hope that the results on our biased benchmark instances generalize to the set of relevant instances, but that does not seem very scientific to me. Computational problems are becoming more and more complex, and there is a huge number of problems with a computational complexity beyond NP. We are not able to solve them exactly, and we are also not able to empirically compare algorithms for them. I think what we ultimately can do is attack these difficult problems from a more theoretical point of view, for example by finding approximation algorithms for related computational tasks or by identifying important special cases which are tractable.