Disclaimer: this is an opinionated article, flowing directly from brain to keyboard without lots of references and explanations. I know much of its content is debatable and it just reflects my current opinions and mindset, likely to change without notice.
Recently I watched the Design Patterns Coursera course (https://www.coursera.org/learn/design-patterns) and noticed that it’s full of Gang of Four (GoF) design patterns. If you are a programmer reading about programming online, you’ll find that discussions and material on GoF have been very sparse in recent years, especially with the rising functional programming hype (if we can call it a hype).
I wondered about this and a bit of investigation led me to (mostly) the following opinions:
- GoF patterns are now common knowledge and therefore nobody talks about them anymore, just like nobody talks about using functions or control structures anymore.
- They were a failure, born in the object-oriented Java mindset; they made software even more complex and went overboard with useless abstractions.
- They were there to help with programming language shortcomings which are less and less of an issue now, especially with functional features entering existing languages.
So, was OOP a failure? Will functional programming save us? Are strapped-on design patterns useless and replaced by simpler language features? How did we get to the current situation, and where will we end up?
I like to look at this topic mostly from a C++ perspective. Why? Because C++ is a multi-paradigm language and evolved a lot over the years. C++ codebases reflect the mindsets of the different eras. Let’s look at the different eras:
C with classes
No talk about the history of C++ is complete without mentioning “C with classes”: early C++ programs were generally still very procedural in nature, with some classes interspersed. There was a lot of skepticism involved – polymorphism might be nice, but those vtable lookups are certainly expensive, etc. Coming to the field because of gaming, I early on adopted the style shown by the Code on the Cob articles (http://archive.gamedev.net/archive/reference/list981c.html?categoryid=215) and was pretty happy with it. People soon discovered the power of destructors for resource management, RAII (https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) became popular (unfortunately it’s still not as popular as it should be), and classes became more and more common.
OOP and Javaland
My impression is that the rise of Java brought us the dogmatic OOP architectures of the following years, in which the GoF patterns became dogma and a standard question in every software developer interview. So we were taught about Singletons and Factories and all that. While I started to sort of like Java after some time and did my Singletons and Factories like everyone else, it never felt really right to me. Somehow I still preferred my Code on the Cob style. But still, after a couple of years in Javaland, this style spilled over into my C++. Schools, courses and universities taught object-oriented design; we were drawing UML diagrams, abstracting the virtual world into sets of objects and their relations. We did “dog is-a animal and has-a leg” diagrams to explain inheritance and composition. Starting a new project or module was typically a question of “well, we need some thing that does this and that – make a class for it”. This style had some advantages: while often unnecessarily verbose and heavily abstracted, it led to a subset of C++ that usually worked out. I think this is the reason why it took off in the first place, especially in combination with Java: stuff typically worked. I felt Java was easy to learn (in comparison to C++) with its hand-holding. No thinking about where best to place includes/imports, circular dependencies, whether to add stuff to headers or cpp files, ownership issues, or whether your method parameter should be a copy, a reference, a raw pointer, a unique pointer or a shared pointer. No considerations about lifetime, placing data on the stack or heap or data segment, returning elements while avoiding copying, data type sizes on different architectures, buffer overflows, compiler differences and much more. Just reading this list makes me wonder why the hell I’m still a fan of C++. So in essence, Java restricted your possibilities and disallowed everything that could cause you trouble – at the expense of expressivity and control. So we needed patterns.
While languages like C# started pretty early on adding language features which made a couple of patterns obsolete (events, properties, LINQ…), Java was much more conservative in making changes to the language and, for example, relied on IDE support to generate masses of getters and setters and code templates for common patterns. Java itself is heavily abstracted from the underlying systems, and we began stacking even more abstractions on top of it. Everything should be dynamic, abstract and open for modification in the future. This led to monsters of abstraction boilerplate. It became so pervasive in the Java world that you can even find lots of parody on it, like the FizzBuzz enterprise edition (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition), which turns a roughly ten-line problem into an abstraction monster consisting of dozens of files and classes.
As the years passed, many programmers noticed that while in many cases their abstractions helped future code modifications due to requirement changes/extensions, in probably even more cases they did not. Perhaps the abstractions in place didn’t really fit the new, radically different mechanism that should be added to the project. Often it could somehow be bent hard enough so that it did fit. Even more often we had to introduce weird hacks and workarounds to deal with leaky abstractions. As frameworks instead of libraries became popular, we saw this even more often: once you need something different than intended, or want to combine two frameworks, things become hairy. Instead of combining small and flexible building blocks, we often ended up with monstrosities that had to be nudged in roughly the direction we wanted to go, breaking their abstractions all the time.
At some point, people like me acknowledged this and started to shed abstractions again. I think there is an important distinction to be made between “user”-facing code and internal implementation. Code that will be shared with other applications not under the programmer’s control should still be as flexible as possible for further modification without introducing breaking changes to the interface. This is the place for smart patterns. The implementation of the internals is free for refactoring and probably not the place for obscure abstractions. So say you would like to have something that produces different types of objects according to some string value, keeping the potential for caching stuff and returning the same object multiple times. A typical OOP design patterns solution might be a Singleton (so you can globally cache in future) Factory (to abstract the object creation) with some Command-style pattern (to map strings to creation processes). If this is an internal mechanism, nowadays I’d rather go with a simple function containing if/else or switch blocks to return the appropriate (potentially small immutable) objects. If you REALLY need the caching at some point, you can still access some module-internal cache later on. By now I have a couple of those in my codebases and guess what? Over the years the only modifications I made to them was to add one or two “else if”s, and that’s it. Most of them are still just a simple function with 6-7 lines of code instead of n classes. I’ve also found myself using more “stupid” data types again, with only public members and no Getters and Setters, but const/immutable instead, with their values fixed at construction time. Typically you don’t want complex (i.e. opaque, unpredictable and impure) Getter logic anyway, and different kinds of validation or modification can be done at construction time thanks to the constness.
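To make the idea concrete, here is a minimal sketch of the “simple function instead of Singleton Factory” approach. All the names (Shape, Circle, Square, makeShape) are hypothetical, invented purely for illustration:

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical product types: a single level of inheritance, interface only.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
    const double radius;  // "stupid" immutable data, no Getter needed
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    const double side;
};

// The whole "Singleton Factory with Command pattern" collapses into one
// function. If caching is ever really needed, a function-local static map
// could be added here later without changing the interface.
std::unique_ptr<Shape> makeShape(const std::string& kind, double size) {
    if (kind == "circle") return std::make_unique<Circle>(size);
    else if (kind == "square") return std::make_unique<Square>(size);
    else throw std::invalid_argument("unknown shape: " + kind);
}
```

Adding a new product type is exactly one “else if” here, which matches my experience of how these functions actually evolve.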
Changing the behavior of a Getter might result in a leaky abstraction as well – suddenly the user code accessing 500k elements via their Getters takes 2 minutes instead of a second, because of the newly added logic or because it rendered the pre-cacher unable to do its work. Suddenly the user’s video stream stalls because of the internal modification of a Getter. Perhaps the new caching mechanism in the Getter is not thread-safe, breaking the user code. This led some companies to issue rules that Getters are really only allowed to get an internal value without added logic – but to keep them, because the learnt OOP rules say so. Personally I found the cases where I had to refactor something like this so extremely rare that avoiding the bloated codebase certainly outweighed them.
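What such a “stupid” data type can look like, sketched with a made-up UserName type: public const members instead of Getters, and validation moved to construction time so the value can never be in a bad state afterwards.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// A "stupid" immutable data type: a public const member, validation at
// construction, and no Getter/Setter logic that could later turn leaky.
struct UserName {
    const std::string value;  // read directly, no hidden logic possible

    explicit UserName(std::string v) : value(validate(std::move(v))) {}

private:
    static std::string validate(std::string v) {
        if (v.empty()) throw std::invalid_argument("empty user name");
        return v;
    }
};
```

Because `value` is const, there is simply no place to later sneak in the kind of Getter logic that stalls the caller.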
Let’s skip deep class hierarchies for now – most have stopped using them, for good. “Composition over inheritance” has become a common term anyway. Personally I find the component-based system Unity3D uses a very flexible and nice example of this. I think I never created huge type hierarchies, but now I’m even more careful, and you’ll typically only see me building a single level of inheritance in my code – to make use of polymorphism and specify interfaces, rarely for code reuse. Languages like Go show that we can have duck typing with static type checking.
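A rough sketch of what I mean by composition with a single inheritance level, loosely in the spirit of Unity3D-style components (Component, Transform, Renderer and Entity are illustrative names, not Unity’s API):

```cpp
#include <memory>
#include <string>
#include <vector>

// One level of inheritance, used only to specify an interface.
struct Component {
    virtual ~Component() = default;
    virtual std::string describe() const = 0;
};

struct Transform : Component {
    std::string describe() const override { return "transform"; }
};

struct Renderer : Component {
    std::string describe() const override { return "renderer"; }
};

// An entity is a flat bag of components (has-a), not a node in a deep
// "GameObject -> MovableObject -> RenderableMovableObject" hierarchy (is-a).
struct Entity {
    std::vector<std::unique_ptr<Component>> components;
};

Entity makeDefaultEntity() {
    Entity e;
    e.components.push_back(std::make_unique<Transform>());
    e.components.push_back(std::make_unique<Renderer>());
    return e;
}
```

New behavior is added by writing a new component and attaching it, instead of finding the right slot in a hierarchy.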
Power to the programmers
The concept of encapsulation has its merits. But as it was generally taught in the 1990s and early 2000s, it was mostly seen as a tool to protect your codebase from stupid programmers. Python “encapsulates” private members by just flagging them with a prepended underscore. At first I found this weird, but by now I know: sometimes we just NEED to access that internal member if we want to get our job done. We KNOW this thing might change in future, but we still have to get our job done now. We need to because an abstraction was leaky or the interface didn’t provide everything we needed. Of course we should strive to make our code as easy to use and as intuitive as possible for other programmers. But we should probably stop treating them like idiots. This is also the fundamental difference between the philosophies of C++ and Java. Our future selves might be the idiots using our code, so let’s make their life as painless as possible. I guess we can assume that (hopefully) they will still know the difference between “_internalfunction_donttouchthis” and “publicfunction_usethis”. So let’s not make them suffer by having them replace “private” with “public” because they just need to call “_internalfunction_donttouchthis” for some (good) reason.
I mentioned it a couple of times in the previous paragraphs and here it is now, the elephant in the room: can functional programming save us?
While I do think there is definite value in exploring the advantages of pure functions and immutability, and we should all get started with it, I am not convinced that functional programming leads to better code by default and in general. We currently see lots of expert programmers swarming to functional programming languages and doing cool stuff. But I think novice programmers will run into just as many pitfalls. I’ve seen a few functional codebases and what I found was, yet again: over-engineering, over-abstraction, leaky abstractions and unreadable messes. It just manifests slightly differently. But one of the classic problems is just as true:
Naming is hard.
Functional programming often means lots and lots of functions. And they all need (good) names. This is one of the reasons why I’m not a huge fan of one-liner functions. It is very distracting when reading code to have to jump through a nested function call hierarchy with dozens of functions with very similar names. If they can’t be inlined, they blow up the stack – once the disassembly consists of 70% pushing stack pointers and jumps, you know that you might have over-abstracted the thing. Of course this affects the performance of critical sections as well. Back to naming: looking at the Scheme codebases of Festival and Festvox (http://festvox.org/) you will see lots of “build, do_build, perform_build, build_blahblah” functions – using functional programming by itself didn’t help to avoid that naming hell. You will see code that modifies other code over time, and you have no idea what was actually called at some point after the thing has run for 7 hours. Functions as first-class citizens: returned functions with closures passed around often lead to code that can only sensibly be understood when run. Makes sense, considering that working with LISP languages typically involves working with the REPL and operating on “live” code. But we will talk about that in the next section. Personally I enjoyed my few smaller adventures with Haskell much more than the LISPy languages, but this is a highly subjective issue.
That all being said, I’d love to see a language like Haskell becoming more mainstream. But I don’t think functional languages will perfectly solve all of our difficulties writing software.
I mentioned it before and it’s an age-old discussion: you get a list of lists of lists from a function but have no idea that it is a list of lists of lists, or what the inner-most list actually contains. If you’re working in the REPL, you’ll just inspect it. If you’re statically analyzing a codebase, you’re screwed if there’s no documentation. I prefer making types explicit to the compiler rather than in documentation meant only for humans. Telling the compiler things we know is always a good thing if it can’t infer them by itself. I would call it my own stupidity if I weren’t in good company with famous guys like John Carmack:
The lack of strong typing in lisp is freaking me out, even in tiny programs. Writing an industrial program like this seems… unsound.
— John Carmack (@ID_AA_Carmack) June 5, 2013
I guess this also highly depends on which type of projects you work on, and on your personality. I often find myself writing C++ code for hours before running it for the first time. I can stay in the flow better that way. Others like the REPL-flow better, and actually, when working on data science problems, I also typically resort to iPython and Spyder to inspect every single step. This is not only because there is less boilerplate (because of the language, but also because the analysis scripts don’t have to be as robust and performant), but also because the C++ IDE constantly tells me about small mistakes beforehand. And about the types I get back from a function. In comparison, when working with pandas in Python, the type I get back from a pandas function is almost never the one I expected.
However, back to functions: in functional codebases you will often see functions/lambdas created at some point, assigned, and used at some other point. When they are used, you usually have a hard time figuring out which one it is and what will actually happen. It was created and stored somewhere else, and has potentially also captured another dozen parameters in a closure. With polymorphism you typically at least know that it is one of the 5 subclasses/implementations of the base class/interface in question. With an “if x then A() else B()” it is completely obvious, but obviously not as dynamic. So here is our tradeoff again between flexibility and simplicity.
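The two ends of that tradeoff can be sketched in a few lines of C++ (makeAdder and addStatic are made-up names for illustration):

```cpp
#include <functional>

// Dynamic end: a callback created in one place, stored, and invoked somewhere
// else entirely. At the call site of the returned std::function you can no
// longer tell statically which code will run or what it has captured.
std::function<int(int)> makeAdder(int captured) {
    return [captured](int x) { return x + captured; };
}

// Static end: "if x then A() else B()". Less flexible, but completely obvious
// to a reader; both possible behaviors are right there.
int addStatic(bool small, int x) {
    if (small) return x + 1;
    else return x + 100;
}
```

Reading `auto f = makeAdder(config);` far from its call `f(n)` tells you almost nothing; reading `addStatic` tells you everything, at the cost of being closed to extension.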
But of course it’s not as black & white:
I favor static typing, but people that deny there is any value in fast and loose dynamic typing are wrong.
— John Carmack (@ID_AA_Carmack) April 21, 2017
So at the same time we see the classic “static” languages becoming more dynamic. Additions like the “auto” keyword in C++ make programming much more convenient while retaining all the advantages of static typing. Or just look at the structural typing in Go.
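A small illustration of that convenience (the function and its types are invented for the example): the code below reads almost like a dynamic language, yet every type is deduced and checked at compile time.

```cpp
#include <map>
#include <string>
#include <vector>

// `auto` and structured bindings (C++17) remove the iterator and type
// boilerplate, while the compiler still knows the exact type of everything.
int sumValues(const std::map<std::string, std::vector<int>>& m) {
    auto total = 0;                        // deduced as int
    for (const auto& [key, values] : m)    // no verbose iterator declarations
        for (auto v : values)
            total += v;
    return total;
}
```

The pre-C++11 equivalent would spell out `std::map<std::string, std::vector<int>>::const_iterator` twice, with zero gain in safety.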
But where are we heading?
So static languages are becoming more dynamic, and dynamic languages more static. We see functional features entering existing languages; we can find huge multi-paradigm monsters like Scala and the complete opposite like Go. Not only in the languages themselves: we find people swarming from the declarative/static model/graph definitions of frameworks like TensorFlow to the more dynamic and imperative PyTorch graphs. Web developers find that there is now reactive programming with React, state management with Redux and immutability with libraries like immutable.js. Overall I don’t think we will see a “simpler” functional concept taking over; instead we will see an ever bigger mess of paradigms and concepts, with developers required to know them all to deal with different codebases. Similarly, knowing C++ nowadays means knowing the full history of the language and all the styles that were prevalent over the years.
To get back to the headline: I do think that simple would be better, but I don’t think things will actually become simpler in future.