What does the concept of immutable data mean from a Ruby programmer’s perspective? How is immutability supported in Ruby, and why should you care? In this article I’ll attempt to explore these questions. The material is based on a little talk I gave at the March 2013 meetup of the Helsinki Ruby Brigade.
The article comes in two parts:
- Part 1 (this one) defines immutability and looks at how it shows up in the Ruby standard library.
- Part 2 explores the effects of immutability in object-oriented domain modeling.
This is how Wikipedia defines an immutable object:
An immutable object is one whose state cannot be changed after construction.
The key word here is state. The way we choose to manage state in our programs has profound implications on our ability to read, comprehend, and test them. Where do we keep state? How, when, and where do we allow it to mutate?
For state held inside immutable objects the answer is that no one is able to change it. Ever. Once you make an immutable object, it is what it is. If you want something different, you need to make another one. This approach, which may sound unworkable at first, is actually quite useful.
Consider version control systems. At Deveo, version control is at the heart of what we do. As Philip Nilsson has beautifully described in a recent blog post, Git is basically an immutable, purely functional data structure. The same is generally true for Mercurial and Subversion as well: data is append-only. Changes to your codebase make new versions of the codebase in version control; they do not mutate old versions of the code in place. That would defeat the whole purpose of version control.
So, actually, we’re working with immutable objects all day, every day.
Immutability is not limited to version control systems, of course. It can be an extremely useful tool for designing any piece of software, big or small.
Immutability in Functional Programming
In functional programming languages, such as Clojure and Haskell, immutable data is everywhere. It is the default.
For me personally, it has been Clojure in particular that’s brought the concept of immutable data home. Rich Hickey, the creator of Clojure, has had this to say about mutable state:
Mutable stateful objects are the new spaghetti code:
- hard to understand, test, reason about
- Concurrency disaster
Indeed, in Clojure, all the core data structures are immutable. For example, consider this vector of numbers: [1 2 3 4]
It contains the numbers 1, 2, 3, and 4. It will continue to do so as long as it exists - nothing will ever be added or removed. If you want to manipulate it somehow, you use functions that return new data structures. Existing ones are never mutated.
The Benefits of Immutability
So why exactly would you want to write programs in this way? Much has been written about the benefits of immutability already, but I’d like to point out two things that have been particularly significant for me. They happen to be the ones also mentioned by Rich Hickey in the quote above: Readability and concurrency. There are many other benefits (such as a boost in testability), but I will concentrate on these two.
Take a look at this vaguely suspicious method:
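A minimal sketch of such a method follows. Only make_order, Order, and VatProcessor.process are named in the text; the amount accessor, the VAT rate, and the body of the processor are assumptions made purely for illustration.

```ruby
# Hypothetical mutable Order: its amount can be reassigned after construction.
class Order
  attr_accessor :amount
end

# Hypothetical processor that adds VAT by mutating the order it is given.
module VatProcessor
  VAT_PERCENT = 24 # assumed rate, for illustration only

  def self.process(order)
    order.amount += order.amount * VAT_PERCENT / 100
    nil
  end
end

def make_order(amount)
  order = Order.new
  order.amount = amount
  VatProcessor.process(order) # does this touch order.amount? Not visible from here.
  order
end

order = make_order(100)
puts order.amount # not the 100 that was passed in
```

Nothing in make_order itself hints that the amount it assigned is later overwritten.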
What is the total amount of the order when it is returned? Does it match the amount given as an argument to make_order? If Order is a mutable class, everything depends on what VatProcessor.process does. The order’s total amount may be mutated there, and by just looking at the make_order method there is no way for us to know whether that happens or not.
Mutability introduces these kinds of wormholes of state change, where a change to state in one place may have effects on some other remote location in the code. These effects are not visible by just reading a single method in the code.
If, on the other hand, we knew that Order is an immutable class, we would also know exactly what the amount of the order is after make_order has executed. To confirm this, we would not have to go look at VatProcessor.process or any of the methods it might invoke. If VatProcessor.process needed to make changes to an immutable order, it would need to return a new instance of Order, and that means we would have to make this behavior explicit in our method:
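One way this could look is sketched below. The frozen Order, its with_amount helper, and the VAT rate are assumptions for this example, not an API taken from the text.

```ruby
# Hypothetical immutable Order: no setters, and each instance is frozen.
class Order
  attr_reader :amount

  def initialize(amount)
    @amount = amount
    freeze
  end

  # Returns a new Order instead of changing this one.
  def with_amount(new_amount)
    Order.new(new_amount)
  end
end

# Hypothetical processor that returns a new order with VAT added.
module VatProcessor
  VAT_PERCENT = 24 # assumed rate, for illustration only

  def self.process(order)
    order.with_amount(order.amount + order.amount * VAT_PERCENT / 100)
  end
end

def make_order(amount)
  order = Order.new(amount)
  order = VatProcessor.process(order) # the reassignment makes the change visible
  order
end

puts make_order(100).amount
```

Dropping the reassignment would leave the caller holding the original, unprocessed order, so the data flow has to be spelled out.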
Now it is obvious that VatProcessor.process will change the order. There is no ambiguity.
Let’s say we have an in-memory data structure holding the current set of logged-in users in our application. Let’s also say we want to call some method on each of the logged-in users:
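A sketch of that iteration, assuming a hypothetical User type with a notify method and a plain Array behind Application::LOGGED_IN_USERS:

```ruby
# Hypothetical user type; notify stands in for "some method".
User = Struct.new(:name) do
  def notify
    "Hello, #{name}"
  end
end

module Application
  # A plain, mutable Array shared by the whole process.
  LOGGED_IN_USERS = [User.new("alice"), User.new("bob")]
end

Application::LOGGED_IN_USERS.each do |user|
  puts user.notify
end
```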
What could go wrong here? Well, if this is a multi-threaded program, the array of logged-in users might change while we’re iterating, causing all kinds of interesting behavior.
The way we often deal with concurrency like this is to make it go away, by serializing access with something like a Mutex.
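As a sketch of the serializing approach, the following guards a shared array with a Mutex. The USERS_LOCK constant and the log_in and each_user helpers are hypothetical names introduced for this example.

```ruby
module Application
  LOGGED_IN_USERS = []
  USERS_LOCK = Mutex.new # hypothetical lock guarding the array above

  def self.log_in(name)
    USERS_LOCK.synchronize { LOGGED_IN_USERS << name }
  end

  def self.each_user
    # Hold the lock for the whole iteration so no writer can interleave.
    USERS_LOCK.synchronize { LOGGED_IN_USERS.each { |user| yield user } }
  end
end

# Four writer threads, each logging in 25 users.
writers = 4.times.map do |i|
  Thread.new { 25.times { |j| Application.log_in("user-#{i}-#{j}") } }
end
writers.each(&:join)

count = 0
Application.each_user { count += 1 }
puts count
```

This works only as long as every reader and writer remembers to go through the lock, and readers block writers for the full length of the iteration.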
But there is another way: What if the collection of logged-in users was immutable? We’d be pretty much safe if that were the case. The global variable Application::LOGGED_IN_USERS could simply start pointing to some other collection at some point in the future, and that would not affect our iteration. We would still be working with the version of the collection that existed at the time we dereferenced it. This separation between identity (the global variable) and state (the variable’s value at a given time) is brought out by the use of immutable data structures. Rich Hickey is particularly good at explaining this separation, and his talk on the subject from QCon 2009 is highly recommended viewing.
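The identity and state split can be sketched in plain Ruby with frozen arrays. Since a Ruby constant is not meant to be reassigned, this sketch assumes a module-level accessor (logged_in_users) in place of the constant; that accessor and the log_in helper are hypothetical.

```ruby
module Application
  @logged_in_users = [].freeze # the current state: a frozen Array

  class << self
    # The identity: a stable name whose value changes over time.
    attr_reader :logged_in_users

    # Builds a new frozen Array and repoints the reference.
    # The previous Array is never modified.
    def log_in(name)
      @logged_in_users = (@logged_in_users + [name]).freeze
    end
  end
end

Application.log_in("alice")
Application.log_in("bob")

snapshot = Application.logged_in_users # dereference once
seen = []
snapshot.each do |user|
  Application.log_in("late-#{user}") # a change arriving mid-iteration
  seen << user
end

puts seen.inspect                        # only the users in the snapshot
puts Application.logged_in_users.size    # the identity now points at a newer state
```

Readers need no lock here. Writers still have to coordinate among themselves, because the read-then-reassign step in log_in is not atomic across threads.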
When we’re using immutable data structures, we can have many things going on concurrently in our programs, without having to be so cautious about stepping on each other’s toes.
Does Immutability Work In Object-Oriented Languages?
Clojure and other functional programming languages have their standard libraries all fundamentally built to support immutable data. Immutability is the default and the norm. But what about languages that aren’t built around immutable data, such as Ruby? Can we still make use of it?
Interestingly, if you look at the object-oriented literature, immutability is all over the place. Far from a mere footnote, it is actually recommended by many of the respected voices in the OO community as the default way to construct programs. For example, Josh Bloch, in Effective Java, which could be described as The Java book, says this:
Immutable objects are simple. Classes should be immutable unless there’s a very good reason to make them mutable. If a class cannot be made immutable, limit its mutability as much as possible.
You would not expect this when you look at most of the OO code you see out there. Most of it is mutation, in place and out of control. Why the big gap between the theory and the practice?
One could argue that the reason is that OO languages make no effort to help us work with immutable data. The default is always mutable. We, the application developers, are expected to do all the work and build immutable objects on mutable foundations. And of course we often don’t. We follow the path of least resistance because we have other stuff to deal with. Stuff like, for example, the customer problems we’re paid to solve.
But the benefits of immutability are too significant to ignore. Even where our languages fail to support us, we should strive to do the right thing. So how about going immutable in Ruby? Can it be done? Let’s look at what the standard library gives us.
Immutable Ruby: The Standard Library
If we look at the Ruby standard library, we see a mixture of mutable and immutable data structures. Some of them offer both mutable and immutable APIs, and in those cases we can choose our approach.
Whatever we do, it’s probably a good idea to be aware of when we’re doing in-place mutation, and who else might be affected by it. So let’s look at a couple of examples.
Consider this fine number: 42
That’s an integer - an instance of Fixnum (since Ruby 2.4, simply Integer). Are Fixnums immutable? Well, obviously, yes. What would mutation for numbers look like, anyway?
With simple values like this, it seems ridiculous to do mutation. 42 is 42. There are other numbers, but they are also different objects. 42 cannot become 44 by means of method calls that would magically mutate its internal state.
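This can be checked directly. The sketch below uses only core methods; the variable names are arbitrary.

```ruby
n = 42
puts n.frozen? # integers are always frozen

m = n + 2      # arithmetic returns another object; n is untouched
puts n
puts m

begin
  # Attempting to attach state to the number itself fails.
  n.instance_variable_set(:@label, "answer")
rescue RuntimeError => e # FrozenError on Ruby 2.5 and later
  puts e.class
end
```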
With Strings, things become a lot more interesting. As it happens, a Ruby String is not an immutable object. It comes with a bunch of methods, with which we can reach into its internals in all sorts of ways.
One of these mutating methods is
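As an illustration of in-place mutation, the following sketch uses three standard String methods: upcase!, []=, and <<. The choice of methods and the variable names are assumptions made for this example.

```ruby
greeting = "hello".dup  # dup guarantees an unfrozen String
same_object = greeting  # a second reference to the same object

greeting.upcase!        # bang method: changes the receiver in place
greeting[0] = "J"       # element assignment: overwrites the first character
greeting << "!"         # appends in place

puts greeting
puts same_object                    # the other reference sees every change
puts greeting.equal?(same_object)   # still one and the same object
```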
One way to see the significance of this is to think of a situation where Strings are used as Hash keys. Here’s an example:
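A minimal sketch of this situation, using String#[]= as the mutating call (an assumption; any in-place mutation of the String behaves the same way):

```ruby
key = "hello".dup # an unfrozen String
hash = {}
hash[key] = "world"

puts hash[key].inspect     # found, as expected

key[0] = "J"               # mutate the very same String object

puts hash[key].inspect     # lookup with the same object now misses
puts hash["hello"].inspect # the entry is still there under the old value
```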
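When you use a String as a Hash key, and then mutate the String, what you get as a result of accessing the Hash with the same exact String object is nil. This is because the String object itself is not actually the Hash key. For an unfrozen String, Hash#[]= stores a frozen copy as the key, filed under the return value of the String’s hash method at the time when it was stored. After the mutation, the String hashes to a different value and no longer equals the stored copy, so the lookup finds nothing.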
I think we can agree that this behavior is not obvious. It doesn’t do what it would seem to do at first glance. The reason is that the identity of the String object and its value at a given time are hopelessly intertwined.
Much of Ruby, especially its runtime, is much closer to Python’s [than Perl’s], with the exception of Ruby’s mutable strings, which I find an abomination.
Matz himself, in a reaction to this comment, responded by channeling the philosophy of Ruby’s design:
I rarely have such problems caused by mutable strings. Besides, Ruby is not a language to keep people away from horror. You can write ugly, scary, or dangerous programs in Ruby, if you want. It’s cost for freedom.
That’s Ruby. It does let you do terrible things, but usually also gives you the possibility to take the higher ground.
For example, in our Hash example one could argue that in Ruby you would actually use Symbols as Hash keys instead of Strings. There are many reasons for that, one of which is that Symbols are immutable in Ruby. You cannot change the nth character of a Symbol. If you want a Symbol like that, you just use another Symbol.
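A short sketch of how this shows up in practice, using only core Symbol methods (the symbol names are arbitrary):

```ruby
status = :pending

puts status.frozen?               # Symbols are always frozen
puts status.respond_to?(:upcase!) # there are no in-place variants to call
puts status.respond_to?(:[]=)     # and no element assignment either

shouting = status.upcase          # returns a different Symbol
puts shouting.inspect
puts status.inspect               # the original is unchanged

lookup = { status => "waiting" }
puts lookup[:pending]             # the same Symbol always finds the entry
```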
Interestingly, Rich Hickey, in an unrelated discussion, seems to be much closer to Guido’s language design philosophy, in that you probably should not let language users shoot themselves in the foot in this way:
There’s no such thing as a convention of immutability, as anyone who has tried to enforce one can attest. If a data structure offers only an immutable API, that is what’s most important. If it offers a mixed API, it is simply not immutable.
But I digress.
On the other hand, both Array (with Enumerable) and Hash define another bunch of methods, for doing functional style transformations, in which no in-place mutation happens:
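A sketch of such a chain, with map, select, and take applied to an array named a; the concrete numbers and block bodies are assumptions chosen for illustration:

```ruby
a = (1..10).to_a

result = a.map { |x| x * 2 }         # a new Array of doubled values
          .select { |x| x % 3 == 0 } # another new Array, keeping multiples of 3
          .take(2)                   # yet another new Array with the first two

puts result.inspect
puts a.inspect # still the numbers 1 through 10
```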
map, select, and take here return new Enumerables instead of mutating the original ones. The Enumerable held in a is no different after this operation from what it was before.
Many Rubyists, including myself, prefer this kind of approach whenever possible, since with it you can pack a lot of power into a relatively small amount of code. But this approach does have its drawbacks. One of those is the performance problems it may cause.
Since new data structures are created for each operation, a lot of memory may need to be allocated: one array for the results of map, and another for the results of select. If a has a million items, the overhead may be significant. Since in the end we only take two items from the result, the map and select operations will also have done a whole lot of unnecessary computation, the results of which are never seen by anyone.
Functional languages often get around this kind of problem with laziness. This is true in the case of Clojure and Haskell, for example. With lazy data structures, no memory allocation or computation is done before the results of that computation are actually needed.
Fortunately, in Ruby 2.0 we now have the Enumerator::Lazy class, which does exactly this for the functional style methods in Enumerable. By just calling the lazy method of a before chaining the other operations on it, we make the whole chain of operations lazy:
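A sketch of the lazy version follows. The block bodies and the calls counter are assumptions added to make the saved work observable.

```ruby
a = (1..1_000_000).to_a
calls = 0 # counts how many elements map actually processes

result = a.lazy
          .map { |x| calls += 1; x * 2 }
          .select { |x| x % 3 == 0 }
          .take(2)
          .to_a # forces evaluation; nothing runs before this point

puts result.inspect
puts calls # only a handful of elements were touched, not a million
```

Each element flows through the whole chain one at a time, and evaluation stops as soon as take has its two items. No intermediate million-element arrays are built.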
This makes Ruby a lot more friendly towards people who like to write this kind of functional code with immutable data structures.
Sometimes the convention of immutability just isn’t enough. Perhaps you’re writing a library and simply don’t know what other code will be involved with it in the future. Or perhaps you’re dealing with a lot of in-process concurrency and still need to retain your ability to reason about state changes. For situations like this, you won’t find dedicated immutable collection classes in the standard library to help you.
But fortunately there are libraries! If you’re running on JRuby, you can actually just use the immutable and persistent data structures provided by Clojure. There are several libraries available to make the use of Clojure data structures from JRuby easy.
Another option is to use the Hamster gem, which provides solid pure Ruby implementations of many immutable data structures, such as lists, hashes, stacks, and queues.
With Hamster, operations on data structures do not mutate them in place, but instead return new versions:
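Hamster’s own API is not reproduced here. As a rough approximation of the same return-a-new-version style, the sketch below uses only frozen core collections and non-mutating methods. This is an assumption of the example, not how Hamster works: unlike a persistent data structure, every update here copies the whole collection, and freeze is shallow.

```ruby
person = { name: "Simon", gender: :male }.freeze
friend = person.merge(name: "James").freeze # a new Hash; person is untouched

numbers = [1, 2, 3].freeze
more = (numbers + [4]).freeze               # a new Array; numbers is untouched

puts person.inspect
puts friend.inspect
puts numbers.inspect
puts more.inspect

begin
  person[:name] = "Bob" # in-place mutation is rejected outright
rescue RuntimeError => e # FrozenError on Ruby 2.5 and later
  puts e.class
end
```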
Because of this fundamental difference from the built-in Arrays and Hashes, you probably cannot simply drop Hamster data structures into your code in place of the built-in ones, though Hamster does provide most of the read-only methods of Enumerable and Hash. The code that deals with these data structures must be converted to a functional style. In any case, the fruits of that labor may well be worth the effort.
Immutable objects, particularly popular in functional programming languages, have their place in OO as well. They serve to make code simpler, safer, more readable, easier to test, and better equipped for concurrent use.
Some of Ruby’s standard data types are immutable, others are not. For classes that provide a mixed API, it is often a good idea to try to stick to the non-mutating methods. When you do use in-place mutation, try to be aware of when and where it happens in your code, and what other parts of the code may be affected by it.
Outside of the Ruby standard library, there are fully immutable collection implementations that can be used in Ruby, such as Hamster and all the Clojure bridge libraries for JRuby.
In the second part of this article, I will turn our attention to object-oriented domain modeling, and how immutability affects how we approach OO design.