Immutability in Ruby Part 1: Data Structures


What does the concept of immutable data mean from a Ruby programmer’s perspective? How is immutability supported in Ruby, and why should you care? In this article I’ll attempt to explore these questions. The material is based on a little talk I gave at the March 2013 meetup of the Helsinki Ruby Brigade.

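The article comes in two parts: this first part covers immutable data structures, and the second part turns to object-oriented domain modeling.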

Immutable Objects

This is how Wikipedia defines an immutable object:

An immutable object is one whose state cannot be changed after construction.

The key word here is state. The way we choose to manage state in our programs has profound implications for our ability to read, comprehend, and test them. Where do we keep state? How, when, and where do we allow it to mutate?

For state held inside immutable objects the answer is that no one is able to change it. Ever. Once you make an immutable object, it is what it is. If you want something different, you need to make another one. This approach, which may sound unworkable at first, is actually quite useful.

Consider version control systems. At Deveo, version control is at the heart of what we do. As Philip Nilsson has beautifully described in a recent blog post, Git is basically an immutable, purely functional data structure. The same is generally true for Mercurial and Subversion as well: Data is append-only. Changes to your codebase make new versions of the codebase in version control, they do not mutate old versions of the code in place. That would defeat the whole purpose of version control.

So, actually, we’re working with immutable objects all day, every day.

Immutability is not limited to version control systems, of course. It can be an extremely useful tool for designing any piece of software, big or small.

Immutability in Functional Programming

In functional programming languages, such as Clojure and Haskell, immutable data is everywhere. It is the default.

For me personally, it is Clojure in particular that has brought the concept of immutable data home. Rich Hickey, the creator of Clojure, has had this to say about mutable state:

Mutable stateful objects are the new spaghetti code:
- hard to understand, test, reason about
- Concurrency disaster

[Slide image: “State - You’re Doing It Wrong”. Image credit: Paul Barry]

Indeed, in Clojure, all the core data structures are immutable. For example, consider this vector of numbers:

vector.clj
[1 2 3 4]

It contains the numbers 1, 2, 3, and 4, and it will continue to do so as long as it exists: nothing will ever be added or removed. If you want to manipulate it somehow, you use functions that return new data structures. Existing ones are never mutated.
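Ruby’s built-in Array is mutable, but a rough analog of the Clojure vector above can be sketched with a frozen Array and a non-mutating operation. Treat this only as an illustration of the idea: a frozen Array is not a persistent data structure like Clojure’s, and `+` copies the whole Array.

```ruby
# A frozen Array: any attempt to mutate it in place raises an error.
vec = [1, 2, 3, 4].freeze

# "Adding" an element means building a new Array; vec is left alone.
bigger = vec + [5]

vec      # => [1, 2, 3, 4]
bigger   # => [1, 2, 3, 4, 5]

# Trying to append in place fails instead of silently changing vec.
error = begin
  vec << 5
  nil
rescue => e
  e
end
error.class  # => FrozenError (Ruby 2.5+; older versions raise RuntimeError)
```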

The Benefits of Immutability

So why exactly would you want to write programs in this way? Much has been written about the benefits of immutability already, but I’d like to point out two things that have been particularly significant for me. They happen to be the ones also mentioned by Rich Hickey in the quote above: Readability and concurrency. There are many other benefits (such as a boost in testability), but I will concentrate on these two.

Readability

Take a look at this vaguely suspicious method:

order.rb
def make_order(amount)
  order = Order.new
  order.total_amount = amount
  VatProcessor.process(order)
  return order
end

What is the total amount of the order when it is returned? Does it match the amount given as an argument to make_order?

Well, if Order is a mutable class, everything depends on what VatProcessor.process does. The order’s total amount may be mutated there, and by just looking at the make_order method there is no way for us to know whether that happens or not.

Mutability introduces these kinds of wormholes of state change, where a change to state in one place may have effects on some other remote location in the code. These effects are not visible by just reading a single method in the code.

If, on the other hand, we knew that Order is an immutable class, we would also know exactly what the amount of the order is after make_order has executed. To confirm this, we would not have to go look at VatProcessor.process or any of the methods it might invoke.

If VatProcessor.process needed to make changes to an immutable order, it would have to return a new instance of Order, and that means we would have to make this behavior explicit in our method:

order.rb
def make_order(amount)
  order = Order.new(total_amount: amount)   # immutable: state is set at construction
  order = VatProcessor.process(order)       # returns a new Order
  return order
end

Now it is obvious that VatProcessor.process will change the order. There is no ambiguity.
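To make the immutable variant concrete, here is a minimal sketch of what Order and VatProcessor might look like. Both are hypothetical, since their implementations aren’t shown here; the `with_total_amount` method, the integer-cents amounts, and the 24% VAT rate are assumptions made purely for illustration.

```ruby
# Hypothetical immutable Order: state is set once in the constructor,
# there are no setters, and the instance freezes itself.
class Order
  attr_reader :total_amount

  def initialize(total_amount:)
    @total_amount = total_amount
    freeze
  end

  # Returns a new Order instead of changing this one.
  def with_total_amount(new_amount)
    Order.new(total_amount: new_amount)
  end
end

# Hypothetical processor. Amounts are integer cents, and the 24% VAT
# rate is an illustrative assumption.
module VatProcessor
  VAT_PERCENT = 24

  def self.process(order)
    order.with_total_amount(order.total_amount * (100 + VAT_PERCENT) / 100)
  end
end

def make_order(amount)
  order = Order.new(total_amount: amount)
  VatProcessor.process(order)
end

original  = Order.new(total_amount: 10_000)
processed = VatProcessor.process(original)
original.total_amount   # => 10000, untouched
processed.total_amount  # => 12400
```

Because Order has no setters and is frozen, the only way for VatProcessor to "change" an order is to hand back a new one, which is exactly what forces the caller to write `order = VatProcessor.process(order)`.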

Concurrency

Let’s say we have an in-memory data structure holding the current set of logged-in users in our application. Let’s also say we want to call some method on each of the logged-in users:

users.rb
Application::LOGGED_IN_USERS.each do |user|
  user.do_the_harlem_shake!
end

What could go wrong here? Well, if this is a multi-threaded program, the array of logged-in users might change while we’re iterating, causing all kinds of interesting behavior.

The way we often deal with concurrency like this is to make it go away, by serializing access with something like a Mutex.
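For comparison, the Mutex approach can be sketched like this. UserRegistry is a hypothetical stand-in for Application::LOGGED_IN_USERS, not code from an actual application. Readers and writers now take turns: while one thread iterates, every writer waits.

```ruby
# Hypothetical registry that serializes all access with a Mutex.
class UserRegistry
  def initialize
    @users = []
    @lock  = Mutex.new
  end

  def add(user)
    @lock.synchronize { @users << user }
  end

  # Holds the lock for the whole iteration, so writers block until it ends.
  # Calling add from inside the block would fail: Ruby's Mutex is not
  # reentrant and raises ThreadError on recursive locking.
  def each_user(&block)
    @lock.synchronize { @users.each(&block) }
  end

  def size
    @lock.synchronize { @users.size }
  end
end

registry = UserRegistry.new
threads = 4.times.map do |t|
  Thread.new { 100.times { |i| registry.add("user-#{t}-#{i}") } }
end
threads.each(&:join)
registry.size  # => 400
```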

But there is another way: What if the collection of logged-in users were immutable? Then we would be pretty much safe. Application::LOGGED_IN_USERS could simply start pointing to some other collection at some point in the future, and that would not affect our iteration: we would still be working with the version of the collection that existed at the time we dereferenced it. Immutable data structures bring out this separation between identity (the global reference) and state (the value it points to at a given time). Rich Hickey is particularly good at explaining this separation, and his talk on the subject from QCon 2009 is highly recommended viewing.

When we’re using immutable data structures, we can have many things going on concurrently in our programs without having to be so cautious about stepping on each other’s toes.
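Here is a minimal sketch of the immutable alternative, assuming a hypothetical LoggedInUsers module in place of Application::LOGGED_IN_USERS (reassigning a real Ruby constant triggers a warning, so the sketch keeps the reference in a module-level instance variable instead). Each update publishes a new frozen Array; a reader that has already dereferenced the current value keeps iterating over that snapshot, whatever writers do in the meantime.

```ruby
# Hypothetical holder: the identity is the @users reference, the state is
# whichever frozen Array it points to at a given time.
module LoggedInUsers
  @users = [].freeze

  class << self
    def current
      @users
    end

    # Publishes a new frozen Array instead of mutating the old one.
    # This read-modify-write is not atomic, so multiple concurrent
    # writers would still need a Mutex to avoid losing updates.
    def add(user)
      @users = (@users + [user]).freeze
    end
  end
end

LoggedInUsers.add("alice")
LoggedInUsers.add("bob")

snapshot = LoggedInUsers.current              # dereference once
writer = Thread.new { LoggedInUsers.add("carol") }
seen = snapshot.map { |user| user.upcase }    # iterate the snapshot
writer.join

seen                    # => ["ALICE", "BOB"]
LoggedInUsers.current   # => ["alice", "bob", "carol"]
```

Readers never take a lock here; the only coordination left is among writers, which is usually a much smaller problem.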

Does Immutability Work In Object-Oriented Languages?

Clojure and other functional programming languages have their standard libraries all fundamentally built to support immutable data. Immutability is the default and the norm. But what about languages that aren’t built around immutable data, such as Ruby? Can we still make use of it?

Interestingly, if you look at the object-oriented literature, immutability is all over the place. Far from being a mere footnote, it is actually recommended by many of the respected voices in the OO community as the default way to construct programs. For example, Josh Bloch, in Effective Java, which could be described as The Java book, says this:

Immutable objects are simple. Classes should be immutable unless there’s a very good reason to make them mutable. If a class cannot be made immutable, limit its mutability as much as possible.

You would not expect this when you look at most of the OO code you see out there. Most of it is mutation, in place and out of control. Why the big gap between the theory and the practice?

One could argue that the reason is that OO languages make no effort to help us work with immutable data. The default is always mutable. We, the application developers, are expected to do all the work and build immutable objects on mutable foundations. And of course we often don’t. We follow the path of least resistance because we have other stuff to deal with. Stuff like, for example, the customer problems we’re paid to solve.

But the benefits of immutability are too significant to ignore. Even where our languages fail to support us, we should strive to do the right thing. So how about going immutable in Ruby? Can it be done? Let’s look at what the standard library gives us.

Immutable Ruby: The Standard Library

If we look at the Ruby standard library, we see a mixture of mutable and immutable data structures. Some of them offer both mutable and immutable APIs, and in those cases we can choose our approach.

Whatever we do, it’s probably a good idea to be aware of when we’re doing in-place mutation, and who else might be affected by it. So let’s look at a couple of examples.

Numbers

Consider this fine number:

fourtytwo.rb
42

That’s an integer, an instance of Fixnum (in Ruby 2.4 and later, simply Integer). Are Fixnums immutable? Well, obviously, yes. What would mutation for numbers look like, anyway?

fourtytwo.rb
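42 +! 2        # imaginary in-place addition; no such operator exists
42.oddify!     # imaginary mutating method; no such method exists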

With simple values like this, it seems ridiculous to do mutation. 42 is 42. There are other numbers, but they are also different objects. 42 cannot become 44 by means of method calls that would magically mutate its internal state.
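What looks like "changing a number" in Ruby is really rebinding a variable to a different Integer object. A small sketch:

```ruby
x = 42
y = x          # y refers to the same Integer object as x
x += 2         # shorthand for x = x + 2: x now names a different object
x              # => 44
y              # => 42, the object 42 itself never changed
42.frozen?     # => true on current Ruby versions
```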

Strings

With Strings, things become a lot more interesting. As it happens, a Ruby String is not an immutable object. It comes with a bunch of methods, with which we can reach into its internals in all sorts of ways.

One of these mutating methods is []=:

string_mutation.rb
str = "abc"
str[1] = "d"
str            # => "adc"

One way to see the significance of this is to think of a situation where Strings are used as Hash keys. Here’s an example:

string_mutation_hash_key.rb
str = "abc"
hsh = {str => "value"}
str[1] = "d"
hsh[str]                # => nil

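When you use a String as a Hash key and then mutate the String, accessing the Hash with the same exact String object returns nil. A Hash finds its entries by the key’s hash value (and then compares keys with eql?), and after the mutation the String hashes as "adc" rather than "abc". Ruby actually anticipates this particular trap: when an unfrozen String is used as a Hash key, the Hash stores a frozen copy of it, so the entry is still there and hsh["abc"] returns "value". Other mutable keys, such as Arrays, get no such protection: mutate one after storing it, and the Hash cannot find the entry until you call rehash.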
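The following sketch contrasts the two cases using only standard Hash, String, and Array methods; the variable names are illustrative. String.new is used so the String is plainly mutable regardless of frozen-string-literal settings.

```ruby
# String keys: the Hash stores a frozen copy of an unfrozen String key.
str = String.new("abc")
hsh = {}
hsh[str] = "value"
str[1] = "d"              # mutates str in place; it is now "adc"
hsh[str]                  # => nil, there is no entry under "adc"
hsh["abc"]                # => "value", the frozen copy is still the key
hsh.keys.first.frozen?    # => true

# Array keys get no such protection: the Hash keeps the very same object.
key = [1, 2]
arr_hsh = {}
arr_hsh[key] = "value"
key << 3                  # mutates the key object living inside the Hash
arr_hsh[key]              # => nil, the stored hash code no longer matches
arr_hsh.rehash            # recompute hash codes from the keys' current values
arr_hsh[key]              # => "value"
```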

I think we can agree that this behavior is not obvious. It doesn’t do what it would seem to do at first glance. The reason is that the identity of the String object and its value at a given time are hopelessly intertwined.

Matz decided to make Strings mutable in Ruby. This is in contrast to most other popular OO languages. Strings are immutable in Python, Java, and JavaScript, for example. Matz’s decision hasn’t been without controversy. Python creator Guido van Rossum has had this to say about the issue:

Much of Ruby, especially its runtime, is much closer to Python’s [than Perl’s], with the exception of Ruby’s mutable strings, which I find an abomination.

Matz himself, in a reaction to this comment, responded by channeling the philosophy of Ruby’s design:

I rarely have such problems caused by mutable strings. Besides, Ruby is not a language to keep people away from horror. You can write ugly, scary, or dangerous programs in Ruby, if you want. It’s cost for freedom.

That’s Ruby. It does let you do terrible things, but usually also gives you the possibility to take the higher ground.

For example, in our Hash example one could argue that in Ruby you would actually use Symbols as Hash keys instead of Strings. There are many reasons for that, one of which is that Symbols are immutable in Ruby. You cannot change the nth character of a Symbol. If you want a Symbol like that, you just use another Symbol.
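A quick sketch of the Symbol version of the Hash example; the calls are standard Ruby, the names illustrative:

```ruby
sym = :abc
hsh = { sym => "value" }

sym.respond_to?(:[]=)   # => false, Symbol has no in-place mutators
:abc.equal?(sym)        # => true, every :abc is the very same object
sym[1]                  # => "b", reading a character returns a new String

other = :adc            # want a different Symbol? just use another one
hsh[sym]                # => "value"
hsh[other]              # => nil
```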

Interestingly, Rich Hickey, in an unrelated discussion, seems to be much closer to Guido’s language design philosophy, in that you probably should not let language users shoot themselves in the foot in this way:

There’s no such thing as a convention of immutability, as anyone who has tried to enforce one can attest. If a data structure offers only an immutable API, that is what’s most important. If it offers a mixed API, it is simply not immutable.

But I digress.

Collections

How about Ruby’s core collection classes, particularly Arrays and Hashes? Well, on the one hand they are as mutable as can be, with a whole bunch of methods available for in-place mutation:

mutable_array.rb
a = [1, 2, 3, 4]
a << 5

mutable_hash.rb
h = {:a => 1, :b => 2}
h[:c] = 3

On the other hand, both Array (with Enumerable) and Hash define another bunch of methods, for doing functional style transformations, in which no in-place mutation happens:

enumerable_ops.rb
a.map    { |n| n * n }.
  select { |n| n.odd? }.
  take(2)

All of map, select, and take here return new Arrays instead of mutating the original one. The Array held in a is no different after this operation from what it was before.
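The difference matters most when more than one reference points to the same collection. In this sketch, b is a second name for the Array in a; the non-mutating map leaves both alone, while its in-place counterpart map! changes what b sees as well.

```ruby
a = [1, 2, 3, 4, 5]
b = a                            # b is a second reference to the same Array

squares = a.map { |n| n * n }    # non-mutating: returns a new Array
a                                # => [1, 2, 3, 4, 5]
squares                          # => [1, 4, 9, 16, 25]

a.map! { |n| n * n }             # mutating: changes the Array in place
b                                # => [1, 4, 9, 16, 25], b sees the change too
```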

Many Rubyists, including myself, prefer this kind of approach whenever possible, since with it you can pack a lot of power into a relatively small amount of code. But this approach does have its drawbacks. One of those is the performance problems it may cause.

Since new data structures are created for each operation, a lot of memory may need to be allocated: One array for the results of map, and another for the results of select. If a has a million items the overhead may be significant. Since in the end we only take two items from the result, the map and select operations will also have done a whole lot of unnecessary computation, the results of which are never seen by anyone.

Functional languages often get around this kind of problem with laziness. This is true in the case of Clojure and Haskell, for example. With lazy data structures, no memory allocation or computation is done before the results of that computation are actually needed.

Fortunately, Ruby 2.0 gives us Enumerator::Lazy, which does exactly this for the functional style methods in Enumerable. By just calling the lazy method of a before chaining the other operations on it, we make the whole chain of operations lazy:

enumerable_lazy_ops.rb
a.lazy.
  map    { |n| n * n }.
  select { |n| n.odd? }.
  take(2).
  to_a     # forces the lazy chain to run and return an Array

This makes Ruby a lot more friendly towards people who like to write this kind of functional code with immutable data structures.
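One way to see the difference is to count how often the map block runs. This sketch swaps the million items for an eager range of 1,000 and an infinite lazy range, and uses first(2) to force the lazy chain. The counters are deliberately mutable, since they only instrument the pipelines.

```ruby
eager_calls = 0
eager = (1..1_000).
  map    { |n| eager_calls += 1; n * n }.
  select { |n| n.odd? }.
  take(2)

lazy_calls = 0
lazy = (1..Float::INFINITY).lazy.
  map    { |n| lazy_calls += 1; n * n }.
  select { |n| n.odd? }.
  first(2)

eager        # => [1, 9]
eager_calls  # => 1000, every element was squared
lazy         # => [1, 9]
lazy_calls   # => 3, only the work needed for two results
```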

Immutable Collections

Sometimes the convention of immutability just isn’t enough. Perhaps you’re writing a library and simply don’t know what other code will be involved with it in the future. Or perhaps you’re dealing with a lot of in-process concurrency and still need to retain your ability to reason about state changes. For situations like this, you won’t find anything in the standard library to help you.

But fortunately there are libraries! If you’re running on JRuby, you can actually just use the immutable and persistent data structures provided by Clojure. There are several libraries available to make the use of Clojure data structures from JRuby easy.

Another option is to use the Hamster gem, which provides solid pure Ruby implementations of many immutable data structures, such as lists, hashes, stacks, and queues.

With Hamster, operations on data structures do not mutate them in place, but instead return new versions:

hamster.rb
vec = Hamster.vector(1, 2, 3, 4)
vec.cons(5)                           # => [1, 2, 3, 4, 5]

hsh = Hamster.hash(:a => 1, :b => 2)
hsh.put(:c, 3)                        # => {:a => 1, :b => 2, :c => 3}

Because of this fundamental difference to the built-in Arrays and Hashes, you probably cannot simply drop Hamster data structures into your code in place of the built-in ones, though Hamster does provide most of the read-only methods of Enumerable and Hash. The code that deals with these data structures must be converted to a functional style. In any case, the fruits of that labor may well be worth the effort.
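The kind of conversion meant here can be sketched with plain Ruby Arrays (not Hamster, so nothing about its API is assumed): a loop that appends to an accumulator becomes a reduce in which each step returns a new value.

```ruby
# Mutating style: the result is built by changing an accumulator in place.
def squares_mutating(numbers)
  result = []
  numbers.each { |n| result << n * n }
  result
end

# Functional style: every step produces a new frozen Array, and nothing
# that already exists is modified.
def squares_functional(numbers)
  numbers.reduce([].freeze) { |acc, n| (acc + [n * n]).freeze }
end

squares_mutating([1, 2, 3])    # => [1, 4, 9]
squares_functional([1, 2, 3])  # => [1, 4, 9]
```

With plain Arrays each `acc + [...]` copies everything built so far, so this version does quadratic work. Persistent data structures, such as Clojure’s, are designed to avoid that cost by sharing structure between versions.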

Summary

Immutable objects, particularly popular in functional programming languages, have their place in OO as well. They serve to make code simpler, safer, more readable, easier to test, and better equipped for concurrent use.

Some of Ruby’s standard data types are immutable, others are not. For classes that provide a mixed API, it is often a good idea to try to stick to the non-mutating methods. When you do use in-place mutation, try to be aware of when and where it happens in your code, and what other parts of the code may be affected by it.

Outside of the Ruby standard library, there are fully immutable collection implementations that can be used in Ruby, such as Hamster and all the Clojure bridge libraries for JRuby.

In the second part of this article, I will turn our attention to object-oriented domain modeling, and how immutability affects how we approach OO design.
