Building a Caching Custom Combine Operator #2: The .cache() Method

In part 1, we laid the foundation for our custom Combine operator: the Combine chain that implements caching. In this part, we want to make it reusable, so that it works like a built-in Combine operator, by adding it to the Publisher type via an extension.

As a refresher, let’s review how we want our .cache() method to work for somebody using it. The upstream chain provides input to the caching operator, which either calculates the output using a given operation or, if the output for that input has already been calculated, simply passes on the previous value. So a call would look something like this:

upstream.cache(operation: ...)

Now we just need to convert that into a method declaration in a Publisher extension. Easy, right?

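Now recall flatMap()’s method declaration, which we looked at in part 1:

func flatMap<T, P>(maxPublishers: Subscribers.Demand = .unlimited, _ transform: @escaping (Self.Output) -> P) -> Publishers.FlatMap<P, Self> where T == P.Output, P : Publisher, Self.Failure == P.Failure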

How do we put together this big bundle of Swift generics? We’ll break that down in this part, and adapt our code to fit the new declaration. By the end, we will have a .cache() method that will cache the results of our operation called on the upstream inputs, whatever the input and output types are. Put on your generics caps, and let’s start.

Understanding the types of our components

In order to be able to compose this declaration, we need to understand exactly what types we are dealing with in our method:

The upstream Publisher
The operation, which is a method that returns a Publisher
The return from our .cache() method, also a Publisher

This would be simple enough, except all three of these involve Publisher, which itself has associated types (Output and Failure). Additionally, the types depend on each other: for example, the Publisher our .cache() method outputs must have the same Output and Failure types as the one the operation returns. Laying out all of these conditions ahead of time will make it easier to write the declaration.

So what are all the conditions?

Our operation’s input is the same type as the upstream Publisher’s output
- This should be reasonably obvious, but we want to state it explicitly so we have everything laid out for the next step.
The Publisher our cache method outputs has the same Output and Failure types as the one the operation produces
- This is because our caching operator will simply pass on the result of the operation; we don’t do any further manipulation.
The Publisher our cache method outputs has the same Failure type as the upstream Publisher
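- This is because flatMap(), which our operator is built on, requires the Publisher returned by the operation to have the same Failure type as the upstream, and our operation doesn’t introduce a new error type of its own.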
- We could have made our operation throw errors and produced a different Failure type, but this is complicated enough already!
The input to our caching operator, which is the upstream Output, must be Hashable
- This is because we use it as a key for a dictionary.
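
The Hashable requirement is easiest to see outside of Combine. The following is a minimal sketch, not code from this series: memoize is a hypothetical helper that caches the results of a plain function in a dictionary keyed by its input. It only compiles because Input is constrained to Hashable.

```swift
// Hypothetical helper for illustration only; not part of Combine or of part 1.
// Wraps a function so that repeated inputs are served from a dictionary.
func memoize<Input: Hashable, Output>(
    _ compute: @escaping (Input) -> Output
) -> (Input) -> Output {
    var cache: [Input: Output] = [:]   // Input must be Hashable to be a key
    return { input in
        if let cached = cache[input] {
            return cached              // cache hit: skip the computation
        }
        let result = compute(input)
        cache[input] = result          // cache miss: compute and remember
        return result
    }
}

var computeCount = 0
let addOne = memoize { (x: Double) -> Double in
    computeCount += 1
    return x + 1.0
}

let results = [addOne(2.0), addOne(2.0), addOne(3.0)]
print(results, computeCount)   // prints [3.0, 3.0, 4.0] 2
```

Removing the Hashable constraint from Input makes the dictionary declaration fail to compile, which is the same reason our .cache() method needs Self.Output to be Hashable.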

Translating our words into code

Now we need to take those conditions above, and translate them into the declaration for our .cache() method.

So let’s start with just a basic declaration:

extension Publisher {
  func cache() {
  }
}

Let’s fill in the obvious parts, using ??? as a placeholder for the types we’ll fill in later.

func cache(operation: (???) -> ???) -> AnyPublisher<???, ???>

This gets us a function that takes a closure as an operation, and returns an AnyPublisher.

Next, take the first condition above: the upstream Publisher’s output is the same as the operation’s input. Because this is a method in a Publisher extension, the type of the Publisher it is called on is always available as Self. So our operation will take Self.Output, the upstream Publisher’s Output type:

func cache(operation: (Self.Output) -> ???) -> AnyPublisher<???, ???>

Now to the second condition: The Publisher our cache method produces has the same Output and Failure types as the one our operation produces.

func cache(operation: (Self.Output) -> ???) -> AnyPublisher<???.Output, ???.Failure>

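Why didn’t we write the operation’s return type as a Publisher directly? Because Publisher is a protocol with associated types, not a generic type, so we cannot say Publisher<X,Y> in our method. (Swift 5.7 and later relax this somewhat with primary associated types, but the generic approach here works on every version.)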

But we still need to be able to make conditions based on its Output and Failure types. What can we do? Make the entire output of the operation a generic type P and specify that it must be a Publisher:

func cache<P>(operation: (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher

That gives us access to the Output and Failure associated types so we can specify our method’s return type.
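
As a small illustration of what the P: Publisher constraint buys us, here is a sketch using a hypothetical helper, describeTypes(of:), and a hypothetical SampleError type. Neither is part of Combine or of this series. Inside the function, P.Output and P.Failure can be used like any other type.

```swift
import Combine

// Hypothetical error type, used only for this illustration.
struct SampleError: Error {}

// Hypothetical helper: reports the associated types of any Publisher.
// Because P is constrained to Publisher, P.Output and P.Failure are
// available inside the function, just as in the cache declaration.
func describeTypes<P>(of publisher: P) -> String where P: Publisher {
    return "Output: \(P.Output.self), Failure: \(P.Failure.self)"
}

let justDescription = describeTypes(of: Just(1.5))
let subjectDescription = describeTypes(of: PassthroughSubject<String, SampleError>())

print(justDescription)      // prints Output: Double, Failure: Never
print(subjectDescription)
```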

That was the most complicated part of building up our declaration. The last two conditions are pretty easy to add.

The third condition is that the upstream Failure type is the same as the one our cache method outputs. So we add another condition to the where:

func cache<P>(operation: (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure

And finally, the last and easiest condition to understand: the upstream Output type must be Hashable:

func cache<P>(operation: (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable

Okay, there is one last thing, but it isn’t related to generics. The operation closure passed to the cache method needs to be marked @escaping, because it is captured by the closure we hand to flatMap() and is called later, each time the upstream emits a value, long after the cache method itself has returned. So:

func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable

Phew!
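
To see the @escaping requirement in isolation, here is a minimal sketch that doesn’t use Combine at all. makeDeferred is a hypothetical helper invented for this illustration: like our cache method, it returns something that holds on to the closure and calls it after the function has returned.

```swift
// Hypothetical helper for illustration; not part of Combine.
// The returned closure captures `operation` and calls it later,
// after makeDeferred has already returned, so `operation` must be @escaping.
func makeDeferred<Input, Output>(
    _ operation: @escaping (Input) -> Output
) -> (Input) -> Output {
    return { input in operation(input) }
}

let deferredAddOne = makeDeferred { (x: Double) in x + 1.0 }

// makeDeferred has returned, but the closure passed to it is still alive:
let deferredResult = deferredAddOne(2.0)
print(deferredResult)   // prints 3.0
```

Without the @escaping annotation, the compiler rejects makeDeferred, because a non-escaping closure parameter is not allowed to outlive the call it was passed to.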

Implementing the .cache() method

Now we’ll take the code from part 1 and adapt it to work in our new cache method declaration, making it generic in the process.

Let’s start by wrapping it in a Publisher extension:

extension Publisher {
  func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable {
  }
}

And a copy of the code from last time:

import Combine

var publisher: AnyPublisher<Double,Never>
var upstream = PassthroughSubject<Double,Never>()
var operation = { x in Just(x+1.0) }       // example operation
do {
  var cache: [Double:Double] = [:]
  publisher = upstream.flatMap({ input -> AnyPublisher<Double,Never> in
    if let result = cache[input] {
      return Just(result).eraseToAnyPublisher()
    }
    else {
      return operation(input).map({ result in
        cache[input] = result
        return result
      }).eraseToAnyPublisher()
    }
  }).eraseToAnyPublisher()
}

upstream just turns into self, because our .cache() method is an extension of Publisher. publisher is what our method returns, and operation is getting passed in, so we can just copy in what’s inside the do {} block.

extension Publisher {
  func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable {
    var cache: [Double:Double] = [:]
    return self.flatMap({ input -> AnyPublisher<Double,Never> in
      if let result = cache[input] {
        return Just(result).eraseToAnyPublisher()
      }
      else {
        return operation(input).map({ result in
          cache[input] = result
          return result
        }).eraseToAnyPublisher()
      }
    }).eraseToAnyPublisher()
  }
}

Our code used to take a Double and return a Double, but now it needs to use Self.Output and P.Output, and the Never should now be P.Failure:

extension Publisher {
  func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable {
    var cache: [Self.Output:P.Output] = [:]
    return self.flatMap({ input -> AnyPublisher<P.Output,P.Failure> in
      if let result = cache[input] {
        return Just(result).eraseToAnyPublisher()
      }
      else {
        return operation(input).map({ result in
          cache[input] = result
          return result
        }).eraseToAnyPublisher()
      }
    }).eraseToAnyPublisher()
  }
}

But this code produces a strange error on the Just chain:

Cannot convert return expression of type 'AnyPublisher<P.Output, Just<Output>.Failure>' (aka 'AnyPublisher<P.Output, Never>') to return type 'AnyPublisher<P.Output, Self.Failure>'

We aren’t hardcoding our Failure type to Never anymore; our type is P.Failure (or Self.Failure, which is the same type thanks to the P.Failure == Self.Failure constraint in our where clause). But Just always has a Failure type of Never, so the compiler is complaining about the type mismatch. Thankfully this is an easy fix, via the setFailureType(to:) operator:

extension Publisher {
  func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable {
    var cache: [Self.Output:P.Output] = [:]
    return self.flatMap({ input -> AnyPublisher<P.Output,P.Failure> in
      if let result = cache[input] {
        return Just(result).setFailureType(to: P.Failure.self).eraseToAnyPublisher()
      }
      else {
        return operation(input).map({ result in
          cache[input] = result
          return result
        }).eraseToAnyPublisher()
      }
    }).eraseToAnyPublisher()
  }
}
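
Here is setFailureType(to:) on its own, as a small sketch. LookupError is a hypothetical error type made up for this example; the point is only that the declared Failure type changes while the behavior of Just stays the same.

```swift
import Combine

// Hypothetical error type, used only for this illustration.
struct LookupError: Error {}

// Just always has Failure == Never, so it cannot be returned directly
// where a publisher that may fail with LookupError is expected.
// setFailureType(to:) changes only the declared Failure type.
let lifted: AnyPublisher<Int, LookupError> = Just(42)
    .setFailureType(to: LookupError.self)
    .eraseToAnyPublisher()

var liftedValues: [Int] = []
var liftedFinished = false
let liftedCancellable = lifted.sink(
    receiveCompletion: { completion in
        if case .finished = completion { liftedFinished = true }
    },
    receiveValue: { liftedValues.append($0) }
)

print(liftedValues, liftedFinished)   // prints [42] true
```

The publisher still emits its single value and finishes normally; no error is ever sent, because setFailureType(to:) is only available on publishers that cannot fail in the first place.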

Testing the .cache() method

We can do as in part 1 and copy the entire extension code above into a playground (don’t forget the import Combine!), and then add a little test code after the extension:

let subject = PassthroughSubject<Double,Never>()
let chain = subject.cache(operation: { x in Just(x+1.0) }).sink(receiveValue: {value in print(value)})

subject.send(2.0)
subject.send(2.0)
subject.send(3.0)

And we get the output we expect:

3.0
3.0
4.0

But we removed the debug code that told us whether the cache was used or not. How can we tell now? Here’s where the choice we made in part 1 works out nicely. We can just add a handleEvents operator in our operation:

operation: { x in Just(x+1.0).handleEvents(receiveOutput: { _ in print("calculated") }) }

Now our output is this. Nothing was printed before the second 3.0, which shows that the second value came from the cache rather than being calculated:

calculated
3.0
3.0
calculated
4.0
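
Since the whole point of this part was to make the operator generic, it is worth trying it with types other than Double. The sketch below repeats the extension so it is self-contained, then caches String inputs with Int outputs. The names words, operationCalls, and lengths are made up for this example, and a counter stands in for the print so the result can be checked.

```swift
import Combine

// The extension as built in this part, repeated so the sketch is self-contained.
extension Publisher {
  func cache<P>(operation: @escaping (Self.Output) -> P) -> AnyPublisher<P.Output, P.Failure> where P: Publisher, P.Failure == Self.Failure, Self.Output: Hashable {
    var cache: [Self.Output: P.Output] = [:]
    return self.flatMap({ input -> AnyPublisher<P.Output, P.Failure> in
      if let result = cache[input] {
        return Just(result).setFailureType(to: P.Failure.self).eraseToAnyPublisher()
      }
      else {
        return operation(input).map({ result -> P.Output in
          cache[input] = result
          return result
        }).eraseToAnyPublisher()
      }
    }).eraseToAnyPublisher()
  }
}

let words = PassthroughSubject<String, Never>()
var operationCalls = 0      // how many times the operation actually ran
var lengths: [Int] = []     // every value received downstream

let wordsChain = words
  .cache(operation: { (word: String) -> Just<Int> in
    operationCalls += 1
    return Just(word.count)
  })
  .sink(receiveValue: { lengths.append($0) })

words.send("hello")
words.send("hello")   // same input again: served from the cache
words.send("cat")

print(lengths, operationCalls)   // prints [5, 5, 3] 2
```

Three values arrive downstream, but the operation only runs twice, once for each distinct input.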

Next steps

Now we have a method that somebody can add to their own Combine chains to cache results. In part 3, we will take this method and turn it into a full Publisher type. This will teach us how to use Combine chains in the context of reusable types, which will be useful for many other cases beyond just implementing our own Combine operators.

Part 3 isn’t out yet, so watch out for it coming soon.