I’m currently working my way through Functional Programming in Scala. Chapter 5 focuses on non-strictness as a property of functions. I found the chapter really fascinating because this property allows one to build a lazily evaluated list. In the next post, I’ll try doing the same in Python. What follows here my explanation in Scala taken from the book. I try to cover the Scala specific syntax and features so hopefully this makes sense even if you’re not familiar with Scala.
First the definition of non-strictness:
Non-strictness is a property of a function. To say a function is non-strict just means that the function may choose not to evaluate one or more of its arguments. In contrast, a strict function always evaluates its arguments.
The problem we’re solving:
Let’s say we have an array in JavaScript and we want to transform the values and also filter them. Here we increment each number in the array and then filter to get only the even numbers.
[1, 2, 3].map(num => num + 1).filter(num => num % 2 === 0)
map
and filter
are great because we can decompose the operations on the elements of the array into different functions. The downside is that it’ll take two iterations through the array; once for map and once for filter. On small arrays we won’t mind the performance hit, but on larger arrays it could be an issue. It would be best if we could use map and filter the same way without the performance cost.
Below is FP In Scala’s implementation of lazy list called Stream
.
import Stream._ // required to override Scala's stdlib Stream
trait Stream[+A] {
// The arrow => in front of the argument type B means that
// the argument z is non-strict
// argument f is a function whos second argument is non-strict
def foldRight[B](z: => B)(f: (A, => B) => B): B =
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f))
case _ => z
}
def map[B](f: A => B): Stream[B] =
foldRight(empty: Stream[B])((a, b) => cons(f(a), b))
def filter(f: A => Boolean): Stream[A] =
foldRight(empty: Stream[A])((a, b) =>
if (f(a)) cons(a, b)
else b)
def toList: List[A] = {
@annotation.tailrec
def go(s: Stream[A], acc: List[A]): List[A] = s match {
case Cons(h,t) => go(t(), h() :: acc)
case _ => acc
}
go(this, List()).reverse
}
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty
else cons(as.head, apply(as.tail: _*))
}
The important Scala specific syntax here is a singleton value object called Empty
, and a case class for Cons
. The Stream
companion implements apply
so one can do Stream(...)
.
Also traits can refer to methods on their companion object, hence inside trait Stream
we have empty
and cons
.
The argument types of cons
have the =>
operator in front of them, which means that these arguments are not evaluated at the point of function application, but instead at each use within the function, hence they’re not strict.
In the third line of the trace above, the expressions 1
and Stream.apply(2, 3)
are not evaluated when we pass them to Stream.cons
but instead when they are used inside Stream.cons
. You’ll notice that the definition of cons
re-assigns these arguments with lazy val
, which is like non-strictness but for variable definitions rather than arguments. cons
passes these lazy arguments to Cons
wrapped in a function because case classes are not allowed have lazy values. To recap, hd
and tl
are non-strict arguments passed to Stream.cons
, so they are not evaluated when Stream.cons
is called. Then inside Stream.cons
they are made lazy so using the arguments does not evaluate them when they are wrapped in a function. Now we have a single linked list that is constructed lazily. This is nice because you could in theory build a Stream
of very expensive expressions but they would not be evaluated until the Stream
was traversed.
Here’s the trace of a Stream
being constructed.
Stream(1, 2, 3)
Stream.apply(1, 2, 3)
Stream.cons(1, Stream.apply(2, 3))
Stream.cons(1, Stream.cons(2, Stream.apply(3)))
Stream.cons(1, Stream.cons(2, Stream.cons(3, Stream.apply())))
Stream.cons(1, Stream.cons(2, Stream.cons(3, Stream.Empty)))
Back to our original problem which was to not loop through an array multiple times when using map
and filter
.
The tool which we use for this is foldRight
def foldRight[B](z: => B)(f: (A, => B) => B): B =
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f))
case _ => z
}
foldRight
travels recursively to the right along the singly linked list and calls a function f
. If it gets to the end, it returns z
which is a non-strict expression. In the previous iteration, the right side of f(h(), t().foldRight(z)(f))
returns z
so we get f(h(), z)
, the head is evaluated and we pop back up. If we wanted to sum the elements of a Stream we could do:
Stream(1, 2, 3).foldRight(0)((a, b) => a + b)
This will immediately evalute to 6
, but we can use foldRight
to implement map
and filter
. (In the code above, map takes f
but I renamed it to be easier to read below.)
def map[B](mapper: A => B): Stream[B] =
foldRight(empty: Stream[B])((a, b) => cons(mapper(a), b))
map
creates a new Stream but lazily because it’s built on cons
. The expression mapper(a)
would reference a
which is really the head of the Stream
that foldRight is called on but since cons
takes its parameters by-name, it’s not actually evaluated. The same happens with b
which is the expression t().foldRight(z)(mapper)
In other words f(h(), t().foldRight(z)(f))
=> cons(mapper(h()), t().foldRight(Empty)(mapper))
Now let’s look at filter:
def filter(f: A => Boolean): Stream[A] =
foldRight(empty: Stream[A])((a, b) =>
if (f(a)) cons(a, b)
else b)
The idea with filter
is similar to map
but we only call cons
if f(a)
is true.
Again because cons
is call by name, re-assigns the parameters with lazy val
and wraps them in a function, it doesn’t evalute its arguments when they’re applied to it. The other key detail is that the function f
passed to foldRight
has the signature f: (A, => B) => B
so its second argument is call by name and is not evaluated when applied.
Now let’s write a trace for our original add one and take even numbers program.
// _ is shorthand for an anonymous function with one arg
Stream(1, 2, 3).map(_ + 1).filter(_ % 2 == 0)
// _ + 1 is applied to the first element and we get 2
Stream.cons(2, Stream(2, 3).map(_ + 1)).filter(_ % 2 == 0)
// _ % 2 == 0 is applied to the first element 2, it passes so the item stays
// so now map and filter are called on the tail
Stream.cons(2, Stream(2, 3).map(_ + 1).filter(_ % 2 == 0))
// _ + 1 is applied to the second element and we get 3
Stream.cons(2, Stream.cons(3, Stream(3).map(_ + 1)).filter(_ % 2 == 0))
// _ % 2 == 0 is applied to the second element 3 and it's odd so it's gone
Stream.cons(2, Stream(3).map(_ + 1).filter(_ % 2 == 0))
// _ + 1 is applied to the second element 3 so it's 4
Stream.cons(2, Stream.cons(4, Stream.empty).filter(_ % 2 == 0))
// _ % 2 is applied to the second element and it's even so it stays
Stream.cons(2, Stream.cons(4, Stream.empty))
// the final result is equivalent to
Stream(2, 4)
Calling map
and filter
won’t actually evaluate the stream because they lazily construct new streams. What you can see here is that map
and filter
’s functions are being applied one after another to the first element, then the second and so on. This was the aha moment for me because we no longer have to repeatedly loop through a single linked list for map
and filter
!
If we wanted to convert the stream to a list and evaluate everything we need add a toList
method
def toList: List[A] = {
@annotation.tailrec
def go(s: Stream[A], acc: List[A]): List[A] = s match {
case Cons(h,t) => go(t(), h() :: acc)
case _ => acc
}
go(this, List()).reverse
}
Since go
starts from the right, we end up with the stream values in reverse order, hence the .reverse
at the end. If we wanted to evaluate Stream(1, 2, 3).map(_ + 1).filter(_ % 2 == 0)
we just put a .toList
at the end. The caveat with the whole solution is that foldRight
as it’s written is not tail recursive, so it could stack overflow.
Here’s a Scalafiddle you can play with:
Pretty cool huh? At this point I was curious about being able to do this in Python and JavaScript which I’m more familiar with. I found this great article by Jeremy Fairbank on creating functional streams in JavaScript
Python is my go-to language, so in the next part I’ll implement Stream
in Python. The challenge is that Python doesn’t have non-strictness or laziness the same way that Scala does.