{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 6: More Scala: Map, Filter, and Fold\n", "Originally by Sriram Sankaranarayanan \n", "\n", "Modified by Ravi Mangal \n", "\n", "Last Modified: Feb 10, 2025." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Higher-order functions over collections in Scala\n", "\n", "The functional programming style eschews loops and replaces it with tail recursive functions that express the logic in a simple fashion, without carrying extra state. Another mechanism that we will study now is that of in-built higher-order functions such as \"map\", \"filter\" and \"fold\" applicable to various collections. Typically, these functions will be used to manipulate Lists of objects. But they also apply to other collections in Scala such as Maps, Vectors, Sets, etc..\n", "\n", "\n", "Before we begin with these higher-order functions, let us study different ways to write functions in Scala, including a convenient notation for anonymous functions.\n", "\n", "Let us start with a function to multiply every element of a list by two." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now multiply every element of a list by 2." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mmultiplyEachEltByTwo\u001b[39m" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def multiplyEachEltByTwo(lst: List[Int], accList: List[Int] = Nil): List[Int] = lst match {\n", " case Nil => accList\n", " case hd::tail => {\n", " val newAccList = accList ++ List(2 * hd) // We add 2 * hd at the end why?\n", " multiplyEachEltByTwo(tail, newAccList)\n", " }\n", "} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we would like to remove all the even elements from a list, returning a new list with just the odd elements." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mremoveEvenNumbers\u001b[39m" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def removeEvenNumbers(lst: List[Int], accList: List[Int] = Nil): List[Int] = lst match {\n", " case Nil => accList\n", " case hd::tail => {\n", " val newAccList = if (hd %2 == 0) { accList } else { accList ++ List(hd) }\n", " removeEvenNumbers(tail, newAccList)\n", " }\n", "} " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(List[Int], List[Int]) => List[Int]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we wish to sum up the elements of the list" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36msumOfList\u001b[39m" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def sumOfList(lst: List[Int], sum: Int = 0): Int = lst match {\n", " case Nil => sum\n", " case hd::tail => sumOfList(tail, sum + hd)\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can write our main function:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mprocessList\u001b[39m" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def processList(lst: List[Int]): Int = {\n", " // Multiply by two\n", " val lst1 = removeEvenNumbers(lst)\n", " val lst2 = multiplyEachEltByTwo(lst1)\n", " sumOfList(lst2)\n", "}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres12_0\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m5\u001b[39m, \u001b[32m7\u001b[39m)\n", "\u001b[36mres12_1\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m2\u001b[39m, \u001b[32m4\u001b[39m, \u001b[32m6\u001b[39m, \u001b[32m8\u001b[39m)\n", "\u001b[36mres12_2\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m5050\u001b[39m\n", "\u001b[36mres12_3\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m200\u001b[39m" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "removeEvenNumbers(List(1,3,4,5,6,7,8))\n", "multiplyEachEltByTwo(List(1,2,3,4))\n", "sumOfList ((1 to 100).toList)\n", "processList((1 to 20).toList)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is somewhat painful since we need to write three separate functions to do the job. Is there something better we can do? Yes! let us recognize three patterns of operations we would like to achieve:\n", "\n", "- Map: apply a function f to every element of a list.\n", "- Filter: keep just those elements of the list that satisfy a \"predicate\"\n", "- Fold (or reduce): perform an accumulative operation to every element of the list.\n", "\n", "### Anonymous Functions In Scala \n", "\n", "Before we look closer at these operations, let us first familiarize ourselves with `anonymous` functions in Scala.\n", "Often it is cumbersome to define functions by name where we would like to pass a function. Therefore, we will use \"anonymous\" functions." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mmultiplyByTwo\u001b[39m" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def multiplyByTwo(x: Int): Int = x * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are two other ways to write the same thing." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mf\u001b[39m: \u001b[32mInt\u001b[39m => \u001b[32mInt\u001b[39m = ammonite.$sess.cmd3$Helper$$Lambda$2385/0x0000000100bef840@6968216e" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val f : Int => Int = x => x * 2" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mf_prime\u001b[39m" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def f_prime(m:Int => Int, y:Int): Int = m(y) " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres6\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m4\u001b[39m" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f_prime(f,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "f is bound to a function that takes in an argument `x` and returns `x * 2`. You can pass the expression ` (x) => x * 2 ` in any context you wish without giving it a name as we will see. Here is another succint version:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mf2\u001b[39m: \u001b[32mInt\u001b[39m => \u001b[32mInt\u001b[39m = ammonite.$sess.cmd15$Helper$$Lambda$2406/0x0000000100824840@4774c76" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val f2: Int => Int = _ * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `_` here is simply the first argument. Often it is important to specify the type of an argument in an anonymous function." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mg\u001b[39m: \u001b[32mString\u001b[39m => \u001b[32mString\u001b[39m = ammonite.$sess.cmd16$Helper$$Lambda$2408/0x0000000100821840@20a2a4db" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val g = (x: String) => x + x // OK: Scala infers the type of x + x from that of x and the type of g is inferred." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mg\u001b[39m: \u001b[32mString\u001b[39m => \u001b[32mString\u001b[39m = ammonite.$sess.cmd17$Helper$$Lambda$2410/0x000000010081c840@70ea23f8" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val g: String => String = x => x + x // OK: Scala infers the typeof x from the type given to g" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cmd8.sc:1: missing parameter type\n", "val g2 = x => x + x // BAD: Scala has no way of knowing what x is. It can be a String, Int, Double, ...\n", " ^\n", "Compilation Failed" ] } ], "source": [ "val g2 = x => x + x // BAD: Scala has no way of knowing what x is. It can be a String, Int, Double, ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Anonymous functions can take multiple arguments." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36maddFun\u001b[39m: (\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m) => \u001b[32mInt\u001b[39m = ammonite.$sess.cmd2$Helper$$Lambda$2383/0x0000000100bed040@3659f16a" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val addFun = (x: Int, y:Int) => x + y" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36maddFun\u001b[39m: ((\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m)) => \u001b[32mInt\u001b[39m = ammonite.$sess.cmd19$Helper$$Lambda$2422/0x000000010080b840@44e77580" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val addFun = (x: (Int, Int)) => x._1 + x._2" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36maddFun\u001b[39m: (\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m) => \u001b[32mInt\u001b[39m = ammonite.$sess.cmd20$Helper$$Lambda$2424/0x0000000100809840@214ef5f1" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val addFun: (Int, Int) => Int = _ + _ // First _ is the first argument and second _ is the second argument." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last but not least in a case pattern matching setup, you can define an anonymous function without the match statement." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mMyList\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mMyNil\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mMyCons\u001b[39m" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait MyList \n", "case object MyNil extends MyList\n", "case class MyCons(x: Int, l: MyList) extends MyList" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36manonIsEmptyFun\u001b[39m: \u001b[32mMyList\u001b[39m => \u001b[32mBoolean\u001b[39m = ammonite.$sess.cmd22$Helper$$Lambda$2504/0x0000000100c7b840@2b6cc14f" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val anonIsEmptyFun: MyList => Boolean = (x) => { x match {\n", " case MyNil => true\n", " case MyCons(_, _) => false\n", "}}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36manonIsEmptyFun\u001b[39m: \u001b[32mMyList\u001b[39m => \u001b[32mBoolean\u001b[39m = ammonite.$sess.cmd23$Helper$$Lambda$2506/0x0000000100c7f840@6fde4e2e" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val anonIsEmptyFun: MyList => Boolean = {\n", " case MyNil => true\n", " case MyCons(_, _) => false\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In other words, when you have the pattern \n", "~~~\n", " (x : Type) => x match {\n", " case .. =>\n", " case .. => \n", " ...\n", " }\n", "~~~\n", "You can instead simply say \n", "\n", "~~~\n", "{ \n", " case .. => \n", " case .. => \n", " ...\n", " }\n", "~~~\n", "\n", "without saying `(x : Type) => x match`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Map, Filter and Fold (Reduce) Operations\n", "\n", "In many languages, the use of for-loops/while loops to iterate is replaced by higher-order operations on data structures such as `map`, `filter` and `fold`. In this lecture, we provide a brief overview with some examples. We show how many varieties of loops or equivalently recursion, can be systematically replaced by these operations.\n", "\n", "\n", "## Map operation\n", "\n", "The idea of a map operation is to apply a function $f$ to every member of a container (eg., list, array, map, etc.) and return a new container.\n", "\n", "### Example 1\n", "\n", "We have a list `List(1, 3, 4, 5, 6, 110, 12, 2)`. We wish to compute the square of each element in the list and make a new list with the result." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mrecursivelySquareEachElt\u001b[39m" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def recursivelySquareEachElt(l: List[Int], acc: List[Int] = Nil): List[Int] = l match {\n", " case Nil => acc.reverse\n", " case hd::tl => recursivelySquareEachElt(tl, (hd*hd)::acc)\n", "}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres25\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m100\u001b[39m)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recursivelySquareEachElt(List(10))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres26\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m9\u001b[39m, \u001b[32m16\u001b[39m, \u001b[32m25\u001b[39m, \u001b[32m36\u001b[39m, \u001b[32m12100\u001b[39m, \u001b[32m144\u001b[39m, \u001b[32m4\u001b[39m)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recursivelySquareEachElt(List(1, 3, 4, 5, 6, 110, 12, 2), Nil)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the map operator over lists." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36msquareEachElt\u001b[39m" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def squareEachElt(l: List[Int]): List[Int] = l.map( (x: Int) => x*x ) \n", "// x => x * x is an anonymous function that squares its arguments." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres28\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m9\u001b[39m, \u001b[32m16\u001b[39m, \u001b[32m25\u001b[39m, \u001b[32m36\u001b[39m, \u001b[32m12100\u001b[39m, \u001b[32m144\u001b[39m, \u001b[32m4\u001b[39m)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squareEachElt(List(1, 3, 4, 5, 6, 110, 12, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`l.map(f)` says that apply the function `f` on each element of the list `f`.\n", "\n", "First of all, the elements of the lists must be some type `A`. \n", "Then, the function `f` must be of type `A => B`.\n", "\n", "`l.map(f)` applies `f` to every element in the list and returns a new list\n", "of type `List[B]`. Following is a recursive definition of this function. Note that this function is an example of a [*polymorphic*](https://docs.scala-lang.org/tour/polymorphic-methods.html) function in Scala. (Can you make it tail recursive?)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mlistMap\u001b[39m" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def listMap[A,B](lst: List[A], fun: A => B): List[B] = lst match {\n", " case Nil => Nil\n", " case hd :: tail => fun(hd) :: listMap(tail, fun) // :: is the Cons operator in scala.\n", "}" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36msayHelloTo\u001b[39m" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def sayHelloTo(l: List[String]): List[String] = listMap(l, x => (\"Hello \"+ x)) // Type of x is inferred by Scala" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres31\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mString\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m\"Hello Cat\"\u001b[39m, \u001b[32m\"Hello Dog\"\u001b[39m, \u001b[32m\"Hello World\"\u001b[39m)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sayHelloTo(List(\"Cat\", \"Dog\", \"World\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter Operation\n", "\n", "Just like we have used map to apply a function to each element and make a new container, we similarly use `filter` but to remove all elements that do not satisfy a predicate.\n", "\n", "__Predicate:__ A predicate is a function that takes in a value and returns true/false." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mretainAllMultiplesOfThree\u001b[39m" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def retainAllMultiplesOfThree(l: List[Int], acc: List[Int] = Nil): List[Int] = l match {\n", " case Nil => acc\n", " case hd :: tail => {\n", " val newAcc = if (hd % 3 == 0) { acc ++ List(hd)} else { acc }\n", " retainAllMultiplesOfThree(tail, newAcc)\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`l.filter(c)` filters all those elements that do not satisfy the condition `c` from the list `l`." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mretainAllMultiplesOfThree\u001b[39m" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def retainAllMultiplesOfThree(l: List[Int]): List[Int] = {\n", " l.filter( x => x%3 == 0 )\n", "}" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres34\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m15\u001b[39m, \u001b[32m18\u001b[39m, \u001b[32m12\u001b[39m, \u001b[32m3\u001b[39m)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "retainAllMultiplesOfThree(List(10, 15, 18, 12, 3, 1, 5, 7, 8, 14))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is how the filter operation is defined abstractly" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mfilterList\u001b[39m" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def filterList[A] (lst: List[A], filterFun: A => Boolean): List[A] = lst match {\n", " case Nil => Nil\n", " case head :: tail => {\n", " if (filterFun(head)){\n", " head :: filterList(tail, filterFun)\n", " } else {\n", " filterList(tail, filterFun)\n", " }\n", " }\n", "}\n", "\n", "// Ths is not tail recursive. Why? Can you make it tail recursive?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fold Operations\n", "\n", "Fold/reduce operations are useful to gather all data thus far during a computation. Take a list\n", "\n", "$$[l_1, l_2, \\ldots, l_n] $$\n", "\n", "We wish to sum up the numbers in the list.\n", "This is achieved in a loop with accumulator.\n", "~~~\n", "acc = 0\n", "for each item in List\n", " acc = acc + item\n", "return acc\n", "~~~\n", "\n", "We can also do it with fold left operator.\n", "\n", "As an example consider the sum of the elements of a list above.\n", "\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mrecSumOfList\u001b[39m" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def recSumOfList(lst: List[Int], sum: Int = 0): Int = lst match {\n", " case Nil => sum\n", " case hd::tail => recSumOfList(tail, sum + hd)\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fold is a tricky operation to wrap one's head around. A list data structure gives us two versions of fold.\n", "\n", "### list.foldLeft (startVal) (fun)\n", "\n", "For list `[l1, l2, l3, ..., ln]` the function call computes the following unrolled function:\n", "\n", "` fun(.... fun( fun ( fun( startVal, l1), l2), l3), ....., ln)`\n", "This is equivalent to the following Scala code:\n", "\n", "~~~\n", "var acc = startVal\n", "for (lj <- list)\n", " acc = fun(acc, lj) // Very imp: acc is the first argument and lj is the second argument.\n", "~~~\n", "\n", "\n", "\n", "### list.foldRight (startVal) (fun)\n", "\n", "This iterates the list from right to left. To wit, list `[l1, l2, l3, ..., ln]` the function call computes the following unrolled function:\n", "\n", "` fun(l1, fun(.....,fun(ln-2, fun(ln-1, fun(ln, startVal)))`\n", "\n", "This is equivalent to the following scala code:\n", "\n", "~~~\n", "var acc = startVal\n", "for (lj <- list.reverse) // Note list is iterated in reverse\n", " acc = fun(lj, acc) // Very imp: acc is the second argument for foldRight\n", "~~~\n", "\n", "The fold function has two arguments: `startVal` and `fun`. Why don't we write: \n", "`list.foldLeft(startVal, fun)`? This is a special syntax for writing functions with multiple argument \n", "in Scala called __curried syntax__\n", "\n", "https://alvinalexander.com/scala/fp-book/partially-applied-functions-currying-in-scala\n", "\n", "We will talk about currying in detail later on (in a few weeks) and it has nothing to do with Indian cuisine.\n", " " ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36msumList\u001b[39m" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def sumList(l: List[Int]): Int = l.foldLeft (0) ((acc, x) => acc + x )\n", "// Fold left with initial value of accumulator = 0\n", "// Every time we have a new list element x and accumulator value acc, update acc by acc + x" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres38\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m55\u001b[39m" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sumList(List(1, 2, 3,4, 5, 6, 7, 8, 9, 10))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36msumFromRight\u001b[39m" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def sumFromRight(l: List[Int]) : Int = l.foldRight (0) ((x, acc) => x + acc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now write a function `reverseList`" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cmd40.sc:3: type mismatch;\n", " found : List[Int]\n", " required: collection.immutable.Nil.type\n", " elt::listSoFar\n", " ^\n", "Compilation Failed" ] } ], "source": [ "def reverseList(l: List[Int]): List[Int] = \n", "l.foldLeft (Nil) ( (listSoFar: List[Int], elt: Int) => {\n", " elt::listSoFar\n", "} )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What just happened? Scala's type checker bailed on us.\n", "\n", "- Nil is the empty list for any type: List[String], List[Int], List[Double], List[List[List[Int]]], and so on.\n", "- The type checker is simply not sophisticated enough to figure out that the type of the accumulator in foldLeft here must be a list of int. \n", "\n", "There are two fixes.\n" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseListA\u001b[39m" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def reverseListA(l: List[Int]): List[Int] = \n", " l.foldLeft ( List[Int]() ) ( (listSoFar: List[Int], elt: Int) => {\n", " elt::listSoFar\n", "} )" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseListB\u001b[39m" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def reverseListB(l: List[Int]): List[Int] = \n", " l.foldLeft[List[Int]] ( Nil ) ( (listSoFar: List[Int], elt: Int) => {\n", " elt::listSoFar\n", "} )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, it is always nice to have the type of the accumulator specified in fold left. Last but not least, note that the anonymous function in fold can be written in case pattern form. Note the syntax for the 2nd argument of `foldLeft`. Instead of enclosing it in `( )`, we use `{ }`. When a function literal is the last argument to a method, you can replace the usual parentheses with curly braces." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseListC\u001b[39m" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def reverseListC(l: List[Int]): List[Int] = l.foldLeft[List[Int]] (Nil) {\n", " case (listSoFar: List[Int], elt: Int) => elt::listSoFar\n", "}" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres43\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m4\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m1\u001b[39m)" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reverseListA(List(1,2,3,4))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres44\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m4\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m1\u001b[39m)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reverseListB(List(1,2,3,4))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mres45\u001b[39m: \u001b[32mList\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mList\u001b[39m(\u001b[32m4\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m1\u001b[39m)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reverseListC(List(1,2,3,4))" ] } ], "metadata": { "kernelspec": { "display_name": "Scala", "language": "scala", "name": "scala" }, "language_info": { "codemirror_mode": "text/x-scala", "file_extension": ".sc", "mimetype": "text/x-scala", "name": "scala", "nbconvert_exporter": "script", "version": "2.13.14" } }, "nbformat": 4, "nbformat_minor": 4 }