{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 5: Operations on Inductively Defined Structures\n", "Originally by Sriram Sankaranarayanan \n", "\n", "Modified by Ravi Mangal \n", "\n", "Last Modified: Feb 6, 2025." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previously, we examined inductive definitions starting with numbers, lists, binary trees, arithmetic expressions and a simple imperative language. However, beyond defining them, we did not do much else with them. In this lecture, we will fix this by performing a variety of exciting operations on inductively defined structures we have defined so far.\n", "\n", "We will examine two mechanisms for defining these operations: \n", "1. Using a *visitor pattern*: i.e, implement the operation as a member function of the classes.\n", "2. Using pattern matching: this is a special feature of functional languages like Scala, Lisp, OCaml and Haskell (and also modern systems languages such as Rust).\n", "\n", "The first option is generically applicable to most languages. The second one is very special and very powerful. We will focus extensively on the second option of pattern matching, while mentioning what a visitor pattern is.\n", "\n", "## Operations on Numbers\n", "\n", "Let us recall the grammar for inductively defining numbers.\n", "\n", "$$\\textbf{NatNum} \\ \\rightarrow\\ Z\\; |\\; Succ(\\textbf{NatNum}) $$\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mNatNum\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mZ\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mSucc\u001b[39m" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait NatNum\n", "\n", "case object Z extends NatNum\n", "\n", "case class Succ(n : NatNum) extends NatNum\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest function we can imagine is to add one to a given NatNum. This is easy to write since this is exactly what *Succ* does." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36maddOne\u001b[39m" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def addOne(n:NatNum) = Succ(n)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mtwo\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))\n", "\u001b[36mthree\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z)))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val two = Succ(Succ(Z))\n", "val three = addOne(two)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we wish to write a function _minusOne_ that given a number subtracts one from it. Before we do so, we have to understand how to handle the zero case. We could raise an error/exception saying that it is undefined. This is the best way to do it since it is the most honest.\n", "\n", "Here is how we should do it. \n", "\n", "~~~\n", "minusOne(s) = if s is of the form Succ(t) then return t, else s must be Z and return Error\n", "~~~\n", "\n", "Therefore we need a construct that checks if a given input NatNum is of the form Succ(t) and extracts this inner stuff t, but how?\n", "\n", "There are two solutions to this. First is to redefine things to have the _minusOne_ function implemented inside each class. This solves the problem using the way object oriented programs work.\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mNatNum1\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mZ1\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mSucc1\u001b[39m" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait NatNum1 {\n", " def minusOne(): NatNum1 // All those who inherit from NatNum1 better implement this\n", " // minusOne function, or else ... \n", "}\n", "\n", "case object Z1 extends NatNum1{\n", " def minusOne(): NatNum1 = { // subtracting one from Zero should give an error.\n", " throw (new IllegalArgumentException(\"minusOne cannot be called on Zero\"))\n", " }\n", "}\n", "\n", "case class Succ1(n: NatNum1) extends NatNum1 {\n", " def minusOne(): NatNum1 = { // Otherwise, number is of the form Succ1(..) \n", " return this.n // return the \"inner stuff\"\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NatNum1 is the base class (it is an abstract class or a trait in Scala). What this means is that any class that inherits from it must have all the members that are defined in it. We define a member minusOne() corresponding to the function we wish to implement. Therefore, when we call the minusOne function on an instance of NatNum1, the instance can be either a Z1 or a Succ1 class. In either case, the object system in Scala ensures that the right function gets called. This is an indirect but effective way of finding out the question if a given NatNum1 is of the form Z1 or Succ1(t)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mt1\u001b[39m: \u001b[32mZ1\u001b[39m.type = Z1\n", "\u001b[36mt2\u001b[39m: \u001b[32mSucc1\u001b[39m = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1)))\n", "\u001b[36mt3\u001b[39m: \u001b[32mNatNum1\u001b[39m = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val t1 = Z1\n", "val t2 = Succ1(Succ1(Succ1(Z1)))\n", "val t3 = t2.minusOne()\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "ename": "java.lang.IllegalArgumentException", "evalue": "minusOne cannot be called on Zero", "output_type": "error", "traceback": [ "\u001b[31mjava.lang.IllegalArgumentException: minusOne cannot be called on Zero\u001b[39m", " ammonite.$sess.cmd5$Helper$Z1$.minusOne(\u001b[32mcmd5.sc\u001b[39m:\u001b[32m8\u001b[39m)", " ammonite.$sess.cmd7$Helper.(\u001b[32mcmd7.sc\u001b[39m:\u001b[32m1\u001b[39m)", " ammonite.$sess.cmd7$.(\u001b[32mcmd7.sc\u001b[39m:\u001b[32m7\u001b[39m)" ] } ], "source": [ "val t4 = t1.minusOne() // This will throw an exception" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second way to do it is using the idea of *pattern matching*: a very powerful feature that is available in some functional programming languages including Scala. Here is how it works.\n", "\n", "An instance object of type NatNum can be of two forms Succ(t) or Z. Scala provides a construct very similar to the case statement in C like languages, but much more powerful." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mminusOne\u001b[39m" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def minusOne(num: NatNum): NatNum = {\n", " num match {\n", " case Succ(t) => t // Is n of the form Succ(t), then return t\n", " // The magic here is that the variable t gets assigned to the contents of num.n in this case.\n", " case Z => throw new IllegalArgumentException(\"minusOne cannot be called on Zero\")\n", " // Is n of the form Z, then throw an exception\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mt5\u001b[39m: \u001b[32mNatNum\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val t5 = minusOne(Succ(Succ(Succ(Z))))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "ename": "java.lang.IllegalArgumentException", "evalue": "minusOne cannot be called on Zero", "output_type": "error", "traceback": [ "\u001b[31mjava.lang.IllegalArgumentException: minusOne cannot be called on Zero\u001b[39m", " ammonite.$sess.cmd8$Helper.minusOne(\u001b[32mcmd8.sc\u001b[39m:\u001b[32m5\u001b[39m)", " ammonite.$sess.cmd10$Helper.(\u001b[32mcmd10.sc\u001b[39m:\u001b[32m1\u001b[39m)", " ammonite.$sess.cmd10$.(\u001b[32mcmd10.sc\u001b[39m:\u001b[32m7\u001b[39m)" ] } ], "source": [ "val t6 = minusOne(Z) // Boom!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us write code to add two NatNum. The basic idea is this: \n", "- If the first argument to the call is of the form Z, then the answer is the second argument since 0 + something = something.\n", "- If the first argument is of the form Succ(t) then simply make a recursive call to add t with Succ(second argument). We are simply saying (1 + t) + n = t + (1 + n)\n", "\n", "Let us see how pattern matching can help us do this." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36maddNatNums\u001b[39m" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def addNatNums (n1: NatNum, n2: NatNum ): NatNum = {\n", " n1 match {\n", " case Z => n2 // If n1 == Z, then the result is just n2\n", " case Succ(t) => addNatNums(t, Succ(n2)) // Otherwise, the result is to peel of a Succ from first argument \n", " // and add it to the second argument.\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mtwo\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))\n", "\u001b[36mthree\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z)))\n", "\u001b[36mfive\u001b[39m: \u001b[32mNatNum\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z)))))\n", "\u001b[36mten\u001b[39m: \u001b[32mNatNum\u001b[39m = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))))))\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val two = Succ(Succ(Z))\n", "val three = addOne(two)\n", "val five = addNatNums(two, three)\n", "val ten = addNatNums(five, five)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For curiosity, how would we implement it using the _visitor pattern_ ? Let us now redefine NatNum1 to require a new member function addNatNums. You can see how instead of pattern matching, we simply write the code for the zero case inside the object Z1 and the code for the successor case inside the object Succ1. The idea is the same as before but the two cases get split into two different member functions of Z1 and Succ1, respectively." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mNatNum1\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mZ1\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mSucc1\u001b[39m" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait NatNum1 {\n", " def minusOne(): NatNum1\n", " def addNatNums(n1: NatNum1): NatNum1\n", "}\n", "\n", "case object Z1 extends NatNum1{\n", " def minusOne(): NatNum1 = {\n", " throw (new IllegalArgumentException(\"minusOne cannot be called on Zero\"))\n", " }\n", " \n", " def addNatNums(n1: NatNum1): NatNum1 = n1 // return is optional\n", " \n", "}\n", "\n", "case class Succ1(n: NatNum1) extends NatNum1 {\n", " def minusOne(): NatNum1 = {\n", " return this.n\n", " }\n", " def addNatNums(n1: NatNum1): NatNum1 = this.n.addNatNums(Succ1(n1)) // return is optional\n", " \n", "}" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mtwo\u001b[39m: \u001b[32mSucc1\u001b[39m = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1))\n", "\u001b[36mthree\u001b[39m: \u001b[32mSucc1\u001b[39m = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1)))\n", "\u001b[36mfive\u001b[39m: \u001b[32mNatNum1\u001b[39m = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1)))))\n", "\u001b[36mten\u001b[39m: \u001b[32mNatNum1\u001b[39m = \u001b[33mSucc1\u001b[39m(\n", " n = \u001b[33mSucc1\u001b[39m(\n", " n = \u001b[33mSucc1\u001b[39m(\n", " n = \u001b[33mSucc1\u001b[39m(\n", " n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = \u001b[33mSucc1\u001b[39m(n = Z1))))))\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val two = Succ1(Succ1(Z1))\n", "val three = Succ1(two)\n", "val five = two.addNatNums(three)\n", "val ten = five.addNatNums(five)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have addition, multiplication can now be implemented using recursion, as below." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mmultiplyNatNums\u001b[39m" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def multiplyNatNums(n1: NatNum, n2: NatNum ): NatNum = {\n", " n1 match {\n", " case Z => { return Z }\n", " case Succ(t) => { \n", " // (t+1)* n2 = t * n2 + n2\n", " val s1 = multiplyNatNums(t, n2) // t * n2\n", " addNatNums(n2, s1) // n2 + t * n2 = (t+1)* n2 = n1 * n2!\n", " }\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mfour\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))))\n", "\u001b[36mfive\u001b[39m: \u001b[32mSucc\u001b[39m = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z)))))\n", "\u001b[36mtwenty\u001b[39m: \u001b[32mNatNum\u001b[39m = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = \u001b[33mSucc\u001b[39m(n = Z))))\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", " )\n", ")\n", "\u001b[36mhundred\u001b[39m: \u001b[32mNatNum\u001b[39m = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSucc\u001b[39m(\n", " n = \u001b[33mSu\u001b[39m..." ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val four = Succ(Succ(Succ(Succ(Z))))\n", "val five = addOne(four)\n", "val twenty = multiplyNatNums(five, four)\n", "val hundred = multiplyNatNums(five, twenty)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations on List of Numbers\n", "\n", "Recall that we previously defined a grammar for lists.\n", "$$\\begin{array}{ccccc}\n", "\\textbf{NumList} & \\rightarrow & Nil &\\ |\\ & Cons(\\textbf{Num}, \\textbf{NumList}) \\\\\n", "\\textbf{Num} & \\rightarrow & 0 \\ |\\ 1\\ |\\ 2\\ |\\ 3\\ |\\ 4\\ |\\ \\cdots \\\\\n", "\\end{array}$$" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mNumList\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mNil\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mCons\u001b[39m" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait NumList\n", "\n", "case object Nil extends NumList\n", "\n", "case class Cons(hd: Int, tl: NumList) extends NumList" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many exciting things we wish to do to lists. The simplest one is to find the length of a list.\n", "How do we do that in principle?\n", "- The length of the empty list Nil is zero\n", "- The length of the list of the form Cons(something, tail) is 1 + length(tail)\n", "\n", "Let us use pattern matching to implement this." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mlistLength\u001b[39m" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def listLength(lst: NumList): Int = { lst match { // pattern match the lst\n", " case Nil => 0\n", " case Cons(_, tl) => 1 + listLength(tl) // _ here is to tell scala that I do not care what is the element.\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How would we do it using a visitor? Simple, the listLength function is going to become a member of the trait \n", "NumList and get implemented in all the classes that inherit from it." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mAltNumList\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mAltNil\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mAltCons\u001b[39m" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait AltNumList{\n", " def listLength(): Int\n", "}\n", "\n", "case object AltNil extends AltNumList {\n", " def listLength(): Int = 0\n", "}\n", "\n", "case class AltCons(hd: Int, tl: AltNumList) extends AltNumList {\n", " def listLength(): Int = {\n", " 1 + tl.listLength()\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36ml1\u001b[39m: \u001b[32mAltCons\u001b[39m = \u001b[33mAltCons\u001b[39m(\n", " hd = \u001b[32m1\u001b[39m,\n", " tl = \u001b[33mAltCons\u001b[39m(hd = \u001b[32m3\u001b[39m, tl = \u001b[33mAltCons\u001b[39m(hd = \u001b[32m7\u001b[39m, tl = AltNil))\n", ")\n", "\u001b[36mj1\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m3\u001b[39m" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val l1 = AltCons(1, AltCons(3, AltCons(7, AltNil)))\n", "val j1 = l1.listLength()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may now be wondering why we are bothering describing both visitor functions and pattern matches. Aren't they just two different ways of achieving the same effect? The answer to that is yes for about 90% of the cases, but not always. Pattern matching can make life infinitely more easier. \n", "\n", "\n", "I would like to write a function now that does the following: given a list, check if it is sorted in ascending order.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36misAscendingOrder\u001b[39m" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def isAscendingOrder (l: NumList): Boolean = l match {\n", " \n", " case Nil => true // An empty list is ascending sure!\n", " \n", " case Cons(_, Nil) => true // A list with just one element is surely ascending ordered\n", " \n", " case Cons(j1, tl @ Cons(j2, _)) => (j1 <= j2) && isAscendingOrder(tl) \n", " // We did something funky:\n", " // We pattern matched the first two elements of the list to j1 and j2 respectively.\n", " // Also, we told Scala to call Cons(j2, _) by the name tl using the @ symbol. \n", " // This is called pattern matching with names.\n", " \n", " \n", " case _ => { assert(false); false } // This is the catch all case and should never happen.\n", " \n", "}" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb1\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mtrue\u001b[39m" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b1 = isAscendingOrder( Cons(1, Cons(3, Cons(3, Cons(5, Cons(10, Nil ))))))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb2\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mfalse\u001b[39m" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b2 = isAscendingOrder( Cons(5, Cons(3, Cons(3, Cons(5, Cons(10, Nil ))))))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb3\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mfalse\u001b[39m" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b3 = isAscendingOrder( Cons(0, Cons(3, Cons(3, Cons(5, Cons(4, Nil ))))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Things can be made even more funky with pattern matching. But this is where you should read a tutorial on pattern matching to acquaint yourself with all the funky features: https://docs.scala-lang.org/tour/pattern-matching.html" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36misAscendingOrderAlt\u001b[39m" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def isAscendingOrderAlt (l : NumList) : Boolean = l match {\n", " case Nil => true\n", " \n", " case Cons(_, Nil) => true\n", " \n", " case Cons(j1, Cons(j2, _)) if (j1 > j2) => false // This case matches only if j1 > j2\n", " \n", " case Cons(_, tl) => isAscendingOrderAlt(tl) // We can reach this case only if j1 <= j2, \n", " // so there is no need to expand tl out\n", " \n", " // A catch all case is not needed: can you argue why?\n", "}" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb4\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mtrue\u001b[39m" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b4 = isAscendingOrderAlt(Cons(1, Cons(3, Cons(3, Cons(5, Cons(10, Nil ))))))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb5\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mfalse\u001b[39m" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b5 = isAscendingOrderAlt( Cons(0, Cons(3, Cons(3, Cons(5, Cons(4, Nil ))))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pattern matching really helps us in this example: *isAscendingOrder* is not easy to implement as a visitor. It will require something messy such as storing the previous element of the array in a global variable, extra function argument or doing something akin to pattern matching to try and extract the next array element. Can you try some of these options?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One nice operation on list is to reverse the list. " ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mappendToEndHelper\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseBad\u001b[39m" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// First consider a helper function that will take the list l and append an element e to its end\n", "def appendToEndHelper(l: NumList, e: Int): NumList = l match { \n", " case Nil => Cons(e, Nil)\n", " case Cons(j, tl) => Cons(j, appendToEndHelper(tl, e))\n", "}\n", "\n", "// Use the helper function appendToEndHelper to now define reverse\n", "def reverseBad(l: NumList): NumList = l match {\n", " case Nil => Nil // Reverse of an empty list is empty\n", " case Cons(n1, t1) => {\n", " val r1 = reverseBad(t1) // reverse the tail\n", " appendToEndHelper(r1, n1) // append n1 to the end.\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36ml1\u001b[39m: \u001b[32mCons\u001b[39m = \u001b[33mCons\u001b[39m(\n", " hd = \u001b[32m1\u001b[39m,\n", " tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m3\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m5\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m7\u001b[39m, tl = Nil)))\n", ")\n", "\u001b[36ml2\u001b[39m: \u001b[32mNumList\u001b[39m = \u001b[33mCons\u001b[39m(\n", " hd = \u001b[32m7\u001b[39m,\n", " tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m5\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m3\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m1\u001b[39m, tl = Nil)))\n", ")" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val l1 = Cons(1, Cons(3, Cons(5, Cons(7, Nil))))\n", "val l2 = reverseBad(l1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the key question you should be asking. Why is the reverse method we wrote previously \"Bad\"? The answer has to do with its complexity. It is often hard to understand what the complexity is when you write recursive functions. But imagine the same logic carried out on a linked list with a for loop. Can you now guess what the complexity would be on a list of length $n$?\n", "\n", "If you guessed $\\Theta(n^2)$, you are absolutely right. But this is a pity since we would like to reverse a list in time $\\Theta(n)$. How can we achieve that?\n", "\n", "Let us write a helper function with two arguments: the first argument is the tail of the list that we are yet to reverse, and the other argument is the reverse of the prefix of the list. Initially, the tail to be reversed is the entire list and the prefix is just Nil or the empty list. But as we \"peel off\" each element of the list, it joins the front of the reverse list so far. " ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseHelper\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36mreverseGood\u001b[39m" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def reverseHelper(l: NumList, resultSoFar: NumList): NumList = l match {\n", " case Nil => resultSoFar\n", " case Cons(n, tail) => {\n", " val v1 = Cons(n, resultSoFar)\n", " return reverseHelper(tail, v1 )\n", " }\n", "}\n", "// This will now run in \\Theta(n) rather than \\Theta(n^2) time\n", "def reverseGood(l: NumList): NumList = reverseHelper(l, Nil)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36ml3\u001b[39m: \u001b[32mNumList\u001b[39m = \u001b[33mCons\u001b[39m(\n", " hd = \u001b[32m7\u001b[39m,\n", " tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m5\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m3\u001b[39m, tl = \u001b[33mCons\u001b[39m(hd = \u001b[32m1\u001b[39m, tl = Nil)))\n", ")" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val l3 = reverseGood(l1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations on Trees with Numbers\n", "\n", "Let us now get to the grammar of trees with numbers. Recall the grammar\n", "\n", "$$\\begin{array}{rclclcl}\n", "\\textbf{NumTree} & \\rightarrow & Leaf & \\ |\\ & Node(\\textbf{Num}, \\textbf{NumTree}, \\textbf{NumTree}) \\\\\n", "\\textbf{Num} & \\rightarrow & 0 \\ |\\ 1\\ |\\ 2\\ | \\ 3 \\ |\\ \\cdots \\\\\n", "\\end{array}$$\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mtrait\u001b[39m \u001b[36mNumTree\u001b[39m\n", "defined \u001b[32mobject\u001b[39m \u001b[36mLeaf\u001b[39m\n", "defined \u001b[32mclass\u001b[39m \u001b[36mNode\u001b[39m" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sealed trait NumTree\n", "case object Leaf extends NumTree\n", "case class Node(n: Int, left: NumTree, right: NumTree) extends NumTree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just like we defined length for a list, you can define functions like *depth* of a tree, *number* of leaves in a tree, *number* of nodes and so on.\n", "\n", "But let us do something more interesting here. We would like to check if a tree has the binary search tree property. \n", "\n", "Recall this property that says that for every node all the elements of its left subtree (if not a leaf) must be smaller than the node and all right subtree elements must be larger than the node." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36misBST_WrongCode\u001b[39m" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// A simple attempt is to check at each node if its left child is smaller and right child is larger.\n", "// This is wrong: you should be able to come up with an example why\n", "def isBST_WrongCode(t: NumTree): Boolean = t match {\n", " case Leaf => true // Nothing to say for a leaf\n", " case Node(n, Leaf, Leaf) => true // Nothing to check since both children are leaves\n", " case Node(n1, Node(n2, _, _ ), _ ) if (n2 > n1) => false // Violates BST property \n", " case Node(n1, _, Node(n3, _, _) ) if (n3 < n1) => false // Violates BST property\n", " case Node(n1, leftChild, rightChild) => isBST_WrongCode(leftChild) && isBST_WrongCode(rightChild) // The previous two cases did not match\n", " // Therefore, we conclude that n2 <= n1 and n3 >= n1, so all that remains is to \n", " // check BST property of left and right children.\n", "}" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mfindMaxElementHelper\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36mfindMinElementHelper\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36misBST\u001b[39m" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def findMaxElementHelper(t: NumTree): Int = t match {\n", " case Leaf => Int.MinValue // Make it a smallest number \n", " case Node(n, t1, t2) => { val v1 = findMaxElementHelper(t1) \n", " val v2 = findMaxElementHelper(t2) \n", " return List(n, v1, v2).max } \n", "}\n", "\n", "def findMinElementHelper(t: NumTree): Int = t match {\n", " case Leaf => Int.MaxValue \n", " case Node(n, t1, t2) => {\n", " val v1 = findMinElementHelper(t1)\n", " val v2 = findMinElementHelper(t2)\n", " return List(n, v1, v2).min\n", " }\n", " \n", "}\n", "\n", "def isBST(t: NumTree): Boolean = t match {\n", " case Leaf => true\n", " case Node(n1, lChild, rChild) => {\n", " val maxLeft = findMaxElementHelper(lChild)\n", " val minRight = findMinElementHelper(rChild)\n", " isBST(lChild) && isBST(rChild) && \n", " (n1 >= maxLeft) && (n1 <= minRight)\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mtree1\u001b[39m: \u001b[32mNode\u001b[39m = \u001b[33mNode\u001b[39m(\n", " n = \u001b[32m1\u001b[39m,\n", " left = \u001b[33mNode\u001b[39m(\n", " n = \u001b[32m2\u001b[39m,\n", " left = \u001b[33mNode\u001b[39m(n = \u001b[32m3\u001b[39m, left = Leaf, right = Leaf),\n", " right = \u001b[33mNode\u001b[39m(n = \u001b[32m2\u001b[39m, left = Leaf, right = Leaf)\n", " ),\n", " right = \u001b[33mNode\u001b[39m(n = \u001b[32m4\u001b[39m, left = Leaf, right = Leaf)\n", ")\n", "\u001b[36mtree2\u001b[39m: \u001b[32mNode\u001b[39m = \u001b[33mNode\u001b[39m(\n", " n = \u001b[32m6\u001b[39m,\n", " left = \u001b[33mNode\u001b[39m(\n", " n = \u001b[32m2\u001b[39m,\n", " left = \u001b[33mNode\u001b[39m(n = \u001b[32m1\u001b[39m, left = Leaf, right = Leaf),\n", " right = Leaf\n", " ),\n", " right = \u001b[33mNode\u001b[39m(\n", " n = \u001b[32m10\u001b[39m,\n", " left = \u001b[33mNode\u001b[39m(n = \u001b[32m7\u001b[39m, left = Leaf, right = Leaf),\n", " right = Leaf\n", " )\n", ")" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val tree1 = Node(1, Node(2, Node(3, Leaf, Leaf), Node(2, Leaf, Leaf)), Node(4, Leaf, Leaf))\n", "val tree2 = Node(6, Node(2, Node(1, Leaf, Leaf), Leaf), Node(10, Node(7, Leaf, Leaf), Leaf))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb1\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mfalse\u001b[39m" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b1 = isBST(tree1)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb2\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mtrue\u001b[39m" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b2 = isBST(tree2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We still have a problem here. This time it is a problem of efficiency. The `isBST` method we wrote is quite\n", "inefficient since it performs two passes over the tree. In the \"outer pass\" it visits each node in the tree and\n", "then at each node, it performs yet another pass to find max and min elements of left and right subtrees.\n", "\n", "What is the complexity of this approach: look at it from the point of view of each node. It is visited once in the outer iteration and for the inner loop, it is visited once for each parent. Thus the number of visits for each node is 1 + depth of the node. Thus, total work done is the sum of the depths of all nodes in the tree. In the worst case this is $\\Theta(n^2)$ worst case being achieved for the completely unbalanced binary tree.\n", "\n", "Thus, we face the familiar problem we did when we tried to reverse a list: correct but inefficient code.\n", "\n", "Let us make it efficient: how? \n", "\n", "The idea is to keep a bit of extra information for each subtree as we descend down in the outer iteration. Suppose we are at the root, we have no limits on what the values of the individual nodes can be. Thus, we start at the root \n", "and allow nodes to be in the interval $(-\\infty, \\infty)$. Suppose the root had the number $r$, we know that \n", "everything to its left can only be in the interval $(-\\infty, r]$ and the right subtree in $[r, \\infty)$.\n", "If we keep going, at any node, we get an interval $[l, u]$ of allowable values. If the node value $v$ lies in that\n", "range, we know that the left subtree must be in the range $[l, v]$ and right subtree in $[v,r]$. This gives us the plan to move forward with a helper function." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36misBST_Eff_Helper\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36misBST_Efficient\u001b[39m" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def isBST_Eff_Helper(t: NumTree, l: Int, r: Int): Boolean = t match {\n", " case Leaf => true\n", " case Node(j, lChild, rChild) if (l <= j && j <= r) => { \n", " isBST_Eff_Helper(lChild, l, j) && isBST_Eff_Helper(rChild, j, r)\n", " } \n", " case _ => false\n", "}\n", "\n", "def isBST_Efficient(t: NumTree): Boolean = isBST_Eff_Helper(t, Int.MinValue, Int.MaxValue)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb3\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mfalse\u001b[39m" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b3 = isBST_Efficient(tree1)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[36mb4\u001b[39m: \u001b[32mBoolean\u001b[39m = \u001b[32mtrue\u001b[39m" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val b4 = isBST_Efficient(tree2)" ] } ], "metadata": { "kernelspec": { "display_name": "Scala", "language": "scala", "name": "scala" }, "language_info": { "codemirror_mode": "text/x-scala", "file_extension": ".sc", "mimetype": "text/x-scala", "name": "scala", "nbconvert_exporter": "script", "version": "2.13.14" } }, "nbformat": 4, "nbformat_minor": 4 }