# TD 1 - Cost evaluation in worst and mean case¶

In [203]:
import numpy as np
import random
import time


## 1. Evaluate the extreme entries of a table¶

### 1.1. Finding the largest entry¶

Let us start by creating the table. I'll take here a numpy array of random uniform integers in $\{0,\ldots,99\}$.

In [209]:
Tsize = 10
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")

Random table T:
[86  6 74 44 32 87  3 41 77 49]



#### Sequential version of find_max¶

Function: find_max
INPUT : list T
OUTPUT : maximum of T, sequential approach
In [210]:
def find_max(T):
max_element = T[0]
for t in T[1:]:
if t>max_element:
max_element=t
return max_element

print("Max of T:\n",find_max(T),"\n")

Max of T:
87


##### Computational cost¶

The function calls $n-1$ comparisons, so the computational cost is of order $O(n)$. It cannot possibly be compressed as T contains $n$ elements: they must all be seen.

##### Assignment cost¶

In terms of assignments of max_element, it varies from $1$ (when the max is in first position) to $n$ (when in last position).

On average though, if the elements of T are all different and pickly uniformly at random, the calculus goes as follows:

• T[0] is always assigned (so $+1$ in the cost)
• T[1] is assigned as maximum if T[1]>T[0] which occurs with probability $1/2$ (so $+1\times 1/2$ on average cost)
• T[2] is assigned as maximum if T[2]>max(T[0],T[1]) which occurs with probability $1/3$ (so $+1\times 1/3$ on average cost)
• and so on ... until T[n-1] assigned if larger than the $n-1$ previous terms, so with probability $1/n$ (so $+1\times 1/n$ on average cost) In total, the average cost is thus: $$1 + \frac12 + \frac13 + \ldots + \frac1n = O(\log n).$$

#### Recursive version of find_max¶

Function: find_max_rec
INPUT : list T
OUTPUT : maximum of T, recursive approach
In [211]:
def find_max_rec(T):
if len(T)==1:
return T[0]
return max(T[0],find_max_rec(T[1:]))

find_max_rec=trace(find_max_rec)  # --> convenient to "trace" the recursive call

print("\nMax of T:\n",find_max_rec(T),"\n")

|-- find_max_rec([86  6 74 44 32 87  3 41 77 49])
|  |-- find_max_rec([ 6 74 44 32 87  3 41 77 49])
|  |  |-- find_max_rec([74 44 32 87  3 41 77 49])
|  |  |  |-- find_max_rec([44 32 87  3 41 77 49])
|  |  |  |  |-- find_max_rec([32 87  3 41 77 49])
|  |  |  |  |  |-- find_max_rec([87  3 41 77 49])
|  |  |  |  |  |  |-- find_max_rec([ 3 41 77 49])
|  |  |  |  |  |  |  |-- find_max_rec([41 77 49])
|  |  |  |  |  |  |  |  |-- find_max_rec([77 49])
|  |  |  |  |  |  |  |  |  |-- find_max_rec([49])
|  |  |  |  |  |  |  |  |  |  |-- return 49
|  |  |  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |-- return 87
|  |  |  |  |  |-- return 87
|  |  |  |  |-- return 87
|  |  |  |-- return 87
|  |  |-- return 87
|  |-- return 87

Max of T:
87



### 1.2. Finding both minimum and maximum¶

#### Naiv approach: sequential comparisons¶

Function: find_minmax_naive
INPUT : list T
OUTPUT : minimum and maximum of T, naiv version
In [213]:
Tsize = int(1e6)
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")

def find_minmax_naive(T):
min_element,max_element=T[0],T[0]
for t in T[1:]:
if t<min_element:
min_element=t
elif t>max_element:
max_element=t
return min_element,max_element

now=time.time()
print("Naive min of T:",find_minmax_naive(T)[0],"\nNaive max of T:",find_minmax_naif(T)[1],"\n")
print("Duration:",time.time()-now,"sec\n")

Random table T:
[ 3 70 92 ...  0 20 28]

Naive min of T: 0
Naive max of T: 99

Duration: 0.5123188495635986 sec



In the worst case, the implementation costs $2n$ conditions when the table entries are increasing and $n$ if decreasing.

On average, the cost at step $i$ of the loop is again $1\times 1/i$ if the first condition is true or $2\times (1-1/i)$ if the first condition fails. So on average: $$2 + \frac12 + \ldots + \frac1{n-1} + 2\left(1-\frac12\right) + \ldots + 2\left(1-\frac1{n-1}\right) = 2n - \sum_{i=2}^{n-1} \frac1i \sim 2n - \log n$$

#### Improved approach: adding a condition¶

The previous cost is dominated by the fact that $2$ conditions are almost systematically performed (inducing the $2n$ dominating term in number of conditions). The idea to improve the speed is to first filter through a less likely external condition as follows.

Function: find_minmax_naive
INPUT : list T
OUTPUT : minimum and maximum of T, improved version
In [214]:
def find_minmax(T):
min_element,max_element=T[0],T[0]
for t in T[1:]:
if t<min_element or t>max_element:
if t>max_element:
max_element=t
else:
min_element=t
return min_element,max_element

now=time.time()
print("Smart min of T:",find_minmax(T)[0],", smart max of T:",find_minmax(T)[1])
print("Duration:",time.time()-now,"sec")

Smart min of T: 0 , smart max of T: 99
Duration: 0.5963411331176758 sec


If we suppose that the outer condition has a cost of $1$, the cost per iterate is either $1$ when min_element or $2$ with the converse. The latter has probability $2/i$ at step $i$, so finally the cost: $$\sum_i 2\times \frac2i + 1 \times \left(1-\frac2i\right) = n + \sum_i \frac2i \sim n+2\log n$$

# 2. Iterative search¶

We now look for a given element in a table.

In [215]:
Tsize = 50
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")

Random table T:
[98  9 15 91 86 90 69 14 92  6 43 52 91 37  2  2 83 54 48 92 13 16 13  2
58 87  4 31 72 72  9 48 44 41 44 29 20 91 69 96 14  9 17 83 45 96 91 48
50 19]


Function: find_element
INPUT : element e,list T
OUTPUT : True if element e found in T, False otherwise
In [216]:
def find_element(e,T):
for t in T:
if e == t:
return True
return False

print("Element found?",find_element(3,T))

Element found? False


In the best case, this takes $1$ iteration to succeed. In the worst cas, $n$.

When e is for sure in T, its probability to be in position $i$ is $1/n$, so that on average it takes $$\sum_{i=1}^n \frac{i}n = \frac{n+1}2 \sim \frac{n}2$$ iterations to converge. Is instead e only has probability $\alpha$ to be found, then the total cost is on average: $$(1-\alpha) n + \alpha\sum_{i=1}^n \frac{i}n \sim (1-\alpha) n + \alpha \frac{n}2 = \left(1-\frac{\alpha}2\right)n$$

Function: find_element_rec
INPUT : element e,list T
OUTPUT : True if element e found in T, False otherwise; recursive version
In [198]:
def find_element_rec(e,T):
if T==[]:
return False
else:
return (e==T[0]) or find_element_rec(e,T[1:])

find_element_rec=trace(find_element_rec)

print("Element found?",find_element(3,T))

Element found? True


# 3. Dichotomic search¶

We now proceed to a search in a sorted array using a dichotomic search.

In [217]:
Tsize = 100
T=np.sort(np.random.randint(0,100,size=Tsize))
print("Random table T:",T,"\n")

Random table T: [ 0  2  3  3  4  5  5  6  6  7  7  8 10 11 13 13 14 15 16 16 17 17 19 19
20 22 22 23 24 24 25 27 31 31 32 33 34 35 35 36 37 38 39 40 42 42 45 47
48 48 49 50 50 52 53 53 54 54 55 55 55 56 58 58 58 62 63 64 67 69 69 71
72 72 74 74 74 75 75 76 77 77 78 80 81 84 84 85 86 87 87 90 91 91 93 94
97 99 99 99]


Function: find_element_dichotomy
INPUT : element e, sorted list T
OUTPUT : True if element e found in T, False otherwise
In [201]:
def find_element_dichotomy(e,T):
index_left = 0
index_right = len(T)-1
while index_left<=index_right:
index_center = (index_left+index_right)//2
if T[index_center] == e:
return True
if T[index_center] > e:
index_right=index_center-1
else:
index_left=index_center+1
return False

print("Element found?",find_element_dichotomy(3,T))

Element found? False


In the best case, e is at the center of T and $1$ comparison suffices to conclude. In the worst case, e is not in T. If the cost of searching in T of length $n$ is $C(n)$, by dichotomy, $$C(n) = 1+C(n/2)$$ so that $C(n)=2+C(n/4)$ and more generally $C(n)= k + C(n/2^k)$. Taking $n=2^k$, we get $$C(n) = k + 1 \sim \log_2(n).$$

If we know that e is in T of length $n=2^k-1$ (so that $k\sim \log n$), on average, we get the complexity: $$1\times \frac1n + 2\times \frac2n + 3\times \frac4n + \ldots + k \times \frac{2^{k-1}}n = \sum_{a=1}^k \frac{a 2^{a-1}}n \sim \frac{k 2^k}n \sim \log n$$ This follows from the fact that "falling right on e after $a$ divisions can occur at $2^a$ positions of T (the $2^a$ centers of dichotomy after $a$ divisions).

Function: find_element_dichotomy_rec
INPUT : element e, sorted list T
OUTPUT : True if element e found in T, False otherwise; recursive version
In [202]:
def find_element_dichotomy_rec(e,T):
if len(T)==0:
return False
index_center = len(T)//2
value_center = T[index_center]
return value_center == e \
or ( value_center > e and find_element_dichotomy_rec(e,T[:index_center]) ) \
or find_element_dichotomy_rec(e,T[index_center+1:])

print("Element found?",find_element_dichotomy_rec(3,T))

Element found? False


## EXTRAS¶

The function trace traces the calls of a function (here convenient for us for recursive functions).

In [173]:
from functools import wraps

def trace(func):
func_name = func.__name__
separator = '|  '

trace.recursion_depth = 0

@wraps(func)
def traced_func(*args, **kwargs):
print(f'{separator * trace.recursion_depth}|-- {func_name}({", ".join(map(str, args))})')
trace.recursion_depth += 1
result = func(*args, **kwargs)
trace.recursion_depth -= 1
print(f'{separator * (trace.recursion_depth + 1)}|-- return {result}')

return result

return traced_func