# TD5 - Hash tables¶

## 1. Identifying polygons¶

For simplicity here, we replace polygons (so lists of points) by lists of integers. Two "polygons" are here identical if the sets of elements in their lists is identical.

In [1]:
class Polygon:
def __init__(self,list_of_nodes):
self.nodes=list_of_nodes
def compare(self,p):
return set(self.nodes)==set(p.nodes)


### 1.1. Simple algorithm¶

In [15]:
def create_classes(polygones):
classes = []
for p in polygones:
test = False
for classe in classes:
if classe[0].compare(p):
classe.append(p)
test=True
break
if not test:
classes.append([p])
return classes

In [16]:
polygones = [Polygon([1,2,3]),Polygon([2,5]),Polygon([2,6]),Polygon([5,2])]
create_classes(polygones)

Out[16]:
[[<__main__.Polygon at 0x7fc4d0397130>],
[<__main__.Polygon at 0x7fc4d0397040>, <__main__.Polygon at 0x7fc4d0397730>],
[<__main__.Polygon at 0x7fc4d0397c40>]]

The worst case cost is $O(n^2)$ (in number of comparisons).

### 1.2. Improved algorithm¶

The idea is here to create a hash table with key the number of nodes in each polygon and value the list of polygons with this number of nodes. Then create classes by filtering over the hash table.

In [17]:
def create_hash_table(polygones):
hash_table={}
for p in polygones:
key = len(p.nodes)
if key in hash_table:
hash_table[key].append(p)
else:
hash_table[key] = [p]
return hash_table

def create_classes_improved(hash_table_of_polygones):
classes=[]
for key in hash_table_of_polygones:
classes += create_classes(hash_table_of_polygones[key])

return classes

In [18]:
ht=create_hash_table(polygones)
create_classes_improved(ht)

Out[18]:
[[<__main__.Polygon at 0x7fc4d0397130>],
[<__main__.Polygon at 0x7fc4d0397040>, <__main__.Polygon at 0x7fc4d0397730>],
[<__main__.Polygon at 0x7fc4d0397c40>]]

## 2. Removing overlaps¶

### 2.1. Hashing segments¶

Note that overlapping segments must be aligned: a hash-table can thus be designed based on a couple (angle,intersect).

### 2.2. Compute $c(p)=|\{s\in S~|~s \text{ starts in }p\}|-|\{s\in S~|~s \text{ ends in }p\}|$¶

We can also use a hash-table collecting all extremum points (its keys) and the corresponding value of $c(p)$.

In [19]:
def compute_store_c(S):
c = {}
for b,e in S:
if not (b in c):
c[b] = 0
c[b]+=1

if not(e in c):
c[e] = 0
c[e]-=1
return c

In [20]:
S = [[0,3],[1,2],[2,8],[5,7],[1,3]]
compute_store_c(S)

Out[20]:
{0: 1, 3: -2, 1: 2, 2: 0, 8: -1, 5: 1, 7: -1}
In [8]:
def print_segments(S):
for b,e in S:
#print(" "*b,"="*(e-b),"\n")
print((b,e),":\t","{}x{}x\n".format(" "*b,"="*(e-b-1)))

In [9]:
print_segments(S)

(0, 3) :	 x==x

(1, 2) :	  xx

(2, 8) :	   x=====x

(5, 7) :	      x=x

(1, 3) :	  x=x



The strategy is the following:

• we first show that, sorting the $c(p)$'s in the order on the line of the points $p\in P$ made of the extremeties of the segments of $S$, $$C(p)\equiv \sum_{i\in P}^p c(i)$$ corresponds to the number of segments of the type $[p_1,p_2)$ on which $p$ stays. The variable $C(p)$ will be iterated over in the code, and will be named counter.

Prove it! (Hint: by recursion; be careful that if $s = [p_1,p_2]\in S$, then $p_2\notin [p_1,p_2)$)

• as a consequence, a segment in the final list of segments (without overlaps) must necessarily start at a $p$ with $C(p)=1$: this is when only one segment starts at $p$.
• when a point $p'$ immediately follows $p$, the segment must be cut if $C(p')\neq 1$: which is always the case unless the unique segment starting at $p$ stops at $p'$ and another one starts at $p'$.
In [21]:
def discard_overlaps(S):
Sout = []

# get the values of c(p) for all p extremeties of segments of S
c = compute_store_c(S)

# take the opportunity to collect the extremeties in 'points' and sort them
points = [key for key in c]
points.sort()

# identifier to know whether we have a current beginning for the next segment (so 'None' initially)
segment_beginning = None

# counter counts the number of segments of the form '[p1,p2)' on which the current point 'p' lies
counter = 0

for p in points:
counter += c[p]
if counter != 1:
# if counter != 1, break the segment, output it, and wait for new segment
if segment_beginning != None:
Sout.append([segment_beginning,p])
segment_beginning = None
else:
# initiate a new segment if counter==1
if segment_beginning==None:
segment_beginning = p
return Sout


The algorithm complexity is in the worst case $O(|S|\log(|S|))$ dominated by the first sorting operation. The core of the algorithm has linear complexity in $|S|$.

In [22]:
print("Original segments:",S)

Original segments: [[0, 3], [1, 2], [2, 8], [5, 7], [1, 3]]

Output segments: [[0, 1], [3, 5], [7, 8]]

In [23]:
print("Original segments:\n")
print_segments(S)

print("\nAfter removing overlaps:\n")

Original segments:

(0, 3) :	 x==x

(1, 2) :	  xx

(2, 8) :	   x=====x

(5, 7) :	      x=x

(1, 3) :	  x=x

After removing overlaps:

(0, 1) :	 xx

(3, 5) :	    x=x

(7, 8) :	        xx


In [ ]:


In [ ]: