Sorting is the process of arranging the elements of a collection (e.g. a list) in a particular order (e.g. putting words in alphabetical order).
A simple, systematic way to sort the elements of a collection is, first, to compare two values from the collection to see which is the smaller (or greater).
Second, if the two values are not in the correct order according to that comparison, we exchange them.
This exchange is expensive; therefore, the total number of exchanges needed is a good indicator of the efficiency of a sorting algorithm.
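A minimal sketch of this compare-and-exchange step (the values used here are illustrative only):

a, b = 7, 3
if a > b:          # compare the two values
    a, b = b, a    # exchange (swap) them if they are out of order
print(a, b)        # prints: 3 7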
Example:
A simple sorting algorithm where list elements “bubble” to the positions in the list where they belong (i.e. each pass moves the largest remaining element to its final position)
Given a list \(C[]\) with \(n\) elements
Sort the following list \(C = [7,6,1]\)
Each pass requires at most 2 comparisons for a list of 3 values
First iteration
Second iteration
Third iteration
Values are in order
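A rough trace of these passes (following the procedure described above):
Pass 1: [7,6,1] → [6,7,1] → [6,1,7] (two comparisons, two swaps)
Pass 2: [6,1,7] → [1,6,7] (one comparison, one swap)
Pass 3: nothing left to compare; the values are in order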
Sort the following list \(C = [5,3,1,9,8,2,4,7]\) using bubble sort
def bubble_sort(lst):
    n = len(lst)
    # Traverse all list elements
    for i in range(n):
        print("round: " + str(i) + " with " + str(lst))
        # Traverse the list from 0 to n-i-1 (i.e. the number of comparisons left in this pass)
        for j in range(0, n - i - 1):
            print(str(lst[j]) + " > " + str(lst[j + 1]) + "?")
            if lst[j] > lst[j + 1]:
                # Swap adjacent elements that are out of order
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                print("swap")
            else:
                print("no swap")
    return lst
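A minimal usage sketch for the list above (the trace output printed by the function is omitted here):

print(bubble_sort([5, 3, 1, 9, 8, 2, 4, 7]))
# prints: [1, 2, 3, 4, 5, 7, 8, 9]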
For each iteration of loop variable \(i\), \(n-i-1\) comparisons are performed. Thus, bubble sort conducts \(O(n)\) operations for each of \(O(n)\) iterations, which leads to a time complexity of \(O(n^2)\)
What about the best case scenario?
Even if the list is already sorted, this implementation still performs every comparison: in each pass, \(n - i - 1\) comparisons are made irrespective of the input
For a list with \(n\) elements it therefore performs
\(1 + 2 + 3 + 4 + \cdots + (n - 2) + (n - 1) = \frac{n(n-1)}{2} = O(n^2)\) comparisons
Last, the space complexity is \(O(1)\) (i.e. a constant amount of memory)
key idea: Search and swap algorithm
Sort the following list \(C = [-1,5,3,9,12,4,8]\)
def selectionSort(lst):
    for i in range(len(lst) - 1):
        print("index " + str(i) + " in the list")
        # Find the index of the smallest element in the unsorted part
        min_idx = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[min_idx]:
                min_idx = j
        print("min value is " + str(lst[min_idx]) + " at index " + str(min_idx))
        # Swap the minimum into its final position
        lst[i], lst[min_idx] = lst[min_idx], lst[i]
    return lst
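A minimal usage sketch for the list above (trace output omitted):

print(selectionSort([-1, 5, 3, 9, 12, 4, 8]))
# prints: [-1, 3, 4, 5, 8, 9, 12]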
The selection sort makes the same number of comparisons as the bubble sort and is therefore also \(O(n^2)\). This is due to the linear search for finding the minimum element, in which we have to pass over \(n - i - 1\) values
However, due to the reduction in the number of exchanges, the selection sort typically executes faster than the bubble sort.
What is the best case?
A simple sorting algorithm that works in a similar fashion to sorting playing cards in our hands
\(A = [−1,4,7,2,3,8,−5]\)
def insertionSort(lst):
    # Traverse through 1 to n
    for i in range(1, len(lst)):
        key = lst[i]
        # Move elements of lst[0..i-1], that are
        # greater than key, to one position ahead
        # of their current position
        j = i - 1
        while j >= 0 and key < lst[j]:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = key
    return lst
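A minimal usage sketch for the list above:

print(insertionSort([-1, 4, 7, 2, 3, 8, -5]))
# prints: [-5, -1, 2, 3, 4, 7, 8]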
Best case scenario: The list is already sorted in ascending order; the \(i^{th}\) element is compared only with its predecessor at index \(j = i - 1\). This amounts to \(n - 1\) comparisons and thus \(O(n)\)
\(\sum_{i=1}^{N-1} 1 = N - 1 = O(N)\)
Worst case scenario: The list is sorted in reverse (descending) order; the \(i^{th}\) element needs to be shifted all the way to the first position
\(\sum_{i=1}^{N-1} i = 1 + 2 + 3 + \cdots + (N-1) = \frac{(N-1)N}{2} = O(N^2)\)
Average case scenario: On average, each element needs to be shifted about halfway back, i.e. \(i/2\) positions
\(\sum_{i=1}^{N-1} \frac{i}{2} = \frac{1}{2}\left(1 + 2 + 3 + \cdots + (N-1)\right) = \frac{(N-1)N}{4} = O(N^2)\)
So far we have mainly worked with recursion; how would we do this iteratively? Hint: think of a previously discussed data structure
def qs_recursive(lst, left, right):
    """
    Quick sort (recursive)
    """
    if right <= left:
        return lst
    # Get pivot
    piv = partition(lst, left, right)
    # Sort left side of pivot
    qs_recursive(lst, left, piv - 1)
    # Sort right side of pivot
    qs_recursive(lst, piv + 1, right)
    return lst
What are left
and right
for \([1,5,8,9,10]\)?
left and right are indices into the list, so the initial call is qs_recursive(lst, 0, len(lst) - 1): left is \(0\) and right is \(4\)
def qs_iterative(lst, left, right):
    """
    Quick sort (iterative)
    """
    stack = []
    stack.append((left, right))
    # Main loop: pop and push ranges until the stack is empty
    while stack:
        left, right = stack.pop()
        pivot = partition(lst, left, right)
        # If there are items to the left of the pivot, push that range onto the stack
        if pivot - 1 > left:
            stack.append((left, pivot - 1))
        # If there are items to the right of the pivot, push that range onto the stack
        if pivot + 1 < right:
            stack.append((pivot + 1, right))
    return lst
def partition(lst, left, right):
    i = left - 1        # index of the last element known to be <= pivot
    pivot = lst[right]  # pivot value
    for j in range(left, right):
        print(lst)
        print(i, j, pivot)
        # If the current element is smaller than or equal to the pivot
        print("is " + str(lst[j]) + " smaller or equal to " + str(pivot) + "?")
        if lst[j] <= pivot:
            # increment index of smaller element
            i = i + 1
            print("swap lst[i]=" + str(lst[i]) + " with lst[j]=" + str(lst[j]))
            lst[i], lst[j] = lst[j], lst[i]
    # Place the pivot in its final position
    lst[i + 1], lst[right] = lst[right], lst[i + 1]
    return i + 1
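A minimal usage sketch, assuming the qs_recursive, qs_iterative and partition functions above are all in scope (trace output omitted; list(data) copies the input so both calls start from the same data):

data = [5, 3, 1, 9, 8, 2, 4, 7]
print(qs_recursive(list(data), 0, len(data) - 1))
# prints: [1, 2, 3, 4, 5, 7, 8, 9]
print(qs_iterative(list(data), 0, len(data) - 1))
# prints: [1, 2, 3, 4, 5, 7, 8, 9]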
Best case scenario: The pivot splits the list into two (nearly) equal halves at every step, giving \(\log n\) levels of partitioning with \(O(n)\) work per level, that is, \(O(n \log n)\)
Worst case scenario: This occurs when the pivot is always the smallest or largest element. One of the partitions is then empty and we repeat the process on \(N-1\) elements. Think of this as an unbalanced BST; the complexity is \(O(n^2)\)
Average case scenario: On average, each partitioning divides the list into two roughly equal partitions, and creating a partition takes \(O(n)\). Thus there are on average \(\log n\) levels of partitioning, each costing \(O(n)\), for a total of \(O(n \log n)\)
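One way to make this argument concrete is the recurrence for a balanced split (a sketch, assuming the pivot always divides the list evenly):
\(T(n) = 2\,T(n/2) + O(n) \Rightarrow T(n) = O(n \log n)\)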
def mergesort(lst):
if len(lst) > 1:
mid = len(lst) // 2
left = mergesort(lst[:mid])
right = mergesort(lst[mid:])
return merge(left, right)
return lst
def merge(left, right):
result = []
while len(left) > 0 and len(right) > 0:
if left[0] <= right[0]:
result.append(left.pop(0))
else:
result.append(right.pop(0))
if len(left) > 0:
result += left
else:
result += right
return result
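A minimal usage sketch, assuming the mergesort and merge functions above are defined together:

print(mergesort([5, 3, 1, 9, 8, 2, 4, 7]))
# prints: [1, 2, 3, 4, 5, 7, 8, 9]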
In all cases it is \(O(n \log n)\); can someone explain why?
While merge sort has a better worst-case complexity than quicksort, why would someone prefer quicksort? Hint: space complexity
A search returns true if the element is found in a collection, and false when it does not reside in that collection.
In Python, we can use the in operator to check whether an element is a member of a particular collection of items.
>>> 15 in [3,5,2,4,1]
False
>>> 3 in [3,5,2,4,1]
True
>>>
When elements are stored in a collection, such as a list, in positions relative to one another, we say that these elements have a linear or sequential relationship.
Given that list values are indexed, we can visit the items in order and process them in sequence.
But, how does this operate?
In summary, we start from the first item of the list and, then, we move from item to item, following the sequential order, until we find the item we are looking for.
If we run out of items, it means that the item we are looking for does not belong to the list.
def sequentialSearch(alist, item):
pos = 0
found = False
while pos < len(alist) and not found:
if alist[pos] == item:
found = True
else:
pos = pos+1
return found
#!/usr/bin/env python
def main():
testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
print(sequentialSearch(testlist, 3))
print(sequentialSearch(testlist, 13))
def sequentialSearch(alist, item):
pos = 0
found = False
while pos < len(alist) and not found:
if alist[pos] == item:
found = True
else:
pos = pos+1
return found
if __name__ == "__main__":
main()
What will be the result?
False
True
To find the complexity of a sequential search algorithm, we need to count the number of the comparisons we have to perform until we find the element we are looking for in a collection.
If we assume that the elements of a list are in random order, the probability of finding an element at a particular position is the same for every position in the list.
If, instead, the elements are in ascending order, we still need to make a number of comparisons to find a particular element, but the search can stop early, and will therefore be faster, when the element is not in the list.
If the element we are looking for does not belong to the randomly ordered list, \(A = \{2, 6, 1, \ldots, n\}\), we will conduct \(n\) comparisons for a sequential search.
Conversely, if the element does belong to the list, we will conduct either \(1\) comparison (best case scenario) or \(n\) comparisons (worst case scenario) until we find the right element.
Why does this happen?
Assume that \(L\) is a list with \(n\) elements, that is, \(|L| = n\).
If the elements in \(L\) are completely randomly assigned, then the element we are looking for can be found in any position of the \(L\) with equal probability, \(1/n\).
If the element is the first item of the list, the loop of the sequential search (see the implementation in previous slides) will be executed only once (best scenario, \(O(1)\)). Whereas if the element is the second item, the loop will be executed twice, and so on.
If the element is the last item or it does not exist in the list at all, we need to execute the loop of the sequential search \(n\) times (worst scenario, \(O(n)\)).
The performance of the algorithm is:
\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \cdots + \frac{1}{n}\cdot n = \frac{1 + 2 + \cdots + n}{n} = \frac{n + 1}{2}\)
where \(1 + 2 + \cdots + n = \frac{n(n + 1)}{2}\).
Therefore, a successful sequential search performs, on average, \(\frac{n + 1}{2}\) comparisons, which is \(O(n)\)
If an element does not belong to an ordered list, \(L = \{1, 2, 3, \ldots, n\}\), a sequential search will conduct either \(1\) comparison (best case scenario) or \(n\) comparisons (worst case scenario).
If an element does belong to the list, we will likewise conduct either \(1\) comparison (best case scenario) or \(n\) comparisons (worst case scenario) to find the right element.
Therefore, the complexity of the sequential search algorithm is again \(O(n)\).
We can split the list in the middle and apply the search process to the half that could contain the element. For instance, consider finding the element \(54\) using the so-called divide-and-conquer strategy.
Note that the array should be sorted first!
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
def main():
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binarySearch(testlist, 3))
print(binarySearch(testlist, 13))
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
if __name__ == "__main__":
main()
What will be the result?
False
True
To estimate the complexity of a binary search algorithm, we should take into account that each comparison eliminates about half of the remaining elements.
If we have an ordered list of \(n\) elements, around \(n/2\) elements will be left after the first comparison. After the second comparison, around \(n/4\) will be left, and so on, until only \(1\) item remains.
After \(i\) comparisons, about \(n/2^i\) elements are left, so the search ends when \(n/2^i = 1\). Solving this for \(i\) gives \(i = \log n\) (the maximum number of comparisons).
Then, the worst case performance of the algorithm is \(O(\log n)\), the best case performance is \(O(1)\), and the average performance is \(O(\log n)\).
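Following this reasoning, a small worked example: for a sorted list of \(n = 8\) elements, about \(4\) elements remain after the first comparison, \(2\) after the second, and \(1\) after the third, so at most \(\log_2 8 = 3\) comparisons are needed.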
This work is (c) 2017 - onwards by TU Delft and Maria Kechagia, Joseph Hejderup and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.