Thursday 17 July 2014

Data Structure and Algorithm - Sort and Binary Search

Other day I was talking to my friends about the the most often interview questions we had encountered as interviewer or  interviewee. We seem to agree 3 items are definitely on the top of the list
    - Big O
    - Sorting and search algorithms
    - Linked list
Questions are often mixed with each other and some coding exercise are often related to these concepts and implementation.. Sometime a lot of information disguises the true algorithm that the interviewer wants the interviewee to implement.Quite often underneath it requires to use one of sorting and/or search algorithms to solve it  given a required big O notation for complexity and space.

I can't remember exactly last time I was implementing such an algorithm. Probably it was a few years when I was working on low level of C code primary in embedded applications. Nowadays I am working on C++ mostly  and life is easier by simply calling the sort/search function provided by the library. Still these algorithms are serving as the basic knowledge of Computer Science and I still enjoy a lot reviewing them and implement them as homework.

1. Heap Sort
The key to understand heap sort is to visualize the array into a tree structure. Each iteration is to pop up the largest the value in the rest of the tree to the root.
Given an array size of n: (index starts from 0)
    - Index 0 is the root, and its children are 1, 2
    - The children of index 1: 3, 4
    - The children of index 2: 5, 6
    - The children of index 3: 7, 8
    - The children of index i: 2i+1, 2i+2
The binary tree is filled always the left-side first.
As long as understanding the visualization of the binary tree structure, then the rest of algorithm to pop the largest value into the root position will be easy to understand. More details please refer to heap sort on wikipedia. Here is my homework implementation.

//********************************************************************************
// HeapSort.h
#ifndef HEAP_SORT_H
#define HEAP_SORT_H

#pragma once

#include <vector>

class HeapSort
{
public:
HeapSort();
~HeapSort();

void Sort(std::vector<int>& data);
void Sort(int data[], int size);
void Sort_Recursive(int data[], int size);

private:
void Heapify(int data[], int start, int end);
void Heapify_Recursive(int data[], int start, int end);
void ShiftRight(int data[], int start, int end);
void ShiftRight_Recursive(int data[], int start, int end);
};

#endif


// HeapSort.cpp
#include "stdafx.h"
#include "HeapSort.h"


HeapSort::HeapSort()
{
}


HeapSort::~HeapSort()
{
}


void HeapSort::Sort(std::vector<int>& data)
{
if (!data.empty()) {
Sort(&data[0], data.size());
}
}

void HeapSort::Sort(int data[], int size)
{
if (size < 1) {
return;
}

Heapify(data, 0, size);

int tempSize = size - 1;
int tempVal;
while (tempSize > 0) {
// the biggest value alwasy stays at index of "0"
tempVal = data[tempSize];
data[tempSize] = data[0];
data[0] = tempVal;

ShiftRight(data, 0, tempSize);
--tempSize;
}
}

void HeapSort::Sort_Recursive(int data[], int size)
{
if (size < 1) {
return;
}
Heapify_Recursive(data, 0, size);

int tempSize = size - 1;
int tempVal;
while (tempSize > 0) {
// the biggest value alwasy stays at index of "0"
tempVal = data[tempSize];
data[tempSize] = data[0];
data[0] = tempVal;

ShiftRight_Recursive(data, 0, tempSize);
--tempSize;
}
}

void HeapSort::Heapify(int data[], int start, int end)
{
int midpoint = (start + end) >> 1;
while (midpoint >= 0) {
ShiftRight(data, midpoint, end);
--midpoint;
}
}

void HeapSort::Heapify_Recursive(int data[], int start, int end)
{
int midpoint = (start + end) >> 1;
while (midpoint >= 0) {
ShiftRight_Recursive(data, midpoint, end);
--midpoint;
}
}

void HeapSort::ShiftRight(int data[], int start, int end)
{
int root = start;
int leftChild = 2 * root + 1;
int rightChild;
int max;
int tempRoot;
while (leftChild < end) {
max = data[root];
tempRoot = root;
if (data[leftChild] > data[root]) {
max = data[leftChild];
tempRoot = leftChild;
}
rightChild = leftChild + 1;
if (rightChild < end) {
if (data[rightChild] > max) {
max = data[rightChild];
tempRoot = rightChild;
}
}

if (tempRoot != root) {
data[tempRoot] = data[root];
data[root] = max;
root = tempRoot;
leftChild = 2 * root + 1;
}
else{
break;
}
}
}

void HeapSort::ShiftRight_Recursive(int data[], int start, int end)
{
int root = start;
int leftChild = 2 * root + 1;
int rightChild;
int max;
int tempRoot;
if (leftChild < end) {
max = data[root];
tempRoot = root;
if (data[leftChild] > data[root]) {
max = data[leftChild];
tempRoot = leftChild;
}
rightChild = leftChild + 1;
if (rightChild < end) {
if (data[rightChild] > max) {
max = data[rightChild];
tempRoot = rightChild;
}
}

if (tempRoot != root) {
data[tempRoot] = data[root];
data[root] = max;
root = tempRoot;
ShiftRight_Recursive(data, tempRoot, end);
}
}
}
//********************************************************************************

2. Merge Sort
The idea behind merge sort is not difficult to understand if you know how divide-and-conquer is working generally. Simply divide a big problem into small sub-problems. The idea of merge sort is
    - Split the big array into two (nearly) even parts: left and right
    - Sort the left and right respectively
    - Then merge the two sub-arrays into one.

This is top-down view. There is another bottom-up view. Imagine that you have a widow size of: 1, 2, 4, 8, ....... (until over the size of the array) to go through the array.
    - Windows size of 1: each individual element of the array - no need to sort
    - Windows size of 2: compare the two elements and sort them into the correct order
    - Windows size of 4: compare the two sorted arrays of Windows size of 2. Merge them in sorted order.
    - keep going on until the window size takes over the size of the array.
More details please read merge sort on wikipedia. Here is my homework implementation.

//********************************************************************************
// MergeSort.h
#ifndef MERGE_SORT_H
#define MERGE_SORT_H

#pragma once

#include <vector>

class MergeSort
{
public:
MergeSort();
~MergeSort();

void Sort(std::vector<int>& data);
void Sort(int data[], int size);
void Sort_Recursive(int data[], int size);

private:
void BottomUpMerge(int data[], int start, int mid, int end, int copy[]);
void TopDownSplit(int data[], int start, int end, int copy[]);
void CopyArray(int src[], int dst[], int start, int end);
};

#endif

// MergeSort.cpp
#include "stdafx.h"
#include "MergeSort.h"


MergeSort::MergeSort()
{
}


MergeSort::~MergeSort()
{
}

void MergeSort::Sort(std::vector<int>& data)
{
if (data.empty()) {
return;
}

Sort(&data[0], data.size());
}

void MergeSort::Sort(int data[], int size)
{
if (size <= 0) {
return;
}

int* tempCopy = new int[size];

for (int width = 1; width < size; width *= 2) {
for (int mid, end, start = 0; start < size; start += 2 * width) {
mid = (start + width) < size ? (start + width) : size;
end = (start + 2 * width) < size ? (start + width * 2) : size;
BottomUpMerge(data, start, mid, end, tempCopy);
}

// copy back the data
CopyArray(tempCopy, data, 0, size);
}

delete[] tempCopy;
}

void MergeSort::BottomUpMerge(int data[], int start, int mid, int end, int copy[])
{
int leftStart = start;
int rightStart = mid;

for (int index = start; index < end; ++index) {
if (leftStart < mid && (rightStart >= end || data[leftStart] <= data[rightStart]) ) {
copy[index] = data[leftStart];
++leftStart;
}
else {
copy[index] = data[rightStart];
++rightStart;
}
}
}

void MergeSort::Sort_Recursive(int data[], int size)
{
if (size <= 0) {
return;
}

int* tempCopy = new int[size];
TopDownSplit(data, 0, size, tempCopy);
delete[] tempCopy;
}

void MergeSort::TopDownSplit(int data[], int start, int end, int copy[])
{
if ((end - start) < 2) {
return;
}

int midpoint = (start + end) >> 1;
TopDownSplit(data, start, midpoint, copy);
TopDownSplit(data, midpoint, end, copy);
BottomUpMerge(data, start, midpoint, end, copy);

CopyArray(copy, data, start, end);
}

void MergeSort::CopyArray(int src[], int dst[], int start, int end)
{
while (start < end) {
*(dst + start) = *(src + start);
++start;
}
}
//********************************************************************************

3. Quick Sort
Quick sort utilizes the idea of divide-and-conquer as well. But the sizes of two parts in quick sort are not predefined as merge sort. The sizes of two sub-problems depend on the selection of the pivotal value. For example the left side of the pivotal value are all less than the pivotal value and the right are all no less than pivotal value. Then the big problem is split into two sub-problems whose size is decided by the pivotal values. Then recursively resolve the sub-problems until they devolve into single element.
More details please refer to quick sort on wikipedia. And here is my homework implementation.

//********************************************************************************
//QuickSort.h
#ifndef QUICK_SORT_H
#define QUICK_SORT_H

#pragma once

#include <vector>

class QuickSort
{
public:
QuickSort();
~QuickSort();

void Sort(std::vector<int>& data);
void Sort(int data[], int size);

private:
void SortInternal(int data[], int start, int end);
int Partition(int data[], int start, int end);
};

#endif


// QuickSort.cpp
#include "stdafx.h"
#include "QuickSort.h"


QuickSort::QuickSort()
{
}

QuickSort::~QuickSort()
{
}

void QuickSort::Sort(std::vector<int>& data)
{
if (data.empty()) {
return;
}

Sort(&data[0], data.size());
}

void QuickSort::Sort(int data[], int size)
{
SortInternal(data, 0, size);
}

void QuickSort::SortInternal(int data[], int start, int end)
{
if (start < end) {
int pivotalIndex = Partition(data, start, end);
SortInternal(data, start, pivotalIndex);
SortInternal(data, pivotalIndex + 1, end);
}
}

int QuickSort::Partition(int data[], int start, int end)
{
#if 0
int midpoint = (start + end) >> 1;
int pivotalVal = data[midpoint];
// swap it to rightest value
data[midpoint] = data[end - 1];
data[end - 1] = pivotalVal;
int pivotalIndex = end - 1;
#else
    // take the last element as the pivotal value
int pivotalVal = data[end - 1];
int pivotalIndex = end - 1;
#endif
for (int temp, index = pivotalIndex - 1; index >= start; --index) {
if (data[index] > pivotalVal) {
temp = data[pivotalIndex - 1];
data[pivotalIndex] = data[index];
data[index] = temp;
data[pivotalIndex - 1] = pivotalVal;
--pivotalIndex;
}
}

return pivotalIndex;
}

//********************************************************************************

4. Binary Search
Binary search is the most important search algorithm in Computer Science. Should never forget this. More details please refer to binary search algorithm on wikipedia.

//********************************************************************************
// BinarySearch.h
#ifndef BINARY_SERACH_H
#define BINARY_SERACH_H

#pragma once

#include <vector>

class BinarySearch
{
public:
BinarySearch();
~BinarySearch();

bool Search(const std::vector<int>& data, int key, int& pos);
bool Search(const int data[], int start, int end, int key, int& pos);

bool Search_recursive(const std::vector<int>& data, int key, int& pos);
bool Search_recursive(const int data[], int start, int end, int key, int& pos);
};

#endif

// BinarySearch.cpp
#include "stdafx.h"
#include "BinarySearch.h"


BinarySearch::BinarySearch()
{
}


BinarySearch::~BinarySearch()
{
}


bool BinarySearch::Search(const std::vector<int>& data, int key, int& pos)
{
if (data.empty()) {
return false;
}
return Search(&data[0], 0, data.size(), key, pos);
}

bool BinarySearch::Search(const int data[], int start, int end, int key, int& pos)
{
pos = -1;
if (data == nullptr) {
return false;
}

int midpoint;
int tempStart = start;
int tempEnd = end;
while (tempStart <= tempEnd) {
midpoint = (tempStart + tempEnd) >> 1;
if (data[midpoint] == key) {
pos = midpoint;
return true;
}
else if (data[midpoint] > key) {
tempEnd = midpoint - 1;
}
else{
tempStart = midpoint + 1;
}
    }

return false;
}

bool BinarySearch::Search_recursive(const std::vector<int>& data, int key, int& pos)
{
if (data.empty()) {
return false;
}
return Search_recursive(&data[0], 0, data.size(), key, pos);
}

bool BinarySearch::Search_recursive(const int data[], int start, int end, int key, int& pos)
{
if (start > end || data == nullptr) {
pos = -1;
return false;
}

int midpoint = (start + end) >> 1;
if (data[midpoint] == key) {
pos = midpoint;
return true;
}
else if (data[midpoint] > key) {
return Search_recursive(data, start, midpoint - 1, key, pos);
}
else {
return Search_recursive(data, midpoint + 1, end, key, pos);
}
}

//********************************************************************************

5. Summary
These algorithms are all implemented in C/C++ standard like, qsort, std::sort, std::stable_sort, std::binary_search and those specific search functions for associated containers. If your application allows, should just directly use these implementation. They are all implemented with template and simply instantiate them with the correct types and use them as you need.

The characteristics of these 3 sorting algorithm are list on sorting algorithm on wikipedia. It is worth pointing out that even thought quick sort outperforms the other two sort algorithms in most cases, heap sort and merge sort are still used in real-time/time-critical systems just in case hitting the worst case, O(N*N), in quick sort. And heap sort is particularly useful in embedded world when the resource (mainly cache, memory) is limited, because it has O(1) space requirement. Merge sort comes to use when the internal order of the same elements are needed to keep after the sorting. In the rest case quick sort should do the work.

Bibliography:
[1] http://en.wikipedia.org/wiki/Sorting_algorithm
[2] http://en.wikipedia.org/wiki/Heapsort
[3] http://en.wikipedia.org/wiki/Merge_sort
[4] http://en.wikipedia.org/wiki/Quicksort
[5] http://en.wikipedia.org/wiki/Binary_search_algorithm
[6] http://crackprogramming.blogspot.co.uk/2012/11/heap-sort-c-implementation.html

No comments:

Post a Comment