C++

Sunday, 17 August 2014

Data Structure and Algorithm - Linked List: Circle Detection

This is one of interviewing questions that my friends and I all have encountered and used in quite a few occasions. And I believe it is worth of one blog.

1. Problem
Given the fact that in my last two blog entries, Data Structure and Algorithm - Linked List: Common API and Data Structure and Algorithm - Linked List: Merge Sort, these algorithms and implementation are only operational on Linked List without circle linking. Otherwise some of them will be lost in loop. For instance, LL_PushBack(), this operation will be trapped in the loop to find the last node, if a circle linking exists in the Linked List.

Hence it will be circular to guarantee that there does not exist circle linking in the Linked List when using these functions. Hence circle detection algorithm is useful for this occasion.The goal is to design/implement an algorithm to remove the circle linking if it does exist.

2. Solution 1 - Hash Table
This is the first intuitive solution popping into my head. It is similar to use a linear method to find the very first repeating number given an array. Use hash table to contain the hash value of the pointer in order to speed up the searching. It guarantees that traversing the Linked List once will find the solution. But one drawback of this solution is that its correctness depends on the quality of the hash function. Given a perfect hash function to guarantee different input producing different output, then this solution will work as expected.

// Pseudo-code
//********************************************************************************
curHead <- Current head of this Liked List
ptrHashTable <- Initialize this hash table
lastNodeOfCircle <- nullptr
while (curHead is not nullptr)
  hashVal <- Compute the hash value of curHead
if (hashVal does not exist in ptrHashTable)
  ptrHashTable <- Insert hashVal
  lastNodeOfCircle <- curHead
  curHead <- (curHead->next)
  else
  lastNodeOfCircle <- nullptr
Found and return
Not found and return
//********************************************************************************

3. Solution 2 - Two Speeds of Traversing
The basic idea behind of this solution is to use two pointers traversing the Linked List with two different speed. The slower pointer traverses one node by another and the faster pointer traverses twice faster than the slower. Think about the clock. Obviously three hands (second, minute and hour) go around and around the clock. Assuming at the moment of midnight three hands meet at the 12, (actually no matter any position these three hands are currently now), the second hand can always catch the minute and hour hands and the minute hand can always catch the hour hand. Because the second hand moves faster than other two and the minute hand moves faster than the hour hand. This algorithm uses the exactly the same idea - the faster can always catch the slower in a circle arena.

If the circle linking does exists, traverse the Linked List and the faster pointer will eventually catch the slower one. If does not exist, the faster pointer will reach the end of the Linked List first. In order to remove the circle linking, the key is to find the last node in the Linked List. In the clock example, if the head is set as "12", then the last node of the circle is "11". Removing the circle is simple - break the link and set the "next" of the last node in the circle as nullptr.

// C implementation
//********************************************************************************
// LinkedListCircleDetect.h
#ifndef LINKED_LIST_CIRCLE_DETECT_H
#define LINKED_LIST_CIRCLE_DETECT_H

#pragma once

#include "LinkedListElement.h"

template<typename T>
LinkedListElement<T>* LL_GetOneOfNodesInCircleIfExists(LinkedListElement<T>* head)
{
if (head == nullptr) {
return nullptr;
}

LinkedListElement<T>* slow = head;
LinkedListElement<T>* fast = head;

  while (1) {
for (int speed = 0; speed < 2; ++speed) {
fast = fast->next;
if (fast == nullptr) {
return nullptr;
}
if (slow == fast) {
return slow;
}
}

slow = slow->next;
}

return nullptr;
}

template<typename T>
bool LL_CheckIfCircleExists(LinkedListElement<T>* head)
{
return LL_GetOneOfNodesInCircleIfExists(head) != nullptr;
}

template<typename T>
size_t LL_GetCircleSizeIfExists(LinkedListElement<T>* head)
{
LinkedListElement<T>* oneNodeInCircle = LL_GetOneOfNodesInCircleIfExists(head);

if (oneNodeInCircle == nullptr) {
return 0;
}

size_t count = 1;
LinkedListElement<T>* cur = oneNodeInCircle->next;
while (cur != oneNodeInCircle) {
cur = cur->next;
++count;
}

return count;
}

template<typename T>
LinkedListElement<T>* LL_DetectTheLastNodeOfCircleIfExists(LinkedListElement<T>* head)
{
size_t circleSize = LL_GetCircleSizeIfExists(head);

if (circleSize == 0) {
return nullptr;
}

LinkedListElement<T>* cur = head;
LinkedListElement<T>* tempOffsetByCircleSize = head;

  while (--circleSize > 0) {
tempOffsetByCircleSize = tempOffsetByCircleSize->next;
}

while (cur != tempOffsetByCircleSize->next) {
cur = cur->next;
tempOffsetByCircleSize = tempOffsetByCircleSize->next;
}

return tempOffsetByCircleSize;
}

/*
* If circle does not exsit, return false
* If circle exists, then remove the circle and return true
*/
template<typename T>
bool LL_RemoveCircleIfExists(LinkedListElement<T>** head)
{
if (head == nullptr) {
return false;
}
LinkedListElement<T>* lastNodeOfCircle = LL_DetectTheLastNodeOfCircleIfExists(*head);
if (lastNodeOfCircle == nullptr) {
return false;
}

lastNodeOfCircle->next = nullptr;
return true;
}

#endif

// test.cxx

void TestLinkedListCircleDetector()

{

LinkedListElement<int>* head = nullptr;

// empty list

assert(LL_GetOneOfNodesInCircleIfExists(head) == nullptr);

assert(LL_CheckIfCircleExists(head) == false);

assert(LL_GetCircleSizeIfExists(head) == 0);

assert(LL_DetectTheLastNodeOfCircleIfExists(head) == nullptr);

assert(LL_RemoveCircleIfExists(&head) == false);

LL_PushFront(&head, 11);

LinkedListElement<int>* tail = head;

// point to itself

tail->next = head;

assert(LL_GetOneOfNodesInCircleIfExists(head) != nullptr);

assert(LL_CheckIfCircleExists(head) == true);

assert(LL_GetCircleSizeIfExists(head) == 1);

assert(LL_DetectTheLastNodeOfCircleIfExists(head) == tail);

assert(LL_RemoveCircleIfExists(&head) == true);

assert(head->next == nullptr);

assert(head->data == 11);

// construct 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

LL_PushFront(&head, 10);

LL_PushFront(&head, 9);

LL_PushFront(&head, 8);

LL_PushFront(&head, 7);

LL_PushFront(&head, 6);

LL_PushFront(&head, 5);

LL_PushFront(&head, 4);

LL_PushFront(&head, 3);

LL_PushFront(&head, 2);

LL_PushFront(&head, 1);

LL_PushFront(&head, 12);

assert(LL_GetOneOfNodesInCircleIfExists(head) == nullptr);

assert(LL_CheckIfCircleExists(head) == false);

assert(LL_GetCircleSizeIfExists(head) == 0);

assert(LL_DetectTheLastNodeOfCircleIfExists(head) == nullptr);

assert(LL_RemoveCircleIfExists(&head) == false);

// setup the circle

tail->next = head;

assert(LL_GetOneOfNodesInCircleIfExists(head) != nullptr);

assert(LL_CheckIfCircleExists(head) == true);

assert(LL_GetCircleSizeIfExists(head) == 12);

assert(LL_DetectTheLastNodeOfCircleIfExists(head) == tail);

assert(LL_RemoveCircleIfExists(&head) == true);

assert(tail->next == nullptr);

assert(LL_GetSize(head) == 12);

int val;

assert(LL_GetNth(head, 12, val) == false);

// setup the circle

tail->next = head;

LL_PushFront(&head, 20);

LL_PushFront(&head, 30);

LL_PushFront(&head, 40);

LL_PushFront(&head, 50);

LL_PushFront(&head, 60);

LL_PushFront(&head, 70);

assert(LL_GetOneOfNodesInCircleIfExists(head) != nullptr);

assert(LL_CheckIfCircleExists(head) == true);

assert(LL_GetCircleSizeIfExists(head) == 12);

assert(LL_DetectTheLastNodeOfCircleIfExists(head) == tail);

assert(LL_RemoveCircleIfExists(&head) == true);

assert(tail->next == nullptr);

assert(LL_GetSize(head) == 18);

assert(LL_GetNth(head, 18, val) == false);

LL_Delete(&head);

}

//********************************************************************************

4. Summary
The first solution using hash table is to guarantee O(N) computation complexity and O(N) space complexity. The second solution is to guarantee O(N) computation complexity as well but only O(1) space complexity. But the second solution could run ~3 times slower than the first solution in the worst case, because it needs to traverse the Linked List to detect if the circle does exist, traverse again to find out the size of circle, and traverse the third time to find out the last node of the circle. And it is fair that the first solution trades space for speed.

Bibliography:
[1] http://cslibrary.stanford.edu/105/LinkedListProblems.pdf
[2] Glenn W. Rowe, Data Structure & Algorithm C++, Prentice Hall, 1996

Thursday, 14 August 2014

Data Structure and Algorithm - Linked List: Merge Sort

As I presented the common API for Linked List in my last blog entry, Data Structure and Algorithm - Linked List (Part I), in this blog I would like to present a sorting algorithm for Linked List, merge sort. Please refer to the basic idea about sorting in Data Structure and Algorithm - Sort and Binary Search.

Why Merge Sort
- Simply idea: divide and conquer
- Easy adapted to parallel computing
- Space complexity: O(1) for Linked List, comparing with other sorting algorithm
- Better performance than other more efficient algorithm in theory due to the traverse of Linked List
- Guarantee to be stable sort
- Guarantee to the worst case of O(N*logN)

// C implementation of Merge Sort on Linked List
//********************************************************************************
// LinkedListSort.h
#ifndef LINKED_LIST_SORT_H
#define LINKED_LIST_SORT_H

#pragma once

#include "LinkedListAlgo.h"
#include "LinkedListElement.h"

template<typename T>
void LL_MergeSort(LinkedListElement<T>** head)
{
if (head == nullptr || *head == nullptr || (*head)->next == nullptr) {
return;
}

LinkedListElement<T>* front = nullptr;
LinkedListElement<T>* back = nullptr;
LL_FrontBackSplit(*head, &front, &back);
LL_MergeSort(&front);
LL_MergeSort(&back);
*head = LL_SortedMerge(front, back);
}

template<typename T>
void LL_FrontBackSplit(LinkedListElement<T>* src,
LinkedListElement<T>** front,
LinkedListElement<T>** back)
{
if (src == nullptr) {
return;
}

LinkedListElement<T>* slow = src;
LinkedListElement<T>* fast = src->next;
while (fast != nullptr) {
fast = fast->next;
if (fast != nullptr) {
slow = slow->next;
fast = fast->next;
}
}

*front = src;
*back = slow->next;
slow->next = nullptr;
}

template<typename T>
LinkedListElement<T>* LL_SortedMerge(LinkedListElement<T>* l1,
LinkedListElement<T>* l2)
{
LinkedListElement<T> temp;
LinkedListElement<T>* cur = &temp;

while (l1 != nullptr || l2 != nullptr) {
if (l1 == nullptr) {
cur->next = l2;
return temp.next;
}
if (l2 == nullptr) {
cur->next = l1;
return temp.next;
}
if (l1->data <= l2->data) {
cur->next = l1;
cur = cur->next;
l1 = l1->next;
}
else {
cur->next = l2;
cur = cur->next;
l2 = l2->next;
}
}

return temp.next;
}

template<typename T>
LinkedListElement<T>* LL_SortedMergeRecursively(LinkedListElement<T>* l1,
LinkedListElement<T>* l2)
{
if (l1 == nullptr) {
return l2;
}
if (l2 == nullptr) {
return l1;
}

LinkedListElement<T>* result = nullptr;
if (l1->data <= l2->data) {
result = l1;
result->next = LL_SortedMergeRecursively(l1->next, l2);
}
else {
result = l2;
result->next = LL_SortedMergeRecursively(l1, l2->next);
}

return result;
}

template<typename T>
void LL_RemoveDuplicates(LinkedListElement<T>** head)
{
if (head == nullptr || *head == nullptr || (*head)->next == nullptr) {
return;
}

LinkedListElement<T>* curValPtr = *head;
LinkedListElement<T>* nextValPtr = (*head)->next;
while (nextValPtr != nullptr) {
if (nextValPtr->data == curValPtr->data) {
curValPtr->next = nextValPtr->next;
delete nextValPtr;
nextValPtr = curValPtr->next;
}
else {
curValPtr = nextValPtr;
nextValPtr = nextValPtr->next;
}
}

return;
}

/*
* create a new list which is intersection of two lists
* no duplicate in this list
* l1 and l2: has to be sorted linked list
*/
template<typename T>
LinkedListElement<T>* LL_SortedIntersect(LinkedListElement<T>* l1,
LinkedListElement<T>* l2)
{
if (l1 == nullptr || l2 == nullptr) {
return nullptr;
}

LinkedListElement<T>* head = nullptr;
ptrdiff_t found;
size_t pos = 0;
LinkedListElement<T>* curValPtr = nullptr;
while (l1 != nullptr) {
if (curValPtr == nullptr || curValPtr->data != l1->data) {
curValPtr = l1;

found = LL_Find(l2, pos, curValPtr->data);
if (found >= 0) {
LL_PushFront(&head, curValPtr->data);
pos = static_cast<size_t>(found + 1);
}
}
l1 = l1->next;
}

return head;
}

template<typename T>
bool LL_SortedInsert(LinkedListElement<T>** head, const T& val)
{
if (head == nullptr) {
return false;
}

if (*head == nullptr) {
LL_PushFront(head, val);
return true;
}

if (val <= (*head)->data) {
LL_PushFront(head, val);
return true;
}

LinkedListElement<T>* prev = *head;
LinkedListElement<T>* cur = (*head)->next;

while (cur != nullptr) {
if (cur->data >= val) {
prev->next = new LinkedListElement<T>(val);
prev->next->next = cur;
return true;
}
prev = cur;
cur = cur->next;
}

// hit here not found the position
// LL_PushBack(head, val);
prev->next = new LinkedListElement<T>(val);
return true;
}

#endif

// test.cpp

void TestLinkedListSort()

{

LinkedListElement<int>* head1 = nullptr;
// construct list: 10, 9, 20, 8, 7, 7, 20, 30, 50, 20, 4, 3,
LL_PushFront(&head1, 3);
LL_PushFront(&head1, 4);
LL_PushFront(&head1, 20);
LL_PushFront(&head1, 50);
LL_PushFront(&head1, 30);
LL_PushFront(&head1, 20);
LL_PushFront(&head1, 7);
LL_PushFront(&head1, 7);
LL_PushFront(&head1, 8);
LL_PushFront(&head1, 20);
LL_PushFront(&head1, 9);
LL_PushFront(&head1, 10);
assert(LL_GetSize(head1) == 12);

LinkedListElement<int>* head2 = nullptr;
// const list: 12, 17, 56, 19, 18, 10, 9, 12, 6, 8, 10
LL_PushFront(&head2, 10);
LL_PushFront(&head2, 8);
LL_PushFront(&head2, 6);
LL_PushFront(&head2, 12);
LL_PushFront(&head2, 9);
LL_PushFront(&head2, 10);
LL_PushFront(&head2, 18);
LL_PushFront(&head2, 19);
LL_PushFront(&head2, 56);
LL_PushFront(&head2, 17);
LL_PushFront(&head2, 12);
assert(LL_GetSize(head2) == 11);

LL_MergeSort(&head1); // 3, 4, 7, 7, 8, 9, 10, 20, 20, 20, 30, 50

int val;
assert(LL_GetNth(head1, 0, val) == true);
assert(val == 3);
assert(LL_GetNth(head1, 1, val) == true);
assert(val == 4);
assert(LL_GetNth(head1, 2, val) == true);
assert(val == 7);
assert(LL_GetNth(head1, 3, val) == true);
assert(val == 7);
assert(LL_GetNth(head1, 4, val) == true);
assert(val == 8);
assert(LL_GetNth(head1, 5, val) == true);
assert(val == 9);
assert(LL_GetNth(head1, 6, val) == true);
assert(val == 10);
assert(LL_GetNth(head1, 7, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 8, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 9, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 10, val) == true);
assert(val == 30);
assert(LL_GetNth(head1, 11, val) == true);
assert(val == 50);
assert(LL_GetNth(head1, 12, val) == false);

LL_MergeSort(&head2); // 6, 8, 9, 10, 10, 12, 12, 17, 18, 19, 56
assert(LL_GetNth(head2, 0, val) == true);
assert(val == 6);
assert(LL_GetNth(head2, 1, val) == true);
assert(val == 8);
assert(LL_GetNth(head2, 2, val) == true);
assert(val == 9);
assert(LL_GetNth(head2, 3, val) == true);
assert(val == 10);
assert(LL_GetNth(head2, 4, val) == true);
assert(val == 10);
assert(LL_GetNth(head2, 5, val) == true);
assert(val == 12);
assert(LL_GetNth(head2, 6, val) == true);
assert(val == 12);
assert(LL_GetNth(head2, 7, val) == true);
assert(val == 17);
assert(LL_GetNth(head2, 8, val) == true);
assert(val == 18);
assert(LL_GetNth(head2, 9, val) == true);
assert(val == 19);
assert(LL_GetNth(head2, 10, val) == true);
assert(val == 56);
assert(LL_GetNth(head2, 11, val) == false);

LinkedListElement<int>* front = nullptr;
LinkedListElement<int>* back = nullptr;
LL_FrontBackSplit(head1, &front, &back);
assert(front == head1);
assert(back->data == 10);
assert(LL_GetSize(front) == 6);
assert(LL_GetSize(back) == 6);
head1 = LL_SortedMerge(front, back); // 3, 4, 7, 7, 8, 9, 10, 20, 20, 20, 30, 50
assert(LL_GetSize(head1) == 12);
assert(LL_GetNth(head1, 0, val) == true);
assert(val == 3);
assert(LL_GetNth(head1, 1, val) == true);
assert(val == 4);
assert(LL_GetNth(head1, 2, val) == true);
assert(val == 7);
assert(LL_GetNth(head1, 3, val) == true);
assert(val == 7);
assert(LL_GetNth(head1, 4, val) == true);
assert(val == 8);
assert(LL_GetNth(head1, 5, val) == true);
assert(val == 9);
assert(LL_GetNth(head1, 6, val) == true);
assert(val == 10);
assert(LL_GetNth(head1, 7, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 8, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 9, val) == true);
assert(val == 20);
assert(LL_GetNth(head1, 10, val) == true);
assert(val == 30);
assert(LL_GetNth(head1, 11, val) == true);
assert(val == 50);
assert(LL_GetNth(head1, 12, val) == false);

LL_FrontBackSplit(head2, &front, &back); // 6, 8, 9, 10, 10, 12, 12, 17, 18, 19, 56
assert(front == head2);
assert(back->data == 12);
assert(LL_GetSize(front) == 6);
assert(LL_GetSize(back) == 5);
head2 = LL_SortedMergeRecursively(front, back);
assert(LL_GetNth(head2, 0, val) == true);
assert(val == 6);
assert(LL_GetNth(head2, 1, val) == true);
assert(val == 8);
assert(LL_GetNth(head2, 2, val) == true);
assert(val == 9);
assert(LL_GetNth(head2, 3, val) == true);
assert(val == 10);
assert(LL_GetNth(head2, 4, val) == true);
assert(val == 10);
assert(LL_GetNth(head2, 5, val) == true);
assert(val == 12);
assert(LL_GetNth(head2, 6, val) == true);
assert(val == 12);
assert(LL_GetNth(head2, 7, val) == true);
assert(val == 17);
assert(LL_GetNth(head2, 8, val) == true);
assert(val == 18);
assert(LL_GetNth(head2, 9, val) == true);
assert(val == 19);
assert(LL_GetNth(head2, 10, val) == true);
assert(val == 56);
assert(LL_GetNth(head2, 11, val) == false);

// 810, 9, 8
LinkedListElement<int>* intersect = LL_SortedIntersect(head1, head2);
assert(LL_GetSize(intersect) == 3);
assert(LL_GetNth(intersect, 0, val) == true);
assert(val == 10);
assert(LL_GetNth(intersect, 1, val) == true);
assert(val == 9);
assert(LL_GetNth(intersect, 2, val) == true);
assert(val == 8);
assert(LL_GetNth(intersect, 3, val) == false);
LL_Delete(&intersect);

// 3, 4, 6, 7, 7, 8, 8, 9, 9, 10, 10, 10, 12, 12, 17, 18, 19, 20, 20, 20, 30, 50, 56
LinkedListElement<int>* head = LL_SortedMergeRecursively(head1, head2);
assert(LL_GetSize(head) == 23);
assert(LL_GetNth(head, 0, val) == true);
assert(val == 3);
assert(LL_GetNth(head, 1, val) == true);
assert(val == 4);
assert(LL_GetNth(head, 2, val) == true);
assert(val == 6);
assert(LL_GetNth(head, 3, val) == true);
assert(val == 7);
assert(LL_GetNth(head, 4, val) == true);
assert(val == 7);
assert(LL_GetNth(head, 5, val) == true);
assert(val == 8);
assert(LL_GetNth(head, 6, val) == true);
assert(val == 8);
assert(LL_GetNth(head, 7, val) == true);
assert(val == 9);
assert(LL_GetNth(head, 8, val) == true);
assert(val == 9);
assert(LL_GetNth(head, 9, val) == true);
assert(val == 10);
assert(LL_GetNth(head, 10, val) == true);
assert(val == 10);
assert(LL_GetNth(head, 11, val) == true);
assert(val == 10);
assert(LL_GetNth(head, 12, val) == true);
assert(val == 12);
assert(LL_GetNth(head, 13, val) == true);
assert(val == 12);
assert(LL_GetNth(head, 14, val) == true);
assert(val == 17);
assert(LL_GetNth(head, 15, val) == true);
assert(val == 18);
assert(LL_GetNth(head, 16, val) == true);
assert(val == 19);
assert(LL_GetNth(head, 17, val) == true);
assert(val == 20);
assert(LL_GetNth(head, 18, val) == true);
assert(val == 20);
assert(LL_GetNth(head, 19, val) == true);
assert(val == 20);
assert(LL_GetNth(head, 20, val) == true);
assert(val == 30);
assert(LL_GetNth(head, 21, val) == true);
assert(val == 50);
assert(LL_GetNth(head, 22, val) == true);
assert(val == 56);
assert(LL_GetNth(head, 23, val) == false);

// 3, 4, 6, 7, 8, 9, 10, 12, 17, 18, 19, 20, 30, 50, 56
LL_RemoveDuplicates(&head);
assert(LL_GetSize(head) == 15);
assert(LL_GetNth(head, 0, val) == true);
assert(val == 3);
assert(LL_GetNth(head, 1, val) == true);
assert(val == 4);
assert(LL_GetNth(head, 2, val) == true);
assert(val == 6);
assert(LL_GetNth(head, 3, val) == true);
assert(val == 7);
assert(LL_GetNth(head, 4, val) == true);
assert(val == 8);
assert(LL_GetNth(head, 5, val) == true);
assert(val == 9);
assert(LL_GetNth(head, 6, val) == true);
assert(val == 10);
assert(LL_GetNth(head, 7, val) == true);
assert(val == 12);
assert(LL_GetNth(head, 8, val) == true);
assert(val == 17);
assert(LL_GetNth(head, 9, val) == true);
assert(val == 18);
assert(LL_GetNth(head, 10, val) == true);
assert(val == 19);
assert(LL_GetNth(head, 11, val) == true);
assert(val == 20);
assert(LL_GetNth(head, 12, val) == true);
assert(val == 30);
assert(LL_GetNth(head, 13, val) == true);
assert(val == 50);
assert(LL_GetNth(head, 14, val) == true);
assert(val == 56);
assert(LL_GetNth(head, 15, val) == false);

assert(LL_SortedInsert(&head, 1) == true);
// 1, 3, 4, 6, 7, 8, 9, 10, 12, 17, 18, 19, 20, 30, 50, 56
assert(LL_GetSize(head) == 16);
assert(LL_Find(head, 1) == 0);
// 1, 3, 4, 6, 7, 8, 9, 10, 12, 17, 18, 19, 20, 30, 50, 56, 60
assert(LL_SortedInsert(&head, 60) == true);
assert(LL_GetSize(head) == 17);
assert(LL_Find(head, 60) == 16);
// 1, 3, 4, 6, 7, 8, 9, 10, 10, 12, 17, 18, 19, 20, 30, 50, 56, 60
assert(LL_SortedInsert(&head, 10));
ptrdiff_t pos = LL_Find(head, 10);
assert(pos == 7);
pos = LL_Find(head, pos + 1, 10);
assert(pos == 0);

LL_Delete(&head);

}

//********************************************************************************

Bibliography:
[1] http://cslibrary.stanford.edu/105/LinkedListProblems.pdf
[2] Glenn W. Rowe, Data Structure & Algorithm C++, Prentice Hall, 1996
[3] http://en.wikipedia.org/wiki/Merge_sort
[4] http://en.wikipedia.org/wiki/Sorting_algorithm

Tuesday, 12 August 2014

Data Structure and Algorithm - Linked List: Common API

As I mentioned that Linked List is regarded as one of most frequently asked interview questions in Data Structure and Algorithm - Sort and Binary Search. One of the most beautiful things of these questions related to Linked List is to tell the interviewers how the candidate understands or grasps the concept and manipulation of pointer, which is regarded not an technical skill but an aptitude.

As I highlight in the following code, passing a pointer to a pointer "**" or passing a pointer "*" to LinkedListElement depends on what the function is to do. Any function is potentially to modify the argument head, a pointer to a pointer to LinkedListElement has to be passed, because the modification has to be made into the argument passed into the function. Otherwise a pointer to LinkedListElement should do the job.

// C Implementation of Linked List
//********************************************************************************
// LinkedListElement.h
#ifndef LINKED_LIST_ELEMENT_H
#define LINKED_LIST_ELEMENT_H

#pragma once

template <typename T>
struct LinkedListElement {
T data;
LinkedListElement<T>* next = nullptr;

LinkedListElement()
{}
LinkedListElement(const T& t) : data(t)
{}
~LinkedListElement()
{}
};

#endif

// LinkedListAlgo.h

#ifndef LINKED_LIST_ALGO_H

#define LINKED_LIST_ALGO_H

#pragma once

#include "LinkedListElement.h"

template<typename T>

void LL_PushFront(LinkedListElement<T>** head, const T& data)

{

if (*head == nullptr) {

*head = new LinkedListElement<T>(data);

}

else {

LinkedListElement<T>* curHead = *head;

*head = new LinkedListElement<T>(data);

(*head)->next = curHead;

}

template<typename T>

void LL_PushBack(LinkedListElement<T>** head, const T& data)

{

if (*head == nullptr) {

*head = new LinkedListElement<T>(data);

}

else {

LinkedListElement<T>* curHead = *head;

while (curHead->next != nullptr) {

curHead = curHead->next;

}

curHead->next = new LinkedListElement<T>(data);

}

template<typename T>

void LL_PopFront(LinkedListElement<T>** head)

{

if (head == nullptr || *head == nullptr) {

return;

}

LinkedListElement<T>* temp = *head;

*head = (*head)->next;

delete temp;

}

template<typename T>

void LL_PopBack(LinkedListElement<T>** head)

{

if (head == nullptr || *head == nullptr) {

return;

}

LinkedListElement<T>* prev = *head;

LinkedListElement<T>* curHead = *head;

while (curHead->next != nullptr) {

prev = curHead;

curHead = curHead->next;

}

prev->next = nullptr;

delete curHead;

}

template<typename T>

bool LL_Insert(LinkedListElement<T>** head, size_t pos, const T& data)

{

if (pos == 0) {

LL_PushFront(head, data);

return true;

}

LinkedListElement<T>* curHead = *head;

while (--pos > 0) {

curHead = curHead->next;

if (curHead == nullptr) {

return false;

}

LinkedListElement<T>* next = curHead->next;

curHead->next = new LinkedListElement<T>(data);

curHead->next->next = next;

return true;

}

template<typename T>

bool LL_Insert(LinkedListElement<T>** head, const LinkedListElement<T>* pos_node, const T& data)

{

if (head == nullptr || *head == nullptr || pos_node == nullptr) {

return false;

}

if (*head == pos_node) {

LL_PushFront(head, data);

return true;

}

LinkedListElement<T>* prev = nullptr;

LinkedListElement<T>* curHead = *head;

while (curHead != pos_node) {

prev = curHead;

curHead = curHead->next;

if (curHead == nullptr) {

return false;

}

prev->next = new LinkedListElement<T>(data);

prev->next->next = curHead;

return true;

}

template<typename T>

bool LL_Erase(LinkedListElement<T>** head, size_t pos)

{

if (head == nullptr || *head == nullptr) {

return false;

}

if (pos == 0) {

LL_PopFront(head);

return true;

}

LinkedListElement<T>* prev = *head;

LinkedListElement<T>* curHead = *head;

while (--pos > 0) {

prev = curHead;

curHead = curHead->next;

if (curHead == nullptr) {

return false;

}

prev->next = curHead->next;

delete curHead;

return true;

}

template<typename T>

bool LL_Erase(LinkedListElement<T>** head, const LinkedListElement<T>* pos_node)

{

if (head == nullptr || *head == nullptr || pos_node == nullptr) {

return false;

}

if (*head == pos_node) {

LL_PopFront(head);

return true;

}

LinkedListElement<T>* prev = nullptr;

LinkedListElement<T>* curHead = *head;

while (curHead != pos_node) {

prev = curHead;

curHead = curHead->next;

if (curHead == nullptr) {

return false;

}

prev->next = curHead->next;

delete curHead;

return true;

}

template<typename T>

bool LL_GetNth(LinkedListElement<T>* head, size_t pos, T& data)

{

if (head == nullptr) {

return false;

}

while (pos > 0) {

head = head->next;

if (head == nullptr) {

return false;

}

--pos;

}

data = head->data;

return true;

}

template<typename T>

LinkedListElement<T>* LL_GetNth(LinkedListElement<T>* head, size_t pos)

{

if (head == nullptr) {

return nullptr;

}

while (pos > 0) {

head = head->next;

if (head == nullptr) {

return nullptr;

}

--pos;

}

return head;

}

template<typename T>

size_t LL_GetSize(LinkedListElement<T>* head)

{

size_t size = 0;

while (head != nullptr) {

head = head->next;

++size;

}

return size;

}

template<typename T>

void LL_Reverse(LinkedListElement<T>** head)

{

// return if has less than 2 nodes

if (head == nullptr || *head == nullptr || (*head)->next == nullptr) {

return;

}

LinkedListElement<T>* cur = *head;

LinkedListElement<T>* next = nullptr;

LinkedListElement<T>* curHead = nullptr;

while (cur != nullptr) {

next = cur->next;

cur->next = curHead;

curHead = cur;

cur = next;

}

*head = curHead;

}

template<typename T>

void LL_Delete(LinkedListElement<T>** head)

{

if (head == nullptr || *head == nullptr) {

return;

}

LinkedListElement<T>* curHead = *head;

LinkedListElement<T>* temp = nullptr;

while (curHead != nullptr) {

temp = curHead;

curHead = curHead->next;

delete temp;

}

*head = nullptr;

}

* return -1, if not found

* 0 - size-1, if found the very first

template<typename T>

ptrdiff_t LL_Find(LinkedListElement<T>* head, const T& val)

{

if (head == nullptr) {

return -1;

}

ptrdiff_t pos = 0;

while (head != nullptr) {

if (head->data == val) {

return pos;

}

head = head->next;

++pos;

}

return -1;

}

template<typename T>

ptrdiff_t LL_Find(LinkedListElement<T>* head, size_t pos, const T& val)

{

if (head == nullptr) {

return -1;

}

while (pos > 0) {

head = head->next;

if (head == nullptr) {

return -1;

}

--pos;

}

return LL_Find(head, val);

}

#endif

// test.cpp

void TestLinkedListAlgo()

{

LinkedListElement<int>* head = nullptr;

// construct list: 20, 12, 10, 9, 9, 8, 7, 15

LL_PushFront(&head, 9);

LL_PushFront(&head, 10);

LL_PushFront(&head, 12);

LL_PushBack(&head, 9);

LL_PushBack(&head, 8);

LL_PushBack(&head, 7);

LL_PushBack(&head, 15);

LL_PushFront(&head, 20);

assert(LL_GetSize(head) == 8);

int val;

assert(LL_GetNth(head, 0, val) == true);

assert(val == 20);

assert(LL_GetNth(head, 1, val) == true);

assert(val == 12);

assert(LL_GetNth(head, 2, val) == true);

assert(val == 10);

assert(LL_GetNth(head, 3, val) == true);

assert(val == 9);

assert(LL_GetNth(head, 4, val) == true);

assert(val == 9);

assert(LL_GetNth(head, 5, val) == true);

assert(val == 8);

assert(LL_GetNth(head, 6, val) == true);

assert(val == 7);

assert(LL_GetNth(head, 7, val) == true);

assert(val == 15);

assert(LL_GetNth(head, 8, val) == false);

LL_PopFront(&head);

assert(LL_GetSize(head) == 7); // 12, 10, 9, 9, 8, 7, 15

LL_PopBack(&head);

assert(LL_GetSize(head) == 6); // 12, 10, 9, 9, 8, 7

LL_Delete(&head);

// constuct list: 18, 25, 30, 11, 10, 9, 5, 20

LL_PushBack(&head, 10);

LL_PushBack(&head, 9);

LL_PushBack(&head, 5);

LL_PushBack(&head, 20);

LL_PushFront(&head, 11);

LL_PushFront(&head, 30);

LL_PushFront(&head, 25);

LL_PushFront(&head, 18);

assert(LL_GetSize(head) == 8);

assert(LL_Insert(&head, 0, 40) == true); // 40, 18, 25, 30, 11, 10, 9, 5, 20

assert(LL_GetSize(head) == 9);

assert(LL_GetNth(head, 0, val) == true);

assert(val = 40);

assert(LL_Insert(&head, 10, 50) == false);

assert(LL_GetSize(head) == 9);

assert(LL_Insert(&head, 9, 50) == true); // 40, 18, 25, 30, 11, 10, 9, 5, 20, 50

assert(LL_GetSize(head) == 10);

assert(LL_GetNth(head, 9, val) == true);

assert(val == 50);

assert(LL_Insert(&head, 3, 100) == true); // 40, 18, 25, 100, 30, 11, 10, 9, 5, 20, 50

assert(LL_GetSize(head) == 11);

assert(LL_GetNth(head, 3, val) == true);

assert(val == 100);

LinkedListElement<int>* elemPtr = LL_GetNth(head, 2); // index starts from 0

assert(elemPtr != nullptr);

assert(elemPtr->data == 25);

assert(LL_GetNth(head, 11) == nullptr);

elemPtr = LL_GetNth(head, 9);

assert(elemPtr != nullptr);

assert(elemPtr->data == 20);

assert(LL_Insert(&head, elemPtr, 70) == true); // 40, 18, 25, 100, 30, 11, 10, 9, 5, 70, 20, 50

assert(LL_GetSize(head) == 12);

elemPtr = LL_GetNth(head, 9);

assert(elemPtr != nullptr);

assert(elemPtr->data == 70);

assert(LL_GetNth(head, 3, val) == true);

assert(val == 100);

assert(LL_GetNth(head, 12, val) == false);

assert(LL_GetSize(head) == 12);

assert(LL_Erase(&head, 0) == true); // 18, 25, 100, 30, 11, 10, 9, 5, 70, 20, 50

assert(LL_GetSize(head) == 11);

elemPtr = LL_GetNth(head, 0);

assert(elemPtr != nullptr);

assert(elemPtr->data == 18);

elemPtr = LL_GetNth(head, 10);

assert(elemPtr != nullptr);

assert(elemPtr->data == 50);

assert(LL_Erase(&head, elemPtr)); // 18, 25, 100, 30, 11, 10, 9, 5, 70, 20

assert(LL_GetSize(head) == 10);

assert(LL_GetNth(head, 10, val) == false);

assert(LL_GetNth(head, 9, val) == true);

assert(val == 20);

LL_PopFront(&head); // 25, 100, 30, 11, 10, 9, 5, 70, 20

assert(LL_GetSize(head) == 9);

elemPtr = LL_GetNth(head, 0);

assert(elemPtr != nullptr);

assert(elemPtr->data == 25);

LL_PopBack(&head); // 25, 100, 30, 11, 10, 9, 5, 70

assert(LL_GetSize(head) == 8);

assert(LL_GetNth(head, 8, val) == false);

assert(LL_GetNth(head, 7, val) == true);

assert(val == 70);

LL_Reverse(&head); // 70, 5, 9, 10, 11, 30, 100, 25

assert(LL_GetSize(head) == 8);

assert(LL_GetNth(head, 0, val) == true);

assert(val == 70);

assert(LL_GetNth(head, 1, val) == true);

assert(val == 5);

assert(LL_GetNth(head, 2, val) == true);

assert(val == 9);

assert(LL_GetNth(head, 3, val) == true);

assert(val == 10);

assert(LL_GetNth(head, 4, val) == true);

assert(val == 11);

assert(LL_GetNth(head, 5, val) == true);

assert(val == 30);

assert(LL_GetNth(head, 6, val) == true);

assert(val == 100);

assert(LL_GetNth(head, 7, val) == true);

assert(val == 25);

assert(LL_Insert(&head, 2, 10) == true); // 70, 5, 10, 9, 10, 11, 30, 100, 25

ptrdiff_t pos = LL_Find(head, 70);

assert(pos == 0);

pos = LL_Find(head, 25);

assert(pos == 8);

pos = LL_Find(head, 30);

assert(pos == 6);

pos = LL_Find(head, 10);

assert(pos == 2);

pos = LL_Find(head, 2, 10);

assert(pos == 0);

pos = LL_Find(head, 3, 10);

assert(pos == 1);

pos = LL_Find(head, 50);

assert(pos == -1);

LL_Delete(&head);

assert(head == nullptr);

assert(LL_GetSize(head) == 0);

}

Wednesday, 6 August 2014

Data Structure and Algorithm - String Pattern Search

It was a very pleasant train journey after work with a couple friends and talking about the interesting algorithms we have encountered. One of the friends talked about some interesting algorithm for string pattern search. And he could not recall exactly what its name is but it can speed up the string pattern search and do come clever step jumping other than the naive method that simply uses the brutal forces of computers. He mentioned the name of algorithm starting with "D"-something and this algorithm has linear complexity of computation. I thought he means "Dijkstra", which is one of the biggest names in computer science.

To the curiosity towards this algorithm I googled well-known string pattern search algorithm. I think what he means is the Knuth-Morris-Pratt algorithm. "Knuth", another big name of computer science. Still remember the first time I read his 3 magnificent books "The Art of Computer Programming". Even now still I have not the courage to finish some chapters in the third volume.

Again this is my another coding exercise on the train for 20 minutes of test code and 30 minutes of code implementation. See here is the list of algorithms stated on wikipedia about string pattern search.

1. Naive method
This is of course the first method that most of developers can think of. Let's assume
- Text string, S with length m
- Pattern sting, P with length n
As simple as it is, from the beginning to the end [0, m-1], compare S[i] to P[j], where
- i within [0, m-1],
- j within [0, n-1]
If at the first position i, where has S[i+j] = P[j] for every j, where j within [0, n-1]. Then i is the what we want and the match does exist. Otherwise match does not exist and there does not exist such a i.

// String pattern search - Naive method
//********************************************************************************
ptrdiff_t StringPatternSearch_Naive(const std::string& text,
const std::string& pattern)
{
if (text.empty() || pattern.empty()) {
return -1;
}

for (size_t pos_t = 0, pos_p = 0; (pos_t + pos_p) < text.size();) {
if (text[pos_t + pos_p] == pattern[pos_p]) {
++pos_p;
if (pos_p == pattern.size()) {
return static_cast<ptrdiff_t>(pos_t);
}
}
else {
++pos_t;
pos_p = 0;
}
}

return -1;
}
//********************************************************************************

Naive method has the worst case of O(m*n) and average case of O(m+n).

2. Rabin-Karp Algorithm

The idea of Rabin-Karp algorithm is to use hash function to computer the value of every continuous sub-strings with length of n in S and compare the hash value against the hash value of the pattern P. If they are equal, then compare the sub-string in S and pattern P to verify that they are equal.

// pseudo code - Rabin Karp Algorithm
//********************************************************************************
patternHashValue <-- compute hash value of pattern P
for every pos within [0, m-n) in text S
substringHashValue <-- compute hash value of sub-string of text S in [pos, pos+n]
compare patternHashValue against substringHashValue, if true
compare this sub-string against pattern P, if equal
match found and return pos
pos <-- pos +1

not match found and return -1
//********************************************************************************

The optimization of this algorithm happens in hash function. For each computation next to each other only one character needs to computer. Moving from pos to pos+1, S[pos] is replaced by S[pos+n] and all the computation in hash function on S[pos+1],..., S[pos+n-1] has been calculated previously. With a good design/implementation of hash function, Rabin-Karp Algorithm can achieve O(n+m) on average.

3. Knuth-Morris-Pratt Algorithm

The idea of Knuth-Morris-Pratt algorithm is to extract information from pattern P and use this information as the heuristic rules to guide the search. Comparing with naive method this algorithm is trying to increment pos_t and pos_p with variously steps to speed up the performance.

A simple example of this heuristic rule is that: assuming pattern P is consisted of all different characters (no single character appears twice). For say at position i at text S and j at pattern P, S[i+j] is equal to P[j], for every j within [0, k], where k < n-1. But S[i+j+1] is not equal to p[j+1]. pos_t is incremented by 1 and pos_p is assigned to 0, according to naive method. But here we know actually pos_t can be increment by j+1, because every character in pattern P is different and there is no match that S[l], where l within [i, i+j+l] is equal to P[0]. This is telling us that there must be some information in pattern P that can be used as heuristic rules to guide the search.

This heuristic rules are based on the repeat of sub-string in pattern P. The upcoming string is always trying to match from the start of P[0]. More detail please refer to Knuth-Morris-Pratt algorithm on wikipedia. And this algorithm can achieve linear complexity, O(m).

// String pattern search - Knuth-Morris-Pratt

//********************************************************************************

void BuildUpKMPTable(const std::string& pattern, std::vector<ptrdiff_t>& table)

{

if (pattern.empty()) {

return;

}

table.clear();

table.reserve(pattern.size());

table.push_back(-1);

ptrdiff_t matchCount = 0;

for (size_t pos = 1; pos < pattern.size(); ++pos) {

table.push_back(matchCount);

if (pattern[pos] == pattern[matchCount]) {

++matchCount;

}

else {

matchCount = 0;

}

ptrdiff_t StringPatternSearch_KMP(const std::string& text,

const std::string& pattern)

{
if (text.empty() || pattern.empty()) {

return -1;

}

std::vector<ptrdiff_t> kmpTable;

BuildUpKMPTable(pattern, kmpTable);

for (size_t pos_t = 0, pos_p = 0; (pos_t + pos_p) < text.size();) {

if (text[pos_t + pos_p] == pattern[pos_p]) {

++pos_p;

if (pos_p == pattern.size()) {

return static_cast<ptrdiff_t>(pos_t);

}

else {

if (kmpTable[pos_p] < 0) {

++pos_t;

pos_p = 0;

}

else {

pos_t = pos_t + pos_p - kmpTable[pos_p];

pos_p = kmpTable[pos_p];

}

return -1;

}

//********************************************************************************

4. Summary

Finite-state automation search algorithm can also achieve linear complexity but is more expensive on pre-processing than Knuth-Morris-Pratt algorithm. Boyer-Moor search algorithm also process the pattern string to get heuristic rules to guide search but it's worst case is O(m*n). More details about the comparison on these algorithms, please refer to String Search Algorithm on wikipedia.

Bibliography:

[1] http://en.wikipedia.org/wiki/String_searching_algorithm

[2] http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

[3] http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

[4] http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm