- Notifications
You must be signed in to change notification settings - Fork0
This project (Sorting and Searching Algorithms, AVL/Red-Black Trees , Hashing with Chains) was implemented as part of the Data Structures course during my second year of studies at the Department of Computer Engineering and Informatics (CEID) at the University of Patras.
License
Sofosss/DataStructures
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
A simple introductory project in the field of data structures. It includes the implementation of typical sorting and searching algorithms (along with some alternative versions and optimizations) (PART I), the development of BST trees (AVL and Red-Black) and the implementation of a Hashing structure using chains with linked lists (PART II).
- About the project
- PART I: Sorting and Searching Algorithms
- PART II: Binary Search Trees and Hashing
- Contact
- License
The project was implemented as part of the Data Structures course during my second year of studies at the Department of Computer Engineering and Informatics (CEID) at the University of Patras.
In the first part of the project, algorithms for sorting and searching are implemented on a dataset (provided in this repository) containing historical data with daily values for three stocks traded on the NYSE and NASDAQ stock exchanges from 2005 to 2017. Specifically, the implemented sorting algorithms includeMergeSort,QuickSort,HeapSort andCountingSort. The searching algorithms encompassBinary Search,Interpolation Search andBinary Interpolation Search (BIS).
In the second part, binary search trees are developed, specificallyAVL andRed-Black (RB) trees. Each node of these trees maintains the Date and Volume fields of a stock record from the corresponding input file. The binary search trees are ordered based on a specific field (initially Date, with a later implementation for the Volume field) and they are implemented with dynamic memory management. In case of modification, insertion or deletion of a record (e.g., a tree node), the tree is appropriately readjusted, adhering strictly to the specifications of its structure. Additionally, aHash Chain structure is provided as an alternative implementation.
MergeSort and QuickSort: InPartI_A, theMergeSort andQuickSort algorithms are implemented with the purpose of sorting dates (
Date
field) in ascending order based on the opening values (Open
field). For QuickSort, three different implementations (variations) are provided. The first implementation corresponds to theStandard QuickSort, the second one to theRandomized QuickSort and the third one to theMedian of Three logic.HeapSort and CountingSort: InPartI_B, theHeapSort andCountingSort algorithms are implemented with the objective of sorting dates (
Date
field) in ascending order based on the opening values (Open
field). For CountingSort, two different implementations (variations) are provided. The first implementation corresponds to theStandard CountingSort and the second one to theExtended CountingSort.The Standard CountingSort operates on the integer values of the Close field (Close: a field of a stock record). However, it does not produce a correctly sorted array (e.g., data records with Close values of 35,510, 35,243, 35,210 appear in the sorted array in that order). The Extended CountingSort operates on the values of 1000*Close, considering them as integers since the Close values of data records have three decimal places. Thus, it produces a correctly sorted array at the expense of requiring a larger space for the C array, which is 1000 times larger than the C array of the standard CountingSort. [ More precisely, it holds that 1.000*[(int(max)–int(min)-1]+1 ≤ size of array C ≤ 1.000*[(int(max)–int(min)+1]-1, where max and min are the maximum and minimum decimal values and (int)max, (int)min are the maximum and minimum integer values (integer part of the decimal value) of the Close field, respectively. ]Binary Search and Interpolation Search:InPartI_C, the algorithmsBinary Search andInterpolation Search are implemented with the purpose of finding the total transaction volume (
Volume
field) for the stocks in the corresponding file, based on a specific date (Date
field).Binary Interpolation Search and Binary Interpolation Search Improved:InPartI_D, the algorithmsBinary Interpolation Search andBinary Interpolation Search Improved are implemented with the purpose of finding the total transaction volume (
Volume
field) for the stocks in the corresponding file, based on a specific date (Date
field). In both variations, the average-case time complexity remainsO(lognlogn). The Binary Interpolation Search variation performs jumps of size √n until it locates the block of size √n that potentially contains the search element x. In the next step, the jumps have a size of √√n and so on. This approach improves theworst-case time complexity from O(n) to O(√n). The Improved Binary Interpolation Search variation executes exponentially large jumps (of size √n, 2*√n, 2^2*√n, etc.) until it identifies the block that may contain x. Then, it calls the Binary Search algorithm for that block with elements at positions, for example, m+2^(i-1)*√n to m+2^i*√n in array S. This helps locate the block of size √n that potentially contains x. In one step, at most logn jumps are performed and the Binary Search algorithm requires at most log√n comparisons. Consequently, the total time complexity isO(logn) and in the next step it requiresO(log√n) and so on, resulting in aworst-case time complexity of O(logn).
Note
For the creation of the executable files of PART I, aMakefile is provided.
InPartII_A_AVL |PartII_A_RB, a Binary Search Tree (BST) is implemented as an AVL | RED-BLACK node-oriented tree for the records of the corresponding input stock file. Each node of the tree maintains theDate
andVolume
fields of a stock record.
The operations to be executed on the structure of each individual BST are:
- Displaying the Binary Search Tree (BST) with an in-order traversal. Each representation should include a header with the titles of the elements of the records being depicted.
- Searching for the volume trading price (
Volume
) based on the date (Date
) provided by the user. - Modifying the volume trading price corresponding to a specific date, which will be provided by the user.
- Deleting a record corresponding to a specific date, which will be provided by the user.
The structure of a node in AVL:
structdateVolume// Data record stored in binary tree node{charDate[11];intVolume;};typedefstructdateVolumedataItem;structbinaryTreeNode// Binary Search Tree node implemented as AVL tree node{dataItemdata;structbinaryTreeNode*left;structbinaryTreeNode*right;intheight;};typedefstructbinaryTreeNodebtNode;btNode*root=NULL;// Root of the tree initially empty
The structure of a node in Red-Black(RB):
structdateVolume// Data record stored in a binary tree node{charDate[11];intVolume;};typedefstructdateVolumedataItem;structbinaryTreeNode// Binary Search Tree node implemented as RED-BLACK tree node{dataItemdata;structbinaryTreeNode*left;structbinaryTreeNode*right;structbinaryTreeNode*parent;charcolor;};typedefstructbinaryTreeNodebtNode;btNode*root=NULL;// Root of the tree initially empty
The requested operations are implemented through the following functions:
voidinorderBinTree(btNode*r)btNode*searchBinTree(btNode*r,charx[11])deleteFromBinTree(btNode*r,charx[11])
(FordeleteFromBinTree()
, in the case of an AVL tree, it returns a pointer to a node in the tree, while in the case of a RED-BLACK tree it does not return any value).
ThemodifyBinTree()
function has not been implemented separately.
For each different structure (AVL, RED-BLACK), a set of auxiliary functions is used (e.g.,rotateL()
,rotateR()
,btNodeHeight()
,btNodeBalance()
,uncle()
,sibling()
,swapColors()
,fixRedRed()
,fixDoubleBlack()
, etc.) to implement the basic operations.
📌 The creation of the tree is done with the help of the corresponding functioninsertToBinaryTree(dataItem x)
, which is called byreadFileToBinaryTree()
every time a data record is read. In the case of the AVL tree, the function returns a pointer to a tree node, while in the case of the RED-BLACK tree, it does not return any value. The insertion of duplicate records (data records with the same value in theDate
field) is not allowed. Practically, since all values in theDate
field are unique, there are no duplicate records (the precaution against inserting duplicates is taken for completeness reasons).
📌 The rebalancing of the subtree with the root being the node T1 is performed using the functions btNode*rotateL(btNode *T1)
and btNode*rotateR(btNode *T1)
, which implement the single left and right rotations. Double rotation involves either a single left and a single right rotation or a single right and a single left rotation.
Note
In the case of theAVL tree, the insertion of a new node is concluded with at most a single or double terminal rotation, while in the deletion of a node, rotations may propagate up to the root of the tree.
Note
In the case of theRED-BLACK tree, for each insertion or deletion operation of a node at most one single and one double terminal rotation are executed.
Caution
When a violation of the properties of theRED-BLACK tree occurs in the node 'r', the functionsvoid fixRedRed(btNode *r)
andvoid fixDoubleBlack(btNode *r)
are called to correct the problem.
fixRedRed()
is called during the insertion of a new node into the tree when two consecutive red nodes are created (red parent and red child).fixDoubleBlack()
is called during the deletion of a black node that leads to a reduction by 1 in the number of black nodes on the paths from the root of the tree that, before the deletion, they(the paths) passed through the deleted node.
Both functions (fixRedRed() and fixDoubleBlack()
) are recursive, operate level-wise and perform color changes on the nodes, which may propagate up to the root of the tree along with O(1) rotations.
For the representation of the nodes in each tree, the functionsinorderBinTree()
andprintBinTree()
are utilized:
IninorderBinTree()
, the fundamental information (Date, Volume
) of each data record stored in the tree nodes is displayed, along with a set of auxiliary information primarily for testing purposes.In the AVL tree implementation, additional information such as the height and balance factor of each node is displayed.In the RED-BLACK tree implementation, for each node with one or zero children (leaf), the color of the node, the count of black nodes in the path from the node to the tree's root, as well as the total count of red and black nodes in the path are displayed.
The functionprintBinTree()
prints the structure of the tree from left to right with theVolume
information of each node.
- In the AVL tree implementation, the balance factor of each node is additionally printed.
- In the RED-BLACK tree implementation, the color of each node is printed.
As in the previous case, inPartII_B_AVL |PartII_B_RB the tree is constructed with the functioninsertToBinTree(). The insertion of duplicate records (data records with identical values in theDate
andVolume
fields) is not allowed. As mentioned earlier, since all values in theDate
field are unique, there are no duplicate records (the precaution against inserting duplicates is taken for completeness reasons).
In this case (Volume
based BST), the operations to be executed on the structure of each individual BST are:
- Finding the day/days with theMINIMUM trading volume.
- Finding the day/days with theMAXIMUM trading volume.
The requested operations are implemented through the following functions:
btNode *minValuebtNode(btNode *r)
: Returns a pointer to the node in the tree with the smallestVolume
value (leftmost node of the tree with only a right child or is a leaf) or NULL if the tree is empty.btNode *maxValuebtNode(btNode *r)
: Returns a pointer to the node in the tree with the largestVolume
value (rightmost node of the tree with only a left child or is a leaf) or NULL if the tree is empty.void reportBinTree(btNode *r, int x)
: Prints theDate
values of data records that have aVolume
value equal to x. [reportBinTree()
is quite general: it can easily be modified to print theDate
values of data records withVolume
values between x1 and x2, where x1 < x2. The time complexity for a tree with n data records is O(logn + k), where k is the size of the answer (the number of data records withDate
values within the given range). ]
Finally, inPartII_C, for the records of the corresponding input stock file, we replace the Binary Search Tree (BST) based on theDate
field using a Hashing structure withchained (linked) lists. The hash function is computed as the modulus of the sum of ASCII codes of individual characters forming theDATE
, divided by an odd number m, which represents the number of buckets. For instance, withDATE = "2014-02-13"
andm=11
, the formula is:
Hash("2014-02-13") = [ASCII(‘2’)+ ASCII(‘0’)+ ASCII(‘1’)+ ASCII(‘4’)+ ASCII(‘-’)+ ASCII(‘0’)+ ASCII(‘2’)+ ASCII(‘-’)+ ASCII(‘1’)+ ASCII(‘3’)] mod 11
The structure of the data stored in the buckets of the hash chain is as follows:
#defineM 11 // Size of Hash tablestructdateVolume// Data record stored in Hash table Bucket list{charDate[11];intVolume;};typedefstructdateVolumedataItem;
The structure of the buckets of the hash chain is as follows:
structlistNode// Bucket list node{dataItemdata;structlistNode*next;};typedefstructlistNodelNode;lNode*hashTable[M]= {NULL};// Hash table of M buckets initially empty
In this case (Hashing structure with chained (linked) lists), the operations to be executed are:
- Search for the transaction volume based on the date provided by the user.
- Modify the record entries based on the date provided by the user. The modification specifically concernsONLY the transaction volume field.
- Delete a record from the hash table based on the date provided by the user.
Note
The creation of the hash structure (buckets with linked lists) is done with the help of the functionvoid insertToHashTable(dataItem x)
, which is called byreadFileToHashTable()
each time a data record is read.
Note
The insertion of duplicate records (data records with the same Date) is not allowed. As mentioned earlier, since all values in theDate
field are unique, there are no duplicate records (the handling of not allowing duplicate records is here for completeness).
The requested functionalities are implemented through the operations:
lNode*searchHashTable(charx[11])voiddeleteFromHashTable(charx[11])
Before the user exits the program, we print the contents of the Hash Table usingvoid displayHashTable()
.
Note
For the creation of the executable files of PART I, aMakefile is provided.
Important
In thedirectory, the filesPartII_AB_AVL,PartII_AB_RB,PartII_ABC_AVL andPartII_ABC_RB are also provided:
PartII_AB_AVL.cpp
: It supports the functions of the first two subsections of Part II on the AVL BST structure.PartII_AB_RB.cpp
: It supports the functions of the first two subsections of Part II on the Red-Black BST structure.PartII_ABC_AVL.cpp
: It supports the functions of the first two subsections of Part II on the AVL BST structure, as well as the Hashing with Chained Linked Lists structure along with the corresponding tasks.PartII_ABC_RB.cpp
: It supports the functions of the first two subsections of Part II on the Red-Black BST structure, as well as the Hashing with Chained Linked Lists structure along with the corresponding tasks.
- Sofotasios Argiris |a.sofotasios@ac.upatras.gr
Distributed under theMIT License. SeeLICENSE.md
for more details.
About
This project (Sorting and Searching Algorithms, AVL/Red-Black Trees , Hashing with Chains) was implemented as part of the Data Structures course during my second year of studies at the Department of Computer Engineering and Informatics (CEID) at the University of Patras.