CSE 512 - Distributed and Parallel Database Systems
Project description
The project aimed to set up a Spark cluster with HDFS and run geospatial SparkSQL queries on it.
Spark's native (standalone) cluster manager was used to manage the cluster.
Hadoop Distributed File System (HDFS) was used as the distributed storage system.
The setup was done using Amazon EC2 virtual machines as nodes.
Spatial queries such as range query, range join query, distance query, distance join query, hot zone analysis and hot cell analysis were executed.
The spatial queries relied on user-defined functions, such as ST_contains and ST_within, implemented in Scala.
ST_contains takes a point and a rectangle and returns a boolean indicating whether the point is inside the rectangle.
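A minimal sketch of what such a point-in-rectangle check could look like is shown below. It is not the project's actual code. The string encodings (`"x,y"` for a point, `"x1,y1,x2,y2"` for a rectangle given by two opposite corners), the name `stContains`, the wrapper object `StContainsSketch`, and the choice to treat points on the boundary as inside are all assumptions made for illustration.

```scala
object StContainsSketch {
  // Assumed encodings (not confirmed by the project description):
  //   point     = "x,y"
  //   rectangle = "x1,y1,x2,y2" (any two opposite corners)
  def stContains(pointString: String, rectangleString: String): Boolean = {
    val p = pointString.split(",").map(_.trim.toDouble)
    val r = rectangleString.split(",").map(_.trim.toDouble)

    if (p.length != 2 || r.length != 4) {
      // Wrong number of coordinates: treat as "not contained".
      false
    } else {
      // Normalise the corners so the order in which they are given does not matter.
      val minX = math.min(r(0), r(2))
      val maxX = math.max(r(0), r(2))
      val minY = math.min(r(1), r(3))
      val maxY = math.max(r(1), r(3))

      // Assumption: points lying exactly on the boundary count as inside.
      p(0) >= minX && p(0) <= maxX && p(1) >= minY && p(1) <= maxY
    }
  }
}
```

In a Spark application, a function like this would typically be registered through `spark.udf.register` before being referenced in a SQL query. Non-numeric coordinates make `toDouble` throw a `NumberFormatException`, so real input may need validation first.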
ST_within takes two points and a distance and returns a boolean indicating whether the distance between the points is no greater than the given distance.
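The distance check can be sketched in the same way. This is an illustration under stated assumptions rather than the project's implementation: points are assumed to be encoded as `"x,y"` strings, the distance is assumed to be Euclidean in the plane, and the names `stWithin` and `StWithinSketch` are hypothetical.

```scala
object StWithinSketch {
  // Assumed encoding (not confirmed by the project description): point = "x,y".
  def stWithin(pointString1: String, pointString2: String, distance: Double): Boolean = {
    val a = pointString1.split(",").map(_.trim.toDouble)
    val b = pointString2.split(",").map(_.trim.toDouble)

    if (a.length != 2 || b.length != 2) {
      // Wrong number of coordinates: treat as "not within".
      false
    } else {
      // Assumption: planar Euclidean distance. "Not more than" means the
      // comparison is inclusive, so a point exactly at the limit qualifies.
      math.hypot(a(0) - b(0), a(1) - b(1)) <= distance
    }
  }
}
```

For example, `StWithinSketch.stWithin("0,0", "3,4", 5.0)` returns `true`, because the two points are exactly 5 units apart and the comparison is inclusive.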