# airtonjal/centrality-nubank
A simple RESTful server application to solve the graph centrality problem
In this challenge, suppose we are looking to do social network analysis for prospective customers. We want to extract from their social network a metric called "closeness centrality".
Centrality metrics try to approximate a measure of influence of an individual within a social network. The distance between any two vertices is the length of the shortest path between them. The farness of a given vertex v is the sum of the distances from every other vertex to v. Finally, the closeness of a vertex v is the inverse of its farness.
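In symbols, writing d(u, v) for the shortest-path distance between u and v, the definitions above read:

```math
\mathrm{farness}(v) = \sum_{u \neq v} d(u, v), \qquad
\mathrm{closeness}(v) = \frac{1}{\mathrm{farness}(v)}
```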
The first part of the challenge is to rank the vertices in a given undirected graph by their closeness. The graph is provided in the attached file; each line of the file consists of two vertex names separated by a single space, representing an edge between those two nodes.
The second part of the challenge is to create a RESTful web server with endpoints to register edges and to render a ranking of vertices sorted by centrality. We can think of the centrality value for a node as an initial "score" for that customer.
The third and final part is to add another endpoint to flag a customer node as "fraudulent". It should take a vertex id and update the internal customer scores as follows:
- The fraudulent customer score should be zero.
- Customers directly related to the "fraudulent" customer should have their score halved.
- More generally, scores of customers indirectly referred by the "fraudulent" customer should be multiplied by a coefficient F:
F(k) = (1 - (1/2)^k)
where k is the length of the shortest path from the "fraudulent" customer to the customer in question.
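The coefficient above generalizes the first two rules: at k = 1 it halves the score, and its effect fades with distance. A minimal sketch (the name `fraudCoefficient` is illustrative, not from the repo):

```scala
// Fraud coefficient F(k) = 1 - (1/2)^k, where k is the shortest-path
// distance from the fraudulent vertex to the affected customer.
def fraudCoefficient(k: Int): Double = 1.0 - math.pow(0.5, k)
```

Direct neighbours (k = 1) get a coefficient of 0.5, matching the "halved" rule; at k = 2 it is 0.75, at k = 3 it is 0.875, and it approaches 1 as k grows.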
- Edge weight == 1
- edges.txt must be read and loaded when the application starts
- The REST server may receive new vertices that did not previously exist in the graph
- The graph is not necessarily connected
- Read edges.txt and build an adjacency map (given a vertex, it returns the adjacent vertices)
- Implement Breadth-First Search (BFS) to find the shortest paths from a vertex to every other vertex in the graph
- Run BFS from every vertex, store the results in a map, then calculate and sort the closeness values
- When a fraud is signaled, recalculate and sort the closeness of every connected node
- When an edge is added, recalculate and sort the closeness of the whole graph
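The first three steps above can be sketched as follows (a minimal, self-contained version; the names `adjacency`, `bfs`, and `closeness` are illustrative, not the repo's API):

```scala
import scala.collection.immutable.Queue

// Build an undirected adjacency map from edge pairs.
def adjacency(edges: Seq[(Int, Int)]): Map[Int, Set[Int]] =
  edges.foldLeft(Map.empty[Int, Set[Int]].withDefaultValue(Set.empty)) {
    case (adj, (a, b)) => adj.updated(a, adj(a) + b).updated(b, adj(b) + a)
  }

// BFS shortest-path distances from `source` (every edge has weight 1).
// Only reachable vertices appear in the result, so a disconnected
// graph is handled naturally.
def bfs(adj: Map[Int, Set[Int]], source: Int): Map[Int, Int] = {
  @annotation.tailrec
  def loop(queue: Queue[Int], dist: Map[Int, Int]): Map[Int, Int] =
    queue.dequeueOption match {
      case None => dist
      case Some((v, rest)) =>
        val undiscovered = adj(v).filterNot(dist.contains)
        loop(rest.enqueueAll(undiscovered),
             dist ++ undiscovered.map(_ -> (dist(v) + 1)))
    }
  loop(Queue(source), Map(source -> 0))
}

// Closeness: the inverse of the sum of distances from v.
def closeness(adj: Map[Int, Set[Int]], v: Int): Double =
  1.0 / bfs(adj, v).values.sum
```

Ranking all vertices is then a matter of mapping `closeness` over the key set of the adjacency map and sorting by the resulting score.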
The application uses Scala's SBT build tool. To build it, cd to the project root directory and type:
sbt package
To run the application and start the REST server:
sbt run
To run tests (developed with ScalaTest):
sbt test
Services are invoked through HTTP requests. The data format is JSON. The following table describes the services:
Service | Description | Path | Method | Return |
---|---|---|---|---|
List | Lists the vertices sorted by score | `/graph/list` | GET | A JSON array like `[{"vertex":44,"score":0.005988024}, {"vertex":88,"score":0.00591716}, {"vertex":33,"score":0.005882353}, ...]` |
Add | Adds a new edge to the graph | `/edge/${v1}/${v2}` | POST | A string confirming the edge was added |
Fraud | Flags a vertex as fraudulent | `/fraud/${v}` | PUT | A string confirming the vertex was marked |
- Implement core functionality using Akka Actors
- Avoid recalculating the closeness of the whole graph when an edge is added
- Use a dependency injection framework like Scaldi or Guice
- Implement the Floyd-Warshall algorithm (it could perhaps be simplified, since every edge weight is 1)
- Use breadth-first search, probably better than Floyd-Warshall here
- Each time a vertex is added, the closeness of every vertex must be recalculated. Can this be optimized?
- When a fraud is detected, the closeness of every vertex must be recalculated. Can this be optimized?
Other ideas:
- Use lazy evaluation?
- Dijkstra's algorithm (likely unnecessary while every edge weight is 1, since BFS already yields shortest paths)