steve-thousand/distributed-db-project
For my own education (and for anyone who wants to follow along), this project explores the creation of a distributed system, specifically a simple database.
I plan to perform updates, providing the steps taken in each update here in the readme, as well as tagging each version in git.
Oh, and by the way, don't use this anywhere for any reason. To clarify, I mean that I don't recommend that you rely on this code for anything you may be building. This is an educational project. You are free to use it in any way you want, but I make no guarantees.
Unfortunately, when I set this up I didn't really have much of a plan for how these versions would be decided, so the version system here is a little confusing.
- version 1.0.0 - Just a Simple Key/Value Datastore
- version 2.0.0 - Registry Service, and Service Discovery
- version 2.1.0 - Leader and Follower, the Bully Algorithm
- version 2.2.0 - Simple Replication
- version 3.0.0 - Replication Write-Ahead, the Raft Algorithm
A simple client and server architecture for reading and writing key/value pairs. Data is stored only in memory. Persistence will come in a later version.
I've written a lot of web servers before but usually end up using some framework for it. In this case, I'm just using some native Java classes for the server/HTTP stuff.
This is hardly a database and certainly not distributed, but making it persistent and consistent is next.
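To give a flavor of what "native Java classes for server/HTTP stuff" can look like, here is a minimal sketch of an in-memory key/value HTTP server built on the JDK's bundled `com.sun.net.httpserver` classes. The class name, port, and URL scheme here are illustrative assumptions, not the project's actual code.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

// Minimal in-memory key/value HTTP server using only JDK classes, no framework.
// PUT /db?key=foo (with the value as the request body) writes; GET /db?key=foo reads.
public class SimpleKvServer {

    // All data lives in memory; nothing survives a restart.
    static final ConcurrentHashMap<String, String> STORE = new ConcurrentHashMap<>();

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8050), 0);
        server.createContext("/db", SimpleKvServer::handle);
        server.start();
    }

    static void handle(HttpExchange exchange) throws IOException {
        // Naive query parsing, good enough for a sketch: "key=foo" -> "foo".
        String key = exchange.getRequestURI().getQuery().replace("key=", "");
        if ("PUT".equals(exchange.getRequestMethod())) {
            String value = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            STORE.put(key, value);
            exchange.sendResponseHeaders(204, -1); // no response body
        } else {
            byte[] body = STORE.getOrDefault(key, "").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        }
    }
}
```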
Ok so this was a big one. I got a little carried away with reorganizing the project.
One of my goals is to be able to spin up two or more nodes that are aware of each other and able to communicate with one another. But I do not want to have to explicitly tell each node about the other nodes. This means that nodes need a way of "discovering" one another. I did a little research. There are a few ways to do this. The one I have gone with is a simple registry service. The registry is a server that hangs out and receives requests to register nodes. Each node is given the address of the registry service on startup. Nodes must register themselves with the registry. Eventually, nodes will use the registry service to list all nodes in the cluster to facilitate communication.
One of many changes I included here is use of the fabric8 docker-maven-plugin to build Docker images. Now I can start a registry container
```shell
docker run steve000/distributed-db-project/registry:2.0.0
```
Then start a node container, passing the registry address as an argument
```shell
docker run steve000/distributed-db-project/node:2.0.0 -ra http://172.17.0.2:8080
```
And the node registers itself with the registry
```
[pool-1-thread-1] INFO io.steve000.distributed.db.registry.server.InMemoryRegistry - Registered node 7a735fec-2a39-471c-9f49-f719f6c5b36b at IP 172.17.0.3:8050
```
Sorry that this ended up being such a big one. This includes adding a new "cluster" module that includes the code that will be used by each node to discover and communicate. Making it its own module will hopefully allow it to be used regardless of what the cluster is used for (data replication, distribution, workflows, etc.).
The ultimate goal is to be able to communicate automatically between nodes so that we can distribute data. Someone needs to coordinate that work, so I am thinking of a leader and follower pattern. For that to work, I will implement an election process. Soooooo let's do that!
And since writing that last paragraph I have done some reading, and it turns out that elections are complicated. I have implemented a bully algorithm for choosing a leader. On startup, every node waits some period of time to receive a heartbeat from a leader node. Any node that receives a heartbeat identifies the sending node as the new leader. If it receives none in a configured window of time (most likely to happen if it is the first node in the cluster), it starts an election process.
The election process works like this:
- the node that begins the election considers all registered nodes.
- if there are any higher-id nodes (currently using alphabetic order of names), they are all pinged for liveness. If any of those returns a timely response, the node waits further to receive a victory response. If a victory response is received in that time, we have found our leader.
- if there are no higher-id live nodes, or we do not hear a victory message from them soon enough, then this node is the leader. All lower-id nodes are sent victory messages.
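The decision at the heart of those steps can be boiled down to pure logic: a node declares itself leader only if no live node has a higher id. This is a sketch under that simplification; the real election pings peers over HTTP and waits for victory messages, which the `liveNodeIds` parameter (an assumed name) stands in for.

```java
import java.util.List;

// Core rule of the bully algorithm: the highest-id live node wins.
public class BullyElection {

    /**
     * True if selfId should declare itself leader, given the ids of nodes
     * known to be alive. Uses alphabetic order of names as the id ordering,
     * as described above; any live higher-id node "bullies" us out.
     */
    public static boolean winsElection(String selfId, List<String> liveNodeIds) {
        return liveNodeIds.stream().noneMatch(id -> id.compareTo(selfId) > 0);
    }
}
```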
Of course this requires some changes to other parts of the platform as well. Now that I've entered servers-being-lost territory, the registry and the servers that use it for service discovery need some changes to be able to forget nodes; otherwise I'll end up with endless error logs.
It wasn't easy, and I'm sure there are race conditions lurking there somewhere, but I've got it set up so that multiple nodes can be started with nothing but a registry parameter, and at run time nodes will discover each other and a leader will be chosen. If the leader fails, a new leader will be chosen.
Still working towards the goal of data replication.
It is my understanding of the leader/follower pattern that write requests should go through the leader, which replicates data to the followers. That way, read requests can be served by any node.
I have set up a simple replication layer. It is not fault-tolerant, and it is not transactional. But I've tested it locally with a cluster of 3 nodes (plus 1 registry node). Writes go through the leader and then are successfully replicated to the followers.
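The write path of that simple (non-fault-tolerant, non-transactional) replication layer can be sketched as follows. The followers here are in-process objects standing in for HTTP calls to other nodes, and all names are illustrative assumptions, not the project's classes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of simple leader-driven replication: writes hit the leader, which
// applies them locally and pushes the same write to every follower. Reads
// can then be served by any node.
public class ReplicatingNode {

    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final List<ReplicatingNode> followers;

    public ReplicatingNode(List<ReplicatingNode> followers) {
        this.followers = followers;
    }

    /** Client writes go through the leader and fan out to the followers. */
    public void write(String key, String value) {
        data.put(key, value);
        for (ReplicatingNode follower : followers) {
            follower.data.put(key, value); // in reality, an HTTP replication call
        }
    }

    /** Reads can be answered by any node, leader or follower. */
    public String read(String key) {
        return data.get(key);
    }
}
```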
I am also not sure what to do about replicating existing data to new nodes. One of my goals is to be able to add new nodes to a cluster and have the data replicated to them.
Future improvements will include making it durable and transactional.
To enable replication and ensure consensus across nodes, I am using the Raft algorithm, or at least some half-baked form of it. Each node has a replication log, and the leader sends client actions to each follower to be appended to their logs. Only once all (or ideally a majority) of the followers have successfully appended to their logs does the leader commit to its own log, performing the entry's action and responding to the client. At that point, followers are told to commit as well.
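That append-then-commit flow can be sketched in a few lines. This is a deliberately half-baked illustration under assumed names: it omits Raft's terms, indices, and failure handling, and the follower logs are in-process objects rather than remote nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the write-ahead commit flow: the leader appends an entry to every
// follower's log, and only commits (and answers the client) once a majority
// of the cluster has appended. Followers are then told to commit too.
public class ReplicationLog {

    final List<String> entries = new ArrayList<>();
    int commitIndex = -1; // index of the last committed entry, -1 = nothing committed

    void append(String entry) {
        entries.add(entry);
    }

    void commit(int index) {
        commitIndex = index;
    }

    /** Leader-side flow for a single client action. Returns true if committed. */
    static boolean leaderWrite(ReplicationLog leader, List<ReplicationLog> followers, String entry) {
        leader.append(entry);
        int acks = 1; // the leader's own log counts toward the majority
        for (ReplicationLog follower : followers) {
            follower.append(entry); // in reality an RPC that can fail or time out
            acks++;
        }
        int clusterSize = followers.size() + 1;
        if (acks > clusterSize / 2) {
            int index = leader.entries.size() - 1;
            leader.commit(index);                    // leader commits first...
            followers.forEach(f -> f.commit(index)); // ...then followers are told to commit
            return true;
        }
        return false;
    }
}
```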
This is a pretty awesome visual demonstration of the Raft algorithm.
I've also set it up so that new nodes can request a replication-log sync with the leader. This sync action could be used to repair disagreements between nodes as well.
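In its simplest form, that sync amounts to a node discarding its own log and adopting the leader's, which also repairs any disagreement. A sketch under assumed names; the real sync presumably transfers the log over HTTP rather than sharing lists in-process.

```java
import java.util.List;

// Sketch of a full log sync: the joining (or diverged) node throws away its
// local log and copies the leader's wholesale. Crude, but it both bootstraps
// new nodes and resolves disagreements in the leader's favor.
public class LogSync {

    static void syncFromLeader(List<String> leaderLog, List<String> localLog) {
        localLog.clear();           // discard whatever this node had
        localLog.addAll(leaderLog); // adopt the leader's log as-is
    }
}
```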