
Cassandra Introduction & Key Features
Meetup Vienna Cassandra Users, 13th of January 2014
philipp.potisk@geroba.com

Definition
Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web [The Definitive Guide, Eben Hewitt, 2010]
13/01/2014, Cassandra Introduction & Key Features by Philipp Potisk

History
• Dynamo, 2007
• Bigtable, 2006
• Open source, 2008

Key Features
• Distributed and Decentralized
• Elastic Scalability
• High Availability and Fault Tolerance
• Tunable Consistency
• Column-oriented Key-Value store
• CQL – A SQL-like query interface
• High Performance

Distributed and Decentralized
• Distributed: capable of running on multiple machines
• Decentralized: no single point of failure
  • No master/slave issues, thanks to the peer-to-peer architecture (nodes exchange state via the "gossip" protocol)
  • A single Cassandra cluster may run across geographically dispersed data centers
[Figure: one ring of nodes spanning Datacenter 1 and Datacenter 2; read and write requests can be sent to any node]
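To make the gossip idea concrete, here is a toy simulation in Python. It is a sketch under simplifying assumptions, not Cassandra's implementation: every node keeps a map of heartbeat versions (a hypothetical `View`), bumps its own heartbeat each round, and reconciles its map with one partner by keeping the higher version per node. The names `merge` and `simulate` are hypothetical, and partner selection is deterministic only to keep the example reproducible; real gossip picks peers at random and exchanges richer endpoint state.

```python
from typing import Dict, List

View = Dict[str, int]  # node name -> highest heartbeat version seen (hypothetical)


def merge(a: View, b: View) -> View:
    """Combine two views, keeping the newest heartbeat seen for every node."""
    merged = dict(a)
    for node, version in b.items():
        merged[node] = max(merged.get(node, version), version)
    return merged


def simulate(n_nodes: int, rounds: int) -> Dict[str, View]:
    """Run a toy gossip protocol and return every node's final view."""
    if n_nodes < 2:
        raise ValueError("gossip needs at least two nodes")
    peers: List[str] = [f"node{i}" for i in range(n_nodes)]
    # Initially each node only knows about itself.
    views: Dict[str, View] = {p: {p: 0} for p in peers}
    for r in range(rounds):
        # Every node advertises that it is alive by bumping its own heartbeat.
        for p in peers:
            views[p][p] += 1
        # Each node syncs with one partner; the offset rotates between rounds
        # (an assumption for reproducibility, real gossip chooses randomly).
        offset = 1 + r % (n_nodes - 1)
        for i, p in enumerate(peers):
            partner = peers[(i + offset) % n_nodes]
            combined = merge(views[p], views[partner])
            views[p] = dict(combined)
            views[partner] = dict(combined)
    return views


if __name__ == "__main__":
    for node, view in simulate(n_nodes=6, rounds=3).items():
        print(node, sorted(view.items()))
```

After `simulate(6, 3)` every node's view lists all six nodes, although no node talks to all the others in a round and no master coordinates the exchange.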

Elastic Scalability
• Cassandra scales horizontally, by adding more machines that each hold all or some of the data
• Adding nodes increases throughput linearly
• Decreasing and increasing the node count happens seamlessly
[Figure: a 4-node cluster delivers throughput N; the same cluster grown to 8 nodes delivers N x 2]
Linearly scales to terabytes and petabytes of data
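The way a ring absorbs a new node can be sketched with consistent hashing, the Dynamo-style scheme behind Cassandra's data distribution. This is a minimal sketch, assuming one token per node derived from an MD5 hash of the node name (real clusters assign tokens explicitly or via virtual nodes) and ignoring replication; the `Ring` class and the node and key names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str) -> int:
    """Hash a string onto the ring (MD5, in the spirit of RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Minimal consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes: List[str]) -> None:
        self.tokens: List[int] = []    # sorted node tokens
        self.owner: Dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Assumption: a node's token is the hash of its name.
        t = token(node)
        bisect.insort(self.tokens, t)
        self.owner[t] = node

    def node_for(self, key: str) -> str:
        """The first node token at or after the key's token owns the key,
        wrapping around past the largest token."""
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"])
    keys = [f"user{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node5")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to node5")
```

Only keys in the range claimed by the new node change owner, and all of them move to that node; every other key stays where it was, so nodes can be added without reshuffling the whole data set.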

Scaling Benchmark by Netflix*
48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively. Each client generated ~20,000 writes/s, each write 400 bytes in size.
“Cassandra scales linearly far beyond our current capacity requirements, and very rapid deployment automation makes it easy to manage. In particular, benchmarking in the cloud is fast, cheap and scalable, …”
* http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
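A back-of-the-envelope check of the benchmark setup, assuming every client sustained the ~20,000 writes/s stated on the slide (the achieved rates are reported in the Netflix post, not here). The hypothetical helper `offered_load` computes the total offered load, the load per instance, and the resulting data rate.

```python
from typing import List, Tuple

# (instances, clients) per run, as listed on the slide.
RUNS: List[Tuple[int, int]] = [(48, 10), (96, 20), (144, 30), (288, 60)]
WRITES_PER_CLIENT = 20_000  # approximate: "~20,000 writes/s" per client
WRITE_SIZE_BYTES = 400


def offered_load(instances: int, clients: int) -> Tuple[int, float, float]:
    """Return (total writes/s, writes/s per instance, MB/s written)."""
    total = clients * WRITES_PER_CLIENT
    per_instance = total / instances
    mb_per_s = total * WRITE_SIZE_BYTES / 1_000_000
    return total, per_instance, mb_per_s


if __name__ == "__main__":
    for instances, clients in RUNS:
        total, per_instance, mb = offered_load(instances, clients)
        print(f"{instances} instances: {total:,} writes/s, "
              f"{per_instance:,.0f} per instance, {mb:,.0f} MB/s")
```

For all four runs this gives about 4,167 writes/s per instance, rising from 200,000 writes/s in total (about 80 MB/s) at 48 instances to 1,200,000 writes/s (about 480 MB/s) at 288. Because the per-instance load stays constant, linear scaling means the cluster keeps up as instances and clients grow in step.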

High Availability and Fault Tolerance
• What is high availability?
  • Multiple networked computers operating in a cluster
  • A facility for recognizing node failures
  • Failing over requests to another part of the system
• Cassandra is highly available
  • No single point of failure, due to the peer-to-peer architecture
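Cassandra's facility for recognizing node failures is based on the phi accrual failure detector. The Python sketch below is a simplified version for illustration, not Cassandra's code: it assumes heartbeat inter-arrival times are roughly exponential, so the suspicion level is phi = elapsed / (mean interval × ln 10), and it treats a node as down once phi exceeds a threshold. The default threshold of 8 mirrors Cassandra's `phi_convict_threshold` setting; the class and method names are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Simplified phi accrual failure detector (exponential approximation).

    phi = -log10(P(no heartbeat for `elapsed` seconds)) with
    P = exp(-elapsed / mean), which simplifies to elapsed / (mean * ln 10).
    """

    def __init__(self, threshold: float = 8.0, window: int = 1000) -> None:
        self.threshold = threshold
        # Sliding window of recent heartbeat inter-arrival times (seconds).
        self.intervals: Deque[float] = deque(maxlen=window)
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now`."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level; 0.0 until at least one interval is known."""
        if self.last_heartbeat is None or not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_down(self, now: float) -> bool:
        return self.phi(now) > self.threshold


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in range(11):            # heartbeats every second, t = 0..10
        detector.heartbeat(float(t))
    for now in (11.0, 20.0, 30.0):
        print(now, round(detector.phi(now), 2), detector.is_down(now))
```

With heartbeats every second, phi is about 0.43 one second after the last heartbeat, and the node is only convicted after roughly 18.4 seconds of silence (8 × ln 10). A larger threshold trades slower detection for fewer false positives; once a node is considered down, requests can be failed over to other replicas.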

Tunable Consistency
• Choose between strong and eventual consistency
• Adjustable separately for read and write operations
• Conflicts are resolved during reads, as the focus lies on write performance
[Figure: a "tunable" slider between Available and Consistency, i.e. a use-case-dependent level of consistency]
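Cassandra resolves conflicting replica versions by write timestamp: the cell with the newest timestamp wins, and replicas that returned an older version can be repaired (read repair). A minimal Python sketch of that reconciliation step, with hypothetical `Cell`, `reconcile`, and `read_with_repair` names; breaking timestamp ties by comparing values is an assumption made here so the result is deterministic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp chosen by the writer (e.g. microseconds)


def reconcile(responses: List[Cell]) -> Cell:
    """Last write wins: pick the highest timestamp, ties broken by value."""
    return max(responses, key=lambda c: (c.timestamp, c.value))


def read_with_repair(responses: List[Cell]) -> Tuple[Cell, List[int]]:
    """Return the winning cell and the indices of replicas holding stale data."""
    winner = reconcile(responses)
    stale = [i for i, cell in enumerate(responses) if cell != winner]
    return winner, stale


if __name__ == "__main__":
    replies = [Cell("t1", 100), Cell("t2", 200), Cell("t1", 100)]
    winner, stale = read_with_repair(replies)
    print(winner, "repair replicas:", stale)
```

Because writes never check existing data, the cost of choosing a winner is paid on the read path, which matches the slide's point that the focus lies on write performance.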

When do we have strong consistency?
• Simple formula: (nodes_written + nodes_read) > replication_factor
  • Ensures that a read always reflects the most recent write
• If not: weak consistency, i.e. eventually consistent
[Figure: row jsmith is replicated to 3 nodes (RF: 3). A write of version t2 reaches 2 nodes (NW: 2) while the third still holds t1; a read from 2 nodes (NR: 2) therefore always includes at least one copy of t2]
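The formula can be checked by brute force: a read is guaranteed to see the latest write exactly when every possible set of replicas written overlaps every possible set of replicas read. The sketch below compares the slide's rule with an exhaustive check over small replication factors (function names are hypothetical). With RF = 3, QUORUM means 2 replicas (floor(3/2) + 1), which is the NW: 2 / NR: 2 case from the figure.

```python
from itertools import combinations


def is_strongly_consistent(nodes_written: int, nodes_read: int,
                           replication_factor: int) -> bool:
    """The slide's rule: (nodes_written + nodes_read) > replication_factor."""
    return nodes_written + nodes_read > replication_factor


def every_read_sees_write(nodes_written: int, nodes_read: int,
                          replication_factor: int) -> bool:
    """Exhaustive check: does every read set share a replica with every write set?"""
    replicas = range(replication_factor)
    return all(
        set(w) & set(r)  # non-empty intersection means the read sees the write
        for w in combinations(replicas, nodes_written)
        for r in combinations(replicas, nodes_read)
    )


if __name__ == "__main__":
    for rf in range(1, 6):
        for nw in range(1, rf + 1):
            for nr in range(1, rf + 1):
                assert is_strongly_consistent(nw, nr, rf) == \
                    every_read_sees_write(nw, nr, rf)
    print("rule matches the exhaustive check for RF 1..5")
```

By the pigeonhole principle, write and read sets of sizes NW and NR drawn from RF replicas must share a replica whenever NW + NR > RF, and can be disjoint otherwise, so the two functions always agree.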

Column-oriented Key-Value Store
[Figure: Row Key1 → ColumnKey1: ColumnValue1, ColumnKey2: ColumnValue2, ColumnKey3: ColumnValue3, …; rows are stored sorted by row key*, and columns within a row are stored sorted by column key]
• Data is stored in sparse multidimensional hash tables
• A row can have multiple columns, and not every row needs the same number of columns
• Each row has a unique key, which also determines partitioning
• No relations!
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
* Row keys (partition keys) should be hashed in order to distribute data evenly across the cluster
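The Map<RowKey, SortedMap<ColumnKey, ColumnValue>> model can be sketched in Python with a dictionary of rows whose column keys are kept sorted, so a range of columns within one row can be read with two binary searches. This is illustrative only: the `ColumnFamily` class, its methods, and the sample rows are hypothetical, and row keys are kept in a plain dict rather than ordered by their hashed token as a real partitioner would.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[List[str], Dict[str, str]]  # (sorted column keys, column values)


class ColumnFamily:
    """Toy Map<RowKey, SortedMap<ColumnKey, ColumnValue>> (sparse rows)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def insert(self, row_key: str, column_key: str, value: str) -> None:
        keys, values = self.rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            bisect.insort(keys, column_key)  # keep column keys sorted
        values[column_key] = value           # overwrite if the column exists

    def slice(self, row_key: str, start: str, end: str) -> List[Tuple[str, str]]:
        """Columns with start <= column_key <= end, in column-key order."""
        keys, values = self.rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]


if __name__ == "__main__":
    cf = ColumnFamily()
    cf.insert("jsmith", "last", "Smith")
    cf.insert("jsmith", "first", "John")
    cf.insert("jsmith", "email", "j@example.com")
    cf.insert("mjones", "first", "Mary")
    print(cf.slice("jsmith", "a", "g"))
```

Here `jsmith` and `mjones` hold different numbers of columns, matching the sparse layout on the slide.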

CQL – An SQL-like query interface
• “CQL 3 is the default and primary interface into the Cassandra DBMS” *
• Familiar SQL-like syntax that maps to Cassandra's storage engine and simplifies data modelling

    CREATE TABLE songs (
        id uuid PRIMARY KEY,
        title text,
        album text,
        artist text,
        data blob,
        tags set<text>
    );

    -- uuid literals are written unquoted in CQL 3 (value elided on the slide)
    INSERT INTO songs (id, title, artist, album, tags)
    VALUES (a3e64f8f..., 'La Grange', 'ZZ Top', 'Tres Hombres', {'cool', 'hot'});

    SELECT * FROM songs WHERE id = a3e64f8f...;

“SQL-like” but NOT relational SQL
* http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf

High Performance
• Optimized from the ground up for high throughput
• All disk writes are sequential, append-only operations
• No reading before writing
• Cassandra's threading model is optimized for running on multiprocessor/multicore machines
Optimized for writing, but fast reads are possible as well
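The write path described above (sequential appends, no read before write) can be illustrated with a toy log-structured store: each write is appended to a commit log and applied to an in-memory memtable, and a full memtable is flushed once as an immutable, key-sorted table (an SSTable in Cassandra's terminology). This is a simplified sketch; the class name, the tiny `memtable_limit`, and the newest-table-wins read rule are assumptions for illustration, whereas Cassandra merges cells across SSTables by timestamp and compacts them in the background.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class LogStructuredStore:
    """Toy write path: commit log append + memtable + immutable sorted tables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []     # append-only
        self.memtable: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []  # oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        # No read before write: append to the log and overwrite in memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out once, sorted by key; it is never modified again.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


if __name__ == "__main__":
    store = LogStructuredStore(memtable_limit=2)
    store.write("a", "1")
    store.write("b", "2")   # memtable full: flushed to the first table
    store.write("a", "3")   # newer value stays in the memtable
    print(store.read("a"), store.read("b"), store.read("z"))
```

Writes only touch the end of the log and memory, which is why they stay cheap; reads may consult several tables, which is the price the design pays for write throughput.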

Benchmark from 2011 (Cassandra 0.7.4)*
Cassandra showed outstanding throughput in the "INSERT-only" workload, with 20,000 ops
• Insert: enter 50 million 1K-sized records
• Read: search by key for a one-hour period + optional update
• Hardware: 2 x 6-core Nehalem CPUs, 16 GB memory
* NoSQL Benchmarking by CUBRID, http://www.cubrid.org/blog/dev-platform/nosqlbenchmarking/

Benchmark from 2013 (Cassandra 1.1.6)*
* Benchmarking Top NoSQL Databases by End Point Corporation, http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB

When do we need these features?
• Lots of writes, statistics, and analysis
• Geographical distribution
• Large deployments
• Evolving applications

Who is using Cassandra?

eBay Data Infrastructure*
[Figure: eBay's data stores, each shown by logo; the store names are not recoverable]
• Thousands of nodes; > 2K sharded logical hosts; > 16K tables; > 27K indexes; > 140 billion SQLs/day; > 5 PB provisioned
• 10+ clusters; 100+ nodes; > 250 TB provisioned (local HDD + shared SSD); > 9 billion writes/day; > 5 billion reads/day
• Hundreds of nodes; persistent & in-memory; > 40 billion SQLs/day
• Hundreds of nodes; > 50 TB; > 2 billion ops/day
• Thousands of nodes; the world's largest cluster, with 2K+ nodes
Not replacing RDBMS but complementing!
* by Jay Patel, Cassandra Summit, June 2013, San Francisco

Cassandra Use Case at eBay
Application/Use Case
• Time-series data and real-time insights
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & Shipment Tracking
• …
• Server metrics collection
• Taste graph-based next-gen recommendation system
• Social Signals on eBay Product & Item pages
Why Cassandra?
• Multi-Datacenter (active-active)
• No SPOF
• Easy to scale
• Write performance
• Distributed Counters

Cassandra/Hadoop Deployment

Summary
• History
• Key features of Cassandra
  • Distributed and Decentralized
  • Elastic Scalability
  • High Availability and Fault Tolerance
  • Tunable Consistency
  • Column-oriented key-value store
  • CQL interface
  • High Performance
• eBay Use Case
Apache project: http://cassandra.apache.org
Community portal: http://planetcassandra.org
Documentation: http://www.datastax.com/docs
