Uploaded by DataStax Academy

Cassandra Introduction & Features

Apache Cassandra is an open-source, distributed, decentralized, and highly available database. It offers elastic scalability and tunable consistency and was originally developed at Facebook. Its key features include a column-oriented key-value store, a SQL-like query interface (CQL), and a design optimized for high performance in large-scale applications. Notable users such as eBay run Cassandra for applications including fraud detection and real-time insights.

In this document

Introduction to Apache Cassandra, its purpose, and context for the meetup.

Definition of Apache Cassandra as a highly available, fault-tolerant, decentralized database.

Historical context of Cassandra's development, referencing Dynamo, Bigtable, and its open-source release.

Highlights the essential features of Cassandra including distribution, scalability, and consistency.

Explains the distributed architecture of Cassandra, with its peer-to-peer design and support for geographically dispersed data centers.

Details on Cassandra's ability to horizontally scale and maintain performance as nodes are added.

Netflix benchmark data showing Cassandra scaling linearly as the cluster grows from 48 to 288 instances.

Describes the importance of high availability in Cassandra and its mechanism for handling node failures.

Explains how Cassandra allows tuning of consistency levels for read and write operations based on use cases.

Illustrates conditions for achieving strong consistency versus eventual consistency in operations.

Describes how Cassandra stores data in a column-oriented key-value format without fixed schemas.

Overview of Cassandra Query Language (CQL) designed to simplify data interaction.

Details on Cassandra's design optimizations to ensure high write-throughput and performance.

2011 performance benchmarks demonstrating Cassandra's high insert throughput.

References benchmark studies from 2013, comparing the capabilities of NoSQL databases including Cassandra.

Identifies scenarios where Cassandra's features are advantageous including large deployments and geographic distribution.

Discusses who utilizes Cassandra, specifically highlighting eBay's substantial data infrastructure.

Detailed use cases from eBay showcasing why Cassandra was chosen for various applications.

Brief mention of how Cassandra integrates with Hadoop for data processing.

Summary of Cassandra including history, key features, and specific use case highlights.

Cassandra Introduction & Key Features
Meetup Vienna Cassandra Users, 13th of January 2014
philipp.potisk@geroba.com

Definition
Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web [The Definitive Guide, Eben Hewitt, 2010]
13/01/2014 | Cassandra Introduction & Key Features by Philipp Potisk

History
• Amazon Dynamo, 2007
• Google Bigtable, 2006
• Open-sourced by Facebook, 2008

Key Features
• Distributed and Decentralized
• Elastic Scalability
• High Availability and Fault Tolerance
• Tunable Consistency
• Column-oriented Key-Value store
• CQL – A SQL-like query interface
• High Performance

Distributed and Decentralized
• Distributed: capable of running on multiple machines
• Decentralized: no single point of failure
  - No master-slave issues, due to the peer-to-peer architecture (protocol "gossip")
  - A single Cassandra cluster may run across geographically dispersed data centers
• Read and write requests can go to any node
[Figure: a cluster of 12 nodes spread over Datacenter 1 and Datacenter 2]

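Because there is no master, no central lookup says where data lives. Every node holds the same view of the cluster and can route a request itself. The sketch below shows this with a toy token ring. It assumes one token per node and an MD5-based hash, loosely modelled on Cassandra's RandomPartitioner. The default Murmur3 partitioner, virtual nodes, and datacenter-aware replica placement are left out on purpose. `token`, `Ring`, and `replicas` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on a 64-bit ring (MD5-based; an illustrative assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, no virtual nodes, no datacenter awareness."""

    def __init__(self, nodes: list[str]) -> None:
        # Gossip spreads membership to every node, so each node can build this same sorted view.
        self.entries = sorted((token(n), n) for n in nodes)
        self.positions = [t for t, _ in self.entries]

    def replicas(self, key: str, rf: int) -> list[str]:
        """The owner is the first node whose token is >= the key's token; extra replicas follow clockwise."""
        rf = min(rf, len(self.entries))
        start = bisect.bisect_left(self.positions, token(key)) % len(self.entries)  # wrap past the top
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = Ring([f"node{i}" for i in range(1, 13)])
    # Whichever node receives the request computes the same replica list for the key.
    print(ring.replicas("jsmith", rf=3))
```

Any node can act as coordinator here because `replicas` depends only on the shared ring view, not on which node runs it.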
Elastic Scalability
• Cassandra scales horizontally: you add more machines, each holding all or some of the data
• Adding nodes increases throughput linearly
• Decreasing and increasing the node count happens seamlessly
• Scales linearly to terabytes and petabytes of data
[Figure: a 4-node cluster with throughput = N next to an 8-node cluster with throughput = N x 2]

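Adding nodes stays cheap partly because of how a hash ring works. A joining node takes over part of one neighbour's token range, and no global reshuffle is needed. The minimal sketch below assumes one token per node and an MD5-based hash; Cassandra itself streams data to the new node and, since 1.2, can use virtual nodes. It lists the keys that change owner when a fifth node joins a four-node ring. `token`, `owner`, `add_node`, and `moved_keys` are hypothetical helpers.

```python
import hashlib


def token(key: str) -> int:
    """Position on a 64-bit ring (MD5-based; an illustrative assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def owner(ring: list[tuple[int, str]], key: str) -> str:
    """First node clockwise whose token is >= the key's token (ring must be sorted)."""
    t = token(key)
    for node_token, node in ring:
        if node_token >= t:
            return node
    return ring[0][1]  # wrap around past the highest token


def add_node(ring: list[tuple[int, str]], new_node: str) -> list[tuple[int, str]]:
    return sorted(ring + [(token(new_node), new_node)])


def moved_keys(nodes: list[str], new_node: str, keys: list[str]) -> list[str]:
    """Keys whose owner changes when new_node joins the ring."""
    before = sorted((token(n), n) for n in nodes)
    after = add_node(before, new_node)
    return [k for k in keys if owner(before, k) != owner(after, k)]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 5)]
    keys = [f"user{i}" for i in range(10_000)]
    moved = moved_keys(nodes, "node5", keys)
    print(f"{len(moved) / len(keys):.1%} of keys change owner")
```

Every key that moves ends up on the new node, and all other keys stay where they were. That is why growing the ring affects only a slice of the data.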
Scaling Benchmark by Netflix*
• 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively
• Each client generated ~20,000 writes/s, each write 400 bytes in size
"Cassandra scales linearly far beyond our current capacity requirements, and very rapid deployment automation makes it easy to manage. In particular, benchmarking in the cloud is fast, cheap and scalable, …"
* http://techblog.netflix.com/2011/11/benchmarking-cassandrascalability-on.html

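The slide's setup can be checked with quick arithmetic. The code below assumes each client really sustained the stated ~20,000 writes/s and computes the offered load per run and per instance. These numbers come from the slide's parameters. They are not Netflix's measured results. `offered_load` is a hypothetical helper.

```python
# Parameters as stated on the slide; the per-client rate is approximate ("~20,000 w/s").
WRITES_PER_CLIENT = 20_000
RUNS = [(48, 10), (96, 20), (144, 30), (288, 60)]  # (instances, clients)


def offered_load(instances: int, clients: int) -> tuple[int, float]:
    """Total writes/s generated by the clients, and that load divided evenly per instance."""
    total = clients * WRITES_PER_CLIENT
    return total, total / instances


if __name__ == "__main__":
    for instances, clients in RUNS:
        total, per_instance = offered_load(instances, clients)
        print(f"{instances:>3} instances, {clients:>2} clients: "
              f"{total:>9,} writes/s offered, {per_instance:,.0f} per instance")
```

In every run the clients-to-instances ratio is the same (10/48 = 60/288). The per-instance load therefore stays at about 4,167 writes/s while the total grows from 200,000 to 1,200,000 writes/s. Scaling clients and instances together in this way is what makes the setup a test of linear scaling.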
High Availability and Fault Tolerance
• What is high availability?
  - Multiple networked computers operating in a cluster
  - A facility for recognizing node failures
  - Failing over (forwarding) requests to another part of the system
• Cassandra has high availability
  - No single point of failure, due to the peer-to-peer architecture
[Figure: a ring of six nodes]

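Failover is only possible once the system recognizes that a node has failed. Cassandra's gossip-based failure detector follows the phi accrual approach. The code below is a simplified version that assumes heartbeat gaps are exponentially distributed. It also includes a hypothetical `pick_replica` helper that fails over to the next replica not suspected as down. The threshold of 8 mirrors Cassandra's default `phi_convict_threshold`; the rest is illustrative.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector, assuming exponentially distributed heartbeat gaps."""

    def __init__(self, window: int = 1000, threshold: float = 8.0) -> None:
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None
        self.threshold = threshold  # 8 mirrors Cassandra's default phi_convict_threshold

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat (e.g. a gossip message) received at time `now`."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """phi = -log10(P(gap > elapsed)); under an exponential model that is elapsed / (mean * ln 10)."""
        if not self.intervals:
            return 0.0
        mean = max(sum(self.intervals) / len(self.intervals), 1e-9)  # guard against a zero mean
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_down(self, now: float) -> bool:
        return self.phi(now) > self.threshold


def pick_replica(replicas: list[str], detectors: dict, now: float) -> str:
    """Fail over: send the request to the first replica not currently suspected as down."""
    for name in replicas:
        if not detectors[name].is_down(now):
            return name
    raise RuntimeError("no live replica available")
```

Phi grows smoothly with silence instead of flipping at a fixed timeout. With heartbeats once per second, a node is convicted only after about 8 × ln 10 ≈ 18.4 seconds without one.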
Tunable Consistency
• Choose between strong and eventual consistency
• Adjustable for read and write operations separately
• Conflicts are resolved during reads, as the focus lies on write performance
[Figure: a dial labelled TUNABLE between "Available" and "Consistency": the level of consistency is use-case dependent]

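Here is one way to picture "conflicts are resolved during reads". Each replica returns its value together with a write timestamp. The coordinator keeps the newest value (last write wins) and pushes it back to stale replicas (read repair). This is a minimal sketch that assumes integer timestamps. `Cell`, `reconcile`, and `read_with_repair` are hypothetical names. Timestamp ties are broken arbitrarily here, whereas Cassandra applies its own tie-breaking rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def reconcile(responses: list[Cell]) -> Cell:
    """Last write wins: the cell with the highest timestamp is the answer (ties: first one seen)."""
    return max(responses, key=lambda c: c.timestamp)


def read_with_repair(replicas: dict[str, Cell]) -> tuple[Cell, list[str]]:
    """Return the winning cell and the names of replicas that were stale (and are now repaired)."""
    winner = reconcile(list(replicas.values()))
    stale = [name for name, cell in replicas.items() if cell != winner]
    for name in stale:
        replicas[name] = winner  # read repair: push the newest version to stale replicas
    return winner, stale
```

Writes never check existing data. All reconciliation cost moves to the read path, which matches the slide's emphasis on write performance.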
When do we have strong consistency?
• Simple formula: (nodes_written + nodes_read) > replication_factor
• Ensures that a read always reflects the most recent write
• If not: weak consistency, i.e. the data is only eventually consistent
[Figure: with RF = 3, a write of "jsmith" at t1 reaches NW = 2 replicas and a read at t2 queries NR = 2 replicas; because 2 + 2 > 3, at least one replica read holds the t1 write]

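The formula can be checked directly. The sketch below assumes a single datacenter and the common levels ONE, QUORUM (floor(RF/2) + 1 replicas) and ALL. It evaluates (nodes_written + nodes_read) > replication_factor. It then compares the result with a brute-force check that every possible write set overlaps every possible read set, which is why the rule works. `LEVELS`, `is_strong`, and `always_overlap` are hypothetical names.

```python
from itertools import combinations

# Replicas contacted per consistency level in a single datacenter (simplified).
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def is_strong(nodes_written: int, nodes_read: int, replication_factor: int) -> bool:
    """The slide's rule: (nodes_written + nodes_read) > replication_factor."""
    return nodes_written + nodes_read > replication_factor


def always_overlap(nodes_written: int, nodes_read: int, replication_factor: int) -> bool:
    """Brute force: does every write set share at least one replica with every read set?"""
    replicas = range(replication_factor)
    return all(
        set(w) & set(r)
        for w in combinations(replicas, nodes_written)
        for r in combinations(replicas, nodes_read)
    )


if __name__ == "__main__":
    rf = 3
    for write_level, w in LEVELS.items():
        for read_level, r in LEVELS.items():
            nw, nr = w(rf), r(rf)
            kind = "strong" if is_strong(nw, nr, rf) else "eventual"
            print(f"write={write_level:<6} read={read_level:<6} W+R={nw + nr} -> {kind}")
```

With RF = 3, QUORUM writes plus QUORUM reads (2 + 2) are strong, and ONE plus ONE (1 + 1) is only eventually consistent. This is the pigeonhole argument from the figure.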
Column-oriented Key-Value Store
[Figure: Row Key1 → ColumnKey1: ColumnValue1, ColumnKey2: ColumnValue2, ColumnKey3: ColumnValue3, …; rows stored sorted by row key*, columns stored sorted by column key]
• Data is stored in sparse multidimensional hash tables
• A row can have multiple columns, not necessarily the same number of columns for each row
• Each row has a unique key, which also determines partitioning
• No relations!
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
* Row keys (partition keys) should be hashed in order to distribute data evenly across the cluster

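The nested-map description translates almost literally into code. This is an in-memory toy that assumes string keys and values. It shows sparse rows, where each row stores only its own columns, and column slices returned in column-key order. It does not show how Cassandra lays data out on disk. `SparseTable`, `insert`, and `get_slice` are hypothetical names.

```python
import bisect


class SparseTable:
    """Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>>."""

    def __init__(self) -> None:
        self.rows = {}         # row key -> {column key: column value}
        self.column_keys = {}  # row key -> column keys kept in sorted order

    def insert(self, row_key: str, column_key: str, value: str) -> None:
        row = self.rows.setdefault(row_key, {})
        keys = self.column_keys.setdefault(row_key, [])
        if column_key not in row:
            bisect.insort(keys, column_key)  # keep columns sorted by column key
        row[column_key] = value  # a row only stores the columns it actually has (sparse)

    def get_slice(self, row_key: str, start: str, end: str) -> list[tuple[str, str]]:
        """Columns of one row with start <= column key < end, in sorted order."""
        keys = self.column_keys.get(row_key, [])
        lo, hi = bisect.bisect_left(keys, start), bisect.bisect_left(keys, end)
        return [(k, self.rows[row_key][k]) for k in keys[lo:hi]]
```

Two rows in the same table can hold entirely different column sets. Sorted column keys make range slices within a row cheap.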
CQL – An SQL-like query interface
• "CQL 3 is the default and primary interface into the Cassandra DBMS" *
• Familiar SQL-like syntax that maps to Cassandra's storage engine and simplifies data modelling

    CREATE TABLE songs (
        id uuid PRIMARY KEY,
        title text,
        album text,
        artist text,
        data blob,
        tags set<text>
    );

    -- The id value is elided on the slide; CQL uuid constants are written unquoted.
    INSERT INTO songs (id, title, artist, album, tags)
    VALUES (a3e64f8f..., 'La Grange', 'ZZ Top', 'Tres Hombres', {'cool', 'hot'});

    SELECT * FROM songs WHERE id = a3e64f8f...;

"SQL-like" but NOT relational SQL
* http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf

High Performance
• Optimized from the ground up for high throughput
• All disk writes are sequential, append-only operations
• No reading before writing
• Cassandra's threading concept is optimized for running on multiprocessor/multicore machines
Optimized for writing, but fast reads are possible as well

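"Sequential, append-only writes" and "no reading before writing" describe a log-structured write path. The toy below makes several simplifying assumptions: no compaction, no timestamps, a commit log that is never truncated, and a hypothetical class name. It appends every write to a commit log, updates an in-memory memtable, and flushes a full memtable as an immutable sorted table. Reads check the memtable first and then the tables from newest to oldest.

```python
import bisect
from typing import Optional


class LogStructuredStore:
    """Toy write path: commit log append + memtable update, flushed to immutable sorted tables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log = []  # append-only list of (key, value): the sequential write
        self.memtable = {}    # in-memory latest values
        self.sstables = []    # immutable sorted lists of (key, value); newest last
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        # No read before write: append and overwrite in memory, never look at older data.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out once, sorted by key, and start a fresh one."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None
```

Writes touch only an append and a dictionary update. Reads may have to consult several tables, which is the trade-off behind "optimized for writing, but fast reads are possible as well".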
Benchmark from 2011 (Cassandra 0.7.4)*
[Figure: throughput chart (ops)]
• Cassandra showed outstanding throughput in "INSERT-only" with 20,000 ops
• Insert: enter 50 million 1K-sized records
• Read: search by key for a one-hour period, plus an optional update
• Hardware: Nehalem 6-core x 2 CPUs, 16 GB memory
* NoSQL Benchmarking by CUBRID, http://www.cubrid.org/blog/dev-platform/nosqlbenchmarking/

Benchmark from 2013 (Cassandra 1.1.6)*
[Figure: benchmark results]
* Benchmarking Top NoSQL Databases by End Point Corporation, http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB

When do we need these features?
• Lots of writes, statistics, and analysis
• Geographical distribution
• Large deployments
• Evolving applications

Who is using Cassandra?

eBay Data Infrastructure*
[Figure: eBay's data platforms shown side by side, one group of figures per platform]
• Thousands of nodes; > 2K sharded logical hosts; > 16K tables; > 27K indexes; > 140 billion SQLs/day; > 5 PB provisioned
• 10+ clusters; 100+ nodes; > 250 TB provisioned (local HDD + shared SSD); > 9 billion writes/day; > 5 billion reads/day
• Hundreds of nodes; persistent & in-memory; > 40 billion SQLs/day
• Hundreds of nodes; > 50 TB; > 2 billion ops/day
• Thousands of nodes; the world's largest cluster with 2K+ nodes
Not replacing RDBMS but complementing!
* by Jay Patel, Cassandra Summit, June 2013, San Francisco

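Daily totals are easier to compare with the benchmark figures once they are converted to average rates. The code below covers the second group on the slide (> 9 billion writes/day, > 5 billion reads/day). It assumes the load is spread evenly over 86,400 seconds, which understates real peak rates. `average_per_second` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(per_day: float) -> float:
    """Average rate if the daily total were spread evenly over the day (real peaks are higher)."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Lower bounds taken from the slide ("> 9 billion writes/day", "> 5 billion reads/day").
    for label, per_day in [("writes", 9e9), ("reads", 5e9)]:
        print(f"> {per_day:,.0f} {label}/day is about {average_per_second(per_day):,.0f} {label}/s")
```

That works out to roughly 104,000 writes/s and 58,000 reads/s on average, sustained around the clock.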
Cassandra Use Case at eBay
Application/Use Case
• Time-series data and real-time insights
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & Shipment Tracking
• …
• Server metrics collection
• Taste graph-based next-gen recommendation system
• Social Signals on eBay Product & Item pages
Why Cassandra?
• Multi-datacenter (active-active)
• No SPOF (single point of failure)
• Easy to scale
• Write performance
• Distributed counters

Cassandra/Hadoop Deployment

Summary
• History
• Key features of Cassandra
  - Distributed and Decentralized
  - Elastic Scalability
  - High Availability and Fault Tolerance
  - Tunable Consistency
  - Column-oriented key-value store
  - CQL interface
  - High Performance
• eBay Use Case
Apache project: http://cassandra.apache.org
Community portal: http://planetcassandra.org
Documentation: http://www.datastax.com/docs

Recommended

PPTX
From cache to in-memory data grid. Introduction to Hazelcast.
PDF
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
KEY
Introduction to memcached
PDF
Introduction to Redis
PPTX
Introduction to Storm
PDF
Intro to HBase
PPTX
Introduction to Apache ZooKeeper
PDF
Etsy Activity Feeds Architecture
PDF
Introduction to MongoDB
PPTX
Apache Spark Architecture
PDF
Parquet performance tuning: the missing guide
PDF
The Parquet Format and Performance Optimization Opportunities
PDF
Scalability, Availability & Stability Patterns
PPTX
Stability Patterns for Microservices
PDF
Hive Bucketing in Apache Spark with Tejas Patil
PDF
Iceberg: a fast table format for S3
PDF
RedisConf17 - Lyft - Geospatial at Scale - Daniel Hochman
PDF
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
PPTX
Introduction to Redis
PPTX
Processing Large Data with Apache Spark -- HasGeek
PPTX
Apache HBase™
PDF
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
PDF
Introduction to Spark Internals
PDF
Redis cluster
PDF
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
PDF
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
PDF
MongoDB vs. Postgres Benchmarks
by EDB
 
PDF
The Apache Spark File Format Ecosystem
PPTX
Apache Cassandra Developer Training Slide Deck
PDF
Cassandra Tutorial


Viewers also liked

PDF
Cassandra By Example: Data Modelling with CQL3
PDF
Cassandra NoSQL Tutorial
PDF
facebook architecture for 600M users
PDF
NoSQL Essentials: Cassandra
PPTX
An Overview of Apache Cassandra
PDF
Cassandra Explained

Similar to Cassandra Introduction & Features

ODP
Intro to cassandra
PPT
Introduction to cassandra
PDF
An Introduction to Apache Cassandra
PPTX
Presentation of Apache Cassandra
PDF
Apache Cassandra overview
PPTX
Cassandra tutorial
PPTX
Learn Cassandra at edureka!
PPTX
Unit -3 _Cassandra-CRUD Operations_Practice Examples
PPTX
Unit -3 -Features of Cassandra, CQL Data types, CQLSH, Keyspaces
PPTX
Appache Cassandra
PPTX
Cassandra
PPTX
BigData Developers MeetUp
PPTX
Cassandra for mission critical data
PDF
Cassandra Workshop - Cassandra from scratch in one day
PPTX
Learning Cassandra NoSQL
PPTX
cassandra_presentation_final
PDF
Cassandra Database
PDF
cassandra
PDF
Cassandra and Spark
PPTX
Cassandra's Sweet Spot - an introduction to Apache Cassandra

More from DataStax Academy

PDF
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
PPTX
Introduction to DataStax Enterprise Graph Database
PPTX
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
PPTX
Cassandra on Docker @ Walmart Labs
PDF
Cassandra 3.0 Data Modeling
PPTX
Cassandra Adoption on Cisco UCS & Open stack
PDF
Data Modeling for Apache Cassandra
PDF
Coursera Cassandra Driver
PDF
Production Ready Cassandra
PDF
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
PPTX
Cassandra @ Sony: The good, the bad, and the ugly part 1
PPTX
Cassandra @ Sony: The good, the bad, and the ugly part 2
PDF
Standing Up Your First Cluster
PDF
Real Time Analytics with Dse
PDF
Introduction to Data Modeling with Apache Cassandra
PDF
Cassandra Core Concepts
PPTX
Enabling Search in your Cassandra Application with DataStax Enterprise
PPTX
Bad Habits Die Hard
PDF
Advanced Data Modeling with Apache Cassandra
PDF
Advanced Cassandra
