From cache to in-memory data grid. Introduction to Hazelcast.

The document provides an in-depth presentation on caching, its evolution to in-memory data grids, and a detailed introduction to Hazelcast as an in-memory distributed cache solution. It covers the basics of caching, common cache attributes, various cache patterns, and provides insights into the configuration and usage of Hazelcast with practical demonstrations. Additionally, it compares Hazelcast with other solutions like Infinispan and offers best practices and personal recommendations for using Hazelcast effectively.


From cache to in-memory data grid. Introduction to Hazelcast. By Taras Matyashovsky
Introduction
About me • Software engineer/TL • Worked for outsourcing companies and product companies, and tried my hand at startups and freelancing • 7+ years of production Java experience • Fan of Agile methodologies, CSM
What? • This presentation: • covers the basics of caching and popular cache types • explains the evolution from a simple cache to a distributed cache, and from a distributed cache to an IMDG • does not describe the use of NoSQL solutions for caching • is not intended as a product comparison or as promotion of Hazelcast as the best solution
Why? • to broaden horizons regarding modern distributed architectures and solutions • to share experience from my current project, where Infinispan was replaced with Hazelcast as the in-memory distributed cache solution
Agenda 1st part: • Why software caches? • Common cache attributes • Cache access patterns • Cache types • Distributed cache vs. IMDG
Agenda 2nd part: • Hazelcast in a nutshell • Hazelcast configuration • Live demo sessions • in-memory distributed cache • write-through cache with Postgres as storage • search in distributed cache • parallel processing using executor service and entry processor • Infinispan vs. Hazelcast • Best practices and personal recommendations
Caching Basics
Why Software Caching? • application performance: • many concurrent users • time and cost overhead of accessing application data stored in an RDBMS or file system • database-access bottlenecks caused by too many simultaneous requests
So Software Caches • improve response times by reducing data access latency • offload persistent storage by reducing the number of trips to data sources • avoid the cost of repeatedly creating objects • share objects between threads • only work for IO-bound applications
So Software Caches are essential for modern high-loaded applications
But • memory size • is limited • can become unacceptably huge • synchronization complexity • consistency between the cached data state and data source’s original data • durability • correct cache invalidation • scalability
Common Cache Attributes • maximum size, e.g. quantity of entries • cache algorithm used for invalidation/eviction, e.g.: • least recently used (LRU) • least frequently used (LFU) • FIFO • eviction percentage • expiration, e.g.: • time-to-live (TTL) • absolute/relative time-based expiration
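The maximum-size and LRU attributes above can be illustrated with a minimal sketch built only on the JDK's `LinkedHashMap`. The class name `LruCache` is hypothetical, and expiration (TTL) is deliberately left out to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class: a size-bounded cache that evicts the least recently used entry.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

With a maximum size of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was used more recently. This sketch is not thread-safe.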
Cache Access Patterns • cache aside • read-through • refresh-ahead • write-through • write-behind
Cache Aside Pattern • the application is responsible for reading from and writing to the storage, and the cache doesn't interact with the storage at all • the cache is “kept aside” as a faster and more scalable in-memory data store Client Cache Storage
Read-Through/Write-Through • the application treats the cache as the main data store and reads/writes data from/to it • the cache is responsible for reading this data from and writing it to the database Client Cache Storage
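A minimal sketch of cache-aside, assuming a plain `Map` as a stand-in for the storage. The class `CacheAsideRepository`, its methods and the `storageReads` counter are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical repository: the application code talks to both the cache and the storage.
class CacheAsideRepository {
    private final Map<String, String> storage;                         // stand-in for the RDBMS
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    int storageReads = 0;                                              // for demonstration only

    CacheAsideRepository(Map<String, String> storage) {
        this.storage = storage;
    }

    String find(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                 // cache hit: the storage is not touched
        }
        storageReads++;
        String loaded = storage.get(key);  // cache miss: the application reads the storage itself
        if (loaded != null) {
            cache.put(key, loaded);        // and then populates the cache
        }
        return loaded;
    }

    void save(String key, String value) {
        storage.put(key, value);           // the application writes to the storage
        cache.remove(key);                 // and invalidates the cached copy
    }
}
```

Note that the cache never calls the storage here; all coordination lives in the application code, which is the defining trait of this pattern.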
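For contrast with cache-aside, here is a minimal sketch in which the cache itself owns the storage access. The class `ThroughCache` and the `loader`/`writer` parameters are hypothetical; any function that reads or writes a single value could be plugged in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical cache that reads from and writes to the storage on the caller's behalf.
class ThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;     // reads one value from the storage
    private final BiConsumer<K, V> writer;   // writes one value to the storage

    ThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    synchronized V get(K key) {
        // Read-through: on a miss the cache, not the caller, loads from the storage
        return entries.computeIfAbsent(key, loader);
    }

    synchronized void put(K key, V value) {
        // Write-through: the storage is updated synchronously, then the cache entry
        writer.accept(key, value);
        entries.put(key, value);
    }
}
```

The application only ever calls `get` and `put` on the cache, which is why this approach simplifies application code compared with cache-aside.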
Write-Behind Pattern • modified cache entries are asynchronously written to the storage after a configurable delay Client Cache Storage
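A minimal write-behind sketch, assuming a single process and a `BiConsumer` as the storage writer. The class `WriteBehindCache` and its method names are hypothetical; a real product would also handle write failures and retries, which are omitted here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical cache that defers storage writes until flush() runs.
class WriteBehindCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>();   // modified but not yet stored
    private final BiConsumer<K, V> writer;

    WriteBehindCache(BiConsumer<K, V> writer) {
        this.writer = writer;
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
        dirty.put(key, value);   // repeated updates of one key coalesce into a single write
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    // Writes all pending modifications to the storage.
    synchronized void flush() {
        dirty.forEach(writer);
        dirty.clear();
    }

    // Flushes in the background every delaySeconds; the caller must shut the scheduler down.
    ScheduledExecutorService startFlusher(long delaySeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::flush, delaySeconds, delaySeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Until `flush()` runs, the storage lags behind the cache. That gap is the source of both the throughput gain and the risk of conflicts with external updates.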
Refresh-Ahead Pattern • automatically and asynchronously reload (refresh) any recently accessed cache entry from the cache loader prior to its expiration Client Cache Storage
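A minimal refresh-ahead sketch under stated assumptions: time comes from an injectable clock, reloads run on a caller-supplied `Executor`, and `refreshAfterMillis` is smaller than `ttlMillis`. All names are hypothetical, and the sketch does not deduplicate concurrent refreshes of the same key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache that reloads recently accessed entries shortly before they expire.
class RefreshAheadCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;     // the cache loader
    private final Executor refresher;        // runs background reloads
    private final LongSupplier clock;        // current time in milliseconds
    private final long ttlMillis;
    private final long refreshAfterMillis;   // assumed to be smaller than ttlMillis

    RefreshAheadCache(Function<K, V> loader, Executor refresher, LongSupplier clock,
                      long ttlMillis, long refreshAfterMillis) {
        this.loader = loader;
        this.refresher = refresher;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
        this.refreshAfterMillis = refreshAfterMillis;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            // Missing or expired: load synchronously, so the caller waits
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        } else if (now - entry.loadedAt >= refreshAfterMillis) {
            // Close to expiration: serve the current value and reload in the background
            refresher.execute(() -> reload(key));
        }
        return entry.value;
    }

    private synchronized void reload(K key) {
        entries.put(key, new Entry<>(loader.apply(key), clock.getAsLong()));
    }
}
```

Entries that are not accessed inside the refresh window simply expire, so only recently used data is kept warm.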
Cache Strategy Selection RT/WT vs. cache-aside: • RT/WT simplifies application code • cache-aside may have blocking behavior • cache-aside may be preferable when there are multiple cache updates triggered to the same storage from different cache servers
Cache Strategy Selection Write-through vs. write-behind: • write-behind caching may deliver considerably higher throughput and reduced latency compared to write-through caching • implication of write-behind caching is that database updates occur outside of the cache transaction • write-behind transaction can conflict with an external update
Cache Types
Cache Types • local cache • replicated cache • distributed cache • remote cache • near cache
Local Cache a cache that is local to (completely contained within) a particular cluster node
Local Cache Pros: • simplicity • performance • no serialization/deserialization overhead Cons: • not fault-tolerant • scalability
Local Cache Solutions: • EhCache • Google Guava • Infinispan local cache mode
Replicated Cache a cache that replicates its data to all cluster nodes
Get in Replicated Cache Each cluster node (JVM) accesses the data from its own memory, i.e. local read:
Put in Replicated Cache Pushing the new version of the data to all other cluster nodes:
Replicated Cache Pros: • best read performance • fault-tolerant • linear performance scalability for reads Cons: • poor write performance • additional network load • poor and limited scalability for writes • memory consumption
Replicated Cache Solutions: • open-source: • Infinispan • commercial: • Oracle Coherence • EhCache + Terracotta
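The read/write trade-off of a replicated cache can be seen in a small in-process simulation, where each cluster node is modeled as a plain map. The class `ReplicatedCache` is hypothetical and ignores networking, ordering and failures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: every node holds a full copy of the data.
class ReplicatedCache {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    ReplicatedCache(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Pushes the new value to every node and returns the number of node updates performed.
    int put(String key, String value) {
        for (Map<String, String> node : nodes) {
            node.put(key, value);
        }
        return nodes.size();
    }

    // A read is served from the local node's own memory.
    String get(int localNode, String key) {
        return nodes.get(localNode).get(key);
    }
}
```

One `put` costs as many updates as there are nodes, while a `get` touches only local memory. Memory use also grows with every node, since each one stores the full data set.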
Distributed Cache a cache that partitions its data among all cluster nodes
Get in Distributed Cache Access often must go over the network to another cluster node:
Put in Distributed Cache Resolving known limitation of replicated cache:
Put in Distributed Cache • the data is sent to a primary cluster node and, if the backup count is 1, to a backup cluster node • modifications to the cache are not considered complete until all backups have acknowledged receipt of the modification, which means a slight performance penalty • this overhead guarantees that data consistency is maintained and no data is lost
Failover in Distributed Cache Failover involves promoting backup data to be primary storage:
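Partitioned put, backup and failover can be sketched as an in-process simulation. Everything here is an assumption made for illustration: the class `PartitionedCache`, the fixed partition table, a backup count of 1, at least two nodes, and tolerance of a single node failure only (backups are not re-created afterwards).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a partitioned cache with one backup copy per partition.
class PartitionedCache {
    private final int partitionCount;
    private final int[] primaryOwner;   // partition -> node holding the primary copy
    private final int[] backupOwner;    // partition -> node holding the backup copy, -1 if none
    // data.get(node).get(partition) holds the entries of that partition stored on that node
    private final List<Map<Integer, Map<String, String>>> data = new ArrayList<>();

    PartitionedCache(int nodeCount, int partitionCount) {   // requires nodeCount >= 2
        this.partitionCount = partitionCount;
        this.primaryOwner = new int[partitionCount];
        this.backupOwner = new int[partitionCount];
        for (int n = 0; n < nodeCount; n++) {
            data.add(new HashMap<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            primaryOwner[p] = p % nodeCount;
            backupOwner[p] = (p + 1) % nodeCount;
        }
    }

    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    int primaryNodeOf(String key) {
        return primaryOwner[partitionOf(key)];
    }

    private Map<String, String> store(int node, int partition) {
        return data.get(node).computeIfAbsent(partition, p -> new HashMap<>());
    }

    void put(String key, String value) {
        int p = partitionOf(key);
        store(primaryOwner[p], p).put(key, value);
        if (backupOwner[p] >= 0) {
            store(backupOwner[p], p).put(key, value);   // put completes only after the backup write
        }
    }

    String get(String key) {
        int p = partitionOf(key);
        return store(primaryOwner[p], p).get(key);
    }

    // Failover: the failed node's data is lost and backups of its partitions become primaries.
    void failNode(int failed) {
        data.get(failed).clear();
        for (int p = 0; p < partitionCount; p++) {
            if (primaryOwner[p] == failed) {
                primaryOwner[p] = backupOwner[p];   // promote the backup copy
                backupOwner[p] = -1;                // no backup left in this sketch
            } else if (backupOwner[p] == failed) {
                backupOwner[p] = -1;
            }
        }
    }
}
```

Keys are mapped to a fixed set of partitions rather than directly to nodes, so losing a node changes the partition table but not the key-to-partition mapping.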
Local Storage in Distributed Cache Certain cluster nodes can be configured to store data, and others can be configured not to store data:
Distributed Cache Pros: • linear performance scalability for reads and writes • fault-tolerant Cons: • increased latency of reads (due to network round-trip and serialization/deserialization expenses)
Distributed Cache Summary Distributed in-memory key/value stores support a simple set of “put” and “get” operations and, optionally, read-through and write-through behavior for reading and writing values from and to underlying disk-based storage such as an RDBMS
Distributed Cache Summary Depending on the product additional features like: • ACID transactions • eviction policies • replication vs. partitioning • active backups also became available as the products matured
Distributed Cache Solutions: • open-source: • Infinispan • Hazelcast • NoSQL storages, e.g. Redis, Cassandra, MongoDB, etc. • commercial: • Oracle Coherence • Terracotta
Remote Cache a cache that is located remotely and is accessed by one or more clients
Remote Cache The majority of existing distributed/replicated cache solutions support 2 modes: • embedded mode • when the cache instance is started within the same JVM as your application • client-server mode • when a remote cache instance is started and clients connect to it using a variety of different protocols
Remote Cache Solutions: • Infinispan remote cache mode • Hazelcast client-server mode • Memcached
Near Cache a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache
Get in Near Cache When an object is fetched from a remote node, it is put into the local cache, so subsequent requests are handled by the local node retrieving it from the local cache:
Near Cache Pros: • it is best used for read-only data Cons: • increases memory usage since the near cache items need to be stored in the memory of the member • reduces consistency
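A minimal near-cache sketch, assuming a plain `Map` as a stand-in for the distributed or remote cache. The class `NearCache`, its `invalidate` method and the `remoteReads` counter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical near cache: a local map in front of a remote (here simulated) cache.
class NearCache<K, V> {
    private final Map<K, V> remote;                   // stand-in for the distributed/remote cache
    private final Map<K, V> local = new HashMap<>();
    int remoteReads = 0;                              // for demonstration only

    NearCache(Map<K, V> remote) {
        this.remote = remote;
    }

    V get(K key) {
        V value = local.get(key);
        if (value == null) {
            remoteReads++;
            value = remote.get(key);     // a network round trip in a real deployment
            if (value != null) {
                local.put(key, value);   // subsequent reads are served locally
            }
        }
        return value;
    }

    void invalidate(K key) {
        local.remove(key);
    }
}
```

If the remote value changes after the first read, `get` keeps returning the old local copy until `invalidate` is called. That is the reduced consistency listed among the cons, and the reason the pattern suits read-only data best.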
In-memory Data Grid
In-memory Data Grid (IMDG)
In-memory Data Grid In-memory distributed cache plus: • ability to support co-location of computations with data in a distributed context and move computation to data • distributed MPP processing based on standard SQL and/or Map/Reduce, which makes it possible to compute effectively over data stored in memory across the cluster
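The idea of moving computation to data can be sketched as an in-process simulation: each node is modeled as a map, and a task runs once per node against that node's local entries. The class `DataGridSketch` and the method `executeOnAllNodes` are hypothetical and do not correspond to any product API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical simulation: one map per node, one task execution per node.
class DataGridSketch {
    // Runs the task against each node's local data in parallel and returns only the
    // partial results, instead of pulling every entry to the caller.
    static <R> List<R> executeOnAllNodes(List<Map<String, Integer>> nodes,
                                         Function<Map<String, Integer>, R> task) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map<String, Integer> localData : nodes) {
                Callable<R> call = () -> task.apply(localData);
                futures.add(pool.submit(call));
            }
            List<R> partials = new ArrayList<>();
            for (Future<R> future : futures) {
                partials.add(future.get());   // results come back in node order
            }
            return partials;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller that wants a grand total sums the per-node partial results, which is the map and reduce split in miniature: only one number per node crosses the simulated network.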
IMDC vs. IMDG • in-memory distributed caches were developed in response to a growing need for data high-availability • in-memory data grids were developed to respond to the growing complexities of data processing
IMDG in a nutshell Adding distributed SQL and/or MapReduce type processing required a complete re-thinking of distributed caches, as focus has shifted from pure data management to hybrid data and compute management
In-memory Data Grid Solutions
Hazelcast
Hazelcast The leading open source in-memory data grid: a free alternative to proprietary solutions such as Oracle Coherence, VMware Pivotal GemFire and Software AG Terracotta
Hazelcast Use-Cases • scale your application • share data across cluster • partition your data • balance the load • send/receive messages • process in parallel on many JVMs, i.e. MPP
Hazelcast Features • dynamic clustering, backup, discovery, fail-over • distributed map, queue, set, list, lock, semaphore, topic, executor service, etc. • transaction support • map/reduce API • Java client for accessing the cluster remotely
Hazelcast Configuration • programmatic configuration • XML configuration • Spring configuration Nuance: It is very important that the configuration on all members in the cluster is exactly the same; it does not matter whether you use the XML-based or the programmatic configuration.
Sample Application
Live Demo “Configuration”
Sample Application Technologies: • Spring Boot 1.0.1 • Hazelcast 3.2 • Postgres 9.3 Application: • RESTful web service to get/put data from/to cache • RESTful web service to execute tasks in the cluster • one instance of Hazelcast per application * Some samples are not optimal and were created just to demonstrate usage of the existing Hazelcast API
Global Hazelcast Configuration Defined global Hazelcast configuration in separate config in common module. It contains skeleton for future Hazelcast instance as well as global configuration settings: • instance configuration skeleton • common properties • group name and password • TCP based network configuration • join config • multicast and TCP/IP config • default distributed map configuration skeleton
Hazelcast Instance Each module that uses Hazelcast for distributed cache should have its own separate Hazelcast instance. The “Hazelcast Instance” is a factory for creating individual cache objects. Each cache has a name and potentially distinct configuration settings (expiration, eviction, replication, and more). Multiple instances can live within the same JVM.
Hazelcast Cluster Group Groups are used to run multiple isolated clusters on the same network instead of a single cluster. A JVM can host multiple Hazelcast instances (nodes). Each node can participate in only one group: it joins its own group and does not interfere with others. The group name and group password configuration properties are used to achieve this.
Hazelcast Network Config In our environment the multicast mechanism for joining the cluster is not supported, so only the TCP/IP cluster approach is used. In this case there should be one or more well-known members to connect to.
Live Demo “Map Store”
Hazelcast Map Store • useful for reading and writing map entries from and to an external data source • one instance per map per node will be created • word of caution: the map store should NOT call distributed map operations, otherwise you might run into deadlocks
Hazelcast Map Store • map pre-population via loadAllKeys method that returns the set of all “hot” keys that need to be loaded for the partitions owned by the member • write through vs. write behind using “write-delay-seconds” configuration (0 or bigger) • MapLoaderLifecycleSupport to be notified of lifecycle events, i.e. init and destroy
Live Demo “Executor Service”
Hazelcast Executor Service • extends java.util.concurrent.ExecutorService, but is designed to be used in a distributed environment • scaling up via thread pool size • scaling out is automatic via addition of new Hazelcast instances
Hazelcast Executor Service • provides different ways to route tasks: • any member • specific member • the member hosting a specific key • all or subset of members • supports execution callback
Hazelcast Executor Service Drawbacks: • work-queue has no high availability: • each member creates local ThreadPoolExecutors with ordinary work-queues that do the real work but are not backed up by Hazelcast • work-queue is not partitioned: • it could be that one member has a lot of unprocessed work while another is idle • no customizable load balancing
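Routing a task to the member that hosts a specific key, and the per-member work queues behind the drawbacks above, can be simulated with JDK executors alone. This is not the Hazelcast API: the class `KeyRoutedExecutor` and its methods are hypothetical, and each simulated member is just a local single-threaded executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation: one local executor (and one ordinary work queue) per member.
class KeyRoutedExecutor {
    private final List<ExecutorService> members = new ArrayList<>();

    KeyRoutedExecutor(int memberCount) {
        for (int i = 0; i < memberCount; i++) {
            members.add(Executors.newSingleThreadExecutor());
        }
    }

    // Index of the simulated member that owns the key.
    int memberFor(Object key) {
        return Math.floorMod(key.hashCode(), members.size());
    }

    // Runs the task on the member that owns the key, next to that key's data.
    <T> Future<T> submitToOwnerOf(Object key, Callable<T> task) {
        return members.get(memberFor(key)).submit(task);
    }

    void shutdown() {
        members.forEach(ExecutorService::shutdown);
    }
}
```

Because each member has its own queue, a burst of tasks for keys owned by one member piles up there while the other members stay idle, and queued tasks are lost if that member dies.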
Hazelcast Features More useful features: • entry listener • transactions support, e.g. local, distributed • map reduce API out-of-the-box • custom serialization/deserialization mechanism • distributed topic • clients
Hazelcast Missing Features Missing useful features: • update configuration in running cluster • load balancing for executor service
Infinispan vs. Hazelcast
Infinispan vs. Hazelcast: Pros
Infinispan: • backed by a relatively large company (JBoss) for use in largely distributed environments • has been in active use for several years • well-written documentation • a lot of examples of different configurations as well as solutions to common problems
Hazelcast: • easy setup • more performant than Infinispan • simple node/cluster discovery mechanism • relies on only 1 jar to be included on the classpath • brief documentation complemented by simple code samples
Infinispan vs. Hazelcast: Cons
Infinispan: • relies on JGroups, which has proven to be buggy, especially under high load • configuration can be overly complex • ~9 jars are needed in order to get Infinispan up and running • code appears very complex and hard to debug/trace
Hazelcast: • backed by a startup based in Palo Alto and Turkey, which just received Series A funding of 2.5 M from Bain Capital Ventures • customization points are fairly limited • some exceptions can be difficult to diagnose due to poorly written exception messages • still quite buggy
Hazelcast Summary
Best Practices • each specific Hazelcast instance should have its unique instance name • each specific Hazelcast instance should have its unique group name and password • each specific Hazelcast instance should start on separate port according to predefined ranges
Personal Recommendations • use XML configuration in production, but don’t use spring:hz schema. Our Spring based “lego bricks” approach for building resulting Hazelcast instance is quite decent. • don’t use Hazelcast for local caches as it was never designed with that purpose and always performs serialization/deserialization • don’t use library specific classes, use common collections, e.g. ConcurrentMap, and you will be able to replace underlying cache solution easily
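The last recommendation can be sketched as follows: the service depends only on `java.util.concurrent.ConcurrentMap`, so the cache implementation behind it can be swapped. The class `UserNameService` and the placeholder lookup are hypothetical.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical service: no cache-library type appears in its signature.
class UserNameService {
    private final ConcurrentMap<Long, String> cache;

    UserNameService(ConcurrentMap<Long, String> cache) {
        this.cache = cache;
    }

    String nameOf(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = "user-" + id;                      // placeholder for a real lookup
        String previous = cache.putIfAbsent(id, loaded);   // atomic on any ConcurrentMap
        return previous != null ? previous : loaded;
    }
}
```

A unit test can pass a `ConcurrentHashMap`, while production code can pass the map obtained from the cache library. Hazelcast's distributed map implements `ConcurrentMap`, so no change to the service is needed.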
Hazelcast Drawbacks • still quite buggy • poor documentation for more complex cases • enterprise edition costs money, but includes: • elastic memory • JAAS security • .NET and C++ clients
Q/A?
Thank you! by Taras Matyashovsky

From cache to in-memory data grid. Introduction to Hazelcast.

  • 1.
    From cache toin-memory data grid. Introduction to Hazelcast. By Taras Matyashovsky
  • 2.
  • 3.
    About me •Software engineer/TL • Worked for outsource companies, product companies and tried myself in startups/ freelancing • 7+ years production Java experience • Fan of Agile methodologies, CSM
  • 4.
    What? • Thispresentation: • covers basics of caching and popular cache types • explains evolution from simple cache to distributed, and from distributed to IMDG • not describes usage of NoSQL solutions for caching • is not intended for products comparison or for promotion of Hazelcast as the best solution
  • 5.
    Why? • toexpand horizons regarding modern distributed architectures and solutions • to share experience from my current project where Infinispan was replaced with Hazelcast as in-memory distributed cache solution
  • 6.
    Agenda 1st part:• Why software caches? • Common cache attributes • Cache access patterns • Cache types • Distributed cache vs. IMDG
  • 7.
    Agenda 2nd part:• Hazelcast in a nutshell • Hazelcast configuration • Live demo sessions • in-memory distributed cache • write-through cache with Postgres as storage • search in distributed cache • parallel processing using executor service and entry processor • Infinispan vs. Hazelcast • Best practices and personal recommendations
  • 8.
  • 9.
    Why Software Caching?• application performance: • many concurrent users • time and costs overhead to access application’s data stored in RDBMS or file system • database-access bottlenecks caused by too many simultaneous requests
  • 10.
    So Software Caches• improve response times by reducing data access latency • offload persistent storages by reducing number of trips to data sources • avoid the cost of repeatedly creating objects • share objects between threads • only work for IO-bound applications
  • 11.
    So Software Cachesare essential for modern high-loaded applications
  • 12.
    But • memorysize • is limited • can become unacceptably huge • synchronization complexity • consistency between the cached data state and data source’s original data • durability • correct cache invalidation • scalability
  • 13.
    Common Cache Attributes• maximum size, e.g. quantity of entries • cache algorithm used for invalidation/eviction, e.g.: • least recently used (LRU) • least frequently used (LFU) • FIFO • eviction percentage • expiration, e.g.: • time-to-live (TTL) • absolute/relative time-based expiration
  • 14.
    Cache Access Patterns• cache aside • read-through • refresh-ahead • write-through • write-behind
  • 15.
    Cache Aside Pattern• application is responsible for reading and writing from the storage and the cache doesn't interact with the storage at all • the cache is “kept aside” as a faster and more scalable in-memory data store Client Cache Storage
  • 16.
    Read-Through/Write-Through • theapplication treats cache as the main data store and reads/writes data from/to it • the cache is responsible for reading and writing this data to the database Client Cache Storage
  • 17.
    Write-Behind Pattern •modified cache entries are asynchronously written to the storage after a configurable delay Client Cache Storage
  • 18.
    Refresh-Ahead Pattern •automatically and asynchronously reload (refresh) any recently accessed cache entry from the cache loader prior to its expiration Client Cache Storage
  • 19.
    Cache Strategy SelectionRT/WT vs. cache-aside: • RT/WT simplifies application code • cache-aside may have blocking behavior • cache-aside may be preferable when there are multiple cache updates triggered to the same storage from different cache servers
  • 20.
    Cache Strategy SelectionWrite-through vs. write-behind: • write-behind caching may deliver considerably higher throughput and reduced latency compared to write-through caching • implication of write-behind caching is that database updates occur outside of the cache transaction • write-behind transaction can conflict with an external update
  • 21.
  • 22.
    Cache Types •local cache • replicated cache • distributed cache • remote cache • near cache
  • 23.
    Local Cache acache that is local to (completely contained within) a particular cluster node
  • 24.
    Local Cache Pros:• simplicity • performance • no serialization/deserialization overhead Cons: • not a fault-tolerant • scalability
  • 25.
    Local Cache Solutions:• EhCache • Google Guava • Infinispan local cache mode
  • 26.
    Replicated Cache acache that replicates its data to all cluster nodes
  • 27.
    Get in ReplicatedCache Each cluster node (JVM) accesses the data from its own memory, i.e. local read:
  • 28.
    Put in ReplicatedCache Pushing the new version of the data to all other cluster nodes:
  • 29.
    Replicated Cache Pros:• best read performance • fault–tolerant • linear performance scalability for reads Cons: • poor write performance • additional network load • poor and limited scalability for writes • memory consumption
  • 30.
    Replicated Cache Solutions:• open-source: • Infinispan • commercial: • Oracle Coherence • EhCache + Terracota
  • 31.
    Distributed Cache acache that partitions its data among all cluster nodes
  • 32.
    Get in DistributedCache Access often must go over the network to another cluster node:
  • 33.
    Put in DistributedCache Resolving known limitation of replicated cache:
  • 34.
    Put in DistributedCache • the data is being sent to a primary cluster node and a backup cluster node if backup count is 1 • modifications to the cache are not considered complete until all backups have acknowledged receipt of the modification, i.e. slight performance penalty • such overhead guarantees that data consistency is maintained and no data is lost
  • 35.
    Failover in DistributedCache Failover involves promoting backup data to be primary storage:
  • 36.
    Local Storage inDistributed Cache Certain cluster nodes can be configured to store data, and others to be configured to not store data:
  • 37.
    Distributed Cache Pros:• linear performance scalability for reads and writes • fault-tolerant Cons: • increased latency of reads (due to network round-trip and serialization/deserialization expenses)
  • 38.
    Distributed Cache SummaryDistributed in-memory key/value stores supports a simple set of “put” and “get” operations and optionally read-through and write-through behavior for writing and reading values to and from underlying disk-based storage such as an RDBMS
  • 39.
    Distributed Cache Summary: depending on the product, additional features also became available as the products matured: • ACID transactions • eviction policies • replication vs. partitioning • active backups
  • 40.
    Distributed Cache Solutions: • open-source: • Infinispan • Hazelcast • NoSQL stores, e.g. Redis, Cassandra, MongoDB, etc. • commercial: • Oracle Coherence • Terracotta
  • 41.
    Remote Cache: a cache that is located remotely and is accessed by one or more clients
  • 42.
    Remote Cache: the majority of existing distributed/replicated cache solutions support 2 modes: • embedded mode: the cache instance is started within the same JVM as your application • client-server mode: a remote cache instance is started and clients connect to it using a variety of protocols
  • 43.
    Remote Cache Solutions: • Infinispan remote cache mode • Hazelcast client-server mode • Memcached
  • 44.
    Near Cache: a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache
  • 45.
    Get in Near Cache: when an object is fetched from a remote node, it is put into the local cache, so subsequent requests are served by the local node from its local cache.
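The read path described above, and the stale reads it can cause, can be shown with a small sketch. The class `NearCacheSketch` is hypothetical, and the `remote` map merely stands in for a distributed or remote cache. Invalidation and eviction are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical near cache: a local map in front of a "remote" cache.
class NearCacheSketch {
    private final Map<String, String> remote; // stands in for the distributed/remote cache
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private int remoteReads = 0;

    NearCacheSketch(Map<String, String> remote) {
        this.remote = remote;
    }

    String get(String key) {
        String cached = local.get(key);
        if (cached != null) {
            return cached; // served locally, no network round trip
        }
        remoteReads++; // in a real deployment this is the expensive remote call
        String value = remote.get(key);
        if (value != null) {
            local.put(key, value); // keep a local copy for subsequent requests
        }
        return value;
    }

    // Number of reads that had to go to the remote cache.
    int remoteReads() {
        return remoteReads;
    }
}
```

Because nothing invalidates the local copy in this sketch, an update made to the remote cache after the first read is not visible to later reads. That is the consistency cost of a near cache, and the reason it suits read-only or rarely changing data.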
  • 46.
    Near Cache Pros: • it is best used for read-only data Cons: • increases memory usage, since the near cache items need to be stored in the memory of the member • reduces consistency, since local copies can become stale
  • 49.
    In-memory Data Grid: an in-memory distributed cache plus: • the ability to co-locate computations with data in a distributed context, i.e. to move the computation to the data • distributed MPP processing based on standard SQL and/or Map/Reduce, which makes it possible to compute efficiently over data stored in memory across the cluster
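The idea of moving computation to the data can be sketched as follows. This is a single-JVM illustration, not a real grid API: the class `DataGridSketch` and its `aggregate` method are hypothetical. The task is applied to each partition where it is stored, and only one partial result per partition is combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical data grid: data is split into partitions and tasks visit the partitions.
class DataGridSketch {
    private final List<Map<String, Long>> partitions = new ArrayList<>();

    DataGridSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumption: the partition is chosen by hash modulo partition count.
    void put(String key, long value) {
        int index = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(index).put(key, value);
    }

    // "Map" step: the task runs against each partition's local data.
    // "Reduce" step: only one long per partition is combined, not the entries themselves.
    long aggregate(ToLongFunction<Map<String, Long>> localTask) {
        long total = 0;
        for (Map<String, Long> partition : partitions) {
            total += localTask.applyAsLong(partition);
        }
        return total;
    }
}
```

A sum over all values would be expressed as `grid.aggregate(p -> p.values().stream().mapToLong(Long::longValue).sum())`. In a real cluster the benefit is that the entries never cross the network; only the partial sums do.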
  • 50.
    IMDC vs. IMDG • in-memory distributed caches (IMDC) were developed in response to a growing need for high availability of data • in-memory data grids (IMDG) were developed in response to the growing complexity of data processing
  • 51.
    IMDG in a nutshell: adding distributed SQL and/or MapReduce-type processing required a complete rethinking of distributed caches, as the focus shifted from pure data management to hybrid data and compute management
  • 54.
    Hazelcast: the leading open-source in-memory data grid; a free alternative to proprietary solutions such as Oracle Coherence, VMware Pivotal GemFire and Software AG Terracotta
  • 55.
    Hazelcast Use-Cases • scale your application • share data across cluster • partition your data • balance the load • send/receive messages • process in parallel on many JVMs, i.e. MPP
  • 56.
    Hazelcast Features • dynamic clustering, backup, discovery, fail-over • distributed map, queue, set, list, lock, semaphore, topic, executor service, etc. • transaction support • map/reduce API • Java client for accessing the cluster remotely
  • 57.
    Hazelcast Configuration • programmatic configuration • XML configuration • Spring configuration Nuance: it is very important that the configuration on all members of the cluster is exactly the same, regardless of whether you use XML-based or programmatic configuration.
  • 60.
    Sample Application Technologies: • Spring Boot 1.0.1 • Hazelcast 3.2 • Postgres 9.3 Application: • RESTful web service to get/put data from/to the cache • RESTful web service to execute tasks in the cluster • one instance of Hazelcast per application * Some samples are not optimal and were created just to demonstrate usage of the existing Hazelcast API
  • 61.
    Global Hazelcast Configuration: the global Hazelcast configuration is defined in a separate config in the common module. It contains a skeleton for future Hazelcast instances as well as global configuration settings: • instance configuration skeleton • common properties • group name and password • TCP-based network configuration • join config • multicast and TCP/IP config • default distributed map configuration skeleton
  • 62.
    Hazelcast Instance: each module that uses Hazelcast for distributed cache should have its own separate Hazelcast instance. The “Hazelcast Instance” is a factory for creating individual cache objects. Each cache has a name and potentially distinct configuration settings (expiration, eviction, replication, and more). Multiple instances can live within the same JVM.
  • 63.
    Hazelcast Cluster Group: groups are used to run multiple isolated clusters on the same network instead of a single cluster. A JVM can host multiple Hazelcast instances (nodes). Each node can participate in only one group; it joins only its own group and does not interfere with others. The group name and group password configuration properties are used to achieve this.
  • 64.
    Hazelcast Network Config: in our environment the multicast mechanism for joining the cluster is not supported, so only the TCP/IP cluster approach is used. In this case there should be one or more well-known members to connect to.
  • 66.
    Hazelcast Map Store • useful for reading and writing map entries from and to an external data source • one instance per map per node will be created • word of caution: the map store should NOT call distributed map operations, otherwise you might run into deadlocks
  • 67.
    Hazelcast Map Store • map pre-population via the loadAllKeys method, which returns the set of all “hot” keys that need to be loaded for the partitions owned by the member • write-through vs. write-behind using the “write-delay-seconds” configuration property (0 for write-through, greater than 0 for write-behind) • MapLoaderLifecycleSupport to be notified of lifecycle events, i.e. init and destroy
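The difference between write-through and write-behind can be illustrated without any Hazelcast classes. The sketch below is a hypothetical model: `StoreBackedMapSketch` is not the Hazelcast MapStore interface, the `database` map stands in for Postgres, and an explicit `flush()` call replaces the timer that would fire after the write delay.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical map backed by an external store, with a write delay switch.
class StoreBackedMapSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> database; // stands in for the external data source
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final boolean writeBehind;

    // writeDelaySeconds == 0 means write-through, > 0 means write-behind.
    StoreBackedMapSketch(Map<String, String> database, int writeDelaySeconds) {
        this.database = database;
        this.writeBehind = writeDelaySeconds > 0;
    }

    void put(String key, String value) {
        memory.put(key, value);
        if (writeBehind) {
            pending.put(key, value); // stored later, the caller does not wait for the store
        } else {
            database.put(key, value); // stored before put returns
        }
    }

    // Read-through: a miss in memory is loaded from the store and kept in memory.
    String get(String key) {
        return memory.computeIfAbsent(key, database::get);
    }

    // Simulates the write delay elapsing: queued entries are written in one batch.
    void flush() {
        database.putAll(pending);
        pending.clear();
    }
}
```

Write-through keeps the store consistent with the cache at the price of slower puts. Write-behind makes puts faster, but entries still queued in memory are lost if the node fails before the flush.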
  • 69.
    Hazelcast Executor Service • extends java.util.concurrent.ExecutorService, but is designed to be used in a distributed environment • scaling up via thread pool size • scaling out is automatic via the addition of new Hazelcast instances
  • 70.
    Hazelcast Executor Service • provides different ways to route tasks: • any member • specific member • the member hosting a specific key • all or subset of members • supports execution callbacks
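Two of the routing options, "the member hosting a specific key" and "all members", can be modelled with plain JDK executors. This is a sketch under stated assumptions, not Hazelcast code: `KeyRoutedExecutorSketch` is hypothetical, each "member" is a single-thread executor in the same JVM, and key ownership is a simple hash modulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical cluster of executors, one per simulated member.
class KeyRoutedExecutorSketch {
    private final List<ExecutorService> members = new ArrayList<>();

    KeyRoutedExecutorSketch(int memberCount) {
        for (int i = 0; i < memberCount; i++) {
            members.add(Executors.newSingleThreadExecutor());
        }
    }

    // Assumption: the member owning a key is chosen by hash modulo member count.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), members.size());
    }

    // Routes the task to the member that owns the key, so the task runs next to its data.
    <T> Future<T> submitToKeyOwner(Callable<T> task, String key) {
        return members.get(ownerOf(key)).submit(task);
    }

    // Runs the same task once on every member and returns one future per member.
    <T> List<Future<T>> submitToAllMembers(Callable<T> task) {
        List<Future<T>> futures = new ArrayList<>();
        for (ExecutorService member : members) {
            futures.add(member.submit(task));
        }
        return futures;
    }

    // Stops accepting new tasks; already submitted tasks still complete.
    void shutdown() {
        for (ExecutorService member : members) {
            member.shutdown();
        }
    }
}
```

Each member here has its own ordinary work queue, so one member can be busy while another is idle. Nothing in this model rebalances the queues.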
  • 71.
    Hazelcast Executor Service Drawbacks: • the work queue has no high availability: • each member creates a local ThreadPoolExecutor with an ordinary work queue that does the real work but is not backed up by Hazelcast • the work queue is not partitioned: • one member may have a lot of unprocessed work while another is idle • no customizable load balancing
  • 72.
    Hazelcast Features More useful features: • entry listener • transaction support, e.g. local, distributed • map/reduce API out of the box • custom serialization/deserialization mechanism • distributed topic • clients
  • 73.
    Hazelcast Missing Features Missing useful features: • updating the configuration in a running cluster • load balancing for the executor service
  • 75.
    Infinispan vs. Hazelcast: Pros Infinispan: • backed by a relatively large company (JBoss) for use in largely distributed environments • in active use for several years • well-written documentation • many examples of different configurations as well as solutions to common problems Hazelcast: • easy setup • more performant than Infinispan • simple node/cluster discovery mechanism • relies on only 1 jar on the classpath • brief documentation complemented by simple code samples
  • 76.
    Infinispan vs. Hazelcast: Cons Infinispan: • relies on JGroups, which proved to be buggy, especially under high load • configuration can be overly complex • ~9 jars are needed to get Infinispan up and running • code appears very complex and hard to debug/trace Hazelcast: • backed by a startup based in Palo Alto and Turkey that just received 2.5M in Series A funding from Bain Capital Ventures • customization points are fairly limited • some exceptions can be difficult to diagnose due to poorly written exception messages • still quite buggy
  • 78.
    Best Practices • each specific Hazelcast instance should have a unique instance name • each specific Hazelcast instance should have a unique group name and password • each specific Hazelcast instance should start on a separate port according to predefined ranges
  • 79.
    Personal Recommendations • use XML configuration in production, but don’t use the spring:hz schema. Our Spring-based “lego bricks” approach to building the resulting Hazelcast instance is quite decent. • don’t use Hazelcast for local caches, as it was never designed for that purpose and always performs serialization/deserialization • don’t use library-specific classes; use common collection interfaces, e.g. ConcurrentMap, and you will be able to replace the underlying cache solution easily
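The last recommendation can be sketched as follows. The service below depends only on `java.util.concurrent.ConcurrentMap`, so it can be handed a plain `ConcurrentHashMap` in tests or a distributed map in production (Hazelcast's `IMap` extends `ConcurrentMap`). The class `UserNameService` and its lookup logic are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical service that knows nothing about the cache library behind the map.
class UserNameService {
    private final ConcurrentMap<Long, String> cache;

    UserNameService(ConcurrentMap<Long, String> cache) {
        this.cache = cache;
    }

    String nameOf(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = "user-" + id; // placeholder for an expensive lookup
        // putIfAbsent returns the existing value if another caller stored one first.
        String existing = cache.putIfAbsent(id, loaded);
        return existing != null ? existing : loaded;
    }

    // Convenience factory for a purely local cache, e.g. in unit tests.
    static UserNameService withLocalCache() {
        return new UserNameService(new ConcurrentHashMap<>());
    }
}
```

Swapping the cache implementation then only changes the line that constructs the map, not the service.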
  • 80.
    Hazelcast Drawbacks • still quite buggy • poor documentation for more complex cases • enterprise edition costs money, but includes: • elastic memory • JAAS security • .NET and C++ clients
  • 82.
    Thank you! by Taras Matyashovsky
  • 83.
    References • http://docs.oracle.com/cd/E18686_01/coh.37/e18677/cache_intro.htm • http://coherence.oracle.com/display/COH31UG/Read-Through,+Write-Through,+Refresh-Ahead+and+Write-Behind+Caching • http://blog.tekmindsolutions.com/oracle-coherence-diffrence-between-replicated-cache-vs-partitioneddistributed-cache/ • http://www.slideshare.net/MaxAlexejev/from-distributed-caches-to-inmemory-data-grids • http://www.slideshare.net/jaxlondon2012/clustering-your-application-with-hazelcast • http://www.gridgain.com/blog/fyi/cache-data-grid-database/ • http://gridgaintech.wordpress.com/2013/10/19/distributed-caching-is-dead-long-live/ • http://www.hazelcast.com/resources/the-book-of-hazelcast/ • https://labs.consol.de/java-caches/part-3-3-peer-to-peer-with-hazelcast/ • http://hazelcast.com/resources/thinking-distributed-the-hazelcast-way/ • https://github.com/tmatyashovsky/hazelcast-samples/
