Apache Hadoop YARN: Past, Present and Future

The document discusses the past, present, and future of Apache Hadoop YARN. It describes how YARN started as a sub-project of Hadoop to improve Hadoop's resource management. Today, YARN is central to modern data architectures, providing centralized resource management and scheduling. Going forward, YARN aims to improve container support, simplify its APIs, treat services as first-class citizens, and enhance the user experience.

1. Apache Hadoop YARN: Past, Present and Future
Melbourne, Aug. 31, 2016
Junping Du
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
2. Who.JSON
{
  "name": "Junping Du",
  "job_title": "Lead Software Engineer @ Hortonworks YARN core team",
  "experiences": [
    {
      "software_industry_years": 10,
      "hadoop_experience": "Hadoop contributor before YARN came out, Apache Hadoop committer & PMC, Release Manager for Apache Hadoop 2.6",
      "non_hadoop_experience": "Architect in cloud computing and enterprise software"
    }
  ],
  "email": "junping_du@apache.org"
}
3. What is Apache Hadoop YARN?
⬢ YARN is short for “Yet Another Resource Negotiator”
⬢ Big Data Operating System
  – Resource management and scheduling
  – Support for “colorful” applications such as batch, interactive, and real-time
⬢ Enterprise adoption accelerating
  – Secure mode becoming more widespread
  – Multi-tenant support
  – Diverse workloads
⬢ SLAs
  – Tolerance for slow-running jobs decreasing
  – Consistent performance desired
4. Past

5. A brief timeline
June-July 2010   1st line of code
August 2011      Open sourced
May 2012         First 2.0 alpha
August 2013      First 2.0 beta
⬢ Sub-project of Apache Hadoop
⬢ Releases tied to Hadoop releases
⬢ Alphas and betas
6. GA Releases
Release   Date
2.2       Oct. 2013
2.3       Feb. 2014
2.4       Apr. 2014
2.5       Aug. 2014
• 1st GA
• MR binary compatibility
• YARN API cleanup
• Testing!
• 1st Post GA
• Bug fixes
• Alpha features
  - Load simulator
  - LCE enhancements
• RM Fail-over
• CS Preemption
• Timeline Service V1
• Writable REST APIs
• Timeline Service V1 security

7. GA Releases (Recent + Planning)
Release   Date
2.6       Nov. 2014
2.7       Apr. 2015
2.8/2.9   2nd H 2016 (estimated)
3.0       TBD
• KMS
• Long running service support
• Rolling Upgrade
• Node Label Support
• Docker Container
• Pluggable Authorization
• Shared Resource Cache
• Timeline Service V1.5
• Graceful Decommission
• Log CLI Enhancement
• Timeline Service V2
8. Outstanding YARN Features released in 2.6/2.7
⬢ Rolling Upgrade
⬢ Node Label
⬢ Pluggable ACLs
[Diagram labels: Default Partition; Partition B, GPUs; Partition C, Windows; JDK 8, JDK 7, JDK 7]

9. Recent Maintenance Releases Updates
⬢ 2.6 and 2.7 maintenance releases are carried out
  – Only blockers and critical fixes are added
⬢ Apache Hadoop 2.6
  – 2.6.4 released in Feb. 2016
  – 2.6.3 released in Dec. 2015
  – 2.6.2 released in Oct. 2015
⬢ Apache Hadoop 2.7
  – 2.7.3 released in Aug. 2016
  – 2.7.2 released in Jan. 2016
  – 2.7.1 released in Jul. 2015
10. Present

11. YARN in Modern Data Architecture
⬢ Modern Data Architecture
  – Enable applications to have access to all your enterprise data through an efficient centralized platform
  – Supported with a centralized approach to governance, security and operations
  – Versatile enough to handle any applications and datasets, no matter the size or type
⬢ YARN’s Evolution
  – The “CORE” of Modern Data Architecture
  – Centralized resource management, highly efficient scheduling, flexible resource model, isolation in security and performance, “colorful” applications support, etc.
12. Apache Hadoop YARN
[Diagram: ResourceManager (active), ResourceManager (standby), NodeManager1, NodeManager2, NodeManager3, NodeManager4]
⬢ Resources: 128G, 16 vcores
⬢ Auto-calculate node resources
⬢ Label: SAS
⬢ Dynamically update node resources

13. NodeManager Resource Management
⬢ Options to report NM resources based on node hardware
  – YARN-160
  – Restart of the NM required to enable feature
⬢ Alternatively, admins can use the rmadmin command to update the node’s resources
  – YARN-291
  – Looks at the dynamic-resource.xml
  – No restart of the NM or the RM required
14. Apache Hadoop YARN Scheduler
[Diagram: a user submits an application (Queue A, 4G, 1 vcore) to the ResourceManager (active)]
⬢ Queue A – 50%, Queue B – 25%, Queue C – 25%
⬢ Priority/FIFO, Fair
⬢ Label: SAS (non-exclusive)
⬢ Inter-queue pre-emption
⬢ Improvements to pre-emption
⬢ Support for application priority
⬢ Reservation for application
⬢ Support for cost-based placement agent

15. Capacity scheduler
⬢ Support for application priority within a queue
  – YARN-1963
  – Users can specify application priority
  – Specified as an integer, higher number is higher priority
  – Application priority can be updated while it’s running
⬢ Improvements to reservations
  – YARN-2572
  – Support for cost-based placement agent added in addition to greedy
⬢ Queue allocation policy can be switched to fair sharing
  – YARN-3319
  – Containers allocated on a fair share basis instead of FIFO
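The priority rule on the Capacity scheduler slide (an integer, where a higher number means higher priority, and the value can change while the application runs) can be illustrated with a minimal sketch. It assumes a simplified model in which each application has only a priority and a submission order; the names `App` and `schedule_order` are hypothetical and not part of any YARN API, and the real scheduler also weighs queue capacity, user limits, and resource requests.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    priority: int      # higher integer = higher priority, as stated on the slide
    submit_order: int  # lower value = submitted earlier


def schedule_order(apps):
    """Return apps sorted by descending priority, FIFO within equal priority."""
    return sorted(apps, key=lambda a: (-a.priority, a.submit_order))


apps = [App("etl", 1, 0), App("report", 5, 1), App("adhoc", 5, 2)]
before = [a.name for a in schedule_order(apps)]

# Priority can be updated while the application is running; the order follows.
apps[0].priority = 10
after = [a.name for a in schedule_order(apps)]

print(before)
print(after)
```

Sorting on the pair (negated priority, submission order) keeps FIFO behaviour among applications that share a priority, which is one plausible reading of "Priority/FIFO" on the scheduler diagram.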
16. Capacity scheduler
⬢ Support for non-exclusive node labels
  – YARN-3214
  – Improvement over partition that existed earlier
  – Better for cluster utilization
⬢ Improvements to pre-emption
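Why non-exclusive labels are "better for cluster utilization" can be shown with a small arithmetic sketch. It assumes a single labeled partition measured in GB of free memory, and a hypothetical function `allocate` that is not a YARN API. The sketch ignores pre-emption and queue capacities entirely.

```python
def allocate(partition_free_gb, labeled_demand_gb, default_demand_gb, exclusive):
    """Split a labeled partition's free memory between two kinds of requests.

    Requests that ask for the label are served first. If the label is
    non-exclusive, whatever is left idle may serve requests that asked for
    no label; if it is exclusive, the idle remainder stays unused.
    Returns (gb_given_to_labeled, gb_given_to_default).
    """
    labeled = min(partition_free_gb, labeled_demand_gb)
    idle = partition_free_gb - labeled
    borrowed = 0 if exclusive else min(idle, default_demand_gb)
    return labeled, borrowed


# A 128 GB partition with only 32 GB of labeled demand:
print(allocate(128, 32, 200, exclusive=True))
print(allocate(128, 32, 200, exclusive=False))
```

In the exclusive case 96 GB sits idle; in the non-exclusive case the same 96 GB serves unlabeled requests.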
17. Apache Hadoop YARN Application Lifecycle
[Diagram: ResourceManager (active); Node 1 with a NodeManager, 128G, 16 vcores; History Server; HDFS]
⬢ NodeManager: support added for graceful decommissioning
⬢ Launch Application 1 AM: AM process/Docker container (alpha)
  – Launch AM process via ContainerExecutor – DCE, LCE, WSCE
  – Monitor/isolate memory and CPU
  – Support added for disk and network isolation via CGroups (alpha)
⬢ Request containers, allocate containers
  – Support added to resize containers
⬢ Container 1 and Container 2: process/Docker container (alpha)
  – Launch containers on node using DCE, LCE, WSCE
  – Monitor/isolate memory and CPU
  – Support added for disk and network isolation using CGroups (alpha)
⬢ History Server (ATS 1.5 – leveldb + HDFS, JHS – HDFS)
⬢ HDFS: log aggregation

18. Apache Hadoop YARN
⬢ Graceful decommissioning of NodeManagers
  – YARN-914
  – Drains a node that’s being decommissioned to allow running containers to finish
⬢ Resource isolation support for disk and network
  – YARN-2619, YARN-2140
  – Containers get a fair share of disk and network resources using CGroups
  – Alpha feature
⬢ Docker support in LinuxContainerExecutor
  – YARN-3853
  – Support to launch Docker containers alongside process containers
  – Alpha feature
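The draining behaviour described for graceful decommissioning can be sketched as a small state machine. This is an illustration only: the `Node` class and its methods are hypothetical, the state names are chosen for readability, and the sketch leaves out anything beyond "stop accepting work, let running containers finish".

```python
class Node:
    """Toy model of a node that drains before it is decommissioned."""

    def __init__(self, name, running_containers):
        self.name = name
        self.running = set(running_containers)
        self.state = "RUNNING"

    def can_accept_containers(self):
        # A draining node takes no new work.
        return self.state == "RUNNING"

    def start_decommission(self):
        self.state = "DECOMMISSIONING"
        self._finish_if_drained()

    def container_finished(self, container_id):
        self.running.discard(container_id)
        self._finish_if_drained()

    def _finish_if_drained(self):
        # Only leave the cluster once nothing is running any more.
        if self.state == "DECOMMISSIONING" and not self.running:
            self.state = "DECOMMISSIONED"


node = Node("node1", ["c1", "c2"])
node.start_decommission()
states = [node.state]
node.container_finished("c1")
states.append(node.state)
node.container_finished("c2")
states.append(node.state)
print(states)
```

The contrast with a non-graceful decommission is that the node would move straight to the final state and the two running containers would be lost.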
19. Apache Hadoop YARN
⬢ Support for container resizing
  – YARN-1197
  – Allows applications to change the size of an existing container
⬢ ATS 1.5
  – YARN-4233
  – Store timeline events on HDFS
  – Better scalability and reliability
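Container resizing amounts to changing an allocation in place instead of releasing the container and requesting a new one. The sketch below shows only the bookkeeping under a simplifying assumption: memory is the sole resource, and a hypothetical function `resize` (not a YARN API) either applies the change or leaves the container untouched when the node lacks headroom.

```python
def resize(node_free_mb, current_mb, target_mb):
    """Try to change a container's memory from current_mb to target_mb.

    Returns (new_container_mb, new_node_free_mb). An increase succeeds only
    if the node has enough free memory; a decrease always succeeds and
    returns the difference to the node.
    """
    if target_mb <= 0:
        raise ValueError("target size must be positive")
    delta = target_mb - current_mb
    if delta > node_free_mb:
        return current_mb, node_free_mb  # not enough headroom: unchanged
    return target_mb, node_free_mb - delta


print(resize(4096, 2048, 4096))  # grow with enough headroom
print(resize(1024, 2048, 4096))  # grow without enough headroom
print(resize(0, 4096, 1024))     # shrink
```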
20. Operational support
⬢ Improvements to existing tools (like yarn logs)
⬢ New tools added (yarn top)
⬢ Improvements to the RM UI to expose more details about running applications
21. Future

22. Packaging
⬢ Containers
  – Lightweight mechanism for packaging and resource isolation
  – Popularized and made accessible by Docker
  – Can replace VMs in some cases
  – Or more accurately, VMs got used in places where they didn’t need to be
⬢ Native integration ++ in YARN
  – Support for “Container Runtimes” in LCE: YARN-3611
  – Process runtime
  – Docker runtime
23. APIs
⬢ Applications need simple APIs
⬢ Need to be deployable “easily”
⬢ Simple REST API layer fronting YARN
  – https://issues.apache.org/jira/browse/YARN-4793
  – [Umbrella] Simplified API layer for services and beyond
⬢ Spawn services & manage them

24. YARN as a Platform
⬢ YARN itself is evolving to support services and complex apps
  – https://issues.apache.org/jira/browse/YARN-4692
  – [Umbrella] Simplified and first-class support for services in YARN
⬢ Scheduling
  – Application priorities: YARN-1963
  – Affinity / anti-affinity: YARN-1042
  – Services as first-class citizens: preemption, reservations, etc.

25. YARN as a Platform (Contd)
⬢ Application & Services upgrades
  – “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
  – YARN-4726
⬢ Simplified discovery of services via DNS mechanisms: YARN-4757
⬢ YARN Federation – to infinity and beyond: YARN-2915

26. YARN Service Framework
⬢ Platform is only as good as the tools
⬢ A native YARN framework
  – https://issues.apache.org/jira/browse/YARN-4692
  – [Umbrella] Native YARN framework layer for services and beyond
⬢ Slider supporting a DAG of apps:
  – https://issues.apache.org/jira/browse/SLIDER-875
27. Operational and User Experience
⬢ Modern YARN web UI – YARN-3368
⬢ Enhanced shell interfaces
⬢ Metrics: Timeline Service V2 – YARN-2928
⬢ Application & Services monitoring, integration with other systems
⬢ First-class support for YARN hosted services in Ambari
  – https://issues.apache.org/jira/browse/AMBARI-17353

28. Use-cases.. Assemble!
[Diagram labels: YARN and Other Platform Services (Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance); Holiday Assembly (HBase, Web Server); IOT Assembly (Kafka, Storm, HBase, Solr); MR, Tez, Spark, …]
29. Future Work List (I)
⬢ Arbitrary resource types
  – YARN-3926
  – Admins can decide what resource types to support
  – Resource types read via a config file
⬢ New scheduler features
  – YARN-4902
  – Support richer placement strategies such as affinity, anti-affinity
⬢ Distributed scheduling
  – YARN-2877, YARN-4742
  – NMs run a local scheduler
  – Allows faster scheduling turnaround
⬢ YARN federation
  – YARN-2915
  – Allows YARN to scale out to tens of thousands of nodes
  – Cluster of clusters which appear as a single cluster to an end user
⬢ Better support for disk and network isolation
  – Tied to supporting arbitrary resource types
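The "richer placement strategies" item can be made concrete with a sketch of affinity and anti-affinity as node filters. It assumes a deliberately simple model in which the scheduler knows which applications run on each node; `candidate_nodes` and the rule strings are hypothetical names for illustration and do not reflect the design tracked in YARN-4902 or YARN-1042.

```python
def candidate_nodes(placements, target_app, rule):
    """Filter nodes by a placement rule relative to target_app.

    placements maps node name -> set of application names running there.
    rule is "affinity" (only nodes already running target_app) or
    "anti-affinity" (only nodes not running target_app).
    """
    if rule not in ("affinity", "anti-affinity"):
        raise ValueError(f"unknown rule: {rule}")
    want_present = rule == "affinity"
    return sorted(
        node for node, apps in placements.items()
        if (target_app in apps) == want_present
    )


placements = {
    "node1": {"hbase"},
    "node2": {"kafka"},
    "node3": {"hbase", "storm"},
    "node4": set(),
}

# Place a new container next to HBase, or keep it away from HBase.
near_hbase = candidate_nodes(placements, "hbase", "affinity")
away_from_hbase = candidate_nodes(placements, "hbase", "anti-affinity")
print(near_hbase)
print(away_from_hbase)
```

Affinity is typically useful for data locality, and anti-affinity for spreading replicas of a service so that one node failure does not take all of them down.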
30. Future Work List (II)
⬢ Simplified and first-class support for services in YARN
  – YARN-4692
  – Container restart (YARN-3988)
    • Allow container restart without losing allocation
  – Service discovery via DNS (YARN-4757)
    • Running services can be discovered via DNS
  – Allocation re-use (YARN-4726)
    • Allow AMs to stop a container but not lose resources on the node
⬢ Enhance Docker support
  – YARN-3611
  – Support to mount volumes
  – Isolate containers using CGroups
⬢ ATS v2 Phase 2
  – YARN-2928 (Phase 1), YARN-5355 (Phase 2)
  – Run timeline service on HBase
  – Support for more data, better performance
⬢ Also in the pipeline
  – Switch to Java 8 with Hadoop 3.0
  – Add support for GPU isolation
  – Better tools to detect limping nodes
  – New RM UI – YARN-3368

31. HDP Evolution with Apache Hadoop YARN

32. Thank you!

