Slides and notes from a lecture on Cloud Computing given by Wes J. Lloyd at the School of Engineering and Technology, University of Washington - Tacoma. The lecture covers data, thread-level, and task-level parallelism; parallel architectures; SIMD architectures, vector processing, and multimedia extensions; graphics processing units; speed-up, Amdahl's Law, and scaled speedup; properties of distributed systems; and modularity. It also introduces Cloud Computing concepts, technology, and architecture, and includes feedback from students and additional resources on MapReduce.
TCSS 462/562: (Software Engineering for) Cloud Computing [Fall 2022]
School of Engineering and Technology, University of Washington - Tacoma
Lecture 4 - October 11, 2022
OBJECTIVES – 10/11
▪ Questions from 10/
▪ Cloud Computing – How did we get here? (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
▪ Data, thread-level, task-level parallelism & parallel architectures
▪ Class Activity 1 – Implicit vs Explicit Parallelism
▪ SIMD architectures, vector processing, multimedia extensions
▪ Graphics processing units
▪ Speed-up, Amdahl's Law, Scaled Speedup
▪ Properties of distributed systems
▪ Modularity
▪ Introduction to Cloud Computing – loosely based on book #1: Cloud Computing Concepts, Technology & Architecture
MATERIAL / PACE
▪ Please classify your perspective on material covered in today's class (47 respondents): 1 - mostly review, 5 - equal new/review, 10 - mostly new
  ▪ Average – 6.89 (previous 6.16)
▪ Please rate the pace of today's class: 1 - slow, 5 - just right, 10 - fast
  ▪ Average – 5.62 (previous 5.35)
▪ Response rates:
  ▪ TCSS 462: 25/32 – 78.1%
  ▪ TCSS 562: 22/26 – 84.6%
FEEDBACK FROM 10/
FEEDBACK - 4
▪ "Am seeking some clarification for what MAP-REDUCE is besides a framework that uses lots of data processed in parallel. Are cloud computing services built using this infrastructure and then it decides how the work is broken up for servers with different system hardware (heterogeneous, homogeneous, etc.)?"
▪ MapReduce is a programming model for writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner
▪ For data parallelism, we also consider data processing tasks that can be sped up using a divide-and-conquer approach
▪ MapReduce provides a programming model and architecture for repeatedly applying the divide-and-conquer pattern

MAP-REDUCE
▪ MapReduce consists of two sequential tasks: Map and Reduce
  ▪ MAP filters and sorts data while converting it into key-value pairs
  ▪ REDUCE takes this input and reduces its size by performing some kind of summary operation over the dataset
▪ MapReduce drastically speeds up big data tasks by breaking down large datasets and processing them in parallel
▪ The MapReduce paradigm was first proposed in 2004 by Google and later incorporated into the open-source Apache Hadoop framework for distributed processing over large datasets using files
▪ Apache Spark supports MapReduce over large datasets in RAM
▪ Amazon Elastic MapReduce (EMR) provides cloud-provider managed services for Apache Hadoop and Spark
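To make the Map and Reduce steps concrete, here is a minimal word-count sketch in plain Python (not the Hadoop or Spark API); the function names map_phase, shuffle_phase, and reduce_phase are illustrative only. The map step emits key-value pairs, a shuffle step groups them by key, and the reduce step applies a summary operation (a sum) per key.

from collections import defaultdict

def map_phase(document):
    # MAP: convert raw input into (word, 1) key-value pairs
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # SHUFFLE: group all values by key so each key is reduced together
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # REDUCE: summary operation over the values for one key (here, a sum)
    return key, sum(values)

if __name__ == "__main__":
    documents = ["the cloud runs the code", "the code maps and reduces"]
    # In a real cluster, each document (or data block) is mapped on a different node
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())
    print(counts)   # e.g. {'the': 3, 'cloud': 1, 'code': 2, ...}

In Hadoop or Spark the same pattern runs distributed: map tasks execute near the data, and the framework handles the shuffle, fault tolerance, and re-execution of failed tasks.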
MAP-REDUCE - ADDITIONAL RESOURCES
▪ Original Google paper on MapReduce: https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
▪ Apache Spark: https://spark.apache.org/
▪ Apache Hadoop: https://hadoop.apache.org/
▪ Amazon Elastic Map Reduce: https://aws.amazon.com/emr/
FEEDBACK - 3
▪ Introduction to Bash Scripting: https://faculty.washington.edu/wlloyd/courses/tcss562/tutorials/TCSS462_562_f2022_tutorial_2.pdf
▪ Review tutorial sections:
  ▪ Create a BASH webservice client
ACTIVITY 1
▪ Form groups of ~3 - in class or with Zoom breakout rooms
▪ Each group will complete a MS Word DOCX worksheet
▪ Be sure to add names at the top of the document as they appear in Canvas
▪ The activity can be completed in class or after class
▪ The activity can also be completed individually
▪ When completed, one person should submit a PDF of the Google Doc to Canvas
▪ The instructor will score all group members based on the uploaded PDF file
▪ To get started:
  ▪ Log into your UW Google Account (https://drive.google.com) using your UW NET ID
  ▪ Follow the link: https://tinyurl.com/tcss462-562-a
▪ Solutions to be discussed.
PARALLELISM QUESTIONS
PARALLELISM QUESTIONS - 2
MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY
▪ Michael Flynn proposed a taxonomy of computer architectures based on the number of concurrent instruction streams and data streams (1966)
  ▪ The four classes are SISD, SIMD, MISD, and MIMD
▪ For fault tolerance, may want to execute the same instructions redundantly to detect and mask errors – for task replication
FLYNN'S TAXONOMY - 2
▪ MIMD (Multiple Instruction, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
▪ At any time, different processors/cores may execute different instructions on different data
▪ Multi-core CPUs are MIMD
▪ Processors share memory via interconnection networks
  ▪ Hypercube, 2D torus, 3D torus, omega network, other topologies
▪ MIMD systems have different methods of sharing memory
  ▪ Uniform Memory Access (UMA)
  ▪ Cache Only Memory Access (COMA)
  ▪ Non-Uniform Memory Access (NUMA)
▪ SIMD can perform many fast matrix operations in parallel

ARITHMETIC INTENSITY
▪ Arithmetic intensity: the ratio of arithmetic (floating point) operations performed by a program to the memory traffic (bytes read and written) required to support those operations
ROOFLINE MODEL
▪ When a program reaches a given arithmetic intensity, the performance of the code running on the CPU hits a "roof"
▪ The CPU performance bottleneck changes from memory bandwidth (left side of the roofline plot) → floating point performance (right side)
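As a sketch of the idea (not taken from the slides): attainable performance is the minimum of the peak floating point throughput and the peak memory bandwidth multiplied by the arithmetic intensity. The hardware numbers below are made-up illustrative values, not measurements of any real CPU.

def roofline_gflops(intensity_flops_per_byte,
                    peak_gflops=500.0,         # hypothetical peak FP throughput
                    peak_bandwidth_gbs=50.0):  # hypothetical memory bandwidth
    # Attainable performance = min(compute roof, bandwidth * arithmetic intensity)
    return min(peak_gflops, peak_bandwidth_gbs * intensity_flops_per_byte)

# Low intensity: memory-bandwidth bound (left of the ridge point)
print(roofline_gflops(2.0))    # 100.0 GFLOP/s
# High intensity: compute bound (hits the "roof" on the right)
print(roofline_gflops(20.0))   # 500.0 GFLOP/s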
PARALLEL COMPUTING
▪ Parallel hardware and software systems allow larger problems to be solved in less time than is possible on a single processor
▪ The speed-up (S) measures the effectiveness of parallelization:
  S(N) = T(1) / T(N)
  where T(1) is the execution time on one processor and T(N) is the execution time on N processors

SPEED-UP EXAMPLE
▪ Consider embarrassingly parallel image processing
  ▪ Eight images (multiple data)
  ▪ Apply image transformation (greyscale) in parallel
  ▪ 8-core CPU, 16 hyperthreads
▪ Sequential processing: perform transformations one at a time using a single program thread
  ▪ 8 images, 3 seconds each: T(1) = 24 seconds
▪ Parallel processing
  ▪ 8 images, 3 seconds each: T(N) = 3 seconds
▪ Speedup: S(N) = 24 / 3 = 8x speedup
  ▪ Called "perfect scaling"
▪ Must consider data transfer and computation setup time
AMDAHL'S LAW
▪ S = 1 / ((1 - f) + f / N)
  ▪ S = theoretical speedup of the whole task
  ▪ f = fraction of work that is parallel (ex. 25% or 0.25)
  ▪ N = proposed speed-up of the parallel part (ex. 5 times speedup)
▪ % improvement of task execution = 100 * (1 - (1 / S))
▪ Using Amdahl's Law, what is the maximum possible speed-up?
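As a worked check using the example values given above (parallel fraction f = 0.25), the small sketch below evaluates Amdahl's Law for increasing N; the speed-up approaches the limit 1 / (1 - f) ≈ 1.33x no matter how many processors are added, because the serial 75% of the work never shrinks.

def amdahl_speedup(f, n):
    # S = 1 / ((1 - f) + f / N): the serial part (1 - f) is never sped up
    return 1.0 / ((1.0 - f) + f / n)

f = 0.25                      # fraction of the work that is parallel
for n in (2, 5, 16, 1000):
    print(n, round(amdahl_speedup(f, n), 3))
# 2 -> 1.143, 5 -> 1.25, 16 -> 1.306, 1000 -> 1.333 (limit = 1 / (1 - f))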
GUSTAFSON'S LAW
▪ Can be used to estimate the runtime of the parallel portion of a program
▪ Scaled speedup with N processors: S(N) = (1 - f) + f * N
  ▪ f = fraction of work that is parallel, N = number of processors
GUSTAFSON'S EXAMPLE
▪ QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
▪ What is the maximum theoretical speed-up on a 16-core CPU?
▪ For 2 CPUs, the speed-up is 1.25x
▪ For 16 CPUs, the speed-up is 4.75x
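The answers above are consistent with Gustafson's scaled speedup for a parallel fraction f = 0.25 (the same value used in the Amdahl's Law example); that value is an assumption inferred from the answers, since the original slide did not survive extraction. The quick check below reproduces 1.25x and 4.75x.

def gustafson_speedup(f, n):
    # Scaled speedup: the serial part stays fixed while the parallel part grows with N
    return (1.0 - f) + f * n

f = 0.25                         # assumed parallel fraction (inferred, not stated)
print(gustafson_speedup(f, 2))   # 1.25
print(gustafson_speedup(f, 16))  # 4.75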