Google Cloud Big Data and Machine Learning

Google Big Data Platform

• Google's big data solutions, known collectively as the Integrated Serverless Platform, help transform users and their businesses through data insights.
• Google's big data solutions are part of the GCP services and are fully maintained and managed; you pay only for the resources you consume.
• The Google Big Data Platform offers the following services, which are integrated to help you create custom solutions:
   o Apache Hadoop is an open-source framework for distributed data processing, based on the MapReduce programming model.
   o The MapReduce model consists of a Map function, which runs over a large dataset to generate intermediate results, and a Reduce function, which takes those results as input and produces the final output (see the shell analogy after this list).
   o Alongside Apache Hadoop there are related projects such as Apache Pig, Hive, and Spark.
   o On Google Cloud Platform, Cloud Dataproc can be used to run Hadoop, Spark, Hive, and Pig.
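To make the Map and Reduce roles concrete, here is a minimal word-count analogy built from ordinary shell tools rather than Hadoop itself (input.txt is a hypothetical input file): the first command plays the Map function, the sort groups the intermediate results by key, and the last command plays the Reduce function.

    # "Map": split the input into one word per line (one intermediate record each)
    tr -s '[:space:]' '\n' < input.txt |
    # shuffle: bring identical keys (words) together
    sort |
    # "Reduce": aggregate the grouped records into a final count per word
    uniq -c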

Cloud Dataproc

• When you request a Hadoop cluster, it is built on top of Compute Engine VMs in less than 90 seconds, and you can scale it up or down based on the processing power you need.
• You can monitor your cluster with Stackdriver, GCP's monitoring service.
• Running clusters on-premises requires an up-front hardware investment; running them in Dataproc means you pay only for the hardware resources used while the cluster exists.
• Cloud Dataproc is billed per second, and GCP stops the billing once the cluster is deleted.
• You can also use preemptible instances for batch processing to save costs, as shown in the sketch below.
• Once the cluster holds your data, Spark and Spark SQL can be used for data mining.
• You can also use Apache Spark's machine learning libraries (MLlib) to discover patterns through machine learning.
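As a concrete sketch of these points (cluster name and region are placeholders, and the preemptible-worker flag reflects the gcloud CLI of roughly this course's era, so exact flag names may differ in newer releases):

    # Create a cluster with 2 standard workers plus 2 cheaper preemptible workers
    gcloud dataproc clusters create demo-cluster \
        --region=us-central1 \
        --num-workers=2 \
        --num-preemptible-workers=2

    # Submit the SparkPi example job that ships on the cluster image
    gcloud dataproc jobs submit spark \
        --cluster=demo-cluster \
        --region=us-central1 \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
        -- 1000

    # Per-second billing stops as soon as the cluster is deleted
    gcloud dataproc clusters delete demo-cluster --region=us-central1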

Cloud Dataflow

Cloud Dataproc is suitable when you know your cluster size in advance. But if your required cluster size is unpredictable, or your data shows up in real time, then your choice should be Cloud Dataflow.

• Cloud Dataflow is a managed service that lets you develop and execute a wide range of data processing patterns: extract-transform-load (ETL), batch computation, and continuous (streaming) computation.
• Cloud Dataflow is used to build data pipelines for both batch and streaming data; a template-based example follows this list.
• It automates the management of processing resources, freeing you from operational tasks such as performance optimization and resource management.
• Cloud Dataflow can read data from BigQuery, process it, apply transforms such as map and reduce operations, and write the results to Cloud Storage.
• Use cases include fraud detection in financial services, IoT analytics, manufacturing, logistics, healthcare, and so on.
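One quick way to see a Dataflow pipeline run without writing code is to launch one of Google's pre-built templates; a minimal sketch, assuming you replace MY_BUCKET with a bucket you own (the template and sample-input paths are public Google-hosted locations):

    # Run the pre-built WordCount template; Dataflow provisions and
    # scales the workers, then tears them down when the job finishes
    gcloud dataflow jobs run wordcount-demo \
        --gcs-location=gs://dataflow-templates/latest/Word_Count \
        --region=us-central1 \
        --parameters=inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://MY_BUCKET/results/output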

BigQuery

Say you possess a large dataset and need to perform ad-hoc SQL queries on it; then BigQuery is the right choice.
• BigQuery is Google's fully managed, low-cost analytical data warehouse with petabyte-scale storage.
• You can load data into BigQuery from Cloud Storage or Cloud Datastore, or stream it into BigQuery at up to 100,000 rows per second.
• You can perform super-fast SQL queries and read and write BigQuery data through Cloud Dataflow, Spark, and Hadoop (an example query follows this list).
• You pay only for the queries that you run.
• Once data in BigQuery reaches 90 days of age, Google automatically lowers its storage price.
• BigQuery has an availability of 99.99%.
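For example, an ad-hoc standard SQL query against one of BigQuery's public sample datasets (bigquery-public-data.usa_names) can be run from any project with the bq command-line tool:

    # Top five most common first names in the public dataset;
    # you are billed only for the bytes this query scans
    bq query --use_legacy_sql=false \
        'SELECT name, SUM(number) AS total
         FROM `bigquery-public-data.usa_names.usa_1910_2013`
         GROUP BY name
         ORDER BY total DESC
         LIMIT 5'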

Cloud Pub/Sub and Cloud Datalab

If you are working with events in real time and need a messaging service, Cloud Pub/Sub will help you in the following ways:
• Cloud Pub/Sub is a simple, reliable, and scalable foundation for stream analytics. Using it, you can build independent applications that send and receive messages.
• Pub/Sub is short for Publishers and Subscribers.
• Applications publish their messages to Pub/Sub, and the subscribers that have subscribed to them receive the messages, as in the round trip sketched below.
• Cloud Pub/Sub can also be integrated with Cloud Dataflow.
• Cloud Datalab helps you explore your data, and it integrates with multiple GCP services such as BigQuery, Cloud Storage, and Compute Engine.
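A minimal publish/subscribe round trip with the gcloud CLI (topic and subscription names are arbitrary placeholders):

    # Create a topic and attach a subscription to it
    gcloud pubsub topics create demo-topic
    gcloud pubsub subscriptions create demo-sub --topic=demo-topic

    # Publish a message to the topic...
    gcloud pubsub topics publish demo-topic --message="hello subscribers"

    # ...then pull it from the subscription, acknowledging receipt
    gcloud pubsub subscriptions pull demo-sub --auto-ack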

Machine Learning APIs

• The Cloud Natural Language API offers natural language technologies to developers around the world: it performs syntax analysis, identifies verbs, nouns, adverbs, and adjectives, and can find the relationships between words (see the example below).
• The Cloud Translation API converts an arbitrary string into a supported language through a simple interface.
• The Cloud Video Intelligence API helps annotate videos in a variety of formats. You can use it to make your video content searchable.
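For instance, the Natural Language API's syntax analysis can be called straight from the gcloud CLI once the API is enabled in your project (the sentence is an arbitrary example):

    # Returns each token tagged as noun, verb, adjective, etc.,
    # plus the dependency relations between the words
    gcloud ml language analyze-syntax \
        --content="Google Cloud makes big data analysis fast."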

Hands-on scenario

John is performing a few tasks in a GCP environment, but he is stuck at one point and cannot proceed further. Assist John to complete the following tasks (a command sketch follows the list):
• Log in to GCP with the provided credentials.
• Create a Google Cloud virtual network, subnet, firewall rule, and compute instance, named as you choose.
• Create a Google Cloud Storage bucket with a name of your choice and execute the command 'gsutil cp gs://cloud-training/gcpfci/my-excellent-blog.png my-excellent-blog.png' to retrieve an image.
• Copy the image to your newly created bucket.
• Create a Google Cloud SQL instance and a SQL user.
• Enable the 'Kubernetes Engine API' and 'Container Registry API'.
• Start a Kubernetes cluster managed by Kubernetes Engine, with a name of your choice, and configure it to run 2 nodes.
• Launch a single instance of the nginx container (version 1.10.0) and expose it to the internet using target port 80. Then scale the number of pods to 3 and confirm that the external IP address has not changed.
• Now deploy the Guestbook application to App Engine. To do so, clone the sample-application source repository at 'https://github.com/GoogleCloudPlatform/appengine-guestbook-python', then view your deployed application using the gcloud command.
• Create a new dataset called 'logdata' and set the data location to 'US'.
• Create a new table in the logdata dataset and perform query operations on the data using the 'bq' command.
• Finally, delete all the resources created as part of this hands-on exercise.
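A sketch of the commands behind a few of these tasks (names such as my-bucket, webfrontend, and accesslog are hypothetical placeholders; exact flags can vary across gcloud and kubectl versions):

    # Retrieve the sample image, then copy it into your new bucket
    gsutil cp gs://cloud-training/gcpfci/my-excellent-blog.png my-excellent-blog.png
    gsutil cp my-excellent-blog.png gs://my-bucket/

    # Two-node Kubernetes Engine cluster, an nginx 1.10.0 deployment,
    # an internet-facing service on port 80, then a scale-out to 3 pods
    gcloud container clusters create webfrontend --num-nodes=2 --zone=us-central1-a
    kubectl create deployment nginx --image=nginx:1.10.0
    kubectl expose deployment nginx --port=80 --type=LoadBalancer
    kubectl scale deployment nginx --replicas=3

    # A US-located BigQuery dataset named logdata, then a query over
    # a (hypothetical) accesslog table created inside it
    bq --location=US mk -d logdata
    bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM logdata.accesslog'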

Conclusion

"Learn as if you were to live forever." - Mahatma Gandhi

In this course, we have discussed the following topics, which lay the foundation for Google Cloud Platform:
• Google Cloud VPC and Compute Engine
• Google Cloud Storage and Bigtable
• Cloud SQL, Spanner, and Datastore
• Containers, Kubernetes, and Kubernetes Engine
• App Engine Standard and Flexible
• Cloud development, deployment, and monitoring
• Google Cloud big data and machine learning