
















































This document introduces the concept of data science and its importance in today's world. It discusses the various applications of data analytics and the 5 Vs of big data. It also covers the data science life cycle and the importance of data preparation, the importance of Python in data science, and the Pandas library, including the operations that can be performed on data using Pandas. Finally, it addresses the problems associated with dirty data and the importance of data preprocessing.
SCHOOL OF MECHANICAL ENGINEERING AND TECHNOLOGY
Exciting new and effective applications of data analytics are emerging. For example, Google Flu Trends detected outbreaks two weeks ahead of CDC data, and new models are estimating which cities are most at risk for the spread of the Ebola virus. Such prediction models are built on various data sources, data types, and kinds of analysis.
Graph Data
Lots of interesting data has a graph structure:
user graph)
What can you do with the data?
From Alex Bayen, UCB
DATA SCIENCE – WHAT IS IT?
Data Science – A Definition
Data science is the discipline that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
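The verbs in this definition (collect, clean, integrate, analyze) can be made concrete with a small sketch. Everything below is an assumption made for illustration: the sensor readings, the room lookup, and the function names are invented and do not come from the slides. It uses only the Python standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw inputs, invented purely for illustration.
READINGS_CSV = """sensor_id,temp_c
s1,21.5
s2,
s1,22.5
s3,19.0
s2,not_a_number
"""
SENSOR_ROOMS = {"s1": "lab", "s2": "office", "s3": "lab"}


def collect(text):
    """Collect: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows whose temperature is missing or not numeric."""
    cleaned = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except (TypeError, ValueError):
            continue  # skip dirty rows instead of guessing a value
        cleaned.append({"sensor_id": row["sensor_id"], "temp_c": temp})
    return cleaned


def integrate(rows, rooms):
    """Integrate: attach the room from a second data source to each reading."""
    return [
        {**row, "room": rooms[row["sensor_id"]]}
        for row in rows
        if row["sensor_id"] in rooms
    ]


def analyze(rows):
    """Analyze: compute the mean temperature per room."""
    by_room = defaultdict(list)
    for row in rows:
        by_room[row["room"]].append(row["temp_c"])
    return {room: mean(values) for room, values in by_room.items()}


result = analyze(integrate(clean(collect(READINGS_CSV)), SENSOR_ROOMS))
print(result)
```

Both readings from sensor `s2` are dirty, so the "office" room drops out of the result entirely. That is one way a cleaning choice can quietly shape the final data product.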
Jeff Hammerbacher’s Model
Data Scientist’s Practice
[Figure: diagram with the stages "Digging Around in Data", "Hypothesize Model", "Large Scale Exploitation", "Evaluate", "Interpret", and "Clean, prep"]
Data Science: Getting Value out of Data
Why the Increased Interest in Data Science?
Applications
Contrast: Databases

|              | Databases                                                | Data Science                                              |
|--------------|----------------------------------------------------------|-----------------------------------------------------------|
| Data Value   | "Precious"                                               | "Cheap"                                                   |
| Data Volume  | Modest                                                   | Massive                                                   |
| Examples     | Bank records, Personnel records, Census, Medical records | Online clicks, GPS logs, Tweets, Building sensor readings |
| Priorities   | Consistency, Error recovery, Auditability                | Speed, Availability, Query richness                       |
| Structured   | Strongly (Schema)                                        | Weakly or none (Text)                                     |
| Properties   | Transactions, ACID*                                      | CAP* theorem (2/3), eventual consistency                  |
| Realizations | SQL                                                      | NoSQL: MongoDB, CouchDB, HBase, Cassandra,…               |

* ACID = Atomicity, Consistency, Isolation and Durability
* CAP = Consistency, Availability, Partition Tolerance
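Two rows of this contrast, "Structured" and "Properties", can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not something taken from the slides: the account table, the click events, and all function names are hypothetical, and the standard-library `sqlite3` module stands in for a schema-enforcing SQL database. It does not model a distributed NoSQL store or the CAP theorem.

```python
import json
import sqlite3
from collections import Counter


def make_bank():
    """Strongly structured data: the schema and constraints are declared up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Atomic transfer: either both updates are committed or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint on overdraft,
            # which also undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a name -> balance dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Weakly structured data: hypothetical click/GPS events with no shared schema.
RAW_EVENTS = """{"user": "u1", "click": "/home"}
{"user": "u2", "lat": 37.87, "lon": -122.26}
{"user": "u1", "click": "/cart", "referrer": "ad"}"""


def field_counts(raw):
    """Count how often each field appears across schemaless JSON records."""
    records = [json.loads(line) for line in raw.splitlines()]
    return Counter(key for record in records for key in record)


conn = make_bank()
ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # overdraft, rolled back
print(ok, rejected, balances(conn))
print(field_counts(RAW_EVENTS))
```

The failed transfer leaves both balances untouched, which is the atomicity that the "Transactions, ACID" cell refers to. The event records, by contrast, are accepted whatever fields they carry, so any structure has to be discovered at analysis time.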