Statistics in Foundations of Data Science: Understanding Data and Making Inferences with R, Summaries of Statistics

An introduction to the role of statistics in data science, focusing on the COMP6235 Foundations of Data Science course taught by Markus Brede at the University of Southampton. It covers the importance of statistics in data science, including data description, comparison, and figuring out what is special about given data. It also introduces the R package as a tool for statistical analysis and provides resources for learning R. The material is intended to give students a basic understanding of statistics and familiarize them with R, without requiring memorization of every technique.

Typology: Summaries

2021/2022

Uploaded on 09/27/2022

globelaw 🇬🇧

Statistics part of COMP6235

Opening lecture

Markus Brede, mb8@ecs.soton.ac.uk

(some material used here was developed by Jason Noble)

Some background
● Why do you need statistics in a Foundations of Data Science course? Several aims of data science are facilitated by statistics:
● Describing given data: we will discuss distributions, mean/median values, variance, and co-dependencies between data sets.
● Figuring out what is special about given data: this needs comparison to reference models, which are often at the basis of statistics, e.g. are the data more or less varied than expected?
● Comparing data: is one set of data different from another?

Aims of the stats part
We want to achieve the following:
● Give you some idea of what stats is about and familiarise you with some basic tools you might find useful for data science (but this is only a starting point!)
● Familiarise you with the R package, a very common stats package.
● We don't need you to have memorized every technique, but we want you to know what you need to look up and why.
● The approach of the lectures is very experimental and hands-on; the aim is to enable you to use basic stats.

Where to get module information
● I have previously taught a stats module for PhD students. If you are keen to learn more, the old course information can be found here: http://users.ecs.soton.ac.uk/mb8/stats/stats.html
● The current slides and coursework information are available here: http://users.ecs.soton.ac.uk/mb8/datascience/datascience.html
● They are also linked from the course wiki: https://secure.ecs.soton.ac.uk/noteswiki/w/COMP6235/1718#Slides

Online material for R
R is actually a full-featured programming language. We will mostly be using it as an advanced statistical calculator. In getting to know R better, there's a lot of online help:
● The R-project home page: http://www.r-project.org/
● Jason Noble's crash course: http://users.ecs.soton.ac.uk/jn2/simulation/introToR.html
● The "Quick-R" guide: http://www.statmethods.net/stats/index.html
● Intro to plotting in R: http://www.people.carleton.edu/~lchihara/Splus/RPlot.pdf
● The ggplot2 package for potentially nicer looking graphs: http://had.co.nz/ggplot2/
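As a rough illustration of what using R "as an advanced statistical calculator" can look like, the sketch below computes a few summary statistics with base R functions. The numbers are invented for illustration and are not course data.

```r
# Hypothetical measurements, invented for illustration only
x <- c(2, 4, 4, 5, 7, 9, 11)

x_mean   <- mean(x)     # arithmetic mean
x_median <- median(x)   # middle value of the sorted data
x_var    <- var(x)      # sample variance (divides by n - 1)
x_sd     <- sd(x)       # sample standard deviation, square root of the variance

print(c(mean = x_mean, median = x_median, var = x_var, sd = x_sd))
print(summary(x))       # minimum, quartiles, mean and maximum in one call
```

All functions used here ship with a standard R installation, so no extra packages are needed to try it.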

Online materials by other people
We are also going to use various online supplements. There are some very high quality stats materials on the net, and I won't pretend I can do better than all of them.
● Khan Academy: http://www.khanacademy.org/
● David M. Lane's online course notes: http://davidmlane.com/hyperstat/index.html and http://onlinestatbook.com/2/index.html

An aside
If anyone needs to build their enthusiasm for statistics and its uses, try listening to the amazing Hans Rosling for a while: http://www.gapminder.org/videos/the-joy-of-stats/

What is statistics all about?
● There is an important distinction between the related areas of statistics and probability.
● Probability says "I've got a data generating process" (e.g., throwing two dice and adding the result), "now tell me what sorts of outcomes I can expect from this data generating process."
● Statistics is the inverse. Statistics says "I have some outcomes" (i.e., data), "now what can I infer about the process that generated them?"
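The two directions can be sketched in R using the two-dice example. This is only an illustrative sketch: the seed, the number of throws, and the variable names are assumptions chosen here, not part of the course material.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Probability direction: the process is known (two fair dice, summed),
# so the distribution of outcomes can be enumerated exactly.
faces <- 1:6
sums  <- outer(faces, faces, "+")      # all 36 equally likely pairs
p_sum <- table(sums) / length(sums)    # exact probability of each sum 2..12

# Statistics direction: only outcomes are observed, and we estimate
# properties of the process that generated them.
n_throws <- 10000                      # assumed sample size
observed <- sample(faces, n_throws, replace = TRUE) +
  sample(faces, n_throws, replace = TRUE)
p_hat <- table(factor(observed, levels = 2:12)) / n_throws

# Compare the exact probabilities with the estimates from the data
res <- rbind(exact = as.numeric(p_sum), estimated = as.numeric(p_hat))
colnames(res) <- 2:12
print(round(res, 3))
```

With enough throws the estimated row should sit close to the exact one, which is what makes inference from data back to the process possible.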

What is statistics all about?
● We reason that the real world result either is or isn't close enough to the hypothetical one to make us suspect that the hypothetical world's data-generating process is actually a good description of the real one.
● Once you "get" that inferential strategy, all of statistics starts to make a lot more sense.
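One way to make this inferential strategy concrete is a small simulation. The sketch below assumes a made-up "real world" result (18 sixes in 60 throws of a die) and a hypothetical world in which the die is fair; both the data and the choice of a binomial model are assumptions made here for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility

# "Real world" result: invented for illustration, not course data
n_throws     <- 60
observed_six <- 18     # number of sixes seen in 60 throws

# Hypothetical world: the die is fair, so each throw shows a six with p = 1/6.
# Simulate many repetitions of the same experiment in that world.
n_sim   <- 10000
sim_six <- rbinom(n_sim, size = n_throws, prob = 1/6)

# How often does the hypothetical world produce a result at least as
# extreme as the real one?
p_sim <- mean(sim_six >= observed_six)

# The same tail probability computed exactly from the binomial distribution:
# P(X >= 18) = P(X > 17)
p_exact <- pbinom(observed_six - 1, size = n_throws, prob = 1/6,
                  lower.tail = FALSE)

print(c(simulated = p_sim, exact = p_exact))
```

A small tail probability suggests that the fair-die world is a poor description of the observed data; how small is "small enough" is a matter of convention.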

What is statistics all about?
● Remember that some of statistics is convention: e.g., why are we so interested in the squared differences from the mean?
● Not everything is exact; there is often more than one way to do things.
● The statistical tests we favour might have been different if we'd had a different history of statistical development.
● There's a pragmatic rather than a pure spirit about statistical thinking.
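To see the convention in action, the following sketch computes the spread of a small invented data set in two ways: with squared differences from the mean (the variance) and with absolute differences. The mean absolute deviation is shown only as one possible alternative, not as something the course prescribes.

```r
x <- c(1, 3, 5, 7, 9)   # invented values, for illustration only

dev <- x - mean(x)      # differences from the mean

# Conventional choice: sum the squared differences and divide by n - 1
var_manual <- sum(dev^2) / (length(x) - 1)

# One alternative: average the absolute differences instead
mean_abs_dev <- mean(abs(dev))

print(c(var_manual   = var_manual,
        var_builtin  = var(x),          # should match var_manual
        sd           = sqrt(var_manual),
        mean_abs_dev = mean_abs_dev))
```

Both numbers describe spread; they simply weight large deviations differently.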

Topics we will cover
● The R package
● Probability distributions and how to describe them (measures of central tendency and spread)
● Sampling and the central limit theorem
● The normal distribution, confidence intervals
● Correlation coefficients, R-squared values, and what they mean
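As a preview of how some of these topics look in R, here is a minimal sketch on simulated data. The exponential distribution, the sample sizes, the normal-approximation interval, and the linear model are all assumptions picked for illustration; the lectures may use different examples.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Sampling and the central limit theorem: means of samples drawn from a
# skewed (exponential) distribution are themselves roughly normal.
n_samples    <- 5000
sample_size  <- 50
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# Theory for rate = 1: mean 1, standard error 1 / sqrt(sample_size)
print(c(mean_of_means = mean(sample_means),
        sd_of_means   = sd(sample_means),
        theory_se     = 1 / sqrt(sample_size)))

# Confidence interval: a normal-approximation 95% interval from one sample
one_sample <- rexp(sample_size, rate = 1)
se <- sd(one_sample) / sqrt(sample_size)
ci <- mean(one_sample) + c(-1, 1) * qnorm(0.975) * se
print(ci)

# Correlation and R-squared for a simulated linear relationship
x  <- runif(100)
y  <- 2 * x + rnorm(100, sd = 0.5)
r  <- cor(x, y)
r2 <- summary(lm(y ~ x))$r.squared   # equals r^2 for simple linear regression
print(c(r = r, r_squared = r2))
```

A histogram of the sample means, for example via hist(sample_means), would show the roughly bell-shaped curve predicted by the central limit theorem.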

Assessment
There is a stats coursework worth 15% of the marks of the overall module.
● I will give you a data set (see the course web page for details).
● I will ask you a couple of questions about the data set.
● You are expected to handle the data set using R and write a report that answers the questions.
● The coursework is due on Nov 17 at 12:00 (noon); feedback will be sent by email in December (around Dec 8).
