Practical Course, SS2012
Faculty of Physics and Astronomy, University of Heidelberg


Statistical Methods (UKSta)

Fridays 09:00-13:00, ten slots between 20 April and 13 July 2012
CIP-Pool (Computer room) in the Kirchhoff Institute for Physics (Neuenheimer Feld)

Lecturer: Dr. Coryn Bailer-Jones (Email: calj AT mpia.de)
Assistant: Ronald Läsker

This time the course will be held on ten half days (Fridays 09:00-13:00) during the summer semester, starting on Friday 20 April (the exact dates are listed below). Prior registration is necessary.

Overview
Prerequisites
Course formalities and registration
Syllabus
Textbooks
(Semi-)popular books
Podcasts
R pages

Overview

This is an introductory statistics course for students in physics. It is a computer-based course, in which you will learn not only the principles and methods of statistical analysis, but will also put these into practice using a range of real-world data sets. The aim is to provide a basic understanding of data analysis using statistics and to provide tuition in using standard tools. Derivations will be avoided and mathematical theory will be kept to a minimum, although maths will be used to show the origin of and connections between the statistical methods and with probability. The course will be held in English, but discussions can be held in German.

Prerequisites

No knowledge of statistics beyond Abitur/high-school level is assumed. The course will make exclusive use of the R software package. Prior experience with this is not necessary, but you are strongly advised to get familiar with it before the start of the course. (It is free and can easily be installed on Mac, Linux and Windows computers.) The R web page gives links to manuals, tutorials, FAQs etc.

Course formalities and registration

This course is most appropriate for students entering their third semester (or later). You must be a matriculated member of Heidelberg University to participate. The course counts for 3 LP (Leistungspunkte) and has an estimated workload of 90 hours, of which 50 hours are to be done as homework. For more details see the Physics BSc handbook (Modulhandbuch). (Note that the syllabus below supersedes that in the Modulhandbuch.)

There is no examination. To obtain the LPs you must attend the whole course and complete and submit all of the homework to a sufficiently high standard.

Due to the limited number of computer consoles, the number of participants is limited to 25 (one person per console). In order to participate in the course you must register in advance by email. Please send me the following details:

Surname, Forename, Email address, Matriculation number, Semester you are in, 1-2 line summary of your post-school experience of statistics (e.g. courses taken)

Places will be allocated on a first-come first-served basis and the registration is binding. (If you have to cancel for reasons outside of your control, please inform me as early as possible so that I can inform someone on the waiting list.) I will not acknowledge registrations, so assume that you have a place if you do not hear from me.

Syllabus

Each day will comprise three parts: (1) presentation and discussion of the homework from the previous day, plus discussion of any issues; (2) a lecture on a new topic; (3) computer-based exercises on the new topic. The homework will be a mixture of computer-based and paper-based work. Each of the ten days concerns a different topic. After each lecture, the script/notes will be available. The homework exercises will be provided separately.

These lecture notes are no longer available. They are superseded by those used in the 2013 version of this course.

  1. Introduction and probability (homework)
  2. Statistical models and probability distributions (homework) (notes on the Sally Clark cot death case)
  3. Estimation, errors and uncertainty (homework)
  4. Orthodox hypothesis testing (homework) (supplementary script 10A on hypothesis testing)
  5. Linear models and regression (homework) (supplementary script 10B on categorical regression)
  6. Binomial and Poisson processes (homework)
  7. Likelihood-based (Bayesian) parametric modelling (homework)
  8. Bayesian modelling using Monte Carlo methods for sampling and marginalization (homework)
    (monte_carlo.R) (08_sampling_marginalizing.R)
  9. Nonlinear and nonparametric methods (homework) (09_nonlinear_nonparametric_methods.R)
  10. Use and abuse of statistics

The course will take place on the following dates

  1. 20 April
  2. 27 April
  3. 11 May
  4. 18 May
  5. 25 May
  6. 15 June
  7. 22 June
  8. 29 June
  9. 6 July
  10. 13 July

Textbooks

I recommend that you get hold of an introductory statistics text to use during this course. There are many around, varying in their scope, level, emphasis and quality. The course does not follow a single book, but I provide a summary of a somewhat random sample. The course focuses on the use of statistics in the physical sciences, so many methods used in the social sciences, even basic ones, will not be covered: you may want to take this into account when buying a book. There are several texts which specifically examine the use of R in statistics, which is useful, although these tend to be a bit too recipe-oriented to give a proper level of understanding. Some of these are briefly reviewed on the R web site.

Most of the books listed below can be inspected on amazon.de and several are in the University Library.

Barlow, Statistics
A classic. This is a well-written introduction with some useful mathematical background, simple derivations and good descriptions. It is written for physics students, so it even has a chapter titled "Errors". I can recommend it if you want to go beyond just having recipes (which you should), in particular as it contains derivations which Crawley, Everitt & Hothorn and Dalgaard omit. Like most introductory statistics textbooks, it takes a very orthodox or frequentist approach (probability only appears in chapter 7!), which can make the different topics seem like a set of disconnected techniques. The book also demonstrates a lack of understanding of Bayesian statistics.

Crawley, Statistics. An Introduction using R
This text emphasises statistics for biological and to some extent physical (but not social) sciences. It has a reasonable balance between explaining the methods and demonstrating them in R. While there are examples, there is more of an emphasis on principles and the basic maths than there is in Everitt & Hothorn or Dalgaard, for example. Indeed, the maths is very basic and many methods are not properly explained (the course will go beyond this level). However, it is visually appealing and has the advantage of being relatively cheap. Like most statistics books, it presents statistics in the traditional way (look at the Table of Contents).

Dalgaard, Introductory Statistics with R
An introduction to both R and statistics. The mathematical treatment is limited and it takes a somewhat "recipes"-like approach. As the title suggests, R takes a central role. Includes exercises and answers.

Everitt and Hothorn, A Handbook of Statistical Analyses using R
R takes a very central place, with lots of examples and data sets (and perhaps a few too many screen dumps). As the title suggests, this is a guide to using R for statistics rather than a book from which you can learn statistics. Moreover, it covers several topics which are not typical for an introductory statistics course (and which we won't cover). It is as R-centric as Crawley and Dalgaard but a bit more advanced.

Gregory, Bayesian logical data analysis for the physical sciences
A good introduction to both the principles and practical application of Bayesian methods. One of very few books giving a broad introduction and guide for physical scientists (there are lots more such books for social scientists and specific analytic models). He uses Mathematica to illustrate the methods. If you only look at one book on Bayesian methods, look at this one.

Jaynes, Probability theory
E.T. Jaynes was one of the main proponents of Bayesian inference. This is a rather unconventional book describing numerous elements of Bayesian probability theory and inference, ranging from the basics through practical examples to fundamental philosophical discussions. It is even polemical in places, and is probably not appropriate for a first exposure to Bayesian inference. But it contains some very thought-provoking discussions.

MacKay, Information theory, inference and learning algorithms
Not a traditional statistics book, and not a first book for learning Bayesian inference from, but a great book for learning about inference both in principle and in practice. He has a great didactic style, and this book contains some very illuminating examples. Also look here for a good introduction to MCMC. MacKay and CUP have done us a great service by making the book available online.

Maindonald and Braun, Data Analysis and Graphics using R
This is essentially a handbook for using R for statistical data analysis rather than a book from which to learn statistics. It is similar in approach and coverage to the classic book of Venables & Ripley (see below), in that it also covers what one would call machine learning methods (e.g. trees, discriminant analysis), but at a slightly lower level. It contains very little mathematics. At 26cm x 18cm x 3.5cm, it won't fit in your pocket.

Sivia, Data Analysis. A Bayesian Tutorial
The first edition was an excellent introduction to data analysis from the Bayesian perspective. (A new second edition adds three more chapters.) I recommend it if you really want to understand what statistics is and how it relates to probability theory, rather than just learn a bunch of frequentist recipes. That is, don't look in here for p-values and Neyman-Pearson hypothesis testing. It includes numerous examples which are analytically solvable, but covers less on numerical solutions. It goes well beyond the scope of the course. It does not cover R or other packages.

Sachs, Angewandte Statistik. Methodensammlung mit R
A very detailed and mathematical introduction to statistics. It contains a lot more than you'll need for the course but the level of mathematics is not as high (or as offputting) as first appearances might suggest. R is used to illustrate the statistics (rather than the other way around, as is the case in some other books). With problems and solutions. Available online via the University Library (you can download the whole book as PDF). I've not used this book, but judging from (a) the Foreword and (b) the virtual lack of any reference to Bayesian statistics or Richard Cox or Harold Jeffreys, this is an unashamedly frequentist approach to statistics. You have been warned!

Toutenburg and Heumann, Deskriptive Statistik and Induktive Statistik
This pair of books - in German - gives a detailed introduction to statistics and R from a somewhat mathematical perspective. It goes into more theory and depth than you'll need for this course. Lots of examples and solutions. I've not used it myself.

Venables and Ripley, Modern Applied Statistics with S (MASS)
"S" is essentially just another name for R. This book provides a very good introduction to R and its use for both basic and advanced data analysis. However, it assumes the reader is already reasonably familiar with the techniques, so this is not a book which can be used alone to learn basic statistics. It goes well beyond the course, covering also topics such as GLIMs, neural networks and spatial statistics. The accompanying R package "MASS" contains many functions which will be used in the course.

Verzani, Using R for introductory statistics
Quite R-oriented and rather (too) basic. It's essentially an R guide rather than a statistics text. Available online via the University Library as an e-book.

(Semi-)popular books

Here is a sample of popular or semi-popular books on probability which I have read and which I can recommend to anyone interested in how probability can be used in everyday life. I don't necessarily agree with everything written in these books (but with much!).

Evans, Dylan, Risk Intelligence
A study of how we (should) use simple probability theory in everyday life to help us assess risks and make decisions. Evans' thesis is that many people, regardless of intelligence, have poor risk intelligence, i.e. are not very good at assessing probability, risk, expected gains and losses. This is a very readable and insightful book.

Gigerenzer, Gerd, Reckoning with Risk
A look at how uncertainty and probability are represented and, more often, misrepresented in everyday life: in the media, in law, and especially in medicine. He guides you through interpreting probabilistic information, and how you can use this correctly to make informed decisions. He has some very interesting examples.

Kahneman, Daniel, Thinking, fast and slow
A collection of very interesting insights - and results of experiments and surveys - into how we think about probability and statistics. He looks at how people actually assess information and make decisions. One of the main theses is that our intuitive brain is rather poor (in particular, biased) at probabilistic assessments. Very readable, and much of it is convincing.

Podcasts

More or Less
This is an excellent BBC radio programme - also available as a podcast - on statistical issues in the media. To quote the BBC web site: "Tim Harford investigates numbers in the news. Numbers are used in every area of public debate. But are they always reliable? Tim and the More or Less team try to make sense of the statistics which surround us." Strongly recommended. Some of the stories are also available in written form at the More or Less website.

R pages


Coryn Bailer-Jones, calj at mpia.de
Last updated 24 August 2012