Gianluca Baio's blog: March 2014

Friday 21 March 2014

The sampling frame is a list, but not every list is a sampling frame

Yesterday and today, I spent some time marking the in-course assessment (ICA) for my course (the teaching term is over next week $-$ yay!).

The course is called "Social Statistics" and it's intended to deal with surveys and sampling. However, since I inherited it 3 years ago, I've tried to include more material on missing data and some stuff about clustering too, with a view to teaching some modelling.

For the ICA, this year I decided that I would randomly group the students and have them do a quick survey. Admittedly the time was quite short (only 1 week from assignment) and when they got the ICA we were basically only halfway through the course, so they didn't really know about important stuff (such as more advanced sampling methods, or sample size calculations).

All in all, I was impressed by the creativity of most groups in selecting the topics for their surveys. Some groups did a very good job getting all the parts right $-$ stuff like realising that you can't really extend your inference to a much larger population if you only have a small convenience sample (which may still be a reasonable choice, given time and resource constraints), or going out of their way to find a proper list so that they could use simple random sampling.

A few groups, however, used the list of emails for all the students enrolled in the class (which could be a sampling frame $-$ if the target population were the class!) as if it were a sampling frame for a much larger target population (eg the whole of UCL students).
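
To make the point concrete, here is a minimal sketch in R of simple random sampling from a frame. The email addresses and the sizes are made up for illustration; the only point is that every unit in the frame has the same chance of selection, and units outside the frame have none, so the inference covers the class and nothing larger.

```r
# Hypothetical sampling frame: one (invented) email per student in the class
set.seed(2014)
frame <- sprintf("student%02d@example.org", 1:40)

# Simple random sampling without replacement from the frame
n <- 10
srs <- sample(frame, size = n, replace = FALSE)

# Every unit in the frame has inclusion probability n / N;
# anyone not on the list has inclusion probability 0
inclusion_prob <- n / length(frame)
```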

Thursday 13 March 2014

Canada et al

1. Today was the first day of our course on Bayesian methods in health economics. After my lecture on intro to health economics, Chris gave 2 lectures on Bayesian methods and their implementation in BUGS and then Richard talked about Bayesian analysis of individual-level data (on costs and then both costs & benefits). In between the lectures, we did BUGS-based practicals $-$ so all in all it was quite heavy on the participants. But nobody gave clear signs of imminent crisis (in fact, we've had quite a few interesting questions!)...

2. We're being extremely lucky, weather-wise. We're told that last week it was around -25C (and it still shows: the river Saskatchewan is completely frozen $-$ I mean: solid!), but now it's quite nice and pleasant $-$ sunny and around 10/15C. Of course, Canadians are in mid-August mood and we've seen quite a few kids in shorts. I wouldn't quite go as far as that, but we can't really complain!

3. I'm told of a nice review of BMHE, which has just appeared in Biometrics. The author of the review says

The book seems to be suitable for researchers and practitioners who want to learn and apply statistical methods to health economics. Also it can be a good text for graduate courses in statistical analysis of health economic data. The author tries to keep mathematics at a low level and provides many interesting figures and tables for the readers with weak mathematical / statistical background. It provides step-by-step guidance to practical application of the Bayesian methods by using popular statistical software R and BUGS/JAGS. This would be very attractive to practitioners for they can easily implement Monte Carlo simulation methods necessary for Bayesian inference without fear.

4. Quick update on me getting all emotional for no reason while on an intercontinental flight: nothing to report. I struggled a bit at the end of this episode of How I met your mother, but nothing major $-$ not even a real tear. In any case, to avoid embarrassment, I stopped watching TV.

Monday 10 March 2014

Man at work(-ish)

Perhaps one could argue that the obvious, manly activity to do at the weekend when you're home alone is to put and organise stuff in the garage. Well, I was home alone last weekend and my very own version of this was to arxiv the first paper coming out from our research on the regression discontinuity design (RDD) $-$ I know: probably *not* so manly. I did watch rugby and football, though....

The main points of the paper are these:

1. How and why the RDD can be effectively applied to primary care data. The RDD works when there is some sort of external guideline that decides the allocation to some intervention $-$ drugs are often regulated so that patients with a certain profile should be given them (although, as we discuss, this is often a lot less clear cut...);
2. The implications of including genuine prior information in such an analysis. In our case study (prescription of statins), there's typically a lot of evidence coming from RCTs; and this may be the case in other areas where a recommendation exists to regulate prescriptions.
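
For readers who have not met the RDD before, here is a rough sketch of the basic idea in R, on simulated data. It is a plain (frequentist) sharp design fitted with `lm`, not the Bayesian analysis in the paper, and the risk score, the threshold of 0.2 and the effect size are all invented for the example.

```r
# Simulated data: all numbers below are assumptions for illustration only
set.seed(1)
n <- 2000
cutoff <- 0.2                      # hypothetical guideline threshold
risk <- runif(n, 0, 0.4)           # hypothetical risk score (running variable)
treat <- as.numeric(risk >= cutoff)  # sharp design: the guideline decides treatment

true_effect <- -1                  # assumed jump in the outcome at the threshold
xc <- risk - cutoff                # centre the running variable at the threshold
y <- 4 + 3 * xc + true_effect * treat + rnorm(n, sd = 0.5)

# Separate linear trends on either side of the threshold;
# the coefficient of 'treat' estimates the discontinuity at the cutoff
fit <- lm(y ~ treat + xc + treat:xc)
rdd_estimate <- unname(coef(fit)["treat"])
```

When the guideline is not followed strictly (the "less clear cut" case), treatment no longer jumps from 0 to 1 at the threshold and this simple version is not enough.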

I think the plan is to explore next a few interesting (both methodological and substantive) matters, such as how this can be extended to non-continuous outcomes, or used to identify the "optimal" threshold for prescription, based on available primary care data (in addition to RCT evidence).

The paper can be downloaded here.

Sunday 9 March 2014

Money(proper foot)ball?

Probably the best football computer game ever

This is an interesting (although a bit overused, of late) topic. In some quarters, we statisticians are all akin to "moneyballs" (by the way: I should say I haven't read the book or watched the movie $-$ but that's by design, as I suspect I wouldn't really like either. Also, I've never understood why the Americans call theirs "foot"ball, when it's mostly played using the hands...) and are trying to take over the world by senselessly reducing every possible problem in the world to estimating proportions...

As with any good old decision problem, I think football can be aided, but not necessarily managed, by formal stats. And in any case, for this to happen, one would need to assume rational behaviour on the part of the actors (chairmen, managers, players) $-$ not really their cup of tea, though, rationality...

Monday 3 March 2014

Issue with thinning in R2OpenBUGS vs R2jags

While preparing the practicals for our course at the University of Alberta, I've discovered something kind of interesting. I'm sure this is nothing new and actually people who normally use both OpenBUGS and JAGS have already figured this out.

But since I normally just use JAGS, it took me a bit to see this, so I thought I should post about it...

So: the issue is that when running a Bayesian model based on MCMC (eg Gibbs sampling), often it helps to improve convergence if the chains are "thinned" $-$ basically, instead of saving all the successive iterations of the process, only 1 in every $s$ is stored and this generally reduces autocorrelation.
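
A quick way to see what thinning does is to try it on a toy chain. The sketch below uses an AR(1) series as a stand-in for a strongly autocorrelated MCMC chain (the coefficient 0.9 and the length are arbitrary choices for the example) and keeps 1 draw in every $s=21$.

```r
set.seed(42)

# Toy "chain": an AR(1) process with lag-1 correlation of about 0.9
chain <- as.numeric(arima.sim(model = list(ar = 0.9), n = 10500))

# Thinning: keep only 1 draw in every s
s <- 21
thinned <- chain[seq(1, length(chain), by = s)]

# Lag-1 autocorrelation before and after thinning
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
lag1_full <- lag1(chain)
lag1_thinned <- lag1(thinned)
```

The thinned series has 500 values and a much smaller lag-1 autocorrelation than the full one, at the price of throwing away most of the simulations.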

R2jags (which is the R library to interface R and JAGS) lets you select the total number of iterations you want to run, the number of "burn-in" iterations (which will be discarded) and the thinning factor. So a command
m <- jags(..., n.iter=20000, n.burnin=9500, n.thin=21, n.chains=2)
will generate two chains, each with 20000 iterations, discard the first 9500, which of course leaves 10500 iterations, and then save only one in every 21 of these $-$ a total of 500 iterations per chain.

Conversely, R2OpenBUGS thinks in slightly different terms: as the help (which to be fair I've never bothered reading...) says "The thinning is implemented in the OpenBUGS update phase, so thinned samples are never stored, and they are not counted in n.burnin or n.iter. Setting n.thin=2, doubles the number of iterations OpenBUGS performs, but does not change n.iter or n.burnin".

So the command
m <- bugs(..., n.iter=20000, n.burnin=9500, n.thin=21, n.chains=2)
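will actually save 21000 iterations altogether (10500 per chain): the thinned-out draws are not counted in n.iter or n.burnin, so behind the scenes OpenBUGS performs 21 times as many updates.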
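
The two counting conventions can be written down as a couple of small R functions. These are hypothetical helpers of my own (they are not part of R2jags or R2OpenBUGS, and they do not run any model); they just encode the arithmetic described above, assuming the R2OpenBUGS help text means what it says.

```r
# Hypothetical helpers: NOT functions from R2jags or R2OpenBUGS.
# They only reproduce the counting rules described in the text.

# R2jags convention: n.iter and n.burnin count raw iterations,
# and thinning is applied to what is left after the burn-in
jags_counts <- function(n.iter, n.burnin, n.thin, n.chains) {
  saved <- floor((n.iter - n.burnin) / n.thin)
  list(saved_per_chain = saved,
       saved_total = saved * n.chains,
       updates_per_chain = n.iter)
}

# R2OpenBUGS convention: n.iter and n.burnin count thinned iterations,
# so the sampler performs n.thin times as many updates
bugs_counts <- function(n.iter, n.burnin, n.thin, n.chains) {
  saved <- n.iter - n.burnin
  list(saved_per_chain = saved,
       saved_total = saved * n.chains,
       updates_per_chain = n.iter * n.thin)
}

jags_res <- jags_counts(n.iter = 20000, n.burnin = 9500, n.thin = 21, n.chains = 2)
bugs_res <- bugs_counts(n.iter = 20000, n.burnin = 9500, n.thin = 21, n.chains = 2)
```

With the settings above, the JAGS version keeps 500 draws per chain out of 20000 updates, while the OpenBUGS version keeps 10500 per chain and needs 420000 updates per chain to get them.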

I realised this because the same relatively simple model was taking for ever when run using OpenBUGS and was really quick in JAGS $-$ no kidding!...

Sunday 2 March 2014

Fun with flags

The Scottish independence referendum is approaching (relatively fast), and so are all sorts of related, very important issues, like what the new UK flag would be if Scotland decides to leave.

The Guardian has taken a poll and apparently these two are the preferred options:

I like the comment that option no. 2 looks like the UK had annexed Italy, except that we would never tolerate such a hideous flag $-$ I would like to be insulted at the suggestion that Italians are so vain and only care about looks. But the flag really is ugly, I think...
