Saturday 29 September 2012

The nineties are not cost-effective

After a few errands, this afternoon I took Marta to the station (on her way to meet some friends in London) and got back home to prepare my presentation for next week's talk at a conference in Turin (the last of the 4 to which I've been invited this year $-$ a personal best, so far).

They asked me to talk about Bayesian methods in pharmacoeconomics, which of course is right up my street, so I said yes. I've got lots to draw from and to say, but not much time (I only have 30 minutes, as the invited session is pretty packed; Andrea Manca will be there too, so I think it'll be good). 

I've summarised the main points and, because the audience will probably be made up mostly of clinicians and practitioners, I've tried to avoid formulae as much as possible. I've not finished yet, but so far I'm fairly happy with what I've got, especially on sensitivity analysis, for which I've prepared flashy graphs and animations that should make the two tables on the following slides less boring and more comprehensible.

On the negative side, it took me far too long (and certainly longer than expected), mostly because 10 minutes after starting I had the brilliant idea of putting some music on via YouTube; I don't know why, but the first song I searched for was this:



Great tune! But of course, by some weird mechanism, that completely filled my mind with songs from the 90s/early 00s. And so, while still occasionally modifying the presentation and checking that the LaTeX formatting was OK, I spent a lot of time searching for them (probably even more than actually listening to them!). 

But it gets worse, because as I opened a new video, YouTube kept suggesting others I might like, for example this:



and this 


which in turn made me think of even more songs I thought I had completely forgotten (some of which are quite cheesy, I have to admit).

So, at the end of it, I can conclude with indisputable scientific evidence that the Nineties are indeed not cost-effective. Unless, of course, I use this example in the talk to introduce cost-effectiveness analysis... 

Making babies with Markov models

In the last couple of weeks we've been hard at work extending the HPV model. In the original version, we considered a cohort of 280,000 12-year-old "virtual girls"; we simulated a follow-up of 90 years, during which individuals could exit because of death or move among the clinical states.

Now we're trying to make the cohort dynamic, ie allow new entries, as well as exits. The way we're doing it is by allowing the original women to have babies; when they get old enough, the children will enter the model as well. Thus, effectively, we'll account for two generations.

But that's not easy at all. One big problem is that, by the very definition of a Markov model, once "individuals" coming from different states have been merged into a common one, it is basically impossible to discern where they came from.



One good solution is to use "tunnel" nodes (eg in the graph above patients may die from cancer at different points in time, but to account for the fact that their survival probability differs depending on how long ago they were diagnosed, there are 3 different cancer states). For now, we still have it under control, but with this strategy the number of nodes in the model tends to become very large and thus it can be problematic to estimate the actual transition probabilities.
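
Just to fix ideas, a toy transition matrix with three cancer "tunnel" states might look like the following (all the state names and probabilities here are made up, of course; the real model is rather bigger):

# Toy example: cancer.yr1/yr2/yr3 are tunnel states tracking time since diagnosis,
# so each can carry its own probability of cancer death
states <- c("healthy", "cancer.yr1", "cancer.yr2", "cancer.yr3", "dead")
P <- matrix(0, length(states), length(states), dimnames = list(states, states))
P["healthy",    c("healthy", "cancer.yr1", "dead")] <- c(0.97, 0.02, 0.01)
P["cancer.yr1", c("cancer.yr2", "dead")] <- c(0.80, 0.20)   # mortality highest just after diagnosis
P["cancer.yr2", c("cancer.yr3", "dead")] <- c(0.90, 0.10)
P["cancer.yr3", c("cancer.yr3", "dead")] <- c(0.95, 0.05)   # long-term survivors stay in the last tunnel state
P["dead", "dead"] <- 1
rowSums(P)   # sanity check: each row must sum to 1

You can already see the problem: with just one extra dimension (time since diagnosis), the state space has more than doubled.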

This has got me thinking that it would be nice to build some form of "object-oriented Markov model": something like a set of "modules", each of which contains a smaller Markov model. The main Markov model would then connect these modules, and "individuals" would transition across them (and of course among the nodes that define the inner Markov models within each compartment). I once did something like that using Bayesian networks.

It's still a bit hazy in my head (and in fact, I've not even looked if somebody has already done this!), but it seems like a nice idea.
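
Just to sketch the idea (and this is purely hypothetical, nothing is implemented yet): each "module" could be a small transition matrix, and the overall model could be assembled as a block matrix, with a few extra entries linking an exit state of one module to an entry state of another. Something like:

# Purely illustrative: two hypothetical modules, each a self-contained little Markov model
library(Matrix)
screening <- matrix(c(0.90, 0.10,
                      0.00, 1.00), 2, 2, byrow = TRUE,
                    dimnames = rep(list(c("scr.well", "scr.referred")), 2))
treatment <- matrix(c(0.80, 0.20,
                      0.00, 1.00), 2, 2, byrow = TRUE,
                    dimnames = rep(list(c("trt.active", "trt.dead")), 2))

# Assemble the block-diagonal skeleton, then add the links between modules
P <- as.matrix(bdiag(screening, treatment))
dimnames(P) <- list(c(rownames(screening), rownames(treatment)),
                    c(colnames(screening), colnames(treatment)))
P["scr.referred", "trt.active"] <- 1    # referral feeds into the treatment module...
P["scr.referred", "scr.referred"] <- 0  # ...replacing what used to be an absorbing self-loop
rowSums(P)                              # each row still sums to 1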

Tuesday 25 September 2012

Twitter Economics

I think I've spent far too much time trying to find a suitable picture to go with this post (but I have to say I'm very happy with the final output!). Anyway: this is potentially quite interesting. 

The Academic Health Economists' Blog are launching the Health Economics Twitter Journal Club. The idea is quite simple, ie to arrange a virtual meeting to discuss a relevant paper over Twitter. I suppose this should make comments and discussion quick, so that the conversation flows even if the participants are not physically in the same place.

As suggested at the webpage linked above, the inaugural meeting is this coming Monday (October 1st) and this time the discussion will be on this paper, which sounds quite interesting too.

The only two drawbacks (as far as I'm concerned) are that a) the meetings are at 8pm London (UK) time (which of course is done on purpose so that most people can participate, but still bothers me because it clashes with The Big Bang Theory); and b) so far I've made a point of not having a Twitter account, since in general I find it a bit pointless and by far the least interesting of the social media. 

True to my Tuscan nature, I tend to strongly support something and then despise its competitor(s), eg Linux (good) vs Apple (evil). So, because I already have a Facebook account, I really shouldn't have a Twitter one... But I might change my mind just for this one.

Sunday 23 September 2012

Mixed treatment counterfactual Obama

While I was working on something completely unrelated to this, last Friday I came across a paper by Simon Jackman; I've known of his work for a few years (for example, I think his book is quite good) and I quite like the fact that he's applying high-level Bayesian stats to political science, something I'm interested in too.

This paper (number 1 in the list here) is quite thought-provoking, I thought, because it mixes a few topics I've been thinking about lately. In a nutshell, they are trying to estimate the "Obama effect" in the 2008 US election. The starting question is "Would the outcome of the 2008 U.S. presidential election have been different if Barack Obama had not been the Democratic nominee?" Now, this obviously has a counterfactual flavour; of course, Obama was the Democratic nominee and thus (if, like me, you're that kind of person) you may think that the question does not even make much sense in the first place. 

But this is slightly different from the usual causal analysis (typically based on counterfactual arguments); first of all, the "Obama effect" (mainly defined in terms of the president's race and its impact on the US electorate) matters both for Obama himself, in the next election, and for similar situations in other elections. In addition, the analysis is based on a standard hierarchical model (ie it does not use potential outcomes) and the authors are extremely clear and upfront about the assumptions and limitations of their causal conclusions, which I think is really good.

The study is based on a mixture of external data and specific surveys in which the authors tried to assess the sample's preference in different, hypothetical head-to-head comparisons. For example, before the 2008 election, they repeatedly surveyed US voters asking who they would vote for if the next election involved different combinations of Democratic and Republican candidates (Bill and Hillary Clinton, Obama, Edwards and Gore vs Giuliani, Bush, Huckabee and Romney).

Effectively the design is based on mixed treatment comparisons, as not all the samples were given the choice among all the possible head-to-head pairs. This is quite interesting and reasonably straightforward to deal with using Bayesian hierarchical models $-$ if some exchangeability assumptions hold, then different comparisons can inform each other, so that all the possible comparisons can be estimated (of course comparisons informed by less hard evidence will be associated with larger uncertainty, but that's entirely reasonable).
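
This is not the model in the paper, but a toy sketch might help fix ideas about how the borrowing of strength works: each candidate gets a latent, exchangeable "strength" on the logit scale; each survey only informs one head-to-head pair, but because all the pairs share the same parameters, even a comparison that was never observed directly gets a posterior distribution. All the names and numbers below are made up (and I'm using JAGS via rjags):

library(rjags)

model.string <- "
model {
  for (i in 1:S) {
    y[i] ~ dbin(p[i], n[i])                     # votes for candidate a[i] out of n[i] respondents
    logit(p[i]) <- theta[a[i]] - theta[b[i]]    # each head-to-head depends only on the two strengths
  }
  for (k in 1:K) {
    theta[k] ~ dnorm(0, 0.1)                    # exchangeable candidate effects
  }
}"

# Fake data: 3 candidates, only the pairs (1,2) and (2,3) are surveyed;
# the model still produces a posterior for the unobserved (1,3) comparison
data.list <- list(S = 2, K = 3, a = c(1, 2), b = c(2, 3),
                  y = c(550, 480), n = c(1000, 1000))
jm <- jags.model(textConnection(model.string), data = data.list, quiet = TRUE)
post <- coda.samples(jm, "theta", n.iter = 5000)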

Wednesday 19 September 2012

Twinned

As of yesterday, I think, this blog is officially "twinned" with R-bloggers (this shouldn't be too surprising, seeing that they have been in my blogroll for a while).

I like the concept of R-bloggers, which effectively acts as a repository of posts, tutorials or comments about R. Of course, because it sort of covers all the possible areas of interest in which R could be applied, there is a lot of stuff that may not be directly relevant to all visitors. But given the very large number of contributors, chances are you'll find useful tips and tricks.


Monday 17 September 2012

INLA functions (yet again)

This links back to previous posts here and here. Earlier today, I had a quick chat with Michela (by email, actually) on this topic. In particular, she was trying to use the function I've written to compute summaries from the posterior distribution of the standard deviation (rather than the precision, given by INLA by default) on an SPDE (Stochastic Partial Differential Equations $-$ some description here) model.

As it turns out, in such a model the element $marginals.hyperpar contains, together with the precision for the structured effects, two other terms which are specific to SPDE and, more importantly, can also be negative (or perhaps are always negative? I'm not quite sure about this). Thus, if you try to apply the function inla.contrib.sd to an SPDE model, it basically starts yelling at you for asking it to compute the square root of a negative number.

I have to say I'm no expert on SPDE (Marta, Michela and I discuss it too in our soon-to-come-out review paper on INLA, but Michela took care of this part); in fact, I hadn't even thought about it when writing the function, which I did to deal with my own more or less standard hierarchical model. However, I'll try to fix this and maybe make inla.contrib.sd more general.
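
My first guess at a fix (just a sketch, and I'm assuming the genuine precisions can be picked out by the names INLA assigns in marginals.hyperpar) would be to filter the hyperparameters before applying the transformation, something like

# m is a fitted inla object; keep only the terms that really are precisions,
# dropping the SPDE-specific ones that can be negative
hyp <- m$marginals.hyperpar
prec.only <- hyp[grepl("Precision", names(hyp))]

and then compute the standard deviations from prec.only alone.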

Thursday 13 September 2012

BCEA examples

I've prepared a document (which I've put on the website here, together with some scripts at this page) which I think is helpful if you're trying to work out BCEA. I had never really thought of this, but I believe that when you write academic software (or rather a library, in this case), most of the time it grows out of a personal need. For example, I wrote BCEA while preparing a paper and, as I went along, I realised that it might be helpful in other circumstances too. But in doing so, you don't (or at least I didn't) immediately realise all the complications associated with making the software usable by other people. In this case what struck me is that it's probably not enough to be a statistician to run BCEA. 

Well, of course it helps a lot (and because the functions are not that complicated, it wouldn't take you long to figure them out!). But because the subject matter is quite specific, you have to know what all the different functions are actually for before you can make sense of them. I realised this when I started thinking about a possible journal to which to submit a paper discussing how BCEA works and what it does: on the one hand, a stats journal is probably not interested, because the stats behind it is not ground-breaking; on the other hand, a health economics journal may not be interested because computer programs are not their main business... So we'll have to figure something clever out!

Anyway, for the moment what I've done is to write a mini-manual describing one example in detail. Specifically, I give some description of the actual model behind the economic evaluation (which to some extent one might even skip, if only really interested in learning how to use the package) and then a thorough discussion of how you work with BCEA (kind of assuming that you've done the hard part $-$ ie developing the Bayesian model). However, I've also put on the website the files with the JAGS code and the R scripts to run it, so that, in theory, one can actually go through the whole process (including the final economic analysis).
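
For the impatient, the BCEA part of the workflow boils down to something like the sketch below; here e and c are just made-up simulations, standing in for the posterior draws of effectiveness and cost that would come out of the Bayesian model.

library(BCEA)

# Hypothetical posterior simulations: 1000 draws of effectiveness (eg QALYs) and cost
# for two interventions (one per column), as would be produced by the JAGS model
set.seed(1)
e <- cbind(rnorm(1000, 0.80, 0.05), rnorm(1000, 0.82, 0.05))
c <- cbind(rnorm(1000, 500, 50), rnorm(1000, 1000, 100))

he <- bcea(e, c, ref = 2, interventions = c("Status quo", "New treatment"))
summary(he)        # expected incremental benefit, CEAC, EVPI
ceplane.plot(he)   # cost-effectiveness plane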

Wednesday 12 September 2012

League (kitchen) table

Yesterday we met our friend Paolo after work. We hadn't seen him for a while, so it was nice to catch up. We were in Soho and we ended up having a bite in this little place (I know: the website is not really helpful at the moment, but at least it gives the address...).

It is a very typical "vineria" (wine bar), the kind of place there are plenty of in Florence, selling amazing (if slightly pricey) prosciutto, cheese and typical food from Tuscany, and of course all sorts of nice wines. The setting is also nice and very authentic, with "designer-ish wood tables", as TimeOut calls them.

Speaking of tables, Mark posted a link to the QS research producing a new ranking of world universities. [Of course, nobody really believes in these league tables, even though they are always widely reported in the media and institutional newsletters (to point out their flaws if you're coming out badly, or to unassumingly make the point if you're a winner)]. 

In this league table, UCL comes fourth (up from last year's seventh, and second in Europe). Now, to go back to Florence and our proverbial campanilismo (parochial local pride), the good news is that we are better than Imperial (personal rival, since Marta works there) and way better than King's (official rival), who are not even in the top 10. So I suppose this is the table to be taken seriously...

Tuesday 11 September 2012

ISBA/BioPharm

ISBA's new Section on Biostatistics and Pharmaceutical Statistics has just been established. It's quite an interesting idea and aims at bringing together statisticians from academia and industry.

I believe that so far there are only 20 members (including, of course, yours truly), but some big names such as Don Berry, Peter Muller and David Heckerman are involved. "Bayes on drugs", as Christian Robert put it.

Monday 10 September 2012

Domino effect

I had quickly read something about this in the papers earlier today, and tonight I watched the episode of Dispatches presenting the "school dinner scandal". Several points raised by the documentary are really, really interesting.

Perhaps the most striking is the story about the UK Secretary of State for Education Michael Gove. (In my very personal opinion, by just looking at the guy or listening to him talk, you can immediately see that he is a massive, massive Dickensian!). 

Anyway, one of his policies involved exempting Academy Schools from the nutritional standards (introduced by the last government, as Dispatches reports), promising that standards would not deteriorate. Well, the evidence seems to point decisively in the opposite direction, with previously banned products such as junk/fast food and sugar-filled fizzy drinks being reintroduced in school dinners. 

On a separate note, it seems that fast food chains, eg Domino's Pizza, are increasingly seeking to open branches around schools, "luring" kids into getting their food from them instead of from the supposedly healthy school canteens. (Don't even get me started on this: I can't stress enough my utter disgust at such fake pizzas. On the other hand, of course Domino's are not the problem $-$ but they're certainly part of it, in my opinion). As it turns out, the note is not so separate after all, and this links back nicely to the Secretary of State, who seems to have received donations of almost £50,000 from shareholders in Domino's.

Last week I went to a seminar where the researcher presented preliminary results from Ten Top Tips (TTT), a study whose aim is to encourage lifestyle changes so that people take in fewer calories through food and burn more calories through activity. Of course the problem is about food culture, particularly in a country like Britain. Perhaps kids should be taught better, so as to avoid them being "lured" into fast food places, and interventions like TTT may just do the job (most certainly on a small scale, but still...).

But maybe this is not the only kind of intervention; for example, New Zealand and Canada are planning (or have already put in place) bans on junk food advertising, seemingly with positive effects. After all, similar strategies have been applied to smoking, which we know causes all sorts of problems: why shouldn't we try this for other things that we also know cause all sorts of problems? 

INLA functions (continued)

I have polished up one of the two functions I've thought of implementing for INLA and it's now available in the development version of the R package. So if you've got INLA installed in your R version (see how you can do it here, if you don't), you can update it to the development version by typing

inla.upgrade(testing=TRUE)

This will add the new function inla.contrib.sd which can be used to express the uncertainty in the structured ("random") effects of your INLA model in terms of the standard deviation, instead of the precision (which INLA gives by default).

In the help, I consider a simple example: I simulate $N=12$ Binomial trials. For each, first I simulate the sample size to be a number between 80 and 100 (subjects)

n=12
Ntrials = sample(c(80:100), size=n, replace=TRUE)

Then I simulate the actual observations $y$ as Binomial variables depending on the sample size Ntrials and a probability prob, which in turn depends on a variable $\eta\sim\mbox{Normal}(0,0.5)$ (with 0.5 the standard deviation, matching R's parameterisation), through $$\mbox{prob} = \frac{\exp(\eta)}{1+\exp(\eta)}$$
which I do with the code

eta = rnorm(n,0,0.5)
prob = exp(eta)/(1 + exp(eta))
y = rbinom(n, size=Ntrials, prob = prob)

At this point, I define a very simple hierarchical model where each trial represents a clustering unit (ie I assume a trial-specific effect). You do this in INLA by using the command

library(INLA)   # assuming INLA is installed, as described above
data=data.frame(y=y,z=1:n)
formula=y~f(z,model="iid")
m=inla(formula,data=data,family="binomial",Ntrials=Ntrials)
summary(m)

This produces an INLA object m and the standard summary is
Call:
"inla(formula = formula, family = \"binomial\", data = data, Ntrials = Ntrials)"

Time used:
 Pre-processing    Running inla Post-processing           Total 
     0.20580840      0.02721024      0.78909874      1.02211738 

Fixed effects:
                 mean        sd 0.025quant  0.5quant 0.975quant          kld
(Intercept) 0.1815634 0.1606374 -0.1371095 0.1815601  0.5003695 5.379434e-05

Random effects:
Name  Model   Max KLD 
z   IID model 

Model hyperparameters:
                mean   sd     0.025quant 0.5quant 0.975quant
Precision for z  4.506  2.300  1.508      4.032   10.290    

Expected number of effective parameters(std dev): 10.16(0.5718)
Number of equivalent replicates : 1.181 

Marginal Likelihood:  -56.18 

Now, while the summary of the posterior distribution for the precision of the structured effect $z$ is of course just as informative, it is in general (more) difficult to interpret, because it is on the scale of the inverse variance (ie $1/\mbox{sd}^2$). On the other hand, you can't just take the reciprocal of the square-rooted summaries to obtain information about the posterior distribution of the standard deviation, because the transformation is not linear and thus it does not carry over directly to the moments (the quantiles also swap order, since the transformation is decreasing).

One easy way out is to sample from the posterior marginal of the precision, transform the draws into draws from the resulting distribution for $\sigma = \frac{1}{\sqrt{\mbox{precision}}}$, and then summarise that posterior marginal distribution. That's what inla.contrib.sd does.
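
Just to show the idea, rolling it by hand would look something like the following (I'm using INLA's inla.rmarginal to draw from a posterior marginal, and assuming the marginal is stored under the name shown in the summary; the packaged function just wraps these steps, plus some housekeeping):

prec.marg <- m$marginals.hyperpar[["Precision for z"]]   # posterior marginal of the precision
tau <- inla.rmarginal(1000, prec.marg)                   # 1000 draws from it
sigma <- 1/sqrt(tau)                                     # transform each draw to a standard deviation
c(mean = mean(sigma), sd = sd(sigma), quantile(sigma, c(0.025, 0.975)))

In practice, of course, you only need to call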

s <- inla.contrib.sd(m)

[You need to specify the INLA model, m in this case, and you can also input the number of draws you require from the posteriors $-$ the default is nsamples=1000].
The resulting object s contains two elements: 

s$hyper

returns a table

              mean       sd      2.5%     97.5%
sd for z 0.5096883 0.125651 0.3132221 0.7948447

with the summary on the standard deviation scale. The other element s$samples contains the simulated values from the posterior, which can be used for example to draw a histogram of the distribution

hist(s$samples,xlab="standard deviation for z",main="")

which in this case produces the following graph.


Sunday 9 September 2012

Run keeping to keep running

To keep up with the Olympic spirit, and at Marta's insistence, I think I've started doing some running. I've gone twice in a week: last Saturday we went to Kent to pick up some stuff we bought off Ebay (in the process we stopped by Tunbridge Wells to pay our respects to the Reverend, then had some fish & chips in Rye) and, when we got back home, went for a 35-minute run. 

As it turns out (shockingly!), fish & chips is not a good idea before running; in addition, I pulled a muscle about 100 meters after I started, which wasn't really pleasant. This morning I went again; my leg didn't hurt so much and I managed a 4.5km run in which, according to the fancy app I downloaded on my phone, I burned 300 calories. 

I have to say running has never been a favourite activity of mine $-$ although when I was younger I used to do a lot of it when training with my football team (I'm afraid that some of my former team mates would probably disagree, as running wasn't the biggest part of my game...). But I'm intrigued by the technology that allows you to record virtually everything about every outing. 

Being the data-freak I am, that's really the main motivation for me to keep this up!

Wednesday 5 September 2012

Olympic graduation

I had a very intense day yesterday. First, I went to the graduation ceremony. The main event was in Bedford Square Gardens 



(that's the best pic I could find, and it doesn't really do it justice...). The cool thing was that, while students, parents and friends had to take their seats in the main room, we academics were directed to the Green Room, which featured free drinks. It felt a bit like being a guest on Jonathan Ross's show (except people had better hairdos and nobody cheered quite so much when we got into the main room).

Then they dressed us up; I didn't take a picture of myself, but they gave me a University of London PhD gown, something like these:



(luckily, I was wearing a pink-ish shirt, which was quite colour-coordinated). As soon as they dressed me, Tom, my head of department, who was also there, came up to me and asked if I felt ridiculous enough. My answer was no, since I hadn't put my ridiculous hat on yet... When we all eventually went fully ridiculous (ie hats on), we entered the main room; that's the "academic procession", led in by a trumpet playing a medieval tune $-$ a bit like the Palio.

The actual ceremony basically consists of the students marching towards the podium one by one, giving a piece of paper with their name on it to the Dean, who then announces it; they then proceed to shake hands with the Provost (in our case it was the Vice-Provost), while everybody else applauds. Occasionally, a student would be a bit eccentric and blow a kiss to the audience, or something. This goes on for about 2 hours.

Slightly boring as it was (but I have to say it wasn't bad at all!), the very cool part was during the Dean's speech, in which he mentioned our research on HPV!
[To be perfectly honest, he had asked each department in the faculty to give him some research topics he could mention, so in a way we prepped him... But still, that was pretty cool!]

When the ceremony was over, I rushed off to the Olympic Park. In the middle of the Olympics (when Marta was constantly high on the Olympic spirit and could not stop watching, regardless of the event that was actually on), we had decided to try and get tickets for the Paralympics too. (We did get them, but of course I forgot that I'd already said yes to the graduation ceremony, which was on at more or less the same time).

I nearly ran all the way to Stratford and got to the stadium at about 9.20, so I only stayed for about an hour; but it was enough to enjoy it a lot. The venue is really awesome



(as usual, the photo I took is not wonderful, but I've already excused myself for not being good at this).

It was amazing when David Weir won the gold in the 1500m. The whole stadium went mental $-$ you'd think by now people would be over this, given Team GB's amazing results at the Olympics and at the Paralympics so far; but, no: they are all extremely excited! It's just a shame that they didn't do the ceremony for this event.

The only downside to yesterday was that we got home really late and I was really knackered. But I'd say it was well worth it!

Monday 3 September 2012

Stan

I don't mean the weirdo who locks his girlfriend in the boot of his car and drives into a river, after having dyed his hair so he looks like that other weirdo Eminem (who, I have to admit, for some reason I kind of liked, back in the day).

I mean Stan, the new software for Bayesian analysis developed by Andrew Gelman et al. The announcement of its 1.0 release has been widely discussed in several fora (including of course Gelman's own blog). Predictably, I liked Martyn Plummer's discussion.

The whole point of Stan is to be particularly efficient for problems involving large hierarchical models, where (possibly high) correlation among parameters typically creates problems for Gibbs sampling, leading to excruciatingly long compilation and running time and often to non-convergence. 

To solve this problem, Stan uses Hamiltonian Monte Carlo (HMC), which I have to admit I don't know very well (but plan to read about in the near future). I believe that Mark has been working on this a lot. As Martyn puts it, HMC works by using a very clever blocking strategy, effectively updating all the parameters at once, which makes it so quick.

I've tried to play around with a couple of very easy examples and Stan does look impressive. What I want to do next is to translate one of my own models (originally run in JAGS) to see how things change and whether I could then do everything I normally do after I've fitted my models (eg performing a health economic analysis with BCEA). I think a nice example would be the Eurovision model, which is a bit slow using standard Gibbs sampling.
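
Just to give a flavour of what calling Stan from R looks like, here's a toy Normal model (not one of my actual JAGS models, and it assumes the rstan package is installed):

library(rstan)

model.code <- "
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);   // vectorised sampling statement
}"

y <- rnorm(50, 1, 2)   # some fake data
fit <- stan(model_code = model.code, data = list(N = length(y), y = y),
            iter = 1000, chains = 2)
print(fit)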

All in all, I think it's really cool that we now have so many pieces of software that allow us to do Bayesian statistics. I was talking with Marta the other day and we thought that it's really exciting that you can switch from JAGS to INLA (and now to Stan) to deal with all the problems you have in real world applications. In fact, I think I should find a way to at least mention all these in my teaching, so that students are aware of all the possibilities.

Old habits...

While riding into work this morning, I was at a traffic light in Fulham, when I noticed the curious sign on the number plate of the scooter in front of me.


I think that if you click on the picture and it zooms in, you can distinctly see where that scooter comes from. 

With that and the fact that it was sunny and reasonably warm, for a moment it felt like I was on my way to work from Antella to Careggi, once again...