## Wednesday, 31 December 2014

### New Year's fireworks

I have to thank Xi'an for this: as he rightly guesses in his comments to my original post (earlier this year $-$ well, for a few more hours still, at least!), my spam filter has worked a treat and if it weren't for him, I would have missed this incredible spectacle of pre-New Year's Eve fireworks by Vincent Granville.

In his blog, Granville takes on evil statisticians who constantly talk down to data scientists. He seems to have a particular grudge against Andrew Gelman, who, Granville says, is our "leader and only influencer".

What I found particularly amusing (and I needed this, as I'm still fighting off a nasty flu, so thank you very much, Xi'an!) is the bit where VG boasts about not even reading Gelman's publications, as they "are disseminated to a very small audience in obscure journals that pretty much no mainstream people read".

Now: I know that Google Scholar cannot be taken at face value and there are better ways of measuring how influential one's research or publications are. But you can see Gelman's numbers here. 14,000 citations on one book are sort-of suggesting a little less obscurity than that, I'd say. Sadly, Google Scholar can shed no light on how "mainstream" the people citing him are...

Happy new year!!

## Tuesday, 16 December 2014

### Lazy(?)

It's nearly the Christmas break and as I was writing the previous post (on our Workshop on cost-effectiveness thresholds), I just noticed the post-counter in the blog archive. While the decrease in the number of posts from 2012 to 2013 was minimal (98 to 90), this year I seem to have been fairly lazy (at posting, that is...) $-$ unless I frantically manage to write about 20 posts from now to the end of the year, of course, which I don't quite see happening...

I guess most of the "missing posts" (ie those I haven't written) are probably those about actually doing some stats $-$ which are fun to do, but also take time. For instance, the other day I found some data on Italian MPs' attendance which I thought would be interesting to analyse, but then didn't have the time to actually do it. So, new year's resolution (slightly in advance): try and make some time for this.

By the way: while the one in the picture is not my cat, that's exactly what he's been doing pretty much for the whole day...

### NICE and the cost-effectiveness thresholds: Can good intentions compensate for bad practice?

That's the title of the workshop we held at UCL yesterday (I'd mentioned it in a previous post). I think it went remarkably well (OK $-$ as I'm the organiser, I may be over-enthusiastic, but I really think it was a very good day!). Despite the fact that we purposely limited the advertising, as we wanted to keep it simple to start with, we had a good turnout (about 30 people!).

All the speakers have agreed to make their talks available (which are here) and we've agreed that I'd try and summarise some of the main discussion points coming out of the presentations, so that we can perhaps move this forward, somehow. Here's my attempt (any inconsistency is obviously due to my poor recollection of the events!):
1. Matt gave a very interesting talk focussing on the international comparison (of 5 main countries); he started by discussing the different approaches to defining the cost-effectiveness threshold (a topic also reprised by James in the second talk). Then, he moved on to give some evidence of how, in general, agencies (with the notable exception of NICE) tend to resist a clear definition of what the threshold is. He presented some interesting work that has tried to elicit the underlying value of the threshold by analysing a set of past decisions in the Australian setting. What I found interesting about the Dutch case is the idea that the threshold can be set as a function of the disease severity, which made me think of potential links to my talk on risk aversion (see later). One of the interesting points of his discussion of NICE's case is the idea of a range of thresholds (rather than an absolute value). We had a little less time to discuss France and Japan, which are both interesting cases anyway (France because the formal use of cost-effectiveness methods is newly established; Japan because of the mix of private and public funding as well as the recent move towards formal inclusion of cost-effectiveness considerations, mainly driven by the need to cap health expenditure). Probably, in the end, the main message is to wonder whether strict adherence to a very specific guidance (such as NICE's) is a good idea. As I was listening to the talk, I also thought that in a more expanded version of this, it would perhaps be interesting to look at North America.
2. James's talk concentrated on different ways of determining the threshold, effectively (and quite amusingly) pointing out the many flaws in virtually all of these (which in itself is evidence of how complex this problem is!). I found the discussion of the "historical precedent" method fascinating and the fact that this is actually used in some countries is kind of strange. Then he gave a very clear account of the "budget exhaustion" method, linking back to some applied examples (eg Netherlands and Ireland), showing the incredible range of variability present in many cases. I found his comments on why it may not be ideal to have a range of thresholds very sensible and while I was listening to the talk I kind of changed my mind about that 2 or 3 times... James also linked to the Claxton et al paper on value based assessment and its potential impact in setting some plausible values. I think this is a very important piece of work that we ought to consider very carefully (if we are to discuss the problem in general terms). Finally, I think another very good point that James made is that, in fact, this whole procedure is dynamic $-$ setting a threshold based on the current budget and then ranking interventions accordingly may need continuous revisions, which in turn may also lead to modifications of the threshold itself, adding to the complexity of the problem.
3. I gave arguably the most confused talk of the lot. The first part discussed the meaning of PSA (probabilistic sensitivity analysis) and which quantities should be involved (and why). This, I feel, is something that I have thought about quite a lot and I think I have a clear idea of what the problem is and how it should be tackled. I was trying to lead this back to the main issue of selecting the threshold, but I think Chris Jackson's comment that, as statisticians, we kind of take the threshold for granted sums up nicely the fact that it is difficult to embed this aspect in the statistical model. In the second part of my talk, I tried to (still confusingly) address the problem of risk aversion and its implications for PSA in general and the threshold in particular. I presented a couple of potential utility functions that could be used to include a risk aversion parameter, together with a brief description of their characteristics. I liked James' comment that, while it is difficult to think about, it would be possible to elicit reasonable values of the risk aversion parameter, which in turn would make the analysis more applicable. While I was talking (and relating to the previous presentations) I also thought that maybe risk-propensity is just as important (eg in cases where a disease is perceived to be so important as to be granted some sort of special treatment). Also, Matt asked the very fundamental question of whether decision-makers should be risk averse in the first place $-$ I think nobody knew the answer to that one...
4. Finally, Mike presented his model addressing the impact of the several assumptions underlying the standard definition of the threshold. The model attempts to transform the threshold into a more complex object, embedding the ideas of "value" and "information" available to the (different) decision-makers. The model is quite complex, but Mike explained it brilliantly (one of my PhD students came to me after the talk still shaking with excitement!). The idea is to try and formally include several aspects that characterise the fixed assumptions underlying the definition of the threshold (eg the divisibility of the technologies involved, the level of information available to the decision-makers and, crucially, the fact that the process is driven by sequential decisions made, in general, by a set of actors, rather than the vaguely defined "decision-maker"). Model complexity aside (which I quite liked), the results are also very interesting and show that, in fact, the straight line in the cost-effectiveness plane can be thought of as a special case of a more general threshold which can assume very different shapes (including kinks and steps). The very interesting point made by Mike is that if you start accounting for value and information, then you (read: NICE) may find yourself in the situation where it may be cost-effective to replace an existing technology with something that, while not cost-effective in the conventional sense, is still a better option. This is kind of intuitive, but the model actually formalises the intuition and produces thresholds that have suitable shapes to accommodate this situation.
As I said, I don't really know where we go from here. But it was a very interesting start to this discussion!
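
To make the risk-aversion idea from the third talk a little more concrete, here is a minimal sketch. It assumes an exponential (constant absolute risk aversion) utility, which is a standard textbook choice and not necessarily one of the functions presented at the workshop; the function name, the draws and the value of the risk-aversion parameter `r` are all made up for illustration.

```python
import math


def certainty_equivalent(net_benefits, r):
    """Certainty equivalent of uncertain net benefits under the (assumed)
    exponential utility u(x) = (1 - exp(-r * x)) / r, with risk aversion r."""
    if r == 0:
        # Risk-neutral limit: the certainty equivalent is the plain mean
        return sum(net_benefits) / len(net_benefits)
    mean_exp = sum(math.exp(-r * x) for x in net_benefits) / len(net_benefits)
    return -math.log(mean_exp) / r


# Hypothetical PSA draws of incremental net benefit (thousands of pounds)
draws = [-2.0, 0.5, 1.0, 3.0, 6.5]

risk_neutral = certainty_equivalent(draws, r=0.0)
risk_averse = certainty_equivalent(draws, r=0.5)

print(f"risk neutral: {risk_neutral:.2f}")
print(f"risk averse:  {risk_averse:.2f}")
```

With `r > 0` the certainty equivalent falls below the expected net benefit, so a risk-averse decision-maker effectively demands more than the risk-neutral rule would. How to elicit a sensible `r` is exactly the open question raised in the discussion.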

## Sunday, 7 December 2014

### The smartest guys in the room

I've just finished reading this book, which tells the story of Enron, the American power company which rose to incredible fame and generated ridiculous amounts of money for its shareholders $-$ mostly by cooking its books, through "creative accounting".

I found the book quite interesting, if at times slightly difficult and technical (but then again, I suppose that's the nature of this story, which spans nearly 2 decades of American corporate finance) $-$ it took me nearly 2 months to read it all!

Reading the book, you are obviously drawn to physically hate the protagonists and their greed $-$ at times it really feels like Enron (and the like) are all that is wrong with the world. But equally, I couldn't help but admire some of their business ideas, which most of the time were way ahead of their competitors.

I had got into the Enron story when I went to see this play a few years ago $-$ I vaguely remember the story but evidently, at the time I hadn't registered it for all its implications (which kind of annoys me, right now). Also, the story got me thinking about how most of our students are still in awe of a career in finance $-$ every time I tell one of my academic tutees about the many other possibilities for a statistician in areas other than that, they all look so surprised that such things even exist...

## Friday, 28 November 2014

As part of the newly established Statistics/Health Economics seminars that our group is now organising at UCL, we are preparing a very exciting event for December 15th (so basically just a couple of weeks away).

A few of us got to talk about several general issues and we came up with the idea of a one-day workshop exploring issues around the setting of cost-effectiveness thresholds in economic evaluations. One thing led to another and we decided to try and ambush Mike Paulden (who's done quite a bit of work on this) on his way back to England for the Christmas holiday (I would like to take full credit for this, but I think it was James who suggested it) $-$ thus the probably unusual timing for this!

As if this wasn't enough, we also took advantage of Chris McCabe's knack for finding catchy titles and sort-of asked him to suggest one $-$ I think the end result was brilliant, so we named the workshop NICE and the cost-effectiveness thresholds: Can good intentions compensate for bad practice?

The semi-final programme is available on the website $-$ but the plan is to keep it as informal as possible, with lots of time for discussion during and after the talks. In addition, I will also talk about how including risk-aversion in utility functions may be linked to issues around the choice of cost-effectiveness threshold (if at all!). My talk will probably be (much) less "mature" than everybody else's $-$ I have been thinking about this for some time, but never had the time to fully clarify my thoughts... Hopefully I'll be forced to produce something vaguely reasonable in the next few days...

I should say that participation is free, but the space is limited, so if you are reading this and think you may be interested, please drop me an email, so we know what to expect!

### Bayes Pharma 2015 - call for abstracts

The organisation of the next Bayes Pharma conference is in full swing. We've confirmed the invited speakers and finalised quite a few of the details too. We've now opened the call for abstracts.

A flyer with some extra info is available from here. Finally, registration is open from here.

## Tuesday, 18 November 2014

### Secretary

Second time lucky, I've just been elected Secretary of the International Society for Bayesian Analysis (ISBA) Section on Biostatistics and Pharmaceutical Statistics.

The aim of this specialised section of ISBA is to help network and federate under a common well-known "brand" all the initiatives to spread Bayesian methods and ideas, to solve problems in Biostatistics and its applications to the pharmaceutical industry.

I'm actually excited at the opportunity, not least because of the incredible group of officers. I think one of the tasks will be to make the Bayesian Biostatistics conference (which is usually held at MD Anderson in Houston) occasionally travel to other venues and increase the visibility and membership of the section. Hopefully, we'll be able to throw Bayes Pharma in the mix as well!

### Another job

We have another job available in the Department of Statistical Science at UCL. This will be a joint post between the department and University College Hospital (we have strong links with the Joint Research Office and do collaborate with many clinicians on their applied work).

The job will be a mixture of various health research studies and clinical trials conducted at UCL and the associated NHS Trusts. I think it is interesting that the successful candidate will be able to work on applied projects as well as on some more methodological ones.

Lots of information (including links to apply) on the UCL website. Deadline for the applications is January 4th $-$ we expect to interview shortly after.

## Friday, 14 November 2014

### Best job ever

The job advert for the postdoc position in our MRC-funded project on the Regression Discontinuity Design is finally out.

Aidan has done a fantastic job in his little over a year in the position, but he's now moved to a lectureship in our department and so we need to find a suitable replacement. In fact, the new post has been extended and will be jointly funded by the project and the UCL department of Primary Care and Population Health $-$ who are collaborators on the RDD anyway.

The project is doing well and we do have a couple of interesting papers out $-$ here and here, for example. We're also currently working on some more extensions of the method, as well as the actual applications to the THIN dataset.

As they formally say, "Informal enquiries regarding the vacancy may be addressed to Dr Gianluca Baio, email: g.baio@ucl.ac.uk"...

## Wednesday, 12 November 2014

### How do you spell your name?

I've just got back from ISPOR, at which I managed to chat with several people $-$ I guess that's one of those conferences where the number of people in the sessions is basically the same as the number outside, talking (more or less quietly).

In fact, we'll try to arrange a few workshops/meetings $-$ the first one is this coming Monday at UCL, when James O'Mahony will talk about his work on the choice of comparator strategies in cost-effectiveness analyses of Human Papillomavirus testing in cervical screening. We'll try and have seminars/events monthly.

Among the highlights of my three days in Amsterdam:
• after a while, I noticed that every time I was walking through the exhibitors' booths, said exhibitors would intently check my name-tag out until they actually saw my affiliation, at which point they suddenly decided they weren't really interested and looked away (which I thought was quite funny);
• the first night, as I got to my hotel room, I tried to connect to the wireless using the instructions I'd been given (which was: use your room number and surname). Because it wasn't working, I phoned the reception to ask for assistance. The lady asked me: "What name are you using", to which I replied: "mine". To which she replied: "but how do you spell it?" To which I replied: "the way it is spelled: B-A-I-O". To which she replied: "Oh, but to us it is spelt B-A-L-O. That's what you should use". It worked;
• the nice dinner at this Italian restaurant (I didn't choose it, but it was quite good).

## Wednesday, 29 October 2014

### "Football"... I mean "soccer"... I mean "football"...

A couple of weeks ago, I was contacted by Daniel Weitzenfeld $-$ a Chicago freelance data scientist (his own definition). Daniel got interested in modelling sports results and googled our football paper $-$ in his post here, he jokes that, because we're Italians, in our paper "football" means "soccer". But of course, I would respond saying that the real story is that when he says "football" he means "the weird version of rugby that Americans play"...

Anyway, he set out to adapt our model to last year's Premier League data, using pymc. In fact, he's slightly modified our model $-$ we exchanged a couple of emails to clarify some issues from our original model (he did make a couple of good points). He then discusses the issue of shrinkage in the results of the model $-$ as he says (quoting John Kruschke) shrinkage is not necessarily good or bad and it's just a feature of how the data are modelled.

In our case, however, model fit was massively improved by using a more complex specification that (including some prior knowledge about the potential strength of the teams) would reduce the amount of shrinkage $-$ in effect, we had assumed three different data generating processes (or some form of conditional exchangeability); one for "good" teams (fighting for the title), one for "average" teams and one for the "poor" teams (struggling for relegation).

I was quite interested in the pymc modelling $-$ I'll have to have a closer look at some point...

## Monday, 27 October 2014

### ISPOR posters

As the short course is fast approaching and I'm fighting with the last organisational details, I spent most of today preparing the two posters for the ISPOR congress, which I'm attending the week after next.

At first, I was a bit disappointed that neither of the two abstracts I submitted had been accepted for a podium presentation, as they call it. But on reflection, in such a huge conference as ISPOR, the time scheduled for a talk is only 12 minutes, which makes it really difficult to present any methodological work, anyway. So, inspired by Marshall, I thought that posters may not be too bad after all...

The posters are about the structural zero model (which I've also discussed here and here; and some material is here) and BCEA. Because ISPOR is one of the biggest conferences in health economics, I think there's an opportunity to show these to potentially interested people $-$ also, I'll meet a few colleagues, plus I like Amsterdam, so all in all, sounds like a good trip (hopefully!).

Anyway, I've put a pdf version of the two posters here and here $-$ check them out, if you like!

## Tuesday, 14 October 2014

### 1 in 5 million

Earlier today, I got an email from UCL Library Services, telling me that our research publications repository (UCL Discovery) has "recently passed the exciting milestone of 5 million downloads".

As it happens, the 5 million-th download was our paper on football results predictions (I've already mentioned it in a few posts, for example here, here and here).

The best part of the story is that there is a "small prize" that I will be given at a forthcoming Library Conference to "mark this achievement" $-$ the achievement being having won the lottery, really...

As my friend Virgilio said, who knows what my colleagues that do "more serious stuff" will think of that...

## Sunday, 5 October 2014

### Evil companies, stereotypes and coffee

A couple of weeks ago, our favourite coffee place in Walton (where we live) just closed out of the blue. We were really surprised as we thought they were doing really well, as the shop was almost always buzzing with people. So we did a bit of investigation and then found out the story $-$ the supermarket giant Tesco had invested in nearly 50% of the shares in the (originally) small coffee shop chain and opened several branches in some of their supermarkets, only to subsequently close a few non-supermarket branches (which were opened before the take-over), including ours.

So we had to change coffee shop and had to (provisionally, I think) resort to one of the "Italian" places $-$ well, they do have Italian names (this, or this) and maybe they are owned or founded by Italians, but they certainly do not exist in Italy... Anyway, the other day, Marta and I were in the local branch and while drinking our coffee, we noticed the big pictures on the wall, which are meant to portray Italian life, to give a touch of authenticity to the place.

There was a picture of a couple of old men sitting outside a bar, gesticulating and arguing. And another picture of a few young men checking out a girl who had just passed by. The immediate reaction was that it was a bit insulting, really, to get stereotyped like that. But then we also thought of something we had seen last week, when we were in Italy: a few men were preparing to carry a coffin, but as a girl wearing a rather short skirt walked by, all of them intently stared at her (no whistles or shouts, though).

### Bayes of thrones

My friend and colleague Andreas sent me a link to a working paper published by a statistician at the University of Canterbury in Christchurch (New Zealand) and discussed here. The main idea of the paper was to use a Bayesian model to predict how many future chapters each of the main characters of Game of Thrones will feature in.

I'm not a great fan of Game of Thrones, but I know many people who are (including in my own household). So I can't really comment on the results. However, on a very cursory look at the paper, it seems as though the model is based on vague priors for all the parameters, which is kind of a bummer, as I would have thought this is the kind of model for which you do have some strong subjective (or "expert") prior to use... Still, nice idea, I think...

## Wednesday, 24 September 2014

### Book of the year

The next issue of Significance (it will be a December double issue) will feature a festive version of the usual book review section.

Readers and contributors can nominate their favourite statistics books of the year and alongside the nomination, we're looking for a 100-word explanation of why they recommend it.

Full details are on the Significance website.

## Friday, 19 September 2014

### Mini-tour

The last two days have been kind of a very interesting mini-tour for me $-$ yesterday the Symposium that we organised at UCL (the picture on the left is not a photo taken yesterday) and today the workshop on efficient methods for value of information, in Bristol.

I think we'll put the slides from yesterday's talks on the symposium website shortly.

## Wednesday, 17 September 2014

### BCEA 2.1

We're about to release the new version of BCEA, which will contain some major changes.

1. A couple of changes in the basic code that should improve the computational speed. In general, BCEA doesn't really run into trouble because most of the computations are fairly easy. However, there are a couple of parts in which the code wasn't really optimised; Chris Jackson has suggested some small but substantial modifications $-$ for instance using colMeans instead of apply($\cdot$,2,mean).
2. Andrea has coded a function to compute the cost-effectiveness efficiency frontier, which is kind of cool. Again, the underlying analysis is not necessarily very complicated, but the resulting graph is quite neat and it is informative and useful too.
3. We've polished the EVPPI functions (again, thanks to Chris who's spotted a couple of blips in the previous version).
I'll mention these changes in my talk at the workshop on "Efficient Methods for Value of Information Calculations". If all goes to plan, we'll release BCEA 2.1 by the end of this week.

## Monday, 8 September 2014

### Unbelievable(?)

This is an old story (it dates back to July last year) but it has only just appeared on my radar and I think it's quite unbelievable $-$ or maybe it isn't after all...

In a survey of just over 1000 individuals conducted by the professional polling company Ipsos Mori, respondents were asked to give their opinion on a series of "urban myths" (as it turns out).

For example, the perception of the surveyed sample on benefit fraud is way out of line ("the public think that £24 of every £100 of benefits is fraudulently claimed. Official estimates are that just 70 pence in every £100 is fraudulent - so the public conception is out by a factor of 34", as the Independent article puts it).

Other topics on which the public seems to have a very biased opinion are immigration (with 31% of the population believed to have recently migrated into the UK, while the official figure is actually around 13%) and teen pregnancy (perceived to be 25 times as prevalent as it actually is!). That'll make for a nice example in my course on Social Statistics...
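
As a quick check of the arithmetic behind those figures, here is a tiny sketch that computes the ratio of perceived to official values. The function name is mine; the numbers are the ones quoted above.

```python
def overestimation_factor(perceived, official):
    """How many times larger the perceived figure is than the official one."""
    return perceived / official


# Benefit fraud: perceived 24 pounds vs official 70 pence per 100 pounds claimed
benefit_fraud = overestimation_factor(24.0, 0.70)

# Immigration: perceived 31% vs official (roughly) 13% of the population
immigration = overestimation_factor(31.0, 13.0)

print(round(benefit_fraud, 1))  # 34.3
print(round(immigration, 1))    # 2.4
```

So the "factor of 34" quoted by the Independent checks out, while the immigration figure is "only" overestimated by a factor of about 2.4.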

### B my J

As part of our work on the Regression Discontinuity Design, we decided we should prepare a short, introductory research paper for the British Medical Journal. We weren't holding our breath, as we thought that, while obviously interesting to clinicians, the topic may be a little too complex and technical for the BMJ audience. So we tried really hard to strip it of the technicalities to highlight the substantive points $-$ which they liked!

The paper was reviewed rather quickly and the comments were positive (although I remember thinking that there was a sense of "you need more, but also much less", which reminded me of Jeremy from Peep Show)...

Anyway, they seem to have liked the revisions too and the paper is now out.

## Thursday, 4 September 2014

### No surprises

Yesterday was the day of my talk at the RSS Conference. As I mentioned here, I hadn't been back to Sheffield for nearly 20 years, so it is really no surprise that I found it reeeeally (I mean: really) changed. In fact, I think I'm being a victim of some confounding here $-$ I'm not sure quite how much of the change I perceived is due to the fact that things really have changed, rather than to the fact that I have changed since then...

Back then, in the 18th century, it was my first time outside Italy, so everything was new and unfamiliar, although I seem to remember that there really was no proper coffee place (or just coffee to be had), apart from a half-decent French place... Also, while I distinctly remember enjoying being there, I couldn't really, fully recognise the streets I was walking (even if I'm sure I had walked along them before). So I suppose the place must have changed indeed!

The talk went well (the file is large, because of the couple of maps I've included $-$ but I thought they looked nice, so I left them in). I joked around a bit $-$ it wasn't difficult, given the topic. I made the point that it's not time to panic and leave the EU, just yet, at least not on account of the fact that the Eastern European countries hate the UK and thus do not vote for them in the Eurovision.

In other news, I really liked Tim Harford's talk $-$ it was funny and it told a very nice story, which is good. He gave a couple of more or less (pun intended) known examples of "big" (or, as he also put it, "found") data leading to some undesirable results and made the general argument that we shouldn't really dismiss the core of statistical methodology, just because we can get a lot of data and we can deal with them. How could one disagree?

## Wednesday, 27 August 2014

Next week I'm off to the RSS conference in Sheffield, where I'll present our work on the Eurovision contest. I'm quite excited about going back to Sheffield, where some time in the last century, I spent a semester as an Erasmus Exchange student. In fact, this will be the first time I'm back since then $-$ I've spoken with a few friends who tell me that the city has changed so much in the last few years, so I'm curious to see what I'll make of it.

I have fond memories of my experience and I'm very glad I took that opportunity (although at the time the exchange rate between Italian Lire and the Pound was 2 million to 1...). In particular, I'm glad I got to be at one of the last editions of the Pyjama Jump!

## Wednesday, 20 August 2014

### Workshop on Efficient Methods for Value of Information

Nicky Welton has invited me to talk at a very interesting workshop, which she has organised at the University of Bristol (here's a flyer). The day will be about the recent (and current) development of methods to perform calculations of the expected value of information, particularly for specific parameters (EVPPI).

This is something that is extremely relevant in health economic evaluation and I've already implemented a couple of possible methods in BCEA (in fact, we're writing more on this in the BCEA book $-$ we're running a little late on the original timeline, but it's looking good and I'll post more on this, later).
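
For readers who haven't met these quantities, here is a minimal sketch of the simplest member of the family, the overall expected value of perfect information (EVPI), computed from simulated net benefits. The partial version (EVPPI) that the workshop focuses on is harder, because it needs an inner expectation conditional on the parameters of interest. The draws below are invented for illustration and this is plain Python, not the BCEA implementation.

```python
def evpi(net_benefit_draws):
    """EVPI = E[max_t NB_t] - max_t E[NB_t].

    `net_benefit_draws` is a list of simulations; each simulation is a list
    with one net benefit per treatment option.
    """
    n_sims = len(net_benefit_draws)
    n_options = len(net_benefit_draws[0])

    # Expected net benefit of each option under current uncertainty
    expected = [
        sum(sim[t] for sim in net_benefit_draws) / n_sims
        for t in range(n_options)
    ]

    # Average of the best achievable net benefit in each simulation,
    # ie what we would get if uncertainty were resolved before deciding
    expected_of_max = sum(max(sim) for sim in net_benefit_draws) / n_sims

    return expected_of_max - max(expected)


# Hypothetical draws for two options (thousands of pounds)
draws = [[10.0, 12.0], [14.0, 11.0], [9.0, 13.0], [12.0, 12.0]]
value = evpi(draws)
print(value)  # 0.75
```

The result is never negative: deciding after the uncertainty is resolved can only match or beat deciding on the basis of the averages.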

At the workshop, Chris Jackson and I will give a joint talk to describe how the different methods can be integrated in a single, general framework. We'll also have a new PhD student who will start working in September on a fully Bayesian extension of Gaussian Process approximations to the net benefit. Pretty exciting stuff (if you're easily excited by this kind of thing, that is...).

## Thursday, 14 August 2014

### (Some) Spaces available

Requests for registration to our short course on Bayesian methods in Health Economics are coming in steadily $-$ in fact, we had started advertising quite in advance (the course is in November), but we're nearly booked up.

We set a total of 30 participants, so hurry up if you're interested!

## Monday, 11 August 2014

### Two weights and two measures?

This is an interesting story about the Meningitis B vaccine (some additional background here and here). In a nutshell, the main issue is that vaccines are subject to a slightly different regulation than other "normal" drugs. For example, patents do not really apply to vaccines (I believe the argument is that the composition is so difficult to set up that in effect there is no point in patenting it $-$ although there may be more to this...).

More to the point, unlike "normal" drugs or health interventions, the economic evaluation of vaccines in the UK is within the remit of a special body, the Joint Committee on Vaccination and Immunisation (JCVI), rather than NICE.

On the one hand, this is perfectly reasonable, as, arguably, vaccines do have some specific characteristics that make modelling and evaluation slightly more complex $-$ for example, vaccination is usually associated with phenomena such as herd immunity (the more people are vaccinated, the more people are directly or indirectly protected). While it is essential to include these dynamic aspects in modelling, it also makes for more complicated mathematical/statistical structures.

On the other hand, however, this raises the question as to whether it makes sense at all to try and evaluate these very special interventions using the same yardstick used for the others (eg cost-utility/effectiveness analysis). Or whether the thresholds for cost-effectiveness should be the same $-$ after all, infectious diseases may have incredible burden during epidemics and so, arguably, effective interventions may be worth extra money than the usual £20-30,000 per QALY.
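
To see why the choice of threshold matters so much, here is a minimal sketch comparing a made-up intervention against the two ends of the usual £20-30,000 per QALY range. The incremental cost and QALY values are hypothetical, chosen only so that the decision flips within the range.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    return delta_cost / delta_qaly


def net_monetary_benefit(delta_cost, delta_qaly, threshold):
    """Incremental net monetary benefit at a given willingness to pay."""
    return threshold * delta_qaly - delta_cost


# Hypothetical vaccine vs no vaccination: 6000 pounds extra, 0.25 QALYs gained
delta_cost, delta_qaly = 6000.0, 0.25

ratio = icer(delta_cost, delta_qaly)
nmb_low = net_monetary_benefit(delta_cost, delta_qaly, threshold=20000.0)
nmb_high = net_monetary_benefit(delta_cost, delta_qaly, threshold=30000.0)

print(ratio)     # 24000.0
print(nmb_low)   # -1000.0
print(nmb_high)  # 1500.0
```

At £24,000 per QALY this intervention would be rejected at the lower threshold and accepted at the higher one, so arguing for a different threshold for vaccines is really arguing about which side of the line such borderline cases fall.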

There are all sorts of related issues (some of which are perhaps more of a political nature, for example in terms of the overall evaluation process, in direct comparison to what NICE do) $-$ I think I'll discuss them some more at a later stage. But this is interesting nonetheless, also from a technical point of view.

In my opinion, the point is that continuously re-assessing the evidence and its implications for modelling is absolutely fundamental, all the more so for infectious diseases. Techniques such as the value of information (some discussed and available in BCEA) should be used more widely. And both regulators and industry should be open to this sort of step-wise approach to marketing.

## Friday, 25 July 2014

### Pat pat

This is probably akin to an exercise in self-congratulation, but I'll indulge in this anyway to celebrate the fact that our paper on the Bias in the Eurovision song contest voting (the last in a relatively long series of posts on this is here) now has over 4,000 "article views".

The Journal of Applied Statistics website defines these as: "Article usage statistics combine cumulative total PDF downloads and full-text HTML views from publication date [23 Apr 2014, in our case] to 23 Jul 2014. Article views are only counted from this site."

In case you're wondering, neither Marta nor I have actually downloaded the paper to boost the numbers!

## Monday, 7 July 2014

### The Oracle (8) - let's go all the way!

This is (may be) the final post in the series dedicated to the prediction of the World Cup results $-$ I'll try and actually write another to wrap things up and summarise a few comments, but this will probably be a bit later on. Finally, we've decided to use our model, which so far has been applied incrementally, ie stage-by-stage, to predict the result of both the semifinals and the finals.

The first part is relatively straightforward; the quarter finals have been played and we do know the results that have occurred. Thus, we can re-iterate the procedure (which we described here) and i) update the data with the observed results; ii) update the "current form" variable and the offset; iii) re-run the model to estimate each team's propensity to score; iv) predict the result of the unobserved games $-$ in this case the two semifinals (Brazil-Germany and Argentina-Netherlands).

However, to give the model a nice twist, I thought we should include a piece of extra information that is available right now, ie the fact that Brazil will, for certain, play their semifinal without their suspended captain Thiago Silva and their injured "star player" Neymar (who will also miss the final, due to the gravity of his injury). Thus, we ran the model by modifying the offset variable (see a more detailed description here) for Brazil, to slightly decrease their "short-term" quality. [NB: if this were a "serious" model, we would probably try to embed these changes in a more formal way, rather than as "ad hoc" modifications to the general set up. Nevertheless, I believe that the possibility of dealing with additional information, possibly in the form of subjective/expert knowledge, is actually a strength of the modelling framework. Of course, you could say that the selection of the offset distribution is arbitrary and other choices were possible $-$ that's of course true and a "serious" model would certainly require more extensive sensitivity analysis at this stage!]

Using this formulation of the model, we get the following results, in terms of the overall probability of going through to the final (ie accounting for potential draws in the 90 minutes and then extra times and possibly penalties, as discussed here):

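| Team 1 | Team 2 | P(Team 1 reaches final) | P(Team 2 reaches final) |
|---|---|---|---|
| Brazil | Germany | 0.605 | 0.395 |
| Argentina | Netherlands | 0.510 | 0.490 |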

So, the second semifinal is predicted to be much tighter (nearly 50:50), while Brazil are still favourites to reach the final, according to the model prediction.

As I said earlier, however, this time we've gone beyond the simple one-step prediction and have used these results to also re-run the model before the actual results of the semifinals are known and thus predict the overall outcome, ie who's winning the World Cup.

Overall, our estimation gives the following probabilities of winning the championship (these may not sum to 1 because of rounding):

Brazil: 0.372
Germany: 0.174
Argentina: 0.245
Netherlands: 0.206

Of course, these probabilities encode extra uncertainty, because we're going one extra step forward in the future $-$ we don't know which of the potential futures will occur for the semifinals. Leaving the model aside, I think I would probably like the Netherlands to win $-$ if only for the fact that in that way, Italy would still be the 2nd most frequent World Cup winners, only one title behind Brazil, and one and two above Germany and Argentina, respectively.
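As a rough sanity check on how the two sets of numbers fit together, the sketch below divides each team's probability of winning the championship by its probability of reaching the final. This gives the implied chance of winning the final, conditional on getting there. It is simple arithmetic on the rounded figures reported in this post, not output from the model, and the variable names are made up for illustration.

```python
# Probability of reaching the final (semifinal predictions reported above)
p_reach_final = {"Brazil": 0.605, "Germany": 0.395,
                 "Argentina": 0.510, "Netherlands": 0.490}

# Probability of winning the championship (figures reported above)
p_champion = {"Brazil": 0.372, "Germany": 0.174,
              "Argentina": 0.245, "Netherlands": 0.206}

# P(win final | reach final) = P(champion) / P(reach final)
p_win_given_final = {team: p_champion[team] / p_reach_final[team]
                     for team in p_champion}

for team, p in p_win_given_final.items():
    print(f"{team}: {p:.3f}")

# The four championship probabilities should sum to (nearly) 1
total = sum(p_champion.values())
print(f"Sum of championship probabilities: {total:.3f}")
```

On these rounded figures, Brazil would be favourites in any final they reach, while the other three teams would each have a conditional probability below 50%.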

## Thursday, 3 July 2014

### The Oracle (7)

We're now down to 8 teams left in the World Cup. Interestingly, despite a pretty disappointing display by some of the (more or less rightly so) highly rated teams, such as Spain, Italy, Portugal or England, European sides are exactly 50% of the lot. Given the quarter final game between France and Germany, at least one European team is certain to reach the semifinals. Also, it is worth noticing that the 8 remaining teams are the group winners $-$ which kind of confirms Michael Wallace's point.

We've now re-updated the data, the "form" and the "offset" variables (as briefly explained here) using the results of the round of 16. The model had predicted (as shown in the graphs here) wide uncertainty for the potential outcomes of the games (also, we had not included the added complication of extra time & penalties $-$ more on this later). I believe this has been confirmed by the actual games. In many cases (in fact, probably all but the Colombia-Uruguay game, which was kind-of-dominated by the former), the games have been very close. As a result, we've observed a slightly higher than usual proportion of games going to extra time.

So, we've also complicated (further!) our model to estimate the result by including extra time and penalties. In a nutshell, when the game is predicted to be a draw (ie the predicted number of goals scored by the two teams is the same), we've additionally simulated the outcome of extra time.

In doing this, we've used the same basic structure as for the regular time, but we've added a decremental factor to the linear predictor (describing the "propensity" of team A to score when playing against team B). This makes sense, since the duration of extra time is 1/3 of the normal game. Also, there is added pressure and teams normally tend to be more conservative. Thus, in this prediction, we've increased the chance of observing 0 goals and accounted for the shorter time played. If the prediction is still for a draw, then we've determined the winner by assuming that penalty shoot-outs are essentially a randomising device $-$ each team has a 50% chance of winning them.
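To make the three-stage logic concrete, here is a minimal sketch of how one knock-out game could be simulated: 90 minutes, then extra time with reduced scoring rates, then a coin flip for penalties. It is not our actual model: it uses plain Poisson goals rather than the zero-inflated structure, and the scoring rates (`1.6`, `1.1`) and the `et_decrement` factor are made-up values, as are all the function names.

```python
import math
import random

def draw_poisson(rate, rng):
    """Draw from a Poisson distribution (Knuth's method; fine for small rates)."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_knockout(rate_a, rate_b, rng, et_decrement=0.8):
    """Return 'A' or 'B': the winner after 90 minutes, extra time or penalties.

    rate_a, rate_b: expected goals in 90 minutes (hypothetical inputs).
    et_decrement:   assumed extra-caution factor in extra time (made-up value).
    """
    goals_a, goals_b = draw_poisson(rate_a, rng), draw_poisson(rate_b, rng)
    if goals_a != goals_b:
        return "A" if goals_a > goals_b else "B"
    # Extra time lasts 30 of 90 minutes, so scale the rates by 1/3,
    # then shrink them further to reflect more conservative play.
    et_a = draw_poisson(rate_a * et_decrement / 3, rng)
    et_b = draw_poisson(rate_b * et_decrement / 3, rng)
    if et_a != et_b:
        return "A" if et_a > et_b else "B"
    # Penalties are treated as a pure randomising device (50:50).
    return "A" if rng.random() < 0.5 else "B"

rng = random.Random(2014)
n_sims = 20000
wins_a = sum(simulate_knockout(1.6, 1.1, rng) == "A" for _ in range(n_sims))
p_a = wins_a / n_sims
print(f"P(team A progresses) ~ {p_a:.3f}")
```

Because every simulated game now has a winner, the share of simulations won by each team can be read directly as the overall probability of progressing.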

These are the contour plots for the posterior predictive distribution of the goals scored in the quarter finals, based on our revised model.
Basically all games are again quite tight $-$ perhaps with the (reasonable?) exception of Netherlands-Costa Rica in which the Dutch are favourite and predicted to have a higher chance of scoring more goals (and therefore winning the game).

As shown in the above graph, draws are quite likely in almost all the games; the European derby is probably the closest game (and this seems to make sense given both the short- and long-term standing of the two teams). Brazil and Argentina both face tough opponents (based on the model $-$ but again, in line with what we've seen so far).

Using the result of the model in terms of prediction of the results at extra time & penalties, we estimate the overall probability of winning the game (ie either within 90 minutes or beyond) as

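| Team 1 | Team 2 | P(Team 1 wins) | P(Team 2 wins) |
|---|---|---|---|
| Brazil | Colombia | 0.657 | 0.343 |
| Netherlands | Costa Rica | 0.776 | 0.224 |
| France | Germany | 0.497 | 0.503 |
| Argentina | Belgium | 0.607 | 0.393 |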

(in the above table, the third and fourth columns indicate, respectively, the predicted chance that the team in column one and two, respectively, win the game and progress to the semifinals).

One final remark, which I think is generally interesting, is that by the time we've reached the quarter finals, the value of the "current form" variable for Brazil (who started as hot favourites based on the evidence synthesis of the published odds that we've used to define it at the beginning of the tournament) is lower than that of their opponent. But again, Colombia have sort of breezed through all of their games so far, while Brazil have kind of stuttered and have not won games that they probably should have (taking at face value their "strength"). This doesn't seem enough to make Colombia favourites in their game against the host $-$ but beware of surprises! After all, the distribution of the possible results is not so clear cut...

## Wednesday, 2 July 2014

### Short course: Bayesian methods in health economics

Chris, Richard and I tested this last March in Canada (see also here) and things seem to have gone quite well. So we have decided to replicate the experiment (so that we can get a bigger sample size!) and do the short course this coming November (3-5th), at UCL.

Full details (including links for registration) are available here. As we formally say in an advert we've circulated on a couple of relevant mailing lists:

"This course is intended to provide an introduction to Bayesian analysis and MCMC methods using R and MCMC sampling software (such as OpenBUGS and JAGS), as applied to cost-effectiveness analysis and typical models used in health economic evaluations.

The course is intended for health economists, statisticians, and decision modellers interested in the practice of Bayesian modelling and will be based on a mixture of lectures and computer practicals, although the emphasis will be on examples of applied analysis: software and code to carry out the analyses will be provided. Participants are encouraged to bring their own laptops for the practicals.

We shall assume a basic knowledge of standard methods in health economics and some familiarity with a range of probability distributions, regression analysis, Markov models and random-effects meta-analysis. However, statistical concepts are reviewed in the context of applied health economic evaluations in the lectures."

The timetable and additional info are here.

## Saturday, 28 June 2014

### Break!

Just to break the mono-thematic nature of the recent posts, I thought I'd just link to this article which has appeared on the Significance website.

That's an interesting analysis conducted by researchers at the LSE, debunking the myth that migrants in the UK have unfair advantages in accessing social housing.

I believe these kinds of findings should be made as widely accessible as possible, because of the incredible way in which these myths are used to build up conspiracy theories or political arguments that are passed off as being based on "facts", but are, in fact, just a partial (if not outright biased) view of a phenomenon.

Our paper on the Eurovision song contest (ESC) (here, here and here) is of course far less important to society than housing; but it struck me that a few of the (nasty) comments we got in some of the media that reported the press release of our findings were effectively along the lines of: "I think Europe hate Britain and the fact that we don't win the ESC is evidence of that; you say otherwise based on numbers; that doesn't move my position by an inch".

At the time, I felt a bit petty because this bothered me a bit. But, a couple of days later, I came across a couple of articles (here and here) on one politician who used the same kind of reasoning to make their point (eg Britain should get out of the EU $-$ the ESC is evidence that they hate us). Which bothered me even more...

## Friday, 27 June 2014

### The Oracle (6)

Quick update, now that the group stage is finished. We needed a few tweaks to the simulation process (described in some more detail here), which we spent some time debating and implementing.

First off, the data on the last World Cups show that during the knock out stage, there are substantially fewer goals scored. This makes sense: from tomorrow it's make or break. This wasn't too difficult to deal with, though $-$ we just needed to modify the distribution for the zero component of the number of goals ($\pi$, as described here). In this case, we've used a distribution centered on around 12% with most of the mass concentrated between 8% and 15%.
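As an illustration of what such a distribution for $\pi$ could look like, the sketch below uses a Beta(39, 286) prior, which has mean 0.12 and puts most of its mass between 8% and 15%. These parameters are an assumption chosen only to match the description above; they are not the values we actually used. The scoring rate of 1.3 and the function names are also made up.

```python
import math
import random

# Hypothetical Beta(39, 286) prior for pi: mean 39 / 325 = 0.12 (an assumption,
# chosen only to roughly match "centred around 12%, mostly between 8% and 15%").
PI_ALPHA, PI_BETA = 39, 286

def draw_poisson(rate, rng):
    """Poisson draw via Knuth's method (adequate for small rates)."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def draw_zero_inflated_goals(rate, rng):
    """With probability pi return a structural zero, else a Poisson count."""
    pi = rng.betavariate(PI_ALPHA, PI_BETA)
    return 0 if rng.random() < pi else draw_poisson(rate, rng)

rng = random.Random(12)

# Check where the prior mass for pi sits
pi_draws = [rng.betavariate(PI_ALPHA, PI_BETA) for _ in range(20000)]
pi_mean = sum(pi_draws) / len(pi_draws)
share_in_range = sum(0.08 <= x <= 0.15 for x in pi_draws) / len(pi_draws)
print(f"mean pi = {pi_mean:.3f}, share in [0.08, 0.15] = {share_in_range:.2f}")

# Effect on the simulated goals (the rate of 1.3 is a made-up example)
goals = [draw_zero_inflated_goals(1.3, rng) for _ in range(20000)]
mean_goals = sum(goals) / len(goals)
print(f"mean goals = {mean_goals:.2f} (plain Poisson mean would be 1.30)")
```

The zero component pulls the average number of goals below the plain Poisson rate, which is the effect we wanted for the knock-out stage.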

These are the predictions for the 8 games. Brazil, Germany, France and (only marginally) Argentina have a probability of winning exceeding 50%. The other games look closer.

Technically, there is a second issue, which is of course that in the knock out stage draws can't really happen $-$ eventually the game ends either after extra time, or on penalties. For now, we'll just use this prediction, but I'm trying to think of a reasonable way to include the extra complication in the model; the main difficulty is that in extra time the propensity to score drops even further $-$ about 30% of the games that go to extra time end up at penalties. I'll try and update this (if not for this round, possibly for the next one).

## Monday, 23 June 2014

### The Oracle (5. Or: Calibration, calibration, calibration...)

First off, a necessary disclaimer: I haven't been able to write this post before a few of the games of the final round of the group stage have been played, but I have not watched the games so far and have run the model to predict round 3 as if none of the games had been played.

Second, we've been thinking about the model and whether we could improve it in light of its predictive ability proven so far. Basically, as I mentioned here, the "current form" variable may not be very well calibrated to start with $-$ recall it's based on an evidence synthesis of the odds against each of the teams, which is then updated given the observed results and weighting them by the predicted likelihood of each outcome occurring.

Now: the reasoning is that, often, to do well at competitions such as the World Cup, you don't necessarily need to be the best team (although this certainly helps!) $-$ you just need to be the best in that month or so. This goes, it seems to me, over and above the level and impact of "current form".

To make a clearer example, consider Costa Rica (arguably the dark horses of the tournament, so far): the observed results (two wins against relatively highly rated Uruguay and Italy) have improved their "strength", by nearly doubling it in comparison to the initial value (based on the evidence synthesis of a set of published odds). However, before any game was played, they were considered the weakest team among the 32 participating nations. Thus, even after two big wins (and we've accounted for the fact that these have been big wins!), their "current form/strength" score is still only 0.08 (on a scale from 0 to 1).

Consequently, by just re-running the model including all the results from round 2 and the updated values for the "current form" variable, the prediction for their final game is a relatively easy win for their opponent, England, who on the other hand have had a disappointing campaign and have already been eliminated. The plots below show the prediction of the outcomes of the remaining 16 games, according to our "baseline" model.

So we thought of a way to potentially correct for this idiosyncrasy of the model. [Now: if this were serious work (well $-$ it is serious, but it is also a bit of fun!) I wouldn't necessarily do this in all circumstances (although I believe what I'm about to say makes some sense).]

Basically, the idea is that because it is based on the full dataset (which sort of accounts for "long-term" effects), the "current form" variable describes the increase (decrease) in the overall propensity to score of team A when playing team B. But at the World Cup, there are also some additional "short-term" effects (eg the combination of luck, confidence, good form, etc) that the teams experience just in that month.

We've included these in the form of an offset, which we first compute (I'll describe how in a second) and then add to the estimated linear predictor. This, in turn, should make the simulation of the scores for the next games more in line with the observed results $-$ thus making for better predictions (and avoiding not-too-realistic ones).

The computation of the offset is like so: first, we compute the difference between the observed and the expected number of points accrued so far by each of the teams. Then, we label each team as doing "Much better", "Better", "As expected", "Worse" or "Much worse" than expected, according to the magnitude of that difference.

Since each team have played 2 games so far, we've applied this rule:
• Teams with a difference of more than 4 points between observed and expected are considered to do "much better" (MB) than expected;
• Teams with a difference of 3 or 2 points between observed and expected are considered to do "better" (B) than expected;
• Teams with a difference between -1 and 1 point between observed and expected are considered to do just "as expected" (AE);
• Teams with a difference of -2 or -3 points between observed and expected are considered to do "worse" (W) than expected;
• Teams with a difference below -4 points between observed and expected (ie more than 4 points short) are considered to do "much worse" (MW) than expected.
Roughly speaking, this means that if you're exceeding expectation by more than 66% then we consider this to be outstanding, while if your difference from the expectation is within $\pm$ 20%, then you're effectively doing as expected. Of course, this is an arbitrary categorisation $-$ but I think it is sort of reasonable.
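The rule above can be sketched as a small function. The list does not say what happens to non-integer differences or to a difference of exactly 4 points, so this sketch makes two assumptions: the difference is rounded to the nearest integer, and a rounded difference of exactly $\pm$4 falls in the extreme category. The function name and the expected-points values in the examples are made up.

```python
def classify_performance(observed_points, expected_points):
    """Label a team's performance after two games.

    Assumptions (not stated in the rule): the difference is rounded to the
    nearest integer, and a rounded difference of exactly +/-4 counts as
    "much better" / "much worse".
    """
    diff = round(observed_points - expected_points)
    if diff >= 4:
        return "MB"   # much better than expected
    if diff >= 2:
        return "B"    # better than expected
    if diff >= -1:
        return "AE"   # as expected
    if diff >= -3:
        return "W"    # worse than expected
    return "MW"       # much worse than expected

# Hypothetical examples: the expected points are invented for illustration
print(classify_performance(6, 1.2))   # two wins for a team expected to struggle
print(classify_performance(0, 3.4))   # two defeats for a highly rated team
print(classify_performance(3, 3.1))   # roughly in line with expectation
```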

Then, the offset is computed using some informative distributions. We used Normal distributions based on average inflations (deflations) of 1.5, 1.2, 1, 0.8 and 0.5, respectively for MB, B, AE, W and MW performances. We chose the standard deviations for these distributions so that the chance of an offset greater than 1 on the natural scale (meaning an increase in the performance predicted by the "baseline" model) would be approximately 1 (for MB), 0.9 (for B), 0.5 (for AE), 0.1 (for W) and 0 (for MW). The following picture shows this graphically.
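One way to back out standard deviations with this property is sketched below, under the assumption that the Normal distributions are defined directly on the natural (multiplicative) scale. For a Normal with mean $\mu$ and standard deviation $\sigma$, $P(X > 1) = \Phi((\mu - 1)/\sigma)$, so $\sigma = (\mu - 1)/\Phi^{-1}(p)$. The targets "approximately 1" and "approximately 0" are replaced by 0.999 and 0.001, which is an arbitrary choice; the AE category is left out because with mean 1 any standard deviation gives a chance of 0.5.

```python
from statistics import NormalDist

# (mean inflation, target P(offset > 1)); 0.999 and 0.001 stand in for
# "approximately 1" and "approximately 0" and are an assumption.
targets = {"MB": (1.5, 0.999), "B": (1.2, 0.9),
           "W": (0.8, 0.1), "MW": (0.5, 0.001)}

def sd_for_target(mean, p_above_one):
    """Standard deviation such that a Normal(mean, sd) exceeds 1 with the
    given probability: sd = (mean - 1) / inverse-Phi(p)."""
    z = NormalDist().inv_cdf(p_above_one)
    return (mean - 1.0) / z

results = {}
for label, (mean, p) in targets.items():
    sd = sd_for_target(mean, p)
    # Recompute the tail probability as a check on the algebra
    check = 1.0 - NormalDist(mean, sd).cdf(1.0)
    results[label] = (sd, check)
    print(f"{label}: mean = {mean}, sd = {sd:.3f}, P(offset > 1) = {check:.3f}")
```

By symmetry, B and W end up with the same standard deviation, and so do MB and MW.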

Including the offsets computed in this way produces the results below.

The Costa Rica-England game is now much tighter $-$ England are still predicted to have a higher chance of winning it, but the joint posterior predictive distribution of the goals scored looks quite symmetrical, indicating how close the game is predicted to be.

So, based on the results of the model including the offset, these are our predictions for the round of 16.