Friday, 7 July 2017

Conflict of interest

I am fully aware that this post is seriously affected by a conflict of interest, because what I'm about to discuss (in positive terms!) is work by Anthony, who's doing a very good job on his PhD (which I co-supervise).

But, I thought I'd do like our former PM (BTW: see this; I really liked the series) and sort conflict of interests by effectively ignoring them (to be fair, this seems to be a popular strategy, so let's not be too harsh on Silvio...).

Anyway, Anthony has written an editorial, which has received some traction in the mainstream media (for example here, here or here). Not much that I disagree with in Anthony's piece, except that I am really sceptical of any bake & eat situation $-$ the only exception is when I actually make pizza from scratch...

Tuesday, 20 June 2017

Picky people

Our book on Bayesian cost-effectiveness analysis using BCEA is out (I think as of last week). This has been a long process (I've talked about this here, here and here). 

Today I've come back to the office and have open the package with my copies. The book looks nice $-$ I am only a bit disappointed about a couple of formatting things, specifically the way in which computer code got badly formatted in chapter 4. 
We had originally used specific font, but for some reason in that chapter all computer code is formatted in Times New Romans. I think we did check in the proofs and I don't recall seeing this (which, to be fair, isn't necessarily to swear that we didn't miss it, while checking...).

Not a biggie. But it bothers me, a bit. Well, OK: a lot. But then again, I am a(n annoyingly) picky person...

Monday, 19 June 2017

Homecoming (of sort...)

I spent last week in Florence for our Summer School. Of course, it was home-coming for me and I really enjoyed being back to Florence $-$ although it was really hot. I would say I'm not used to that level of heat anymore, if it wasn't for the fact that I have caught my brother (who still lives there) huffing and complaining about it several times!...

I think it was a very good week $-$ we had capped the number of participants at 27; everybody showed up and I think had a good time. I think I can speak for myself as well as for Chris, Nicky, Mark and Anna and say that we certainly enjoyed being around people who were so committed and interested! We did joke at several points that we didn't even have to ask the questions $-$ they were starting the discussion almost without us prompting it...

The location was also very good and helped make sure everybody was enjoying it. The Centro Studi in Fiesole is an amazing place $-$ not too close to Florence that people always disappears after the lectures, but not too far either. So there was always somebody there even for dinner and a chat in the beautiful garden, although some people would venture down the hill (notably, many did so by walking!). We also went to Florence a couple of times (the picture is one of my favourite spots of the city, which I obviously brought everybody to...).

Friday, 9 June 2017


So: for once I woke up this morning feeling slightly quite tired for the late night, but also rather upbeat after an election. The final results of the general election are out and have produced quite some shock. 

Throughout yesterday, it looked as though the final polls were returning an improved majority for the Conservative party $-$ this would have been consistent with the "shy Tory" effect. Even Yougov had presented their latest poll suggesting a seven points lead and improved Tory majority. So I guess many people were unprepared for the exit polls, which suggested a very different figure...

First off, I think that the actual results have vindicated Yougov's model (rather than the poll), based on a hierarchical model informed by over 50,000 individual-level data on voting intention as well as several other covariates. They weren't spot on, but quite close. 

Also, the exit polls (based on a sample of over 30,000) were remarkably good. To be fair, however, I think that exit polls are different than the pre-election polls, because unlike them they do not ask about "voting intentions", but the actual vote that people have just cast.

And now, time for the post-mortem. My final prediction using all the polls at June 8th was as follows:

                mean       sd 2.5% median 97.5%     OBSERVED
Conservative 346.827 3.411262  339    347   354          318
Labour       224.128 3.414861  218    224   233          261
UKIP           0.000 0.000000    0      0     0            0
Lib Dem       10.833 2.325622    7     11    15           12
SNP           49.085 1.842599   45     49    51           35
Green          0.000 0.000000    0      0     0            1
PCY            1.127 1.013853    0      2     3            4

Not all bad, but not quite spot on either and to be fair, less spot on than Yougov's (as I said, I was hoping they were closer to the truth than my model, so not too many complaints there!...).

I've thought a bit about the discrepancies and I think a couple of issues stand out:

  1. I (together with several other predictions and in fact even Yougov) have overestimated the vote and, more importantly, the number of seats won by the SNP. I think in my case, the main issue had to do with the polls I have used to build my model. As it has happened, the battleground in Scotland has been rather different than the rest of the country, I think. But what was feeding into my model were the data from national polls. I had tried to bump up my prior for the SNP to counter this effect. But most likely this has exaggerated the result, producing an estimate that was too optimistic.
  2. Interestingly, the error for the SNP is 14 seats; 12 of these, I think, have (rather surprisingly) gone to the Tories. So, basically, I've got the Tory vote wrong by (347-318+12)=41 seats $-$ which if you actually allocate to Labour would have brought my prediction to 224+41=265. 
  3. Post-hoc adjustements aside, it is obvious that my model had overestimated the result for the Tories, while underestimating Labour's performance. In this case, I think the problem was that the structure I had used was mainly based on the distinction between leave and remain areas at last year's referendum. And of course, these were highly related to the vote that in 2015 had gone to UKIP. Now: like virtually everybody, I have correctly predicted that UKIP would get "zip, nada, zilch" seats. In my case, this was done by combining the poor performance in the polls with a strongly informative prior (which, incidentally, was not strong enough and combined with the polls, I did overestimate UKIP vote share). However, I think that the aggregate data observed in the polls had consistently tended to indicate that in leave areas the Tories would have had massive gains. What actually happened was in fact that the former UKIP vote has split nearly evenly between the two major parties. So, in strong leave areas, the Tories have gained marginally more than Labour, but that was not enough to swing and win the marginal Labour seats. Conversely, in remain areas, Labour has done really well (as the polls were suggesting) and this has in many cases produced a change in colours in some Conservative marginal seats.
  4. I missed the Green's success in Brighton. This was, I think, down to being a bit lazy and not bothering telling the model that in Caroline Lucas' seat the Lib Dem had not fielded a candidate. This in turn meant that the model was predicting a big surge in the vote for the Lib Dems (because Brighton Pavilion is a strong remain area), which would eat into the Green's majority. And so my model was predicting a change to Labour, which never happened (again, I'm quite pleased to have got it wrong here, because I really like Ms Lucas!).
  5. My model had correctly guessed that the Conservatives would regain Richmond Park, but that the Lib Dems had got back Twickenham and Labour would have held Copeland. In comparison to Electoralcalculus's prediction, I've done very well in predicting the number of seats for the Lib Dems. I am not sure about the details of their model, but I am guessing that they had some strong prior to (over)discount the polls, which has lead to a substantial underestimation. In contrast, I think that my prior for the Lib Dems was spot on.
  6. Back to Yougov's model, I think that the main, huge difference, has been the fact that they could rely on a very large number of individual level data. The published polls would only provide aggregated information, which almost invariably would only cross-tabulate one variable at a time (ie voting intention in Leave vs Remain, or in London vs other areas, etc $-$ but not both). To actually be able to analyse the individual level data (combined of course with a sound modelling structure!) has allowed Yougov to get some of the true underlying trends right, which models based on the aggregated polls simply couldn't, I think.
It's been a fun process $-$ and all in all, I'm enjoying the outcome...

Wednesday, 7 June 2017


Today I've taken a break from the general election modelling $-$ well, not really... Of course I've checked whether there were new polls available and have updated the model! 

But: nothing much changes, so for today, I'll actually concentrate on something else. I was invited to give a talk at the Imperial/King's College Researchers' Society Workshop $-$ I think this is something they organise routinely.

They asked me to talk about "Blogging and Science Communication" and I decided to have some fun with this. My talk is here. I've given examples of weird stuff associated with this blog $-$ not that I had to look very hard to find many of them...

And I did have fun giving the talk! Of course, the posts about the election did feature, so eventually I got to talk about them to...

Tuesday, 6 June 2017

The Inbetweeners

When it first was shown, I really liked "The Inbetweeners" $-$ it was at times quite rude and cheap, but it did make me laugh, despite the fact that, as it often happens, all the main characters did look a bit older than the age they were trying to portrait...

Anyway, as is increasingly often the case, this post has very little to do with its title and (surprise!) it's again about the model for the UK general election.

There has been lots of talk (including in Andrew Gelman's blog) in the past few days about Yougov's new model, which is based on Gelman's MRP (Multilevel Regression and Post-stratification). I think the model is quite cool and it obviously is very rigorous $-$ it considers a very big poll (with over 50,000 responses), assumes some form of exchangeability to pool information across different individual respondents' characteristics (including geographical area) and then reproportions the estimated vote shares (in a similar way to what my model does) to produce an overall prediction of the final outcome.

Much of the hype (particularly in the British mainstream media), however, has been related to the fact that Yougov's model produces a result that is very different from most of the other poll analyses, ie a much worse performance for the Tories, who are estimated to gain only 304 seats (with a 95% credible interval of 265-342). That's even less than the last general election. Labour are estimated to get 266 (230-300) seats and so there have been hints of a hung parliament, come Friday.

Electoralcalculus (EC) has a short article in their home page to explain the differences in their assessment, which (more in line with my model) still gives the Tories a majority of 361 (to Labour's 216).

As for my model, the very latest estimate is the following:

                mean        sd 2.5% median   97.5%
Conservative 347.870 3.2338147  341    347 355.000
Labour       222.620 3.1742205  216    223 230.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       11.709 2.3103369    7     12  16.000
SNP           48.699 2.0781525   44     49  51.000
Green          0.000 0.0000000    0      0   0.000
PCY            1.102 0.9892293    0      1   2.025
Other          0.000 0.0000000    0      0   0.000

so somewhere in between Yougov and EC (very partisan comment: man how I wish Yougov got it right!).

One of the points that EC explicitly models (although I'm not sure exactly how $-$ the details of their model are not immediately evident, I think) is the poll bias against the Tories. They counter this by (I think) arbitrarily redistributing 1.1% of the vote shares from Labour to the Tories. This probably explains why their model is a bit more favourable to the Conservatives, while being driven by the data in the polls, which seem to suggest Labour are catching up.

I think Yougov model is very extensive and possibly does get it right $-$ after all, speaking only for my own model, Brexit is one of the factors and possibly can act as proxy for many others (age, education, etc). But surely there'll be more than that to make people's mind? Only few more days before we find out...

Friday, 2 June 2017

The code (and other stuff...)

I've received a couple of emails or comments on one of the General Election posts to ask me to share the code I've used. 

In general, I think this is a bit dirty and lots could be done in a more efficient way $-$ effectively, I'm doing this out of my own curiosity and while I think the model is sensible, it's probably not "publication-standard" (in terms of annotation etc).

Anyway, I've created a (rather plain) GitHub repository, which contains the basic files (including R script, R functions, basic data and JAGS model). Given time (which I'm not given...), I'd like to put a lot more description and perhaps also write a Stan version of the model code. I could also write a more precise model description $-$ I'll try to update the material on the GitHub.

On another note, the previous posts have been syndicated in a couple of places (here and here), which was nice. And finally, here's a little update with the latest data. As of today, the model predicts the following seats distribution.

                mean        sd 2.5% median 97.5%
Conservative 352.124 3.8760350  345    352   359
Labour       216.615 3.8041091  211    217   224
UKIP           0.000 0.0000000    0      0     0
Lib Dem       12.084 1.8752228    8     12    16
SNP           49.844 1.8240041   45     51    52
Green          0.000 0.0000000    0      0     0
PCY            1.333 0.9513233    0      2     3
Other          0.000 0.0000000    0      0     0

Labour are still slowly but surely gaining some ground $-$ I'm not sure the effect of the debate earlier this week (which was deserted by the PM) are visible yet as only a couple of the polls included were conducted after that.

Another interesting thing (following up on this post) is the analysis of the marginal seats that the model predicts to swing from the 2015 Winners. I've updated the plot, which now looks as below.

Now there are 30 constituencies that are predicted to change hand, many still towards the Tories. I am not a political scientists, so I don't really know all the ins and outs of these, but I think a couple of examples are quite interesting and I would venture some comment...

So, the model doesn't know about the recent by-elections of Copeland and Stoke-on-Trent South and so still label these seats as "Labour" (as they were in 2015), although the Tories have actually now control of Copeland.

In the prediction given the polls and the impact of the EU referendum (both were strong Leave areas with with 60% and 70% of the preference, respectively) and the Tories did well in 2015 (36% vs Labour's 42% in Copeland and 33% to Labour's 39% in 2015). So, the model is suggesting that both are likely to switch to the Tories this time around.

In fact, we know that at the time of the by-election, while Copeland (where the contest was mostly Labour v Tories) did go blue, Stoke didn't. But there, the main battle was between the Labour's and the UKIP's candidate (UKIP had got 21% in 2015). And the by-election was fought last February, when the Tories lead was much more robust that it probably is now.

Another interesting area is Twickenham $-$ historically a constituency leaning to the Lib Dems, which was captured by the Conservatives in 2015. But since then, in another by-election the Tories have lost another similar area (Richmond Park,with a massive swing) and the model is suggesting that Twickenham could follow suit, come next Thursday. 

Finally, Clapton was the only seat won by UKIP in 2015, but since then, the elected MP (a former Tory-turned-UKIP) has defected the party and is not contesting the seat. This, combined with the poor standing of UKIP in the polls produces the not surprisingly outcome that Clapton is predicted to go blue with basically no uncertainty...

These results look reasonable to me $-$ not sure how life will turn out of course. As many commentators have noted much may depend on the turn out among the younger. Or other factors. And probably there'll be another instance of the "Shy-Tory effect" (I'll think about this if I get some time before the final prediction). But the model does seem to make some sense...

Tuesday, 30 May 2017

The swingers

Kaleb has left a comment on a previous post, asking what constituencies my model predicted to change hands, with respect to the 2015 election. This is not too difficult to do, given the wealth of results and quantities that can be computed, once the posterior distributions are estimated.

Basically, what I have done is to compute, based on the "possible futures" simulated by the model, the probability that the parties win each of the 632 seats in England, Wales and Scotland. Many of them seem to be very safe seats $-$ I think this is consistent with current political knowledge, although in an election like this possibly more can change...

Anyway, using the very latest analysis (as of today, 30th May and based on all polls published so far, but discounting older ones), there are 39 seats that are predicted to change hands. The following graph shows the predicted distribution of the probability of winning each of those seats, together of an indication of who won in 2015.

Of course, Labour are the big losers (there are many of the 39 constituencies that were Labour in 2015, but are predicted to swing to some other party in 9 days time). Conversely, the Tories are the big winners and most often, when they do, they are predicted to win that seat with a very large probability. There aren't very many real 50:50s $-$ a couple, I'd say, where the results are predicted to be rather uncertain. 

Incidentally, as of today, this is the distribution of seats predicted by the model.

                mean        sd 2.5% median 97.5%
Conservative 359.467 5.4492757  351    358   371
Labour       209.276 5.3613961  198    211   218
UKIP           0.000 0.0000000    0      0     0
Lib Dem       14.699 2.1621920   10     15    19
SNP           48.055 2.7271620   42     48    52
Green          0.000 0.0000000    0      0     0
PCY            0.503 0.8286602    0      0     3
Other          0.000 0.0000000    0      0     0

Labour are continuing to close the gap on the Tories, but are still a long way out. I'm curious to see what last night not-a-debate did to the polls...

Friday, 26 May 2017

(Too) slowly but surely?

After the tragic events in Manchester and the suspension in the campaigns, things have started again and a couple new polls have been released. Some of the media have also picked up the trend I was observing from my model and so I have re-updated the results.

The increasing trend for Labour does see another little surge, as does the decreasing trend for the Tories. In comparison to my last update, the Lib Dem are slightly picking up again. But all in all, the numbers still tell kind of the same story, I guess.

                mean        sd 2.5% median   97.5%
Conservative 369.251 5.1765622  357    370 378.000
Labour       197.886 5.2142298  190    197 211.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       15.085 2.3852598   11     15  19.025
SNP           49.263 2.3965756   44     49  53.000
Green          0.000 0.0000000    0      0   0.000
PCY            0.515 0.8499985    0      0   3.000
Other          0.000 0.0000000    0      0   0.000

These are the summary results as of today (again after discounting past polls). Lib Dem move from a median number of expected seats of 14 to the current estimate of 15; Labour go from 191 to 197 and the Tories go from 376 to 370, still comfortably in the lead. 

Monday, 22 May 2017

Quick update

This is going to be a very short post. I've been again following the latest polls and have updated my election forecast model $-$ nothing has changed in the general structure, only new data coming as the campaigns evolve.
The dynamic forecast (which considers for each day from 1 to 22 May only the polls available up to that point) show an interesting progression for Labour, who seem to be picking up some more seats. They are still a long way from the Tories, who are slightly declining. Also, the Lib Dems are also going down and the latest results seem to suggest a poor result for Plaid Cymru in Wales too (the model was forecasting up to 4 seats before, where now they are expected to get 0).

The detailed summary as of today is as follows.
                mean         sd    2.5% median 97.5%
Conservative 375.109 4.02010949 367.000    376   382
Labour       192.134 3.94862452 186.000    191   200
UKIP           0.000 0.00000000   0.000      0     0
Lib Dem       14.320 2.24781064  10.000     14    18
SNP           50.053 2.12713792  45.975     50    53
Green          0.007 0.08341438   0.000      0     0
PCY            0.377 0.77036645   0.000      0     3
Other          0.000 0.00000000   0.000      0     0

I think the trend seems genuine $-$ Labour go from a median number of predicted seats of 175 at 1st May to the current estimate of 191, the Tories go from 381 to 376 and the Lib Dems from 23 to 14. Probably not enough time to change things substantially (bar some spectacular faux pas from the Tories, I think), though...

I've also played around with the issue of coalitions $-$ there's still some speculation in the media that the "Progressives" (Labour, Lib Dems and Greens) could try and help each other by not fielding a candidate and support one of the other parties in selected constituencies, so as to maximise the chance of ousting the Conservatives. I've simply used the model prediction and (most likely unrealistically!) assumed 100% compliance from the voters, so that the coalition would get the sum of the votes originally predicted for each of the constituent parties. Here's the result.

The Progressive come much closer and the probability of an outright Tory majority is now much smaller, but still...

Monday, 15 May 2017

Through time & space

I've continued to fill in the data from the polls and re-run the model for the next UK general election. I think the dynamic element is interesting in principle, mainly because of how the data from the most recent polls could be weighed differently than those further in the past.

Roberto had done an amazing job, building on Linzer's work and using a rather complex model to account for the fact that the polls are temporally correlated and, as you get closer to election day, the historical data are much less informative. This time, I have done something much simpler and somewhat more arbitrary, simply based on discounting the polls depending on how distant they are from "today".

This is the results given by my model in the period from May 1st to May 12th $-$ at every day, I've only included the polls available at that time and discounted using a 10% rate, assuming modern life really runs very fast (which it reasonably does...). Not much is really changing and the predictions in terms of the number of seats won by the parties in England, Wales and Scotland seems fairly stable $-$ Labour is probably gaining a couple of seats, but the story is basically unchanged.

The other interesting thing (which I had done here and here too) is to analyse the predicted geographical distribution of the votes/seats. Now, however, I'm taking full advantage of the probabilistic nature of the model and not only am I plotting on the map the "most likely outcome" (assigning a colour to each constituency, depending on who's predicted to win it). In the graph below, I've also computed the probability that the party most likely to win a given seat actually does so (based on the simulations from the posterior distributions of the vote shares, as explained here) $-$ I've shaded the colours so that lighter constituencies are more uncertain (i.e. the win may be more marginal).

There aren't very many marginal seats (according to the model) and most of the times, the chance of a party winning a constituency exceeds 0.6 (which is fairly high, as it would mean a swing of over 10% from the prediction to overturn this).

This is also the split across different regions $-$ again, not many open battlefields, I think. In London, Hornsey and Wood Green is predicted to go Labour but with a probability of only 54%, while Tooting is predicted to go Tory (with a chance of 58%).

Friday, 5 May 2017

Flash forward sampling

Slowly but surely, I've managed to think a bit more about the elections model. Here, I've described how I included some prior information in my model to try and "discount" the evidence provided by the polls, to obtain estimates that may be more reasonable and less affected by the short-term shocks that may (over)influence people's opinions.

However, I wasn't entirely happy with the strategy I had used $-$ the informative priors I had set on the parameters $\alpha_p$ and $\beta_p$ did induce rather precise distributions. In addition, the analysis I have made wasn't making the most of the actual inferential machine I had constructed, because it was estimating the number of seats for the average vote shares profile. But in fact, I can do better than that and actually propagate fully the uncertainty in the vote shares and have an entire posterior distribution of the seats configuration.

So, first off, I think I've refined my priors and I did so by running the model simply through "forward sampling" $-$ in other words, by not including any of the polls in my analysis to better understand what implications were deriving by my choice of priors. By selecting the means and standard deviations for the vectors $\alpha$ and $\beta$, I effectively imply the following prior expectation in terms of the vote share.
The red dots represent the "historical" averages over the past 3 general elections, which I used as a reference point. You could fiddle a bit more with the parameters of the distributions for $\alpha_p$ and $\beta_p$, but I am reasonably happy with the implications of the current choice $-$ I'm expecting the Conservatives to do much better than the historical figure; Labour is expected to be around how they normally do, but there is a chance they'll do worse than "usual" and on average they're also doing worse than in the 2015 election. The Lib Dems are predicted with relatively large uncertainty and still under their historical average $-$ I think this is reasonable and many pundits are also aligned with this. Similarly, the prior effectively gives a very low weight to UKIP $-$ and this is in line with general consensus (I think) as well as the result of last night local elections.

Interestingly,  I can map these results and propagate the uncertainty to estimate the distribution of seats in Parliament (still with no data from the polls included), to produce the following graph.
Again, I think this picture is even more convincing than the analysis of the probabilities and I feel relatively confident with this. (But of course, one could replicate the whole analysis and try different specifications, which I have to some degree).

So it's now time to include the data that are pouring in from the polls. In particular, I now have information collected over the past two weeks or so and I think in a fast-moving election such as this where opinions may be changed by a large number of "facts" and stories, it's useful to "discount" the older data. There are many ways of doing this, more or less formally $-$ I'm using a rather quick and dirty strategy, by applying a simple discount rate defined as a function of time since today. 

Each observed poll gets rescaled as $$y^{j*}_{ip}= \frac{y^j_{ip}}{(1+\delta)^t}, $$
where $ y^{j*}_{ip}$ is the number of voting intentions for party $p$ in poll $i$ under voters of type $j$ (=1 for Leavers and =2 for Remainers); and $\delta$ is an arbitrarily defined discount rate. I've tested a few versions (ranging from 0.03 to 0.1) and the results do not vary dramatically $-$ the larger the discount rate, the more older polls are discounted, which tends to reduce by a minimum of 1 and a maximum of 4 the number of seats associated with the Conservatives. This is because in the very first few polls, the advantage associated with the Tories was bigger than in the most recent).

With a discount rate $\delta=0.1$, the results estimated in terms of seats won are as in the following graph.
So, Conservatives with a median estimated number of seats of 379 (and a 95% interval estimate of 369-391, way above the line of 325 seats that are needed for a majority), Labour with 175 (163-185), Lib Dems with 25 (17-31), SNP with 49 (46-54), Green with 1 and Plaid Cymru with 3 (0-4).

I think this analysis is interesting because it is fairly easy to assess the uncertainty propagated through the model up to the actual quantity of interest (the seats won). Other pundits are being a lot less favourable to the Lib Dems, but I'm kind of happy of how my model works, especially after considering the prior analysis.

Plenty more fun to come $-$ well, depending on your definition of fun...

Friday, 28 April 2017

Face value

I found a little more time to think about the election model and fiddle with the set up, as well as use some more recent polls $-$ I have now managed to get 9 polls detailing voting intention for the 7 main parties competing in England, Scotland and Wales.

I think one thing I was not really happy with the basic set up I've used so far is that it kind of takes the polls at "face value", because the information included in the priors is fairly weak. And we've seen in recent times on several occasions that polls are often not what they seem...

So, I've done some more analysis to: 1) test the actual impact of the prior on the basic setting I was using; and 2) think of something that could be more appropriate, by including more substantive knowledge/data in my model.

First off, I was indeed using some information to define the prior distribution for the log "relative risk" of voting for party $p$ in comparison to the Conservatives, among Leavers ($\alpha_p$) and Remainers ($\alpha_p + \beta_p$), but I think that kind of information was really weak. It is helpful to run the model by simply "forward sampling" (i.e. pretending that I had no data) to check what the priors actually imply. As expected, in this case, the prior vote share for each party was close to basically $(1/P)\approx 0.12$. This is consistent with a "vague" structure, but arguably not very realistic $-$ I think nobody is expecting all the main parties to get the same share of the vote before observing any of the polls...

So, I went back to the historical data on the past 3 General Elections (2005, 2010 and 2015) and used these to define some "prior" expectation for the parameters determining the log relative risks (and thus the vote shares).

There are obviously many ways in which one can do this $-$ the way I did it is to first of all weigh the observed vote shares in England, Scotland and Wales to account for the fact that data from 2005 are likely to be less relevant than data from 2015. I have arbitrarily used a ratio of 3:2:1, so that the latest election weighs 3 times as much as the earliest. Of course, if this was "serious" work, I'd want to check sensitivity to this choice (although see below...).

This gives me the following result:

Conservative     0.366639472
Green            0.024220681
Labour           0.300419740
Liberal Democrat 0.156564215
Plaid Cymru      0.006032815
SNP              0.032555551
UKIP             0.078807863
Other            0.034759663

Looking at this, I'm still not entirely satisfied, though, because I think UKIP and possibly the Lib Dem may actually have different dynamics at the next election, than estimated by the historical data. In particular, it seems that UKIP has clear problems in re-inventing themselves, after the Conservatives have by and large so efficiently taken up the role of Brexit paladins. So, I have decided to re-distribute some of the weight for UKIP to the Conservatives and Labour, who were arguably the most affected by the surge in popularity for the Farage army. 

In an extra twist, I also moved some of the UKIP historical share to the SNP, to safeguard against the fact that they have a much higher weight when it counts for them (ie Scotland) than the national average suggests. (I could have done this more correctly by modelling the vote in Scotland separately).

These historical shares can be turned into relative risks by simply re-proportioning them by the Conservative share, thus giving me some "average" relative risk for each party (against the reference $=$ Conservatives). I called these values $\mu_p$ and have used them to derive some rather informative priors for my $\alpha_p$ and $\beta_p$ parameters. 

In particular, I have imposed that the mixture of relative risks among leavers and remainers would be centered around the historical (revisited) values, which means I'm implying that $$\hat{\phi}_p = 0.52 \phi^L_p + 0.48 \phi^R_p = 0.52 \exp(\alpha_p) + 0.48\exp(\alpha_p)\exp(\beta_p) \sim \mbox{Normal}(\mu_p,\sigma).$$ If I fix the variance around the overall mean $(\sigma^2)$ to some value (I have chosen 0.05, but have done some sensitivity analysis around it), it is possible to do some trial-and-error to figure out what the configuration of $(\alpha_p,\beta_p)$ should be so that on average the prior is centered around the historical estimate.

I can then re-run my model and see what the differences are by assuming the "minimally informative" and the "informative" versions. 
Here, the black dots and lines indicate the mean and 95% interval of the minimally informative prior, while the blue dots and lines are the posterior estimated vote shares (ie after including the 9 polls) for that model. The red and magenta dots/lines are the prior and posterior results for the informative model (based on the historical/subjective data).

Interestingly, the 9 polls seem to have quite substantial strength, because they are able to move most of the posteriors (eg the Conservatives, Labour, SNP, Green, Plaid Cymru and Other). The differences between the two versions of the model are not huge, necessarily, but they are important in some cases.

The actual results in terms of seats won are as in the following.

Party        Seats (MIP)  Seat (IP)
Conservative        371        359
Green                 1          1
Labour              167        178
Lib Dem              30         40
Plaid Cymru          10          3
SNP                  53         51

Substantively, the data + model assumptions seem to suggest a clear Conservative victory in both versions. But the model based on informative/substantive prior seems to me a much more reasonable prediction $-$ strikingly, the minimally informative version predicts a ridiculously large number of seats for Plaid Cymru.

The analysis of the swing of votes is shown in the following (for the informative model).
  2015/2017         Conservative Green Labour Lib Dem PCY SNP
  Conservative               312     0      0      17   0   1
  Green                        0     1      0       0   0   0
  Labour                      45     0    178       8   0   1
  Liberal Democrat             0     0      0       9   0   0
  Plaid Cymru                  0     0      0       0   3   0
  SNP                          1     0      0       6   0  49
  UKIP                         1     0      0       0   0   0

Labour are predicted to now win any new seats and their losses are mostly to the Conservatives and the Lib Dems. This is how the seats are predicted across the three nations. 

As soon as I have a moment, I'll share a more intelligible version of my code and will update the results as new polls become available.

Tuesday, 25 April 2017


In the grand tradition of all recent election times, I've decided to have a go and try and build a model that could predict the results of the upcoming snap general election in the UK. I'm sure there will be many more people having a go at this, from various perspectives and using different modelling approaches. Also, I will try very hard to not spend all of my time on this and so I have set out to develop a fairly simple (although, hopefully reasonable) model.

First off: the data. I think that since the announcement of the election, the pollsters have intensified the number of surveys; I have found already 5 national polls (two by Yougov, two by ICM and one by Opinium $-$ there may be more and I'm not claiming a systematic review/meta-analysis of the polls.

Arguably, this election will be mostly about Brexit: there surely will be other factors, but because this comes almost exactly a year after the referendum, it is a fair bet to suggest that how people felt and still feel about its outcome will also massively influence the election. Luckily, all the polls I have found do report data in terms of voting intention, broken up by Remain/Leave. So, I'm considering $P=8$ main political parties: Conservatives, Labour, UKIP, Liberal Democrats, SNP, Green, Plaid Cymru and "Others". Also, for simplicity, I'm considering only England, Scotland and Wales $-$ this shouldn't be a big problem, though, as in Northern Ireland elections are generally a "local affair", with the mainstream parties not playing a significant role.

I also have available data on the results of both the 2015 election (by constituency and again, I'm only considering the $C=632$ constituencies in England, Scotland and Wales $-$ this leaves out the 18 Northern Irish constituencies) and the 2016 EU referendum. I had to do some work to align these two datasets, as the referendum did not consider the usual geographical resolution. I have mapped the voting areas used 2016 to the constituencies and have recorded the proportion of votes won by the $P$ parties in 2015, as well as the proportion of Remain vote in 2016.

For each observed poll $i=1,\ldots,N_{polls}$, I modelled the observed data among "$L$eavers" as $$y^{L}_{i1},\ldots,y^{L}_{iP} \sim \mbox{Multinomial}\left(\left(\pi^{L}_{1},\ldots,\pi^{L}_{P}\right),n^L_i\right).$$ Similarly, the data observed for " $R$emainers" are modelled as $$y^R_{i1},\ldots,y^R_{iP} \sim \mbox{Multinomial}\left(\left(\pi^R_{1},\ldots,\pi^R_P\right),n^R_i\right).$$
In other words, I'm assuming that within the two groups of voters, there is a vector of underlying probabilities associated with each party  ($\pi^L_p$ and $\pi^R_p$) that are pooled across the polls. $n^L_i$ and $n^R_i$ are the sample sizes of each poll for $L$ and $R$.

I used a fairly standard formulation and modelled $$\pi^L_p=\frac{\phi^L_p}{\sum_{p=1}^P \phi^L_p} \qquad \mbox{and} \qquad \pi^R_p=\frac{\phi^R_p}{\sum_{p=1}^P \phi^R_p} $$ and then $$\log \phi^j_p = \alpha_p + \beta_p j$$ with $j=0,1$ to indicate $L$ and $R$, respectively. Again, using fairly standard modelling, I fix $\alpha_1=\beta_1=0$ to ensure identifiability and then model $\alpha_2,\ldots,\alpha_P \sim \mbox{Normal}(0,\sigma_\alpha)$ and $\beta_2,\ldots,\beta_P \sim \mbox{Normal}(0,\sigma_\beta)$. 

This essentially fixes the "Tory effect" to 0 (if only I could really do that!...) and then models the effect of the other parties with respect to the baseline. Negative values for $\alpha_p$ indicate that party $p\neq 1$ is less likely to grab votes among leavers than the Tories; similarly positive values for $\beta_p$ mean that party $p \neq 1$ is more popular than the Tories among remainers. In particular, I have used some informative priors by defining the standard deviations $\sigma_\alpha=\sigma_\beta=\log(1.5)$, to mean that it is unlikely to observe massive deviations (remember that $\alpha_p$ and $\beta_p$ are defined on the log scale). 

I then use the estimated party- and EU result-specific probabilities to compute a "relative risk" with respect to the observed overall vote at the 2015 election $$\rho^j_p = \frac{\pi^j_p}{\pi^{15}_p},$$ which essentially estimates how much better (or worse) the parties are doing in comparison to the last election, among leavers and remainers. The reason I want these relative risks is because I can then distribute the information from the current polls and the EU referendum to each constituency $c=1,\ldots,C$ by estimating the predicted share of votes at the next election as the mixture $$\pi^{17}_{cp} = (1-\gamma_c)\pi^{15}_p\rho^L_p + \gamma_c \pi^{15}_p\rho^R_p,$$ where $\gamma_c$ is the observed proportion of remain voters in constituency $c$.

Finally, I can simulate the next election by ensuring that in each constituency the $\pi^{17}_{cp} $ sum to 1. I do this by drawing the vote shares as $\hat{\pi}^{17}_{cp} \sim \mbox{Dirichlet}(\pi^{17}_1,\ldots,\pi^{17}_P)$.

In the end, for each constituency I have a distribution of election results, which I can use to determine the average outcome, as well as various measures of uncertainty. So in a nutshell, this model is all about i) re-proportioning the 2015 and 2017 votes based on the polls; and ii) propagating uncertainty in the various inputs.

I'll update this model as more polls become available $-$ one extra issue then will be about discounting older polls (something like what Roberto did here and here, but I think I'll keep things easy for this). For now, I've run my model for the 5 polls I mentioned earlier and this is the (rather depressing) result.
From the current data and the modelling assumption, this looks like the Tories are indeed on course for a landslide victory $-$ my results are also kind of in line with other predictions (eg here). The model here may be flattering to the Lib Dems $-$ the polls seem to indicate almost unanimously that they will be doing very well in areas of a strong Remain persuasion, which means that the model predicts they will gain many seats, particularly where the 2015 election was won with a little margin (and often they leapfrog Labour to the first place).

The following table shows the predicted "swings" $-$ who's stealing votes from whom:

                      Conservative Green Labour Lib Dem PCY SNP
  Conservative                 325     0      0       5   0   0
  Green                          0     1      0       0   0   0
  Labour                        64     0    160       6   1   1
  Liberal Democrat               0     0      0       9   0   0
  Plaid Cymru                    0     0      0       0   3   0
  Scottish National Party        1     0      0       5   0  50
  UKIP                           1     0      0       0   0   0

Again, at the moment, bad day at the office for Labour who fails to win a single new seat, while losing over 60 to the Tories, 6 to the Lib Dems, 1 to Plaid Cymru in Wales and 1 to the SNP (which would mean Labour completely erased from Scotland). UKIP is also predicted to lose their only seat $-$ but again, this seems a likely outcome.

Thursday, 20 April 2017


If you fancy becoming like the crazy, purple minion, we have a Research Associated position at the UCL Institute for Global Health (with whom I've been heavily involved in the past year or so, while organising our new MSc Health Economics & Decision Science). 

All the relevant details and the link to the application form are available here. The deadline is 13 May.

Tuesday, 18 April 2017

Hope & Faith

In a remarkable and unpredictable (may be?) turn of events, the UK Prime Minister has today sort-of-called a general election for this coming June $-$ sort-of, if you don't follow UK politics, because technically a law prevents the PM to call snap elections, unless 66% of Parliament agrees to this and so there will need to be a discussion and then it will be Parliament to actually call the election...

Anyway, current polls give the ruling Conservative party way ahead with 43%. The Labour Party (who are supposed to be the main opposition, but have been in a state of chaos for quite a while now) have 25% $-$ this is compared to the results at the 2015 general election where the Tories got 37% and Labour 30%.

As if the situation weren't bleak enough for people of the left-ish persuasion (with Brexit and all), this doesn't seem to be very good news and many commentators (and perhaps even the PM herself) are predicting a very good result for the Tory, may be even a landslide.

But because of the electoral system, may be the last word has not been said: the fact is that the UK Parliament is elected on a first-pass-the-post basis and so Labour may not actually lose too many seats (as some commentators have suggested).

I went back to the official general election data and looked at the proportion of seats won by the main parties, by the size of the majority $-$ the output is in the graph below.

The story is that while there are some very marginal seats (where Labour won a tiny majority just two years ago), a 5% decrease in the vote may not be as bad as it looks $-$ although the disaffection with Labour is not necessarily uniformly distributed across the country.

More interestingly, I've also linked the data from the 2015 General Election with last year EU referendum $-$ one of the main arguments following the Brexit outcome was that the Remain camp were not able to win in Labour strongholds, particularly in the North-East of England. 

The 5 constituencies in which Labour holds a majority of less than 1% are distributed as follows, in terms of the proportion of the Remain vote:

  • Brentford and Isleworth: 0.5099906
  • City of Chester: 0.4929687
  • Ealing Central and Acton: 0.6031031
  • Wirral West: 0.5166292
  • Ynys Dulas: 0.4902312
(I know I have way too many significant figures here, but I thought it'd be interesting to actually see these values). So, apart from Ealing & Acton (strong Remain), there may be a good chance that the other four constituencies be made by people who are fed up with Labour and could be voting for some other party.

When you actually consider all the constituencies with a Labour majority of less than 10%, then the situation is like in the following graph.

Indeed, many of these are strong Leavers, which may actually be a problem for Labour. A few, on the other hand, may not be affected so much (because the "Remain" effect may counterbalance the apathy for Labour) $-$ although it may well go the other way and parties on a clear Pro-EU platform (eg the Lib Dems) may gain massively.

At the other hand of the spectrum, the corresponding graph for Conservative-hold areas with small majorities is like in the following graph.
For the Tories, the problem may be in Remain areas where they have a small majority $-$ there aren't a massive number of them, but I guess about 40% of these may be fought very hard (because they are relatively close or above 50% in terms of the proportion of Remain)?

Anyway $-$ I'm not sure whether Hope & Faith should be all smiley if you're a Pro-EU migrant. But then again, there is still some hope & faith...

Workshop on The Regression Discontinuity Design

As part of our bid to get an MRC grant (which we managed to do), we promised that, if successful, we'd also have a dissemination workshop, at the end of the project. Well, the project on the Regression Discontinuity Design (RDD) has now finished for a few months, but we're keeping our word and we have actually organised something that, as it happens, has probably turned into something slightly bigger (and better!) than intended...

As I was talking to Marcos (who's a co-director of our MSc programme, which I've mentioned for example here), we realised that the RDD is in fact a common interest of ours and so I jumped on his offer to do something together.

The plan is to have a full day on the 27th June at the Institute of Fiscal Studies in London, with the idea of mixing economists, statisticians and epidemiologists. We have a nice line up of speakers $-$ the original idea was more to show off the outputs of the project, but I think this works much better!

The registration is now open and free $-$ all the relevant information is here!