Gianluca Baio's blog: December 2012

Monday 31 December 2012

XY

For a while, I thought that the best personal news of the year was the book; until this happened, that is...

Wednesday 19 December 2012

You'd think that the last week before the holidays would be very quiet and not much would be going on. Well, if you did, you'd be wrong, I guess, as the last few days have been quite busy (for many reasons).

Anyway, I managed to track down some better data on a set of recent polls for the upcoming Italian elections. Most of the public sites I found had a basic table with the proportions representing the voting intentions for a set of units sampled in a given poll. This means that the actual sample size (or any other information about the poll) is in general not present, making it impossible to appreciate any form of uncertainty associated with the estimates.
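To see why the missing sample size matters, here is a minimal sketch using the usual Normal approximation for a single proportion. The 30% share and the three sample sizes are made-up numbers for illustration, not values taken from any of the polls.

```python
import math

def proportion_interval(p, n, z=1.96):
    """Approximate 95% interval for a proportion p estimated from n respondents."""
    se = math.sqrt(p * (1 - p) / n)  # standard error under simple random sampling
    return (p - z * se, p + z * se)

# Hypothetical example: the same 30% share reported by polls of different sizes
for n in (500, 1000, 5000):
    low, high = proportion_interval(0.30, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

The reported proportion is identical in the three cases, but the width of the interval depends entirely on $n$, which is exactly the piece of information that most of the public tables leave out.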

I think I may be able to get good quality information from YouTrend.it, who collect this kind of information too (although it is not directly present in the data that are publicly available). This would be good; and we may be able to update the estimations in some clever way, as more polls are conducted closer to the actual date of the elections.

But in the meantime I've compiled a list of the polls conducted after October and, for nearly all of them, I was able to link the sample size. This was by no means systematic, as I was trying to get what was available and usable; on the plus side, however, I think that I actually got most of the reasonably sized ones. The resulting dataset consists of $N=45$ polls, collecting voting intentions for $J=11$ parties, which will all take part in the "event".

I've hastily run a relatively simple multinomial regression model. For each poll, I modelled the observed vector of voting intentions $\mathbf{y}_i = (y_{i1},\ldots,y_{iJ})$ as
$$ \mathbf{y}_i \sim \mbox{Multinomial}(\boldsymbol{\pi}_i,n_i) $$
where $\boldsymbol\pi_i=(\pi_{i1},\ldots,\pi_{iJ})$ is the vector of poll- and party-specific probabilities; each of them is defined as
$$ \pi_{ij} = \frac{\phi_{ij}}{\sum_{k=1}^J \phi_{ik}}. $$
The elements $\phi_{ij}$ represent some sort of "un-normalised" probabilities (voting intentions) and for now I have modelled them using a convenient log-linear predictor
$$ \log(\phi_{ij}) = \alpha_j + \beta_{ij}. $$
The parameters $\alpha_j$ are the "pooled estimate" of the (un-normalised) voting intentions for party $j$ (where pooling is across the $N$ polls), while the parameters $\beta_{ij}$ are poll-party interactions. For now, I gave them vague, independent Normal priors and I re-scaled the $\alpha_j$'s to obtain an estimate of the pooled probabilities of votes for each party
$$ \theta_j = \frac{\exp(\alpha_j)}{\sum_{k=1}^J \exp(\alpha_k)}. $$
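As a rough illustration of how the pieces fit together, the sketch below computes the poll-specific probabilities and the pooled probabilities from the log-linear predictor, and simulates one poll. It is not the code used for the analysis: the function names, the values of `alpha` and `beta_i`, the choice of 4 parties and the sample size of 1000 are all assumptions made up for the example.

```python
import math
import random

def poll_probabilities(alpha, beta_i):
    """pi_ij = phi_ij / sum_k phi_ik, with log(phi_ij) = alpha_j + beta_ij."""
    log_phi = [a + b for a, b in zip(alpha, beta_i)]
    m = max(log_phi)  # subtracting the max leaves the ratios unchanged but avoids overflow
    phi = [math.exp(v - m) for v in log_phi]
    total = sum(phi)
    return [v / total for v in phi]

def pooled_probabilities(alpha):
    """theta_j = exp(alpha_j) / sum_k exp(alpha_k)."""
    return poll_probabilities(alpha, [0.0] * len(alpha))

def simulate_poll(pi, n, rng):
    """Draw one Multinomial(pi, n) vector of counts."""
    counts = [0] * len(pi)
    for j in rng.choices(range(len(pi)), weights=pi, k=n):
        counts[j] += 1
    return counts

rng = random.Random(1)
alpha = [1.0, 0.5, 0.4, -1.0]                 # made-up values for 4 hypothetical parties
beta_i = [rng.gauss(0, 0.1) for _ in alpha]   # made-up poll-party effects for one poll
pi_i = poll_probabilities(alpha, beta_i)
y_i = simulate_poll(pi_i, 1000, rng)          # simulated counts for a poll of 1000 people
theta = pooled_probabilities(alpha)
print(pi_i, y_i, theta)
```

In the actual model the $\alpha_j$ and $\beta_{ij}$ are unknown and get a posterior distribution; here they are simply fixed so that the normalisation step can be seen in isolation.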
Although relatively quick and dirty, the model gives some interesting results. Using the coefplot2 package in R, one can summarise them in a graph like the following.

The dots are the means of the posterior distributions, while the light and dark lines represent, respectively, the 50% and 95% interval estimates. Much as I thought before, the interesting thing is that the Democratic Party (PD) seems to have a firm lead, while the two parties fiercely competing for the worst name ever to be bestowed upon a political party, Berlusconi's PDL (which translates as "Freedom People") and M5S ("5 Stars Movement"), are also competing fiercely for the same chunk of votes. Also, there's a myriad of small parties, fighting for 2%-5%.

Of course, the model is far from ideal $-$ for example, I think one could model the parameters jointly as $\boldsymbol\alpha=(\alpha_1,\ldots,\alpha_J) \sim \mbox{Normal}(\boldsymbol{\mu},\boldsymbol{\Sigma})$, a multivariate Normal distribution, to account for potential correlation across the parties. This would make it possible, for example, to encode the fact that those showing intention to vote for PD are very unlikely to then switch to PDL.
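One way to write such a prior down, purely as a sketch and not something that has been fitted here, is to split the covariance matrix into party-specific standard deviations $\sigma_j$ and pairwise correlations $\rho_{jk}$ (both symbols are introduced only for this illustration):
$$ \boldsymbol\alpha = (\alpha_1,\ldots,\alpha_J) \sim \mbox{Normal}(\boldsymbol\mu,\boldsymbol\Sigma), \qquad \Sigma_{jk} = \rho_{jk}\,\sigma_j\,\sigma_k, \qquad \rho_{jj}=1. $$
Setting all $\rho_{jk}=0$ for $j\neq k$ gives back the independent priors used above, while a value away from zero for a given pair of parties (say PD and PDL) is where prior knowledge about how their voting intentions move together would enter.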

Thursday 13 December 2012

Nataniele Argento

Effectively, the Italian election campaign is already in full swing, so I tried to start collecting some data to see what we are about to face. If I have time and manage to get some good data, I'll try to replicate the analysis I made for the US election (surely Italy needs its own version of Nate Silver too?).

However, there are quite a few differences in this case. First, the availability of data is not necessarily so good. I could find some downloadable lists of recent polls. But for almost all, the only information provided was about the estimated proportion of votes (with no summary of sample size or uncertainty). [Of course, I'm not saying that such data don't exist in Italy $-$ just that it was far easier to find for the US].

Second, Italy is very much a multi-party system; so I guess it is a bit more complicated to produce predictions and modelling, because you can't just use simple-ish Binomial assumptions (e.g. you either go Republican or Democratic). This makes for more complex interactions as well, since probably there is a level of correlation in the voting intentions towards parties that may (but also may not) eventually ally in coalition to form a government.

Anyway, I quickly played around with some data I found on recent polls (in fact this goes back to 2007). The graph below shows the voting intention (as percentages) for the two main parties (PDL in blue is Berlusconi's party, while PD in red is the "centre-left" Democratic Party). Also, I plotted the relatively new "Movimento 5 Stelle" (M5S, or "5 Stars Movement" $-$ I've already complained about the names of Italian political parties; I really don't know where they get them!).

The vertical lines indicate:

The time when news that Berlusconi had attended the birthday party of 18-year-old Noemi Letizia got out. This effectively started the scandal... what's the word I'm looking for? "linking"? Berlusconi to young girls;
The time when Berlusconi was sent to trial over an alleged sexual relationship with the teenager El Mahroug, aka "Ruby Rubacuori";
The time when Mario Monti took over as PM, following Berlusconi's resignation amid concerns about the possibility of Italy defaulting on its debt.

It seems to me that three things stand out from the graph:

PD are sort of stable, around 30%. The current increase in the voting intentions may be due to high exposure in the media related to the primary elections that they held to determine the candidate to lead the party into the next general election;
PDL are plummeting in the polls, and at least the first of the three time points I mentioned above seems to clearly mark the beginning of this decrease;
M5S seems to have captured many of the votes lost by PDL (of course this is only a partial picture, as there are 2 million other small parties in the fray; nonetheless it looks like a clear trend).

Of course, there is so much more to this $-$ simply because no party is likely to win enough votes to govern on its own. Thus other players, like La Lega, usually Berlusconi's allies, or the relatively recently formed SEL (Sinistra, Ecologia e Libertà $-$ Left, Ecology and Freedom), who are likely to support the Democrats, will certainly play a very important role.

I'm tempted to say that if PD keep their focus and choose reasonable partners in coalition, they should be able to hold on to their core voters and gain enough to (comfortably?) win the elections. But (besides the fact that the campaign has only just started and it's not even clear whether Monti will make himself available $-$ and if so, who with), this being Italy, crazy stuff can always happen...

Sunday 9 December 2012

Thriller (or the return of the living dead?)

As it turns out, another Italian government is about to end, as Professor-turned-Prime Minister Mario Monti has resigned, following the "categorical judgement of no confidence" by former PM Silvio Berlusconi's party.

Arguably, the situation is no piece of cake in Italy and surely Monti's all-technocrats government have not got all their decisions right. But, it seems to me, you really can't expect everything to go just fine, after so many years of living well above the level that the country could really afford.

Berlusconi has declared that he's back in contention for the job and will be (effectively by divine right $-$ you may have guessed by now that I'm no big fan of his) the centre-right party's candidate.
[Incidentally, it is not clear what the party will be called this time around. It was "Forza Italia" in 1994, then "Popolo delle Libertà" in 2009 $-$ neither seems to me like a serious political party name, but of course they have proved quite popular among at least half of my fellow country men and women.]
This has thrown the whole country into a state of either excitement or utter despair at the prospect of Berlusconi back in power.

I think it's amazing that this is still possible. What with all the trials, you'd think that Berlusconi was a widely discredited politician $-$ after all, I suppose that Bill Clinton was busted for less and his political career was effectively over, post-Monica Lewinsky. Berlusconi on the other hand is seemingly unfazed by all this. More importantly, many people in Italy are.

Tuesday 4 December 2012

Seminar next week

As part of the UCL Biostatistics Network, we have what sounds like a very good seminar lined up for next week (Tuesday 11th December, 4pm; Drayton B20 Jevons Lecture Theatre $-$ you can find a map here).

The speaker is Stephen Senn; I've seen him speak a few times and he is just really, really good!

Title: Bad JAMA?
Speaker: Stephen Senn
Abstract:
"But to be kind, for the sake of completeness, and because industry and researchers are so keen to pass the blame on to academic journals, we can see if the claim is true…. Here again the journals seem blameless: 74 manuscripts submitted to the Journal of the American Medical Association (JAMA) were followed up, and there was no difference in acceptance for significant and non-significant findings." (Bad Pharma, p34). A central argument in Ben Goldacre's recent book Bad Pharma is that although trials with negative results are less likely to be published than trials with positive results, the medical journals are blameless: they are just as likely to publish either. I show, however, that this is based on a misreading of the literature and would rely, for its truth, on an assumption that is not only implausible but known to be false, namely that authors are just as likely to submit negative as positive studies. I show that a completely different approach to analysing the data has to be used: one which compares accepted papers in terms of quality. When this is done, what studies have been performed do, in fact, show that there is a bias against negative studies. This explains the apparent inconsistency in results between observational and experimental studies of publication bias.
