Punditocracy – Results

Last night I published a Google Docs spreadsheet with a number of predictions for the 2014 Election.  Now that the results are in I have updated it, and also added a few more sets of predictions from The Ruminator, Ben Kluge (@benkluge), and Grumpollie; and added the last single-poll predictions from the various media outlets on the advice of Thomas Lumley.

On the advice of Thomas I have also adjusted the “number polled”, which is used to calculate the standard errors, from 1000 to 400, which gives an average chi-squared value of about 12.5 for 9 degrees of freedom (not too far off from what we would expect) and roughly agrees with Thomas’ estimate of 2 for the poll-to-poll variation.

We can then use the chi-squared values in column E to give a measure of how close each prediction was to the actual results.  I was a bit surprised to come out in first place amongst the pundits (chi-squared = 4.5), ahead of Gavin White (5.1) and Bryce Edwards (6.1).  David Cunliffe (5.3) is technically in 3rd spot, but I’m not going to count that because he left a lot of blanks in his predictions which I just filled in based on a scaled average of everybody else’s guesses, as explained yesterday.  The top performing poll-of-polls was William Bowe’s (5.8), followed by David Farrar’s Kiwiblog Weighted Averages (5.9).  A handful of pundits and polls-of-polls actually lost out to the 2011 Election results (chi-squared = 10.5), which is a bit of a surprise, but perhaps goes to show what an uneventful three years it has been for the major NZ political parties!

The average prediction in the table was about as useful as a randomly-sampled poll of about 290 people, which might not sound particularly good, but keep in mind that the performance of the average poll (typically with a sample size of about 900-or-so) was only about as useful as a randomly-sampled poll of about 260 people.
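One way to make the "equivalent poll size" comparison concrete: since the squared deviations of a random poll scale inversely with its sample size, a chi-squared value (computed assuming 400 respondents and 9 degrees of freedom) maps to an equivalent sample size of roughly 9 × 400 / χ².  A minimal sketch, with the chi-squared value for the average prediction inferred from the figures above:

```python
# Sketch: convert a chi-squared value (computed here with an assumed sample
# size of 400 respondents and 9 degrees of freedom) into the sample size of
# an equally accurate random poll. Assumption, not the spreadsheet's formula:
# squared deviations scale as 1/m, so E[chi2] = ndf * n_assumed / m.

def equivalent_poll_size(chi2, n_assumed=400, ndf=9):
    """Sample size of a random poll whose expected chi-squared equals chi2."""
    return ndf * n_assumed / chi2

# A chi-squared of ~12.4 for the average prediction corresponds
# to a poll of roughly 290 people
print(round(equivalent_poll_size(12.4)))
```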

The only other point of note: even after (generously) rounding up the results to account for special votes, every single one of the 23 predictions in the table will still have over-estimated the Greens’ share of the vote.  And for many that is after correcting for the Greens under-performing relative to the polls at the last election.  Nobody on Twitter seems to have a credible explanation for why this happened.

Punditocracy

Punditry Is Fundamentally Useless – Nate Silver

A few people on Twitter and elsewhere have been making pre-election predictions for tonight’s results, so I thought it would be handy to have them all in one place so we can see how everybody performed.  I have created a Google Docs spreadsheet with everybody’s guesses.  Please feel free to download a copy and/or share it as you wish.

First things first: I made some previous reckons on Twitter, but I left out the “other” numbers, which I think will be quite high this year, so my updated predictions are as follows.

  • National 47.40%
  • Labour 25.70%
  • Green 12.00%
  • NZ First 7.00%
  • CCCP 3.50%
  • IMP 1.50%
  • Maori 1.20%
  • ACT 0.70%
  • United Future 0.20%
  • All other 0.80%

Not everybody in the spreadsheet made a full set of predictions; if there were results missing for any parties I simply took the averages of everybody else who made a prediction for those parties and scaled them so that the total numbers would add up to 100%.
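That fill-in-and-rescale step might look something like this (names and numbers are illustrative, and the spreadsheet’s exact approach may differ):

```python
# Sketch: fill a pundit's missing party predictions with the average of
# everybody else's guesses for those parties, then rescale so the full set
# sums to 100%. All names and numbers are illustrative.

def fill_and_rescale(prediction, averages):
    """prediction: {party: percent or None}; averages: {party: mean of others}."""
    filled = {p: (v if v is not None else averages[p])
              for p, v in prediction.items()}
    total = sum(filled.values())
    return {p: 100.0 * v / total for p, v in filled.items()}

pundit = {"National": 47.0, "Labour": 26.0, "Green": None, "NZ First": None}
others_avg = {"National": 46.5, "Labour": 25.5, "Green": 12.0, "NZ First": 7.0}
completed = fill_and_rescale(pundit, others_avg)
assert abs(sum(completed.values()) - 100.0) < 1e-9
```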

The variance in the predictions is about what we expect if they were taken from a set of different polls with 1000-or-so respondents, so that number is in the spreadsheet at the top in cell B2.  This number is then used to calculate a “standard error” for each prediction (columns O-X) and measure how far off the actual results each prediction was (columns Y-AH).  These are then summed to get a Chi-squared value (NDF = 9), which is shown in column C.  This gives a measure of how close each prediction was to the actual results.

I will update the sheet when results are finalised.  In the meantime, if you know of anybody who has made a public prediction of the results that isn’t included, please hit me up in the comments or on Twitter.

Following on from previous posts, another short update on the iPredict stocks for National and Labour to win the 2014 election.

Daily average trade prices for National broke through the 80c barrier on 26 June, and Labour went below 20c on the same day.  The prices were reasonably stable around those levels for about six weeks until 13 August 2014 when the Dirty Politics scandal broke and National took a steep hit.  Average daily prices for National were below 70c for several days, and bottomed out at about 64c on 22 August, well before Collins’ resignation on 30 August.  The stocks have since rebounded, with National today trading for around 84c, an all-time high.

While there was obvious movement, most likely attributable to fear over the fallout from the Dirty Politics scandal, it was short lived.  As mentioned in the previous post the clock is running out for Labour, which needs to find some sort of game-changer, and there is less and less time left before the election for them to do so.

Graph of prices below:

Daily average trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).

Since Dirty Politics was released trading volumes and volatility are up significantly. During the last month there have been over 7,100 trades (National and Labour combined), and total volume was over 115,000.  Since opening on 26 October 2011 the total volumes traded are about 195,000 for National and 173,000 for Labour, so almost 1/3 of total volume traded in the last 3 years has been in the last month.  The stocks definitely aren’t moving about on small volumes.
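The “almost 1/3” figure checks out against the volumes quoted above:

```python
# Quick check of the "almost 1/3" claim, using the volume figures quoted above
last_month = 115_000             # National + Labour combined, last month
total = 195_000 + 173_000        # both stocks since opening on 26 Oct 2011
share = last_month / total
print(f"{share:.0%}")  # prints "31%"
```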

Weekly volume, 2014 National election victory stock on iPredict.

Weekly volatility, 2014 National election victory stock on iPredict.

The consensus seems to be that this isn’t going to be a particularly close election.

Undecided?

There has been a bit of debate on Twitter about this post at The Political Scientist, which argues that National’s rise in the polls is merely the result of Labour voters switching to “undecided” and reducing the denominator.  There were also a few requests for an explanation, and Thomas at StatsChat has obliged here (also, I stole his title).  I have a few points I want to add or expand on.

Firstly, as an aside, many of the mistakes are the same as those made in this post on Kiwiblog which argues based on correlation coefficients that in 2011 it was mainly National voters who stayed home, not Labour voters.

Secondly, we don’t actually know the number of undecided voters.  As pointed out in comments on the StatsChat rebuttal, many of the raw numbers are weighted by demographics, probability of voting, and other factors (such as whether or not respondents also have a cellphone).

Thirdly, the results for the correlation coefficients are very susceptible to the number of polls.  On first read this particular table from The Political Scientist stood out:

Correlation coefficients, from The Political Scientist.

The table shows the correlation coefficients with the undecided vote for four parties for all nine Fairfax polls from July 2012 to June 2014 (top), and for only eight polls, with the June 2014 poll results excluded (bottom).  You can see that the correlation coefficient for National changes from 0.7 to 0.3 with the addition of a single poll!  Obviously the results aren’t particularly robust, and that is equally true for the other three parties as well, even if they just happened to show smaller changes in the table above.

Taking this a step further, it should be reasonably obvious that you can’t trust estimates of correlation coefficients based on a small number of data points.  When you have only two data points to work with you must get a correlation coefficient of 1 even if there is no actual correlation between the things you are measuring, because for any two given points it is possible to draw a straight line that passes through both of them (or, rephrasing, two points define a straight line).  Adding more data points will move your estimate of the correlation coefficient closer to the true value, but with a small number of polls you can never be very confident.
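This instability is easy to demonstrate by simulation; a sketch using made-up random data with a true correlation of zero:

```python
import random
import statistics

# Sketch: with independent random data (true correlation zero), the sample
# correlation coefficient is always +/-1 for n = 2, and remains wildly
# unstable for small n. Data here are made up purely for illustration.

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)

# n = 2: every sample gives |r| = 1 exactly, whatever the true correlation
for _ in range(5):
    xs = [random.random() for _ in range(2)]
    ys = [random.random() for _ in range(2)]
    assert abs(abs(pearson_r(xs, ys)) - 1.0) < 1e-9

# n = 9 (one point per poll): with no true correlation at all, estimates of
# r still scatter across a large chunk of [-1, 1]
rs = sorted(pearson_r([random.random() for _ in range(9)],
                      [random.random() for _ in range(9)]) for _ in range(1000))
print(rs[25], rs[975])  # rough central 95% range of the estimates
```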

As another aside, always be suspicious when you see results quoted to a large number of significant figures.  There’s nothing wrong with it in principle, but it raises the question of how accurate they really are.  In this particular case, if the addition of a single poll moves National’s coefficient from 0.7 to 0.3 then there’s no point quoting more than one decimal place, if at all.

Fourthly, there seems to be confusion between different coefficients.

Thomas covers this point, the difference between correlation coefficients and regression coefficients, in paragraphs 2-3.

More intuitively though, the correlation coefficient shown in the table above between NZ First and undecided voters (0.8) is almost the same as Labour’s.  Does the drop in NZ First support cause the increase in undecided voters?  In the last two years the number of respondents supporting NZ First fell from 32 to 24 (see linked table below), while the number of undecided respondents went from about 110 to 230.  Would you argue that the 8 former supporters per 1000 lost by NZ First turned into 120 new undecided voters?  Of course not!

Poll results, and estimated number of respondents, from The Political Scientist.

You may argue that the real evidence is that the number of supporters lost by Labour is (roughly) equal to the increase in the number of respondents who are undecided, and that correlation coefficients have nothing to do with it.  And that’s fine.  But then why bother publishing the correlation coefficients at all?

Fifthly, correlation does not imply causation (see also, xkcd).  When dealing with correlation effects you have to be very careful to avoid false causation.  Even assuming the changes aren’t just a statistical fluctuation we still can’t say whether Labour voters are really becoming undecided.  As Thomas says:

You could fit the data just as well by saying that Labour voters have switched to National and National voters have switched to Undecided by the same amount — this produces the same counts, but has different political implications.

If you’re a frequentist then Thomas’ alternative explanation is just as convincing.  If you’re a Bayesian then now might be a good time to break out Occam’s Razor and say that you thought Labour voters were switching to undecided anyway, so you believe the first hypothesis.  Which is fine.  But in that case was there any value in the analysis?

The only way to figure out what is really going on is to do a longitudinal study where you use the same sample of voters for each poll.

Sixthly, in their conclusion The Political Scientist says

Without taking into account the level of the undecided vote this presents a misleading picture of how many voting age New Zealanders support each party.

Of course, by limiting reporting only to those who have declared party support and are likely to vote the reporting may very well reflect what could happen in an election.

This is sort of hedging both ways.  If the point of the Fairfax poll is to gauge support and/or try to predict the results of an election “if it were held today”, then the pollsters must do something with the undecided voters.  Unless you have evidence that they break differently from decided voters (which could be the case), it seems sensible to just ignore them when publishing the headline results.  It’s not “a misleading picture” at all.
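For concreteness, the usual headline treatment simply drops the undecideds and renormalises over the decided respondents (counts below are made up, not taken from the Fairfax poll):

```python
# Sketch: how headline poll numbers are produced if undecided respondents are
# simply dropped and the decided responses renormalised. Counts are made up,
# not taken from the Fairfax poll.
responses = {"National": 420, "Labour": 230, "Green": 100, "Other": 50}
undecided = 200  # excluded from the denominator

decided_total = sum(responses.values())
headline = {party: 100.0 * n / decided_total for party, n in responses.items()}
print(headline["National"])  # 52.5
```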

Bonus: check out this excellent diagram from Wikipedia showing the differences between regression coefficients and correlation coefficients.  All the graphs in the middle row (except the centre) have the same absolute correlation coefficients.

Correlation coefficients.

Following on from previous posts, another short update on the iPredict stocks for National winning the 2014 election.

Daily average trade prices for National have trended up towards 80c, and Labour down toward 20c.  Prices are as follows:

Daily average trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).

You can see that the prices for National stabilised just above 70c in April and May, after earlier peaking at around 76c in mid-March.  I don’t believe that any single event has caused the prices to move.  It’s more a case of running out the clock; Labour needs some sort of game-changer, and there is less and less time left before the election for them to find one.

Weekly trade volumes are up, and now amount to about 1600 shares per week, or $1100 per week.

Weekly volume, 2014 National election victory stock on iPredict.

Segregation

Following on from the previous post about Electorate geodata, if we know the votes cast at the polling-place level we can look for geographic differences in vote distributions.  Because political preferences are correlated with socio-economic status, we can then use any differences to quantify the degree of segregation in each electorate.

As in the previous post we calculate the average of the positions of the polling places in each electorate weighted by the total number of party votes cast at each polling place, and also separately weighted by the number of left-wing party votes (Labour and Green) and right-wing party votes (National and ACT).  The distance between these left-wing vote and right-wing vote centres of gravity gives us a measure of segregation in each electorate.  (We normalise this distance relative to the two-dimensional normal distribution defined by the standard deviations of the positions of the centres of gravity for all party votes for each electorate.  It’s not the best way to measure this, but it’s a case of close enough is good enough.)
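That calculation can be sketched as follows (coordinates and vote counts are made up for illustration; the real analysis used the published polling-place results):

```python
import math

# Sketch: vote-weighted centre of gravity for an electorate, and the distance
# between the left-wing and right-wing centres. Coordinates (in km) and vote
# counts are made up for illustration.

def centre_of_gravity(places, weight_key):
    """places: list of dicts with 'x', 'y' coordinates and vote counts."""
    total = sum(p[weight_key] for p in places)
    x = sum(p["x"] * p[weight_key] for p in places) / total
    y = sum(p["y"] * p[weight_key] for p in places) / total
    return x, y

polling_places = [
    {"x": 0.0, "y": 0.0, "left": 400, "right": 150},  # southern suburb
    {"x": 0.5, "y": 1.2, "left": 250, "right": 300},
    {"x": 1.0, "y": 2.0, "left": 100, "right": 450},  # northern suburb
]

lx, ly = centre_of_gravity(polling_places, "left")
rx, ry = centre_of_gravity(polling_places, "right")
separation = math.hypot(rx - lx, ry - ly)  # km between the two centres
```

In this toy electorate the right-wing centre sits to the north-east of the left-wing centre, giving a separation of a bit under a kilometre.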

Based on the results of the 2011 election the most segregated electorate in New Zealand is Botany, shown below (click captions for interactive versions).

The centre of gravity for the left-wing vote in East Tamaki lies over 1km south of that for the right-wing vote in Dannemora.  This agrees reasonably well with prior expectations based on the differences in socio-economic status between Auckland suburbs such as Howick and Otara.

The second most segregated electorate is Mangere:

Again we see a difference, with the more northern suburbs such as Mangere Bridge leaning (relatively) more in favour of National than more southern suburbs such as Mangere and Mangere East.  The right-wing vote lies approximately 1km NW of the left-wing vote.

The least segregated electorate in New Zealand is Rotorua.

In spite of the rather large provincial/rural electorate the centres of gravity for the left-wing and right-wing votes lie only a few hundred metres apart, just 4km from central Rotorua.

In each of these three electorates, however, either Labour or National won a fairly high proportion of the party votes, which means that the segregation measures aren’t particularly robust.  A single enclave voting out of sync with the rest of the electorate could be enough to move the centres of gravity around and mess with the results.  If we confine the analysis to electorates closer to the national median, where the left wing vote was in the range of 60%-100% of the right wing vote, then the most segregated electorate was Whanganui:

We can see a clear difference caused by Hawera and rural South Taranaki voting (relatively) more in favour of National, and Whanganui city voting (relatively) more in favour of Labour.

The least segregated electorate was Ohariu:

These calculations have implications for get-out-the-vote efforts in the different electorates.  Given limited resources, Labour supporters in Whanganui would be best to focus their efforts on Whanganui city.  Likewise, National supporters in Botany should be mainly focussing their efforts on the northern two-thirds of the Botany electorate.

Any segregation occurring in electorates also gives us some idea about how the Representation Commission goes about their task of drawing up the electoral boundaries.  From the Electoral Commission website:

All electoral districts must contain electoral populations within 5% of the quota for the North Island or South Island General electoral districts or the Māori electoral districts as applicable.

The Representation Commission, in forming General electoral districts, is required by the [Electoral Act 1993] to give due consideration to:

  • the existing boundaries of General electoral districts;
  • community of interest;
  • facilities of communications;
  • topographical features; and
  • any projected variation in the General electoral population of those districts during their life.

Studying the segregation implied by the differences in vote distributions amongst polling places gives us some idea of how the Commission goes about their task.  We are lucky in New Zealand that, because we have MMP, the drawing up of electorate boundaries is largely apolitical, and we don’t really have to worry about gerrymandering.  Nevertheless, there are tradeoffs between the different features given consideration, and the existence of any form of segregation as seen above is evidence that the Commission is, at least to some extent, prioritising existing boundaries over communities of interest and topographical features.  This is particularly true in the case of Botany, which was added during the 2007 boundary review to reflect population growth in the Auckland region.  In addition to being the most segregated electorate, the Botany electorate boundary looks ugly too, which is no doubt the result of trying to carve an area out of the Auckland region for a new electorate whilst maintaining as much as possible of the existing boundaries.

Appendix

Again I had a few hassles dealing with the geodata.  Firstly, I downloaded the 2007 electorate district boundary files from Statistics New Zealand.  Secondly, since I couldn’t get ArcExplorer running, I installed the GDAL package and ran the ogr2ogr wrapper from the terminal to convert them to KML format.  Thirdly, I extracted the boundaries for single electorates by selecting layers in Google Earth and exporting.  Fourthly, I uploaded the individual KML boundary files to Google Maps Engine using the classic Google Maps interface.  And finally, I used the new Google Maps Engine to import and convert the old KML from classic maps, then added the other data points as a separate layer on top.  There has to be an easier way to do this without dropping $USD400 for the full version of Google Earth!  Hit me up if you have any recommendations.
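For reference, the ogr2ogr step boils down to a single invocation; a sketch wrapped in Python (filenames are illustrative, not the actual Statistics NZ file names):

```python
# Sketch: the shapefile-to-KML conversion step via GDAL's ogr2ogr. Filenames
# are illustrative, not the actual Statistics NZ file names.

def ogr2ogr_command(shp_path, kml_path):
    """Build the ogr2ogr invocation; '-f KML' selects the KML output driver,
    and the destination path comes before the source path."""
    return ["ogr2ogr", "-f", "KML", kml_path, shp_path]

# To actually run it (requires GDAL's ogr2ogr on the PATH):
# import subprocess
# subprocess.run(ogr2ogr_command("GED2007.shp", "GED2007.kml"), check=True)
```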

Following on from previous posts, another update on the iPredict stocks for National winning the 2014 election.

Daily average trade prices are as follows:

Average daily trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).

And zoomed in on the last nine months, starting just before Shearer resigned as leader:

Average daily trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).

After topping out at $0.78 on 28 March, prices for NATIONAL have stabilised in the low 70% range.

Things have not really gotten better for Labour, with the split now fairly stable at 70/30 odds over the last month.  The book is still a little asymmetrical, stronger on the bid side than the ask side, although there is a bit more room for the price to move around than there was previously.

I still think National’s chances to win are priced a little high, but at 70% it’s close enough that I’m not going to bother throwing money at it to try and fix it.
