Archive for the ‘Methodology’ Category

One of the first things you must decide upon when working on a Monte Carlo (MC) simulation is the granularity of the simulation.

In particle physics the simulation would normally be at the level of the leptons, hadrons and photons we can see in the detectors, or perhaps even at the level of quarks if you are simulating plasma or collisions at higher energies.

There is a tradeoff, though, between accuracy at small scales and other factors such as computing power, data size, and also access to the raw data you need to run the model.  It wouldn’t make sense to try to simulate the weather or a tsunami at the quark-level, for example. Instead you would carve the atmosphere or the ocean up into an appropriately sized grid with granularity anywhere from a hundred meters-or-so up to tens of kilometers.

We have similar issues when trying to simulate election results, and there are a handful of fairly obvious choices for the granularity of the simulation.

  1. Nationwide: basically take the polling averages (with or without considering any margins of error) and assume that that is how the party votes will fall.  Throw in reasonable assumptions about which party’s candidates will win in Epsom, Ohariu, and the seven Maori electorates and you’ve got your result.  This is a good solution if you just want a rough guess at which side will form the government, and it is the level of reporting you typically get from the media and the blogs whenever a new poll is released.
  2. Electorate and candidate-level: This is a little more complicated.  Ideally you would want polling for each electorate, but even without that you can do an alright job by using the relative differences in results between electorates from a previous election.  This will cause problems when electorate boundaries change, however, so while it might have worked alright for the 2011 election it is a bit of a dodgy proposition for 2014.
  3. Polling place-level: The New Zealand Electoral Commission publish polling place analysis by electorate for both the party vote and the electorate candidate vote, helpfully in CSV format as well as HTML.  As with an electorate-level or candidate-level simulation you can do an acceptable job by using the relative differences in results between polling places from a previous election.  Difficulties occur when polling places are discontinued or new polling places are added between elections, and also when there is significant migration, such as that which occurred in and around Christchurch after the 2011 Christchurch Earthquake.
  4. Voter-level: Very handy if you are a political party and you want to target campaign material and get-out-the-vote efforts at specific individuals.  In fact, the Obama 2012 campaign data team is well known for performing simulations and doing analysis at this level of granularity. Many New Zealand libraries have copies of the Habitation Index, which is the electoral roll from the most recent (2011) General Election ordered by address.  Assuming you could get your hands on a digital copy then it is at least theoretically possible to geomap individual voters and make reasonable assumptions about their education, income, where they voted and who they voted for, albeit with a lot of attenuation bias.  Unfortunately if that is all the information you have to work with then that is about where you would get stuck.  If you had access to poll results with voting preferences for each person polled then things could start to get interesting, but for obvious reasons only the aggregate polling results are made available to the public.  I wouldn’t be surprised to see the two major parties working on this level of analysis and undertaking highly targeted messaging a few election cycles down the track, but I don’t think anybody in New Zealand is there yet.  Having said that, see the photo in this tweet from @somewhereben for evidence that MPs and volunteers knocking on doors are already working to get their hands on some of the voter-level information that will be needed to pull this off.

I still haven’t made a final commitment to what level of granularity to work at, but in the meantime I’m playing around with the polling place results to try and see if we can understand what happens to voter turnout between elections at that level.  Hopefully the turnout at each polling place will be reasonably constant over time.

Read Full Post »

When I started the blog a couple of years ago I sort of promised to write a series of posts on how the simulation works so that others could replicate the results, if need be. Unfortunately gainful employment has interfered, and one week out from the election there is no way I will get this finished. Still, better late than never.

While there is a bit of maths going on behind the scenes, the general principle is surprisingly simple: average all available NZ political polls, and then run a Monte Carlo Simulation to get all the interesting information we need to make the graphs. This process is summarised in the schematic below:

General overview of poll averaging and election simulation.

The process can be divided up into three main steps:

  1. Polling information (red): Moving averages of the political polls and information regarding electorate swings are calculated from the input polls and the results of the 2005 and 2008 General Elections. (NB, this information, along with the party lists in step 3, is the only information that goes into the calculations.)
  2. Election simulation (blue): Using the Monte Carlo method, running on a standard laptop with a standard pseudo-random number generator, an election is simulated, based on the polling averages and electorate swings calculated in step 1.
  3. Scenario analysis (green): Using the simulated election results from step 2, we look at the party lists and figure out who gets into parliament.  We then look at any other result that may be of interest.  Normally this would be the number of seats won by each party, which parties will form a coalition and so on, but in theory, if the simulation in step 2 is working correctly, it can be anything you may be interested in looking at after a real election.  For example, if you wanted to, you could look at the number of women candidates winning a South Island electorate seat.

Of course, depending on the pseudo-random numbers dished up in step 2 you may get a relatively unlikely result: perhaps based on current polling your simulation gives National 47% of the vote, Labour 32%, Greens get 14%, and New Zealand First 5%.  This is possible, but not the most probable outcome.  To make sure the results are realistic we simply repeat steps 2 and 3 a large number of times, and keep a running total for each variable or outcome we are interested in measuring.  By doing this any unlikely statistical fluctuations should cancel each other out, and we can get a meaningful measurement of the numbers we are interested in.
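
The repeat-and-average idea can be sketched in a few lines of Python. This is only a toy illustration, not the code behind the site: the normal-distribution noise model, the function names (simulate_party_vote, estimate_probability) and the example polling numbers are all assumptions made for the sake of the example.

```python
import random

def simulate_party_vote(poll_avg, poll_err, rng):
    """Draw one simulated party vote (in %) around the polling averages.

    Assumption: each party's support is normally distributed about its
    polling average; shares are clipped at zero and rescaled to sum to 100.
    """
    raw = {p: max(0.0, rng.gauss(poll_avg[p], poll_err[p])) for p in poll_avg}
    total = sum(raw.values())
    return {p: 100.0 * v / total for p, v in raw.items()}

def estimate_probability(poll_avg, poll_err, outcome, n_sims=50_000, seed=1):
    """Repeat the simulation n_sims times and return the fraction of runs
    in which outcome(votes) is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if outcome(simulate_party_vote(poll_avg, poll_err, rng)):
            hits += 1
    return hits / n_sims

# Hypothetical polling averages and errors, for illustration only.
poll_avg = {"NAT": 50.0, "LAB": 30.0, "GRE": 10.0, "OTHER": 10.0}
poll_err = {"NAT": 1.5, "LAB": 1.5, "GRE": 1.0, "OTHER": 1.0}

p_majority = estimate_probability(
    poll_avg, poll_err, lambda v: v["NAT"] > 50.0, n_sims=10_000
)
print(f"P(National party vote > 50%) ~ {p_majority:.2f}")
```

Any single call to simulate_party_vote may give an unlikely result; it is only the fraction over many repetitions that means anything.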

Typically steps 2 and 3 are repeated 50,000 times for each day we simulate an election for, which takes about a minute or so of computer time.  To get the time series graphs, such as the Scenario Analysis time series graph (scroll down to “Scenario Analysis”), we have to run these simulations for each day we are interested in, although normally they are only rerun for the last couple of hundred days to capture any recent movement.

Each time we complete step 3, we update a running total of the variables we are interested in (number of seats won by National, number of women candidates winning a South Island electorate seat, etc.) and also the variables squared (number of seats won by National squared, number of women candidates winning a South Island electorate seat squared, etc.). We then divide by the total number of simulations (say, 50,000) and that gives the expected values and the expected values of the squares. For example, in yesterday’s simulations the National Party won a total of 3,270,000 seats across all 50,000 runs, and dividing by 50,000 gives an expected value of 65.4 seats. A bit of seventh-form stats (the square root of the expected value of the square minus the square of the expected value) gives the root mean square (RMS) error on the expected value, and that is how we get the final value of 65.4 +/- 0.8 seats for National (scroll down to “Seats in Parliament”).
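
The bookkeeping above amounts to keeping two running sums per variable. A minimal sketch follows; the class name RunningStats is my own, and the five seat counts are made up purely so that the toy example lands on 65.4 +/- 0.8 (they are not real simulation output).

```python
import math

class RunningStats:
    """Running totals of x and x squared for one variable of interest."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mean(self):
        return self.sum_x / self.n

    def rms(self):
        # sqrt(E[x^2] - E[x]^2); max() guards against tiny negative
        # values caused by floating-point rounding.
        variance = self.sum_x2 / self.n - self.mean() ** 2
        return math.sqrt(max(0.0, variance))

national_seats = RunningStats()
for seats in [64, 65, 66, 66, 66]:   # made-up results of five simulated elections
    national_seats.update(seats)

print(f"{national_seats.mean():.1f} +/- {national_seats.rms():.1f} seats")
# prints: 65.4 +/- 0.8 seats
```

The attraction of this approach is that nothing needs to be stored per simulation: two numbers per variable are enough, however many times steps 2 and 3 are repeated.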

That’s all there is to it. The calculations for the poll averaging and the simulation get a bit more involved, although probably not much harder than a first-year uni level maths course, but the general principle of the calculation should be surprisingly simple.

Read Full Post »

A busy week for political polls. We now have another new poll released on Sunday, 18 April by TV3-Reid Research. The updated polling averages now have National on 54.4% +/- 1.4% and Labour on 30.7% +/- 1.3%, both of which appear to be statistically significant changes when compared with the previous update five days ago. Changes for other parties are not statistically significant, which means in the last couple of weeks National has most likely won about 3 +/- 2 percentage points of support at the expense of the Labour Party.

As usual, the two graphs below summarise the polling averages for the party vote after the new poll. The horizontal axes represent the date, starting 60 days before the 2008 NZ General Election, and finishing on the present day. The solid lines with grey error bands show the moving averages of the party vote for each party, and circles show individual polls with the vertical lines representing the total errors.

Party vote support for the eight major and minor NZ political parties as determined by moving averages of political polls. Colours correspond to National (blue), Labour (red), Green Party (green), New Zealand First (black), Maori Party (pink), ACT (yellow), United Future (purple), and Progressive (light blue) respectively.

Party vote support for the six minor NZ political parties as determined by moving averages of political polls. Colours correspond to Green Party (green), New Zealand First (black), Maori Party (pink), ACT (yellow), United Future (purple), and Progressive (light blue) respectively.

As always, please check the Graphs page for further simulation results.

I should probably point out here that when calculating the polling averages I correct for “pollster bias” – the tendency for each pollster to systematically overestimate support for some parties and underestimate support for others – by comparing each pollster’s results to the trendline and subtracting out any differences. The problem here is that TV3 have moved from using TNS as their pollster prior to the 2008 Election to using Reid Research since, but the calculation treats TNS and Reid Research as the same pollster. More specifically, the calculation assumes that Reid Research and TNS have the same systematic polling biases. TNS used to consistently overestimate support for Labour relative to the trendline, which means that polling numbers for Labour are now generally forced down a little, even though Reid Research don’t tend to overestimate Labour’s support relative to the trendline in the same manner. For this reason, the polling averages given above are probably a little worse for Labour than they realistically should be, and the opposite applies for National. In the future I may have to fix this problem by treating TNS and Reid Research as separate pollsters; it’s something I’m looking into at the moment and I’ll make a decision before the countdown to the 2011 Election.
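
To make the TNS/Reid Research problem concrete, here is a rough sketch of a bias correction of this kind. It assumes the bias is simply the mean difference between a pollster's results and the trendline value on the same date; the function names and all of the numbers are hypothetical, and the real calculation may well differ.

```python
from collections import defaultdict

def pollster_bias(polls):
    """Mean (poll result - trendline) per pollster, in percentage points.

    polls: list of (pollster, poll_result, trendline_value) tuples for one party.
    """
    residuals = defaultdict(list)
    for pollster, result, trend in polls:
        residuals[pollster].append(result - trend)
    return {name: sum(r) / len(r) for name, r in residuals.items()}

def corrected(pollster, result, bias):
    """Subtract the pollster's estimated bias from a new poll result."""
    return result - bias.get(pollster, 0.0)

# Hypothetical Labour numbers: the first pollster reads 2 points high,
# its successor reads level with the trendline.
history = [
    ("TNS", 37.0, 35.0),
    ("TNS", 36.0, 34.0),
    ("Reid Research", 31.0, 31.0),
    ("Reid Research", 30.0, 30.0),
]

separate = pollster_bias(history)
# Treating both firms as one pollster ("TV3") blends the two biases.
merged = pollster_bias([("TV3", r, t) for _, r, t in history])

print(corrected("Reid Research", 31.0, separate))  # uses Reid Research's own bias
print(corrected("TV3", 31.0, merged))              # pulled down by the old TNS bias
```

In this toy case the merged label shaves a point off a poll that needed no correction, which is the same direction of error described above for Labour.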

In other news, it appears National has officially decided to not campaign in Epsom and Ohariu for the electorate seats held by the leaders of their coalition partners ACT and United Future, respectively. This could be a game changer come election time. I have previously blogged on the strategic implications of the Ohariu seat for Labour, which haven’t really changed, although their best short term tactic may now be to contest the seat aggressively instead of throwing it to National, as suggested in the previous analysis.

Read Full Post »

One of the major goals of this site is to try and predict election results based on recent relevant political polling. This is intended to include not just the total number of seats won by each party, but also viable coalition possibilities and electorate level results.

Today I present the simulation results at the candidate level, including probabilities for each major or minor party candidate to be elected to parliament by either winning an electorate or being selected off their party list. First though, in the interests of disclosure, I thought I should release the data I’m using for the party lists. The 2008 Party Lists are available in MS Excel (.xlsx) format [82kB].

The simulation requires the party lists to be input in the form of a list linking each candidate to the electorate they stood for (or else indicating they were a list only candidate.) The list is based largely on the information on the Party lists for the 2008 General Election page on the Elections New Zealand website, and the electorate information from the Candidates by electorate page from Wikipedia, and is amended at discretion.

The data format is as follows:

  1. Party Code : A unique code for each political party.  Parties are numbered 0~7, and ordered firstly by the number of seats won in the 2008 NZ General Election, and secondly by the number of party votes received.
  2. Party
  3. List Ranking
  4. Name : The name of the candidate as given on the Elections New Zealand website.
  5. Electorate : The electorate the candidate stood in. For list-only candidates this will read “list only.”
  6. Electorate Code : A unique code for the electorate.  Electorates are numbered in alphabetical order, with general electorates (#0 ~ #62) preceding Maori electorates (#63 ~ #69). If the candidate is a list-only candidate this code will take the value -1.

For computational reasons each major or minor party candidate standing in an electorate must have a list ranking. In order to avoid having this requirement affect the results the lists are simply extended to 100 candidates for each party, with electorate-only candidates placed in the lowest ranked positions such that they will never be elected to a list seat. The intermediate positions are then filled with dummy candidates: for example, the candidate “NAT-74-list only” refers to the 74th ranked candidate on the National Party list, with the “list only” suffix indicating that they are a list-only candidate. Please feel free to use or amend the file at will. Corrections gratefully accepted.
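
For anyone wanting to build a list in the same shape, the padding scheme can be sketched as follows. The record fields mirror the six columns above, but the class and function names are my own invention and the candidate names and electorate codes are placeholders.

```python
from dataclasses import dataclass

LIST_ONLY = -1  # electorate code used for list-only candidates

@dataclass
class Candidate:
    party_code: int
    party: str
    list_ranking: int
    name: str
    electorate: str
    electorate_code: int

def extend_list(party_code, party, abbr, ranked, electorate_only, size=100):
    """Pad a party list to `size` entries.

    ranked: (name, electorate, electorate_code) tuples in official list order.
    electorate_only: same tuples for candidates with no official ranking;
    they are placed in the lowest-ranked positions.
    """
    if len(ranked) + len(electorate_only) > size:
        raise ValueError("too many candidates for the extended list")
    out = [Candidate(party_code, party, i, n, e, c)
           for i, (n, e, c) in enumerate(ranked, start=1)]
    first_tail_rank = size - len(electorate_only) + 1
    # Fill the gap with dummy list-only candidates, e.g. "NAT-74-list only".
    for rank in range(len(ranked) + 1, first_tail_rank):
        out.append(Candidate(party_code, party, rank,
                             f"{abbr}-{rank}-list only", "list only", LIST_ONLY))
    for rank, (n, e, c) in enumerate(electorate_only, start=first_tail_rank):
        out.append(Candidate(party_code, party, rank, n, e, c))
    return out

# Placeholder names and codes, for illustration only.
full = extend_list(0, "National Party", "NAT",
                   ranked=[("Candidate A", "list only", LIST_ONLY),
                           ("Candidate B", "Some Electorate", 12)],
                   electorate_only=[("Candidate C", "Another Electorate", 40)])
print(len(full), full[2].name, full[-1].list_ranking)
```

With two ranked candidates and one electorate-only candidate, positions 3 to 99 are filled with dummies and the electorate-only candidate lands at rank 100.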

The current simulation, however, does not use the 2008 Party Lists. Instead it references an alternative list that has been amended to take into consideration changes during the current term of Parliament. This current list used for the simulation is also available in MS Excel (.xlsx) format [82kB]. A summary of changes is as follows:

  1. Labour : The list reflects the retirements of Helen Clark and Michael Cullen during the term of Parliament. Candidates ranked #3 (Phil Goff) onwards are moved up two list places each. Current Parliamentary members are moved up in the list ahead of unsuccessful candidates (ahead of list candidate #42 Judith Tizard). A new candidate replacing Helen Clark (David Shearer, Mount Albert) is included, ranked #41 and inserted into the list ahead of Judith Tizard. A new dummy list candidate (“LAB-77-list only”) is inserted in position #77 in place of Michael Cullen.
  2. National : Richard Worth (unsuccessful candidate for Epsom, list rank #23) is removed, and subsequent candidates are moved up one rank. A new dummy candidate for Epsom (“NAT-67-Epsom”) is inserted in position #67 in his place.
  3. Green : Jeanette Fitzsimons (list-only candidate, list rank #1) and Sue Bradford (candidate for East Coast Bays, list rank #3) are removed, and subsequent candidates moved up. A new dummy candidate for East Coast Bays (“GRE-66-East Coast Bays”) is inserted in position #66 replacing Sue Bradford. A new dummy list candidate (“GRE-67-list only”) is inserted in position #67 in place of Jeanette Fitzsimons.

The above modifications are only intended to capture the spirit of changes since the beginning of the term of parliament and are of course subject to change when the parties release their official party lists closer to the date of the 2011 General Election. If anybody has any serious objections – or is just curious how things would work out with different party lists – and is willing to provide an updated list in the same format then I would be happy to rerun any simulations.

The candidate level results of the most recent simulation (January 22nd, after the release of the latest Roy Morgan Research poll) are shown in the table below (please click for an enlarged view.)

Probabilities for each candidate to be elected to Parliament

The table gives the probabilities for each candidate to be elected to Parliament by winning an electorate, by being elected from their party list, and an overall probability for either method combined. The “Rank” column gives the respective candidate’s relative likelihood of being elected, and is ordered firstly by probability to be elected, and then by party code and list ranking where there is a tie. Probabilities are rounded to the nearest percent.

The first 89 ranked candidates are guaranteed to be elected, and will of course win 89 seats between them. Candidates ranked 90 through to 114 are considered highly likely to be elected, and each have individual probabilities in the 90% to 100% range. These 25 candidates are expected to win a further 24.6 seats between them, for a cumulative total of 113.6 seats in Parliament. After this we get to the marginal list and electorate candidates: those ranked 115 through to 135 have probabilities in the 10% to 90% range. These 21 candidates are expected to win a further 9.4 seats between them, for a cumulative total of 123.0 seats in Parliament. Next we have 14 more candidates ranked 136 through to 149 who are considered highly unlikely to be elected to Parliament, with probabilities of less than 10% each. These 14 candidates are expected to win only 0.2 seats between them, bringing the cumulative total to 123.2 seats – an expected overhang of 3.2 seats. Finally, candidates ranked 150 through 800 (many not shown in the table for ease of viewing) have no chance of being elected to Parliament based on current polling data.
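
The "expected seats" for a group of candidates is just the sum of their individual probabilities of being elected. The short check below reproduces the cumulative totals quoted above from the per-band figures; the three probabilities passed to expected_seats are hypothetical, and the only outside number is the normal 120-seat size of Parliament.

```python
def expected_seats(probabilities):
    """Expected number of seats won by a group of candidates."""
    return sum(probabilities)

# Hypothetical example: three marginal candidates.
print(f"{expected_seats([0.9, 0.5, 0.1]):.1f} seats expected from 3 candidates")

# (band, number of candidates, expected seats) as quoted in the text.
bands = [
    ("guaranteed (ranks 1-89)", 89, 89.0),
    ("highly likely (ranks 90-114)", 25, 24.6),
    ("marginal (ranks 115-135)", 21, 9.4),
    ("highly unlikely (ranks 136-149)", 14, 0.2),
]

NORMAL_SEATS = 120  # size of Parliament with no overhang

cumulative = 0.0
for label, n_candidates, expected in bands:
    cumulative += expected
    print(f"{label}: {n_candidates} candidates, cumulative {cumulative:.1f} seats")

expected_overhang = cumulative - NORMAL_SEATS
print(f"expected overhang: {expected_overhang:.1f} seats")
```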

I realise that the above calculations may seem little more than trivia at the moment given that we are so far out from the next election, and that the finalised party lists and electorate candidates will not be known for a long time. However, the main motivation for doing this simulation is a hope that people will be able to see which individual list candidate their party vote is likely to be counted for. Closer to the 2011 General Election I will begin publishing “effective party lists” on a regular basis. These effective lists will show only those candidates on the cusp of winning a list seat for each party, and will hopefully give NZ voters a better idea of where their vote is going, and a more meaningful alternative to a quick glance at the top of the lists for each party which some seem to use now when deciding how to cast their votes.

Read Full Post »

Correctly predicting the winners of the 70 electorate contests is vital for any New Zealand General Election simulation to be meaningful. There are two major reasons for this. Firstly, for many of the minor political parties the electorate waiver determines eligibility to receive additional list seats in parliament, a current example being the four list seats held by the ACT Party solely because Rodney Hide won the Epsom electorate. Secondly, electorate seats can be the cause of an overhang which alters the number of seats needed to form a majority: in the 2008 New Zealand General Election the Maori Party’s five electorate seats caused an overhang of two seats in parliament, meaning the governing coalition would need 62 seats for a majority instead of the normal 61.
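
Both effects fall out of the MMP seat allocation rule: seats are shared out by the Sainte-Laguë method among parties that either reach 5% of the party vote or win at least one electorate, and a party that wins more electorates than its entitlement keeps them as overhang seats. The sketch below uses made-up vote counts and a 10-seat parliament so the result can be checked by hand; the function name and the party labels are hypothetical.

```python
def allocate_seats(party_votes, electorate_seats, total_seats=120, threshold=0.05):
    """Sainte-Lague allocation with a party-vote threshold, an electorate
    waiver, and overhang seats.

    Returns (seats per party, overhang).
    """
    total_votes = sum(party_votes.values())
    eligible = [p for p, v in party_votes.items()
                if v / total_votes >= threshold or electorate_seats.get(p, 0) > 0]

    entitlement = {p: 0 for p in eligible}
    for _ in range(total_seats):
        # Highest quotient votes / (2 * seats_so_far + 1) gets the next seat.
        winner = max(eligible,
                     key=lambda p: party_votes[p] / (2 * entitlement[p] + 1))
        entitlement[winner] += 1

    seats, overhang = {}, 0
    for p in eligible:
        won = electorate_seats.get(p, 0)
        seats[p] = max(entitlement[p], won)  # electorate winners keep their seats
        overhang += max(0, won - entitlement[p])
    return seats, overhang

# Toy election: D (4%) misses out entirely, while M (2%) qualifies through
# the electorate waiver and causes an overhang.
votes = {"A": 5000, "B": 3000, "C": 1400, "D": 400, "M": 200}
electorates = {"A": 3, "B": 2, "M": 2}

seats, overhang = allocate_seats(votes, electorates, total_seats=10)
parliament_size = sum(seats.values())
print(seats, "overhang:", overhang, "majority:", parliament_size // 2 + 1)
```

Here the overhang of two lifts the parliament from 10 to 12 seats, so a majority needs 7 seats rather than 6, the same mechanism that moved the 2008 threshold from 61 to 62.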

Unfortunately, accurately predicting the winners of the electorate seats is also quite a difficult task, mainly because they are not subject to the same level of polling intensity as the nation as a whole.  This means it is necessary to model the electorate contests by some other means.  I hope to do a series of detailed posts later about how the poll-averaging and election simulation works, but I figured the electorate seat calculation was likely to be a bit contentious, so I thought I’d get a rough explanation out of the way first.

There’s a handful of ways to predict these results:

  1. Just assume the results are unchanged from the previous General Election.  This is the zero-knowledge solution, and is used by David Farrar for his Curiablog public poll average calculations [actually, it’s a bit more complicated, see below].
  2. Calculate the results for each electorate by calibrating them against another index that you can measure.  This was the method used by David Farrar in his “Electoral Pendulum” series leading up to the 2008 election.  It is also the method used by FiveThirtyEight for their calculation of the “Partisan Polling Index (PPI)”.
  3. Try to predict the vote for an electorate by use of regression analysis on a variety of different variables. In the case of New Zealand these may include age distribution, ethnic distribution, qualifications, iwi and religious distributions, family incomes, marital and socio-economic status, occupations and others – all of which are available on the New Zealand Parliament website. This was another of the methods FiveThirtyEight used to predict the state-by-state results of the Electoral College for the 2008 US Presidential Election.

In addition to the above methods it is preferable to include electorate-level polling data where available, but it is not possible to rely on it completely due to sparsity and small sample sizes (I believe the Curiablog polling average does include electorate level polling data where available, and uses the results from the last election as a fallback position where it is not.)

For this simulation/website we’ve decided to more or less go with method #2 above. There are a few reasons for this. Firstly, it feels intuitively correct. Secondly, there is a bit of historical data in New Zealand and overseas to indicate the swing in the electorate polling correlates with a swing nation-wide. Thirdly, it’s relatively simple (compared to method #3 above.) Fourthly, New Zealand votes under the MMP electoral system, which means that while the exact electoral results are important for determining the exact number of seats held by each party in parliament, they are of only limited importance in determining the overall result of an election. In addition to the above reasons, it is easy enough to combine this method with electorate-level polling data when it is available (most likely in the lead-up to an election,) so any unusual results should hopefully take care of themselves in the long run anyway.

Effectively the algorithm operates by assigning eight numbers to each electorate to indicate how the electorate vote in that electorate differs from the party vote in the nation as a whole. These eight numbers are determined from the results of the 2005 and 2008 NZ General Elections. For Tauranga, Epsom, Wigram, Ohariu, and the seven Maori electorates this can get a bit complicated, but for the remaining 59 electorates only one of these eight numbers is effectively meaningful; a number we denote \delta e_0, and which roughly parametrizes the swing in the vote in the electorate as viewed on a traditional left-right political scale. Positive values of \delta e_0 correspond to a swing towards the National Party, negative values to a swing towards the Labour Party, and near-zero values indicate New Zealand’s bellwether electorates. The values for some electorates are shown in the table below.

Electorate biases.

Electorate biases for a sample of New Zealand electorates. Helensville, Taranaki-King Country and Clutha-Southland are National strongholds, East Coast Bays, Ilam and Nelson are typical National-leaning electorates, Rotorua, Otaki and Hamilton West are bellwether electorates, Christchurch Central, Hutt South and Rimutaka are typical Labour-leaning electorates, and Mt Albert, Manukau East and Mangere are Labour strongholds.

Based on the values of \delta e_0 and the current polling averages we simulate the results for each electorate if an election were held today. The probabilities for candidates from each party to win an electorate are shown below for the same electorates in the table above.
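
As a rough illustration of how a single number like \delta e_0 can be turned into a win probability, here is a toy two-party model. It is an assumption made for this example only, not the calculation used on the site: the electorate lead is taken to be the nationwide National-minus-Labour lead plus \delta e_0 plus normally distributed noise, and the function name, the noise width and every input value are hypothetical.

```python
import random

def national_win_probability(national_lead, delta_e0, sigma=5.0,
                             n_sims=20_000, seed=1):
    """Fraction of simulations in which National beats Labour in an electorate.

    Assumed toy model (not the site's actual calculation): the electorate
    lead equals the nationwide National-minus-Labour lead plus delta_e0,
    plus normally distributed noise with standard deviation sigma,
    all in percentage points.
    """
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(n_sims)
        if national_lead + delta_e0 + rng.gauss(0.0, sigma) > 0.0
    )
    return wins / n_sims

# Hypothetical delta_e0 values for a stronghold, a bellwether and a
# Labour-leaning seat, with National 10 points ahead nationwide.
for label, delta in [("National stronghold", 25.0),
                     ("bellwether", 0.0),
                     ("Labour-leaning", -15.0)]:
    p = national_win_probability(10.0, delta)
    print(f"{label}: P(National wins) ~ {p:.2f}")
```

Even in this crude form the pattern is the same as in the results: a large nationwide lead turns bellwether seats into near-certain wins and puts Labour-leaning seats in play.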

Sample electorate seat results.

Simulation results of selected electorate seats based on current polling averages. The columns denote the National Party (NAT), Labour Party (LAB), ACT, Maori Party (MAO), Progressive Party (PRO), and United Future (UNF). The Green Party and New Zealand First are not expected to win any electorate seats, and are not shown. A large swing in favour of National is expected; in the 2008 General Election the electorates of Christchurch Central, Hutt South, Rimutaka and Manukau East were all convincingly won by Labour.

Tomorrow, I’ll show graphs indicating how many electorates each party is expected to pick up in total.

Read Full Post »