One of the first things you must decide upon when working on a MC simulation is the granularity of the simulation.
In particle physics the simulation would normally be at the level of the leptons, hadrons and photons we can see in the detectors, or perhaps even at the level of quarks if you are simulating plasma or collisions at higher energies.
There is a tradeoff, though, between accuracy at small scales and other factors such as computing power, data size, and access to the raw data you need to run the model. It wouldn’t make sense to try to simulate the weather or a tsunami at the quark level, for example. Instead you would carve the atmosphere or the ocean up into an appropriately sized grid, with granularity anywhere from a hundred meters or so up to tens of kilometers.
We have similar issues when trying to simulate election results, and there are a handful of fairly obvious choices for the granularity of the simulation.
- Nationwide: basically take the polling averages (with or without considering any margins of error) and assume that that is how the party votes will fall. Throw in reasonable assumptions about which party’s candidates will win in Epsom, Ohariu, and the seven Maori electorates and you’ve got your result. This is a good solution if you just want a rough guess at which side will form the government, and it is the level of reporting you typically get from the media and the blogs whenever a new poll is released.
- Electorate and candidate-level: This is a little more complicated. Ideally you would want polling for each electorate, but even without that you can do an alright job by using the relative differences in results between electorates from a previous election. This will cause problems when electorate boundaries change, however, so while it might have worked alright for the 2011 election it is a bit of a dodgy proposition for 2014.
- Polling place-level: The New Zealand Electoral Commission publish polling place analysis by electorate for both the party vote and the electorate candidate vote, helpfully in CSV format as well as HTML. As with an electorate-level or candidate-level simulation you can do an acceptable job by using the relative differences in results between polling places from a previous election. Difficulties occur when polling places are discontinued or new polling places are added between elections, and also when there is significant migration, such as that which occurred in and around Christchurch after the 2011 Christchurch Earthquake.
- Voter-level: Very handy if you are a political party and you want to target campaign material and get-out-the-vote efforts at specific individuals. In fact, the Obama 2012 campaign data team is well known for performing simulations and doing analysis at this level of granularity. Many New Zealand libraries have copies of the Habitation Index, which is the electoral roll from the 2011 General Election (the most recent) ordered by address. Assuming you could get your hands on a digital copy, it is at least theoretically possible to geomap individual voters and make reasonable assumptions about their education, income, where they voted and who they voted for, albeit with a lot of attenuation bias. Unfortunately, if that is all the information you have to work with then that is about where you would get stuck. If you had access to poll results with voting preferences for each person polled then things could start to get interesting, but for obvious reasons only the aggregate polling results are made available to the public. I wouldn’t be surprised to see the two major parties working on this level of analysis and undertaking highly targeted messaging a few election cycles down the track, but I don’t think anybody in New Zealand is there yet. Having said that, see the photo in this tweet from @somewhereben for evidence that MPs and volunteers knocking on doors are already working to get their hands on some of the voter-level information that will be needed to pull this off.
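
To make the first two options a bit more concrete, here is a minimal Python sketch. It draws a national party-vote share from a single poll, treating the margin of error as simple random sampling noise, and then carries the previous election's electorate-level differences forward by proportional scaling. Everything specific in it is an assumption for illustration: the poll share, the sample size, the previous results, the electorates "A", "B" and "C", and the function names are all made up, and proportional scaling is only one simple way to use "relative differences", not necessarily the right one.

```python
import random


def sample_national_share(poll_share, sample_size, rng):
    """Draw one plausible national party-vote share from a single poll result.

    Assumes the poll is a simple random sample, so the sampling error is
    roughly normal with standard deviation sqrt(p * (1 - p) / n).
    """
    std_err = (poll_share * (1.0 - poll_share) / sample_size) ** 0.5
    draw = rng.gauss(poll_share, std_err)
    return min(max(draw, 0.0), 1.0)  # keep the draw inside [0, 1]


def project_to_electorates(national_share, prev_national_share, prev_electorate_shares):
    """Scale each electorate's previous share by the change in the national share."""
    ratio = national_share / prev_national_share
    return {
        name: min(prev_share * ratio, 1.0)
        for name, prev_share in prev_electorate_shares.items()
    }


# Hypothetical inputs for one party (not real polling or election data).
POLL_SHARE = 0.44
POLL_SAMPLE_SIZE = 1000
PREV_NATIONAL_SHARE = 0.40
PREV_ELECTORATE_SHARES = {"A": 0.50, "B": 0.40, "C": 0.30}

rng = random.Random(2014)  # fixed seed so the run is repeatable
runs = [
    project_to_electorates(
        sample_national_share(POLL_SHARE, POLL_SAMPLE_SIZE, rng),
        PREV_NATIONAL_SHARE,
        PREV_ELECTORATE_SHARES,
    )
    for _ in range(10_000)
]

# Average projected share per electorate across all simulated runs.
mean_shares = {
    name: sum(run[name] for run in runs) / len(runs)
    for name in PREV_ELECTORATE_SHARES
}
print(mean_shares)
```

A real simulation would have to sample all parties together so that their shares still add up to 100%, and it would need some way of dealing with the boundary changes mentioned above. The same scaling idea applies unchanged at polling-place level, just with many more keys in the dictionary.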
I still haven’t made a final commitment to what level of granularity to work at, but in the meantime I’m playing around with the polling place results to try to understand what happens to voter turnout between elections at that level. Hopefully the turnout at each polling place will be reasonably constant over time.
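
One way such a comparison could look, as a rough sketch rather than the actual analysis: match polling places by name across two elections, take the ratio of votes cast, and set aside the places that were discontinued or newly added. The polling place names and vote counts below are invented, and the sketch assumes the totals have already been read out of the CSV files into plain dictionaries; it makes no claim about how those files are actually laid out.

```python
def compare_turnout(votes_earlier, votes_later):
    """Compare votes cast per polling place between two elections.

    Returns (ratios, discontinued, added), where ratios maps each polling
    place present in both elections to later / earlier votes cast.
    """
    common = votes_earlier.keys() & votes_later.keys()
    ratios = {
        place: votes_later[place] / votes_earlier[place]
        for place in sorted(common)
        if votes_earlier[place] > 0  # avoid dividing by zero
    }
    discontinued = sorted(votes_earlier.keys() - votes_later.keys())
    added = sorted(votes_later.keys() - votes_earlier.keys())
    return ratios, discontinued, added


# Hypothetical vote totals per polling place (not real data).
earlier = {"School Hall": 1200, "Community Centre": 800, "Church Hall": 300}
later = {"School Hall": 1260, "Community Centre": 760, "Library": 150}

ratios, discontinued, added = compare_turnout(earlier, later)
for place, ratio in ratios.items():
    print(f"{place}: {ratio:.2f}")
print("Discontinued:", discontinued)
print("Added:", added)
```

If turnout really is stable, the ratios should cluster tightly around a single value. Polling places that sit far from the cluster, along with the discontinued and added lists, are the ones that would need special handling.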