
There has been talk on Twitter lately of the demographic makeup of the foreign diplomatic corps stationed in Wellington.  Luckily the Ministry of Foreign Affairs and Trade (MFAT) keep a handy Diplomatic and Consular List of the foreign representatives to New Zealand by country of origin, which also includes the country where they are stationed and other information.  By having a quick browse through the list it is simple enough to make a table of Wellington-based foreign diplomats.  It is also easy to narrow the list down to, for example, only Commonwealth countries.  There are 14 high commissions in Wellington with Wellington-based staff: Australia, Britain, Canada, Cook Islands, Fiji, India, Malaysia, Niue, Pakistan, Papua New Guinea, Samoa, Singapore, Solomon Islands, and South Africa.

The list does omit a lot of information we would like, but we can reconstruct some of it.  For example, while the Diplomatic and Consular List does not include gender, it does include a salutation for most diplomats.  If we assume those with the salutations “Mr” or “His Excellency” are men, and those with the salutations “Mrs” or “Ms” or “Her Excellency” are women, then we can draw up a graph showing the gender of diplomats of Commonwealth countries stationed in Wellington.  The “unknown” category indicates those with salutations for which gender is not obvious (“Captain”, “Major”, “Dr”, etc.).  We don’t take names into consideration in order to avoid any bias according to how popular names from other countries may be in New Zealand.
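
For anyone wanting to repeat the exercise, the salutation rule can be written down in a few lines of Python.  This is only a sketch of the rule as described above; the sample salutations are made up for illustration and are not rows from the real list, and the real list may need extra clean-up (abbreviations, combined titles) that is not handled here.

```python
from collections import Counter

# Salutation-to-gender rule as described in the text; anything else is "unknown".
MALE = {"Mr", "His Excellency"}
FEMALE = {"Mrs", "Ms", "Her Excellency"}


def classify(salutation):
    """Return 'male', 'female' or 'unknown' for a salutation string."""
    s = salutation.strip().rstrip(".")
    if s in MALE:
        return "male"
    if s in FEMALE:
        return "female"
    return "unknown"  # e.g. "Captain", "Major", "Dr"


# Hypothetical sample salutations, not taken from the real list.
sample = ["Mr", "Ms", "Her Excellency", "Major", "Dr", "His Excellency"]
counts = Counter(classify(s) for s in sample)
print(counts)
```

Names are deliberately ignored, for the reason given above.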


Gender of Wellington-based representatives to New Zealand of Commonwealth countries.

37 of 70 Commonwealth representatives to New Zealand stationed in Wellington are men, 26 are women, and 7 are “unknown”.  As an aside, 8 of 13 Commonwealth Heads of Mission stationed in Wellington are women.

We can also look at whether or not they have travelled to New Zealand with a partner who is registered with MFAT, and the gender of their partner (partners’ genders are determined from salutation as above).


Partner status of Wellington-based representatives to New Zealand of Commonwealth countries.

As you can see, representatives of Commonwealth countries stationed in Wellington have mostly travelled with an opposite-gender partner.  There are no same-sex couples, and 16 of 70 representatives have no registered partner.  The “unknown” couples are mostly those where the representative is a Defence Attache, and so we can’t tell their gender from their salutation.  As a practical matter many countries, including many Commonwealth countries, are reluctant to issue diplomatic passports to same-sex partners.

If we assume that the orientation of the small number of “unknown” couples is the same as that for known couples then we can additionally determine the gender of diplomatic representatives from their spouse’s gender.  Doing so, the number of male representatives by country now looks as follows:

Number of male, Wellington-based representatives to New Zealand by Commonwealth country.


We can also get quite a lot of information about age distribution from the diplomatic ranks; third secretaries tend to be younger, with first secretaries, counsellors, deputy heads of mission, and heads of mission tending to be more experienced and older.  There are also some more specific job titles, such as “Consul-General”, “Defence Advisor”, etc.  These tend to be older and more experienced than first secretaries, but younger than typical deputy heads of mission.

The distribution of diplomatic ranks is as follows:

Diplomatic rank of Wellington-based representatives to New Zealand of Commonwealth countries.


If you are interested in the distribution by age you can do a reasonable job by defining “young” diplomats as those with the ranks of third secretary and second secretary, junior attaches, and administration and technical staff.  There is a bit of guesstimation here about the relationship between job titles and age, but the important point is that we are using the same definition for all countries so that whatever we are measuring, the numbers should at least be reasonably well-defined and comparable.  The graph of numbers of young male diplomats by country is as follows:


Number of young, male, heterosexual or unknown orientation Wellington-based representatives to New Zealand by Commonwealth country.

Also of interest is turnover.  Many foreign diplomatic missions will rotate staff through Wellington in two- to four-year stints, but there is a lot of variation, particularly at the Head of Mission-level.  HE Mr William Dihm, High Commissioner of Papua New Guinea, has been in Wellington since August 2009, whereas heads of mission from some other countries would be lucky to last two years.  By comparing the list of diplomatic staff to previous versions you could get a list of which diplomats have been recalled from Wellington in the last few months, including name, diplomatic rank, gender, partner status, sexual orientation, and approximate age.  For example, if you could get your hands on a copy of the Diplomatic and Consular List from 09:45:16am on October 11, 2013, then you would be able to quickly get a list of all changes in staff at each diplomatic mission in the last 262 days.  Checking back further still would give you a very good indication of the length of their posting, so you could also look for unusually short stays in Wellington.  But let’s leave that analysis for another day.
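
If you did have two versions of the list, the comparison itself is just a set difference per mission.  The sketch below assumes each version has already been parsed into a dictionary of mission name to set of staff names; that layout, the function name, and the sample data are all hypothetical.

```python
def staff_changes(old, new):
    """Compare two snapshots of {mission: set of names}.

    Returns {mission: (departed, arrived)} for missions with any change.
    """
    changes = {}
    for mission in sorted(set(old) | set(new)):
        before = old.get(mission, set())
        after = new.get(mission, set())
        departed, arrived = before - after, after - before
        if departed or arrived:
            changes[mission] = (departed, arrived)
    return changes


# Made-up snapshots for illustration only.
earlier = {"Mission A": {"Person 1", "Person 2"}, "Mission B": {"Person 3"}}
later = {"Mission A": {"Person 1", "Person 4"}, "Mission B": {"Person 3"}}
print(staff_changes(earlier, later))
```

Matching on names alone would miss spelling changes between versions, so in practice some fuzzy matching would probably be needed.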

Undecided?

There has been a bit of debate on Twitter about this post at The Political Scientist, which argues that National’s rise in the polls is merely the result of Labour voters switching to “undecided” and reducing the denominator.  There were also a few requests for an explanation, and Thomas at StatsChat has obliged here (also, I stole his title).  I have a few points I want to add or expand on.

Firstly, as an aside, many of the mistakes are the same as those made in this post on Kiwiblog which argues based on correlation coefficients that in 2011 it was mainly National voters who stayed home, not Labour voters.

Secondly, we don’t actually know the number of undecided voters.  As pointed out in comments on the StatsChat rebuttal, many of the raw numbers are weighted by demographics, probability of voting, and other factors (whether or not they also have a cellphone?).

Thirdly, the results for the correlation coefficients are very susceptible to the number of polls.  On first read this particular table from The Political Scientist stood out:

Correlation coefficients, from The Political Scientist.


The table shows the correlation coefficients with the undecided vote for four parties for all nine Fairfax polls from July 2012 to June 2014 (top), and for only eight polls, with the June 2014 poll results excluded (bottom).  You can see that the correlation coefficient for National changes from 0.7 to 0.3 with the addition of a single poll!  Obviously the results aren’t particularly robust, and that is equally true for the other three parties, even if they just happened to show smaller changes in the table above.

Taking this a step further, it should be reasonably obvious that you can’t trust estimates of correlation coefficients based on a small number of data points.  When you have only two data points to work with you must get a correlation coefficient of exactly +1 or −1 even if there is no actual correlation between the things you are measuring, because for any two given points it is possible to draw a straight line that passes through both of them (or, rephrasing, two points define a straight line).  Adding more data points will move your estimate of the correlation coefficient closer to the true value, but with a small number of polls you can never be very confident.
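
A quick way to convince yourself of this is to generate pairs of series that are independent by construction and see how large the correlation coefficient comes out anyway.  The sketch below is not a reanalysis of the Fairfax polls; it uses made-up random numbers, a hand-rolled Pearson coefficient, and an arbitrary seed and trial count, and reports the average absolute correlation for a few sample sizes.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2014)  # fixed seed so the run is repeatable


def mean_abs_r(n_points, trials=2000):
    """Mean |r| between two independent random series of n_points each."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [rng.random() for _ in range(n_points)]
        total += abs(pearson(xs, ys))
    return total / trials


# Two points always give |r| = 1; more points pull |r| towards the true value (0 here).
for n in (2, 3, 9, 100):
    print(n, round(mean_abs_r(n), 2))
```

The series here are uncorrelated by design, so any non-zero coefficient is pure noise, and with nine points there is still plenty of it.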

As another aside, always be suspicious when you see results quoted to a large number of significant figures.  There’s nothing wrong with it in principle, but it raises the question of how accurate they really are.  In this particular case, if the addition of a single poll moves National’s coefficient from 0.7 to 0.3 then there’s no point quoting more than one decimal place, if at all.

Fourthly, there seems to be confusion between different coefficients.

Thomas covers this point, the difference between correlation coefficients and regression coefficients, in paragraphs 2-3.

More intuitively though, the correlation coefficient shown in the table above between NZ First and undecided voters (0.8) is almost the same as Labour’s.  Does the drop in NZ First support cause the increase in undecided voters?  In the last two years the number of respondents supporting NZ First fell from 32 to 24 (see linked table below), while the number of undecided respondents went from about 110 to 230.  Would you argue that the 8 former supporters per 1000 lost by NZ First turned into 120 new undecided voters?  Of course not!

Poll results, and estimated number of respondents, from The Political Scientist.

You may argue that the real evidence is that the number of supporters lost by Labour is (roughly) equal to the increase in the number of respondents who are undecided, and that correlation coefficients have nothing to do with it.  And that’s fine.  But then why bother publishing the correlation coefficients at all?

Fifthly, correlation does not imply causation (see also, xkcd).  When dealing with correlation effects you have to be very careful to avoid false causation.  Even assuming the changes aren’t just a statistical fluctuation we still can’t say whether Labour voters are really becoming undecided.  As Thomas says:

You could fit the data just as well by saying that Labour voters have switched to National and National voters have switched to Undecided by the same amount — this produces the same counts, but has different political implications.

If you’re a frequentist then Thomas’ alternative explanation is just as convincing.  If you’re a Bayesian then now might be a good time to break out Occam’s Razor and say that you thought that Labour voters were switching to undecided anyway, so you believe the first hypothesis.  Which is fine.  But in that case was there any value in the analysis?

The only way to figure out what is really going on is to do a longitudinal study where you use the same sample of voters for each poll.

Sixthly, in their conclusion The Political Scientist says:

Without taking into account the level of the undecided vote this presents a misleading picture of how many voting age New Zealanders support each party.

Of course, by limiting reporting only to those who have declared party support and are likely to vote the reporting may very well reflect what could happen in an election.

This is sort of hedging both ways.  If the point of the Fairfax poll is to gauge support and/or try and predict the results of an election “if it were held today”, then the pollsters must do something with the undecided voters.  Unless you have evidence that they break differently than for decided voters (which could be the case), it seems sensible to just ignore them when publishing the headline results.  It’s not “a very misleading picture” at all.

Bonus: check out this excellent diagram from Wikipedia showing the differences between regression coefficients and correlation coefficients.  All the graphs in the middle row (except the centre) have the same absolute correlation coefficients.

Correlation coefficients.

Following on from previous posts, another short update on the iPredict stocks for National winning the 2014 election.

Daily average trade prices for National have trended up towards 80c, and Labour down toward 20c.  Prices are as follows:

Daily average trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).


You can see that the prices for National stabilised just above 70c in April and May after earlier peaking at around 76c in mid-March.  I don’t believe that any single event has caused the prices to move.  It’s more a case of running out the clock; Labour needs some sort of game-changer, and there is less and less time left before the election for them to find one.

Weekly trade volumes are up, and now amount to about 1600 shares per week, or $1100 per week.


Weekly volume, 2014 National election victory stock on iPredict.

Segregation

Following on from the previous post about Electorate geodata, if we know the votes cast at the polling place-level we can look for geographic differences in vote distributions.  Because political preferences are correlated with socio-economic status we can then use any differences to quantify the degree of segregation in each electorate.

As in the previous post we calculate the average of the positions of the polling places in each electorate weighted by the total number of party votes cast at each polling place, and also separately weighted by the number of left-wing party votes (Labour and Green) and right-wing party votes (National and ACT).  The distance between these left-wing vote and right-wing vote centres of gravity gives us a measure of segregation in each electorate.  (We normalise this distance relative to the two-dimensional normal distribution defined by the standard deviations of the positions of the centres of gravity for all party votes for each electorate.  It’s not the best way to measure this, but it’s a case of close enough is good enough.)
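
As a rough sketch of the calculation, here is the vote-weighted centre of gravity and the raw left/right separation in Python.  The dictionary layout and the two polling places are made up for illustration, coordinates are assumed to be NZTM eastings and northings in metres, and the normalisation step described in the brackets above is left out.

```python
import math


def centre_of_gravity(places, weight_key):
    """Vote-weighted mean position of polling places (NZTM metres)."""
    total = sum(p[weight_key] for p in places)
    x = sum(p["easting"] * p[weight_key] for p in places) / total
    y = sum(p["northing"] * p[weight_key] for p in places) / total
    return x, y


def left_right_separation(places):
    """Distance in metres between the left- and right-wing vote centres."""
    lx, ly = centre_of_gravity(places, "left")
    rx, ry = centre_of_gravity(places, "right")
    return math.hypot(lx - rx, ly - ry)


# Two hypothetical polling places 2 km apart, with mirrored vote splits.
places = [
    {"easting": 1760000.0, "northing": 5910000.0, "left": 300, "right": 100},
    {"easting": 1760000.0, "northing": 5912000.0, "left": 100, "right": 300},
]
separation = left_right_separation(places)
print(separation)
```

Here "left" would be the Labour plus Green party votes and "right" the National plus ACT party votes at each polling place.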

Based on the results of the 2011 election the most segregated electorate in New Zealand is Botany, shown below (click captions for interactive versions).

The centre of gravity for the left-wing vote in East Tamaki lies over 1km south of that for the right-wing vote in Dannemora.  This agrees reasonably well with prior expectations based on the differences in socio-economic status between Auckland suburbs such as Howick and Otara.

The second most segregated electorate is Māngere:

Again we see a difference as the more northern suburbs such as Māngere Bridge lean (relatively) more in favour of National than more southern suburbs such as Māngere and Māngere East.  The right-wing vote lies approximately 1km NW of the left-wing vote.

The least segregated electorate in New Zealand is Rotorua.

In spite of the rather large provincial/rural electorate the centres of gravity for the left-wing and right-wing votes lie only a few hundred metres apart, just 4km from central Rotorua.

In each of these three electorates, however, either Labour or National won a fairly high proportion of the party votes, which means that the segregation measures aren’t particularly robust.  A single enclave voting out of sync with the rest of the electorate could be enough to move the centres of gravity around and mess with the results.  If we confine the analysis to electorates closer to the national median, where the left wing vote was in the range of 60%-100% of the right wing vote, then the most segregated electorate was Whanganui …

…, where we can see a clear difference caused by Hawera and rural South Taranaki voting (relatively) more in favour of National, and Whanganui city voting (relatively) more in favour of Labour.

The least segregated electorate was Ohariu:

These calculations have implications for get-out-the-vote efforts in the different electorates.  Given limited resources, Labour supporters in Whanganui would be best to focus their efforts on Whanganui city.  Likewise, National supporters in Botany should be mainly focussing their efforts on the northern two-thirds of the Botany electorate.

Any segregation occurring in electorates also gives us some idea about how the Representation Commission goes about their task of drawing up the electoral boundaries.  From the Electoral Commission website:

All electoral districts must contain electoral populations within 5% of the quota for the North Island or South Island General electoral districts or the Māori electoral districts as applicable.

The Representation Commission, in forming General electoral districts, is required by the [Electoral Act 1993] to give due consideration to:

  • the existing boundaries of General electoral districts;
  • community of interest;
  • facilities of communications;
  • topographical features; and
  • any projected variation in the General electoral population of those districts during their life.

Studying the segregation implied by the differences in vote distributions amongst polling places gives us some idea of how the Commission go about their task.  We are lucky in New Zealand that because we have MMP the drawing up of electorate boundaries is largely apolitical, and we don’t really have to worry about gerrymandering.  Nevertheless, there are tradeoffs between the different features given consideration, and the existence of any form of segregation as seen above is evidence that the Commission is, at least to some extent, prioritising existing boundaries over communities of interest and topographical features.  This is particularly true in the case of Botany, which was added during the 2007 boundary review to reflect population growth in the Auckland region.  In addition to being the most segregated electorate the Botany electorate boundary looks ugly too, which is no doubt the result of trying to carve an area out of the Auckland region for a new electorate whilst maintaining as much as possible of the existing boundaries.

Appendix

Again I had a few hassles dealing with the geodata.  The first step was to download the 2007 electorate district boundary files from Statistics New Zealand.  Unfortunately I couldn’t get ArcExplorer running, so as a second step I installed the GDAL package and ran the ogr2ogr utility from the terminal to convert them to KML format.  Thirdly, we want to extract the boundaries for single electorates, which was done by selecting layers in Google Earth and exporting.  Fourthly, upload the individual boundary files in KML to Google Maps Engine using the classic Google Maps interface.  And finally we can use the new Google Maps Engine to import and convert old KML from classic maps, and then add the other data points as necessary as a separate layer on top.  There has to be an easier way to do this without dropping US$400 for the full version of Google Earth!  Hit me up if you have any recommendations.
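
For what it’s worth, the conversion step can be scripted from Python, and ogr2ogr’s -where option might be one way to pull out a single electorate without going through Google Earth.  I haven’t tried this on the Statistics New Zealand files, so treat it as a sketch: the file names and the NAME attribute are hypothetical placeholders, and the real attribute name would have to be checked in the downloaded boundary files.

```python
import subprocess


def ogr2ogr_kml_command(src, dst, where=None):
    """Build an ogr2ogr command that converts `src` to a KML file `dst`."""
    cmd = ["ogr2ogr", "-f", "KML"]
    if where:
        cmd += ["-where", where]  # attribute filter applied to the source layer
    cmd += [dst, src]  # ogr2ogr takes the destination before the source
    return cmd


# File names and the NAME field below are hypothetical placeholders.
cmd = ogr2ogr_kml_command("electorates.shp", "botany.kml", where="NAME = 'Botany'")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once GDAL is installed
```

The command is only printed here; the last line is left commented out so the snippet runs without GDAL installed.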

Following on from previous posts, another update on the iPredict stocks for National winning the 2014 election.

Daily average trade prices are as follows:

Average daily trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).


And zoomed in on the last nine months, starting just before Shearer resigned as leader:

Average daily trade price, 2014 election winner stocks on iPredict for National (blue) and Labour (red).


After topping out at $0.78 on 28 March, prices for NATIONAL have stabilised in the low 70% range.

Things have not really gotten better for Labour, with the split now fairly stable at 70/30 odds over the last month.  The book is still a little asymmetrical, stronger on the bid side than the ask side, although there is a bit more room for the price to move around than there was previously.

I still think National’s chances to win are priced a little high, but at 70% it’s close enough that I’m not going to bother throwing money at it to try and fix it.

As explained yesterday, one of the first things we need to do is decide on the granularity for the analysis.  The New Zealand Electoral Commission makes election results available at the polling place-level, which makes them a good place to start, but difficulties occur when polling places are discontinued or new polling places are added between elections, and also when there is significant migration, such as that which occurred in and around Christchurch after the 2011 Christchurch Earthquake.  As such we need to look at where the polling places are physically located, and try to understand what happens to voter turnout between elections at the polling place-level.  This will give us an idea of where New Zealanders actually cast their votes.

But first, an aside on geographic datums.

The surface of the earth is flat, at least to first approximation.  This means that instead of dealing with spherical coordinates, such as longitude and latitude, it is often preferable to project the surface onto a plane, and treat it as a two-dimensional Cartesian space where points are described by an x-coordinate and a y-coordinate (and a “height”, if necessary).  One example of such a projection is the New Zealand Transverse Mercator (NZTM) produced by LINZ specifically to represent the New Zealand mainland accurately.  Points are given as an “easting” (the x-coordinate) and a “northing” (the y-coordinate), and are expressed in metres east and north of an arbitrary reference point to the southwest of the country.  Because the NZTM is a conformal map, shapes and direction are preserved, and it is therefore easy to calculate distances and bearings between nearby points.
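
To show what “easy” means here, the distance and bearing between two nearby NZTM points is just Pythagoras and an arctangent.  The sketch below uses made-up coordinates and returns a grid bearing (measured clockwise from grid north), which is an approximation to the true bearing; it also ignores the small scale distortion of the projection, which should be fine over a few kilometres.

```python
import math


def distance_and_bearing(e1, n1, e2, n2):
    """Planar distance (metres) and grid bearing (degrees clockwise from grid north)."""
    de, dn = e2 - e1, n2 - n1
    distance = math.hypot(de, dn)
    bearing = math.degrees(math.atan2(de, dn)) % 360.0
    return distance, bearing


# Hypothetical points: the second is 3 km east and 4 km north of the first.
d, b = distance_and_bearing(1750000.0, 5430000.0, 1753000.0, 5434000.0)
print(round(d), round(b, 1))
```

Doing the same thing directly in longitude and latitude would need spherical trigonometry.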

As it happens, Elections New Zealand have handily produced a list of all the polling places used in the 2008 and 2011 elections along with their NZTM coordinates.  When we combine this information with the 2011 election party vote results by polling place we get a good idea of where people are voting.  The NZ Herald have an excellent interactive map that shows where the polling places fall under the previous 2008 electoral boundaries and the new 2014 electoral boundaries.  But we can go further and aggregate the information to look at, for example, the centre of gravity of each electorate as defined by the average of the polling place locations weighted by votes.  Results are as follows.

Clicking on the points shows the electorate name and number, total party votes (excluding special votes), and location in latitude and longitude.

One thing that immediately stands out is how sparsely populated much of the country is.  At a fundamental level the map is really only showing population density, but even then it is surprising that if you were to draw a line from Nelson to Invercargill and carve off everything to the west then that quarter-or-so of New Zealand’s land area would contain only a single electorate seat (West Coast-Tasman).  Similarly the middle of the North Island is pretty empty; if you were to cut out an area from Whakatane to New Plymouth to Whanganui to Napier you would have over one-third of the North Island and it would contain only a single electorate seat: East Coast.  (The Taupō electorate has been dragged north by polling places in Tokoroa and Cambridge.)

Of course you may think that this is a bit gimmicky, and you’d be right.  With the exception of a bit of data-vis showing election results this kind of thing isn’t of much use.  The real reasons why we want the polling place geodata are four-fold:

  1. Boundary changes.  The New Zealand Electoral Commission has just finished the latest review of electorate boundaries.  In order to simulate election results for the 2011 election we must figure out where each of the polling places used in the 2011 and 2008 elections are and which new electorate they now fall in.
  2. Discontinued polling places.  At every election some former polling places are discontinued, and other new polling places are added.  If we make the reasonable assumption that most people will vote at their nearest polling place then we can somewhat predict votes and turnout at polling places for the 2014 election even if they were not used for previous elections.
  3. Regression.  If we know the physical locations of each polling place, and again make reasonable assumptions about where people vote then we can at least theoretically make comparisons with the Census meshblock-level data from Statistics NZ and try and predict people’s votes based on their age, education, income, family size and so on.  A little outside the scope of my work here, but it would be a fascinating project.
  4. Get-out-the-vote.  Certain polling places will have a higher proportion than others of swing voters.  Depending on your political leanings polling place geodata will tell you where you need to concentrate your get-out-the-vote efforts.
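
As an illustration of the second point, here is a minimal sketch of moving votes from discontinued polling places to the nearest place still in use.  The place names, coordinates and vote counts are invented, positions are assumed to be NZTM (easting, northing) pairs in metres, and “nearest” is straight-line distance, which ignores roads, harbours and electorate boundaries.

```python
import math


def nearest_place(point, places):
    """Name of the polling place closest to an (easting, northing) point."""
    return min(places, key=lambda name: math.dist(point, places[name]))


def reassign_votes(old_votes, old_places, new_places):
    """Move each discontinued place's votes to its nearest current place."""
    predicted = {name: 0 for name in new_places}
    for name, votes in old_votes.items():
        if name in new_places:
            target = name
        else:
            target = nearest_place(old_places[name], new_places)
        predicted[target] += votes
    return predicted


# Made-up example: "Church C" is discontinued and sits closest to "School B".
old_places = {"Hall A": (0.0, 0.0), "School B": (1000.0, 0.0), "Church C": (900.0, 100.0)}
new_places = {"Hall A": (0.0, 0.0), "School B": (1000.0, 0.0)}
old_votes = {"Hall A": 500, "School B": 400, "Church C": 200}
predicted = reassign_votes(old_votes, old_places, new_places)
print(predicted)
```

A brand new polling place would need the reverse treatment, taking a share of the votes from its neighbours, which is not attempted here.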

The next post will look at political segregation in different electorates.

Appendix

I don’t have a lot of experience handling geodata, and it took a bit of effort to get the coordinate transforms into longitude and latitude working properly so that the results would show up correctly in Google Maps.  By far the best solution I found was to use the PROJ.4 – Cartographic Projections Library.  As I’m using python I needed the pyproj wrapper for PROJ.  On Mac the easiest way to get it is to use pip as follows:

curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python
pip install pyproj

There is an excellent tutorial on using geospatial data with python from the SciPy 2013 conference (see the first video).  The secret is to instantiate PROJ classes using the European Petroleum Survey Group (EPSG) Geodetic Parameter Dataset code numbers:

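import pyproj

# EPSG:2193 is NZTM2000; EPSG:4326 is WGS84 longitude/latitude.
# (EPSG:3857 is Web Mercator, which is in metres rather than degrees.)
# Transformer needs pyproj 2.2 or later; always_xy=True keeps (x, y) axis order.
transformer = pyproj.Transformer.from_crs("EPSG:2193", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(easting, northing)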

keywords: python, PROJ, pyproj, NZTM, Google Maps, EPSG.

 

Granularity

One of the first things you must decide upon when working on a Monte Carlo (MC) simulation is the granularity of the simulation.

In particle physics the simulation would normally be at the level of the leptons, hadrons and photons we can see in the detectors, or perhaps even at the level of quarks if you are simulating plasma or collisions at higher energies.

There is a tradeoff, though, between accuracy at small scales and other factors such as computing power, data size, and also access to the raw data you need to run the model.  It wouldn’t make sense to try to simulate the weather or a tsunami at the quark-level, for example. Instead you would carve the atmosphere or the ocean up into an appropriately sized grid with granularity anywhere from a hundred meters-or-so up to tens of kilometers.

We have similar issues when trying to simulate election results, and there are a handful of fairly obvious choices for the granularity of the simulation.

  1. Nationwide: basically take the polling averages (with or without considering any margins of error) and assume that that is how the party votes will fall.  Throw in reasonable assumptions about which party’s candidates will win in Epsom, Ohariu, and the seven Māori electorates and you’ve got your result.  This is a good solution if you just want a rough guess at which side will form the government, and it is the level of reporting you typically get from the media and the blogs whenever a new poll is released.
  2. Electorate and candidate-level: This is a little more complicated.  Ideally you would want polling for each electorate, but even without that you can do an alright job by using the relative differences in results between electorates from a previous election.  This will cause problems when electorate boundaries change, however, so while it might have worked alright for the 2011 election it is a bit of a dodgy proposition for 2014.
  3. Polling place-level: The New Zealand Electoral Commission publish polling place analysis by electorate for both the party vote and the electorate candidate vote, helpfully in CSV format as well as HTML.  As with an electorate-level or candidate-level simulation you can do an acceptable job by using the relative differences in results between polling places from a previous election.  Difficulties occur when polling places are discontinued or new polling places are added between elections, and also when there is significant migration, such as that which occurred in and around Christchurch after the 2011 Christchurch Earthquake.
  4. Voter-level: Very handy if you are a political party and you want to target campaign material and get-out-the-vote efforts at specific individuals.  In fact, the Obama 2012 campaign data team is well known for performing simulations and doing analysis at this level of granularity. Many New Zealand libraries have copies of the Habitation Directory Habitation Index, which is the electoral roll from the most recent 2011 General Election ordered by address.  Assuming you could get your hands on a digital copy then it is at least theoretically possible to geomap individual voters and make reasonable assumptions about their education, income, where they voted and who they voted for, albeit with a lot of attenuation bias.  Unfortunately if that is all the information you have to work with then that is about where you would get stuck.  If you had access to poll results with voting preferences for each person polled then things could start to get interesting, but for obvious reasons only the aggregate polling results are made available to the public.  I wouldn’t be surprised to see the two major parties working on this level of analysis and undertaking highly targeted messaging a few election cycles down the track, but I don’t think anybody in New Zealand is there yet.  Having said that, see the photo in this tweet from @somewhereben for evidence that MPs and volunteers knocking on doors are already working to get their hands on some of the voter-level information that will be needed to pull this off.
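
To make the first (nationwide) option a little more concrete, here is a toy Monte Carlo sketch that takes a set of polling averages, repeatedly draws simulated polls of a given size, and counts how often one bloc of parties outpolls another.  The party labels, shares, sample size and seed are all made up, and the sketch only models sampling error; it does nothing about seat allocation, thresholds, electorate seats or undecided voters.

```python
import random
from collections import Counter


def simulate_poll_shares(shares, sample_size, rng):
    """Draw one simulated poll of `sample_size` respondents from `shares`."""
    parties = list(shares)
    weights = [shares[p] for p in parties]
    draws = rng.choices(parties, weights=weights, k=sample_size)
    counts = Counter(draws)
    return {p: counts[p] / sample_size for p in parties}


def probability_ahead(shares, bloc_a, bloc_b, sample_size=1000, trials=2000, seed=1):
    """Fraction of simulated polls in which bloc_a outpolls bloc_b."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        sim = simulate_poll_shares(shares, sample_size, rng)
        if sum(sim[p] for p in bloc_a) > sum(sim[p] for p in bloc_b):
            wins += 1
    return wins / trials


# Hypothetical polling averages for four made-up parties.
shares = {"A": 0.48, "B": 0.30, "C": 0.12, "D": 0.10}
p_ahead = probability_ahead(shares, ["A"], ["B", "C"])
print(p_ahead)
```

The finer levels of granularity would replace the single nationwide draw with a draw per electorate or per polling place.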

I still haven’t made a final commitment to what level of granularity to work at, but in the meantime I’m playing around with the polling place results to try and see if we can understand what happens to voter turnout between elections at that level.  Hopefully the turnout at each polling place will be reasonably constant over time.
