South Africa

How the Council for Scientific and Industrial Research (CSIR) predicts election results
Predictions are flying as the vote counting takes place. But here’s one set of predictions that election watchers have been taking seriously: those from the CSIR, which has designed a special model that gets more accurate as more results flow in.

The days between the close of voting and the announcement of the final result see South Africans being pitched hot takes and predictions from statistical snake oil salesmen and respectable institutions alike. 

One of the national vote predictions that election watchers have been taking most seriously comes from the Council for Scientific and Industrial Research (CSIR), which has created a model that combines past and current voting data with socioeconomic information to predict the overall national result.

How – it would be reasonable to ask – does that work? And how accurately can one predict a national election this way?

The basic idea behind the CSIR model is as follows. For every ward in the country, the CSIR compiles a number of variables that characterise that ward: what province it’s in, its socioeconomic data, demographic information and previous voting history. In this way, each ward has a kind of “fingerprint”, which makes it possible to see which wards are most like which other wards. As early results come in, the organisation can then use that data to predict the results in other, similar wards (similar in terms of previous voting behaviour, demography and socioeconomic state).
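One way to picture this "fingerprint" matching is a nearest-neighbour lookup. The sketch below is only an illustration of the idea, not the CSIR's actual method: the ward names, the three fingerprint features, all the numbers and the choice of a simple two-neighbour average are assumptions made up for this example.

```python
import math

# Hypothetical fingerprints: (previous vote share for Party A,
# income index, urban share). All values are invented for illustration.
WARDS = {
    "ward_1": (0.62, 0.30, 0.20),
    "ward_2": (0.60, 0.32, 0.25),
    "ward_3": (0.25, 0.80, 0.90),
    "ward_4": (0.28, 0.75, 0.85),
    "ward_5": (0.58, 0.35, 0.22),  # not yet counted
}

# Party A's share in the wards that have already been counted (invented).
COUNTED = {"ward_1": 0.55, "ward_2": 0.53, "ward_3": 0.20, "ward_4": 0.24}


def predict_share(ward, k=2):
    """Average the counted results of the k wards most similar to `ward`."""
    target = WARDS[ward]
    # Rank counted wards by Euclidean distance between fingerprints.
    nearest = sorted(COUNTED, key=lambda w: math.dist(WARDS[w], target))[:k]
    return sum(COUNTED[w] for w in nearest) / k


predicted = predict_share("ward_5")
print(f"Predicted Party A share in ward_5: {predicted:.2f}")
```

Here the uncounted ward most resembles ward_1 and ward_2, so its prediction is drawn from their results rather than from the very different ward_3 and ward_4.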

One feature of the system is that as more voting data comes in, it has increasingly detailed information about what kinds of wards have voted in what kinds of ways. This allows the model to make finer distinctions when predicting the results still to come, since it can now see in the existing data where small differences in previous voting history or other variables correlate with significant differences in voting behaviour.

In the case of brand-new parties, such as the MK party, the model would have had no previous voting history to use for predictions. But once the first 5% or so of the vote was counted, the system would be able to infer that wards resembling those that voted MK early on would themselves be likely to lean that way. In this way, it can pick up on new trends early and try to infer which other wards might follow them.
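The new-party case can be sketched the same way. In the hypothetical example below, the fingerprints contain no voting history for the new party at all; its likely share in uncounted wards is inferred purely from early results in similar wards. The features, ward labels, numbers and the inverse-distance weighting are all assumptions chosen for illustration, not a description of the CSIR's model.

```python
import math

# Hypothetical fingerprints using only information available before
# election day: (province code, income index, urban share).
# The new party has no prior vote share, so none appears here.
FINGERPRINTS = {
    "A": (1.0, 0.30, 0.40),
    "B": (1.0, 0.35, 0.45),
    "C": (0.0, 0.70, 0.90),
    "D": (1.0, 0.32, 0.42),  # still uncounted
    "E": (0.0, 0.72, 0.88),  # still uncounted
}

# Early counted results for the new party (invented numbers).
EARLY_NEW_PARTY_SHARE = {"A": 0.40, "B": 0.36, "C": 0.02}


def infer_share(ward):
    """Weight each early result by how similar its ward is to `ward`."""
    target = FINGERPRINTS[ward]
    weights = {
        w: 1.0 / (math.dist(FINGERPRINTS[w], target) + 1e-6)
        for w in EARLY_NEW_PARTY_SHARE
    }
    total = sum(weights.values())
    return sum(weights[w] * EARLY_NEW_PARTY_SHARE[w] for w in weights) / total


share_d = infer_share("D")  # resembles the early strongholds A and B
share_e = infer_share("E")  # resembles ward C, where the party did poorly
print(f"D: {share_d:.2f}, E: {share_e:.2f}")
```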

Another feature of the system design is that, because it uses existing voting data to make increasingly nuanced (and therefore more accurate) predictions, the CSIR’s model is likely to become more accurate as the share of counted votes increases. Dr Carike Karsten, a senior technologist on the CSIR prediction project, estimates the accuracy of the CSIR model to be within 2% of the true result: a predicted national vote share of, say, 47% would in reality be likely to lie between 45% and 49%.
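A rough sketch of the arithmetic: a national estimate can be thought of as votes already counted plus projected votes from wards still outstanding, with the margin applied on top. Only the two-point margin comes from the estimate quoted above; the function names, vote totals and projected shares below are hypothetical.

```python
def national_estimate(counted_party, counted_total, projected):
    """Blend counted votes with projections for uncounted wards.

    `projected` is a list of (predicted_share, expected_votes) pairs,
    one per uncounted ward. Returns a national share in percent.
    """
    projected_party = sum(share * votes for share, votes in projected)
    projected_total = sum(votes for _, votes in projected)
    return 100.0 * (counted_party + projected_party) / (counted_total + projected_total)


def band(estimate, margin=2.0):
    """Range implied by a margin of `margin` percentage points."""
    return estimate - margin, estimate + margin


# Invented figures: 1,000,000 votes counted so far, 1,000,000 projected.
estimate = national_estimate(450_000, 1_000_000, [(0.50, 600_000), (0.45, 400_000)])
low, high = band(estimate)
print(f"Estimate {estimate:.1f}%, likely between {low:.1f}% and {high:.1f}%")
```

As counting proceeds, more of the total comes from actual votes and less from projections, which is one reason an estimate of this kind would be expected to tighten over time.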

If you’re inclined to want to skip ahead to the end of this electoral story, then, a good prediction model can be helpful. It’s worth bearing in mind, though, that even if you’re a believer in prediction, a 2% margin can still conceal an enormous amount of political theatre. DM
