Quotas and random sampling within quotas - the PSA experience
This means that reporting on them at Panel Services SA (PSA), and keeping them accurate for our panels on the system, presents little difficulty. However, when variables are used for sampling (an approach known as quota sampling), there are very important considerations to bear in mind. The point of a sampling frame is to define a subset of our panel that is most likely to be representative of the South African adult population.
- The most important of these is internet access. We aim to represent the SA adult online population, defined as those who access the internet at least once a week.
- Variables that are used in defining quota samples should introduce as little bias as possible, should be clearly defined for each person, and should have a clearly understood meaning across the entire population.
Age, gender and location are three variables that best meet the requirements. PSA constructs interlocked sample quotas, and randomly samples the panellists who fit these criteria. Our standard quota structure is age (4 breaks), gender, and province (4 breaks). Age, in particular, is probably the most accurate variable, because actual age is calculated from date of birth every time we access one of our panellists’ records. All age groupings are constructed from that actual age.
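The approach described above (age recalculated from date of birth, interlocked quota cells, and random selection within each cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not PSA's actual system: the function names, the panellist record layout (`dob`, `gender`, `region`) and the age band edges are all hypothetical.

```python
import random
from collections import defaultdict
from datetime import date

# Hypothetical lower edges for four age bands; an assumption for illustration.
AGE_BAND_EDGES = [(18, "18-24"), (25, "25-34"), (35, "35-49"), (50, "50+")]


def age_on(dob, today):
    """Whole years between date of birth and `today`."""
    years = today.year - dob.year
    # Subtract a year if this year's birthday has not yet occurred.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_band(age):
    """Return the band label for an age, or None for anyone under 18."""
    label = None
    for lower, name in AGE_BAND_EDGES:
        if age >= lower:
            label = name
    return label


def draw_quota_sample(panel, targets, today, seed=None):
    """Randomly sample panellists within each interlocked quota cell.

    `targets` maps (age_band, gender, region) to the number wanted.
    A cell with too few eligible panellists contributes all it has.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for person in panel:
        # Age is derived from date of birth at sampling time, never stored.
        band = age_band(age_on(person["dob"], today))
        if band is not None:
            cells[(band, person["gender"], person["region"])].append(person)

    sample = []
    for cell, wanted in targets.items():
        eligible = cells.get(cell, [])
        sample.extend(rng.sample(eligible, min(wanted, len(eligible))))
    return sample
```

Because the cells are interlocked, each target applies to one specific combination of age band, gender and region, rather than to each variable separately. Passing a `seed` makes a draw repeatable, which may help when a sample needs to be audited.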
Here are some important considerations around each of these variables:
- If a client is looking for a nationally representative sample (natrep) of 18-65, we could construct evenly distributed strata and use those to build the sample quota for age. More generally, we use the ‘established’ age groups in regular usage (18-24, 25-34, 35-49 and 50+). These age bands broadly define the economic life of the majority of the population.
- We strongly recommend the four provincial strata of Gauteng, Western Cape, KZN and ‘other’, because the first three comprise over 65% of all SA adults who access the internet as per the table below.
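The four-way provincial split recommended above amounts to a simple lookup: three provinces keep their own stratum and the remaining six are pooled. The sketch below assumes province is stored as a free-text name; the function names are hypothetical, and any counts passed in would have to come from real panel or population data.

```python
# Provinces that keep their own stratum; everything else is pooled as "Other".
MAJOR_STRATA = {
    "Gauteng": "Gauteng",
    "Western Cape": "Western Cape",
    "KwaZulu-Natal": "KZN",
    "KZN": "KZN",
}


def province_stratum(province):
    """Map a province name to one of the four sampling strata."""
    return MAJOR_STRATA.get(province.strip(), "Other")


def stratum_shares(counts_by_province):
    """Collapse per-province counts into the share held by each stratum."""
    totals = {"Gauteng": 0, "Western Cape": 0, "KZN": 0, "Other": 0}
    for province, count in counts_by_province.items():
        totals[province_stratum(province)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no panellists to allocate")
    return {stratum: n / grand_total for stratum, n in totals.items()}
```

Multiplying each share by the total sample size gives a target for that stratum, which can then be split further by age and gender to form the interlocked cells.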
There are two further reasons why we argue for this geographical sample quota. Firstly, each is unique in important ways; indeed, they could even be separate countries. The Western Cape is dominated by Cape Town and the surrounding Boland towns, and has the lowest proportion of black people and the highest proportion of metropolitan Afrikaans-speaking people. KZN has virtually only Zulu- and English-speaking people, is dominated by Durban, is proportionately the youngest population, and has the highest proportion of matriculants. Gauteng is primarily urban and metro, has the highest literacy rate (well over 90%), and is dominated by English, with 34% of Zulu-speakers and about 30% of people who read South or North Sotho.
None of the other six provinces have a major metropole, with most of the population living in small towns and communities. Their families generally have only one wage earner, are unlikely to have education past high school and will have an indigenous home language other than Afrikaans. The implication is that forced natrep quotas in each of these ‘other’ provinces would skew any data, because people who answer surveys would be atypical of the population in those areas, if surveys are only administered in English – as is usual for PSA’s TellUsAboutIt consumer panel.
Secondly, from the above analysis of South African adults who are online, it should be clear that language is critical. While 93% of the online population read, write and understand English, the nuances of life and living are grounded in the home language, so we would need to use at least five more of the official languages to truly represent much more than the 93% we do at present. The added effort of sending English surveys to sufficient numbers of people in the other six provinces serves only to risk distorting the data, for the reasons explained above.
The questions of race and LSM
Finally, some brief thoughts on two issues often discussed in terms of sampling: race and the LSM measurement.
We strongly recommend AGAINST race as a sample quota variable for two main reasons:
- There is no clear definition of race. Ethnic origin may be a better term, but it does not resolve the main issue, namely by which set of rules people should classify themselves.
- It fails as a reliable metric because it is not objective, cannot be validated and cannot be extrapolated from online to more general studies that have been conducted face to face.
LSM remains a very useful segmentation developed by the South African Audience Research Foundation (SAARF). Its primary purpose is to classify households according to their standard of living, assuming the goal is comfort and convenience in the home. However, since its primary measures are durables in the home, it is a household rather than an individual measure, and we believe it should preferably not be applied to individuals as a sampling quota variable.
None of this should be taken as an admonishment against using these descriptive variables to analyse and report data. Reports should, however, alert the reader to the strengths and weaknesses of these variables.
*Note that Bizcommunity staff and management do not necessarily share the views of its contributors - the opinions and statements expressed herein are solely those of the author.*
Source: PSA TellUsAboutIt
This survey was carried out by Panel Services Africa on their premium online research panel, TellUsAboutIt, comprising 40,000 registered online users.
Contact Claire Heckrath (087 150 5298) or info@panelservicesafrica.co.za for more information.