Where the sample size is under 100 but greater than 50, a star [*] appears next to the number; where the sample is under 50, two stars [**] appear next to the number.
What does all this mean? Most people see stars and immediately conclude that the dataset is unstable and must not be used.
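The starring convention described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular cross-tab package: the function name `star_marker` is made up for this example, and because the text does not say what happens at exactly 50, the sketch assumes that a sample of exactly 50 gets one star.

```python
def star_marker(sample_size: int) -> str:
    """Return the star flag shown next to a cell's sample size."""
    if sample_size < 50:
        return "**"  # under 50: two stars
    if sample_size < 100:
        # Assumption: exactly 50 falls in the one-star band,
        # since the text only says "under 50" for two stars.
        return "*"
    return ""  # 100 or more: no star


# Sample sizes taken from the examples in this article.
for n in (5099, 68, 0):
    print(f"{n}{star_marker(n)}")
```

Run as written, the loop prints the three sample sizes with their flags: no star for 5099, one star for 68 and two stars for 0.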
Here is a simple cross-tab with the sample size shown as the first element because it is the most important number in the cell:
At first glance, one certainly sees stars! Unfortunately, many people jump to the conclusion that the data can't be used because “the samples are too small”, and they reject the table out of hand. This is incorrect: it comes from not knowing how to read a cross-tab correctly.
The key is the sample sizes in the primary or originating cells, that is, the row and column samples that interlock to create each result. If the sample size in each of the primary or originating cells is 100 or more, then EVERY RESULT in the interlocking cells is statistically valid and can be used.
So while it is true that 10% of respondents who live in LSM 10 households use Brand A, it would not be right to drill down any further to find out who these LSM 10 Brand A users are because there are only 68 respondents who fulfil both criteria in the sample, and this sample is too small to become a primary number.
So what the stars are warning you about is that these numbers cannot be analysed further. They cannot become primary numbers in their own right; they are only valid while they are protected by the sample sizes in the two primary cells that created them. Any further analysis would indeed be unstable, but the primary analysis is good, valid and often insightful information.
Here's proof - there are 5099 males and 9961 females in the sample in the example below. The question is “Are you pregnant?” The answer, in the case of the males, is a universal “No”. And that answer in the table below will have two stars next to it - but it is perfectly true, valid and reflects reality!
It is a true answer statistically speaking because there were over 100 respondents in the two primary cells that created that answer. And intuitively, you know that this is correct.
Where your intuition cannot help you, don't let the stars confuse you. Remember the rule: provided the sample sizes in the primary cells on both the vertical and the horizontal axes are 100 or more, the result in the interlocking cell will reflect statistically valid, usable data. Just don't try to analyse this data further if the resulting sample size in that interlocking cell is under 100.
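The rule above can be written down as a small check. This is a minimal sketch under stated assumptions: the function `cell_status` and its field names are invented for illustration, and the row and column totals in the usage lines are hypothetical, because the article gives only the interlocking cell size of 68 for LSM 10 Brand A users.

```python
MIN_SAMPLE = 100  # threshold stated in the rule above


def cell_status(row_sample: int, col_sample: int, cell_sample: int) -> dict:
    """Apply the cross-tab rule to one interlocking cell.

    readable:       both primary cells (row and column) have 100+ respondents,
                    so the result in the interlocking cell can be used.
    can_drill_down: the interlocking cell itself has 100+ respondents,
                    so it may serve as a primary number for further analysis.
    """
    readable = row_sample >= MIN_SAMPLE and col_sample >= MIN_SAMPLE
    can_drill_down = readable and cell_sample >= MIN_SAMPLE
    return {"readable": readable, "can_drill_down": can_drill_down}


# Hypothetical primary samples; only the cell size of 68 comes from the article.
lsm10_sample = 680     # hypothetical row sample
brand_a_sample = 1500  # hypothetical column sample
print(cell_status(lsm10_sample, brand_a_sample, 68))
```

With these illustrative figures the result is readable but cannot be drilled into, which matches the reading given for the LSM 10 Brand A cell.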
By the way, do a run using Choices to access the TGI data and you won't be seeing stars! That is because we know that our users understand the basic rules that apply to cross-tabs, and we apply this rule from the start, when the dataset is selected for the rows and columns.
So - don't let the stars get in your eyes...