A reader asked, “The latest SWS survey didn't survey 100% voters. It was 92% registered voters and 8% non-voters, according to their report. Why does SWS include non-voters in their survey?”
The reader was referring to the Businessworld-SWS survey released yesterday, April 10th [Businessworld].
This is a valid question, especially since SWS has always been hounded by accusations of nepotism, as SWS President Mahar Mangahas happens to be Fernando Poe Jr.’s first cousin. As I said in yesterday’s post, criminal intent is best left to state prosecutors. Instead, let’s take a look at the math and see if there’s something fishy going on.
But first, let's quote SWS' survey methodology:
The First Quarter 2016 Social Weather Survey -- conducted last March 30-April 2 via face-to-face interviews with 1,500 adults nationwide (of whom 1,377 or 92% are validated voters) and with sampling error margins of ±3 points -- showed Mr. Duterte with a score of 27% outpacing Ms. Poe’s 23%, Mr. Binay’s 20% and Mr. Roxas’s 18%.
Given this, let’s ask the following questions:
- How can such a small sample size estimate the opinion of 50+ million?
- Why did SWS include non-voters in the sample?
- Only 92% are registered voters; are the results still reliable?
Determining Sample Size
SWS wants to estimate countrywide voter preference:
- at 95% confidence level, and,
- with 3%, or 0.03, margin of error
That is, we now ask the question:
What is the minimum sample size required to accurately gauge the opinion of 54 million Filipinos?
There are two ways to go about this: Slovin's Formula [Israel 1992] and Cochran's Formula [Cochran 1977].
For starters, let’s discuss the simpler (and more widely used) of the two: Slovin's Formula.
Slovin's Formula works under these assumptions [Tejada 2012]:
- Desired confidence level is 95% (exactly what SWS wants)
- Survey uses simple random sampling without replacement (of course, every voter votes once per election period)
Slovin's Formula is expressed as:
$$n = \frac{N}{1 + N e^2}$$
where:
- n is sample size,
- N is total population, equal to 54,363,329 [Rappler],
- e is desired error margin, which SWS has set at 0.03.
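Plugging in the values for N and e, we get:
$$n = \frac{54{,}363{,}329}{1 + 54{,}363{,}329 \times 0.03^2} \approx 1{,}111.09$$
That is, Slovin's Formula calls for at least 1,112 respondents.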
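For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the Slovin's computation. The function and variable names are made up for illustration; only the values of N and e come from the discussion above.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> float:
    """Slovin's Formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * margin_of_error ** 2)


N = 54_363_329  # total population of voters, as cited above
e = 0.03        # error margin set by SWS

slovin_exact = slovin_sample_size(N, e)
# Round up, since respondents come in whole numbers.
slovin_min = math.ceil(slovin_exact)

print(f"Slovin's n = {slovin_exact:.2f}, so at least {slovin_min} respondents")
```

Note how little N matters here: at this scale the result is very close to 1 / e^2, so doubling the population barely moves the required sample size.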
A little more mathematical rigor won't hurt. Thus, instead of just using Slovin's Formula, let's use its ancestor, the slightly-more-complicated Cochran's Formula [Cochran 1977].
Cochran's Formula is expressed as:
$$n = \frac{z^2 p q}{e^2}$$
where:
- n is sample size
- z is the z-score
- p is the population proportion
- q is pre-defined as (1-p)
- e is error margin
Before we go further, you may have noticed that there are more variables in Cochran's than in Slovin's, but don't worry, we can simplify things a bit.
To eliminate q, we can just substitute (1 - p), that is:
$$n = \frac{z^2 p (1 - p)}{e^2}$$
Consider the expression "p (1 - p)". We want the sample size to be as conservative as possible, that is, we want to make sure that n is not too small. Since p is unknown, we can simply assume the value of p that maximizes "p (1 - p)", which in turn maximizes n, as n is directly proportional to "p (1 - p)".
To do this, we assume p = 0.5 so that p (1- p) equals 0.25.
Any value of p higher or lower than 0.5 yields a p (1-p) that's less than 0.25.
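If you want a bit more rigor, let's use some basic differential calculus. Suppose we want to maximize K, where:
$$K = p(1 - p) = p - p^2$$
Setting the first derivative to zero:
$$\frac{dK}{dp} = 1 - 2p = 0 \implies p = 0.5$$
The second derivative, $\frac{d^2K}{dp^2} = -2$, is negative, so this is a maximum.
That is, K is highest when p is 0.5.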
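If calculus isn't your thing, the same claim can be checked by brute force. The sketch below simply evaluates p (1 - p) on a grid of values from 0.00 to 1.00 and picks the largest; the grid spacing of 0.01 is an arbitrary choice made for illustration.

```python
# Evaluate K = p(1 - p) for p = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# Pick the p on the grid that gives the largest K.
best_p = max(grid, key=lambda p: p * (1 - p))
best_k = best_p * (1 - best_p)

print(f"K is largest at p = {best_p}, where K = {best_k}")
```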
Moving on, with p = 0.5:
$$n = \frac{z^2 (0.5)(0.5)}{e^2} = \frac{0.25\, z^2}{e^2}$$
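Now, e = 0.03 (desired error margin). Meanwhile, z = 1.96 at 95% confidence level [Yale], so that:
$$n = \frac{0.25 \times 1.96^2}{0.03^2} = \frac{0.9604}{0.0009} \approx 1{,}067.11$$
That is, Cochran's Formula calls for at least 1,068 respondents.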
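Here is the same Cochran's computation as a short Python sketch, followed by a comparison against the 1,377 validated voters in the SWS sample. The function and variable names are hypothetical; the inputs (z = 1.96, e = 0.03, p = 0.5) are the ones used above.

```python
import math


def cochran_sample_size(z: float, e: float, p: float = 0.5) -> float:
    """Cochran's Formula with q = 1 - p: n = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


cochran_exact = cochran_sample_size(z=1.96, e=0.03)
# Round up, since respondents come in whole numbers.
cochran_min = math.ceil(cochran_exact)

validated_voters = 1377  # from the SWS methodology quoted earlier

print(f"Cochran's n = {cochran_exact:.2f}, so at least {cochran_min} respondents")
print(f"Do 1,377 validated voters clear the minimum? {validated_voters >= cochran_min}")
```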
Thus, it's OKAY if only 1,377 or 92% of the 1,500 respondents are registered voters, as long as SWS did not include the invalid responses (from non-registered respondents) in its statistical computations. After all, 1,377 is greater than the minimum number of respondents per both Slovin's and Cochran's.
Now, the reader may want to ask the question, "Why did SWS include non-voters in the first place?"
The answer is pretty simple: because there is no practical way to verify if a respondent is registered or not. Thus, when SWS randomly selected 1,500 people, it had no idea if they were voters or not. Basically, c'est la vie. That's life.
As far as the basic math behind the SWS survey is concerned, Thinking Pinoy believes the SWS survey is fairly credible, as the methodology doesn't really differ from that of other survey bodies like Pulse Asia, DZRH, and Laylo.
But there's a catch
However, Thinking Pinoy wants to raise an issue regarding the pre-set margin of error. Margin of error can be reduced by taking a larger sample, i.e. interviewing more respondents.
- Did SWS set the error margin at 3% out of financial constraints or worse, laziness, or,
- Is the 3% error margin a deliberate attempt to make the leads look negligible?
[Figure: Large error margins increase the likelihood of statistical ties. This image is for instructional purposes only and is not made to scale.]
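To make the idea of a statistical tie concrete, here is a rough Python sketch. It uses a simple rule of thumb, which is an assumption on my part and not necessarily how any pollster defines a tie: two candidates are "tied" when their score-plus-or-minus-margin ranges overlap. The scores are Mr. Duterte's 27% and Ms. Poe's 23% from the SWS survey; the tighter margins of 1.8 and 1.13 points are applied to those same scores purely as a what-if.

```python
def ranges_overlap(score_a: float, score_b: float, margin: float) -> bool:
    """True if [score - margin, score + margin] overlaps for both scores."""
    high, low = max(score_a, score_b), min(score_a, score_b)
    # The ranges overlap when the leader's floor is at or below the trailer's ceiling.
    return (high - margin) <= (low + margin)


duterte, poe = 27.0, 23.0  # scores in percent, from the SWS survey

# What-if: the same scores under progressively smaller error margins.
tie_by_margin = {m: ranges_overlap(duterte, poe, m) for m in (3.0, 1.8, 1.13)}

for margin, tied in tie_by_margin.items():
    verdict = "statistical tie" if tied else "clear lead"
    print(f"+/- {margin} points: {verdict}")
```

Under this rule of thumb, a four-point lead is a tie at 3 points of error but a clear lead at the smaller margins.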
Now, let's contrast SWS' 3% error margin with others:
- DZRH uses 7,490 respondents so that error is at +/- 1.13%
- Laylo uses 3,000 respondents so that error is at +/- 1.8%
How many respondents does SWS interview? Just 1,500, of whom only 1,377 are validated voters.
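The link between sample size and error margin can be sketched by turning Cochran's Formula around and solving for e. The snippet below assumes simple random sampling, z = 1.96 and p = 0.5, and ignores any design effects a real pollster might account for, so treat the numbers as ballpark figures. The function name is made up for illustration.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Cochran's Formula solved for e: e = z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


samples = [
    ("DZRH", 7490),
    ("Laylo", 3000),
    ("SWS, all adults", 1500),
    ("SWS, validated voters", 1377),
]

for name, n in samples:
    print(f"{name}: n = {n}, margin = +/- {100 * margin_of_error(n):.2f}%")
```

Under these assumptions, the DZRH and Laylo sample sizes reproduce the margins quoted above, while the SWS sample lands somewhere between 2.5% and 3%.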
By late December, Tito Mahar [TP: SWS Survey] should have realized that unlike previous presidential elections, this one will probably be a tight race. Mahar is a lot smarter than Thinking Pinoy, so it's safe to say that he already knows that.
In the spirit of due diligence, Tito Mahar should have exerted some effort in refining SWS' methodology to account for this phenomenon. That is, he should have increased the sample size to minimize the margin of error.
But Tito Mahar did not.
Is SWS lazy?
Is SWS broke?
Or does SWS actually favor someone?