April 12, 2016

[FACT CHECK] Is the Businessworld-SWS Survey valid? Let's do the math



A reader asked, “In the latest SWS survey, the people they surveyed were not 100% voters. 92% registered voters, 8% non-voters according to their report. Why does SWS include non-voters in their survey?”


The reader was referring to Businessworld - SWS Survey released yesterday, April 10th [Businessworld], which shows:



This is a valid question, especially since SWS has always been hounded by accusations of nepotism, as SWS President Mahar Mangahas happens to be Fernando Poe Jr.’s first cousin. As I said in yesterday’s post, criminal intent is something that’s best left to state prosecutors. Instead, let’s take a look at the math and see if there’s something fishy going on.

But first, let's quote SWS' survey methodology:
The First Quarter 2016 Social Weather Survey -- conducted last March 30-April 2 via face-to-face interviews with 1,500 adults nationwide (of whom 1,377 or 92% are validated voters) and with sampling error margins of ±3 points -- showed Mr. Duterte with a score of 27% outpacing Ms. Poe’s 23%, Mr. Binay’s 20% and Mr. Roxas’s 18%.
Given this, let’s ask the following questions:

  1. How can such a small sample size estimate the opinion of 50+ million?
  2. Why did SWS include non-voters in the sample?
  3. Only 92% are registered voters. Are the results still reliable?
Thinking Pinoy believes that the best way to go about this is to give you a semi-walkthrough of the typical survey process. Please note that the following sections include a bit of post-secondary math, but a working knowledge of high school algebra is sufficient to understand them.
Let’s go.

Determining Sample Size


SWS wants to estimate countrywide voter preference:

  • at 95% confidence level, and,
  • with 3%, or 0.03, margin of error
In an ideal world, the best way to go about this is to interview all 54 million voters, but that would be impractical. Hence, we can only interview a small section of the population and use their answers to make a statistical inference.

That is, we now ask the question:

What is the minimum sample size required to accurately gauge the opinion of 54 million Filipinos?

There are two ways to go about this: Slovin's Formula and Cochran's Formula [Cochran 1977]. Let's start with the simpler one.

Slovin's Formula


For starters, let’s discuss the simpler (and more widely used) of the two: Slovin's Formula [Israel 1992].

Slovin's Formula works under these assumptions [Tejada 2012]:
  • Desired confidence level is 95% (exactly what SWS wants)
  • Survey uses simple random sampling without replacement (of course, every voter votes once per election period)
Slovin's Formula is expressed as:

$$n = \frac{N}{1 + N e^2}$$

where:
  • n is sample size,
  • N is total population, equal to 54,363,329 [Rappler].
  • e is desired error margin, which SWS has set at 0.03.

Plugging in the values for N and e, we get:

$$n = \frac{54{,}363{,}329}{1 + 54{,}363{,}329 \times 0.03^2} \approx 1111.09$$

That is, rounding up, SWS needs to interview at least 1112 randomly-selected registered voters.
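
For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the Slovin computation. The function name `slovin_sample_size` is made up for illustration; N and e are the values quoted above.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Minimum sample size per Slovin's Formula, rounded up to a whole person."""
    raw = population / (1 + population * margin_of_error ** 2)
    return math.ceil(raw)


N = 54_363_329  # registered voters, the figure cited in this post
e = 0.03        # desired error margin

print(slovin_sample_size(N, e))  # 1112
```

Notice that N appears in both the numerator and the denominator, so for a population this large the result is driven almost entirely by e.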

Cochran's Formula


A little more mathematical rigor won't hurt. Thus, instead of just using Slovin's Formula, let's use its ancestor, the slightly-more-complicated Cochran's Formula [Cochran 1977].

Cochran's Formula is expressed as:

$$n = \frac{z^2 p q}{e^2}$$

where:
  • n is sample size
  • z is the z-score
  • p is the population proportion
  • q is pre-defined as (1-p)
  • e is error margin
Before we go further, you may have noticed that there are more variables in Cochran's than in Slovin's, but don't worry, we can simplify things a bit.

To eliminate q, we can just substitute (1 - p), that is:

$$n = \frac{z^2 p (1 - p)}{e^2}$$

Consider the expression "p (1 - p)". We want the sample size to be as conservative as possible, that is, we want to make sure that n is not too small. Since p is unknown, we can simply assume the value of p that maximizes "p (1 - p)". Because n is directly proportional to "p (1 - p)", this also maximizes n.

To do this, we assume p = 0.5 so that p (1- p) equals 0.25.

Any value of p higher or lower than 0.5 yields a p (1-p) that's less than 0.25.
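
If you'd rather see this than take it on faith, the short Python sketch below tries every p from 0.00 to 1.00 in steps of 0.01 and reports which one gives the largest p (1 - p). The helper name `k` and the grid of candidate values are arbitrary choices made for this illustration.

```python
def k(p: float) -> float:
    """The expression p (1 - p) that we want to maximize."""
    return p * (1 - p)


# Candidate proportions: 0.00, 0.01, ..., 1.00
candidates = [i / 100 for i in range(101)]

best_p = max(candidates, key=k)
print(best_p, k(best_p))  # 0.5 0.25
```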

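If you want a bit more rigor, let's use some basic differential calculus. Suppose we want to maximize K, where:

$$K = p(1 - p) = p - p^2$$

Taking the derivative of K with respect to p, we get:

$$\frac{dK}{dp} = 1 - 2p$$

To find p that will maximize K, we set the derivative to 0, that is:

$$1 - 2p = 0 \quad\Rightarrow\quad p = 0.5$$

The second derivative is -2, which is negative, so this point is indeed a maximum.
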
That is, K is highest when p is 0.5.

Moving on, with p = 0.5:

$$n = \frac{0.25\, z^2}{e^2}$$

So now, we are left with an equation of two unknowns: z and e.

Now, e = 0.03 (desired error margin). Meanwhile, z = 1.96 at 95% confidence level [Yale], so that:

$$n = \frac{0.25 \times 1.96^2}{0.03^2} = \frac{0.9604}{0.0009} \approx 1067.11$$

That is, via Cochran's Formula and rounding up, we need to interview at least 1068 randomly-selected registered voters.
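
The same check can be done for Cochran's Formula. This is only a sketch: the function name `cochran_sample_size` is made up for illustration, and p defaults to the conservative 0.5 discussed above.

```python
import math


def cochran_sample_size(z: float, margin_of_error: float, p: float = 0.5) -> int:
    """Minimum sample size per Cochran's Formula, rounded up to a whole person."""
    raw = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    return math.ceil(raw)


# z = 1.96 for 95% confidence, e = 0.03
print(cochran_sample_size(1.96, 0.03))  # 1068
```

Unlike Slovin's, this version never asks for the population size N; it assumes the population is very large compared to the sample, which is certainly the case with 54 million voters.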

Conclusion


Thus, it's okay that only 1,377 (92%) of the 1,500 respondents are registered voters, as long as SWS excluded the responses of non-registered respondents from its statistical computations. After all, 1,377 is greater than the minimum number of respondents under both Slovin's (1,112) and Cochran's (1,068) formulas.


Now, the reader may want to ask the question, "Why did SWS include non-voters in the first place?"

The answer is pretty simple: because there is no practical way to verify beforehand if a respondent is registered or not. Thus, when SWS randomly selected 1,500 people, it had no way of knowing which of them were voters. Basically, c'est la vie. That's life.

As far as the basic math is concerned, Thinking Pinoy believes the SWS survey is fairly credible, as its methodology doesn't really differ from that of other survey bodies like Pulse Asia, DZRH, and Laylo.

But there's a catch


However, Thinking Pinoy wants to raise the issue regarding the pre-set margin of error. Margin of error can be minimized by taking a larger sample, i.e. interviewing more respondents. 
  • Did SWS set the error margin at 3% out of financial constraints or worse, laziness, or,
  • Is the 3% error margin a deliberate attempt to make the leads look negligible? 
After all, 3% error means Duterte, the new frontrunner with 27%, is still statistically tied with Poe, who has 23%.
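
Here is a small Python sketch of what "statistically tied" means in this post: two candidates are treated as tied when their ranges (score plus or minus the error margin) overlap. That is the usual informal reading, not a formal significance test, and the helper names `interval` and `overlaps` are made up for illustration.

```python
def interval(score: int, margin: int) -> tuple:
    """Range of plausible values: score minus margin up to score plus margin."""
    return (score - margin, score + margin)


def overlaps(a: tuple, b: tuple) -> bool:
    """True if the two ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


MARGIN = 3  # percentage points, as reported by SWS

duterte = interval(27, MARGIN)
poe = interval(23, MARGIN)

print(duterte, poe, overlaps(duterte, poe))  # (24, 30) (20, 26) True
```

Duterte could be as low as 24% and Poe as high as 26%, so the survey alone cannot tell us who is really ahead.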

[Figure: Large error margins increase the likelihood of statistical ties. This image is for instructional purposes only and is not made to scale.]

Now, let's contrast SWS' 3% error margin with others:
  • DZRH uses 7,490 respondents so that error is at +/- 1.13%
  • Laylo uses 3,000 respondents so that error is at +/- 1.8%

How many respondents does SWS interview? Only 1,500, of whom just 1,377 are validated voters.
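
To see how sample size drives these numbers, we can rearrange Cochran's Formula to solve for e instead of n. The sketch below assumes all three pollsters use simple random sampling at 95% confidence with p = 0.5; that is an assumption made for this illustration, since each pollster's actual design may differ. The function name `margin_of_error` is likewise made up.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Cochran's Formula solved for e: e = z * sqrt(p (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


samples = {
    "DZRH": 7490,
    "Laylo": 3000,
    "SWS (validated voters)": 1377,
}

for name, n in samples.items():
    print(f"{name}: n = {n}, margin = +/- {margin_of_error(n) * 100:.2f}%")
```

Under these assumptions, the margins come out to about 1.13% for DZRH, 1.79% for Laylo, and 2.64% for SWS, which SWS reports as ±3 points.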
 
By late December, Tito Mahar [TP: SWS Survey] should have realized that unlike previous presidential elections, this one will probably be a tight race. Mahar is a lot smarter than Thinking Pinoy, so it's safe to say that he already knows that.

In the spirit of due diligence, Tito Mahar should have exerted some effort in refining SWS' methodology to account for this phenomenon. That is, he should have increased the sample size to minimize the margin of error.

But Tito Mahar did not.

Is SWS lazy?
Is SWS broke?
Or does SWS actually favor someone?
