5 September 2012 - Lecture #3 (Surveys and experimental design)

At the end of the lecture we divided into groups of at most six people. Each group is to come up with a scientific question by next time.

Khan Academy on correlation vs. causality

Surveys


Primarily used when the goal is to determine what people on average think or believe about a topic. Surveys can be performed in different ways: face-to-face interview, questionnaire, phone, postal mail or electronic mail. Reasons for choosing one over another could be cost, time constraints, reachability and privacy/anonymity.

There are a lot of questions surveys cannot answer. A survey (if conducted correctly) can only tell what people think about a subject; it cannot prove that they are right. The open question remains: "Does the majority of people possess the correct answer?"

A face-to-face interview is costly, takes a lot of time, does not reach many people (in a given time) and has low anonymity. Still, it is very precise and thorough. The interviewer also gets the non-verbal cues.
Electronic questionnaires can be done in a short time and might reach a lot of people. There is no need for manual entry of data since it is already in electronic form. The downsides are lower quality and a lower share of those asked actually answering. Anonymity can be hard to achieve.

Respondent-related challenges (demand characteristics)
Respondents will interpret the intent behind the questions: how the result will be used and what the consequences might be. Because of that they may give answers:

  • in "your favor" or against you, depending on personal ambition.
  • that are not true, out of fear of being honest (they answer what they think is expected or socially desirable).

It is important that questions are easy to understand. They should therefore be short and not contain several questions at once. Ordering matters (positive or negative questions first), and leading questions should be avoided. If multiple choice: are all possible answers listed? Avoid uncommon terminology.

Methods of control (avoid demand characteristics)

  • Blind (or double blind) methods: The respondent knows they are participating but does not know whether they are being measured or not. In a double blind method this also goes the other way around: the person conducting the study does not know either.
  • False information: Basically lie; measure something other than what the participant expects. Don't tell what the goal of the experiment is.
  • Hidden or secret experiment: The participants don't even know they are taking part in an experiment.

Sampling errors

Example: a poll for the American presidential election of 1936 (the Literary Digest poll). 10 million voters were asked and 2.4 million answered. The Republican (Landon) was predicted to be the clear winner. In reality the Democrat (Roosevelt) won by a large margin. This took place during the Great Depression. The people chosen to participate in the poll were wealthier than average since they were picked from:

  • Phone registers
  • Car owner registers
  • Membership in organizations (like their own readers)

The result was biased because the Republican Party had a greater share of business people, and hence of the wealthier part of the population, among its voters. Only 24% of those asked actually participated, which opens for self-selection as well.
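
The effect of a skewed sampling frame can be illustrated with a small simulation. This is only a sketch: the share of wealthy voters, their preferences and their chance of appearing in a register are made-up assumptions, not figures from the 1936 poll, and `simulate_poll` is a hypothetical helper.

```python
import random

def simulate_poll(n_population=100_000, seed=0):
    """Compare the true Landon share with the share seen in a biased frame."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_population):
        wealthy = rng.random() < 0.30           # assumption: 30% are wealthy
        p_landon = 0.65 if wealthy else 0.30    # assumption: made-up preferences
        population.append((wealthy, rng.random() < p_landon))

    true_share = sum(vote for _, vote in population) / len(population)

    # Assumption: wealthy voters are far more likely to appear in phone and
    # car registers, so they are over-represented in the sampling frame.
    frame = [(w, vote) for w, vote in population
             if rng.random() < (0.90 if w else 0.20)]
    frame_share = sum(vote for _, vote in frame) / len(frame)
    return true_share, frame_share

true_share, frame_share = simulate_poll()
print(f"true Landon share:  {true_share:.1%}")
print(f"share in the frame: {frame_share:.1%}")
```

Under these assumptions the frame points to a Landon majority even though he is well below 50% in the whole population, and asking more people from the same frame does not help.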

Example: the TV debate show Holmgang. "Norway should close the border for migrants from distant countries" was the question put to television viewers after the debate. About 90% agreed and 10% disagreed; 40,000 participated. A little while later the polling firm Opinion asked the same question by phone to 600 randomly selected participants. In that study only 16% agreed while 77% disagreed (7% not sure). Why?
In the first "study" the participants were self-selected. Maybe only those with strong opinions chose to answer? 40,000 is roughly 0.8% of the population. Maybe the viewers were strongly affected by what was said during the show? Maybe they were better informed after listening to the debate?
The later study was random and also included non-viewers of the show. The main problem with this study could be that respondents hid their true opinion since they were not anonymous (answering what they think is socially desirable).
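
To see that 600 respondents is not in itself the weak point, one can estimate the sampling uncertainty of the phone poll. The sketch below assumes a simple random sample and uses the usual normal approximation for a 95% interval (z = 1.96); neither assumption comes from the lecture, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.16, 600)
print(f"16% +/- {moe:.1%}")
```

The margin comes out at roughly three percentage points, so random sampling noise cannot explain the gap between 16% and 90%. The difference has to come from who answered and how they were asked.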

The correct choice of sample (the participants chosen from the whole population) is much more important than the sample size itself. Important factors are gender, age, and social, political, educational, economic, religious and geographical factors. Methods are divided into probability and non-probability methods:

Probability

  • Random
  • Stratified (still random, but the population is divided into sub-groups and each sub-group gets a fair share of the sample)

Non-probability

  • Convenience (those we have access to - gather as much information as possible to be able to control for sampling errors)
  • Judgement (an expert chooses a representative sample)
  • Quota (like stratified with sub-groups, but not randomly chosen - fill a quota)
  • Snowball (to study rare phenomena: find one subject and ask that subject to point to like-minded others)
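
The difference between random and stratified sampling can be sketched in a few lines. The population, the "urban"/"rural" strata and both function names are hypothetical; each stratum gets a share of the sample proportional to its size, which is one common reading of "fair share".

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, n)

def stratified_sample(population, n, stratum_of, rng):
    """Sample randomly within each stratum, proportional to stratum size."""
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n in general.
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

rng = random.Random(1)
# Hypothetical population: 600 urban and 400 rural people.
population = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(population, 100, lambda person: person[0], rng)
urban = sum(1 for region, _ in sample if region == "urban")
print(urban, len(sample) - urban)
```

The stratified sample always contains 60 urban and 40 rural people, while a simple random sample of 100 only gets close to that split on average.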

Experimental design

In a true experiment the goal is to say something about the effect of the independent variables with a high degree of certainty. To ensure external validity one has to make sure the sample is large enough and representative of the target population. To ensure internal validity the sample subjects are divided into control group(s) and experimental group(s). Everything about the different groups has to be equal, and this is ensured by random assignment to the groups and by pretests. The treatment is given to the experimental group while the control group gets a "neutral" or placebo treatment. It is often important that the subjects don't know which group they belong to (blind experiment), and sometimes not even the experimenters should know (double blind).
If everything is equal in the groups at the pretest, and the experimental groups show a change while the control groups show no change (or the experimental groups show relatively more change than the control groups), we can conclude that the independent variable is the cause and the change in the dependent variables the effect(s).
The problem is keeping everything else constant during the experiment; uncontrolled factors that differ between the groups are called confounding variables. There are always differences between individual subjects, and even the belief that one is getting a treatment can have an effect: the placebo effect, or its inverse, the nocebo effect.
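
A minimal sketch of the pretest/posttest logic with a control group, assuming subjects are split at random into two equally sized groups. The function names and the way the effect is summarised (difference in mean change) are illustrative assumptions, not a prescribed analysis.

```python
import random
from statistics import mean

def assign_groups(subjects, seed=None):
    """Randomly split subjects into an experimental and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

def estimated_effect(pre, post, groups):
    """Mean change in the experimental group minus mean change in the control group."""
    change = {name: mean(post[s] - pre[s] for s in members)
              for name, members in groups.items()}
    return change["experimental"] - change["control"]
```

Here `pre` and `post` are assumed to be dictionaries mapping each subject to its pretest and posttest score. Subtracting the change in the control group removes whatever would have happened anyway, for example through the placebo effect.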

Solomon four group: Does the pretest itself alter the result? Subjects are assigned to four groups: two control and two experimental groups. The first pair (one experimental, one control) gets a pretest, then the experiment is performed, then a posttest is run. The last two groups do not get the pretest, which makes it possible to find out whether it matters. All four groups are posttested.
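
The four groups can be written out as a small table in code. The group labels A to D, the helper `pretest_effect` and the posttest means are hypothetical and only show which groups are compared.

```python
# Solomon four-group layout: every group gets the posttest.
SOLOMON_GROUPS = {
    "A": {"pretest": True,  "treatment": True},
    "B": {"pretest": True,  "treatment": False},
    "C": {"pretest": False, "treatment": True},
    "D": {"pretest": False, "treatment": False},
}

def pretest_effect(posttest_mean):
    """Posttest difference between pretested and non-pretested groups,
    separately for the treated and the control condition."""
    return {
        "treated": posttest_mean["A"] - posttest_mean["C"],
        "control": posttest_mean["B"] - posttest_mean["D"],
    }

# Hypothetical posttest means, only to show the comparison.
print(pretest_effect({"A": 70, "B": 60, "C": 65, "D": 55}))
```

If both differences are clearly non-zero, as in this made-up data, the pretest itself has influenced the posttest result.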

Confounding variables can be uncontrolled events happening between tests, and the subjects could mature in the time between tests (causing a natural improvement that does not depend on the independent variable). The fact that the subjects are pretested (have taken a similar test before) could lead to a better result on the end test. Instruments (when doing measurements) could change (become less accurate, or be set up in a slightly different way).
One thing to be aware of is that a single test can give extreme values that are not typical for that particular test subject; on a retest such subjects tend to score closer to the average (regression towards the mean). In addition it is important to choose subjects randomly. Otherwise one could get a biased selection or even self-selection. When subjects drop out of an experiment one also has to consider the possibility that different groups have different chances of dropping out.
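
The point about extreme values on a single test can be illustrated with a simulation in which nothing at all is done to the subjects between two tests. All numbers (ability and measurement noise both normally distributed) are assumptions chosen for illustration, and `retest_extremes` is a hypothetical helper.

```python
import random
from statistics import mean

def retest_extremes(n=10_000, top_fraction=0.10, seed=0):
    """Pick the top scorers on test 1 and see how they do on test 2."""
    rng = random.Random(seed)
    ability = [rng.gauss(100, 10) for _ in range(n)]    # assumed true level
    test1 = [a + rng.gauss(0, 10) for a in ability]     # level + noise
    test2 = [a + rng.gauss(0, 10) for a in ability]     # new, independent noise
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]
    return (mean(test1[i] for i in top),
            mean(test2[i] for i in top),
            mean(test2))

first, second, overall = retest_extremes()
print(f"top group on test 1: {first:.1f}, on test 2: {second:.1f}, everyone: {overall:.1f}")
```

The top group scores lower on the second test although nothing happened to them, so a change in a group selected for its extreme scores should not be read as a treatment effect without a control group.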

Often true experiments are not possible to conduct. They might be too resource demanding or not morally acceptable to perform. Different designs can be used to compensate for some factors and get close to a true experiment. Some designs:

  • Pre-experimental design: One has two groups that are not randomly selected, performs the experiment on one group and tests both to look for changes. This method cannot establish causality, since any change in the dependent variables could be due to any number of reasons, including the independent variable. It can be used to help formulate hypotheses that can be confirmed with a true experiment later on.
  • Time series design: A cheaper method with fewer participants, but it might take longer to perform. Several tests are run before and after the stimulus. The stimulus can be introduced at different times for different participants. One use is single subject design: when there is no one to compare with, it is necessary to establish a known baseline, apply the "treatment" and look for changes. Often one wants to go back to the baseline situation to see that the old result returns, and then confirm again by applying the treatment. (See page 241 in Practical Research.)
  • Within subject design: The goal is to introduce the independent variable (the experimental effect) to all participants, but at different times, to see that the effect on the dependent variables occurs in all subjects, while still using some of them as controls when they are not being treated.
  • Factorial design: When there are many independent variables, one can have random groups that receive different independent variables and/or combinations of them.
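
For the factorial design, the groups are simply all combinations of the levels of the independent variables. A minimal sketch, with made-up factors ("caffeine", "sleep") and a hypothetical helper that deals shuffled subjects out evenly over the combinations:

```python
import random
from itertools import product

def factorial_design(factors, subjects, seed=None):
    """Randomly assign subjects to every combination of factor levels."""
    cells = list(product(*factors.values()))
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {cell: [] for cell in cells}
    for index, subject in enumerate(shuffled):
        assignment[cells[index % len(cells)]].append(subject)
    return assignment

# Hypothetical factors: 3 levels x 2 levels = 6 groups.
factors = {"caffeine": ["none", "low", "high"], "sleep": ["normal", "deprived"]}
design = factorial_design(factors, range(24), seed=2)
for cell, members in design.items():
    print(cell, sorted(members))
```

The number of groups grows multiplicatively with each added variable, which is why factorial designs quickly need many participants.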

Cohort effect
Arises when trying to measure a natural process that takes a long time at a single point in time. The example used is height versus age in Japan. Because of the war and the poor nutrition at that time, older people tend to be shorter. A graph of height versus age might then suggest it is normal to shrink dramatically at a certain age.
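
The cohort effect can be reproduced with a toy model in which nobody ever shrinks. The height formula is a made-up assumption (later birth cohorts are taller), not Japanese data, and the function names are hypothetical.

```python
def adult_height_cm(birth_year):
    """Hypothetical model: people born later grow taller as adults."""
    return 155.0 + 0.15 * (birth_year - 1920)

def cross_section(survey_year, ages):
    """Measure people of different ages at one point in time."""
    return {age: adult_height_cm(survey_year - age) for age in ages}

def follow_one_cohort(birth_year, ages):
    """Measure the same birth cohort at different ages."""
    return {age: adult_height_cm(birth_year) for age in ages}

ages = [30, 50, 80]
print(cross_section(2012, ages))      # height seems to fall with age
print(follow_one_cohort(1932, ages))  # but no individual actually shrinks
```

The cross-section mixes age with year of birth, so only following the same people over time separates the two.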