Controlling for Chance, Bias, and Confounding Variables

To validate a causal inference ("X is a cause of Y"; for example, "Air pollution causes asthma"), you first need to establish that the association is real, and then examine whether that association is one of cause and effect. To establish that an association is real (i.e., not spurious), you need to show that the association you observe is not due to:
  1. Chance (we have just stated that you can rule out the play of chance by using p-values and setting up hypothesis tests)
  2. Bias, which you will need to eliminate
  3. Confounding variables, which you will need to control for

Concept of Bias

The second step is to eliminate biases. Biases are systematic errors in the observation or conduct of a study. For a comprehensive review of biases, see this link. For example, Mustafa Al-Zoughool and colleagues conducted a case-control study (we will learn about these types of studies later) to test the association between exposure to environmental tobacco smoke (ETS) and lung cancer \cite{Al_Zoughool_2013}. To study this association, they interviewed 44 out of 1200 lung cancer patients from different hospitals in Montreal over four years, and randomly sampled 430 people free from lung cancer over the same period. They obtained information about exposure to ETS from proxy respondents. What problem could arise in assessing the relationship between ETS and lung cancer if information about exposure is collected this way? The information might not be equally reliable in the two groups: people with lung cancer who suspected that ETS was responsible for their disease would likely be more accurate, or overly cautious, in recalling and reporting ETS exposure than those in the control arm. Because the quality of the information differs between the two arms, this difference is likely to introduce a bias, referred to as "recall bias". Recall bias is a form of information bias. Another form of bias is "selection bias", where the investigators recruit participants differentially, based on the aims of the study and the theory that they would like to test.
For example, in the case-control study on the association between ETS and lung cancer, if the lung cancer patients were all recruited from neighbourhoods where they were likely to be heavily exposed to ETS (say, low-income neighbourhoods or neighbourhoods with high smoking rates), and the controls were recruited from neighbourhoods where they were less likely to be exposed to smoking, this would be an example of selection bias. You will need to pay attention to these possibilities when you read a research study.
You cannot eliminate selection and information biases, or examine their impact, once a study has been conducted; you can only eliminate biases at the planning stage of the study design. We will discuss this in the section on study designs.
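
To see why recall bias matters, the recall bias example above can be sketched numerically. The short Python sketch below uses entirely hypothetical numbers (they are not taken from the cited study), and the function names are made up for illustration. It assumes that cases and controls are truly exposed to ETS equally often, so the true odds ratio is 1, and that people who were not exposed never report exposure. The only thing that changes between the two scenarios is how completely each group reports its exposure.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def reported_counts(n, true_prevalence, sensitivity):
    """Expected (reported exposed, reported unexposed) counts in a group of size n.

    sensitivity is the share of truly exposed people who report their exposure.
    Unexposed people are assumed never to report exposure (a simplification).
    """
    reported_exposed = n * true_prevalence * sensitivity
    return reported_exposed, n - reported_exposed


# Hypothetical numbers, not taken from the cited study.
N_PER_GROUP = 400        # number of cases, and number of controls
TRUE_PREVALENCE = 0.5    # same true ETS exposure in both groups, so the true OR is 1

# Scenario 1: both groups recall equally well (80% of the exposed report it).
cases_equal = reported_counts(N_PER_GROUP, TRUE_PREVALENCE, 0.80)
controls_equal = reported_counts(N_PER_GROUP, TRUE_PREVALENCE, 0.80)
or_equal_recall = odds_ratio(*cases_equal, *controls_equal)

# Scenario 2: cases recall more completely (95%) than controls (70%).
cases_biased = reported_counts(N_PER_GROUP, TRUE_PREVALENCE, 0.95)
controls_biased = reported_counts(N_PER_GROUP, TRUE_PREVALENCE, 0.70)
or_recall_bias = odds_ratio(*cases_biased, *controls_biased)

print(f"Odds ratio with equal recall:        {or_equal_recall:.2f}")
print(f"Odds ratio with differential recall: {or_recall_bias:.2f}")
```

Under these assumptions, equal recall leaves the odds ratio at 1, while differential recall pushes it to roughly 1.7 even though ETS has no effect in this made-up setting. The apparent association comes entirely from the difference in how the two groups report their exposure.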

Concept of Confounding Variables

The third alternative explanation for an association between an exposure and a health outcome is confounding. A confounding variable is a variable that is associated with both the exposure of interest and the outcome of interest, but that does not lie on the causal pathway between them (see the following figure).