Why do we need to do reproducible research?
In this paper, we discuss the benefits and technical details of conducting reproducible research \cite{Begum_2016}. Reproducible research refers to the practice of making the data sources and methods available to others so that the steps of the research and the data analyses can be repeated. Other researchers can then test whether, using the same methods and data you did, they obtain similar results. If you do not share the data sources but you describe your methods and provide the source code, then the research can be replicated but not reproduced.
In public health, reproducibility of findings is particularly important for programme implementation and for testing the effectiveness of health interventions. For any policy-related issue, it is important to test whether there is a causal link between the exposure that policy makers would either control or promote and the downstream health effect. For example, imagine you are in charge of implementing a policy intended to reduce smoking. Before implementing any intervention, you need to establish that there is a cause and effect relationship between the intervention and cessation of smoking among the people for whom you will implement the programme. A cause and effect association not only assures you of the effectiveness of such an intervention; it also allows you to estimate, with some level of confidence, how many individuals would benefit if the programme were implemented. If there are competing programmes, it also lets you compare their effectiveness and thereby make a choice between them.
On the other hand, there are situations where, as a public health professional, you will be required to assess the health effect of an exposure. This is particularly important for environmental health issues. Say you are interested in identifying the health effects on people who are exposed to second hand smoke in your neighbourhood. The source of second hand smoke in your neighbourhood is environmental. You want to find out whether there is an association between second hand smoke and asthma, and to what extent asthma would increase among non-smokers exposed to second hand smoke. As in the case of an intervention, here too you would need access to research that enables you to assess the effects.
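As a minimal illustration of what ``to what extent'' could mean here, one common summary measure is the relative risk. The symbols below are introduced only for this sketch and are not taken from any particular study: let $a$ be the number of exposed non-smokers who have asthma out of $n_1$ exposed non-smokers, and let $c$ be the number of unexposed non-smokers who have asthma out of $n_0$ unexposed non-smokers. Then
\[
\mathrm{RR} = \frac{a / n_1}{c / n_0}.
\]
A value of $\mathrm{RR}$ above 1 would indicate that asthma is more frequent among the exposed, although an association of this kind does not by itself establish causation. If the underlying counts are shared, a reader can recompute this quantity directly from them.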
In each of the above two situations, you can assess the impact of the intervention or exposure using primary research data. However, in each case it would be important for you to assess the conditions or circumstances under which the research was conducted. If you have data of your own, it would be important for you to replicate the research that the other researchers have presented. Beyond these practical purposes, in order to assess the causal nature of the association, you would need to test whether the results are replicable. If the investigators were to share their original data, then the availability of raw and processed data would enable you to "reproduce" their results and have confidence in their study findings. In addition, you could test whether your own data, when pooled with theirs from a similar population, would alter the results and, if so, to what extent.
Therefore, reproducible research can enhance our own practice of research, help systematise discoveries, and increase the utility of our own work. But there are barriers. There are ethical limitations as to whether even de-identified data can be shared widely and in the public domain; some investigators would not want to share their patentable methods with others until a period of embargo has passed; and funders of research may impose restrictions before the raw data can be shared. There are yet other limitations, such as restrictions on opening work that is behind paywalls to public scrutiny.