shazino

Computational practices in scientific research

Thursday, 4 July 2013 in Science by Astrid Pellieux

Scientific research increasingly relies on computing. Statistical analysis and data modeling are just two examples of scientific activities that require ever more computing tools. To foster the growth of science and contribute to the Open Science movement, those tools should be accessible, modifiable and reusable without restriction by as many researchers as possible. These are notably key conditions for ensuring the reproducibility of a study. The Free Software and Open Source movements greatly contribute to lifting these restrictions by providing licenses that allow exactly that.

Logo Free Software Foundation. Source: http://www.fsf.org/resources/stickers

Free Software refers to computer programs that can be freely run, copied, studied, modified and shared without restriction. One precondition for this is access to the software's sequence of instructions: the source code. Free Software thus gives users the freedom to run programs as they wish. The word "free" refers to freedom, not price, so Free Software can still be used for commercial purposes. Consequently, Free Software must be distinguished from freeware, which is available at no cost but does not give access to its source code. Skype is a good example of freeware. To promote Free Software, Richard Stallman founded the Free Software Foundation (FSF) in 1985. Later, in 1998, to avoid the ambiguity between freedom and price, Eric Raymond and Bruce Perens founded the Open Source Initiative (OSI) to promote what they called Open Source software.

Today, both organizations defend everyone's right to access, modify and reuse software by approving licenses that comply with these ideals. One difference between them is that the OSI approves some licenses that are less permissive than those accepted by the FSF. Each organization publishes its own list of approved licenses.

As science advances, researchers rely on more and more computing solutions. Hence, to ensure the reproducibility of a study, it is necessary to share not only complete datasets and protocols, but also all computational material, such as source code. The licenses mentioned above make this possible. However, this is not yet common practice. To encourage researchers in this direction, organizations such as The OpenScience Project have emerged. Its aim is to facilitate the use of Open Source software in the scientific community by writing and releasing such software itself. Another example is the Science Code Manifesto, which lists five principles on the use of software in science as good practices to be followed. Publishers have also begun to get involved. Some ask that data and associated materials be made available on publication, without specifically requesting source code. Others state the requirement explicitly. For example, Science asks authors to make available to their readership all code used to create or analyze the data. Still others strongly recommend it, such as Biostatistics, which rates its published papers according to the availability of data and source code.

Logo Open Source Initiative. Source: http://commons.wikimedia.org/wiki, File: Opensource.svg

To help researchers share their Open Source and Free Software with the scientific community, several online services have appeared. GitHub is currently the best-known service, offering free hosting for Open Source projects written in any programming language. Others, such as JavaForge or RubyForge, provide good solutions depending on what you want to do. JavaForge, despite its name, hosts projects in any programming language, while RubyForge is dedicated to the Ruby language only. These platforms allow researchers to deposit their code in a repository that anyone can find. Researchers should then add a link to that repository in their scientific publications.