What are the advantages of having Bioconductor, for the bioinformatics community?
I've read the 'About' section and skimmed the paper, but still cannot really answer this.
I understand Bioconductor is released twice a year (unlike R), but if I want to use the latest version of a package, I'll have to use the dev version anyway. A stamp of approval could be achieved much more easily with a tag or something, so it sounds just like an extra (and unnecessary) layer to maintain.
Related to this, what are the advantages as a developer to have your package accepted into Bioconductor?
- +1 … the idea of having centralised repositories is becoming more and more out of fashion because there's no clear evidence for its supposed benefits. – Konrad Rudolph, Jun 11, 2017 at 15:27
- Is the question "What are the advantages of having Bioconductor over CRAN?" or "What are the advantages of having any kind of central repository? Let's take Bioconductor as an example"? – Kamil S Jaron, Jun 11, 2017 at 21:42
- @KamilSJaron My question is neither, or both: What are the advantages of having Bioconductor, or any similar central repository, over CRAN? If anything, the focus is on having any R repo for bioinformatics, as opposed to a website where the approved packages are listed. – Peter, Jun 12, 2017 at 9:52
4 Answers
Benefits of central repository for Community
Having a central repository for packages is very useful, for a couple of reasons:
- It makes it very easy to resolve dependencies. Installing all the dependencies manually would be exhausting but also dangerous (see the next point).
- Package compatibility! If I install a package with dependencies, I would like to be sure that I install the correct versions of all the dependencies.
- Reliability thanks to unified and integrated testing.
Bioconductor is trying really hard to force developers to write good tests, and they also have people manually testing submitted packages. They also remove packages that are not maintained. Packages in Bioconductor are (reasonably) reliable.
In the end, installing dev versions of R packages is, in my opinion, very bad practice for reproducible science. If the developers delete the GitHub repo, the commit hash you used won't be enough to get the code.
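As a rough sketch of what installing from a release looks like in practice: the snippet below assumes the BiocManager package from CRAN, which is the current installer and postdates this answer (the `biocLite()` script was used at the time). The wrapper name `install_from_release` is made up for illustration.

```r
# Hypothetical helper (name chosen for this example): install Bioconductor
# packages so that every dependency comes from one and the same release.
install_from_release <- function(pkgs) {
    # BiocManager itself lives on CRAN; fetch it only if it is missing.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }
    # Installs pkgs plus their dependencies from the Bioconductor release
    # that matches the running R version.
    BiocManager::install(pkgs)
    # Reports installed packages that are out of date or too new
    # for that release.
    BiocManager::valid()
}
```

Calling `install_from_release("SummarizedExperiment")` needs network access; `BiocManager::version()` reports which Bioconductor release the current R installation is tied to.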
Benefits of central repository for developers
I forgot about the advantages for you as a developer of submitting your package to Bioconductor:
- Your package will be more visible
- Users will have a guarantee that your code was checked by a third person
- Your package will be easier for users to install
- Your package will be forced to use standardized vignettes, version tags and tests, so it will be more accessible for the community to build on your code
Bioconductor specific advantages over CRAN
I see the big advantage in the community support page provided by Bioconductor. See also @Llopis' comprehensive elaboration.
- I think you're missing: data packages. CRAN has quite strict file size limits (5 MB) and biology needs e.g. genomes. Your points 1 and 3 are debatable (1: bio yes, otherwise no; 3: I'd argue it's harder, needs an external installer). – Michael Schubert, Jun 12, 2017 at 17:41
- You do not want to write a package with genomic data stored inside anyway; we rather write packages for pulling data from databases. But point taken. Now I see I made a smart move when I numbered both the overall advantages for the community and the advantages for developers. – Kamil S Jaron, Jun 12, 2017 at 18:10
- OK, I turned the upper numbers into bullet points and made clear that I am speaking about central repositories in general. – Kamil S Jaron, Jun 12, 2017 at 18:32
- @KamilSJaron Regarding data packages, they're an established best practice in Bioconductor (see BSgenome), though I tend to agree: I dislike how the Bioconductor data packages clutter my disk. – Konrad Rudolph, Jun 13, 2017 at 10:16
Here is a list of the advantages of having Bioconductor for the bioinformatics community:
- Outreach: You have a repository for the field, in that language.
Some packages related to bioinformatics (in R) are distributed through personal repositories, CRAN, GitHub, Bitbucket or SourceForge, but they are less used and harder to find.
There are such efforts in other languages too: Biopython, Bioperl, Biojava, ...
It is also harder to find packages related to a given subject on CRAN. CRAN has no BiocViews, which are quite useful when looking for a method; the CRAN equivalent is optional and not usually filled in.
- Quality: In Bioconductor each package is tested on Linux, Windows, and macOS, to make sure it works on all major operating systems (with all the dependencies).
In some rare cases, like this one, a package is not supported on a certain platform, but you can find that out by checking the build report.
You are required to provide a vignette and examples for every exported element (and the vignette, examples and tests must all pass). The package must install cleanly, and the quality bar is stricter than on CRAN because there is a manual review (they pointed out a comment in one of my functions!).
They also provide Docker images of the base packages. You don't need to install the latest R version to use Bioconductor! Developers do need it, though (at least when their package is checked on the Bioconductor servers), to ensure that the package will keep working in the next R release.
- Reusing: Bioconductor provides the basic building blocks for a large number of applications.
For example, the SummarizedExperiment class is provided so that any package that needs a similar object can (should) use it. Likewise, GSEABase is the base package for gene set enrichment analysis (GSEA), providing functions, methods and classes for gene sets and collections, making it easier for anyone to create their own enrichment analysis.
It is easier to build upon the work of others if you know you are following the same quality standards.
- Support: Supporting Q&A is mandatory; the package maintainer must be registered on the webpage.
On CRAN, support is usually provided by each package in its own way, whereas in Bioconductor you can reach the maintainer and the users directly by posting in the same central place.
- Did you try submitting to CRAN? As far as I understand they also run tests and perform a package review; some people even complain that CRAN people are too pedantic. – Iakov Davydov, Jun 11, 2017 at 10:44
Regarding what the advantage is to you as a developer of having a Bioconductor package rather than using CRAN:
- There's a hierarchy of package quality, with Bioconductor on the top (followed by CRAN and then "random GitHub repos"). While there are many excellent CRAN packages, the average Bioconductor package is better tested and documented. So if I as a user have two different packages that I can use and one is on Bioconductor and the other on CRAN, then I use the Bioconductor package.
- Higher visibility. Since Bioconductor packages are held in higher regard, they also become more visible. Further, the different "views" (e.g., "Alternative Splicing" and "Transcription") make it convenient to find relevant packages. You can always search CRAN, but allowing tagging like this aids in discovery.
- I have been using a lot of packages from both CRAN and Bioconductor, and I haven't noticed that CRAN packages are worse in any sense. I even noticed that the documentation of some Bioconductor packages was worse (at least that was the state several years ago). – Iakov Davydov, Jun 11, 2017 at 10:16
- You've had nice luck with CRAN packages then. – Devon Ryan, Jun 11, 2017 at 10:17
- Pretty sure the quality hierarchy is wrong: if a package is maintained on GitHub, odds are its quality is in fact superior to that of the average CRAN or Bioconductor package. This might soon change once GitHub becomes more widespread amongst beginning R programmers, but I'm pretty certain it currently still holds. – Konrad Rudolph, Jun 11, 2017 at 15:28
- @DevonRyan My mistake, I meant "is hosted on". There's a risk it's unmaintained (but the same is true for centralised repositories), but the odds of quality given location = GitHub are still high. I'll grant that it's possible that the variance of quality is also higher on GitHub, but even for that I'm not sure. – Konrad Rudolph, Jun 11, 2017 at 19:22
- @KonradRudolph Most Bioconductor packages are maintained on GitHub (at least most of those I have used). I have a GitHub R package as well, but it is definitely not passing any tests; it works on my computer in the way that I use it. No, being on GitHub is not a sign of quality... but true, if the package is only there, nobody will know about it, so there won't be any bug reports. – Kamil S Jaron, Jun 11, 2017 at 21:45
The most important reason is that Bioconductor has a growing set of common data structures and base packages. If package X and package Y need to work with the same type of data, having a common data structure in the core Bioconductor package Z makes our lives so much easier. I could do something in package X, take out the results and keep working on my data with package Y. This works because I'm using versions of packages X and Y that are compatible with the data structure defined in package Z. The Bioconductor team makes sure that all packages use the common data structures and already existing packages where possible, so that we're all on the same page and so that people don't reinvent the wheel again and again.
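A minimal sketch of that idea in base R, using the S4 system from the methods package. The class `ToyExperiment` and the two functions are invented for illustration and stand in for "package Z", "package X" and "package Y"; real Bioconductor code would build on an existing class such as SummarizedExperiment instead.

```r
library(methods)

# "Package Z": defines the shared container (hypothetical class).
setClass("ToyExperiment",
         slots = c(counts = "matrix", condition = "character"))

# "Package X": scales each sample (column) to sum to 1 and returns
# an object of the same class.
normalise_counts <- function(x) {
    stopifnot(is(x, "ToyExperiment"))
    x@counts <- sweep(x@counts, 2, colSums(x@counts), "/")
    x
}

# "Package Y": computes the per-gene mean across samples, accepting
# the same class without any conversion step.
gene_means <- function(x) {
    stopifnot(is(x, "ToyExperiment"))
    rowMeans(x@counts)
}

# Two genes (rows) measured in two samples (columns).
counts <- matrix(c(1, 3, 2, 2), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
toy <- new("ToyExperiment", counts = counts,
           condition = c("ctrl", "treated"))

# The output of "package X" feeds straight into "package Y".
result <- gene_means(normalise_counts(toy))
```

Because both functions agree on the container, neither has to know anything about the other; they only depend on the class definition.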
Also, there's a sophisticated process around getting your package accepted to Bioconductor. This ensures that packages use data structures and functions already available in other Bioconductor packages (where reasonable), it ensures that packages are well written, that they have good documentation, and that they are well tested.