InfoPlatter: 2014

Saturday 5 April 2014

Bioinformatician's Pocket Reference !!

It is amusing how the brains of bioinformaticians work! Spending days learning a new programming language feels like much more fun than a five-minute chat with the neighbours (unless under special circumstances!) in our own mother tongue. Today every bioinformatician keeps more than a few languages and core IT toolkits on their plate. It has become mandatory to be able to mould different code snippets into our own custom workflows, so keeping syntax at our fingertips has become essential.

Although Google is the best way to get a syntax problem solved, it is not a bad idea to keep reference sheets on our smartphones or to stick some printed sheets on the back of the door, the old-fashioned way!!

1) Apache

2) Awk/Gawk

3) C

4) C++

5) Debian

6) Git

7) HTML

8) Java

9) Mathematica

10) Matlab

11) MySQL

12) Perl

13) PHP

14) Python

15) Screen

16) Ubuntu

17) UNIX

18) Vim

These are handpicked reference sheets, and you may encounter various other versions of them on the Internet. If you find a version of a reference sheet that is worth sharing, feel free to paste the link below.

Finally, I sincerely thank the authors who put their effort into designing these informative reference sheets and made them available to us.

Thursday 3 April 2014

Gene Ontology (GO) Enrichment Analysis in Novel Transcriptomes using BiNGO!!

A major hurdle when dealing with differentially expressed transcripts in novel organisms is Gene Ontology (GO) enrichment analysis and its visual interpretation.

To date, several open-source applications are available to extract GO terms corresponding to protein/nucleotide sequences (a detailed list can be accessed here; however, the best I have experienced for de novo transcripts is InterProScan) and to perform enrichment analysis (a detailed list is here). Most of these enrichment tools work like a charm for model organisms, but only a handful of them support the incorporation of custom annotations. One such tool is BiNGO (Biological Networks Gene Ontology tool), an open-source Java plug-in for Cytoscape. BiNGO can be used either on a list of genes or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set onto the GO hierarchy.

In order to use BiNGO for novel organisms, one needs to provide a custom annotation file (CAF). In principle, the CAF contains the gene/transcript-to-GO relationships, with one relationship per line, e.g.

XLOC_000001=0005515
XLOC_000001=0008270
XLOC_000001=0016491
XLOC_000002=0055114
XLOC_000003=0016491
.
.
XLOC_999999=9999999

The left value is the transcript name and the right value is the GO category (without the 'GO:' prefix) obtained using InterProScan or an equivalent tool.

The first line of the CAF should always be:

(species=Custom_species)(type=Biological Process)(curator=GO)

You can change the species name from "Custom_species" to something else. Once the CAF (CAF.txt) has been built for all the annotated transcripts, it can be used under "Select organism/annotation" by choosing the "Custom" option (as shown in the figure below).
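
As a rough illustration, the annotation file described above could be assembled with a few lines of Python. This is only a sketch under stated assumptions: the function name `build_caf_lines` and the in-memory input (pairs of transcript name and GO terms) are hypothetical, and how you get those pairs out of InterProScan or another annotation tool is left to you. The header line and the `transcript=GO number` layout follow the example given in this post.

```python
def build_caf_lines(annotations, species="Custom_species",
                    go_type="Biological Process", curator="GO"):
    """Return the lines of a BiNGO-style custom annotation file.

    `annotations` is an iterable of (transcript_name, go_terms) pairs.
    This input layout is an assumption made for the sketch.
    """
    # Header line, in the form shown in the post.
    lines = [f"(species={species})(type={go_type})(curator={curator})"]
    seen = set()
    for transcript, go_terms in annotations:
        for term in go_terms:
            go_id = term.strip()
            # Drop the 'GO:' prefix, since the file expects the bare number.
            if go_id.upper().startswith("GO:"):
                go_id = go_id[3:]
            pair = (transcript, go_id)
            # Skip empty terms and repeated transcript/GO pairs.
            if go_id and pair not in seen:
                seen.add(pair)
                lines.append(f"{transcript}={go_id}")
    return lines


# Toy input mirroring the example relationships listed earlier.
example = [
    ("XLOC_000001", ["GO:0005515", "GO:0008270", "GO:0016491"]),
    ("XLOC_000002", ["GO:0055114"]),
    ("XLOC_000003", ["GO:0016491"]),
]

caf_lines = build_caf_lines(example)
print("\n".join(caf_lines))
```

Writing `"\n".join(caf_lines)` to a text file gives something that can be loaded through the "Custom" option; keeping the GO numbers as strings preserves their leading zeros.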

Additionally, one can switch to a newer ontology (obo) file downloaded from the geneontology.org download page. After providing the gene list of interest and choosing the appropriate options, hit the "Start BiNGO" button to start the analysis.

Cytoscape together with BiNGO offers several downstream network grooming options, which you may find useful. For more on this, visit BiNGO and Cytoscape user guides. Hope this helps in your endeavor.

Monday 3 February 2014

Docear: For Scientific Literature Management

In research we encounter numerous useful articles, and this plethora of literature demands methodical management. While I was struggling to find an efficient way to put all my articles into perspective, I came across 'Docear', which is described as a free and open-source academic literature suite. After experiencing the simplicity and benefits of using Docear in academic projects, I was pleasantly surprised and equally tempted to write about it.

The most appealing features of Docear include:
1) A simple user interface
2) The ability to create mind maps and link documents directly
3) Seamless integration of PDFs along with their headings and custom annotations
4) Powerful search
5) Reference management (including support for MS Word)
6) Support for Windows, Mac and Unix

A more detailed list of features is available on the official website. The website also offers tutorials and screenshots, which are very straightforward. I find this open-source project promising and hope it receives timely updates in future.

Many kudos to the Docear team for developing this truly resourceful software suite.

Saturday 11 January 2014

Creating 'Swap' ['Virtual'] Memory on Linux/Unix Operating System

Here's some help for when you have too little RAM/memory and are trying to do memory-intensive steps, like indexing the human genome reference or doing other NGS-related processing.

The way to do it is to create a 'swap file', as follows:

1) Check disk/drive usage:

2) Create the space. This step takes a long time if you select a large amount of space. In this example, 512MB is created under root (/), given that the block size (bs) is 1024 bytes:
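
As a quick check of the arithmetic in this step, here is a minimal Python sketch that works out how many 1024-byte blocks make up 512MB and prints a matching `dd` command. The helper name `dd_block_count` and the path `/swapfile` are placeholders chosen for illustration, not taken from this post, and the script only prints the command; it does not run it.

```python
def dd_block_count(size_mb, block_size_bytes=1024):
    """Number of blocks of `block_size_bytes` needed to hold `size_mb` MB.

    1 MB is taken as 1024 * 1024 bytes here.
    """
    total_bytes = size_mb * 1024 * 1024
    if total_bytes % block_size_bytes:
        raise ValueError("size is not a whole number of blocks")
    return total_bytes // block_size_bytes


count = dd_block_count(512)      # 512 MB in 1024-byte blocks
swap_path = "/swapfile"          # hypothetical file name under root (/)
command = f"dd if=/dev/zero of={swap_path} bs=1024 count={count}"

print(count)
print(command)
```

For 512MB and a block size of 1024 bytes, the count works out to 524288 blocks; doubling the size doubles the count, which is why large swap files take longer to create.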

3) Switch it on:

*This will only last until you restart the operating system, so it is useful as a temporary measure. To make the swap file permanent, do the following:

4) Open the following file:

Paste in the following: