README

I. Dependencies

RNAseg makes use of the Boost.Program_options library and requires at least version 1.20.
Furthermore, OpenMP is required to use the parallel computation capability of RNAseg.
If you use a recent version of gcc, OpenMP support is already included. Otherwise, follow the information at http://openmp.org or upgrade to a more recent gcc.

II. Compiling and Installing RNAseg

RNAseg is compiled and installed using the standard procedure:

	> ./configure
	> make
	> make install

III. Input file format

RNAseg needs a file containing the primary read starts, primary read coverage, secondary read starts, and secondary read coverage in tab-delimited format. The first row of this file corresponds to the first position in the genome/sequence being analyzed, the second row to the second position, and so on.
Example:

81	81	9	9
5	86	20	29
0	86	100	129
.	.	.	.
.	.	.	.
0	12	0	2000
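
As a minimal sketch of how such a file could be read and checked before running RNAseg, the following Python snippet parses the four tab-delimited columns. The function name read_grp and the field names are hypothetical and not part of RNAseg, and the sketch assumes that all values are integers, as in the example above.

```python
import io
from typing import List, NamedTuple, TextIO


class Position(NamedTuple):
    """One row of the input file (field names are illustrative only)."""
    prim_starts: int
    prim_cov: int
    sec_starts: int
    sec_cov: int


def read_grp(handle: TextIO) -> List[Position]:
    """Parse a four-column, tab-delimited file; row i describes position i."""
    rows = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate empty lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 tab-separated columns, "
                f"found {len(fields)}"
            )
        # Assumption: all values are integers, as in the README example.
        rows.append(Position(*(int(f) for f in fields)))
    return rows


# First three rows of the example above.
sample = "81\t81\t9\t9\n5\t86\t20\t29\n0\t86\t100\t129\n"
positions = read_grp(io.StringIO(sample))
```

Here positions[0] describes the first position of the analyzed sequence, positions[1] the second, and so on.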

You can convert a SAM file to this format using the Python script sam2grp.py in the helper folder.
The syntax is: sam2grp.py some.sam <insert_size>
    insert_size is only relevant for paired-end data.
For each reference sequence used, this script generates two files: one for the forward and one for the reverse strand.
The files are named according to the following scheme: some_REF-NAME_FWD/REV.grp

In order to get valid input for RNAseg, you need to apply the script to each dRNA-seq library individually. Afterwards, you can join the corresponding files using the "paste" command, e.g.:

	> paste some_seq1_prim_fwd.grp some_seq1_sec_fwd.grp > some_seq1_fwd.grp
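
The joining step can also be sketched in Python, which makes it easy to check that both files cover the same number of positions ("paste" itself does not report a mismatch). The function name paste_grp is hypothetical, and the sketch assumes that each per-library file holds two tab-delimited columns (read starts and coverage).

```python
import io
from typing import List, TextIO


def paste_grp(primary: TextIO, secondary: TextIO) -> List[str]:
    """Join two per-library files line by line, separated by a tab."""
    prim_lines = primary.read().splitlines()
    sec_lines = secondary.read().splitlines()
    # Extra safety check that plain "paste" does not perform.
    if len(prim_lines) != len(sec_lines):
        raise ValueError(
            f"files differ in length: {len(prim_lines)} vs {len(sec_lines)} rows"
        )
    return [f"{p}\t{s}" for p, s in zip(prim_lines, sec_lines)]


# Assumed two-column per-library files (read starts, coverage).
prim = io.StringIO("81\t81\n5\t86\n")
sec = io.StringIO("9\t9\n20\t29\n")
joined = paste_grp(prim, sec)
```

Each element of joined is one row of the combined four-column input file.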


IV. Helper Scripts

The helper directory provides three scripts:

* sam2grp.py: Converts a SAM file to the RNAseg input format.

* rnaseg_parts.pl: Splits a large analysis into small overlapping chunks and runs RNAseg on them. Additionally, it creates a so-called boundary file (see next).

* summarize_transcripts.pl: Takes the raw output of RNAseg and computes consensus transcripts. It can be fed with several RNAseg results simultaneously. If you used rnaseg_parts.pl to split a large computation, you should provide the created boundary file, since it is used to remove artifacts at the boundaries of the chunks.
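
To illustrate the general idea of splitting a sequence into overlapping chunks, here is a minimal Python sketch. It is not the logic of rnaseg_parts.pl: the function name and the chunk_size and overlap parameters are hypothetical, and the actual script may choose chunks differently.

```python
from typing import List, Tuple


def overlapping_chunks(length: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals covering 0..length.

    Consecutive intervals share `overlap` positions.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        chunks.append((start, end))
        if end == length:
            break  # the last chunk reaches the end of the sequence
        start += step
    return chunks


# Hypothetical values: a sequence of 10 positions, chunks of 4, overlap of 1.
chunks = overlapping_chunks(10, 4, 1)
```

The positions shared by neighbouring chunks are where results can disagree, which is why a boundary file is useful when the chunk results are merged.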

V. Example data for reproducing the results presented in the paper

Example data can be found in the example folder. To reproduce the results from the paper, run the run_test_data.sh shell script. For visualisation, you can download the genome of H. pylori 26695 from the NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Helicobacter_pylori_26695_uid57787/NC_000915.gbk) and load it into the Artemis genome browser (http://www.sanger.ac.uk/resources/software/artemis/). You can load the RNAseg_results as an additional entry via the File menu.

VI. Citing RNAseg

When using RNAseg in your work, please cite Bischler, T, Kopf, M and Voß, B (2014): Transcript mapping based on dRNA-seq data, BMC Bioinformatics.
