Wednesday, May 26, 2010

walking through assembler 1

Ultrasound or a high-pressure air stream randomly shatters the DNA into pieces.

These libraries provide a “clone coverage” of more than 20-fold, meaning that, on average, 20 clones span each of the genome’s bases, thus offering the theoretical guarantee that each base is contained in at least one of the clones. This guarantee assumes that clones are sampled uniformly at random from the genome. In practice, this requirement is seldom perfectly satisfied. Cloning biases lead to a nonrandom clone distribution, causing areas of the genome to remain unsequenced regardless of the amount of sequencing performed.
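To get a feel for how strong the 20-fold "guarantee" is, here is a small sketch. It assumes the usual Poisson approximation for uniformly random clones, under which the chance that a given base falls in no clone at coverage c is about e^(-c). The function names and the 3 Gb genome size are only illustrative choices, not something from the text above.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Chance that a given base lies in no clone at all.

    Assumes clone positions are uniformly random, so the number of clones
    covering a base is roughly Poisson with mean `coverage`.
    """
    return math.exp(-coverage)


def expected_uncovered_bases(genome_length: int, coverage: float) -> float:
    """Expected number of bases missed by every clone under the same assumption."""
    return genome_length * uncovered_fraction(coverage)


if __name__ == "__main__":
    genome_length = 3_000_000_000  # illustrative genome size (assumption)
    for c in (1, 5, 10, 20):
        print(c, uncovered_fraction(c), expected_uncovered_bases(genome_length, c))
```

Under this model the uncovered fraction at 20-fold coverage is tiny, so the regions that stay unsequenced in practice come from cloning bias rather than from too little sampling.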

The gaps between contigs belonging to the same
scaffold are called sequence gaps. Although they
represent genuine gaps in the sequence, researchers
can retrieve the original clone inserts spanning the
gap and use a straightforward “walking” technique
to fill in the sequence.

The gaps between scaffolds are called physical
gaps because the physical DNA that would span
them is either not present in the clone inserts or
indeterminable due to misassemblies. Filling these
gaps involves a large amount of manual labor and
complex laboratory techniques.

These limitations spurred the development of
new algorithms. Two approaches exploit techniques
developed in the field of graph theory: one
that represents the sequence reads as graph nodes
and another that represents them as edges.
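The two graph views can be sketched in a few lines. This is a toy illustration, not the code of any real assembler: `overlap_graph` treats each read as a node and links reads whose suffix and prefix match, while `de_bruijn_graph` breaks reads into k-mers and makes each k-mer an edge between its two (k-1)-mers. The function names, `min_overlap`, and `k` are assumptions made for the example.

```python
from collections import defaultdict


def overlap_graph(reads, min_overlap):
    """Reads are nodes; edge (a, b) -> n means the last n bases of a equal the first n of b."""
    edges = {}
    for a in reads:
        for b in reads:
            if a == b:
                continue
            # Try the longest proper overlap first and stop at the first match.
            for n in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                if a[-n:] == b[:n]:
                    edges[(a, b)] = n
                    break
    return edges


def de_bruijn_graph(reads, k):
    """Each k-mer in a read is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    reads = ["ACGTA", "GTACC"]
    print(overlap_graph(reads, min_overlap=3))
    print(de_bruijn_graph(reads, k=3))
```

The all-against-all comparison in `overlap_graph` is quadratic in the number of reads, which is fine for a demo but not for real data.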

Euler [9] detects repeats by finding complex areas, or
tangles, in the graph constructed during assembly.
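A very rough way to see where tangles come from: in a de Bruijn-style graph, a repeated stretch of sequence collapses onto the same nodes, so those nodes end up with more than one way in or out. The sketch below only flags such branching nodes. It is a simplification for illustration and not Euler's actual procedure; `branching_nodes` and `k` are names made up for this example.

```python
from collections import defaultdict


def branching_nodes(reads, k):
    """Return the (k-1)-mers that have more than one distinct successor or predecessor."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            left = read[i:i + k - 1]       # prefix (k-1)-mer of the k-mer
            right = read[i + 1:i + k]      # suffix (k-1)-mer of the k-mer
            successors[left].add(right)
            predecessors[right].add(left)
    nodes = set(successors) | set(predecessors)
    return sorted(
        node for node in nodes
        if len(successors.get(node, ())) > 1 or len(predecessors.get(node, ())) > 1
    )


if __name__ == "__main__":
    # "CGT" occurs twice in this toy sequence, so its node is entered and left two ways.
    print(branching_nodes(["ACGTTCGTA"], k=4))
```

Sequencing errors also create branches, so on real data a branching node is only a hint that a repeat may be present.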

Wednesday, May 19, 2010

Illumina paired end data to use in newbler

Preparation of Illumina paired-end data for use in gsAssembler
  • convert the Illumina .txt file to a standard .fastq file with the maq script fqall2std.pl --- perl fqall2std.pl sol2std test.txt test.fastq
  • convert the Illumina fastq to Sanger fastq --- maq sol2sanger test.fastq test1.fastq
  • convert the Sanger fastq to fasta and qual --- perl fqall2std.pl std2qual out.prefix test1.fastq
  • convert the headers of the fasta and qual files so that newbler can recognize them as paired end --- perl replacefasta.pl test1.fasta F lib > test1.fna --- perl replacefasta.pl test1.qual F lib > test11.qual
  • rename test11.qual to test1.qual
  • be sure your fna and qual files are in the same folder
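The header conversion step relies on replacefasta.pl, which is not shown in these notes. As a stand-in, here is a minimal Python sketch of what such a script might do. It assumes newbler picks up pairing from `template=`, `dir=` and `library=` annotations on the header line, and that the two mates are named with trailing `/1` and `/2`. Both points are assumptions, and `tag_header` and `tag_file` are hypothetical names, so check the output against your newbler version's manual before relying on it.

```python
import re


def tag_header(header: str, direction: str, library: str) -> str:
    """Append template/dir/library annotations to one FASTA or qual header line.

    Assumes the header starts with '>' followed by a read name, and that
    mates differ only by a trailing /1 or /2 (assumption, not from the notes).
    """
    name = header[1:].split()[0]
    template = re.sub(r"/[12]$", "", name)  # shared name for both mates
    return f">{name} template={template} dir={direction} library={library}"


def tag_file(in_path: str, out_path: str, direction: str, library: str) -> None:
    """Rewrite every header line of a FASTA or qual file; copy other lines unchanged."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(tag_header(line.rstrip("\n"), direction, library) + "\n")
            else:
                dst.write(line)
```

With this sketch, something like `tag_file("test1.fasta", "test1.fna", "F", "lib")` would play the role of the replacefasta.pl call on the fasta file, and the same call on the qual file keeps the two sets of headers in sync.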