Primary Analysis Overview

Once sequencing is initiated, the system’s computational blade center performs real-time signal processing, base calling, and quality assessment. Primary analysis data, including read-length distribution, polymerase speed, and quality measurements, are streamed directly to the secondary analysis software. These data, as well as trace and pulse data, are also available through the RS Touch and RS Remote interfaces for quick assessment of a sequenced SMRT Cell.

What files are transferred to secondary storage from primary analysis on the RSII blade server?

Below is a typical directory hierarchy of files transferred from the primary analysis blade server to the secondary storage server:

/path/to/secondary/storage/2420294/0011
├── Analysis_Results
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.bax.h5
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.bax.h5
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.bax.h5
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.bas.h5
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.csv
│   └── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.mcd.h5
└── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.metadata.xml

1 directory, 20 files
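
The file names in the listing above follow a regular pattern, so a transferred directory can be tallied by file type. The following is a minimal Python sketch, not part of SMRT Analysis. The helper name `file_kind` is hypothetical, and the rule it applies (the movie name is everything before the first `.`, optionally followed by a numeric part index) is an assumption inferred from the example listing rather than taken from a format specification.

```python
from collections import Counter


def file_kind(filename):
    """Return (movie_name, kind) for a transferred file name.

    Assumption (inferred from the example listing only): the movie name
    contains no '.', and an optional numeric part index (1, 2, 3) may
    follow it before the file-type suffix.
    """
    movie, _, rest = filename.partition(".")
    parts = rest.split(".")
    if parts and parts[0].isdigit():
        parts = parts[1:]  # drop the part index, e.g. the "1" in ".1.bax.h5"
    return movie, ".".join(parts)


# A few names taken from the listing above.
movie = "m140415_143853_42175_c100635972550000001823121909121417_s1_p0"
names = [
    movie + ".1.bax.h5",
    movie + ".2.bax.h5",
    movie + ".3.bax.h5",
    movie + ".bas.h5",
    movie + ".metadata.xml",
]

# Count how many files of each kind are present.
counts = Counter(file_kind(name)[1] for name in names)
print(counts["bax.h5"], counts["bas.h5"], counts["metadata.xml"])
```

Applied to a full directory listing, a tally like this gives a quick way to spot an incomplete transfer, for example fewer `bax.h5` parts than expected.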

What files are required for importing SMRT Cells into SMRT Portal?

To import SMRT Cells into SMRT Portal, the above directory structure must be preserved. The minimum requirement for SMRT Cells to be recognized by SMRT Portal is the *.metadata.xml file and all *.bax.h5 and *.bas.h5 files. The bax.h5 files contain base call information from the sequencing run, and the bas.h5 file is essentially a pointer to the three bax.h5 files. The *.metadata.xml file contains top-level information about the data, including which sequencing enzyme and chemistry were used, the sample name, and other metadata. The *.mcd.h5 file is not strictly required.
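
Before attempting an import, it can be useful to confirm that a SMRT Cell directory contains the minimum files described above. The sketch below is an illustration only and is not SMRT Portal's own validation logic. The function name `missing_import_files` is hypothetical, and the file locations (the `*.metadata.xml` file at the top level, the `*.bas.h5` and `*.bax.h5` files under `Analysis_Results`) are assumed from the example hierarchy shown earlier.

```python
from pathlib import Path


def missing_import_files(cell_dir):
    """Return a list of problems; an empty list means the minimum files exist.

    Assumed layout (taken from the example hierarchy, not from SMRT Portal):
      <cell_dir>/*.metadata.xml
      <cell_dir>/Analysis_Results/*.bas.h5
      <cell_dir>/Analysis_Results/*.bax.h5
    """
    cell_dir = Path(cell_dir)
    results = cell_dir / "Analysis_Results"
    problems = []

    if not list(cell_dir.glob("*.metadata.xml")):
        problems.append("no *.metadata.xml file in " + str(cell_dir))

    if not results.is_dir():
        problems.append("missing Analysis_Results directory")
        return problems

    if not list(results.glob("*.bas.h5")):
        problems.append("no *.bas.h5 file in Analysis_Results")
    if not list(results.glob("*.bax.h5")):
        problems.append("no *.bax.h5 files in Analysis_Results")

    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_cell.py /path/to/secondary/storage/2420294/0011
    for problem in missing_import_files(sys.argv[1]):
        print(problem)
```

This check only looks at file names. It does not open the bas.h5 file to confirm that every bax.h5 file it points to is present.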

SMRT Pipe Job Directory Hierarchy

SMRT Pipe job output directories all share the same basic top-level layout:

$SMRT_ROOT/userdata/jobs/<JOB_PREFIX>/<JOB_ID>/
├── data/
├── log/
├── movie_metadata/
├── reference/
├── results/
├── workflow/
├── index.html
├── input.fofn
├── input.xml
├── job.sh
├── metadata.rdf
├── settings.xml
└── vis.jnlp
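
The layout above can be compared against an actual job directory. The following is a minimal sketch under the assumption that every entry in the listing is expected to exist. The names `EXPECTED_DIRS`, `EXPECTED_FILES`, and `missing_job_entries` are hypothetical and are not part of SMRT Pipe.

```python
from pathlib import Path

# Entries copied from the top-level job directory listing above.
EXPECTED_DIRS = ["data", "log", "movie_metadata", "reference", "results", "workflow"]
EXPECTED_FILES = [
    "index.html",
    "input.fofn",
    "input.xml",
    "job.sh",
    "metadata.rdf",
    "settings.xml",
    "vis.jnlp",
]


def missing_job_entries(job_dir):
    """Return the expected top-level entries that are absent from job_dir."""
    job_dir = Path(job_dir)
    missing = [d + "/" for d in EXPECTED_DIRS if not (job_dir / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (job_dir / f).is_file()]
    return missing
```

The caller supplies the full job path, for example the `$SMRT_ROOT/userdata/jobs/<JOB_PREFIX>/<JOB_ID>/` directory of a finished job. An empty return value means all listed entries were found.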

For more detail on specific protocol outputs, see [[Navigating the SMRT Pipe Job Directory]].

For more information on file format specifications, visit [PacBio DevNet](http://www.pacbiodevnet.com).