# Quick Start
Let's run through the entire process start to finish.
## Useful Snakemake Flags

Two useful Snakemake flags that you can add are:

- `--dryrun` or `-n`: do a dry run of the workflow, but do not actually run any commands
- `--printshellcmds` or `-p`: print the shell commands that are being executed (or that would be executed, if combined with `--dryrun`)
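For example, to preview everything a workflow would do without running anything, you can combine both flags (the config file and targets here are the ones created later in this guide):

```bash
# Dry run: print the shell commands the workflow would run, without executing them
snakemake -n -p \
    --configfile=readfilt.json \
    read_filtering_pretrim_workflow read_filtering_posttrim_workflow
```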
## Read Filtering
We will run two variations of the read filtering workflow, and perform a quality assessment of our reads both before and after quality trimming.
Before you begin, make sure you have Singularity installed as in the Installing documentation.
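A quick sanity check that the required tools are available on your command line (version numbers will vary):

```bash
# Verify that Singularity and Snakemake are installed and on the PATH
singularity --version
snakemake --version
```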
Start by cloning a copy of the repository:

```bash
git clone https://github.com/dahak-metagenomics/dahak
```

then move into the `workflows/` directory of the Dahak repository:

```bash
cd dahak/workflows/
```
Now create a JSON file that defines a Snakemake configuration dictionary. This file should:
- Provide URLs at which each read filtering file can be accessed
- Provide a set of quality trimming values to use (2 and 30)
- Specify which read files should be used for the workflow
- Specify a container image from the biocontainers project to use with Singularity
- Set all read filtering parameters
(See the Read Filtering Snakemake page for details on these options.)
Copy and paste the following:
```bash
cat > readfilt.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "read_filtering_pretrim_workflow" : {
            "sample" : ["SRR606249_subset10"]
        },
        "read_filtering_posttrim_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
This creates a workflow configuration file `readfilt.json` that will download the example data files and configure one or more workflows. We will run two workflows, a pre-trimming quality assessment and a post-trimming quality assessment, so we call Snakemake and pass it two build targets: `read_filtering_pretrim_workflow` and `read_filtering_posttrim_workflow`.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=readfilt.json \
    read_filtering_pretrim_workflow read_filtering_posttrim_workflow
```
This command outputs FastQC reports for the untrimmed reads and for the reads trimmed at each quality cutoff, along with the trimmed paired-end (PE) reads and the orphaned reads. All files are placed in the `data/` subdirectory.
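A quick way to spot-check the results is to list the `data/` directory; the exact filenames depend on the sample name and quality cutoffs, so the patterns below are illustrative:

```bash
# FastQC reports for untrimmed and trimmed reads (name patterns are illustrative)
ls data/*_fastqc.zip data/*_fastqc.html
# Trimmed paired-end reads and orphaned reads
ls data/*.fq.gz
```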
## Assembly
We will run an assembly workflow using the Megahit assembler, as implemented in Dahak.
(See the Assembly Snakemake page for details on these options.)
Create a JSON file that defines a Snakemake configuration dictionary:
```bash
cat > assembly.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "assembly_workflow_megahit" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
To run the assembly workflow with Megahit, we call Snakemake with the `assembly_workflow_megahit` target.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=assembly.json \
    assembly_workflow_megahit
```
## Comparison
In this section we will run a comparison workflow to compute sourmash signatures for both filtered reads and assemblies, and compare the computed signatures to a reference database.
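For context, the comparison step is built on sourmash's `compare` command; a minimal standalone sketch, with hypothetical signature filenames, looks like this:

```bash
# Compare two signatures at k=21 and write pairwise similarities to CSV;
# the .sig filenames here are hypothetical placeholders
sourmash compare -k 21 --csv compare_k21.csv \
    reads_k21.sig assembly_k21.sig
```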
Create a config file (see the Comparison Snakemake page for details on these options). Copy and paste the following:
```bash
cat > comparison.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "comparison_workflow_reads" : {
            "kvalue" : ["21","31","51"]
        }
    }
}
EOF
```
Now, run the `comparison_workflow_reads` workflow:
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=comparison.json \
    comparison_workflow_reads
```
## Taxonomic Classification

### Taxonomic Classification with Sourmash
There are a number of taxonomic classification workflows implemented in Dahak. In this section we cover the use of the sourmash tool for taxonomic classification.
Before you begin, make sure you have everything listed on the Installing page available on your command line.
There are two taxonomic classification build rules that use sourmash:

- `taxonomic_classification_signatures_workflow`
- `taxonomic_classification_gather_workflow`
#### Signatures Workflow
The signatures workflow uses sourmash to compute k-mer signatures from read files. This is essentially the same as the compute signatures step in the comparison workflow.
(See the Taxonomic Classification Snakemake page for details on this workflow.)
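For a rough idea of what the workflow runs for each read file, a standalone sourmash invocation looks something like this (the parameters and filenames are illustrative rather than the workflow's exact settings, and newer sourmash releases replace `compute` with `sketch dna`):

```bash
# Compute k-mer signatures at k=21,31,51 for one trimmed read file;
# the --scaled value and filenames are illustrative
sourmash compute -k 21,31,51 --scaled 10000 \
    SRR606249_subset10_1_reads_trim2.fq.gz \
    -o SRR606249_subset10_1_reads_trim2.sig
```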
```bash
cat > compute.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "taxonomic_classification_signatures_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=compute.json \
    taxonomic_classification_signatures_workflow
```
#### Gather Workflow
The gather workflow uses sourmash's `gather` command to compare the signatures computed from read files against signatures stored in a genome database.
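Standalone, `sourmash gather` takes a query signature and a database, roughly like this (the signature and database filenames are placeholders):

```bash
# Report which reference genomes in the database are contained
# in the sample's signature; both filenames are placeholders
sourmash gather -k 31 \
    SRR606249_subset10_trim2.sig \
    genbank-k31.sbt.json
```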
Create a JSON file for the taxonomic classification gather workflow that defines a Snakemake configuration dictionary:
```bash
cat > gather.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "taxonomic_classification_gather_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
To run the gather workflow, we call Snakemake with the `taxonomic_classification_gather_workflow` target.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=gather.json \
    taxonomic_classification_gather_workflow
```
### Taxonomic Classification with Kaiju
There are several taxonomic classification workflows in Dahak that use the Kaiju tool as well. This section covers those workflows.
There are three taxonomic classification build rules that use kaiju:

- `taxonomic_classification_kaijureport_workflow`
- `taxonomic_classification_kaijureport_filtered_workflow`
- `taxonomic_classification_kaijureport_filteredclass_workflow`
#### Kaiju Report Workflow
Create a JSON file that defines a Snakemake configuration dictionary:
```bash
cat > taxkaiju.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "workflows" : {
        "taxonomic_classification_kaijureport_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
To run the taxonomic classification workflow and generate a kaiju report, we call Snakemake with the `taxonomic_classification_kaijureport_workflow` target.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=taxkaiju.json \
    taxonomic_classification_kaijureport_workflow
```
#### Kaiju Filtered Species Report Workflow
The filtered kaiju workflow filters out species whose reads make up less than N% of the total reads, where N is a percentage threshold set by the user.
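For reference, kaiju exposes this threshold directly; a standalone call with a 1% cutoff looks roughly like this (file paths are placeholders, and newer kaiju releases replace `kaijuReport` with `kaiju2table`):

```bash
# Summarize kaiju output at the genus level, keeping only taxa with
# at least 1% of total reads (-m); all paths are placeholders
kaijuReport -t nodes.dmp -n names.dmp \
    -i sample.kaiju.out \
    -r genus -m 1 \
    -o sample.kaiju.summary
```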
Copy and paste the following:
```bash
cat > taxkaiju_filtered.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "taxonomic_classification" : {
        "filter_taxa" : {
            "pct_threshold" : 1
        }
    },
    "workflows" : {
        "taxonomic_classification_kaijureport_filtered_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
To run the taxonomic classification filtered report workflow, we call Snakemake with the `taxonomic_classification_kaijureport_filtered_workflow` target.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=taxkaiju_filtered.json \
    taxonomic_classification_kaijureport_filtered_workflow
```
#### Kaiju Filtered Species by Class Report Workflow
The last workflow performs the same filtering, but also controls the taxonomic rank at which kaiju reports results. It uses the "genus" taxonomic rank by default.
Copy and paste the following:
```bash
cat > taxkaiju_filteredclass.json <<EOF
{
    "files" : {
        "SRR606249_subset10_1_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" : "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a"
    },
    "taxonomic_classification" : {
        "kaiju_report" : {
            "taxonomic_rank" : "genus"
        }
    },
    "workflows" : {
        "taxonomic_classification_kaijureport_filteredclass_workflow" : {
            "sample" : ["SRR606249_subset10"],
            "qual" : ["2","30"]
        }
    }
}
EOF
```
To run the taxonomic classification workflow and generate this kaiju report, we call Snakemake with the `taxonomic_classification_kaijureport_filteredclass_workflow` target.
```bash
export SINGULARITY_BINDPATH="data:/data"
snakemake --use-singularity \
    --configfile=taxkaiju_filteredclass.json \
    taxonomic_classification_kaijureport_filteredclass_workflow
```