Skip to content

Snakemake Configuration

The workflows/config/ directory contains files used to set key-value pairs in the configuration dictionary used by Snakemake.

To use a custom configuration file (JSON or YAML format), use the --configfile flag:

snakemake --configfile my_workflow_params.json ...

Data Files, Workflow Config, and Parameters

The dahak/workflows/config/ directory contains configuration files for Dahak workflows. There are three types of configuration details that the user may wish to customize:

How do I specify my data files?

To set the names and URLs of read files, set the files key in the Snakemake configuration dictionary to a list of key-value pairs, where the key is the name of the read file and the value is the URL of the file (do not include http:// or https:// in the URL).

For example, the following JSON block will provide a list of reads and their corresponding URLs:

{
    "files" : {
        "SRR606249_1_reads.fq.gz" :           "files.osf.io/v1/resources/dm938/providers/osfstorage/59f0f9156c613b026430dbc7",
        "SRR606249_2_reads.fq.gz" :           "files.osf.io/v1/resources/dm938/providers/osfstorage/59f0fc7fb83f69026076be47",
        "SRR606249_subset10_1_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10134b83f69026377611b",
        "SRR606249_subset10_2_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f101f26c613b026330e53a",
        "SRR606249_subset25_1_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f1039a594d900263120c38",
        "SRR606249_subset25_2_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f104ed594d90026411f486",
        "SRR606249_subset50_1_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f1082d6c613b026430e5cf",
        "SRR606249_subset50_2_reads.fq.gz" :  "files.osf.io/v1/resources/dm938/providers/osfstorage/59f10ac6594d900262123e77",
    }
}

This can be placed in a JSON file like dahak/workflows/config/custom_datafiles.json and passed to Snakemake using the --config flag like:

snakemake --config=config/custom_datafiles.json \
        [FLAGS] <target>

NOTE: Dahak assumes that read files are not present on the local machine and uses a rule to download the reads from the given URL if they are not present. If your read files are present locally, they must be put into the temporary directory (set by data_dir key in the Snakemake configuration dictionary, data/ by default) so that Snakemake will find them. If Snakemake finds the read files, the download command will not be run. You can use an empty string for the URLs in this case.

How do I specify my workflow configuration?

The workflow configuration file specifies details about which workflow should be run. This is typically just the name of the samples to run the workflow on, and a quality value or a k value.

Set the workflows key of the Snakemake configuration dictionary to a dictionary containing details about the workflow you want to run. The workflow configuration values are values that will end up in the final output file's filename.

{
    "workflows" : {
        "name_of_workflow" : {
            "workflow_option" : "workflow_option_value"
        }
    }
}

For example, here is what the workflow configuration looks like for the assembly workflow:

{
    "workflows" : {
        "assembly_workflow_metaspades" : {
            "sample"    : ["SRR606249_subset10","SRR606249_subset25"],
            "qual"      : ["2","30"],
        },

        "assembly_workflow_megahit" : {
            "sample"    : ["SRR606249_subset10","SRR606249_subset25"],
            "qual"      : ["2","30"],
        },

        "assembly_workflow_all" : {
            "sample"    : ["SRR606249_subset10","SRR606249_subset25"],
            "qual"      : ["2","30"],
        },

    }
}

Each workflow component has a "Snakemake" page that covers workflow configuration options and details for the respective workflow. Use the navigation menu on the left and select the workflow component of interest, then pick the "Snakemake" page.

How do I specify my workflow parameters?

Workflow parameters specify parameters that are used when executing individual workflow steps. These parameters are not incorporated in the final filename and are usually more extensive.

These are set using a key that is the name of the workflow component. For example, the assembly workflow parameters section of the workflow parameters file looks like this:

{
    "assembly" : {
        "assembly_patterns" : {
            "metaspades_pattern" : "{sample}.trim{qual}_metaspades.contigs.fa",
            "megahit_pattern" : "{sample}.trim{qual}_megahit.contigs.fa",
            "assembly_pattern" : "{sample}.trim{qual}_{assembler}.contigs.fa",
            "quast_pattern" : "{sample}.trim{qual}_{assembler}_quast/report.html",
            "multiqc_pattern" : "{sample}.trim{qual}_{assembler}_multiqc/report.html",
        }
    }
}

Each workflow component has a "Snakemake" page that covers workflow parameters and details for the respective workflow. Use the navigation menu on the left and select the workflow component of interest, then pick the "Snakemake" page.

.settings (Defaults) vs .json (Overriding Defaults)

The *.settings files are Python scripts that set the default Snakemake configuration dictionary. Each *.settings file is prefixed by default_ because it is setting default values.

If the user does not specify a particular configuration dictionary key-value pair, then the default key-value pair will be used. However, if the user sets a key-value pair, it will override the default value.

This allows the user to customize any configuration key-value pair used by a workflow without having to explicitly specify every configuration key-value pair.

For example, by default the version of each container image for each program obtained from the biocontainers project is specified in the top-level biocontainers key in default_parameters.settings:

config_default = {

    ...

    "biocontainers" : {
        "sourmash" : {
            "use_local" : False,
            "quayurl" : "quay.io/biocontainers/sourmash",
            "version" : "2.0.0a3--py36_0"
        },
        "trimmomatic" : {
            "use_local" : False,
            "quayurl" : "quay.io/biocontainers/trimmomatic",
            "version" : "0.36--5"
        },
        "khmer" : {
            "use_local" : False,
            "quayurl" : "quay.io/biocontainers/khmer",
            "version" : "2.1.2--py35_0"
        },

        ...

(Note the *.settings files and the above code are Python, not JSON.)

If the user wishes to bump the version of trimmomatic to (e.g.) 0.38, but not change the version of khmer or sourmash, the user can specify a JSON file with the following configuration block:

{
    "biocontainers" : {
        "trimmomatic" : {
            "use_local" : false,
            "quayurl" : "quay.io/biocontainers/trimmomatic",
            "version" : "0.38--5"
        }
    }
}

This can be placed in a JSON file like dahak/workflows/config/custom_workflowparams.json and passed to Snakemake using the --config flag like:

snakemake --config=config/custom_workflowparams.json \
        [FLAGS] <target>

This will only override the container version used for trimmomatic, and will use the defaults for all other container images.