Assembly configurations

Shasta provides a number of command line options that can be used to set computational parameters and thresholds for assemblies. All of these options have default values, but the default values are not necessarily optimal for any particular combination of a number of factors:

The technology used to generate the reads. Technologies currently available to generate the long reads supported by Shasta are Oxford Nanopore (ONT) and Pacific Biosciences (HiFi and others).
The amount of coverage available (average number of reads overlapping each genome region).
The characteristics of the genome being sequenced, including heterozygosity, ploidy, and repeat content.

To account for these and other factors, the options generally need to be adjusted to achieve good-quality assemblies. To make it easier to choose suitable assembly options for a particular situation, Shasta uses assembly configurations. An assembly configuration is a predefined set of assembly options that can be stored in a configuration file in a format defined below. A number of sample configuration files applicable to specific situations are provided in shasta/conf. The applicability of each file is described in comments embedded in the file.

Shasta command line option --config is used to specify the configuration to be used, as described in detail below. This option is mandatory when running an assembly. If an option is specified both in a configuration and explicitly on the command line, the value on the command line takes precedence. This allows you to use a configuration as a set of defaults while still overriding some of its options as desired.
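
As a minimal sketch of this precedence rule, the command below selects a built-in configuration and overrides a single option on the command line. The input file name reads.fasta is hypothetical, and the overridden option and its value are illustrative only, not a recommendation. The command is run only if Shasta is installed and the reads file exists.

```shell
# Hypothetical input file name; substitute your own reads.
READS=reads.fasta

# The configuration supplies the defaults; the explicit
# --MarkerGraph.minCoverage value takes precedence over the configuration.
CMD="shasta --input $READS --config Nanopore-May2022 --MarkerGraph.minCoverage 0"
echo "$CMD"

# Run only where Shasta is on the PATH and the reads are present.
# (Unquoted expansion is intentional here: the command contains no spaces
# inside individual arguments.)
if command -v shasta >/dev/null 2>&1 && [ -f "$READS" ]; then
    $CMD
fi
```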

In addition to configuration files, Shasta also provides a set of built-in configurations that are compiled into the Shasta executable. These built-in configurations can be used without a configuration file. Each built-in configuration has a corresponding configuration file with the same name in shasta/conf, with the extension .conf. For example, configuration Nanopore-May2022 can be specified in one of two ways:

shasta --config Nanopore-May2022

shasta --config .../shasta/conf/Nanopore-May2022.conf

When using the second form, the file must exist, and the ... must be replaced with the location of the shasta directory.

To obtain a list of available built-in configurations, use Shasta command listConfigurations as follows:

shasta --command listConfigurations

At the time of writing (May 2022), this outputs the following list of built-in configurations:

Nanopore-Dec2019
Nanopore-UL-Dec2019
Nanopore-Jun2020
Nanopore-UL-Jun2020
Nanopore-Sep2020
Nanopore-UL-Sep2020
Nanopore-UL-iterative-Sep2020
Nanopore-OldGuppy-Sep2020
Nanopore-Plants-Apr2021
Nanopore-Oct2021
Nanopore-UL-Oct2021
HiFi-Oct2021
Nanopore-UL-Jan2022
Nanopore-Phased-Jan2022
Nanopore-UL-Phased-Jan2022
Nanopore-May2022
Nanopore-Phased-May2022
Nanopore-UL-May2022
Nanopore-UL-Phased-May2022
Nanopore-Human-SingleFlowcell-May2022
Nanopore-Human-SingleFlowcell-Phased-May2022

The following table summarizes configurations recommended at the time of writing (May 2022) under the following conditions:

Human assemblies.
Oxford Nanopore reads.
Guppy 5.0.7 with "super" accuracy.

These configurations will first be available in Shasta release 0.10.0.

Read type	Coverage	Haploid assembly	Phased assembly
Standard reads	40x to 80x	`Nanopore-May2022`	`Nanopore-Phased-May2022`
Ultra-Long (UL) reads (N₅₀ ≳ 60 Kb)	40x to 80x	`Nanopore-UL-May2022`	`Nanopore-UL-Phased-May2022`
Standard reads	Human genome with a single flowcell (low coverage, around 30x)	`Nanopore-Human-SingleFlowcell-May2022`	`Nanopore-Human-SingleFlowcell-Phased-May2022`

To get details of a specific built-in configuration, use Shasta command listConfiguration as follows, specifying the built-in configuration of interest after --config:

shasta --command listConfiguration --config Nanopore-May2022

This output includes comments that describe the applicability of the selected configuration. Details of the configuration are written out in the configuration file format defined below. This allows you to create your own configuration file using a built-in configuration as a starting point.
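
A minimal sketch of that workflow follows. It assumes that listConfiguration writes the configuration to standard output, so that it can be redirected to a file; the output file name my-assembly.conf is hypothetical.

```shell
# Hypothetical name for the new configuration file.
OUT=my-assembly.conf

if command -v shasta >/dev/null 2>&1; then
    # Assumption: the configuration is written to standard output.
    shasta --command listConfiguration --config Nanopore-May2022 > "$OUT"
    echo "Wrote $OUT; edit it, then pass its path to --config."
else
    echo "shasta not found on PATH" >&2
fi
```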

Shasta command line option --config must be used to specify the configuration to be used for an assembly. The option must specify either a built-in configuration or a path to a configuration file.

Configuration file

Some options are only allowed on the command line, but most of them can also optionally be specified using a configuration file. Values specified on the command line take precedence over values specified in the configuration file. This makes it easy to override specific values in a configuration file.

Options that can be specified both on the command line and in a configuration file are of the form --SectionName.optionName. The format of the configuration file is as follows:

[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2

The above is equivalent to using the following command line options:

--SectionA.option1 valueA1 
--SectionA.option2 valueA2 
--SectionB.option1 valueB1 
--SectionB.option2 valueB2
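
The correspondence between the two forms is mechanical. As an illustration only (this is not part of Shasta), the following POSIX shell and awk sketch reads a file in the format above and prints the equivalent command line options. The helper name to_options is hypothetical, and the parsing is deliberately simple: it assumes one "name = value" pair per line.

```shell
# Write the sample configuration to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2
EOF

# to_options is a hypothetical helper, not a Shasta command.
to_options() {
    awk '
        # Skip blank lines and lines beginning with #.
        /^[ \t]*$/ || /^#/ { next }
        # A [Section] line sets the prefix for the options that follow.
        /^\[/ { sub(/^\[/, ""); sub(/\].*$/, ""); section = $0; next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # An option entered without a value is printed as a bare switch.
            if (value == "") print "--" section "." name
            else print "--" section "." name " " value
        }
    ' "$1"
}

options=$(to_options "$conf")
echo "$options"
rm -f "$conf"
```

For the sample file, this prints the four options listed above, one per line.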

For example, the value for option MarkerGraph.minCoverage can be specified in the [MarkerGraph] section of the configuration file as follows:

[MarkerGraph]
minCoverage = 0

In the configuration file, blank lines and lines beginning with # are ignored and can be used to add comments and to improve the readability of the configuration file.
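
As a small illustration, the sketch below writes a commented configuration file and then filters out the lines that are ignored, leaving only the lines that carry settings. The comment text is hypothetical, and the grep filter only mimics the rule stated above; it is not how Shasta itself reads the file.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Hypothetical configuration: change one option, keep the rest at defaults.

[MarkerGraph]
# Value taken from the example in this section.
minCoverage = 0
EOF

# Drop lines beginning with # and blank lines.
effective=$(grep -v -e '^#' -e '^[[:space:]]*$' "$conf")
echo "$effective"
rm -f "$conf"
```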

Boolean switches

Some command line options are Boolean switches, that is, control options that can be turned on or off rather than being given a value.

To turn on one of these switches on the command line, add it to the command line without any value, for example --Assembly.storeCoverageData. To turn it off, omit it from the command line (switches are off by default).

To turn on one of these switches in a configuration file, you can either enter it without a value

storeCoverageData =

or assign to it one of the following values: 1, true, True, yes, Yes. To turn off one of these switches in a configuration file, assign to it one of the following values: 0, false, False, no, No.
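
The accepted values can be summarized with a small shell function. This is a sketch for checking a configuration file by hand, not Shasta's own parser; the function name switch_state is hypothetical. Values outside the two lists above are reported as "unlisted", because this section does not say how Shasta treats them.

```shell
# Classify a configuration-file value for a Boolean switch.
# Prints "on", "off", or "unlisted".
switch_state() {
    case "$1" in
        ""|1|true|True|yes|Yes) echo on ;;        # empty means: entered without a value
        0|false|False|no|No)    echo off ;;
        *)                      echo unlisted ;;  # not in either documented list
    esac
}

switch_state ""       # prints: on
switch_state Yes      # prints: on
switch_state false    # prints: off
switch_state TRUE     # prints: unlisted
```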

Boolean switches are indicated as such in the Description column in the tables below.

Table of contents