Table of contents

Exploring assembly results

Shasta can expose the details of many data structures used during assembly. This is useful for understanding assembler behavior when testing, debugging, optimizing parameters, or developing new algorithms, and for gaining insight that cannot be obtained by examining the assembled sequence files alone. When this functionality is activated, Shasta behaves as an http server, and the user communicates with it via a standard web browser.

Follow the directions below to activate this functionality.

Starting the Shasta http server on Linux

The Shasta http server uses Graphviz software to display graphs. Install it with the package manager of your Linux distribution before starting the server.

To start the Shasta http server on Linux, follow these steps:

  1. Run an assembly with option --memoryMode filesystem. When this option is used, binary data used by the assembly are stored in memory mapped files that remain accessible after assembly is complete.
    • If you don't have root access via sudo on the machine you are using, also use option --memoryBacking disk. This is slower but guarantees that the results remain permanently available after an assembly completes, unless you use --command cleanupBinaryData.
    • If you do have root access via sudo on the machine you are using and you want to maximize assembly performance, also use option --memoryBacking 2M, which results in binary data being stored in memory on the Linux hugetlbfs filesystem (2 MB "huge" pages). These data only remain available until the next reboot or until you clean them up via --command cleanupBinaryData, but you can save them persistently on disk using --command saveBinaryData.
  2. Run the assembler again, this time specifying option --command explore, plus the same --assemblyDirectory option used for the assembly run (default is ShastaRun).

See here for more information on the --memoryMode and --memoryBacking command line options.
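
The two Linux steps above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not an official script: the executable name `shasta`, the input file `reads.fasta`, and the use of `--input` to pass the reads are assumptions not stated in this section, and passing `--assemblyDirectory` to the save and cleanup commands is likewise assumed. Adjust them to your installation.

```shell
#!/bin/sh
# Assumptions: executable name and input file are placeholders.
SHASTA="${SHASTA:-shasta}"        # hypothetical executable name
READS="${READS:-reads.fasta}"     # hypothetical input reads file

# Step 1: run the assembly with binary data kept in memory mapped files on disk
# (the variant for users without root access via sudo).
run_assembly() {
    "$SHASTA" --input "$READS" --memoryMode filesystem --memoryBacking disk
}

# Step 2: start the http server on an existing assembly directory.
# The directory defaults to ShastaRun, as stated in the steps above.
run_explore() {
    "$SHASTA" --command explore --assemblyDirectory "${1:-ShastaRun}"
}

# For the --memoryBacking 2M variant: save binary data to disk, or clean it up.
save_binary_data() {
    "$SHASTA" --command saveBinaryData --assemblyDirectory "${1:-ShastaRun}"
}
cleanup_binary_data() {
    "$SHASTA" --command cleanupBinaryData --assemblyDirectory "${1:-ShastaRun}"
}
```

After sourcing the file, `run_assembly && run_explore` performs both steps in order. Setting `SHASTA=echo` first prints the arguments instead of running anything, which is a convenient dry run.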

Starting the Shasta http server on macOS

The Shasta http server uses Graphviz software to display graphs. It also needs command gtimeout which is part of the coreutils package. To install them, use this command:

brew install graphviz coreutils

To start the Shasta http server on macOS, follow these steps:

  1. Run an assembly as usual. The macOS version of Shasta always stores binary data on disk. This is slower but guarantees that the binary data remain permanently available after an assembly completes, unless you use --command cleanupBinaryData.
  2. Run the assembler again, this time specifying option --command explore, plus the same --assemblyDirectory option used for the assembly run (default is ShastaRun).
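
Before starting the server, it can help to confirm that the required tools are actually on your PATH. The helper below is a hypothetical convenience, not part of Shasta. It assumes that Graphviz is detectable through its `dot` executable and that coreutils provides `gtimeout`, as described above.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools dot gtimeout` prints nothing when both are installed; otherwise it lists what still needs to be installed.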

Using a browser to explore assembly results

Following the above directions will start a Shasta process in a mode in which it behaves as an http server. It will also start your default browser and point it to the Shasta http server process. If you want to start additional browser sessions, just point your browser to the URL shown when the Shasta http server starts, usually http://localhost:17100.
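
If the browser does not open automatically, or you want to confirm the server is reachable before opening another session, a quick check along these lines may help. It is a sketch that assumes `curl` is installed and that the server listens on the usual port 17100; use the URL printed at startup if it differs. The function names are hypothetical.

```shell
#!/bin/sh
# Build the URL of a locally running Shasta http server.
# The port defaults to 17100, the usual value mentioned above.
server_url() {
    printf 'http://localhost:%s\n' "${1:-17100}"
}

# Succeed if something answers at that URL within 5 seconds (requires curl).
server_is_up() {
    curl --silent --output /dev/null --max-time 5 "$(server_url "$1")"
}
```

For example, `server_is_up && echo "Open $(server_url) in your browser"` prints the URL only when the server responds.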

The browser session will initially show an assembly summary page. At the top you will see a navigation menu that allows you to explore details of many assembler data structures. For example, this allows you to look at local subgraphs of the read graph, marker graph, and assembly graph (see here for more information). It also provides details of sequence assembly and of the input reads used. Finally, there are several useful links between the various data structures. For example, you can easily navigate from the assembly graph to the marker graph and vice versa, and from the read graph to the reads.

When you are done using the browser, remember to stop the server by pressing Ctrl+C in the command window in which you started the server. You can restart the server later as many times as you like, as long as the binary data remain available.

Access control

By default, the server only responds to requests from the same user and computer running the server. However, you can use the command line option --exploreAccess to relax this restriction.

Things you can do with the HTTP server

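You can inspect the assembly summary, individual reads, and local subgraphs of the read graph, marker graph, and assembly graph, and several other things.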

Screenshots

Below are some sample screenshots obtained using --command explore.

Assembly Summary


Investigating individual reads

You can see the run-length representation of the read along with its markers.

Investigating Read Graph


Investigating local Marker Graph


Investigating Assembly Graph

