This assumes that you have successfully completed the test described under Getting Started. You can begin by copying the run.py and runServer.py scripts from that test run to the directory you want to use for your run with real data. Those scripts will probably require some customization, as described below, but they are a useful starting point.
The ExpressionMatrix2 software currently accepts input in three formats. One of these formats requires ExpressionMatrix2 software built with HDF5 functionality (see here for the additional prerequisites for that).
See here for detailed information on how to use each of these three formats.
Before a graph can be created in the http server, it is necessary to compute at least one set of pairs of similar cells. This is done in run.py with a call to findSimilarPairs0, which uses the following parameters:
For a large run, approximate computation of similar pairs is much faster than exact computation and gives very similar results. In both cases, the computing cost is O(N²), that is, proportional to the square of the number of cells. For a run with a few thousand cells, exact computation will require hours, but approximate computation will require just minutes. It is possible to compute both exact and approximate similar pairs in the same run, as long as they are given different names in the call to findSimilarPairs0.
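The quadratic cost comes from the need to examine every pair of cells. The following is a minimal brute-force sketch of that idea, not the implementation of findSimilarPairs0: it assumes Pearson correlation of expression vectors as the similarity measure, and the function names, the threshold parameter, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: correlation undefined, treat as dissimilar
    return cov / sqrt(var_x * var_y)


def exact_similar_pairs(cells, threshold):
    """Brute force: examine every unordered pair of cells exactly once."""
    pairs = []
    examined = 0
    for i, j in combinations(range(len(cells)), 2):
        examined += 1
        similarity = pearson(cells[i], cells[j])
        if similarity >= threshold:
            pairs.append((i, j, similarity))
    return pairs, examined


# Hypothetical toy data: 4 cells, 5 genes each.
cells = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # proportional to cell 0
    [5.0, 4.0, 3.0, 2.0, 1.0],   # cell 0 reversed
    [1.0, 2.0, 3.0, 4.0, 6.0],   # close to cell 0
]
pairs, examined = exact_similar_pairs(cells, threshold=0.9)
print(examined)                        # number of pairs examined: N*(N-1)/2
print([(i, j) for i, j, _ in pairs])   # index pairs at or above the threshold
```

For N cells the loop examines N(N-1)/2 pairs, so doubling the number of cells roughly quadruples the work. With 1,000 cells that is 499,500 pairs; with 5,000 cells it is about 12.5 million.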
Future versions of the code will offer faster ways to find similar cell pairs that will have a better scaling than O(N²).
The ExpressionMatrix2 code uses binary files mapped in memory to store its data structures. It is likely that the binary format of these files will change as the code gets developed. This means that newer versions of ExpressionMatrix2 will not be able to access binary files created by older versions. In other words, binary compatibility between versions is not guaranteed. For this reason, the binary files should not be used for long-term storage of expression matrix data.
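To illustrate why memory-mapped binary files are fragile across versions, here is a small sketch using Python's standard mmap and struct modules. The two record layouts are hypothetical and are not the actual ExpressionMatrix2 file format; the point is only that bytes written with one layout are silently misread when interpreted with another.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layouts for two versions of some program.
OLD_LAYOUT = struct.Struct("<if")  # version 1: int32 cell id, float32 value
NEW_LAYOUT = struct.Struct("<qf")  # version 2: int64 cell id, float32 value

path = os.path.join(tempfile.mkdtemp(), "cells.bin")

# The "old version" writes three records using its own layout.
records = [(1, 0.5), (2, 1.5), (3, 2.5)]
with open(path, "wb") as f:
    for record in records:
        f.write(OLD_LAYOUT.pack(*record))

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    # Reading with the layout that wrote the file recovers the data.
    old_view = [OLD_LAYOUT.unpack_from(m, i * OLD_LAYOUT.size) for i in range(3)]
    # Reading the same bytes with the newer layout raises no error,
    # but the values it returns are meaningless.
    new_first = NEW_LAYOUT.unpack_from(m, 0)

print(old_view)
print(new_first)
```

Because nothing in the raw bytes records which layout produced them, the safe choice is to keep the original input files and regenerate the binary files whenever the software is upgraded.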