This script initializes the expression matrix from the data in the input files.
#!/usr/bin/python3
This line allows the script to be called directly from the shell as
"./input.py".
This assumes that you are on a platform where Python 3 is available as
/usr/bin/python3 and that the script file has execute permission.
If you are on a platform that uses Python 2, you instead need to
invoke the script as
"python input.py".
from ExpressionMatrix2 import *
This makes the ExpressionMatrix2 code accessible from Python,
without the need to prefix it with a module name.
This is not necessarily a good idea, particularly for a large script,
but it does simplify the code a bit.
For this to work, ExpressionMatrix2.so must be located in
a directory where the Python interpreter can find it.
There are several ways to do that, the simplest of which is
to set the environment variable PYTHONPATH
to the name of the directory that contains ExpressionMatrix2.so.
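To check whether the interpreter can actually locate the module before running the script, a small diagnostic along the following lines may help. This is only a sketch: the helper name module_is_importable is made up for illustration and is not part of ExpressionMatrix2. It relies on the standard importlib machinery, which searches the directories in sys.path (these include the ones listed in PYTHONPATH).

```python
import importlib.util
import sys


def module_is_importable(module_name):
    """Return True if the interpreter can locate module_name on its search path.

    Hypothetical helper for illustration; not part of ExpressionMatrix2.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if module_is_importable("ExpressionMatrix2"):
        print("ExpressionMatrix2 was found.")
    else:
        # Show where the interpreter looked, to help diagnose PYTHONPATH.
        print("ExpressionMatrix2 was not found. Directories searched:")
        for directory in sys.path:
            print("  ", directory)
```

If the module is reported as not found, compare the printed directories with the directory that contains ExpressionMatrix2.so.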
# Create a new, empty expression matrix.
# The data directory must not exist.
e = ExpressionMatrix(
    directoryName = 'data',
    geneCapacity = 100000,
    cellCapacity = 10000,
    cellMetaDataNameCapacity = 10000,
    cellMetaDataValueCapacity = 1000000
    )
This creates the new ExpressionMatrix object which,
at this point, is empty (that is, it does not contain any genes or cells).
The specified directory must not already exist. It is created and used
to store all subsequent data structures needed for this
ExpressionMatrix object.
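Because the directory must not exist when the constructor is called, a script that may be rerun can check for it first. The following is a minimal sketch using only the standard library. The function name ensure_directory_absent is hypothetical, and the choice to raise an error rather than delete the directory is an assumption made here for safety, not something the library prescribes.

```python
import os


def ensure_directory_absent(directory_name):
    """Raise FileExistsError if directory_name already exists.

    Hypothetical helper for illustration; not part of ExpressionMatrix2.
    """
    if os.path.exists(directory_name):
        raise FileExistsError(
            "'%s' already exists; remove or rename it before creating "
            "a new expression matrix." % directory_name
        )
```

Calling ensure_directory_absent('data') just before the constructor gives a clear message when a directory is left over from a previous run.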
The four capacity arguments control the capacity of various
hash tables used to store gene names, cell names,
and cell meta data names and values.
To avoid performance degradation in the hash tables, make sure to
set each capacity to at least twice
what you think you will need.
There is currently no automatic rehashing of the tables,
so if one of the capacities is exceeded the run will have to
be restarted from scratch with larger capacities.
See here for reference information on
the ExpressionMatrix
constructors.
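The factor-of-two guideline above can be applied mechanically when the expected counts are known in advance. The sketch below is illustrative only: the function name capacity_for and the example count are assumptions, and the default factor of 2 is the minimum suggested above, so a larger factor may be preferable.

```python
import math


def capacity_for(expected_count, safety_factor=2.0):
    """Return a hash table capacity of at least safety_factor * expected_count.

    Hypothetical helper for illustration; not part of ExpressionMatrix2.
    """
    if expected_count < 0:
        raise ValueError("expected_count must not be negative")
    if safety_factor < 2.0:
        raise ValueError("safety_factor should be at least 2")
    return int(math.ceil(expected_count * safety_factor))


# Example: with an assumed 5000 expected cells, request room for 10000.
example_cell_capacity = capacity_for(5000)
```

Since there is no automatic rehashing, it is better to overestimate the expected counts than to underestimate them.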
# Add the cells.
e.addCells(
    expressionCountsFileName = 'GBM_raw_gene_counts.csv',
    expressionCountsFileSeparators = ' ',
    cellMetaDataFileName = 'GBM_metadata.csv',
    cellMetaDataFileSeparators = ' '
    )
This causes the expression data and the cell meta data contained
in the input files to be stored in the ExpressionMatrix
object (that is, in binary files in the data
directory).
See here for more information on the addCells call.
print('Input completed.')
Self-explanatory.