Alberlab API Reference¶
alab¶
-
class
alab.
contactmatrix
(filename, genome=None, resolution=None, usechr=['#', 'X'])[source]¶ A flexible matrix instant that supports various methods for processing HiC contacts
Parameters: - filename (str) – matrix file stored in hdf5 format or an integer for the matrix size to initialize an empty matrix instance
- genome (str) – genome e.g.’hg19’,’mm9’
- resolution (int) – the resolution for the hic matrix e.g. 100000
- usechr (list) – containing the chromosomes used for generating the matrix
-
matrix
¶ numpy 2d array – storing all infor for the hic contact matrix
-
idx
¶ numpy structure array – matrix index
-
genome
¶ str – the genome
-
resolution
¶ int – resolution for the contact matrix
-
assignDomain
(domain, pattern='')[source]¶ Load Domain information
Parameters: - domain (alab.files.bedgraph instance) – bedgraph for domain definition
- pattern (str) – a string use to filter the flags in the bedgraph
-
fmaxScaling
(fmax, force=False)[source]¶ use fmax to generate probability matrix for uniform fmax, simply divide the matrix by fmax and clip to 1 for neighbouring contact fmax P[i,j] = F[i,j]/min(fmax[i],fmax[j])
-
getDomainMatrix
(domainChrom, domainStartPos, domainEndPos, rowmask, minSize=1, maxSize=None)[source]¶ Return a submatrix defined by domainChrom, domainStartPos, domainEndPos
Parameters: - domainChrom (str) – domain chromosome e.g. ‘chr1’
- domainStartPos (int) – start position e.g. 0
- domainEndPos (int) – end position e.g. 700000
- minSize (int, > 0) – min domain size
- maxSize (int, optional) – max domain size, in bins if the domain is larger than a given number of bins, this function will return None
-
getfmax
(method='UF', minSize=1, maxSize=2000, removeZero=False, boxplotTrim=False, offdiag=1, target='median')[source]¶ calculate fmax based on different methods
Parameters: - method (str) – NM #neighbouring max UF #uniform fmax
- target (str) – ‘mean’/’median’
-
identifyInterOutliersCutoff
(N=100)[source]¶ Identify interchromosome outliers’ cutoff Do an N round random choice as the original contact freq distribution and estimate the sample std for every contact freq If the sample std is larger than half of the frequency (contact #), lable this contact frequency as spourious cutoff is set to the number that first consecutive 2 non-spourious frequency from the right side (scan from high frequency to low)
-
iterativeFmaxScaling
(domainAverageContacts=23.2, tol=0.01)[source]¶ Automatic fmax scaling to get domain level matrix and match the rowsum average domain level matrix to domainAverageContacts
-
krnorm
(mask=None, force=False, **kwargs)[source]¶ using krnorm balacing the matrix (overwriting the matrix!)
Parameters: - mask (list/array) – mask is a 1-D vector with the same length as the matrix where 1s specify the row/column to be ignored or a 1-D vector specifing the indexes of row/column to be ignored if no mask is given, row/column with rowsum==0 will be automatically detected and ignored
- large_mem (bool) – when large_mem is set to 1, matrix product is calculated using small chunks, but this will slowdown the process a little bit.
-
makeDomainLevelMatrix
(method='topmean', top=10, removeOutlier=True)[source]¶ Use domain INFO to generate Domain level matrix
Parameters: - method (str) – “topmean” or “median”
- top (int 0<top<100) – the top percentage to calculate the mean, top=10 means top 10% of the subdomain matrix
- removeOutlier (bool) – option to remove outlier using 1.5IQR
-
makeIntraMatrix
(chrom)[source]¶ substract a chromsome matrix given a chromsome name
Parameters: chrom (str, chromosome name e.g 'chr1') –
-
plot
(figurename, log=False, **kwargs)[source]¶ plot the matrix heat map
Parameters: - figurename (str) –
- log (bool) – if True, plot the log scale of the matrix if False, plot the original matrix
- clip_max (float) –
- clip_min (float) – 2 options that will clip the matrix to certain value
- cmap (matplotlib.cm instance) – color map of the matrix
- label (str) – label of the figure
-
plotSum
(figurename, outlier=False, line=None, **kwargs)[source]¶ Print the rowsum frequency histogram
Parameters: - figurename (string) – Name of the plot
- outlier (bool) – option to select plotting the outlier line, only functioning if ‘line’ parameter is set to None
- line (float/array/list) – draw vertical lines at a list of positions
-
removePoorRegions
(cutoff=1, usepvalue=0.1, force=False)[source]¶ Removes “cutoff” percent of bins with least counts
Parameters: - cutoff (int, 0<cutoff<100) – Percent of lowest-counts bins to be removed
- usepvalue (float, 0<usepvalue<1) – use this pvalue as correlation cutoff to remove bins bins whose pvalue greater than this cutoff will be removed
-
scale
(cellaverage=1)[source]¶ Scale matrix so that average of cells is the given value. By default, the rowsum will be the number of rows/columns
-
smoothGenomeWideHighValue
(w=3, s=3, p=3, z=5, force=False)[source]¶ Use power law smoothing function to smooth high spikes in chromosomes blocks
Parameters: - w (int) – the window size, the smoothing is computed using target +/- w
- s (int) – weight of the location deviation
- p (int) – power of the location deviation
- z (int) – range of standard deviation to set cutoff
-
class
alab.
tadmodel
(probfile, nucleusRadius=5000.0, chromosomeOccupancy=0.2, contactRange=1, level=None)[source]¶ A wrapper to do IMP modeling in TAD level beads
Parameters: - probfile (alab.matrix.contactmatrix instant) – probability matrix at TAD level, alab.matrix.contactmatrix hdf5 format is required
- nucleusRadius (float) – radius of nucleus, default 5000(nm)
- contactRange (int) – folds for surface-surface contact coefficient
- level (loglevel,) – default:None will record everything during caculation debug, info, warning, error, critical are supported
-
CondenseChromosome
(rrange=0.5)[source]¶ Collapse chains around centromere beads
Parameters: rrange (float) – scale parameter in [0,1] for the radius limit
-
SimulatedAnnealing
(hot, cold, nc=10, nstep=500)[source]¶ perform a cycle of simulated annealing from hot to cold
-
SimulatedAnnealing_Scored
(hot, cold, nc=10, nstep=500, lowscore=10)[source]¶ perform a cycle of simulated annealing but stop if reach low score
-
class
alab.
modelstructures
(filename, usegrp)[source]¶ Instance manipulating hdf5 packed model structures take .hms file generated by tad modeling
Parameters: - filename (str) – model result file *.hms
- usegrp (str) – group label array, usually assigned with probability prefix like [‘1’,‘0.2’]
-
class
alab.
structuresummary
(target, usegrp=None, nstruct=10000, pid=10, **kwargs)[source]¶ This class offers a series of methods to study the model structure populations.
Parameters: - target (str) – the output directory for population structures, containing copy*.hms files or can be seen as summary file *.hss
- usegrp (str) – the probablility key used in modeling, e.g. p005j
- nstruct (int) – number of structures to read
-
findBinIndex
(chrom, start, end)[source]¶ To find bead indexes given a chromosome region
Parameters: - chrom (str) – chromosome, should match the .idx representation
- start,end (int) – location range
Returns: Return type: Bin indexes array, or None if there is no valid ones
-
getABCopyMeanBeadRadialPosition
(nucleusRadius=5000.0)[source]¶ Calculate mean radial position for every bead in structures, and differentiate diploid copy into A/B by inner or outer radial position
Parameters: nucleusRadius (float) – radius of nucleus, default 5000(nm) Returns: Two 1-D array Return type: mean radian position for all beads, and first array contains the inner bead in the diploid genome.
-
getAveragePairwiseDistance
(form='list')[source]¶ Calculate pairwise distance mean for each pair of beads in the structure population
Parameters: form (str) – the return form of the function ‘list’ return the list form ‘matrix’ return the matrix form
-
getBeadRadialPosition
(beads, nucleusRadius=5000.0)[source]¶ Calculate radial position for every bead in the input list beads
Parameters: - beads (array-like) – list of all beads to calculate
- nucleusRadius (float) – radius of nucleus, default 5000(nm)
Returns: M*N matrix – M = len(beads) N = number of structures in population
Return type: radial position for all beads in the input and all structures in population
-
getChromosomeRadialPosition
(chrom, nucleusRadius=5000.0)[source]¶ Calculate radial position for the chrom
Parameters: - chrom (str) – the chromosome to calculate
- nucleusRadius (float) – radius of nucleus, default 5000(nm)
Returns: 2N*1 vector
Return type: radial position for the chromosome in all structures in population
-
getPairDistance
(bead1, bead2)[source]¶ Calculate pairwise distance across all structures
Parameters: bead1,bead2 (int) – two bead in a pair, input to calculate distance Returns: Return type: numpy array that has distance for the bead pair.
-
plotRadialPosition
(figurename, chrom, format='pdf', color='dodgerblue', nucleusRadius=5000.0)[source]¶ Plot Radial Position of beads for given chromosome
Parameters: - figurename (str) – name of the figure
- chrom (str) – given chromosome name
- color (str) – given the color for ploting
-
totalRestraints
¶ returns: numpy array :rtype: all restraints for each structure
-
totalViolations
¶ returns: numpy array :rtype: all violations for each structure
-
violationPercentage
¶ returns: numpy array :rtype: violation percentage for each structure