Alberlab API Reference

alab

class alab.contactmatrix(filename, genome=None, resolution=None, usechr=['#', 'X'])[source]

A flexible matrix instant that supports various methods for processing HiC contacts

Parameters:
  • filename (str) – matrix file stored in hdf5 format or an integer for the matrix size to initialize an empty matrix instance
  • genome (str) – genome e.g.’hg19’,’mm9’
  • resolution (int) – the resolution for the hic matrix e.g. 100000
  • usechr (list) – containing the chromosomes used for generating the matrix
matrix

numpy 2d array – storing all infor for the hic contact matrix

idx

numpy structure array – matrix index

genome

str – the genome

resolution

int – resolution for the contact matrix

applyed(method)[source]
assignDomain(domain, pattern='')[source]

Load Domain information

Parameters:
  • domain (alab.files.bedgraph instance) – bedgraph for domain definition
  • pattern (str) – a string use to filter the flags in the bedgraph
buildindex(**kwargs)[source]
columnsum()[source]
fmaxScaling(fmax, force=False)[source]

use fmax to generate probability matrix for uniform fmax, simply divide the matrix by fmax and clip to 1 for neighbouring contact fmax P[i,j] = F[i,j]/min(fmax[i],fmax[j])

fmaximization(**kwargs)[source]
getDomainMatrix(domainChrom, domainStartPos, domainEndPos, rowmask, minSize=1, maxSize=None)[source]

Return a submatrix defined by domainChrom, domainStartPos, domainEndPos

Parameters:
  • domainChrom (str) – domain chromosome e.g. ‘chr1’
  • domainStartPos (int) – start position e.g. 0
  • domainEndPos (int) – end position e.g. 700000
  • minSize (int, > 0) – min domain size
  • maxSize (int, optional) – max domain size, in bins if the domain is larger than a given number of bins, this function will return None
getICP(index)[source]

return inter-chromosomal proportion of a given bin index

getfmax(method='UF', minSize=1, maxSize=2000, removeZero=False, boxplotTrim=False, offdiag=1, target='median')[source]

calculate fmax based on different methods

Parameters:
  • method (str) – NM #neighbouring max UF #uniform fmax
  • target (str) – ‘mean’/’median’
identifyInterOutliersCutoff(N=100)[source]

Identify interchromosome outliers’ cutoff Do an N round random choice as the original contact freq distribution and estimate the sample std for every contact freq If the sample std is larger than half of the frequency (contact #), lable this contact frequency as spourious cutoff is set to the number that first consecutive 2 non-spourious frequency from the right side (scan from high frequency to low)

iterativeFmaxScaling(domainAverageContacts=23.2, tol=0.01)[source]

Automatic fmax scaling to get domain level matrix and match the rowsum average domain level matrix to domainAverageContacts

krnorm(mask=None, force=False, **kwargs)[source]

using krnorm balacing the matrix (overwriting the matrix!)

Parameters:
  • mask (list/array) – mask is a 1-D vector with the same length as the matrix where 1s specify the row/column to be ignored or a 1-D vector specifing the indexes of row/column to be ignored if no mask is given, row/column with rowsum==0 will be automatically detected and ignored
  • large_mem (bool) – when large_mem is set to 1, matrix product is calculated using small chunks, but this will slowdown the process a little bit.
makeDomainLevelMatrix(method='topmean', top=10, removeOutlier=True)[source]

Use domain INFO to generate Domain level matrix

Parameters:
  • method (str) – “topmean” or “median”
  • top (int 0<top<100) – the top percentage to calculate the mean, top=10 means top 10% of the subdomain matrix
  • removeOutlier (bool) – option to remove outlier using 1.5IQR
makeIntraMatrix(chrom)[source]

substract a chromsome matrix given a chromsome name

Parameters:chrom (str, chromosome name e.g 'chr1') –
plot(figurename, log=False, **kwargs)[source]

plot the matrix heat map

Parameters:
  • figurename (str) –
  • log (bool) – if True, plot the log scale of the matrix if False, plot the original matrix
  • clip_max (float) –
  • clip_min (float) – 2 options that will clip the matrix to certain value
  • cmap (matplotlib.cm instance) – color map of the matrix
  • label (str) – label of the figure
plotSum(figurename, outlier=False, line=None, **kwargs)[source]

Print the rowsum frequency histogram

Parameters:
  • figurename (string) – Name of the plot
  • outlier (bool) – option to select plotting the outlier line, only functioning if ‘line’ parameter is set to None
  • line (float/array/list) – draw vertical lines at a list of positions
plotZeroCount(figurename, **kwargs)[source]
range(chrom)[source]

return the index range for a give chromsome

removeDiagonal(force=False)[source]
removePoorRegions(cutoff=1, usepvalue=0.1, force=False)[source]

Removes “cutoff” percent of bins with least counts

Parameters:
  • cutoff (int, 0<cutoff<100) – Percent of lowest-counts bins to be removed
  • usepvalue (float, 0<usepvalue<1) – use this pvalue as correlation cutoff to remove bins bins whose pvalue greater than this cutoff will be removed
rowsum()[source]
save(filename)[source]

Save the matrix along with information in hdf5 file

scale(cellaverage=1)[source]

Scale matrix so that average of cells is the given value. By default, the rowsum will be the number of rows/columns

smoothGenomeWideHighValue(w=3, s=3, p=3, z=5, force=False)[source]

Use power law smoothing function to smooth high spikes in chromosomes blocks

Parameters:
  • w (int) – the window size, the smoothing is computed using target +/- w
  • s (int) – weight of the location deviation
  • p (int) – power of the location deviation
  • z (int) – range of standard deviation to set cutoff
smoothInterContactByCutoff(cutoff, w=3, s=3, p=3, force=False)[source]

given the cutoff, run a power law smoothing for the interchromosome matrix for contacts > cutoff

vcnorm(iterations=1, mask=None, force=False)[source]
class alab.tadmodel(probfile, nucleusRadius=5000.0, chromosomeOccupancy=0.2, contactRange=1, level=None)[source]

A wrapper to do IMP modeling in TAD level beads

Parameters:
  • probfile (alab.matrix.contactmatrix instant) – probability matrix at TAD level, alab.matrix.contactmatrix hdf5 format is required
  • nucleusRadius (float) – radius of nucleus, default 5000(nm)
  • contactRange (int) – folds for surface-surface contact coefficient
  • level (loglevel,) – default:None will record everything during caculation debug, info, warning, error, critical are supported
CondenseChromosome(rrange=0.5)[source]

Collapse chains around centromere beads

Parameters:rrange (float) – scale parameter in [0,1] for the radius limit
SimulatedAnnealing(hot, cold, nc=10, nstep=500)[source]

perform a cycle of simulated annealing from hot to cold

SimulatedAnnealing_Scored(hot, cold, nc=10, nstep=500, lowscore=10)[source]

perform a cycle of simulated annealing but stop if reach low score

cache_coordinates()[source]
cgstep(step, silent=False)[source]

perform conjugate gradient on model using scoring function sf

evaluateRestraints(restraintset, tolerance=0.05)[source]
mdstep(t, step, gamma=0.1, silent=False)[source]
mdstep_withChromosomeTerritory(t, step)[source]

perform an mdstep with chromosome terriory restraint

Parameters:
  • t (int) – temperature
  • step (int) – optimization steps
saveCoordinates(filename, prefix)[source]
savepym(filename)[source]
savepym_withChromosome(filename, s=1, v=1)[source]
set_consecutiveBeads(lowprob=0.1)[source]
set_contactRestraints(actdist, kspring=1)[source]
set_coordinates(coordinates=None)[source]
set_excludedVolume(ksping=1, slack=10)[source]
set_fmaxRestraints(kspring=1)[source]
set_nucleusEnvelope(kspring)[source]
shrinkingOptimization(drange, shrinkScore, minscore, interScale)[source]
updateScoringFunction(restraintset=None)[source]
class alab.modelstructures(filename, usegrp)[source]

Instance manipulating hdf5 packed model structures take .hms file generated by tad modeling

Parameters:
  • filename (str) – model result file *.hms
  • usegrp (str) – group label array, usually assigned with probability prefix like [‘1’,‘0.2’]
class alab.structuresummary(target, usegrp=None, nstruct=10000, pid=10, **kwargs)[source]

This class offers a series of methods to study the model structure populations.

Parameters:
  • target (str) – the output directory for population structures, containing copy*.hms files or can be seen as summary file *.hss
  • usegrp (str) – the probablility key used in modeling, e.g. p005j
  • nstruct (int) – number of structures to read
findBinIndex(chrom, start, end)[source]

To find bead indexes given a chromosome region

Parameters:
  • chrom (str) – chromosome, should match the .idx representation
  • start,end (int) – location range
Returns:

Return type:

Bin indexes array, or None if there is no valid ones

getABCopyMeanBeadRadialPosition(nucleusRadius=5000.0)[source]

Calculate mean radial position for every bead in structures, and differentiate diploid copy into A/B by inner or outer radial position

Parameters:nucleusRadius (float) – radius of nucleus, default 5000(nm)
Returns:Two 1-D array
Return type:mean radian position for all beads, and first array contains the inner bead in the diploid genome.
getAveragePairwiseDistance(form='list')[source]

Calculate pairwise distance mean for each pair of beads in the structure population

Parameters:form (str) – the return form of the function ‘list’ return the list form ‘matrix’ return the matrix form
getBeadRadialPosition(beads, nucleusRadius=5000.0)[source]

Calculate radial position for every bead in the input list beads

Parameters:
  • beads (array-like) – list of all beads to calculate
  • nucleusRadius (float) – radius of nucleus, default 5000(nm)
Returns:

M*N matrix – M = len(beads) N = number of structures in population

Return type:

radial position for all beads in the input and all structures in population

getChromosomeRadialPosition(chrom, nucleusRadius=5000.0)[source]

Calculate radial position for the chrom

Parameters:
  • chrom (str) – the chromosome to calculate
  • nucleusRadius (float) – radius of nucleus, default 5000(nm)
Returns:

2N*1 vector

Return type:

radial position for the chromosome in all structures in population

getContactMap(contactRange=1)[source]

Return contact matrix format contact heatmap

getPairDistance(bead1, bead2)[source]

Calculate pairwise distance across all structures

Parameters:bead1,bead2 (int) – two bead in a pair, input to calculate distance
Returns:
Return type:numpy array that has distance for the bead pair.
plotRadialPosition(figurename, chrom, format='pdf', color='dodgerblue', nucleusRadius=5000.0)[source]

Plot Radial Position of beads for given chromosome

Parameters:
  • figurename (str) – name of the figure
  • chrom (str) – given chromosome name
  • color (str) – given the color for ploting
save(filename)[source]

save all info into disk

totalRestraints

returns: numpy array :rtype: all restraints for each structure

totalViolations

returns: numpy array :rtype: all violations for each structure

violationPercentage

returns: numpy array :rtype: violation percentage for each structure

alab.files

class alab.files.modelgroup(grouphandler, genome, idx)[source]