Tools¶

Analysis tools to get results shown under the report/ directory

PGS package provides tools for analyzing the final structure population (see Alberlab API Reference). The following are some examples of how to extract information from the structure-population.

First, identify which model you would like to see, e.g. result/structure/copy0.hms. You can check group names in the file using a simple bash command:

$ h5ls result/structure/copy0.hms

And something like this will appear:

01b                    Group
01a                    Group
01                     Group
05                     Group
1                      Group
2                      Group
                      Group
genome                   Dataset {SCALAR}
idx                      Dataset {2320}

In that example, the “idx” has information of 2320 TADs and it saves iteration snapshots at theta = {1, .., 0.01, 0.01a, 0.01b}. Thus the final structure is in group “0.01b” (at theta level p=0.01 there are 3 A/M iteration cycles).

Getting the 3D coordinates of the genome
import alab hmsfile = 'result/structure/copy0.hms problvl = '0.01a' hms = alab.modelstructures(hmsfile, [problvl]) TADidx = hms.idx #TADs information xyz = hms[0].xyz #diploid set of coordinates
Now the user can use the coordinates, stored in xyz, to do any analysis. The TAD information with genomic location is stored in `TADidx` variable. In the following we provide some other usage of coordinates.

Getting the contact probability map

We show example lines on how to get contact probability maps under result/report/heatmap and result/report/intraMatrix`

import alab

hmsfiledir = 'result/structure'
problvl = '0.01a'
nstruct = 1000
summary = alab.structuresummary(hmsfiledir, problvl, nstruct)
m = summary.getContactMap()
m.plot('heatmap.png',format='png',clip_max=1)
m.makeIntraMatrix('chr1').plot('chr1_heatmap.pdf',format='pdf',clip_max=1)

Getting the radial position
Radial position of a TAD is calculated by the average radial positions across the structure population of that TAD. 0 marks the center of nucleus, and 1 marks nuclear envelope.
rp = summary.getBeadRadialPosition(beads=range(len(summary.idx)*2)) rp_mean = rp.mean(axis=1) rp_hapmean = (rp_mean[:len(summary.idx)]+rp_mean[len(summary.idx):])/2

Getting PDB

Some users might wish to get the coordinates and radii in a PDB format, maybe for visualization purpose. Hence we provide some nice scripts under tool/ directory. Simply execute the following shell command under $PROJECT_DIR/:

$ tools/hms_export.py result/structure/copy0.hms 0.01b copy0.pdb

The script takes 3 arguments (hmsfile, theta_group, and output_name), then a pdb file will be saved and it will look something like this:

ATOM      1  PAM A1  a   1     2899.1    58.6   855.0   218
ATOM      2  PAM A1  a   2     3029.7   286.1  1257.0   244
ATOM      3  PAM A1  a   3     2575.2   106.8  1117.7   202
....      .  ... ..  .   .     ......  ......  ......   ...
....      .  ... ..  .   .     ......  ......  ......   ...
ATOM   1214  QAM BX  w  65    -2206.8   183.8  2465.6   202
ATOM   1215  QAM BX  w  66    -2452.5   434.7  3049.5   238

Note

The 2nd column marks the TADs ids.
PAM and QAM marks the short and long arms of a chromosome, respectively.
CEN marks the centromere representative TAD.
Chromosome homologue (4th column) is labeled as the chromosome name preceeded with A or B, e.g. A1 = the first homolog of chr1, BX = the 2nd homolog of chrX.
The first half of coordinates belong to the first diploid copy, the second half contains the homologues.
Chain name (5th column) is unique for each chromosome (e.g. chains “a” to “w” are for chr1 to chrX, respectively).
The 6th column contains TADs order in a chromosome.
Columns 6-8 record the 3D coordinates.
Column 9 stores the TADs radii.
In human genome 3D models, the nuclear radius is set to 5000 nm.

Getting the structure-population summary file
Instead of analyzing structures from many files, user can also aggregate the final structure population into a file and ignore the intermediate snapshots. Let the final population be in “001b” group, thus the following shell command should be run:
$ tools/hms_to_hss.py result/strcuture/ 1000 0.01b structures_001b.hss
This command outputs a summary file called “structures_001b.hss” which contains all coordinates of the last optimized structure population, their radii, TAD information, optimization scores, etc. At this point, if the user is not interested in the structures at intermediate steps, all structure/copy*.hms files can be deleted to release some disk space.

Warning

Check the content of the summary file (hss) first before deleting the *.hms files!