TSINFER with SMARTER test data

I’ve created a test dataset consisting of 10 samples with genotype data for chromosome 26 from the SMARTER database, together with a Nextflow pipeline that prepares the dataset and generates phased and imputed genotypes with Beagle. You can test the pipeline with the Nextflow test profile. First, collect the test input files in the data directory:

# Create the target directory; URLs are quoted so the shell does not glob the "?"
mkdir -p data
wget "https://github.com/cnr-ibba/nf-treeseq/raw/master/tests/Oar_v3.1_chr26.fna.gz?download=" -O data/Oar_v3.1_chr26.fna.gz
wget "https://raw.githubusercontent.com/cnr-ibba/nf-treeseq/master/tests/test_dataset.tsv" -O data/test_dataset.tsv
wget "https://raw.githubusercontent.com/cnr-ibba/nf-treeseq/master/tests/test_outgroup.tsv" -O data/test_outgroup.tsv
wget "https://github.com/cnr-ibba/nf-treeseq/raw/master/tests/test_dataset.bed?download=" -O data/test_dataset.bed
wget "https://github.com/cnr-ibba/nf-treeseq/raw/master/tests/test_dataset.bim?download=" -O data/test_dataset.bim
wget "https://github.com/cnr-ibba/nf-treeseq/raw/master/tests/test_dataset.fam?download=" -O data/test_dataset.fam

Then run the pipeline with the test profile:

nextflow run cnr-ibba/nf-treeseq -r v0.2.1 -profile test,singularity --plink_bfile data/test_dataset \
    --plink_keep data/test_dataset.tsv --genome data/Oar_v3.1_chr26.fna.gz \
    --outdir results-estsfs/test --with_estsfs --outgroup1 data/test_outgroup.tsv

Now read the pipeline output and infer a tree sequence with tsinfer:

import json

import tsinfer
import tsdate
import cyvcf2
from tqdm.notebook import tqdm
from tskit import MISSING_DATA

from tskitetude import get_project_dir
from tskitetude.helper import add_populations, add_diploid_individuals, get_ancestors_alleles

Define a helper function and the input locations:

def get_chromosome_lengths(vcf):
    """Return a dict mapping each sequence name in the VCF header to its length."""
    return dict(zip(vcf.seqnames, vcf.seqlens))

vcf_location = get_project_dir() / "results-estsfs/test/focal/test_dataset.focal.26.vcf.gz"
samples_location = get_project_dir() / "results-estsfs/test/tsinfer/test_dataset.focal.26.samples"

vcf = cyvcf2.VCF(vcf_location)
chromosome_lengths = get_chromosome_lengths(vcf)

I’ve derived the ancestral alleles with est-sfs. Load them from the pipeline results:

ancestors_alleles = get_ancestors_alleles(get_project_dir() / "results-estsfs/test/estsfs/samples-merged.26.ancestral.csv")
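As far as I understand the tsinfer API, `add_site` expects `ancestral_allele` to be an index into the site's `alleles` list, with `tskit.MISSING_DATA` (which is -1) meaning "unknown". The exact structure returned by `get_ancestors_alleles` is not shown here, so the following is only a minimal sketch of how an ancestral base could be turned into such an index. The function name `ancestral_index` and the `MISSING` constant are hypothetical and not part of tskitetude.

```python
MISSING = -1  # assumption: same value as tskit.MISSING_DATA


def ancestral_index(alleles, ancestral_base):
    """Return the index of ancestral_base in alleles, or MISSING if unknown.

    Hypothetical helper: it illustrates the index lookup only, it is not
    the tskitetude implementation.
    """
    if ancestral_base is None:
        return MISSING
    base = ancestral_base.upper()
    # An ancestral base that matches neither REF nor ALT is treated as unknown
    return alleles.index(base) if base in alleles else MISSING


print(ancestral_index(["A", "G"], "g"))   # ancestral allele is the ALT
print(ancestral_index(["A", "G"], "T"))   # not among the observed alleles
print(ancestral_index(["A", "G"], None))  # no est-sfs call for this site
```

Treating a base that is absent from the allele list as missing is a design choice of this sketch; another option would be to drop such sites entirely.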

Now define a custom function to add the VCF sites to the sample data:

def add_diploid_sites(vcf, samples, ancestors_alleles):
    """
    Read the sites in the vcf and add them to the samples object.
    """
    # You may want to change the following line, e.g. here we allow
    # "*" (a spanning deletion) to be a valid allele state
    allele_chars = set("ATGCatgc*")
    pos = 0
    progressbar = tqdm(total=samples.sequence_length, desc="Read VCF", unit='bp')

    for variant in vcf:  # Loop over variants, each assumed at a unique site
        progressbar.update(variant.POS - pos)

        if pos == variant.POS:
            print(f"Duplicate entries at position {pos}, ignoring all but the first")
            continue

        pos = variant.POS

        if any(not phased for _, _, phased in variant.genotypes):
            raise ValueError(f"Unphased genotypes for variant at position {pos}")

        alleles = [variant.REF.upper()] + [v.upper() for v in variant.ALT]
        ancestral_allele = ancestors_alleles.get((variant.CHROM, variant.POS), MISSING_DATA)

        # Skip the whole site if any allele contains unexpected characters
        # (a bare `continue` inside a loop over alleles would not skip the site)
        invalid = [a for a in alleles if set(a) - allele_chars]
        if invalid:
            print(f"Ignoring site at pos {pos}: alleles {invalid} not in {allele_chars}")
            continue

        # Flatten the per-individual [allele_1, allele_2, phased] entries into
        # a single list of allele indexes, two consecutive haplotypes per individual
        genotypes = [g for row in variant.genotypes for g in row[0:2]]
        samples.add_site(pos, genotypes, alleles, ancestral_allele=ancestral_allele)

    progressbar.close()
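To make the genotype handling easier to follow, here is a small standalone sketch of the flattening step. It assumes the cyvcf2 layout for diploid calls, where each entry of `variant.genotypes` is `[allele_1, allele_2, phased]`. The function name `flatten_phased_genotypes` and the example data are made up for illustration.

```python
def flatten_phased_genotypes(genotypes):
    """Flatten [allele_1, allele_2, phased] entries into one haplotype list.

    Illustrative only: raises if any genotype is unphased, because the
    order of the two alleles is meaningless without phase information.
    """
    flat = []
    for index, (allele_1, allele_2, phased) in enumerate(genotypes):
        if not phased:
            raise ValueError(f"Unphased genotype for sample {index}")
        flat.extend([allele_1, allele_2])
    return flat


# Three made-up diploid individuals at one biallelic site
example = [[0, 1, True], [1, 1, True], [0, 0, True]]
haplotypes = flatten_phased_genotypes(example)
print(haplotypes)  # [0, 1, 1, 1, 0, 0]
```

Each individual contributes two consecutive entries, which is why 9 diploid individuals end up as 18 sample nodes in the inferred tree sequence.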

Add populations, individuals and sites to an empty sample data file:

with tsinfer.SampleData(
        path=str(samples_location), sequence_length=chromosome_lengths["26"]) as samples:
    samples_tsv = get_project_dir() / "data/test_dataset.tsv"
    pop_lookup = add_populations(samples_tsv, samples)
    indv_lookup = add_diploid_individuals(samples_tsv, pop_lookup, samples)
    add_diploid_sites(vcf, samples, ancestors_alleles)
/home/cozzip/.cache/pypoetry/virtualenvs/tskitetude-hh-GIRXc-py3.12/lib/python3.12/site-packages/tsinfer/formats.py:530: FutureWarning: The LMDBStore is deprecated and will be removed in a Zarr-Python version 3, see https://github.com/zarr-developers/zarr-python/issues/1274 for more information.
  return zarr.LMDBStore(self.path, subdir=False, map_size=map_size)
/tmp/ipykernel_44052/2889523049.py:1: DeprecationWarning: SampleData is deprecated
  with tsinfer.SampleData(
/home/cozzip/.cache/pypoetry/virtualenvs/tskitetude-hh-GIRXc-py3.12/lib/python3.12/site-packages/tsinfer/formats.py:104: FutureWarning: The LMDBStore is deprecated and will be removed in a Zarr-Python version 3, see https://github.com/zarr-developers/zarr-python/issues/1274 for more information.
  store = zarr.LMDBStore(
print(
    "Sample file created for {} samples ".format(samples.num_samples)
    + "({} individuals) ".format(samples.num_individuals)
    + "with {} variable sites.".format(samples.num_sites),
    flush=True,
)

# Do the inference
sparrow_ts = tsinfer.infer(samples)

print(
    "Inferred tree sequence `{}`: {} trees over {} Mb".format(
        "sparrow_ts", sparrow_ts.num_trees, sparrow_ts.sequence_length / 1e6
    )
)
# Check the metadata
for sample_node_id in sparrow_ts.samples():
    individual_id = sparrow_ts.node(sample_node_id).individual
    population_id = sparrow_ts.node(sample_node_id).population
    print(
        "Node",
        sample_node_id,
        "labels a chr26 sampled from individual",
        json.loads(sparrow_ts.individual(individual_id).metadata),
        "in",
        json.loads(sparrow_ts.population(population_id).metadata),
    )
Sample file created for 18 samples (9 individuals) with 680 variable sites.
2025-12-19 13:25:30,845 - root - INFO - Max encoded genotype matrix size=12.0 KiB
2025-12-19 13:25:30,846 - tsinfer.inference - INFO - Starting addition of 680 sites
2025-12-19 13:25:30,870 - tsinfer.inference - INFO - Finished adding sites
2025-12-19 13:25:30,871 - tsinfer.inference - INFO - Ancestor builder peak RAM: 1.0 MiB
2025-12-19 13:25:30,877 - tsinfer.inference - INFO - Starting build for 525 ancestors
2025-12-19 13:25:30,894 - tsinfer.inference - INFO - Finished building ancestors
2025-12-19 13:25:30,935 - tsinfer.inference - INFO - Mismatch prevented by setting constant high recombination and low mismatch probabilities
2025-12-19 13:25:30,937 - tsinfer.inference - INFO - Summary of recombination probabilities between sites: min=0.01; max=0.01; median=0.01; mean=0.01
2025-12-19 13:25:30,938 - tsinfer.inference - INFO - Summary of mismatch probabilities over sites: min=1e-20; max=1e-20; median=1e-20; mean=1e-20
2025-12-19 13:25:30,938 - tsinfer.inference - INFO - Matching using likelihood_threshold of 1e-13
2025-12-19 13:25:30,941 - tsinfer.inference - INFO - 18 epochs with 30.5 median size.
2025-12-19 13:25:30,942 - tsinfer.inference - INFO - First large (>15250.0) epoch is 18
2025-12-19 13:25:30,942 - tsinfer.inference - INFO - Grouping 527 ancestors by linesweep
2025-12-19 13:25:30,943 - tsinfer.ancestors - INFO - Merged to 85 ancestors in 0.00s
2025-12-19 13:25:30,944 - tsinfer.ancestors - INFO - Built 170 events in 0.00s
2025-12-19 13:25:31,580 - tsinfer.ancestors - INFO - Linesweep generated 955 dependencies in 0.64s
2025-12-19 13:25:32,168 - tsinfer.ancestors - INFO - Found groups in 0.59s
2025-12-19 13:25:32,170 - tsinfer.ancestors - INFO - Un-merged in 0.00s
2025-12-19 13:25:32,171 - tsinfer.ancestors - INFO - 16 groups with median size 36.5
2025-12-19 13:25:32,171 - tsinfer.inference - INFO - Finished grouping into 16 groups in 1.23 seconds
2025-12-19 13:25:32,172 - tsinfer.inference - INFO - Starting ancestor matching for 16 groups
2025-12-19 13:25:32,173 - tsinfer.inference - INFO - Starting group 0 of 16 with 0 ancestors
2025-12-19 13:25:32,175 - tsinfer.inference - INFO - Finished group 0 of 16 in 0.00 seconds
2025-12-19 13:25:32,176 - tsinfer.inference - INFO - Starting group 1 of 16 with 1 ancestors
2025-12-19 13:25:32,216 - tsinfer.inference - INFO - Finished group 1 of 16 in 0.04 seconds
2025-12-19 13:25:32,217 - tsinfer.inference - INFO - Starting group 2 of 16 with 17 ancestors
2025-12-19 13:25:32,225 - tsinfer.inference - INFO - Finished group 2 of 16 in 0.01 seconds
2025-12-19 13:25:32,226 - tsinfer.inference - INFO - Starting group 3 of 16 with 16 ancestors
2025-12-19 13:25:32,234 - tsinfer.inference - INFO - Finished group 3 of 16 in 0.01 seconds
2025-12-19 13:25:32,234 - tsinfer.inference - INFO - Starting group 4 of 16 with 15 ancestors
2025-12-19 13:25:32,242 - tsinfer.inference - INFO - Finished group 4 of 16 in 0.01 seconds
2025-12-19 13:25:32,243 - tsinfer.inference - INFO - Starting group 5 of 16 with 10 ancestors
2025-12-19 13:25:32,250 - tsinfer.inference - INFO - Finished group 5 of 16 in 0.01 seconds
2025-12-19 13:25:32,250 - tsinfer.inference - INFO - Starting group 6 of 16 with 37 ancestors
2025-12-19 13:25:32,261 - tsinfer.inference - INFO - Finished group 6 of 16 in 0.01 seconds
2025-12-19 13:25:32,262 - tsinfer.inference - INFO - Starting group 7 of 16 with 26 ancestors
2025-12-19 13:25:32,272 - tsinfer.inference - INFO - Finished group 7 of 16 in 0.01 seconds
2025-12-19 13:25:32,272 - tsinfer.inference - INFO - Starting group 8 of 16 with 36 ancestors
2025-12-19 13:25:32,283 - tsinfer.inference - INFO - Finished group 8 of 16 in 0.01 seconds
2025-12-19 13:25:32,284 - tsinfer.inference - INFO - Starting group 9 of 16 with 46 ancestors
2025-12-19 13:25:32,296 - tsinfer.inference - INFO - Finished group 9 of 16 in 0.01 seconds
2025-12-19 13:25:32,297 - tsinfer.inference - INFO - Starting group 10 of 16 with 71 ancestors
2025-12-19 13:25:32,313 - tsinfer.inference - INFO - Finished group 10 of 16 in 0.02 seconds
2025-12-19 13:25:32,314 - tsinfer.inference - INFO - Starting group 11 of 16 with 56 ancestors
2025-12-19 13:25:32,328 - tsinfer.inference - INFO - Finished group 11 of 16 in 0.01 seconds
2025-12-19 13:25:32,328 - tsinfer.inference - INFO - Starting group 12 of 16 with 50 ancestors
2025-12-19 13:25:32,342 - tsinfer.inference - INFO - Finished group 12 of 16 in 0.01 seconds
2025-12-19 13:25:32,343 - tsinfer.inference - INFO - Starting group 13 of 16 with 49 ancestors
2025-12-19 13:25:32,356 - tsinfer.inference - INFO - Finished group 13 of 16 in 0.01 seconds
2025-12-19 13:25:32,357 - tsinfer.inference - INFO - Starting group 14 of 16 with 54 ancestors
2025-12-19 13:25:32,373 - tsinfer.inference - INFO - Finished group 14 of 16 in 0.02 seconds
2025-12-19 13:25:32,374 - tsinfer.inference - INFO - Starting group 15 of 16 with 42 ancestors
2025-12-19 13:25:32,387 - tsinfer.inference - INFO - Finished group 15 of 16 in 0.01 seconds
2025-12-19 13:25:32,399 - tsinfer.inference - INFO - Built ancestors tree sequence: 546 nodes (19 pc ancestors); 2018 edges; 527 sites; 527 mutations
2025-12-19 13:25:32,400 - tsinfer.inference - INFO - Finished ancestor matching
2025-12-19 13:25:32,404 - tsinfer.inference - INFO - Mismatch prevented by setting constant high recombination and low mismatch probabilities
2025-12-19 13:25:32,405 - tsinfer.inference - INFO - Summary of recombination probabilities between sites: min=0.01; max=0.01; median=0.01; mean=0.01
2025-12-19 13:25:32,406 - tsinfer.inference - INFO - Summary of mismatch probabilities over sites: min=1e-20; max=1e-20; median=1e-20; mean=1e-20
2025-12-19 13:25:32,407 - tsinfer.inference - INFO - Matching using likelihood_threshold of 1e-13
2025-12-19 13:25:32,432 - tsinfer.inference - INFO - Loaded 18 samples 546 nodes; 2018 edges; 527 sites; 527 mutations
2025-12-19 13:25:32,433 - tsinfer.inference - INFO - Started matching for 18 samples
2025-12-19 13:25:32,436 - tsinfer.inference - INFO - 1766147132.4364045Thread 140329576445760 starting haplotype 0
2025-12-19 13:25:32,437 - tsinfer.inference - INFO - 1766147132.4379387Thread 140329576445760 finished haplotype 0
2025-12-19 13:25:32,438 - tsinfer.inference - INFO - 1766147132.438609Thread 140329576445760 starting haplotype 1
2025-12-19 13:25:32,440 - tsinfer.inference - INFO - 1766147132.440556Thread 140329576445760 finished haplotype 1
2025-12-19 13:25:32,441 - tsinfer.inference - INFO - 1766147132.441191Thread 140329576445760 starting haplotype 2
2025-12-19 13:25:32,442 - tsinfer.inference - INFO - 1766147132.4428568Thread 140329576445760 finished haplotype 2
2025-12-19 13:25:32,443 - tsinfer.inference - INFO - 1766147132.4435833Thread 140329576445760 starting haplotype 3
2025-12-19 13:25:32,445 - tsinfer.inference - INFO - 1766147132.4451828Thread 140329576445760 finished haplotype 3
2025-12-19 13:25:32,445 - tsinfer.inference - INFO - 1766147132.4458532Thread 140329576445760 starting haplotype 4
2025-12-19 13:25:32,447 - tsinfer.inference - INFO - 1766147132.4478834Thread 140329576445760 finished haplotype 4
2025-12-19 13:25:32,448 - tsinfer.inference - INFO - 1766147132.4487512Thread 140329576445760 starting haplotype 5
2025-12-19 13:25:32,449 - tsinfer.inference - INFO - 1766147132.4499621Thread 140329576445760 finished haplotype 5
2025-12-19 13:25:32,450 - tsinfer.inference - INFO - 1766147132.4507165Thread 140329576445760 starting haplotype 6
2025-12-19 13:25:32,452 - tsinfer.inference - INFO - 1766147132.4523108Thread 140329576445760 finished haplotype 6
2025-12-19 13:25:32,452 - tsinfer.inference - INFO - 1766147132.4529233Thread 140329576445760 starting haplotype 7
2025-12-19 13:25:32,454 - tsinfer.inference - INFO - 1766147132.4545383Thread 140329576445760 finished haplotype 7
2025-12-19 13:25:32,455 - tsinfer.inference - INFO - 1766147132.455192Thread 140329576445760 starting haplotype 8
2025-12-19 13:25:32,456 - tsinfer.inference - INFO - 1766147132.4567995Thread 140329576445760 finished haplotype 8
2025-12-19 13:25:32,457 - tsinfer.inference - INFO - 1766147132.4574382Thread 140329576445760 starting haplotype 9
2025-12-19 13:25:32,458 - tsinfer.inference - INFO - 1766147132.4589832Thread 140329576445760 finished haplotype 9
2025-12-19 13:25:32,459 - tsinfer.inference - INFO - 1766147132.459632Thread 140329576445760 starting haplotype 10
2025-12-19 13:25:32,461 - tsinfer.inference - INFO - 1766147132.4612594Thread 140329576445760 finished haplotype 10
2025-12-19 13:25:32,461 - tsinfer.inference - INFO - 1766147132.461925Thread 140329576445760 starting haplotype 11
2025-12-19 13:25:32,463 - tsinfer.inference - INFO - 1766147132.463466Thread 140329576445760 finished haplotype 11
2025-12-19 13:25:32,464 - tsinfer.inference - INFO - 1766147132.4640374Thread 140329576445760 starting haplotype 12
2025-12-19 13:25:32,465 - tsinfer.inference - INFO - 1766147132.4659064Thread 140329576445760 finished haplotype 12
2025-12-19 13:25:32,466 - tsinfer.inference - INFO - 1766147132.466536Thread 140329576445760 starting haplotype 13
2025-12-19 13:25:32,468 - tsinfer.inference - INFO - 1766147132.468093Thread 140329576445760 finished haplotype 13
2025-12-19 13:25:32,468 - tsinfer.inference - INFO - 1766147132.4687457Thread 140329576445760 starting haplotype 14
2025-12-19 13:25:32,470 - tsinfer.inference - INFO - 1766147132.4702985Thread 140329576445760 finished haplotype 14
2025-12-19 13:25:32,470 - tsinfer.inference - INFO - 1766147132.4709408Thread 140329576445760 starting haplotype 15
2025-12-19 13:25:32,474 - tsinfer.inference - INFO - 1766147132.4741087Thread 140329576445760 finished haplotype 15
2025-12-19 13:25:32,475 - tsinfer.inference - INFO - 1766147132.4751089Thread 140329576445760 starting haplotype 16
2025-12-19 13:25:32,476 - tsinfer.inference - INFO - 1766147132.4767797Thread 140329576445760 finished haplotype 16
2025-12-19 13:25:32,477 - tsinfer.inference - INFO - 1766147132.4774344Thread 140329576445760 starting haplotype 17
2025-12-19 13:25:32,479 - tsinfer.inference - INFO - 1766147132.4790552Thread 140329576445760 finished haplotype 17
2025-12-19 13:25:32,479 - tsinfer.inference - INFO - Finished matching for all samples in 0.04 seconds
2025-12-19 13:25:32,480 - tsinfer.inference - INFO - Inserting sample paths: 1184 edges in total
2025-12-19 13:25:32,483 - tsinfer.inference - INFO - Finalising tree sequence
2025-12-19 13:25:32,496 - tsinfer.inference - INFO - Built samples tree sequence: 573 nodes (28 pc); 3199 edges; 527 sites; 527 mutations
2025-12-19 13:25:32,497 - tsinfer.inference - INFO - Mapping additional sites
2025-12-19 13:25:32,515 - tsinfer.inference - INFO - Removing the oldest edge to detach the virtual-root-like ancestor
2025-12-19 13:25:32,518 - tsinfer.inference - INFO - Located the all zeros ultimate ancestor
2025-12-19 13:25:32,522 - tsinfer.inference - INFO - Splitting ultimate ancestor into 325 nodes
2025-12-19 13:25:32,535 - tsinfer.inference - INFO - Erased flanks covering 0.5445215377117799% of the genome: 166516.0 units at the start and 73497.0 units at the end
2025-12-19 13:25:32,536 - tsinfer.inference - INFO - Simplifying with filter_sites=False, filter_populations=False, filter_individuals=False, and keep_unary=True on 898 nodes and 4330 edges
2025-12-19 13:25:32,538 - tsinfer.inference - INFO - Finished simplify; now have 894 nodes and 3578 edges
Inferred tree sequence `sparrow_ts`: 506 trees over 44.077779 Mb
Node 0 labels a chr26 sampled from individual {'sample_id': 'UYOA-TEX-000000001'} in {'breed': 'TEX'}
Node 1 labels a chr26 sampled from individual {'sample_id': 'UYOA-TEX-000000001'} in {'breed': 'TEX'}
Node 2 labels a chr26 sampled from individual {'sample_id': 'GROA-FRZ-000000170'} in {'breed': 'FRZ'}
Node 3 labels a chr26 sampled from individual {'sample_id': 'GROA-FRZ-000000170'} in {'breed': 'FRZ'}
Node 4 labels a chr26 sampled from individual {'sample_id': 'UYOA-MER-000000224'} in {'breed': 'MER'}
Node 5 labels a chr26 sampled from individual {'sample_id': 'UYOA-MER-000000224'} in {'breed': 'MER'}
Node 6 labels a chr26 sampled from individual {'sample_id': 'UYOA-CRR-000000320'} in {'breed': 'CRR'}
Node 7 labels a chr26 sampled from individual {'sample_id': 'UYOA-CRR-000000320'} in {'breed': 'CRR'}
Node 8 labels a chr26 sampled from individual {'sample_id': 'UYOA-CRL-000000380'} in {'breed': 'CRL'}
Node 9 labels a chr26 sampled from individual {'sample_id': 'UYOA-CRL-000000380'} in {'breed': 'CRL'}
Node 10 labels a chr26 sampled from individual {'sample_id': 'FROA-BER-000000478'} in {'breed': 'BER'}
Node 11 labels a chr26 sampled from individual {'sample_id': 'FROA-BER-000000478'} in {'breed': 'BER'}
Node 12 labels a chr26 sampled from individual {'sample_id': 'ITOA-ALT-000001084'} in {'breed': 'ALT'}
Node 13 labels a chr26 sampled from individual {'sample_id': 'ITOA-ALT-000001084'} in {'breed': 'ALT'}
Node 14 labels a chr26 sampled from individual {'sample_id': 'ITOA-LEC-000002620'} in {'breed': 'LEC'}
Node 15 labels a chr26 sampled from individual {'sample_id': 'ITOA-LEC-000002620'} in {'breed': 'LEC'}
Node 16 labels a chr26 sampled from individual {'sample_id': 'ITOA-SAB-000003324'} in {'breed': 'SAB'}
Node 17 labels a chr26 sampled from individual {'sample_id': 'ITOA-SAB-000003324'} in {'breed': 'SAB'}
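The tsinfer log above reports that the erased flanks cover about 0.54% of the genome. As a quick sanity check, the figure can be reproduced from the flank sizes and the sequence length printed in the same output. This is plain arithmetic on the logged values, not a call into tsinfer.

```python
# Values copied from the log and summary lines above
sequence_length = 44_077_779  # 44.077779 Mb
left_flank = 166_516.0        # units erased at the start
right_flank = 73_497.0        # units erased at the end

erased = left_flank + right_flank
percent = 100 * erased / sequence_length

print(f"{erased:.0f} units erased = {percent:.4f}% of the sequence")
# 240013 units erased = 0.5445% of the sequence
```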

Now estimate node ages on the inferred tree sequence with tsdate:

# Removes unary nodes (currently required in tsdate), keeps historical-only sites
inferred_ts = tsdate.preprocess_ts(sparrow_ts.simplify(), filter_sites=False)
dated_ts = tsdate.date(inferred_ts, method="inside_outside", mutation_rate=1e-8, Ne=1e4)

dated_ts
2025-12-19 13:25:32,552 - tsdate.util - INFO - Beginning preprocessing
2025-12-19 13:25:32,553 - tsdate.util - INFO - Minimum_gap: None and erase_flanks: None
2025-12-19 13:25:32,555 - tsdate.util - INFO - REMOVING TELOMERE: Snip topology from 0 to first site at 166515.0.
2025-12-19 13:25:32,556 - tsdate.util - INFO - REMOVING TELOMERE: Snip topology from 44004282.0 to end of sequence at 44077779.0.
2025-12-19 13:25:34,187 - tsdate.core - INFO - Inserted node and mutation metadata in 0.03345775604248047 seconds
2025-12-19 13:25:34,188 - root - INFO - Modified ages of 38 nodes to satisfy constraints
2025-12-19 13:25:34,189 - tsdate.core - INFO - Constrained node ages in 0.00 seconds
2025-12-19 13:25:34,192 - root - INFO - Set ages of 0 nonsegregating mutations to root times.
Tree Sequence
  Trees             505
  Sequence Length   44 077 779
  Time Units        generations
  Sample Nodes      18
  Total Size        341.7 KiB
  Metadata          dict

Table          Rows     Size         Has Metadata
Edges          4 208    131.5 KiB
Individuals    9        591 Bytes
Migrations     0        8 Bytes
Mutations      968      35.0 KiB
Nodes          1 037    105.2 KiB
Populations    9        224 Bytes
Provenances    4        2.8 KiB
Sites          660      33.5 KiB
Provenance

Timestamp                          Software   Version   Command
19 December, 2025 at 01:25:34 PM   tsdate     0.2.4     inside_outside
19 December, 2025 at 01:25:32 PM   tsdate     0.2.4     preprocess_ts
19 December, 2025 at 01:25:32 PM   tskit      1.0.0b3   simplify
19 December, 2025 at 01:25:32 PM   tsinfer    0.5.0     infer

Details: tsdate 0.2.4, inside_outside
  schema_version: 1.0.0
  software: name: tsdate, version: 0.2.4
  parameters:
    mutation_rate: 1e-08
    recombination_rate: None
    time_units: None
    progress: None
    population_size: 10000.0
    eps: 1e-10
    outside_standardize: True
    ignore_oldest_root: False
    probability_space: logarithmic
    num_threads: None
    cache_inside: False
    command: inside_outside
  environment:
    os: system: Linux, node: node1, release: 5.15.0-58-generic, version: #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023, machine: x86_64
    python: implementation: CPython, version: 3.12.12
    libraries: tskit: version: 1.0.0b3
  resources:
    elapsed_time: 1.619746208190918
    user_time: 62.79
    sys_time: 3.44
    max_memory: 683495424

Details: tsdate 0.2.4, preprocess_ts
  schema_version: 1.0.0
  software: name: tsdate, version: 0.2.4
  parameters:
    minimum_gap: 1000000
    erase_flanks: True
    split_disjoint: True
    filter_populations: False
    filter_individuals: False
    filter_sites: False
    delete_intervals: [[0, 166515.0], [44004282.0, 44077779.0]]
    command: preprocess_ts
  environment:
    os: system: Linux, node: node1, release: 5.15.0-58-generic, version: #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023, machine: x86_64
    python: implementation: CPython, version: 3.12.12
    libraries: tskit: version: 1.0.0b3
  resources:
    elapsed_time: 0.019176483154296875
    user_time: 61.18
    sys_time: 3.42
    max_memory: 682795008

Details: tskit 1.0.0b3, simplify
  schema_version: 1.0.0
  software: name: tskit, version: 1.0.0b3
  parameters:
    command: simplify
    TODO: add simplify parameters
  environment:
    os: system: Linux, node: node1, release: 5.15.0-58-generic, version: #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023, machine: x86_64
    python: implementation: CPython, version: 3.12.12
    libraries: kastore: version: 2.1.1

Details: tsinfer 0.5.0, infer
  schema_version: 1.0.0
  software: name: tsinfer, version: 0.5.0
  parameters:
    mismatch_ratio: None
    path_compression: True
    precision: None
    post_process: None
    command: infer
  environment:
    libraries:
      zarr: version: 2.18.7
      numcodecs: version: 0.15.0
      lmdb: version: 1.7.5
      tskit: version: 1.0.0b3
    os: system: Linux, node: node1, release: 5.15.0-58-generic, version: #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023, machine: x86_64
    python: implementation: CPython, version: [3, 12, 12]
  resources:
    elapsed_time: 1.6952054277062416
    user_time: 1.519999999999996
    sys_time: 0.16999999999999993
    max_memory: 682426368

To cite this software, please consult the citation manual: https://tskit.dev/citation/
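
To get a feeling for the scale implied by the `mutation_rate=1e-8` and `Ne=1e4` values passed to tsdate above, the textbook expectations under a neutral, constant-size diploid model can be worked out by hand. This is only a back-of-the-envelope sketch of standard coalescent formulas; it does not reproduce what tsdate computes internally, and the variable names are my own.

```python
mutation_rate = 1e-8    # per base pair per generation, as passed to tsdate
population_size = 1e4   # effective population size (Ne), as passed to tsdate

# Population-scaled mutation rate per base pair: theta = 4 * Ne * mu
theta = 4 * population_size * mutation_rate

# Expected time to the most recent common ancestor of two lineages
# in a diploid population: 2 * Ne generations
expected_pairwise_tmrca = 2 * population_size

print(f"theta = {theta:.1e} per bp")
print(f"expected pairwise TMRCA = {expected_pairwise_tmrca:.0f} generations")
```

Node ages in the dated tree sequence are expressed in generations, so they can be compared directly with the second figure.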