Helper functions

For lack of a better name, hpo3 comes with a helper submodule whose methods use Rust’s multithreading to run large batch operations in parallel. This is especially useful when analyzing large datasets.

Methods

batch_similarity(comparisons, kind, method)

Calculate similarity between HPOTerm in batches

This method runs in parallel on all available CPU cores

Parameters

comparisons (list[tuple[pyhpo.HPOTerm, pyhpo.HPOTerm]]) – A list of HPOTerm tuples. The two HPOTerm within one tuple will be compared to each other.
kind (str, default: omim) –
Which kind of information content to use for similarity calculation

Available options:
- omim
- gene
method (str, default: graphic) –
The method to use to calculate the similarity.

Available options:
- resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
- lin - Lin D, Proceedings of the 15th ICML, (1998)
- jc - Jiang J, Conrath D, ROCLING X, (1997). This implementation differs from the one in PyHPO
- jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility
- rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
- ic - Information coefficient - Li B, et al., arXiv, (2010)
- graphic - Graph-based Information coefficient - Deng Y, et al., PLoS One, (2015)
- dist - Distance between terms

Returns

The similarity scores of each comparison

Return type

list[float]

Raises

KeyError – Invalid kind provided
RuntimeError – Invalid method
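
To make the resnik and lin options more concrete, here is a minimal pure-Python sketch on a toy ontology. It does not call hpo3: the term names, the `PARENTS` and `IC` tables and the function names are all invented for illustration, and the Rust implementation may differ in details. Resnik similarity is the information content (IC) of the most informative common ancestor, and Lin similarity normalizes that value by the IC of the two terms.

```python
# Toy ontology (hypothetical): term -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Made-up information content values; rarer terms get a higher IC.
IC = {"root": 0.0, "A": 1.0, "B": 1.5, "A1": 2.5, "A2": 3.0, "AB": 4.0}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    common = ancestors(a) & ancestors(b)
    return max(IC[t] for t in common)


def lin(a, b):
    """Resnik score normalized by the IC of both terms."""
    denominator = IC[a] + IC[b]
    return 2 * resnik(a, b) / denominator if denominator else 0.0


print(resnik("A1", "A2"))  # 1.0, the IC of the shared ancestor "A"
print(lin("A1", "A1"))     # 1.0, a term is fully similar to itself
```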

Examples

import itertools
from pyhpo import Ontology, helper

# Load the ontology before using the helper functions
Ontology()

terms = list(Ontology)
# Take only the first 10,000 pairs instead of building every combination
term_combinations = list(itertools.islice(itertools.combinations(terms, 2), 10000))
similarities = helper.batch_similarity(term_combinations, kind="omim", method="graphic")
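
The number of pairwise term combinations grows quadratically, so for larger runs it can help to feed the helper fixed-size batches drawn lazily from the iterator rather than building one huge list. The sketch below shows one way to do that. `batched_pairs` is a hypothetical helper, not part of hpo3, and the commented usage assumes that `helper.batch_similarity` accepts any list of tuples as documented above.

```python
import itertools


def batched_pairs(items, batch_size):
    """Yield lists of up to `batch_size` unique pairs from `items`, lazily."""
    pairs = itertools.combinations(items, 2)
    while True:
        batch = list(itertools.islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Stand-in data; with hpo3 this would be `list(Ontology)`.
items = ["a", "b", "c", "d"]
batches = list(batched_pairs(items, batch_size=4))
print([len(b) for b in batches])  # [4, 2]

# Hypothetical usage with hpo3 (not executed here):
# for batch in batched_pairs(list(Ontology), 10_000):
#     scores = helper.batch_similarity(batch, kind="omim", method="graphic")
```

Each batch is an ordinary list, so memory use is bounded by the batch size rather than by the total number of combinations.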

batch_set_similarity(comparisons, kind, method, combine)

Calculate similarity between HPOSet in batches

This method runs in parallel on all available CPU cores

Parameters

comparisons (list[tuple[pyhpo.HPOSet, pyhpo.HPOSet]]) – A list of HPOSet tuples. The two HPOSet within one tuple will be compared to each other.
kind (str, default: omim) –
Which kind of information content to use for similarity calculation

Available options:
- omim
- gene
method (str, default: graphic) –
The method to use to calculate the similarity.

Available options:
- resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
- lin - Lin D, Proceedings of the 15th ICML, (1998)
- jc - Jiang J, Conrath D, ROCLING X, (1997). This implementation differs from the one in PyHPO
- jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility
- rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
- ic - Information coefficient - Li B, et al., arXiv, (2010)
- graphic - Graph-based Information coefficient - Deng Y, et al., PLoS One, (2015)
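- dist - Distance between terms
combine (str) –
The method used to combine the pairwise term similarities into a single score for each pair of sets, for example funSimAvg.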

Returns

The similarity scores of each comparison

Return type

list[float]

Raises

NameError – Ontology not yet constructed
KeyError – Invalid kind provided
RuntimeError – Invalid method or combine

Examples

import itertools
from pyhpo import Ontology, helper

# Load the ontology before using the helper functions
Ontology()

gene_sets = [g.hpo_set() for g in Ontology.genes]
# Take only the first 100 pairs instead of building every combination
gene_set_combinations = list(itertools.islice(itertools.combinations(gene_sets, 2), 100))
similarities = helper.batch_set_similarity(gene_set_combinations, kind="omim", method="graphic", combine="funSimAvg")
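
The combine argument controls how the pairwise term scores are reduced to one score per pair of sets. As a rough illustration, funSimAvg is commonly defined as the average of the two directional best-match means. The sketch below implements that common definition on a plain similarity matrix. It is not hpo3's code: the function name and the numbers are invented, and the library's handling of edge cases such as empty sets may differ.

```python
def fun_sim_avg(matrix):
    """Combine a term-by-term similarity matrix into one set-level score.

    `matrix[i][j]` is the similarity between term i of the first set
    and term j of the second set.
    """
    if not matrix or not matrix[0]:
        return 0.0  # assumption: empty sets score 0
    # Best match in the second set for each term of the first set.
    row_best = [max(row) for row in matrix]
    # Best match in the first set for each term of the second set.
    col_best = [max(col) for col in zip(*matrix)]
    row_mean = sum(row_best) / len(row_best)
    col_mean = sum(col_best) / len(col_best)
    return (row_mean + col_mean) / 2


# Two terms in the first set, four in the second (made-up scores).
scores = [
    [0.5, 0.25, 0.0, 0.5],
    [0.0, 1.0, 0.5, 0.0],
]
print(fun_sim_avg(scores))  # 0.6875
```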

batch_disease_enrichment(hposets)

Calculate enriched diseases in a list of HPOSet

This method runs in parallel on all available CPU cores

Calculate the hypergeometric enrichment of diseases associated with the terms of each set. Each set is evaluated independently, and the returned list has the same order as the input data.

Parameters: hposets (list[pyhpo.HPOSet]) – A list of HPOSet objects. The enrichment of all diseases is calculated separately for each HPOSet in the list
Returns: The enrichment result for every disease. See pyhpo.stats.EnrichmentModel.enrichment() for details
Return type: list[dict]
Raises: NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, helper

Ontology()

genes = [g for g in Ontology.genes[0:100]]
gene_sets = [g.hpo_set() for g in genes]
enrichments = helper.batch_disease_enrichment(gene_sets)

for (gene, enriched_diseases) in zip(genes, enrichments):
    print(
        "The top enriched diseases for {} are: {}".format(
            gene.name,
            ", ".join([f"{disease['item'].name}, ({disease['enrichment']})" for disease in enriched_diseases[0:5]])
        )
    )

# >>> The top enriched diseases for C7 are: C7 deficiency, (3.6762699175625894e-42), C6 deficiency, (3.782313673973149e-37), C5 deficiency, (2.6614254464758174e-33), Complement factor B deficiency, (4.189056541495023e-32), Complement component 8 deficiency, type II, (8.87368759499919e-32)
# >>> The top enriched diseases for WNT5A are: Robinow syndrome, autosomal recessive, (0.0), Robinow syndrome, autosomal dominant 1, (0.0), Pallister-Killian syndrome, (1.2993558687813034e-238), Robinow syndrome, autosomal dominant 3, (1.2014167106834296e-223), Peters-plus syndrome, (2.5163107554882648e-216)
# >>> The top enriched diseases for TYMS are: Dyskeratosis congenita, X-linked, (5.008058437787544e-192), Dyskeratosis congenita, digenic, (2.703378203105612e-184), Dyskeratosis congenita, autosomal dominant 2, (1.3109083102058795e-150), Bloom syndrome, (3.965926308699221e-141), Dyskeratosis congenita, autosomal dominant 3, (1.123439117889186e-131)
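
The hypergeometric test behind scores like these can be written out with the standard library. The sketch below computes the upper-tail probability of seeing at least `observed` annotated terms in a set, given how many annotated terms exist overall. The function name, parameter names and numbers are made up, and whether hpo3 uses exactly this tail and these population counts is an assumption.

```python
from math import comb


def hypergeom_sf(observed, population, successes, draws):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` are annotated."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favorable / total


# Toy numbers: 20 terms overall, 5 linked to a disease, and a set of
# 4 terms of which 3 are linked to that disease.
p = hypergeom_sf(observed=3, population=20, successes=5, draws=4)
print(p)  # 155 / 4845, roughly 0.032
```

A smaller value means the overlap is less likely to have occurred by chance, which matches how the scores in the output above are ordered.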

batch_gene_enrichment(hposets)

Calculate enriched genes in a list of HPOSet

This method runs in parallel on all available CPU cores

Calculate the hypergeometric enrichment of genes associated with the terms of each set. Each set is evaluated independently, and the returned list has the same order as the input data.

Parameters: hposets (list[pyhpo.HPOSet]) – A list of HPOSet objects. The enrichment of all genes is calculated separately for each HPOSet in the list
Returns: The enrichment result for every gene. See pyhpo.stats.EnrichmentModel.enrichment() for details
Return type: list[dict]
Raises: NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, helper

Ontology()

diseases = [d for d in Ontology.omim_diseases[0:100]]
disease_sets = [d.hpo_set() for d in diseases]
enrichments = helper.batch_gene_enrichment(disease_sets)

for (disease, enriched_genes) in zip(diseases, enrichments):
    print(
        "The top enriched genes for {} are: {}".format(
            disease.name,
            ", ".join([f"{gene['item'].name}, ({gene['enrichment']})" for gene in enriched_genes[0:5]])
        )
    )

# >>> The top enriched genes for Immunodeficiency 85 and autoimmunity are: TOM1, (7.207370728788139e-45), PIK3CD, (1.9560156243742087e-17), IL2RG, (1.0000718026169596e-16), BACH2, (3.373013104581288e-15), IL6ST, (3.760565282680126e-15)
# >>> The top enriched genes for CODAS syndrome are: LONP1, (4.209128613268585e-80), EXTL3, (5.378742851736401e-23), SMC1A, (5.338807361962185e-22), FLNA, (1.0968887647112733e-21), COL2A1, (1.1029731783630839e-21)
# >>> The top enriched genes for Rhizomelic chondrodysplasia punctata, type 1 are: PEX7, (9.556919089648523e-54), PEX5, (7.030392607093173e-22), PEX1, (3.7973830291601626e-19), PEX11B, (4.318791413029623e-19), HSPG2, (7.108950838424571e-19)
# >>> The top enriched genes for Oculopharyngodistal myopathy 4 are: RILPL1, (1.4351489331895004e-49), LRP12, (2.168165858699749e-30), GIPC1, (3.180801819975307e-27), NOTCH2NLC, (1.0700847991253517e-23), VCP, (2.8742020666947536e-20)
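
Each result is a list of dictionaries, so ordinary list handling is enough for post-processing. The sketch below keeps only the entries under a significance cutoff. It assumes the `item` and `enrichment` keys used in the example above; `significant_hits`, the cutoff value and the stand-in records are hypothetical.

```python
from types import SimpleNamespace


def significant_hits(results, cutoff=1e-10):
    """Return (name, score) pairs with an enrichment score below `cutoff`,
    sorted from most to least significant."""
    hits = [r for r in results if r["enrichment"] < cutoff]
    hits.sort(key=lambda r: r["enrichment"])
    return [(r["item"].name, r["enrichment"]) for r in hits]


# Stand-in records mimicking one entry of the list returned by the helper.
records = [
    {"item": SimpleNamespace(name="GENE_B"), "enrichment": 2e-5},
    {"item": SimpleNamespace(name="GENE_A"), "enrichment": 1e-30},
    {"item": SimpleNamespace(name="GENE_C"), "enrichment": 1e-12},
]
print(significant_hits(records))
# [('GENE_A', 1e-30), ('GENE_C', 1e-12)]
```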