Helper functions
For lack of a better name, hpo3 comes with a helper submodule. It contains
functions that use Rust's multithreading to run large batch operations in parallel.
This is especially useful for analyzing large data sets.
Methods
- batch_similarity(comparisons, kind, method)
Calculate similarity between HPOTerm in batches.
This method runs in parallel on all available CPUs.
- Parameters
comparisons (list[tuple[pyhpo.HPOTerm, pyhpo.HPOTerm]]) - A list of HPOTerm tuples. The two HPOTerm within one tuple will be compared to each other.
kind (str, default: omim) - Which kind of information content to use for similarity calculation.
Available options:
omim
orpha
gene
method (str, default: graphic) - The method to use to calculate the similarity.
Available options:
resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
lin - Lin D, Proceedings of the 15th ICML, (1998)
jc - Jiang J, Conrath D, ROCLING X, (1997). This differs from the PyHPO implementation.
jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility.
rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
ic - Information coefficient - Li B, et al., arXiv, (2010)
graphic - Graph-based Information coefficient - Deng Y, et al., PLoS One, (2015)
dist - Distance between terms
- Returns
The similarity scores of each comparison
- Return type
list[float]
- Raises
KeyError - Invalid kind provided
RuntimeError - Invalid method
Examples
import itertools
from pyhpo import Ontology, HPOSet, helper

Ontology()
terms = [t for t in Ontology]
term_combinations = [(a[0], a[1]) for a in itertools.combinations(terms, 2)]
similarities = helper.batch_similarity(term_combinations[0:10000], kind="omim", method="graphic")
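To make the method options more concrete, here is a minimal sketch of the textbook Resnik and Lin measures on a toy ontology. The term names, frequencies and helper functions are hypothetical and written in pure Python. hpo3 computes its scores in Rust, and its exact information-content definition may differ from the one assumed here.

```python
import math

# Toy ontology (hypothetical): each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical share of annotated records that carry each term.
FREQUENCY = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.125}


def ic(term):
    """Information content, assumed here to be -ln(frequency)."""
    return -math.log(FREQUENCY[term])


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica_ic(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a, b):
    """Resnik: the IC of the most informative common ancestor."""
    return mica_ic(a, b)


def lin(a, b):
    """Lin: twice the shared IC, normalized by the IC of both terms."""
    denominator = ic(a) + ic(b)
    return 2 * mica_ic(a, b) / denominator if denominator else 0.0


print(resnik("A1", "A2"))
print(lin("A1", "A2"))
```

The two siblings A1 and A2 share the ancestor A, so both measures are positive. A1 and B only share the root term, which carries no information, so both measures are zero.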
- batch_set_similarity(comparisons, kind, method, combine)
Calculate similarity between HPOSet in batches.
This method runs in parallel on all available CPUs.
- Parameters
comparisons (list[tuple[pyhpo.HPOSet, pyhpo.HPOSet]]) - A list of HPOSet tuples. The two HPOSet within one tuple will be compared to each other.
kind (str, default: omim) - Which kind of information content to use for similarity calculation.
Available options:
omim
orpha
gene
method (str, default: graphic) - The method to use to calculate the similarity.
Available options:
resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
lin - Lin D, Proceedings of the 15th ICML, (1998)
jc - Jiang J, Conrath D, ROCLING X, (1997). This differs from the PyHPO implementation.
jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility.
rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
ic - Information coefficient - Li B, et al., arXiv, (2010)
graphic - Graph-based Information coefficient - Deng Y, et al., PLoS One, (2015)
dist - Distance between terms
- Returns
The similarity scores of each comparison
- Return type
list[float]
- Raises
NameError - Ontology not yet constructed
KeyError - Invalid kind provided
RuntimeError - Invalid method or combine
Examples
import itertools
from pyhpo import Ontology, HPOSet, helper

Ontology()
gene_sets = [g.hpo_set() for g in Ontology.genes]
gene_set_combinations = [(a[0], a[1]) for a in itertools.combinations(gene_sets, 2)]
similarities = helper.batch_set_similarity(gene_set_combinations[0:100], kind="omim", method="graphic", combine="funSimAvg")
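The combine argument determines how the term-by-term scores of two sets are reduced to a single number. The sketch below shows what funSimAvg is commonly understood to compute: for each term, take its best match in the other set, average those best matches per direction, then average the two directions. This is an assumption about the strategy, not code taken from hpo3, and the function name fun_sim_avg and the score matrix are hypothetical.

```python
from statistics import mean


def fun_sim_avg(matrix):
    """Combine a term-by-term score matrix into one set-level score.

    Rows are the terms of the first set, columns the terms of the second.
    """
    # Best match in the second set for each term of the first set.
    row_best = [max(row) for row in matrix]
    # Best match in the first set for each term of the second set.
    column_best = [max(column) for column in zip(*matrix)]
    return (mean(row_best) + mean(column_best)) / 2


# Hypothetical pairwise scores: 2 terms in set one, 3 terms in set two.
scores = [
    [1.0, 0.2, 0.1],
    [0.3, 0.8, 0.4],
]
print(fun_sim_avg(scores))
```

Averaging both directions keeps the result symmetric: swapping the two sets transposes the matrix but gives the same score.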
- batch_disease_enrichment(hposets)
Deprecated since 1.3.0
Use pyhpo.helper.batch_omim_disease_enrichment() or pyhpo.helper.batch_orpha_disease_enrichment() instead.
- batch_omim_disease_enrichment(hposets)
Calculate enriched Omim diseases in a list of HPOSet.
This method runs in parallel on all available CPUs.
Calculate the hypergeometric enrichment of Omim diseases associated with the terms of each set. Each set is calculated individually, and the returned list has the same order as the input data.
- Parameters
hposets (list[pyhpo.HPOSet]) - A list of HPOSets. The enrichment of all diseases is calculated separately for each HPOSet in the list.
- Returns
The enrichment result for every disease. See pyhpo.stats.EnrichmentModel.enrichment() for details.
- Return type
list[dict]
- Raises
NameError - Ontology not yet constructed
Examples
from pyhpo import Ontology, helper

Ontology()
genes = [g for g in Ontology.genes[0:100]]
gene_sets = [g.hpo_set() for g in genes]
enrichments = helper.batch_omim_disease_enrichment(gene_sets)

for (gene, enriched_diseases) in zip(genes, enrichments):
    print(
        "The top enriched diseases for {} are: {}".format(
            gene.name,
            ", ".join([
                f"{disease['item'].name}, ({disease['enrichment']})"
                for disease in enriched_diseases[0:5]
            ])
        )
    )

# >>> The top enriched diseases for C7 are: C7 deficiency, (3.6762699175625894e-42), C6 deficiency, (3.782313673973149e-37), C5 deficiency, (2.6614254464758174e-33), Complement factor B deficiency, (4.189056541495023e-32), Complement component 8 deficiency, type II, (8.87368759499919e-32)
# >>> The top enriched diseases for WNT5A are: Robinow syndrome, autosomal recessive, (0.0), Robinow syndrome, autosomal dominant 1, (0.0), Pallister-Killian syndrome, (1.2993558687813034e-238), Robinow syndrome, autosomal dominant 3, (1.2014167106834296e-223), Peters-plus syndrome, (2.5163107554882648e-216)
# >>> The top enriched diseases for TYMS are: Dyskeratosis congenita, X-linked, (5.008058437787544e-192), Dyskeratosis congenita, digenic, (2.703378203105612e-184), Dyskeratosis congenita, autosomal dominant 2, (1.3109083102058795e-150), Bloom syndrome, (3.965926308699221e-141), Dyskeratosis congenita, autosomal dominant 3, (1.123439117889186e-131)
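For readers unfamiliar with hypergeometric enrichment, the following pure-Python sketch computes an upper-tail hypergeometric probability with math.comb. Whether hpo3 uses exactly this tail and these counts is an assumption; the function name hypergeom_upper_tail and the numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(observed, population, successes, draws):
    """Probability of seeing at least `observed` successes.

    population: total number of items
    successes:  items in the population that carry the property
    draws:      number of items drawn without replacement
    """
    if not 0 <= observed <= draws <= population or not 0 <= successes <= population:
        raise ValueError("counts are inconsistent")
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 10 items, 4 carry the property, 3 are drawn,
# and at least 2 of the drawn items carry it.
print(hypergeom_upper_tail(2, population=10, successes=4, draws=3))
```

A small probability means the observed overlap is unlikely to occur by chance, which matches the very small values in the sample output above.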
- batch_orpha_disease_enrichment(hposets)
Calculate enriched Orpha diseases in a list of HPOSet.
This method runs in parallel on all available CPUs.
Calculate the hypergeometric enrichment of Orpha diseases associated with the terms of each set. Each set is calculated individually, and the returned list has the same order as the input data.
- Parameters
hposets (list[pyhpo.HPOSet]) - A list of HPOSets. The enrichment of all diseases is calculated separately for each HPOSet in the list.
- Returns
The enrichment result for every disease. See pyhpo.stats.EnrichmentModel.enrichment() for details.
- Return type
list[dict]
- Raises
NameError - Ontology not yet constructed
Examples
from pyhpo import Ontology, helper

Ontology()
genes = [g for g in Ontology.genes[0:100]]
gene_sets = [g.hpo_set() for g in genes]
enrichments = helper.batch_orpha_disease_enrichment(gene_sets)

for (gene, enriched_diseases) in zip(genes, enrichments):
    print(
        "The top enriched diseases for {} are: {}".format(
            gene.name,
            ", ".join([
                f"{disease['item'].name}, ({disease['enrichment']})"
                for disease in enriched_diseases[0:5]
            ])
        )
    )

# >>> The top enriched diseases for C7 are: C7 deficiency, (3.6762699175625894e-42), C6 deficiency, (3.782313673973149e-37), C5 deficiency, (2.6614254464758174e-33), Complement factor B deficiency, (4.189056541495023e-32), Complement component 8 deficiency, type II, (8.87368759499919e-32)
# >>> The top enriched diseases for WNT5A are: Robinow syndrome, autosomal recessive, (0.0), Robinow syndrome, autosomal dominant 1, (0.0), Pallister-Killian syndrome, (1.2993558687813034e-238), Robinow syndrome, autosomal dominant 3, (1.2014167106834296e-223), Peters-plus syndrome, (2.5163107554882648e-216)
# >>> The top enriched diseases for TYMS are: Dyskeratosis congenita, X-linked, (5.008058437787544e-192), Dyskeratosis congenita, digenic, (2.703378203105612e-184), Dyskeratosis congenita, autosomal dominant 2, (1.3109083102058795e-150), Bloom syndrome, (3.965926308699221e-141), Dyskeratosis congenita, autosomal dominant 3, (1.123439117889186e-131)
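Because the returned list follows the input order, results can be paired with their inputs using zip and then filtered. The sketch below does this on mock data shaped like the output above, with one list of dicts per input and the keys item and enrichment. It assumes that a lower enrichment value means a stronger hit, as the sample output suggests. The function top_hits, the labels and the values are hypothetical, and item is a plain string here, whereas the examples on this page access item.name.

```python
def top_hits(labels, enrichments, alpha=1e-20, top_n=2):
    """Per input label, keep the `top_n` lowest scores that are <= alpha."""
    summary = {}
    for label, hits in zip(labels, enrichments):
        kept = sorted(
            (hit for hit in hits if hit["enrichment"] <= alpha),
            key=lambda hit: hit["enrichment"],
        )
        summary[label] = [(hit["item"], hit["enrichment"]) for hit in kept[:top_n]]
    return summary


# Mock results for two inputs (hypothetical names and values).
labels = ["GENE_A", "GENE_B"]
enrichments = [
    [
        {"item": "Disease 1", "enrichment": 3e-42},
        {"item": "Disease 2", "enrichment": 0.04},
        {"item": "Disease 3", "enrichment": 5e-30},
    ],
    [
        {"item": "Disease 4", "enrichment": 0.2},
    ],
]

summary = top_hits(labels, enrichments)
for label, hits in summary.items():
    print(label, hits)
```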
- batch_gene_enrichment(hposets)
Calculate enriched genes in a list of HPOSet.
This method runs in parallel on all available CPUs.
Calculate the hypergeometric enrichment of genes associated with the terms of each set. Each set is calculated individually, and the returned list has the same order as the input data.
- Parameters
hposets (list[pyhpo.HPOSet]) - A list of HPOSets. The enrichment of all genes is calculated separately for each HPOSet in the list.
- Returns
The enrichment result for every gene. See pyhpo.stats.EnrichmentModel.enrichment() for details.
- Return type
list[dict]
- Raises
NameError - Ontology not yet constructed
Examples
from pyhpo import Ontology, helper

Ontology()
diseases = [d for d in Ontology.omim_diseases[0:100]]
disease_sets = [d.hpo_set() for d in diseases]
enrichments = helper.batch_gene_enrichment(disease_sets)

for (disease, enriched_genes) in zip(diseases, enrichments):
    print(
        "The top enriched genes for {} are: {}".format(
            disease.name,
            ", ".join([
                f"{gene['item'].name}, ({gene['enrichment']})"
                for gene in enriched_genes[0:5]
            ])
        )
    )

# >>> The top enriched genes for Immunodeficiency 85 and autoimmunity are: TOM1, (7.207370728788139e-45), PIK3CD, (1.9560156243742087e-17), IL2RG, (1.0000718026169596e-16), BACH2, (3.373013104581288e-15), IL6ST, (3.760565282680126e-15)
# >>> The top enriched genes for CODAS syndrome are: LONP1, (4.209128613268585e-80), EXTL3, (5.378742851736401e-23), SMC1A, (5.338807361962185e-22), FLNA, (1.0968887647112733e-21), COL2A1, (1.1029731783630839e-21)
# >>> The top enriched genes for Rhizomelic chondrodysplasia punctata, type 1 are: PEX7, (9.556919089648523e-54), PEX5, (7.030392607093173e-22), PEX1, (3.7973830291601626e-19), PEX11B, (4.318791413029623e-19), HSPG2, (7.108950838424571e-19)
# >>> The top enriched genes for Oculopharyngodistal myopathy 4 are: RILPL1, (1.4351489331895004e-49), LRP12, (2.168165858699749e-30), GIPC1, (3.180801819975307e-27), NOTCH2NLC, (1.0700847991253517e-23), VCP, (2.8742020666947536e-20)
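The examples on this page slice their inputs (for example [0:100]) to keep the run short. One way to work through a complete list in fixed-size slices while preserving the input order is sketched below. The names chunked, run_in_chunks and fake_batch are hypothetical. fake_batch only stands in for one of the helper functions and does not call pyhpo.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_in_chunks(batch_function, items, size):
    """Call `batch_function` once per chunk and concatenate the results."""
    results = []
    for chunk in chunked(items, size):
        results.extend(batch_function(chunk))
    return results


def fake_batch(chunk):
    """Stand-in for a batch helper: one result per input, same order."""
    return [len(item) for item in chunk]


inputs = ["a", "bb", "ccc", "dddd", "eeeee"]
results = run_in_chunks(fake_batch, inputs, size=2)
print(results)
```

Since each batch call returns its results in input order, concatenating the chunk results keeps the overall list aligned with the original inputs.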