Statistical Analysis
EnrichmentModel
- class EnrichmentModel(category)
Calculate the hypergeometric enrichment of genes or diseases in a set of HPO terms
- Parameters
category (str) – Specify gene or omim to determine which enrichments to calculate
- Raises
KeyError – Invalid category, only gene or omim are possible
Examples
from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# use the `model.enrichment` method to calculate
# the enrichment of Omim Diseases within an HPOSet
- enrichment(method, hposet)
Calculate the enrichment for all genes or diseases in the HPOSet
- Parameters
method (str) – Currently, only hypergeom is implemented
hposet (pyhpo.HPOSet) – The set of HPOTerms to use as sample set for calculation of enrichment. The full ontology is used as background set.
- Returns
a list of dicts that contain data about the enrichment, with the keys:
- enrichment (float): The hypergeometric enrichment score
- fold (float): The fold enrichment
- count (int): Number of occurrences
- item (pyhpo.Gene or pyhpo.Omim): The actual enriched gene or disease
- Return type
list[dict]
- Raises
NameError – Ontology not yet constructed
NotImplementedError – Invalid method provided, only hypergeom is implemented
Examples
from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# you can create a custom HPOSet or use a Gene or Disease
term_set = Gene.get("GBA1").hpo_set()

enriched_diseases = model.enrichment("hypergeom", term_set)

enriched_diseases[0]
# >> {
# >>     "enrichment": 7.708086517543451e-223,
# >>     "fold": 27.44879391414045,
# >>     "count": 164,
# >>     "item": <OmimDisease (608013)>
# >> }
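For intuition about the enrichment and fold values, the following is a minimal, standard-library sketch of a hypergeometric over-representation test. It is not this library's implementation: the function names (`hypergeom_sf`, `fold_enrichment`) and all counts are made up for illustration, and it assumes the score is the upper-tail probability of observing at least the given count.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def fold_enrichment(k: int, population: int, successes: int, draws: int) -> float:
    """Observed fraction in the sample divided by the background fraction."""
    return (k / draws) / (successes / population)


# Hypothetical counts: 1000 background terms, 50 of them annotated to a
# disease, and a sample set of 20 terms of which 5 carry that annotation.
p_value = hypergeom_sf(5, 1000, 50, 20)
fold = fold_enrichment(5, 1000, 50, 20)
```

A small `p_value` together with a `fold` above 1 indicates that the annotation occurs more often in the sample set than the background frequency would suggest.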
Linkage
- linkage(sets, method, kind, similarity_method, combine)
Create a linkage matrix from a list of HPOSets to use in dendrograms or other hierarchical cluster analyses
- Parameters
sets (list[pyhpo.HPOSet]) – The HPOSets for which the linkage should be calculated
method (str, default: single) – The algorithm to use for clustering
Available options:
single : The minimum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Nearest Point Algorithm.
union : Create a new HPOSet for each cluster based on the union of both combined clusters. This method becomes slow with growing input data.
complete : The maximum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Farthest Point Algorithm or Voor Hees Algorithm.
average : The mean distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also called the UPGMA algorithm.
kind (str, default: omim) –
Which kind of information content to use for similarity calculation
Available options:
omim
gene
similarity_method (str, default: graphic) –
The method to use to calculate the similarity between HPOSets.
Available options:
resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
lin - Lin D, Proceedings of the 15th ICML, (1998)
jc - Jiang J, Conrath D, ROCLING X, (1997). This is different to PyHPO
jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility
rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
ic - Information coefficient - Li B, et al., arXiv, (2010)
graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)
dist - Distance between terms
combine (str, default: funSimAvg) – The method to combine similarity measures.
Available options:
funSimAvg - Schlicker A, BMC Bioinformatics, (2006)
funSimMax - Schlicker A, BMC Bioinformatics, (2006)
BMA - Deng Y, et al., PLoS One, (2015)
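To illustrate how the combine options differ, the sketch below applies the definitions of funSimAvg, funSimMax and BMA as they are commonly stated to a small matrix of pairwise term similarities (rows are the terms of the first set, columns the terms of the second). The formulas are an assumption based on the cited papers, not code taken from this library, and the matrix values and function names are invented.

```python
from statistics import mean


def row_and_col_maxima(matrix):
    """Best match for every row term and for every column term."""
    row_max = [max(row) for row in matrix]
    col_max = [max(col) for col in zip(*matrix)]
    return row_max, col_max


def fun_sim_avg(matrix):
    """Average of the mean row maximum and the mean column maximum."""
    row_max, col_max = row_and_col_maxima(matrix)
    return (mean(row_max) + mean(col_max)) / 2


def fun_sim_max(matrix):
    """Larger of the mean row maximum and the mean column maximum."""
    row_max, col_max = row_and_col_maxima(matrix)
    return max(mean(row_max), mean(col_max))


def bma(matrix):
    """Best-match average: mean over all row and column maxima together."""
    row_max, col_max = row_and_col_maxima(matrix)
    return (sum(row_max) + sum(col_max)) / (len(row_max) + len(col_max))


# Invented similarities between a 2-term set (rows) and a 3-term set (columns)
similarities = [
    [0.8, 0.2, 0.4],
    [0.4, 0.6, 0.2],
]

avg_score = fun_sim_avg(similarities)
max_score = fun_sim_max(similarities)
bma_score = bma(similarities)
```

With sets of different sizes the three scores diverge, because BMA weights every term equally while the funSim variants weight each set equally.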
- Raises
NameError – Ontology not yet constructed
KeyError – Invalid kind
RuntimeError – Invalid method or similarity_method or combine
Examples
import pyhpo
from pyhpo import Ontology, HPOSet

Ontology()

# Using 100 diseases and creating a tuple of (disease name, HPOSet) for each
diseases = [
    (d.name, HPOSet(list(d.hpo)).remove_modifier())
    for d in list(Ontology.omim_diseases)[0:100]
]

# Creating one list with all HPOSets
disease_sets = [d[1] for d in diseases]

# And one list with the names of diseases
names = [d[0] for d in diseases]

# Cluster the diseases using default settings
lnk = pyhpo.stats.linkage(disease_sets)

# For plotting, you can use `scipy`
import scipy.cluster.hierarchy
scipy.cluster.hierarchy.dendrogram(lnk)
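The single, complete and average options differ only in how the distance between two clusters is derived from the pairwise distances of their members. A minimal sketch of that difference follows, using a made-up distance table and a hypothetical helper `cluster_distance`. It is not the library's internal code and does not cover the union option.

```python
from itertools import product
from statistics import mean

# Invented pairwise distances between four sets, labelled a to d
DIST = {
    frozenset("ab"): 0.2,
    frozenset("ac"): 0.9,
    frozenset("ad"): 0.7,
    frozenset("bc"): 0.5,
    frozenset("bd"): 0.6,
    frozenset("cd"): 0.1,
}


def cluster_distance(left, right, method="single"):
    """Distance between two clusters under the chosen linkage criterion."""
    pairwise = [DIST[frozenset((x, y))] for x, y in product(left, right)]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return mean(pairwise)
    raise ValueError(f"unknown method: {method}")


# Distance between cluster {a, b} and cluster {c, d}
single = cluster_distance("ab", "cd", "single")      # nearest pair (b, c)
complete = cluster_distance("ab", "cd", "complete")  # farthest pair (a, c)
average = cluster_distance("ab", "cd", "average")    # mean of all four pairs
```

The same two clusters therefore end up at different heights in the dendrogram depending on the chosen method.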