Statistical Analysis
EnrichmentModel
- class EnrichmentModel(category)
Calculate the hypergeometric enrichment of genes or diseases in a set of HPO terms
- Parameters
category (str) – Specify gene, omim or orpha to determine which enrichments to calculate
- Raises
KeyError – Invalid category, only gene, omim or orpha are possible
Examples
    from pyhpo import Ontology, Gene, Omim
    from pyhpo import stats

    Ontology()
    model = stats.EnrichmentModel("omim")

    # use the `model.enrichment` method to calculate
    # the enrichment of Omim Diseases within an HPOSet
- enrichment(method, hposet)
Calculate the enrichment for all genes or diseases in the HPOSet
- Parameters
method (str) – Currently, only hypergeom is implemented
hposet (pyhpo.HPOSet) – The set of HPOTerms to use as sample set for calculation of enrichment. The full ontology is used as background set.
- Returns
a list of dicts that contain data about the enrichment, with the keys:
- enrichment (float)
The hypergeometric enrichment score
- fold (float)
The fold enrichment
- count (int)
Number of occurrences
- item (pyhpo.Gene, pyhpo.Omim or pyhpo.Orpha)
The actual enriched gene or disease
- Return type
list[dict]
- Raises
NameError – Ontology not yet constructed
NotImplementedError – Invalid method provided, only hypergeom is implemented
Examples
    from pyhpo import Ontology, Gene, Omim
    from pyhpo import stats

    Ontology()
    model = stats.EnrichmentModel("omim")

    # you can create a custom HPOSet or use a Gene or Disease
    term_set = Gene.get("GBA1").hpo_set()

    enriched_diseases = model.enrichment("hypergeom", term_set)

    enriched_diseases[0]
    # >> {
    # >>     "enrichment": 7.708086517543451e-223,
    # >>     "fold": 27.44879391414045,
    # >>     "count": 164,
    # >>     "item": <OmimDisease (608013)>
    # >> }
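For intuition about the enrichment and fold values, a hypergeometric test asks how likely it is to draw at least the observed number of annotated terms by chance. The following is a minimal standard-library sketch of that upper-tail probability and of a simple fold ratio. The function names, argument names and toy numbers are illustrative assumptions and are not part of the library; the library may define its background set, sample size and fold value differently.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) when drawing `draws` items without replacement
    from `population` items, of which `successes` are annotated."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass for every outcome from k up to the maximum.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def fold_enrichment(k: int, population: int, successes: int, draws: int) -> float:
    """Observed fraction in the sample divided by the background fraction
    (one common definition, assumed here for illustration)."""
    return (k / draws) / (successes / population)


# Toy numbers (hypothetical): 20 terms in the background, 5 of them
# annotated to a disease, 4 terms in the sample, 2 of which are annotated.
p_value = hypergeom_upper_tail(2, population=20, successes=5, draws=4)
fold = fold_enrichment(2, population=20, successes=5, draws=4)
print(p_value, fold)
```

A smaller probability means the overlap is less likely to arise by chance, which matches the very small enrichment value shown in the example output above.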
Linkage
- linkage(sets, method, kind, similarity_method, combine)
Create a linkage matrix from a list of HPOSets to use in dendrograms or other hierarchical cluster analyses
- Parameters
sets (list[pyhpo.HPOSet]) – The HPOSets for which the linkage should be calculated
method (str, default: single) – The algorithm to use for clustering
Available options:
single : The minimum distance of each cluster's nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Nearest Point Algorithm.
union : Create a new HPOSet for each cluster based on the union of both combined clusters. This method becomes slow with growing input data.
complete : The maximum distance of each cluster's nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Farthest Point Algorithm or Voor Hees Algorithm.
average : The mean distance of each cluster's nodes to the other nodes is used as distance for newly formed clusters. This is also called the UPGMA algorithm.
kind (str, default: omim) –
Which kind of information content to use for similarity calculation
Available options:
omim
orpha
gene
similarity_method (str, default: graphic) –
The method to use to calculate the similarity between HPOSets.
Available options:
resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
lin - Lin D, Proceedings of the 15th ICML, (1998)
jc - Jiang J, Conrath D, ROCLING X, (1997). This is different to PyHPO.
jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility.
rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
ic - Information coefficient - Li B, et al., arXiv, (2010)
graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)
dist - Distance between terms
combine (str, default: funSimAvg) – The method to combine similarity measures.
Available options:
funSimAvg - Schlicker A, BMC Bioinformatics, (2006)
funSimMax - Schlicker A, BMC Bioinformatics, (2006)
BMA - Deng Y, et al., PLoS One, (2015)
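The three combine options differ in how they reduce a term-by-term similarity matrix to a single set-to-set score. The sketch below follows the commonly cited definitions (row and column best matches, then an average or maximum). The function `combine_scores` and the toy matrix are hypothetical and only illustrate the idea; the library's internal implementation may differ in detail.

```python
from statistics import mean


def combine_scores(matrix, method="funSimAvg"):
    """Reduce a similarity matrix (rows: terms of set A, columns: terms of
    set B) to one score. Hypothetical helper, not part of the library."""
    row_max = [max(row) for row in matrix]        # best match for each term in A
    col_max = [max(col) for col in zip(*matrix)]  # best match for each term in B
    if method == "funSimAvg":
        return (mean(row_max) + mean(col_max)) / 2
    if method == "funSimMax":
        return max(mean(row_max), mean(col_max))
    if method == "BMA":
        # Best-match average: pool all best matches before averaging.
        return (sum(row_max) + sum(col_max)) / (len(row_max) + len(col_max))
    raise ValueError(f"unknown combine method: {method}")


# Toy similarity matrix: set A has 2 terms, set B has 3 terms.
similarities = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
]
for name in ("funSimAvg", "funSimMax", "BMA"):
    print(name, combine_scores(similarities, name))
```

With sets of unequal size, BMA weights every term equally, while funSimAvg weights both sets equally regardless of their size.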
- Raises
NameError – Ontology not yet constructed
KeyError – Invalid kind
RuntimeError – Invalid method or similarity_method or combine
Examples
    import pyhpo
    from pyhpo import Ontology, HPOSet

    Ontology()

    # Using 100 diseases and creating a Tuple of (Disease Name, HPOSet) for each
    diseases = [
        (d.name, HPOSet(list(d.hpo)).remove_modifier())
        for d in list(Ontology.omim_diseases)[0:100]
    ]

    # Creating one list with all HPOSets
    disease_sets = [d[1] for d in diseases[0:100]]
    # And one list with the names of diseases
    names = [d[0] for d in diseases[0:100]]

    # Cluster the diseases using default settings
    lnk = pyhpo.stats.linkage(disease_sets)

    # For plotting, you can use `scipy`
    # (import the submodule explicitly so it is available on all scipy versions)
    from scipy.cluster import hierarchy
    hierarchy.dendrogram(lnk)
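The single, complete and average options of the method parameter differ only in how the distance between two clusters is derived from the pairwise distances of their members. The sketch below illustrates that difference on a small, made-up distance matrix. The helper `cluster_distance` is hypothetical, and it assumes pairwise distances are already available (for example as 1 - similarity, which is an assumption here and not confirmed by the documentation). The union option is not covered, since it builds a new HPOSet per cluster instead.

```python
from itertools import product
from statistics import mean


def cluster_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters of item indices, given a symmetric
    pairwise distance matrix. Hypothetical helper for illustration."""
    pairwise = [dist[i][j] for i, j in product(cluster_a, cluster_b)]
    if method == "single":    # nearest pair of members
        return min(pairwise)
    if method == "complete":  # farthest pair of members
        return max(pairwise)
    if method == "average":   # mean over all member pairs (UPGMA)
        return mean(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


# Made-up distances between four items.
dist = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.5, 0.7],
    [0.6, 0.5, 0.0, 0.3],
    [0.9, 0.7, 0.3, 0.0],
]
left, right = [0, 1], [2, 3]
for name in ("single", "complete", "average"):
    print(name, cluster_distance(dist, left, right, name))
```

Single linkage tends to produce long, chained clusters, while complete linkage favours compact ones; average linkage sits between the two.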