Statistical Analysis

EnrichmentModel

class EnrichmentModel(category)

Calculate the hypergeometric enrichment of genes or diseases in a set of HPO terms

Parameters: category (str) – Specify gene, omim or orpha to determine which enrichments to calculate
Raises: KeyError – Invalid category, only gene, omim or orpha are possible

Examples

from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# use the `model.enrichment` method to calculate
# the enrichment of Omim Diseases within an HPOSet
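To make the statistic more concrete, here is a minimal sketch of a one-sided hypergeometric test that uses only the standard library. It is not the library's implementation: the function name `hypergeom_tail` and the example counts are hypothetical, and the sketch only illustrates the general idea of comparing a sample set against a background set.

```python
from math import comb


def hypergeom_tail(observed, population, successes, draws):
    """Return P(X >= observed) for a hypergeometric distribution.

    population: size of the background set
    successes:  background items annotated with the gene or disease
    draws:      size of the sample set
    observed:   sample items annotated with the gene or disease
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 3 of 5 sampled terms carry an annotation
# that only 10 of 1000 background terms carry.
p_value = hypergeom_tail(observed=3, population=1000, successes=10, draws=5)
print(p_value)
```

A small tail probability means the annotation appears in the sample set more often than random draws from the background would suggest.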

enrichment(method, hposet)

Calculate the enrichment for all genes or diseases in the HPOSet

Parameters

method (str) – Currently, only hypergeom is implemented
hposet (pyhpo.HPOSet) – The set of HPOTerms to use as sampleset for calculation of enrichment. The full ontology is used as background set.

Returns

A list of dicts, each describing one enrichment result, with the keys:

enrichment (float) – The hypergeometric enrichment score
fold (float) – The fold enrichment
count (int) – Number of occurrences
item (pyhpo.Gene, pyhpo.Omim or pyhpo.Orpha) – The actual enriched gene or disease

Return type

list[dict]

Raises

NameError – Ontology not yet constructed
NotImplementedError – invalid method provided, only hypergeom is implemented

Examples

from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# you can create a custom HPOSet or use a Gene or Disease
term_set = Gene.get("GBA1").hpo_set()

enriched_diseases = model.enrichment("hypergeom", term_set)

enriched_diseases[0]

# >> {
# >>     "enrichment": 7.708086517543451e-223,
# >>     "fold": 27.44879391414045,
# >>     "count": 164,
# >>     "item": <OmimDisease (608013)>
# >> }
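Because the result is a plain list of dicts, it can be filtered and ranked with ordinary Python. The sketch below assumes that smaller `enrichment` values indicate stronger enrichment, which is consistent with the very small value in the example output above but is not stated explicitly here. The helper `top_enriched`, the threshold and the sample data are hypothetical.

```python
def top_enriched(results, max_score=0.05, limit=10):
    """Keep entries at or below `max_score`, strongest enrichment first."""
    significant = [r for r in results if r["enrichment"] <= max_score]
    return sorted(significant, key=lambda r: r["enrichment"])[:limit]


# Hypothetical stand-in for the output of `model.enrichment(...)`;
# plain strings replace the real Gene, Omim or Orpha objects.
example_results = [
    {"enrichment": 0.2, "fold": 1.1, "count": 3, "item": "disease-a"},
    {"enrichment": 1e-8, "fold": 12.5, "count": 40, "item": "disease-b"},
    {"enrichment": 0.001, "fold": 4.0, "count": 12, "item": "disease-c"},
]

for entry in top_enriched(example_results):
    print(entry["item"], entry["enrichment"], entry["fold"])
```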

Linkage

linkage(sets, method, kind, similarity_method, combine)

Create a linkage matrix from a list of HPOSets to use in dendrograms or other hierarchical cluster analyses

Parameters

sets (list[pyhpo.HPOSet]) – The HPOSets for which the linkage should be calculated
method (str, default: single) –
The algorithm to use for clustering

Available options:
- single : The minimum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Nearest Point Algorithm.
- union : Create a new HPOSet for each cluster based on the union of both combined clusters. This method becomes slow with growing input data.
- complete : The maximum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Farthest Point Algorithm or Voorhees Algorithm.
- average : The mean distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also called the UPGMA algorithm.
kind (str, default: omim) –
Which kind of information content to use for similarity calculation

Available options:
- omim
- orpha
- gene
similarity_method (str, default: graphic) –
The method to use to calculate the similarity between HPOSets.

Available options:
- resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)
- lin - Lin D, Proceedings of the 15th ICML, (1998)
- jc - Jiang J, Conrath D, ROCLING X, (1997) This is different to PyHPO
- jc2 - Jiang J, Conrath D, ROCLING X, (1997) Same as jc, but kept for backwards compatibility
- rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)
- ic - Information coefficient - Li B, et al., arXiv, (2010)
- graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)
- dist - Distance between terms
combine (str, default: funSimAvg) –
The method to combine similarity measures.

Available options:
- funSimAvg - Schlicker A, BMC Bioinformatics, (2006)
- funSimMax - Schlicker A, BMC Bioinformatics, (2006)
- BMA - Deng Y, et al., PLoS One, (2015)
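The three combine options can be illustrated on a small matrix of term-to-term similarities. The sketch below follows the definitions commonly given for these measures in the cited papers; that the library computes them in exactly this way is an assumption, and the function name `combine_similarities` and the sample values are hypothetical.

```python
def combine_similarities(matrix, combine="funSimAvg"):
    """Collapse a term-by-term similarity matrix into one set-level score.

    matrix[i][j] holds the similarity between term i of the first set
    and term j of the second set.
    """
    # Best match for every term of the first set (rows)
    # and for every term of the second set (columns)
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    row_mean = sum(row_best) / len(row_best)
    col_mean = sum(col_best) / len(col_best)

    if combine == "funSimAvg":
        return (row_mean + col_mean) / 2
    if combine == "funSimMax":
        return max(row_mean, col_mean)
    if combine == "BMA":
        return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))
    raise ValueError(f"Unknown combine method: {combine}")


# Hypothetical similarities between a set of 2 terms and a set of 3 terms
similarities = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
]

for name in ("funSimAvg", "funSimMax", "BMA"):
    print(name, combine_similarities(similarities, name))
```

In this sketch all three variants start from the best match of each term and differ only in how those best matches are averaged.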

Raises

NameError – Ontology not yet constructed
KeyError – Invalid kind
RuntimeError – Invalid method or similarity_method or combine

Examples

import pyhpo
from pyhpo import Ontology, HPOSet
Ontology()

# Take 100 diseases and create a tuple of (disease name, HPOSet) for each
diseases = [(d.name, HPOSet(list(d.hpo)).remove_modifier()) for d in list(Ontology.omim_diseases)[0:100]]

# Create one list with all HPOSets
disease_sets = [d[1] for d in diseases]
# And one list with the names of the diseases
names = [d[0] for d in diseases]

# Cluster the diseases using default settings
lnk = pyhpo.stats.linkage(disease_sets)

# For plotting, you can use `scipy`
from scipy.cluster import hierarchy

hierarchy.dendrogram(lnk)
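The `single`, `complete` and `average` options of the `method` parameter differ only in how the distance between two clusters is derived from the pairwise distances of their members. The following sketch shows that difference on a tiny hand-written distance matrix. It is an illustration of the general clustering criteria, not the library's code; the function name `cluster_distance` and the values are hypothetical, and the `union` option is not covered because it needs real HPOSets.

```python
def cluster_distance(cluster_a, cluster_b, distance, method="single"):
    """Distance between two clusters given pairwise item distances.

    cluster_a, cluster_b: lists of item indices
    distance:             square matrix of pairwise item distances
    """
    pairwise = [distance[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":
        return min(pairwise)  # nearest pair of members
    if method == "complete":
        return max(pairwise)  # farthest pair of members
    if method == "average":
        return sum(pairwise) / len(pairwise)  # mean over all pairs (UPGMA)
    raise ValueError(f"Unknown method: {method}")


# Hypothetical distances between three items
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

# Items 0 and 1 were merged first; compare the merged cluster with item 2
for name in ("single", "complete", "average"):
    print(name, cluster_distance([0, 1], [2], distances, name))
```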