Statistical Analysis

EnrichmentModel

class EnrichmentModel(category)

Calculate the hypergeometric enrichment of genes or diseases in a set of HPO terms

Parameters

category (str) – Specify gene or omim to determine which enrichments to calculate

Raises

KeyError – Invalid category, only gene or omim are possible

Examples

from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# use the `model.enrichment` method to calculate
# the enrichment of Omim Diseases within an HPOSet
enrichment(method, hposet)

Calculate the enrichment for all genes or diseases in the HPOSet

Parameters
  • method (str) – Currently, only hypergeom is implemented

  • hposet (pyhpo.HPOSet) – The set of HPOTerms to use as sampleset for calculation of enrichment. The full ontology is used as background set.

Returns

A list of dicts, one per enriched item, with the keys:

  • enrichment (float)

    The hypergeometric enrichment score

  • fold (float)

    The fold enrichment

  • count (int)

    Number of occurrences

  • item (pyhpo.Gene or pyhpo.Omim)

    The actual enriched gene or disease

Return type

list[dict]

Raises
  • NameError – Ontology not yet constructed

  • NotImplementedError – invalid method provided, only hypergeom is implemented

Examples

from pyhpo import Ontology, Gene, Omim
from pyhpo import stats

Ontology()
model = stats.EnrichmentModel("omim")

# you can create a custom HPOSet or use a Gene or Disease
term_set = Gene.get("GBA1").hpo_set()

enriched_diseases = model.enrichment("hypergeom", term_set)

enriched_diseases[0]

# >> {
# >>     "enrichment": 7.708086517543451e-223,
# >>     "fold": 27.44879391414045,
# >>     "count": 164,
# >>     "item": <OmimDisease (608013)>
# >> }
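
The `enrichment` and `fold` values can be illustrated with a generic hypergeometric test. The following is a minimal sketch using only the standard library; the function names and the counts are hypothetical, and it is not necessarily how `EnrichmentModel` computes its scores internally.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Probability of at least k hits when drawing n items without
    replacement from a population of N items that contains K hits."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed hit rate in the sample divided by the background hit rate."""
    return (k / n) / (K / N)


# Hypothetical counts, chosen only for illustration
N = 1000  # annotations in the background set
K = 50    # background annotations that belong to the disease of interest
n = 20    # annotations in the sample set
k = 5     # sample annotations that belong to the disease of interest

p_value = hypergeom_sf(k, N, K, n)
fold = fold_enrichment(k, N, K, n)
print(p_value, fold)
```

A small probability together with a fold value above 1 indicates that the item occurs more often in the sample set than the background would suggest.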

Linkage

linkage(sets, method, kind, similarity_method, combine)

Create a linkage matrix from a list of HPOSets to use in dendrograms or other hierarchical cluster analyses

Parameters
  • sets (list[pyhpo.HPOSet]) – The HPOSets for which the linkage should be calculated

  • method (str, default: single) –

    The algorithm to use for clustering

    Available options:

    • single : The minimum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Nearest Point Algorithm.

    • union : Create a new HPOSet for each cluster based on the union of both combined clusters. This method becomes slow with growing input data.

    • complete : The maximum distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also known as the Farthest Point Algorithm or Voor Hees Algorithm.

    • average : The mean distance of each cluster’s nodes to the other nodes is used as distance for newly formed clusters. This is also called the UPGMA algorithm.

  • kind (str, default: omim) –

    Which kind of information content to use for similarity calculation

    Available options:

    • omim

    • gene

  • similarity_method (str, default: graphic) –

    The method to use to calculate the similarity between HPOSets.

    Available options:

    • resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)

    • lin - Lin D, Proceedings of the 15th ICML, (1998)

    • jc - Jiang J, Conrath D, ROCLING X, (1997). This is different to PyHPO.

    • jc2 - Jiang J, Conrath D, ROCLING X, (1997). Same as jc, but kept for backwards compatibility.

    • rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)

    • ic - Information coefficient - Li B, et al., arXiv, (2010)

    • graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)

    • dist - Distance between terms

  • combine (str, default: funSimAvg) –

    The method to combine similarity measures.

    Available options:

    • funSimAvg - Schlicker A, BMC Bioinformatics, (2006)

    • funSimMax - Schlicker A, BMC Bioinformatics, (2006)

    • BMA - Deng Y, et al., PLoS One, (2015)

Raises
  • NameError – Ontology not yet constructed

  • KeyError – Invalid kind

  • RuntimeError – Invalid method or similarity_method or combine
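
To make the `combine` options more concrete, here is a minimal sketch of how a matrix of term-to-term similarities is commonly reduced to a single set-to-set score. It follows the usual definitions of funSimAvg, funSimMax and BMA (best-match average); the function names and the example matrix are hypothetical, and the library's own implementation may differ in detail.

```python
from statistics import mean


def best_matches(matrix):
    """Best score per row (terms of set A) and per column (terms of set B)."""
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return row_best, col_best


def fun_sim_avg(matrix):
    """Average of the mean row-wise and mean column-wise best matches."""
    row_best, col_best = best_matches(matrix)
    return (mean(row_best) + mean(col_best)) / 2


def fun_sim_max(matrix):
    """Larger of the mean row-wise and mean column-wise best matches."""
    row_best, col_best = best_matches(matrix)
    return max(mean(row_best), mean(col_best))


def bma(matrix):
    """Mean over all best matches from both directions together."""
    row_best, col_best = best_matches(matrix)
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


# Hypothetical similarities: 2 terms in set A (rows), 3 terms in set B (columns)
scores = [
    [1.0, 0.2, 0.0],
    [0.4, 0.5, 0.0],
]

print(fun_sim_avg(scores), fun_sim_max(scores), bma(scores))
```

When both sets have the same number of terms, funSimAvg and BMA give the same value; they diverge only for sets of different size, as in the example matrix.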

Examples

import pyhpo
from pyhpo import Ontology, HPOSet

Ontology()

# Using 100 diseases and creating a Tuple of (Disease Name, HPOSet) for each
diseases = [(d.name, HPOSet(list(d.hpo)).remove_modifier()) for d in list(Ontology.omim_diseases)[0:100]]

# Creating one list with all HPOSets
disease_sets = [d[1] for d in diseases[0:100]]
# And one list with the names of diseases
names = [d[0] for d in diseases[0:100]]

# Cluster the diseases using default settings
lnk = pyhpo.stats.linkage(disease_sets)

# For plotting, you can use `scipy`
import scipy.cluster.hierarchy

scipy.cluster.hierarchy.dendrogram(lnk)
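
The `single`, `complete` and `average` options differ only in how the distance between two clusters is derived from the pairwise distances of their members. The sketch below illustrates this with a hypothetical distance matrix and a hypothetical helper, `cluster_distance`; it is not the library's implementation. The `union` option works differently (it builds a new HPOSet per cluster) and is not covered here.

```python
def cluster_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters, given a pairwise distance matrix
    and the member indices of each cluster."""
    pairwise = [dist[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":
        # Nearest Point Algorithm: closest pair of members
        return min(pairwise)
    if method == "complete":
        # Farthest Point Algorithm: most distant pair of members
        return max(pairwise)
    if method == "average":
        # UPGMA: mean over all pairs of members
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported method: {method}")


# Hypothetical symmetric distances between four HPOSets
dist = [
    [0.0, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]

# Sets 0 and 1 already form one cluster, sets 2 and 3 another
for method in ("single", "complete", "average"):
    print(method, cluster_distance(dist, [0, 1], [2, 3], method))
```

Single linkage tends to produce long, chained clusters, while complete linkage favours compact ones; average linkage sits between the two.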