HPOSet

An HPOSet is a collection of HPOTerms that can be used to document the clinical information of a patient. The phenotypes associated with genes and diseases are also represented as HPOSets. An HPOSet can be instantiated in multiple ways, depending on the available data types. Whichever way you choose, you must instantiate the Ontology beforehand.

Instantiation

from_queries(queries)

Instantiate an HPOSet from various inputs

This is the most common way to instantiate an HPOSet because it accepts several kinds of input. Callers must ensure that each query parameter matches a single HPOTerm.

Parameters

queries (list[str or int]) –

  • str HPO term (e.g.: Scoliosis)

  • str HPO-ID (e.g.: HP:0002650)

  • int HPO term id (e.g.: 2650)

Returns

A new HPOSet

Return type

pyhpo.HPOSet

Raises
  • NameError – Ontology not yet constructed

  • ValueError – query cannot be converted to HpoTermId

  • RuntimeError – No HPO term is found for the provided query

Examples

from pyhpo import Ontology, HPOSet
Ontology()
my_set = HPOSet.from_queries([
    "HP:0002650",
    118,
    "Thoracolumbar scoliosis"
])
len(my_set)
# >> 3
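
The three accepted query forms can be told apart with a few lines of plain Python. The following is a minimal sketch of that idea only; `normalize_query` is a hypothetical helper invented for illustration, not part of pyhpo, and the library's own parsing may differ.

```python
from typing import Union


def normalize_query(query: Union[str, int]) -> Union[int, str]:
    """Return an integer term id for id-like queries, else the term name.

    Hypothetical helper for illustration; not part of pyhpo.
    """
    if isinstance(query, int):
        return query  # already an integer id, e.g. 2650
    if query.startswith("HP:"):
        # "HP:0002650" -> 2650; int() raises ValueError on a malformed id
        return int(query[3:])
    return query  # anything else is treated as a term name


queries = ["HP:0002650", 118, "Thoracolumbar scoliosis"]
print([normalize_query(q) for q in queries])
# [2650, 118, 'Thoracolumbar scoliosis']
```

A malformed id such as "HP:abc" raises ValueError in this sketch, which mirrors the ValueError listed above for queries that cannot be converted to an id.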
from_serialized(pickle)

Instantiate an HPOSet from a serialized HPOSet

This method is used when you have a serialized form of the HPOSet to share between applications. See pyhpo.HPOSet.serialize()

Parameters

pickle (str) – A pickled string of all HPOTerms, e.g. 118+2650

Returns

A new HPOSet

Return type

pyhpo.HPOSet

Raises
  • NameError – Ontology not yet constructed

  • ValueError – pickled item cannot be converted to HpoTermId

  • KeyError – No HPO term is found for the provided query

Examples

from pyhpo import Ontology, HPOSet
Ontology()
my_set = HPOSet.from_serialized("7+118+152+234+271+315+478+479+492+496")
len(my_set)
# >> 10
from_gene(gene)

Instantiate an HPOSet from a Gene

Parameters

gene (pyhpo.Gene) – A gene from the ontology

Returns

A new HPOSet

Return type

pyhpo.HPOSet

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()
gene_set = HPOSet.from_gene(list(Ontology.genes)[0])
len(gene_set)
# >> 118
from_disease(disease)

Instantiate an HPOSet from an Omim disease

Parameters

disease (pyhpo.Omim) – An Omim disease from the ontology

Returns

A new HPOSet

Return type

pyhpo.HPOSet

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()
disease_set = HPOSet.from_disease(list(Ontology.omim_diseases)[0])
len(disease_set)
# >> 18

Instance methods

class HPOSet(terms)
add(term)

Add an HPOTerm to the HPOSet

Parameters

term (HPOTerm or int) – The term to add, either as actual HPOTerm or the integer representation

Raises
  • NameError – Ontology not yet constructed

  • KeyError – (only when an int is used as input): HPOTerm does not exist

Examples

from pyhpo import Ontology, HPOSet
Ontology()
my_set = HPOSet([])
my_set.add(Ontology[118])
len(my_set) # >> 1
my_set.add(2650)
len(my_set) # >> 2
all_genes()

Returns a set of associated genes

Returns

The union of genes associated with terms in the HPOSet

Return type

set[pyhpo.Gene]

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()
disease = list(Ontology.omim_diseases)[0]
disease_set = HPOSet.from_disease(disease)
for gene in disease_set.all_genes():
    print(gene.name)
child_nodes()

Returns a new HPOSet that does not contain ancestor terms

If a set contains HPOTerms that are ancestors of other terms in the set, they will be removed. This method is useful to create a set that contains only the most specific terms.

Returns

A new HPOSet that contains only the most specific terms

Return type

pyhpo.HPOSet

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()

my_set = HPOSet.from_queries([
    'HP:0002650',
    'HP:0010674',
    'HP:0000925',
    'HP:0009121'
])

child_set = my_set.child_nodes()

len(my_set) # >> 4
len(child_set) # >> 1
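
The filtering rule can be illustrated without the ontology. The sketch below uses a hypothetical four-term hierarchy (`PARENTS`) and plain integers instead of HPOTerms; it shows the idea of dropping every term that is an ancestor of another term in the set, not the library's implementation.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy for illustration: term -> direct parents
PARENTS: Dict[int, Set[int]] = {
    1: set(),  # root
    2: {1},
    3: {2},
    4: {1},
}


def ancestors(term: int) -> Set[int]:
    """Collect all direct and indirect parents of a term."""
    found: Set[int] = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS[parent])
    return found


def child_nodes(terms: Set[int]) -> Set[int]:
    """Keep only terms that are not an ancestor of another term in the set."""
    covered: Set[int] = set()
    for term in terms:
        covered |= ancestors(term)
    return terms - covered


print(child_nodes({1, 2, 3}))  # {3}
```

Terms on separate branches are both kept: in this toy hierarchy, child_nodes({2, 4}) returns both terms because neither is an ancestor of the other.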
information_content(kind='omim')

Returns basic information content stats about the HPOTerms within the set

Parameters

kind (str, default: omim) – Which kind of information content should be calculated. Options are [‘omim’, ‘gene’]

Returns

Dict with the following items

  • mean - float - Mean information content

  • max - float - Maximum information content value

  • total - float - Sum of all information content values

  • all - list of float - List with all information content values

Return type

dict

Raises

NameError – Ontology not yet constructed

Examples
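
The shape of the returned dict can be sketched from a plain list of information content values. `ic_stats` is a hypothetical helper written for illustration and the input values are made up; the real method derives the values from the terms in the set.

```python
from typing import Dict, List, Union


def ic_stats(values: List[float]) -> Dict[str, Union[float, List[float]]]:
    """Build a dict with the same keys as the documented return value."""
    total = float(sum(values))
    return {
        "mean": total / len(values) if values else 0.0,
        "max": max(values, default=0.0),
        "total": total,
        "all": list(values),
    }


stats = ic_stats([1.5, 2.5, 4.0])  # made-up information content values
print(stats["max"], stats["total"])
# 4.0 8.0
```

Returning 0.0 for an empty input is an assumption of this sketch; the source does not state how the library handles an empty set.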

omim_diseases()

Returns a set of associated diseases

Returns

The union of Omim diseases associated with terms in the HPOSet

Return type

set[pyhpo.Omim]

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()
gene = list(Ontology.genes)[0]
gene_set = HPOSet.from_gene(gene)
for disease in gene_set.omim_diseases():
    print(disease.name)
remove_modifier()

Returns a new HPOSet that does not contain any modifier terms

This method removes all terms that are not children of HP:0000118 | Phenotypic abnormality

Returns

A new HPOSet that contains only phenotype terms

Return type

pyhpo.HPOSet

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()

my_set = HPOSet.from_queries([
    'HP:0002650',
    'HP:0010674',
    'HP:0000925',
    'HP:0009121',
    'HP:0012823',
])

pheno_set = my_set.remove_modifier()

len(my_set) # >> 5
len(pheno_set) # >> 4
replace_obsolete()

Returns a new HPOSet that replaces all obsolete terms with their replacement

If an obsolete term has a replacement term defined it will be replaced, otherwise it will be removed.

Returns

A new HPOSet in which every obsolete term is either replaced or removed

Return type

pyhpo.HPOSet

Raises

NameError – Ontology not yet constructed

Examples

from pyhpo import Ontology, HPOSet
Ontology()

my_set = HPOSet.from_queries([
    'HP:0002650',
    'HP:0010674',
    'HP:0000925',
    'HP:0009121',
    'HP:0410003',
])

active_set = my_set.replace_obsolete()

len(my_set) # >> 5
len(active_set) # >> 5

Ontology[410003] in my_set
# >> True

Ontology[410003] in active_set
# >> False
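
The replace-or-remove rule can be sketched with a small lookup table. The ids and the `REPLACED_BY` mapping below are hypothetical and exist only for illustration; the library reads replacement information from the ontology itself.

```python
from typing import Dict, Optional, Set

# Hypothetical data: obsolete id -> replacement id, or None if none is defined
REPLACED_BY: Dict[int, Optional[int]] = {
    10: 11,    # obsolete, replaced by term 11
    20: None,  # obsolete, no replacement defined
}


def replace_obsolete(terms: Set[int]) -> Set[int]:
    """Swap obsolete terms for their replacement, drop those without one."""
    result: Set[int] = set()
    for term in terms:
        if term not in REPLACED_BY:
            result.add(term)  # not obsolete, keep as-is
        elif REPLACED_BY[term] is not None:
            result.add(REPLACED_BY[term])  # obsolete with a replacement
        # obsolete without a replacement: dropped
    return result


print(replace_obsolete({1, 10, 20}))  # {1, 11}
```

When every obsolete term has a replacement that is not already in the set, the size of the set does not change, which matches the lengths shown in the example above.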
serialize()

Returns a serialized string representing the HPOSet

Returns

A serialized string uniquely representing the HPOSet, e.g.: 3+118+2650

Return type

str

Examples

from pyhpo import Ontology
Ontology()
gene_sets = [g.hpo_set() for g in Ontology.genes]
gene_sets[0].serialize()
# >> 7+118+152+234+271+315+478+479+492+496.....
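
The serialized form shown above is a list of integer term ids joined by "+". The round trip can be sketched as follows, assuming the ids are unique and written in ascending order (an assumption based on the examples, not confirmed by the source). Both function names are hypothetical helpers, not pyhpo API.

```python
from typing import Iterable, List


def serialize_ids(term_ids: Iterable[int]) -> str:
    """Join unique ids in ascending order with '+' (assumed format)."""
    return "+".join(str(i) for i in sorted(set(term_ids)))


def deserialize_ids(serialized: str) -> List[int]:
    """Split on '+'; int() raises ValueError for a non-numeric item."""
    return [int(item) for item in serialized.split("+")]


s = serialize_ids([2650, 118, 3, 118])
print(s)                   # 3+118+2650
print(deserialize_ids(s))  # [3, 118, 2650]
```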
similarity(other, kind, method, combine)

Calculate similarity between this and another HPOSet

This method runs in parallel on all available CPUs

Parameters
  • other (pyhpo.HPOSet) – The HPOSet to calculate the similarity to

  • kind (str, default: omim) –

    Which kind of information content to use for similarity calculation

    Available options:

    • omim

    • gene

  • method (str, default: graphic) –

    The method to use to calculate the similarity.

    Available options:

    • resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)

    • lin - Lin D, Proceedings of the 15th ICML, (1998)

    • jc - Jiang J, Conrath D, ROCLING X, (1997) This is different to PyHPO

    • jc2 - Jiang J, Conrath D, ROCLING X, (1997) Same as jc, but kept for backwards compatibility

    • rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)

    • ic - Information coefficient - Li B, et al., arXiv, (2010)

    • graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)

    • dist - Distance between terms

  • combine (str, default: funSimAvg) –

    The method to combine individual term similarity to HPOSet similarities.

    Available options:

    • funSimAvg

    • funSimMax

    • BMA

Returns

Similarity score between the two sets

Return type

float

Raises
  • NameError – Ontology not yet constructed

  • AttributeError – Invalid kind

  • RuntimeError – Invalid method or combine

Examples

from pyhpo import Ontology
Ontology()
gene_sets = [g.hpo_set() for g in Ontology.genes]
gene_sets[0].similarity(gene_sets[1])
# >> 0.29546087980270386
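
The combine options reduce a matrix of term-to-term similarities to a single score. The sketch below uses the commonly cited definitions of funSimAvg, funSimMax and best-match average (BMA); these definitions are an assumption, and the library's implementation may differ in detail. The matrix values are made up.

```python
from typing import List

# Rows: terms of set A, columns: terms of set B (made-up similarity values)
Matrix = List[List[float]]


def _row_maxima(m: Matrix) -> List[float]:
    return [max(row) for row in m]


def _col_maxima(m: Matrix) -> List[float]:
    return [max(col) for col in zip(*m)]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def fun_sim_avg(m: Matrix) -> float:
    """Average of the mean row maximum and the mean column maximum."""
    return (_mean(_row_maxima(m)) + _mean(_col_maxima(m))) / 2


def fun_sim_max(m: Matrix) -> float:
    """Larger of the mean row maximum and the mean column maximum."""
    return max(_mean(_row_maxima(m)), _mean(_col_maxima(m)))


def bma(m: Matrix) -> float:
    """Best-match average: all best matches pooled, then averaged."""
    rows, cols = _row_maxima(m), _col_maxima(m)
    return (sum(rows) + sum(cols)) / (len(rows) + len(cols))


scores = [[1.0, 0.2], [0.4, 0.6], [0.0, 0.5]]
print(fun_sim_avg(scores), fun_sim_max(scores), bma(scores))
```

For this matrix the row maxima average to 0.7 and the column maxima to 0.8, so the three functions return roughly 0.75, 0.8 and 0.74.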
similarity_scores(other, kind, method, combine)

Calculate similarity between this HPOSet and a list of other HPOSets

This method runs in parallel on all available CPUs

Parameters
  • other (list[pyhpo.HPOSet]) – Calculate similarity between self and every provided HPOSet

  • kind (str, default: omim) –

    Which kind of information content to use for similarity calculation

    Available options:

    • omim

    • gene

  • method (str, default: graphic) –

    The method to use to calculate the similarity.

    Available options:

    • resnik - Resnik P, Proceedings of the 14th IJCAI, (1995)

    • lin - Lin D, Proceedings of the 15th ICML, (1998)

    • jc - Jiang J, Conrath D, ROCLING X, (1997) This is different to PyHPO

    • jc2 - Jiang J, Conrath D, ROCLING X, (1997) Same as jc, but kept for backwards compatibility

    • rel - Relevance measure - Schlicker A, et al., BMC Bioinformatics, (2006)

    • ic - Information coefficient - Li B, et al., arXiv, (2010)

    • graphic - Graph based Information coefficient - Deng Y, et al., PLoS One, (2015)

    • dist - Distance between terms

  • combine (str, default: funSimAvg) –

    The method to combine individual term similarity to HPOSet similarities.

    Available options:

    • funSimAvg

    • funSimMax

    • BMA

Returns

Similarity scores for every comparison

Return type

list[float]

Raises
  • NameError – Ontology not yet constructed

  • KeyError – Invalid kind

  • RuntimeError – Invalid method or combine

Examples

from pyhpo import Ontology
Ontology()
gene_sets = [g.hpo_set() for g in Ontology.genes]
similarities = gene_sets[0].similarity_scores(gene_sets)
similarities[0:4]
# >> [1.0, 0.5000048279762268, 0.29546087980270386, 0.5000059008598328]
terms()

Returns the HPOTerms in the set

Returns

A list of every term in the set

Return type

list[pyhpo.HPOTerm]

Important

The return type of this method will very likely change into an Iterator of HPOTerm. (Info about likely API changes)

Raises
  • NameError – Ontology not yet constructed

  • KeyError – No HPO term is found for the provided query

Examples

from pyhpo import Ontology
Ontology()
my_set = list(Ontology.genes)[0].hpo_set()
for term in my_set.terms():
    print(term.name)
toJSON(verbose)

Returns a dict/JSON representation of the HPOSet

Parameters

verbose (bool) – Indicates if each HPOTerm should contain verbose information (see pyhpo.HPOTerm.toJSON())

Returns

List of dict representations, one per HPOTerm in the set, that can be used for JSON serialization

Return type

list[dict]

Raises
  • NameError – Ontology not yet constructed

  • KeyError – No HPO term is found for the provided query

Examples

from pyhpo import Ontology, HPOSet
Ontology()
my_set = HPOSet.from_serialized("7+118+152+234+271+315+478+479+492+496")
my_set.toJSON()
# >> [
# >>     {'name': 'Autosomal recessive inheritance', 'id': 'HP:0000007', 'int': 7},
# >>     {'name': 'Phenotypic abnormality', 'id': 'HP:0000118', 'int': 118},
# >>     {'name': 'Abnormality of head or neck', 'id': 'HP:0000152', 'int': 152},
# >>     {'name': 'Abnormality of the head', 'id': 'HP:0000234', 'int': 234},
# >>     {'name': 'Abnormality of the face', 'id': 'HP:0000271', 'int': 271},
# >>     {'name': 'Abnormality of the orbital region', 'id': 'HP:0000315', 'int': 315},
# >>     {'name': 'Abnormality of the eye', 'id': 'HP:0000478', 'int': 478},
# >>     {'name': 'Abnormal retinal morphology', 'id': 'HP:0000479', 'int': 479},
# >>     {'name': 'Abnormal eyelid morphology', 'id': 'HP:0000492', 'int': 492},
# >>     {'name': 'Abnormality of eye movement', 'id': 'HP:0000496', 'int': 496}
# >> ]

Not yet implemented

The following instance methods are not yet implemented for pyhpo.HPOSet

variance(self, /)

Calculates the distances between all its term-pairs. It also provides basic calculations for variances among the pairs.

Returns

Tuple with the variance metrics

  • float Average distance between pairs

  • int Smallest distance between pairs

  • int Largest distance between pairs

  • list of int List of all distances between pairs

Return type

tuple of (float, int, int, list of int)
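
Since this method is not yet implemented, the following is only a sketch of the computation the description suggests. It assumes each unordered pair is counted once and takes the distance function as a parameter; the function signature and the toy distance are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def variance(
    terms: Sequence[int],
    distance: Callable[[int, int], int],
) -> Tuple[float, int, int, List[int]]:
    """Return (average, smallest, largest, all) pairwise distances."""
    distances = [distance(a, b) for a, b in combinations(terms, 2)]
    if not distances:
        return (0.0, 0, 0, [])  # fewer than two terms: assumed default
    return (
        sum(distances) / len(distances),
        min(distances),
        max(distances),
        distances,
    )


# Toy distance for illustration: absolute difference of integer ids
result = variance([1, 4, 6], lambda a, b: abs(a - b))
print(result[1:])  # (2, 5, [3, 5, 2])
```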

combinations(self, /)

Helper generator function that yields every ordered pair of terms in the set

This function is direction dependent: every pair appears twice, once for each direction.

Yields

Tuple of pyhpo.HPOTerm – Tuple containing the following items

  • HPOTerm 1 of the pair

  • HPOTerm 2 of the pair

combinations_one_way(self, /)

Helper generator function that yields every pair of terms in the set

This method reports each pair only once.

Yields

Tuple of pyhpo.HPOTerm – Tuple containing the following items

  • HPOTerm instance 1 of the pair

  • HPOTerm instance 2 of the pair
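
The difference between the two generators can be illustrated with the standard library. This sketch assumes a term is never paired with itself and uses strings as stand-ins for HPOTerm instances.

```python
from itertools import combinations, permutations

terms = ["A", "B", "C"]  # stand-ins for HPOTerm instances

# Direction dependent: (A, B) and (B, A) are both produced
both_directions = list(permutations(terms, 2))

# One way: each pair is produced only once
one_way = list(combinations(terms, 2))

print(len(both_directions), len(one_way))  # 6 3
print(one_way)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```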

BasicHPOSet

A BasicHPOSet is like a normal pyhpo.HPOSet, but:

  • only child terms are retained, non-specific parent terms are removed

  • all obsolete terms are replaced or removed

  • all modifier terms are removed

HPOPhenoSet

An HPOPhenoSet is like a normal pyhpo.HPOSet, but:

  • all obsolete terms are replaced or removed

  • all modifier terms are removed

Term      HPOSet  BasicHPOSet          HPOPhenoSet
obsolete  kept    replaced or removed  replaced or removed
modifier  kept    removed              removed
parents   kept    removed              kept