Protein

Identifiers and Primary Sources

The IGVF Catalog has a protein node collection that uses Ensembl Protein IDs (e.g., ENSP00000384707) as identifiers, taken from the following GENCODE releases:

Information from UniProt is also integrated as properties on each node, including UniProt IDs (e.g., P49711), protein names, and database cross-references (dbxref). The data were imported from the following UniProt files (downloaded in May 2024):

Additional Protein Edge Collections

Complexes-Proteins Edges

Data on protein complexes are imported from the Complex Portal at the EBI.

Motifs-Proteins Edges

Transcription factor (TF) binding motifs are represented by position weight matrices (PWMs) that define the DNA-binding preferences of transcription factors. The motifs are imported from HOCOMOCO v11.

Variants-Proteins Edges

Source | Class | Edge Description | Datasets
External | statistical assessment | Allele-specific TF binding events called from assays such as ChIP-seq | ADASTRA Bill Cipher
External | statistical assessment | Allele-specific TF binding events called from SNP-SELEX | GVATdb
IGVF | prediction | Predictions of the impact of variants on the binding of 223 TFs, using SEMVAR | SEMpl
External | statistical assessment | pQTL (protein quantitative trait loci) studies on blood plasma protein abundance levels | UK Biobank Pharma Proteomics Project

Protein-Protein Edges

Protein-protein interactions from BioGRID and IntAct include direct physical interactions and functional genetic associations.

Coding variants

Identifiers and Primary Sources

The IGVF Catalog has a coding variant collection whose identifiers combine the gene symbol, the Ensembl Transcript ID, and HGVS nomenclature for both the protein and cDNA changes (e.g., HK1_ENST00000298649_p.Arg380Gly_c.1138C>G). Each coding variant is also mapped to its genomic location using standardized SPDI identifiers. The primary source is dbNSFP v5.1, which aggregates scores from dozens of functional prediction algorithms (e.g., SIFT, CADD) for all possible non-synonymous single-nucleotide variants (nsSNVs) and splice-site variants in the human genome. Additionally, predictions of coding variant function from MutPred2 and ESM-1v, covering every possible single amino acid change in human genes, are imported as part of the IGVF project. As part of this import process, all possible genetic indels that could lead to each specific amino acid change were enumerated and included.
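
The identifier layout described above can be split back into its parts with a regular expression. The following is a minimal sketch, assuming every identifier follows the pattern of the single example shown (gene symbol, ENST transcript ID with 11 digits, HGVS protein change, HGVS cDNA change, joined by underscores). The names `CodingVariantId`, `parse_coding_variant_id`, and `split_protein_substitution` are hypothetical helpers for illustration and are not part of the IGVF Catalog.

```python
import re
from typing import NamedTuple, Optional, Tuple


class CodingVariantId(NamedTuple):
    gene: str        # gene symbol, e.g. "HK1"
    transcript: str  # Ensembl transcript ID, e.g. "ENST00000298649"
    hgvsp: str       # HGVS protein change, e.g. "p.Arg380Gly"
    hgvsc: str       # HGVS cDNA change, e.g. "c.1138C>G"


# Assumption: layout inferred from the one example in the text above.
_ID_RE = re.compile(
    r"^(?P<gene>.+)_(?P<transcript>ENST\d{11})"
    r"_(?P<hgvsp>p\.[^_]+)_(?P<hgvsc>c\..+)$"
)

# Three-letter amino acid codes around a numeric position, e.g. p.Arg380Gly.
_SUBSTITUTION_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_coding_variant_id(identifier: str) -> CodingVariantId:
    """Split a coding variant identifier into its four components."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized coding variant ID: {identifier!r}")
    return CodingVariantId(**match.groupdict())


def split_protein_substitution(hgvsp: str) -> Optional[Tuple[str, int, str]]:
    """Return (reference residue, position, alternate residue), or None
    if the protein change is not a simple substitution."""
    match = _SUBSTITUTION_RE.match(hgvsp)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


parsed = parse_coding_variant_id("HK1_ENST00000298649_p.Arg380Gly_c.1138C>G")
substitution = split_protein_substitution(parsed.hgvsp)
```

Identifiers that deviate from the assumed layout raise a `ValueError` instead of being parsed silently into wrong fields.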

Coding Variant Edges

Source | Class | Edge Description | Datasets
IGVF | observed data | Variant Abundance by Massively Parallel Sequencing (VAMP-seq) uses massively parallel sequencing to measure the effects of thousands of missense variants on protein abundance for CYP2C19 and G6PD. | VAMP-seq assays
IGVF | observed data | An advanced version of VAMP-seq tailored for secreted proteins. Measures how missense variants impact secretion and post-translational modifications for coagulation factor IX (F9). | MultiSTEP assays
IGVF | prediction | Predictions for all single amino acid substitutions across all human genes. | MutPred2
IGVF | prediction | Predictions for all single amino acid substitutions in all MANE protein sequences. | ESM-1v

Variants-Phenotypes-Coding variants Edges

  • IGVF SGE assays: CRISPR-based assays that systematically introduce and evaluate the functional impact of hundreds to thousands of variants in PALB2, CTCF, RAD51D, SFPQ, XRCC2, and BRCA2.

GO Terms Table

This table presents Gene Ontology (GO) terms associated with the protein, describing its functions, processes, and cellular locations.
Column Name | Description
Annotation ID | Unique identifier for the protein
Annotation Name | Protein name from UniProt
GO Term Name | The specific Gene Ontology term
Source | Origin of the GO term information
Gene Product Type | Type of gene product (e.g., protein, RNA)
Gene Product Symbol | Symbol representing the gene product
Qualifier | Additional qualifiers for the GO term (if applicable)
Organism | The organism to which this annotation applies
Evidence | Code indicating the type of evidence for this annotation
GO ID | Unique identifier for the GO term

TF-Binding Motif

This table shows information about transcription factor binding motifs associated with the protein, from HOCOMOCO v11.
Column | Description
Source | Origin of the motif information
Motif Name | Name of the binding motif
TF Name | Name of the transcription factor
Length | Length of the motif
Motif Source | Source of the motif data (click for more details)

Protein Complex

This table displays information about protein complexes that include the protein you’re viewing, from the EBI Complex Portal.
Column | Description
ID | Unique identifier for the complex
Name | Name of the protein complex
Alias | Alternative names for the complex
Molecules | Other molecules in the complex
Evidence Code | Code indicating the type of evidence (hover for full text)
Experimental Evidence | Details of experimental evidence
Description | Brief description of the complex (hover for full text)
Complex Assembly | Information on how the complex assembles
Complex Source | Source of the complex information
Reactome Xref | Cross-references to the Reactome database
Source | Origin of the complex data (click for more details)

Protein-Protein Interactions Table

This table presents protein-protein interactions for this protein from BioGRID and IntAct, including direct physical interactions and functional genetic associations.
Column | Description
Interacting Protein | The name of the interacting protein
Interaction Type | A numerical code classifying the interaction (hover for full text)
Detection Method | The specific technique used to identify the interaction (e.g., affinity chromatography technology)
Confidence (BioGRID) | A confidence score assigned by BioGRID. Higher scores generally indicate stronger supporting evidence
Confidence (IntAct) | A confidence score from the IntAct database
PMIDs | The publication(s) that reported the interaction
Source | Database source of the interaction

Associated Variant Table

This table shows variants that are directly associated with this protein, including pQTLs (protein quantitative trait loci) and allele-specific transcription factor binding events.
Column | Description
Variant (rsID) | The variant identifier, typically an rsID when available
Association Type | The type of association between the variant and protein
Source | Database or study source of the association
Context / Biosample | The biological context or biosample in which the association was observed
Motif | Associated transcription factor binding motif (if applicable)
P-value (-log10) | Statistical significance of the association as -log10(p-value)
Beta (pQTL) | Effect size for protein quantitative trait loci
Standard Error | Standard error of the effect estimate
Class | Classification of the variant or association
Gene | Associated gene symbol
Gene Consequence | The predicted consequence of the variant on the gene
Effect on Binding | Predicted effect on protein binding (if applicable)
Relative Binding Affinity | Relative change in binding affinity
Alt Score | Alternative allele score
Ref Score | Reference allele score
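
Because the P-value column is reported on the -log10 scale, larger values mean smaller p-values. The sketch below shows the conversion in both directions. The function names are hypothetical and only illustrate the arithmetic; they do not describe how the catalog computes the column.

```python
import math


def neg_log10(p_value: float) -> float:
    """Convert a p-value in (0, 1] to the -log10 scale used in the table."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


def p_value_from_neg_log10(score: float) -> float:
    """Convert a -log10(p-value) score back to a plain p-value."""
    return 10.0 ** (-score)


# A p-value of 1e-8 corresponds to a score of about 8 on this scale.
score = neg_log10(1e-8)
p_value = p_value_from_neg_log10(score)
```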

This table displays variants that are related to this protein through various evidence sources including pQTL and other regulatory mechanisms.
Column | Description
rsID | The variant’s reference SNP cluster ID
Chromosome | The chromosome where the variant is located
Position | The genomic position of the variant
Ref | The reference allele
Alt | The alternative allele
HGVS | Human Genome Variation Society nomenclature for the variant
Evidence Sources | The databases or studies providing evidence for this relationship
Biological Context | The biological contexts in which this relationship was observed
Max -log10(p-value) | Maximum statistical significance across all evidence sources
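
The Evidence Sources, Biological Context, and Max -log10(p-value) columns each summarize several evidence records per variant. A minimal sketch of that kind of aggregation follows, assuming each record is a dictionary with the hypothetical keys `rsid`, `source`, `context`, and `neg_log10_p`. These names and the placeholder values are illustrative only and do not reflect the catalog's actual schema or data.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def summarize_evidence(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group evidence records by rsID, collecting the distinct sources and
    contexts and keeping the largest -log10(p-value) seen for each variant."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"sources": set(), "contexts": set(), "max_neg_log10_p": None}
    )
    for record in records:
        row = grouped[record["rsid"]]
        row["sources"].add(record["source"])
        if record.get("context"):
            row["contexts"].add(record["context"])
        score = record.get("neg_log10_p")
        # Records without a p-value contribute sources and contexts only.
        if score is not None and (
            row["max_neg_log10_p"] is None or score > row["max_neg_log10_p"]
        ):
            row["max_neg_log10_p"] = score
    return dict(grouped)


# Placeholder records, not real catalog data.
example_records = [
    {"rsid": "rsEXAMPLE1", "source": "source A", "context": "blood plasma", "neg_log10_p": 9.2},
    {"rsid": "rsEXAMPLE1", "source": "source B", "context": None, "neg_log10_p": 12.5},
    {"rsid": "rsEXAMPLE2", "source": "source A", "context": "blood plasma", "neg_log10_p": None},
]
summary = summarize_evidence(example_records)
```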

This table shows genes and proteins that are related to this protein through various biological relationships, including protein-protein interactions and regulatory networks.
Column | Description
Related Protein | Proteins that have biological relationships with this protein
Related Gene | Genes that have biological relationships with this protein
Related Entities | Other biological entities connected to this protein through various pathways