Protein
Identifiers and Primary Sources
The IGVF Catalog has a protein node collection that uses Ensembl Protein IDs (e.g., ENSP00000384707) as identifiers, taken from the following GENCODE releases:
- Human: GENCODE v43
- Mouse: GENCODE vM36
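As a minimal sketch, the identifier format above can be checked with a regular expression. The pattern assumes the usual Ensembl stable ID layout (`ENS`, an optional `MUS` species prefix for mouse, `P` for protein, 11 digits, and an optional version suffix); the function name `classify_protein_id` is hypothetical and not part of the catalog.

```python
import re
from typing import Optional

# Assumed layout of an Ensembl protein stable ID:
# "ENS" + optional "MUS" (mouse) + "P" + 11 digits + optional ".<version>"
ENSEMBL_PROTEIN_RE = re.compile(r"^ENS(MUS)?P\d{11}(?:\.\d+)?$")


def classify_protein_id(protein_id: str) -> Optional[str]:
    """Return 'human' or 'mouse' for a matching ID, or None if it does not match."""
    match = ENSEMBL_PROTEIN_RE.match(protein_id)
    if match is None:
        return None
    # Group 1 is only set when the "MUS" species prefix is present.
    return "mouse" if match.group(1) else "human"


if __name__ == "__main__":
    for candidate in ("ENSP00000384707", "ENSMUSP00000000001", "P49711"):
        print(candidate, classify_protein_id(candidate))
```

A UniProt accession such as P49711 does not match the pattern, so the helper returns `None` for it.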
Protein nodes are also annotated with UniProt accessions (e.g., P49711), protein names, and database cross-references (dbxref). These data were imported from the following UniProt files (downloaded in May 2024):
- Human: uniprot_sprot_human.dat.gz, uniprot_trembl_human.dat.gz
- Mouse: uniprot_sprot_rodents.dat.gz, uniprot_trembl_rodents.dat.gz
Additional Protein Edge Collections
Complexes-Proteins Edges
Data on protein complexes are imported from the Complex Portal at the EBI.
Motifs-Proteins Edges
Transcription factor (TF) binding motifs from HOCOMOCO v11, represented as position weight matrices (PWMs) that define the DNA-binding preferences of transcription factors.
Variants-Proteins Edges
| Source | Class | Edge Description | Datasets |
|---|---|---|---|
| External | statistical assessment | Allele-specific TF binding events called from assays such as ChIP-seq | ADASTRA Bill Cipher |
| External | statistical assessment | Allele-specific TF binding events called from SNP-SELEX | GVATdb |
| IGVF | prediction | Predictions of the impact of variants on the binding of 223 TFs, using SEMVAR | SEMpl |
| External | statistical assessment | pQTL (protein quantitative trait loci) studies of blood plasma protein abundance levels | UK Biobank Pharma Proteomics Project |
Protein-Protein Edges
Protein-protein interactions from BioGRID and IntAct, including direct physical interactions and functional genetic associations.
Coding Variants
Identifiers and Primary Sources
The IGVF Catalog has a coding variant collection whose identifiers combine the gene symbol, the Ensembl Transcript ID, and HGVS nomenclature for both the protein and cDNA changes (e.g., HK1_ENST00000298649_p.Arg380Gly_c.1138C>G). Each coding variant is also mapped to its genomic location using standardized SPDI identifiers.
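The identifier layout can be illustrated with a short parsing sketch. It assumes the four underscore-separated parts seen in the example (gene symbol, transcript ID, HGVS protein change, HGVS cDNA change) and uses a regular expression rather than a plain split, because HGVS ranges can themselves contain underscores. The names `CodingVariantId` and `parse_coding_variant_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: <gene symbol>_<Ensembl transcript ID>_<HGVS p.>_<HGVS c.>
CODING_VARIANT_RE = re.compile(
    r"^(?P<gene>.+?)_(?P<transcript>ENST\d{11})_(?P<hgvsp>p\..+?)_(?P<hgvsc>c\..+)$"
)


class CodingVariantId(NamedTuple):
    gene: str
    transcript: str
    hgvsp: str
    hgvsc: str


def parse_coding_variant_id(identifier: str) -> Optional[CodingVariantId]:
    """Split a coding variant identifier into its parts, or return None on mismatch."""
    match = CODING_VARIANT_RE.match(identifier)
    if match is None:
        return None
    return CodingVariantId(
        gene=match["gene"],
        transcript=match["transcript"],
        hgvsp=match["hgvsp"],
        hgvsc=match["hgvsc"],
    )


if __name__ == "__main__":
    print(parse_coding_variant_id("HK1_ENST00000298649_p.Arg380Gly_c.1138C>G"))
```

Identifiers that do not follow the assumed layout yield `None` instead of raising, which keeps the helper safe to use in bulk filtering.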
The primary source is dbNSFP v5.1, which aggregates scores from dozens of functional prediction algorithms (e.g., SIFT, CADD) for all possible non-synonymous single-nucleotide variants (nsSNVs) and splice-site variants in the human genome. Additionally, predictions of coding variant function from MutPred2 and ESM-1v, covering every possible single amino acid change in human genes, are imported as part of the IGVF project. As part of this import process, all possible genetic indels that could lead to each specific amino acid change were enumerated and included.
Coding Variant Edges
| Source | Class | Edge Description | Datasets |
|---|---|---|---|
| IGVF | observed data | Variant Abundance by Massively Parallel Sequencing (VAMP-seq) uses massively parallel sequencing to measure the effects of thousands of missense variants on protein abundance for CYP2C19 and G6PD. | VAMP-seq assays |
| IGVF | observed data | An advanced version of VAMP-seq tailored for secreted proteins. Measures how missense variants impact secretion and post-translational modifications for coagulation factor IX (F9). | MultiSTEP assays |
| IGVF | prediction | Predictions for all single amino acid substitutions across all human genes. | MutPred2 |
| IGVF | prediction | Predictions for all single amino acid substitutions in all MANE protein sequences. | ESM-1v |
Variants-Phenotypes-Coding variants Edges
- IGVF SGE assays: CRISPR-based assays that systematically introduce and evaluate the functional impact of hundreds to thousands of variants in PALB2, CTCF, RAD51D, SFPQ, XRCC2, and BRCA2.
GO Terms Table
This table presents Gene Ontology (GO) terms associated with the protein, describing its functions, processes, and cellular locations.
| Column Name | Description |
|---|---|
| Annotation ID | Unique identifier for the protein |
| Annotation Name | Protein name from UniProt |
| GO Term Name | The specific Gene Ontology term |
| Source | Origin of the GO term information |
| Gene Product Type | Type of gene product (e.g., protein, RNA) |
| Gene Product Symbol | Symbol representing the gene product |
| Qualifier | Additional qualifiers for the GO term (if applicable) |
| Organism | The organism to which this annotation applies |
| Evidence | Code indicating the type of evidence for this annotation |
| GO ID | Unique identifier for the GO term |
TF-Binding Motif
This table shows transcription factor binding motifs from HOCOMOCO v11 that are associated with the protein.
| Column | Description |
|---|---|
| Source | Origin of the motif information |
| Motif Name | Name of the binding motif |
| TF Name | Name of the transcription factor |
| Length | Length of the motif |
| Motif Source | Source of the motif data (click for more details) |
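To make the idea of a position weight matrix concrete, the sketch below scores every window of a DNA sequence against a small PWM by summing one weight per position. The matrix `TOY_PWM` is invented for illustration and is not taken from HOCOMOCO; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy 4-position PWM (one dict of per-base weights per motif position).
# The values are made up for illustration and do NOT come from HOCOMOCO.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 1.5, "C": -1.0, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -0.5, "T": -1.0},
    {"A": -0.5, "C": -0.5, "G": 1.0, "T": -1.5},
    {"A": -1.0, "C": -1.0, "G": -0.5, "T": 1.5},
]


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the weight of each base at its motif position."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal motif length")
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm: List[Dict[str, float]], sequence: str) -> Tuple[float, int]:
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    print(best_hit(TOY_PWM, "TTACGTAA"))
```

The motif length in this sketch is simply the number of matrix columns, `len(TOY_PWM)`, which is the quantity the Length column reports for a real motif.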
Protein Complex
This table displays protein complexes from the EBI Complex Portal that include the protein you're viewing.
| Column | Description |
|---|---|
| ID | Unique identifier for the complex |
| Name | Name of the protein complex |
| Alias | Alternative names for the complex |
| Molecules | Other molecules in the complex |
| Evidence Code | Code indicating the type of evidence (hover for full text) |
| Experimental Evidence | Details of experimental evidence |
| Description | Brief description of the complex (hover for full text) |
| Complex Assembly | Information on how the complex assembles |
| Complex Source | Source of the complex information |
| Reactome Xref | Cross-references to Reactome database |
| Source | Origin of the complex data (click for more details) |
Protein-Protein Interactions Table
This table presents protein-protein interactions for this protein from BioGRID and IntAct, including direct physical interactions or functional genetic associations.
| Column | Description |
|---|---|
| Interacting Protein | The name of the interacting protein |
| Interaction Type | A numerical code classifying the interaction (hover for full text) |
| Detection Method | The specific technique used to identify the interaction (e.g., affinity chromatography technology) |
| Confidence (BioGRID) | A confidence score assigned by BioGRID. Higher scores generally indicate stronger supporting evidence |
| Confidence (IntAct) | A confidence score from the IntAct database |
| PMIDs | The publication(s) that reported the interaction |
| Source | Database source of the interaction |
Associated Variant Table
This table shows variants that are directly associated with this protein, including pQTLs (protein quantitative trait loci) and allele-specific transcription factor binding events.
| Column | Description |
|---|---|
| Variant (rsID) | The variant identifier, typically an rsID when available |
| Association Type | The type of association between the variant and protein |
| Source | Database or study source of the association |
| Context / Biosample | The biological context or biosample in which the association was observed |
| Motif | Associated transcription factor binding motif (if applicable) |
| P-value (-log10) | Statistical significance of the association as -log10(p-value) |
| Beta (pQTL) | Effect size for protein quantitative trait loci |
| Standard Error | Standard error of the effect estimate |
| Class | Classification of the variant or association |
| Gene | Associated gene symbol |
| Gene Consequence | The predicted consequence of the variant on the gene |
| Effect on Binding | Predicted effect on protein binding (if applicable) |
| Relative Binding Affinity | Relative change in binding affinity |
| Alt Score | Alternative allele score |
| Ref Score | Reference allele score |
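The P-value column is reported on a -log10 scale, so larger values mean stronger significance. A minimal sketch of the conversion in both directions follows; the function names are hypothetical.

```python
import math


def neg_log10_p(p_value: float) -> float:
    """Convert a raw p-value in (0, 1] to the -log10 scale."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def p_from_neg_log10(value: float) -> float:
    """Convert a -log10(p-value) back to a raw p-value."""
    return 10.0 ** (-value)


if __name__ == "__main__":
    # A p-value of 5e-8 corresponds to roughly 7.3 on the -log10 scale.
    print(round(neg_log10_p(5e-8), 3))
    print(p_from_neg_log10(2.0))
```

Working on the -log10 scale avoids floating-point underflow for very small p-values and makes thresholds easy to compare by eye.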
Related Variant Table
This table displays variants that are related to this protein through various evidence sources, including pQTL and other regulatory mechanisms.
| Column | Description |
|---|---|
| rsID | The variant’s reference SNP cluster ID |
| Chromosome | The chromosome where the variant is located |
| Position | The genomic position of the variant |
| Ref | The reference allele |
| Alt | The alternative allele |
| HGVS | Human Genome Variation Society nomenclature for the variant |
| Evidence Sources | The databases or studies providing evidence for this relationship |
| Biological Context | The biological contexts in which this relationship was observed |
| Max -log10(p-value) | Maximum statistical significance across all evidence sources |
Related Genes & Protein Table
This table shows genes and proteins that are related to this protein through various biological relationships, including protein-protein interactions and regulatory networks.
| Column | Description |
|---|---|
| Related Protein | Proteins that have biological relationships with this protein |
| Related Gene | Genes that have biological relationships with this protein |
| Related Entities | Other biological entities connected to this protein through various pathways |

