It’s DNA Day 2019!

DNA Day commemorates the completion of the Human Genome Project in April 2003 and the discovery of the double helix of DNA in 1953.

In honor of the day, we took a closer look at the ~20,000 protein-coding genes in our DNA.

We’ve been able to associate just under 5,500 of the 20,000 genes with a unique disease in our knowledge graph, which unifies data from more than 28 million articles read with NLP and from dozens of databases.

Most mentioned genes in research

The most-mentioned genes are heavily weighted toward cancer and other high-prevalence diseases, and many of them are molecular markers. Understandably, research with the potential to affect more lives gets larger budgets and more attention. In case you were wondering, here are the top 20 genes appearing in research, each with its article count.

Gene Count Description
TP53 9291 Tumor Protein 53 (TP53) is a tumor suppressor gene that plays a critical role in maintaining genomic integrity and preventing mutations. Modifications in this gene can lead to a wide variety of cancers, including the hereditary Li-Fraumeni syndrome.
TNF 5643 Tumor necrosis factor (TNF) is a multifunctional proinflammatory cytokine. Mainly secreted by macrophages, TNF is involved in the regulation of a wide spectrum of biological processes and has been studied in immune diseases such as multiple sclerosis and rheumatoid arthritis.
EGFR 5321 The epidermal growth factor receptor (EGFR) controls cell differentiation and proliferation processes and is a potential cancer marker. Somatic mutations in this gene are specifically linked with non-small cell lung cancers.
VEGFA 4423 Vascular Endothelial Growth Factor A (VEGFA) encodes a heparin-binding protein that is crucial for physiological and pathological angiogenesis. VEGFA is an inflammatory marker that has been frequently associated with renal cell carcinoma, non-Hodgkin and mantle cell lymphoma, hypertensive intracerebral hemorrhage, Alzheimer’s disease, and diabetic retinopathy.
IL6 4322 The interleukin 6 (IL6) gene encodes the cytokine interleukin 6, which plays an important role in inflammation and B cell maturation. IL6 has been studied in acute pancreatitis, periodontitis, Kaposi sarcoma, rheumatoid arthritis, and other autoimmune diseases.
APOE 4252 Apolipoprotein E (APOE) encodes a protein that is essential for normal fat metabolism. Mutations in the gene may cause a variety of age-related complications, such as hearing loss, macular degeneration, Alzheimer’s disease, and cardiovascular disease.
TGFB1 4101 Transforming growth factor beta 1 (TGFB1) is a multifunctional peptide that controls cell proliferation, differentiation, motility, and apoptosis. Mutations in this gene may cause Camurati-Engelmann disease and several types of cancer.
MTHFR 3431 The methylenetetrahydrofolate reductase (MTHFR) gene controls the conversion of the amino acid homocysteine into methionine. Mutations in the gene are linked to homocystinuria, spina bifida, anencephaly, and other conditions.
ESR1 3092 Estrogen receptor 1 (ESR1) is an important biomarker for inflammation and has been associated with estrogen receptor-positive breast cancer, primary ovarian insufficiency, estrogen resistance, myocardial infarction and more.
AKT1 3088 The AKT1 gene encodes a serine/threonine protein kinase that plays an important role in cell proliferation and differentiation. This gene is a cancer marker, studied frequently in metastatic prostate cancer, oesophageal and vulvar squamous cell carcinoma, and Proteus syndrome.
HIF1A 2976 Hypoxia-inducible factor-1 alpha (HIF1A) is a transcription factor that plays an essential role in cellular and systemic homeostatic responses to hypoxia. Mutations in the gene may also lead to retinal ischemia.
NFKB1 2941 Nuclear Factor Kappa B Subunit 1 (NFKB1) plays a crucial role in transcription regulation. Abnormal activation of this gene has been associated with a number of inflammatory diseases while persistent inhibition of NFKB1 leads to inappropriate immune cell development or delayed cell growth.
IL10 2938 Interleukin 10 (IL10) is a key anti-inflammatory cytokine produced by activated immune cells that plays a critical role in immune responses. Mutations in the gene are associated with HIV-1 susceptibility, rheumatoid arthritis, and other conditions.
BRCA1 2778 Breast Cancer Type 1 (BRCA1) plays critical roles in DNA repair, cell cycle checkpoint control, and maintenance of genomic stability. It is most notoriously associated with breast and ovarian cancer. Numerous pathogenic variants in BRCA1 have been identified.
ERBB2 2759 Erythroblastic Oncogene B2 (ERBB2) encodes Erb-B2 receptor tyrosine kinase 2, which facilitates cell proliferation and suppresses apoptosis. Overexpression of this gene has been associated with a variety of cancers.
MMP9 2690 Matrix Metalloproteinase 9 (MMP9) is a neoplastic marker that has been widely studied in breast, invasive prostate, papillary thyroid, hepatocellular, and ovarian cancer. It has also been identified as an inflammation marker. MMP9 proteins are essential for the breakdown of extracellular matrix in various physiological processes, including embryonic development, reproduction, and tissue remodeling.
IL1B 2658 Interleukin 1 beta (IL1B) belongs to a cytokine family that mediates the acute phase response. Mutations in the gene are associated with gastric cancer, periodontal disease, and other conditions. Overexpression of IL1B can also lead to various autoinflammatory syndromes.
HLA-DRB1 2654 The human leukocyte antigen-DR beta 1 (HLA-DRB1) complex plays a critical role in the effective functioning of the immune system. Mutations in this gene are associated with autoimmune Addison disease, multiple sclerosis, rheumatoid arthritis, and type 1 diabetes.
STAT3 2547 The signal transducer and activator of transcription 3 (STAT3) gene plays a key role in cellular processes such as cell growth and apoptosis. Mutations in STAT3 may lead to cancer, autoimmune disorders, and autosomal dominant hyper-IgE syndrome.
APP 2520 The amyloid precursor protein (APP) gene encodes an integral membrane protein, the amyloid precursor protein. Evidence suggests it controls synapse formation and neural plasticity. Mutations in the gene are linked with autosomal dominant Alzheimer’s disease and hereditary cerebral amyloid angiopathy.