Overview

Welcome to the Gastrointestinal Cancer Knowledge Database!

The Gastrointestinal Cancer Knowledge Database (GIDB) is a text-mining-based resource for common gastrointestinal (GI) cancers, including biliary tract cancer, liver cancer, stomach cancer, pancreatic cancer, colorectal cancer and esophageal cancer. Its knowledge is extracted from a large body of published literature using natural language processing (NLP) and manual curation, and is supplemented by multi-level omics datasets, clinical data and sample information.

Cancer/Gene Timeline


GIDB offers two timeline features: Cancer-Timeline and Gene-Timeline. Biliary tract cancer (BTC) is used as the example here.

BTC-Timeline



BTC Gene-Timeline




Molecular Alterations Data


Click the "Gene" button of a primary site to open the gene signature table for that GI cancer type. The table combines several analyses: high mutation frequency (>0.01), differential expression (H: HIGH, M: MODERATE, L: LOW), altered DNA methylation (H: HIGH, M: MODERATE, L: LOW) and survival (P: Prognosis Related). Together these describe the molecular alterations observed in each GI cancer and the strength of the correlation between each gene and prognosis. Biliary tract cancer (BTC) is used as the example here.

Gene



Overview

The table gives a general overview of the information available for each curated gene. It lists Official Symbol, Official Full Name, Alias, Entrez Gene ID, HGNC ID, Ensembl ID, Vega ID, UniProtKB ID, GeneCards ID, Gene Ontology, KEGG, ProteinAtlas, STRING and Sequence Downloading.


Mutation

The table lists the mutation information for each curated gene. It covers Mutation Consequence, Transcription ID, Start, Mutation Type, dbSNP_RS, CDS Mutation, AA Mutation, Mutation Classification and Feature Type. For more details on the data source, see "Data Source and Data Type".


Expression

The table lists the expression information for each curated gene. It gives the expression level (HIGH, MODERATE, LOW) and includes two histograms: one comparing expression in tumor vs. normal samples, and one comparing expression among all GI cancer samples. For more details on the data source, see "Data Source and Data Type".


Methylation

The table lists the methylation information for each curated gene. It gives the methylation level (HIGH, MODERATE, LOW) and includes a histogram comparing methylation in tumor vs. normal samples, a scatter plot of methylation level against expression level, and a histogram comparing methylation among all GI cancer samples. For more details on the data source, see "Data Source and Data Type".


Prognosis

The table lists the overall survival analysis for each curated gene. It indicates whether the gene's expression or methylation level is related to overall survival (RELATED or -) and shows a survival curve. For more details on the data source, see "Data Source and Data Type".


Literature

The table lists the literature for each curated gene. It gives the Journal, Title, Author and PMID of each article. For more details on the text-mining method, see "Text-Mining Method".


Timeline

The Gene-Timeline is an interactive, data-rich resource that provides a historical overview of the studies on each gene associated with each type of GI cancer. The Timeline chronicles a span of about 40 years, from the 1970s to the last update of this version of GIDB (November 2018).


GIDB provides three search options.
1. Gene Search
2. Sequence Search
3. Advanced Search

Gene Search

In GIDB, search terms are case-insensitive, as shown in the following figures. Moreover, a query string that is part of an official gene symbol, or all or part of a gene alias, is resolved by fuzzy matching in both cases.



1. Query for exact match and fuzzy match



2. Recognition of alias
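
The matching behavior described above can be sketched as follows. This is a minimal illustration, not GIDB's actual implementation: it assumes that "fuzzy matching" means case-insensitive substring matching against symbols and aliases, and the function name `search_genes` and the small `toy_genes` table are hypothetical.

```python
def search_genes(query, genes):
    """Return (exact, fuzzy) lists of official symbols for a query.

    `genes` maps an official symbol to a list of aliases.
    Assumption: "fuzzy" is modeled as a case-insensitive substring match.
    """
    q = query.strip().lower()
    exact, fuzzy = [], []
    if not q:
        return exact, fuzzy
    for symbol, aliases in genes.items():
        names = [symbol.lower()] + [alias.lower() for alias in aliases]
        if q in names:
            # The whole query equals the symbol or one of its aliases.
            exact.append(symbol)
        elif any(q in name for name in names):
            # The query is only part of a symbol or alias.
            fuzzy.append(symbol)
    return exact, fuzzy


# Hypothetical lookup table used only for this illustration.
toy_genes = {
    "ABCB1": ["MDR1", "PGY1"],
    "TP53": ["P53"],
    "TP63": ["P63"],
}

print(search_genes("abcb1", toy_genes))  # case-insensitive exact match
print(search_genes("mdr1", toy_genes))   # alias recognized as ABCB1
print(search_genes("tp", toy_genes))     # partial symbol, fuzzy matches
```

Lower-casing both the query and the stored names keeps the comparison case-insensitive without changing the official symbols that are returned.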



Sequence Search

In GIDB, the amino acid sequence of each search result is available for download. Query results are returned as a list sorted in descending order of sequence similarity. We use the similar_text function in PHP to measure the similarity between an input sequence and all FASTA-formatted amino acid sequences of GI cancer related proteins stored in the database. A sequence search has two possible outcomes:
1. The query exactly matches a stored sequence


2. The query returns multiple matching results



GIDB also supports queries containing multiple FASTA sequences. If the text box contains several FASTA entries, every entry is searched.
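
The ranking step can be illustrated with a short Python sketch. It re-implements the idea behind PHP's similar_text (find the longest common substring, then recurse on the text to its left and right) and is not GIDB's server code; tie-breaking between equally long substrings may differ from PHP. The helper names and the toy sequences are hypothetical.

```python
def _longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the first longest common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def similar_chars(a, b):
    """Count matching characters in the manner of PHP's similar_text."""
    start_a, start_b, length = _longest_common_substring(a, b)
    if length == 0:
        return 0
    return (length
            + similar_chars(a[:start_a], b[:start_b])
            + similar_chars(a[start_a + length:], b[start_b + length:]))


def similarity_percent(a, b):
    """Similarity as a percentage: 2 * matches / (len(a) + len(b)) * 100."""
    if not a and not b:
        return 0.0
    return similar_chars(a, b) * 2 * 100 / (len(a) + len(b))


def parse_fasta(text):
    """Split multi-FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rank_hits(query, database):
    """Score every stored sequence and sort by descending similarity."""
    scored = [(name, similarity_percent(query, seq)) for name, seq in database]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Toy sequences invented for this illustration; not GIDB data.
toy_database = [
    ("toy_protein_1", "MKTAYIAKQR"),
    ("toy_protein_2", "MKTAYIAKQRQISFVK"),
    ("toy_protein_3", "GGGGG"),
]
toy_input = """>query_1
MKTAYIAKQR
>query_2
ggggg
"""
for name, seq in parse_fasta(toy_input):
    print(name, rank_hits(seq, toy_database))
```

An exact match scores 100 and therefore sorts to the top of the list, which corresponds to the first outcome described above; partial matches follow in descending order.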



Advanced Search

GIDB offers an advanced search option that lets users run sophisticated searches by combining the concepts (genes, cancers, keywords) they have already identified for their search.




Gene/Cancer Drug Information


GIDB uses ReactomeFIViz to integrate information about curated gene sets and anticancer drugs approved by the US Food and Drug Administration (FDA). The Gene-Cancer Drugs Dictionary page offers two types of links. First, each gene links to the Gene Search Result page, where the details of its relationship to GI cancer can be found.



Second, the entries in the PharmGKB and Drugs columns link to two other online databases, the Pharmacogenomics Knowledge Base (PharmGKB) and DrugBank, which provide more details on genes, drugs and their interactions.



Moreover, we show a drug-target interaction network that indicates which genes (e.g., ABCB1) are targeted by which cancer drugs (e.g., Vinblastine Sulfate, Tamoxifen Citrate), as annotated via ReactomeFIViz. For visualization, we build a small network consisting of the gene of interest (blue), the cancer drugs that target it (green) and the other genes that these drugs target (orange). The ABCB1 gene is shown as an example below:
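
How such a three-color network might be assembled can be sketched as follows. This is only an illustration under stated assumptions: the function `build_drug_target_network` is hypothetical, and the entries GENE_X, GENE_Y and Drug Z are placeholders rather than real annotations; only the ABCB1, Vinblastine Sulfate and Tamoxifen Citrate pairing comes from the text above.

```python
def build_drug_target_network(gene, drug_targets):
    """Build the small network around one gene of interest.

    `drug_targets` maps a drug name to the set of genes it targets.
    Returns (nodes, edges): nodes maps a name to its display color,
    edges is a list of (drug, target gene) pairs.
    """
    nodes = {gene: "blue"}          # gene of interest
    edges = []
    for drug, targets in sorted(drug_targets.items()):
        if gene not in targets:
            continue                # drug does not target the gene of interest
        nodes[drug] = "green"       # cancer drug targeting the gene
        for target in sorted(targets):
            if target != gene:
                nodes.setdefault(target, "orange")  # other target of the drug
            edges.append((drug, target))
    return nodes, edges


# Placeholder annotations: GENE_X, GENE_Y and "Drug Z" are hypothetical.
toy_drug_targets = {
    "Vinblastine Sulfate": {"ABCB1", "GENE_X"},
    "Tamoxifen Citrate": {"ABCB1", "GENE_Y"},
    "Drug Z": {"GENE_Y"},
}

nodes, edges = build_drug_target_network("ABCB1", toy_drug_targets)
for name, color in nodes.items():
    print(color, name)
for drug, target in edges:
    print(drug, "->", target)
```

Drugs that do not target the gene of interest are left out entirely, which keeps the network small.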




Curated miRNA/lncRNA/CNV/Fusion


GIDB also contains curated signatures for miRNA targets, lncRNAs, fusion genes and CNVs. For miRNAs and lncRNAs, GIDB provides a 2D network visualization, a 3D network visualization and downloadable datasets (dynamic network files, Cytoscape network format files and gene/miRNA/lncRNA tables). To keep the visualization readable, nodes with low interaction degrees are not shown in the network for some cancers.

miRNA

The miRNA lists are annotated with the microRNA name (e.g., miR-205), the mRNAs investigated in the same articles and the literature count in PubMed. Click the "View" button to display detailed information on these articles.



lncRNA

The lncRNA lists are annotated with the long non-coding RNA name (e.g., CRNDE) and the literature count in PubMed. Click the "View" button to display detailed information on these articles.

CNV

The CNV lists are annotated with gene symbol, copy number type (loss or amplification), gene expression (underexpression, overexpression, or -), position and the literature count in PubMed. Click the "View" button to display detailed information on these articles.

Fusion

The fusion lists are annotated with the gene fusion (e.g., FGF3-TACC) and the literature count in PubMed. Click the "View" button to display detailed information on these articles.




Semantic Network & Network Tool


SemRep identifies the semantic associations between entities in text, and these associations form a semantic knowledge network. GIDB currently includes the semantic relations AFFECTS, ASSOCIATED_WITH, AUGMENTS, CAUSES, DISRUPTS, PREVENTS and TREATS. Users can view how genes (nodes) interact (edges) with each GI cancer type (nodes) in 2D or 3D network mode. Click the "Network" button of a primary site to see the details of the Gene-Cancer Semantic Network, which are also shown in the table below. Click the "View Dynamic Network" button to display the dynamic network.
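
As a rough illustration of how subject-predicate-object relations become network edges, the sketch below keeps only the seven relation types listed above and counts how often each edge is seen. It does not parse real SemRep output: the triple layout, the function `build_semantic_edges` and the GENE_A/GENE_B entries are assumptions made for this example.

```python
from collections import Counter

# The seven relation types listed in the paragraph above.
SUPPORTED_PREDICATES = frozenset({
    "AFFECTS", "ASSOCIATED_WITH", "AUGMENTS", "CAUSES",
    "DISRUPTS", "PREVENTS", "TREATS",
})


def build_semantic_edges(predications):
    """Count (subject, predicate, object) edges for supported relations.

    `predications` is an iterable of (subject, predicate, object) tuples;
    this simplified layout is an assumption, not SemRep's output format.
    """
    edges = Counter()
    for subject, predicate, obj in predications:
        predicate = predicate.strip().upper()
        if predicate in SUPPORTED_PREDICATES:
            edges[(subject, predicate, obj)] += 1
    return edges


# Hypothetical triples; GENE_A and GENE_B are placeholders.
toy_predications = [
    ("GENE_A", "ASSOCIATED_WITH", "Biliary Tract Cancer"),
    ("GENE_A", "associated_with", "Biliary Tract Cancer"),
    ("GENE_B", "CAUSES", "Biliary Tract Cancer"),
    ("GENE_B", "COEXISTS_WITH", "GENE_A"),  # not in the supported set, dropped
]

edges = build_semantic_edges(toy_predications)
for (subject, predicate, obj), count in edges.items():
    print(subject, predicate, obj, count)
```

The edge counts could serve as edge weights when the network is drawn, although the text does not state how GIDB weights its edges.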

Semantic Network



GIDB also provides a network tool that lets users upload their own data and view it as an interactive network.



The formats of the upload datasets are shown below:




Cancer Heatmap & Heatmap Tool


Cancer Heatmap

GIDB provides a heatmap, built with hierarchical clustering and Euclidean distance, that shows the results for the curated genes in each cancer type. It displays two data types (Expression, Methylation) and several characteristics for classification (Gender, Sex, Race, AJCC_Metastasis, Family_History, Age, etc.). The number of gene clusters or sample clusters can be set. The heatmap shows only the top 300 genes; the expression/methylation dataset for all curated genes can be downloaded below.
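
The clustering step can be illustrated with a small, dependency-free sketch of agglomerative hierarchical clustering on Euclidean distance, stopped at a chosen number of clusters. The linkage rule is not stated above, so average linkage is assumed here; the function names and the GENE_A to GENE_D profiles are hypothetical toy values, not GIDB data.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long numeric profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    `profiles` maps a gene (or sample) name to its vector of values.
    Assumption: average linkage, i.e. the mean pairwise member distance.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > max(1, n_clusters):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Toy expression profiles invented for this illustration.
toy_profiles = {
    "GENE_A": [1.0, 1.1, 0.9],
    "GENE_B": [1.2, 1.0, 1.0],
    "GENE_C": [8.0, 7.9, 8.2],
    "GENE_D": [7.8, 8.1, 8.0],
}

print(hierarchical_clusters(toy_profiles, n_clusters=2))
```

Stopping the merge loop at `n_clusters` mirrors the option to set the number of gene or sample clusters; for datasets of realistic size, a library implementation would be far faster than this quadratic search.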

Heatmap Tool

In addition, GIDB provides a Heatmap tool (in the left-side navigation) that lets users upload their own dataset to identify features of tumor subgroups. The current version of GIDB supports only two characteristics for classification: Gender and Age (two groups).



The formats of the two upload datasets are shown below:




Data Source and Data Type


Multi-omics datasets from The Cancer Genome Atlas (TCGA) database are integrated and analyzed. The data categories include Transcriptome Profiling, Simple Somatic Mutation, Clinical Data and Biospecimen Data. The experimental strategies include WXS, RNA-Seq and Methylation Array.


Text-Mining Method


In GIDB, we apply a natural language processing (NLP) approach to automate the extraction of disease-gene associations from the biomedical literature in PubMed. To extract information from the semi-structured MEDLINE format, we construct relevant vocabularies for GI cancers, including cholangiocarcinoma, gallbladder carcinoma, ampulla of Vater carcinoma, hepatocellular carcinoma, gastric cancer, pancreatic cancer, esophageal cancer, colon cancer and rectal cancer. The Metathesaurus concepts and semantic types mentioned in the text are recognized by MetaMap. In addition, the TF-IDF score and the PubTator tool are combined to recognize the genes involved in GI cancers. GIDB applies two co-occurrence strategies. The first is based on the co-occurrence of gene symbols and GI cancer names within one citation. The second is based on a co-occurrence pattern of UMLS concepts. The first strategy finds direct evidence for a gene's association with GI cancers; the second finds hidden, indirect evidence.
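
The first co-occurrence strategy can be sketched in a few lines. This is a simplified illustration rather than the GIDB pipeline: it uses plain regular expressions instead of MetaMap or PubTator, and the function `cooccurrences`, the citation identifiers and the citation texts are all invented for the example.

```python
import re
from collections import defaultdict


def cooccurrences(citations, gene_symbols, cancer_terms):
    """Map (gene, cancer term) to the citation ids that mention both.

    `citations` maps a citation id to its title/abstract text.
    Gene symbols are matched case-sensitively and cancer terms
    case-insensitively; both choices are assumptions of this sketch.
    """
    gene_patterns = {
        gene: re.compile(r"\b" + re.escape(gene) + r"\b")
        for gene in gene_symbols
    }
    cancer_patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in cancer_terms
    }
    hits = defaultdict(list)
    for citation_id, text in sorted(citations.items()):
        genes = [g for g, p in gene_patterns.items() if p.search(text)]
        cancers = [c for c, p in cancer_patterns.items() if p.search(text)]
        for gene in genes:
            for cancer in cancers:
                hits[(gene, cancer)].append(citation_id)
    return dict(hits)


# Invented toy citations; the ids and sentences are not real PubMed records.
toy_citations = {
    "toy-1": "KRAS was examined in pancreatic cancer samples.",
    "toy-2": "TP53 and KRAS were examined in cholangiocarcinoma and Pancreatic Cancer.",
    "toy-3": "A note on surgical technique with no gene mentioned.",
}

pairs = cooccurrences(
    toy_citations,
    gene_symbols=["KRAS", "TP53"],
    cancer_terms=["cholangiocarcinoma", "pancreatic cancer"],
)
for (gene, cancer), ids in pairs.items():
    print(gene, cancer, ids)
```

The number of supporting citations per pair gives a simple measure of direct evidence; the UMLS concept-based strategy would need concept annotations that this sketch does not model.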


Contact Us

Emails

Ying Wang: nadger_wang@139.com; Xiaoyan Zhang: xyzhang@tongji.edu.cn

Address

Tongji University, No.1239, Siping Road, Shanghai, P.R. China
