Overview
Welcome to the Gastrointestinal Cancer Knowledge Database!
Gastrointestinal Cancer Knowledge Database (GIDB) is a text-mining-based resource for common gastrointestinal (GI) cancers, including biliary tract cancer, liver cancer, stomach cancer, pancreatic cancer, colorectal cancer and esophageal cancer. Its knowledge is extracted from a large body of published literature through natural language processing (NLP) and manual curation, and is supplemented by multi-level omics datasets, clinical data and sample information.
Cancer/Gene Timeline
GIDB provides two kinds of timeline: the Cancer-Timeline and the Gene-Timeline. Biliary Tract Cancer (BTC) is used as an example.
BTC Cancer-Timeline

BTC Gene-Timeline

Molecular Alterations Data
Click the "Gene" button of a primary site to open the gene signature table for that GI cancer type. The table combines multidimensional analyses: high mutation frequency (>0.01), differential expression (H: HIGH, M: MODERATE, L: LOW), altered DNA methylation (H: HIGH, M: MODERATE, L: LOW) and survival (P: Prognosis Related). Together these describe the molecular alterations in each GI cancer and the strength of the correlation between each gene and prognosis. Biliary Tract Cancer (BTC) is used as an example.
Gene

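The mutation-frequency criterion can be illustrated with a short Python sketch. It assumes that a gene's mutation frequency is the fraction of samples carrying at least one somatic mutation in that gene; GIDB does not state its exact definition here, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def mutation_frequency(mutation_calls, n_samples):
    """Fraction of samples with at least one mutation in each gene.

    mutation_calls: iterable of (gene_symbol, sample_id) pairs, one per call;
    several calls for the same gene and sample are counted once.
    """
    mutated = defaultdict(set)
    for gene, sample in mutation_calls:
        mutated[gene].add(sample)
    return {gene: len(samples) / n_samples for gene, samples in mutated.items()}


def high_frequency_genes(mutation_calls, n_samples, threshold=0.01):
    """Genes whose mutation frequency exceeds the threshold (> 0.01 in GIDB)."""
    freqs = mutation_frequency(mutation_calls, n_samples)
    return sorted(gene for gene, freq in freqs.items() if freq > threshold)
```

With 100 samples, a gene mutated in two of them (frequency 0.02) passes the filter, while a gene mutated in one (exactly 0.01) does not, because the comparison is strict.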
Overview
The table gives a general overview of the information available for each curated gene: Official Symbol, Official Full Name, Alias, Entrez Gene ID, HGNC ID, Ensembl ID, Vega ID, UniProtKB ID, GeneCards ID, Gene Ontology, KEGG, ProteinAtlas, STRING and sequence download.
Mutation
The table lists the mutation information for each curated gene: Mutation Consequence, Transcript ID, Start, Mutation Type, dbSNP_RS, CDS Mutation, AA Mutation, Mutation Classification and Feature Type. For details on the data source, see "Data Source and Data Type".
Expression
The table lists the expression information for each curated gene: the expression level (HIGH, MODERATE, LOW) and two histograms comparing expression levels between tumor and normal samples and across all GI cancer samples. For details on the data source, see "Data Source and Data Type".
Methylation
The table lists the methylation information for each curated gene: the methylation level (HIGH, MODERATE, LOW), a histogram comparing methylation levels between tumor and normal samples, a scatter plot of methylation level against expression level, and a histogram comparing methylation levels across all GI cancer samples. For details on the data source, see "Data Source and Data Type".
Prognosis
The table lists the overall survival analysis for each curated gene. It indicates whether the gene's expression or methylation level is related to overall survival (RELATED or -) and shows a survival curve. For details on the data source, see "Data Source and Data Type".
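GIDB does not say how its survival curves are estimated. The Kaplan-Meier estimator is the usual choice for overall-survival curves, and a minimal version is sketched below under that assumption. Splitting patients into groups by expression or methylation level, and any significance test between groups, are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns [(time, survival_probability)]. Patients censored at time t are
    still counted as at risk at t, the usual convention.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve
```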
Literature
The table lists the literature for each curated gene: Journal, Title, Author and PMID. For details on the text-mining method, see "Text-Mining Method".
Timeline
The Gene-Timeline is an interactive, data-rich resource that gives a historical overview of the studies on each gene in association with each type of GI cancer. It chronicles a span of 40 years, from the 1970s to the last update of this version of GIDB in November 2018.
Search Options
GIDB provides three search options.
1. Gene Search
2. Sequence Search
3. Advanced Search
Gene Search
In GIDB, gene search is case-insensitive, as shown in the following figures. Fuzzy matching is also supported: a gene is returned when the query string matches all or part of its official gene symbol or of one of its aliases.
1. Query for exact match and fuzzy match

2. Recognition of alias

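A minimal Python sketch of case-insensitive gene lookup with exact, alias and partial (fuzzy) matching follows. It assumes that fuzzy matching means substring matching against the symbol and aliases; GIDB's actual matching rule and result ordering are not documented here. The gene table and function name are hypothetical, and the aliases are shown for illustration only.

```python
# Hypothetical gene table: official symbol -> aliases (illustrative entries only).
GENES = {
    "ERBB2": ["HER2", "NEU", "CD340"],
    "CDKN2A": ["P16", "INK4A", "ARF"],
}


def gene_search(query, genes=GENES):
    """Case-insensitive lookup returning (symbol, match_type) pairs.

    match_type is "exact" (whole official symbol), "alias" (whole alias)
    or "partial" (query is a substring of the symbol or of an alias).
    Exact matches are listed first, then alias matches, then partial ones.
    """
    q = query.strip().upper()  # normalise case so "erbb2" matches "ERBB2"
    if not q:
        return []
    exact, alias, partial = [], [], []
    for symbol, aliases in genes.items():
        sym = symbol.upper()
        als = [a.upper() for a in aliases]
        if q == sym:
            exact.append((symbol, "exact"))
        elif q in als:
            alias.append((symbol, "alias"))
        elif q in sym or any(q in a for a in als):
            partial.append((symbol, "partial"))
    return exact + alias + partial
```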
Sequence Search
In GIDB, the amino acid sequence of each search result is available for download. Query sequences return a list of results in descending order of sequence similarity. We use the PHP function similar_text to measure the similarity between the input sequence and each FASTA-formatted amino acid sequence of the GI cancer-related proteins stored in the database. A sequence search has two possible outcomes:
1. Returning an exact match

2. Returning multiple match results

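GIDB states that it ranks sequences with PHP's similar_text. The Python sketch below re-implements the algorithm similar_text is based on (find the longest common substring, then recurse on the unmatched pieces to its left and right) and reports the same percentage, 2 × matches / (len1 + len2) × 100. It is a port written for illustration, not GIDB's code; rank_by_similarity and the toy sequence database are hypothetical. The naive substring search is roughly cubic in sequence length, so it suits short examples only.

```python
def similar_chars(s1, s2):
    """Number of matching characters, following the similar_text algorithm."""
    if not s1 or not s2:
        return 0
    # Find the first longest common substring, scanning s1 then s2 as PHP does.
    best, pos1, pos2 = 0, 0, 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            if k > best:  # strict ">" keeps the first longest match found
                best, pos1, pos2 = k, i, j
    if best == 0:
        return 0
    # Add the matches found in the pieces left and right of that substring.
    return (best
            + similar_chars(s1[:pos1], s2[:pos2])
            + similar_chars(s1[pos1 + best:], s2[pos2 + best:]))


def similar_percent(s1, s2):
    """Similarity percentage: 2 * matches / (len(s1) + len(s2)) * 100."""
    total = len(s1) + len(s2)
    return similar_chars(s1, s2) * 200.0 / total if total else 0.0


def rank_by_similarity(query, database):
    """Rank database sequences (id -> sequence) by descending similarity."""
    scores = [(seq_id, similar_percent(query, seq)) for seq_id, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

As with the PHP original, the score can depend on argument order: similar_chars("bafoobar", "barfoo") gives 5, while the swapped call gives 3.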
GIDB also supports multi-sequence queries in FASTA format: if the text box contains several FASTA entries, each of them is searched.

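Splitting a multi-entry FASTA query into individual sequences can be sketched as follows. This is a hypothetical helper, not GIDB's parser; it assumes that header lines start with ">" and that a sequence pasted without a header is labelled "unnamed".

```python
def parse_fasta(text):
    """Split text holding one or more FASTA entries into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                header = "unnamed"  # hypothetical label for a header-less query
            chunks.append(line.upper())  # join wrapped lines, normalise case
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Each parsed sequence can then be searched independently, which matches the behaviour described above for multiple FASTA entries.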
Advanced Search
GIDB's advanced search lets users run more specific searches by combining the concepts (genes, cancers, keywords) they have already identified.
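One way to combine concepts is to require every supplied gene, cancer and keyword to match (AND semantics). The Python sketch below assumes that behaviour; GIDB does not document whether its advanced search uses AND, OR or other operators, and the Record layout, field names and sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical literature record with pre-extracted concepts."""
    record_id: str
    genes: set
    cancers: set
    text: str


def advanced_search(records, genes=(), cancers=(), keywords=()):
    """Return records matching ALL supplied genes, cancers and keywords."""
    genes = {g.upper() for g in genes}          # gene symbols: case-insensitive
    cancers = {c.lower() for c in cancers}      # cancer names: case-insensitive
    keywords = [k.lower() for k in keywords]
    hits = []
    for record in records:
        if genes and not genes <= {g.upper() for g in record.genes}:
            continue
        if cancers and not cancers <= {c.lower() for c in record.cancers}:
            continue
        text = record.text.lower()
        if any(k not in text for k in keywords):
            continue
        hits.append(record)
    return hits
```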
Gene/Cancer Drug Information
GIDB uses ReactomeFIViz to integrate information about curated gene sets and anticancer drugs approved by the US Food and Drug Administration (FDA). The Gene-Cancer Drugs Dictionary page contains two types of links. First, each gene links to its Gene Search Result page, which details the gene's relationship to GI cancer.

Second, entries in the PharmGKB and Drugs columns link to two external databases, the Pharmacogenetics Knowledge Base (PharmGKB) and DrugBank, which provide further details on genes, drugs and their interactions.

GIDB also shows a drug-target interaction network, annotated via ReactomeFIViz, indicating which genes (e.g., ABCB1) are targeted by which cancer drugs (e.g., Vinblastine Sulfate, Tamoxifen Citrate). To keep the visualization readable, each network is small: it contains the gene of interest (blue), the cancer drugs that target it (green) and the other genes targeted by those drugs (orange). The ABCB1 gene is shown as an example below:

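The construction of this small network can be sketched in Python. It assumes the drug annotations are available as (drug, target gene) pairs. Apart from the ABCB1 pairs named above, the drugs and genes in the example (GENE_X, GENE_Y, DRUG_Z, GENE_W) are hypothetical placeholders, not real annotations.

```python
from collections import defaultdict


def drug_subnetwork(gene, drug_targets):
    """Build the gene-of-interest network from (drug, target_gene) pairs.

    Returns (node -> colour, edge list): the query gene is blue, drugs that
    target it are green, and other genes those drugs target are orange.
    """
    targets_of = defaultdict(set)
    for drug, target in drug_targets:
        targets_of[drug].add(target)
    drugs = sorted(d for d, targets in targets_of.items() if gene in targets)
    nodes = {gene: "blue"}
    edges = []
    for drug in drugs:
        nodes[drug] = "green"
        for target in sorted(targets_of[drug]):
            edges.append((drug, target))
            if target != gene:
                nodes.setdefault(target, "orange")
    return nodes, edges
```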
Curated miRNA/lncRNA/CNV/Fusion
GIDB also contains curated signatures of miRNA targets, lncRNAs, fusion genes and CNVs. For miRNAs and lncRNAs, GIDB provides a 2D network visualization, a 3D network visualization and downloadable datasets (dynamic network files, Cytoscape network files, and gene/miRNA/lncRNA tables). To keep the visualization readable, nodes with low interaction degrees are not shown in the network for some cancers.
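The degree-based pruning can be sketched as a single filtering pass over an edge list. The cut-off (min_degree) is a hypothetical parameter, since GIDB does not state its threshold, and the node names are placeholders. A single pass may leave some nodes with reduced degree; repeating the filter until nothing changes is an alternative design.

```python
from collections import Counter


def filter_low_degree(edges, min_degree=2):
    """Drop edges touching any node whose degree is below min_degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(a, b) for a, b in edges if a in keep and b in keep]
```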
miRNA
Each miRNA entry is annotated with the microRNA name (e.g., miR-205), the mRNAs investigated in the same publications and the number of related publications in PubMed. Click the "View" button to display details of these publications.
lncRNA
Each lncRNA entry is annotated with the long non-coding RNA name (e.g., CRNDE) and the number of related publications in PubMed. Click the "View" button to display details of these publications.
CNV
Each CNV entry is annotated with the gene symbol, copy number type (loss or amplification), gene expression (underexpression, overexpression or -), position and the number of related publications in PubMed. Click the "View" button to display details of these publications.
Fusion
Each fusion entry is annotated with the gene fusion (e.g., FGF3-TACC) and the number of related publications in PubMed. Click the "View" button to display details of these publications.
Semantic Network & Network Tool
SemRep identifies semantic associations between entities in the text, which GIDB represents as a semantic knowledge network. GIDB currently includes the semantic relations AFFECTS, ASSOCIATED_WITH, AUGMENTS, CAUSES, DISRUPTS, PREVENTS and TREATS. Users can view how genes (nodes) are related (edges) to each GI cancer type (nodes) in 2D or 3D network mode. Click the "Network" button of a primary site to show the details of its Gene-Cancer Semantic Network in a table, and click the "View Dynamic Network" button to display the dynamic network.
Semantic Network

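Assuming SemRep predications have already been reduced to (subject, predicate, object) triples, the network can be assembled by keeping only the relation types listed above. This is a sketch; parsing SemRep's actual output format is not shown, and the example triples use placeholder names.

```python
# Relation types that GIDB lists as currently supported.
SUPPORTED_PREDICATES = {
    "AFFECTS", "ASSOCIATED_WITH", "AUGMENTS", "CAUSES",
    "DISRUPTS", "PREVENTS", "TREATS",
}


def build_semantic_network(predications):
    """Group (subject, predicate, object) triples into an edge map.

    Returns {(subject, object): set_of_predicates}, keeping only
    predicates in SUPPORTED_PREDICATES.
    """
    network = {}
    for subject, predicate, obj in predications:
        predicate = predicate.upper()
        if predicate not in SUPPORTED_PREDICATES:
            continue  # relation types not listed above are dropped
        network.setdefault((subject, obj), set()).add(predicate)
    return network
```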
GIDB also provides a network tool with which users can upload their own data and view it as an interactive network.

The formats of the upload datasets are shown as follows:

Cancer Heatmap & Heatmap Tool
Cancer Heatmap
GIDB provides a heatmap for the curated genes in each cancer type, based on hierarchical clustering with Euclidean distance. It displays two data types (Expression, Methylation) and several characteristics for classification (Gender, Sex, Race, AJCC_Metastasis, Family_History, Age, etc.). The number of gene clusters and sample clusters can be set. Only the heatmap results for the top 300 genes are shown, but the expression and methylation datasets for all curated genes can be downloaded below.
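The clustering behind the heatmap can be sketched with SciPy's hierarchical clustering using Euclidean distance. Two choices here are assumptions not stated by GIDB: average linkage, and selecting the "top 300" genes by variance across samples. The cluster-count arguments correspond to the user-settable numbers of gene and sample clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_heatmap(matrix, n_gene_clusters=3, n_sample_clusters=2, top_n=300, method="average"):
    """Hierarchically cluster a genes x samples matrix with Euclidean distance.

    Returns (row indices of the selected genes, gene cluster labels,
    sample cluster labels); labels are integers starting at 1.
    """
    matrix = np.asarray(matrix, dtype=float)
    # Assumption: "top" genes are the most variable ones across samples.
    order = np.argsort(matrix.var(axis=1))[::-1][:top_n]
    sub = matrix[order]
    # Cluster rows (genes) and columns (samples) separately.
    gene_tree = linkage(sub, method=method, metric="euclidean")
    sample_tree = linkage(sub.T, method=method, metric="euclidean")
    # Cut each tree into at most the requested number of clusters.
    gene_labels = fcluster(gene_tree, t=n_gene_clusters, criterion="maxclust")
    sample_labels = fcluster(sample_tree, t=n_sample_clusters, criterion="maxclust")
    return order, gene_labels, sample_labels
```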
Heatmap Tool
GIDB also provides a heatmap tool (in the left-side navigation) with which users can upload their own datasets to identify features of tumor subgroups. The current version of GIDB supports only two characteristics for classification: Gender and Age (two groups).
The formats of the two upload datasets are shown as follows:

Data Source and Data Type
The multi-omics datasets are obtained from The Cancer Genome Atlas (TCGA) and integrated and analyzed in GIDB. Data categories include Transcriptome Profiling, Simple Somatic Mutation, Clinical Data and Biospecimen Data. Experimental strategies include WXS (whole-exome sequencing), RNA-Seq and Methylation Array.
Text-Mining Method
GIDB applies a natural language processing (NLP) approach to automatically extract disease-gene associations from the biomedical literature in PubMed. To extract information from the semi-structured MEDLINE format, we construct vocabularies for the GI cancers, including cholangiocarcinoma, gallbladder carcinoma, ampulla of Vater carcinoma, hepatocellular carcinoma, gastric cancer, pancreatic cancer, esophageal cancer, colon cancer and rectal cancer. MetaMap recognizes the Metathesaurus concepts and semantic types mentioned in the text, and TF-IDF scores and the PubTator tool are combined to recognize the genes involved in GI cancers. GIDB applies two co-occurrence strategies. The first is based on the co-occurrence of gene symbols and GI cancer names within one citation; the second is based on co-occurrence patterns of UMLS concepts. These two strategies provide direct and hidden (indirect) supporting evidence, respectively, for a gene's association with GI cancers.
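The first strategy, citation-level co-occurrence of gene symbols and GI cancer names, can be sketched as below. It uses simple dictionary matching in place of GIDB's MetaMap, PubTator and TF-IDF steps, which are not reproduced; the vocabularies and sample texts are toy placeholders.

```python
import re
from collections import Counter


def cooccurrence_counts(citations, gene_symbols, cancer_terms):
    """Count, for each (gene, cancer) pair, the citations mentioning both.

    citations: iterable of title/abstract strings, one per citation.
    Gene symbols are matched case-sensitively as whole words; cancer
    terms are matched case-insensitively as substrings.
    """
    gene_patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in gene_symbols}
    counts = Counter()
    for text in citations:
        lower = text.lower()
        genes = {g for g, pattern in gene_patterns.items() if pattern.search(text)}
        cancers = {c for c in cancer_terms if c.lower() in lower}
        for g in genes:
            for c in cancers:
                counts[(g, c)] += 1  # one count per citation per pair
    return counts
```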
Contact Us
Emails
Ying Wang: nadger_wang@139.com; Xiaoyan Zhang: xyzhang@tongji.edu.cn
Address
Tongji University, No. 1239 Siping Road, Shanghai, P.R. China