|
Summary
ZCURVE is an ab initio program for finding genes in bacterial and archaeal genomes; the latest version is 3.0.
Gene recognition is the first step of annotation once the genomic sequence of a prokaryote has been obtained,
and ZCURVE performs this step. The method is based on the Z curve theory of DNA sequences.
The computer program is composed of the following six modules:
- Seeking seed ORFs
- Training the Fisher coefficients involved in the model to describe the characteristics of coding and non-coding regions
- Scoring all ORFs
- Checking ORFs for overlapping
- Relocating start sites of predicted genes
- Selecting essential genes
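To make the first module more concrete, here is a minimal sketch of scanning a sequence for candidate ORFs. It is not the ZCURVE implementation: the function name `find_orfs`, the forward-strand-only scan, the start codon set (ATG, GTG, TTG) and the `min_len` threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch of ORF scanning; not taken from the ZCURVE source code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of common prokaryotic starts


def find_orfs(seq, min_len=90):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based, `end` is exclusive and includes the stop codon.
    For each stop codon, only the first (longest) in-frame start is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first start codon since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in START_CODONS:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Toy sequence with a single short ORF in frame 2.
    print(find_orfs("CCATGAAATTTGGGTAACC", min_len=9))
```

A real gene finder would also scan the reverse complement and keep only long ORFs as seeds for training; those steps are omitted here.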
The original version, ZCURVE 1.0, was reported as a research article in Nucleic Acids Research in 2003.
Since then, over 100 laboratories and institutes have registered to use the program, and it has been used in
dozens of prokaryotic genome sequencing projects. The NAR paper has attracted 160 citations in the past ten years.
|
Feature
Compared with ZCURVE 1.0, our ZCURVE 3.0 system has been improved in the following four aspects:
- To extract more information and achieve higher accuracy, we take the short-range correlation among three or four adjacent
nucleotides into consideration. Hence, the number of characteristic variables increases from 33 to 765 (9+36+144+576) in ZCURVE 3.0.
- Instead of the Fisher linear discriminant, a support vector machine is used as the classifier because it has shown
excellent performance in many classification problems, particularly when the positive and negative sample groups are of balanced size.
- As an extended function, ZCURVE 3.0 can select the most important subset, the essential genes, from the complete list of
protein-coding genes in a given genome.
- The old version of our program is a standalone tool that must be downloaded from the Tubic website. For the convenience of users, we have provided a web-based automatic running mode for the latest version. Users
can freely and easily use the program through either of our two websites, Tubic and Cefg.
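As a rough illustration of where the 765 variables in the first item come from, the sketch below computes phase-specific Z curve components for words of one to four nucleotides. It assumes the usual Z curve transform (x: purine vs. pyrimidine, y: amino vs. keto, z: weak vs. strong hydrogen bonding) applied to the last base of each word at each of the three codon positions. The function name `zcurve_features` and the per-phase frequency normalisation are illustrative choices, not taken from the ZCURVE 3.0 source code.

```python
# Hypothetical sketch of phase-specific Z curve variables; not the ZCURVE code.
from itertools import product

BASES = "ACGT"


def zcurve_features(seq, max_k=4):
    """Return Z curve variables for word lengths 1..max_k.

    For word length k there are 3 phases * 4**(k-1) prefixes * 3 components,
    i.e. 9, 36, 144 and 576 values for k = 1, 2, 3, 4 (765 in total).
    """
    seq = seq.upper()
    features = []
    for k in range(1, max_k + 1):
        prefixes = ["".join(p) for p in product(BASES, repeat=k - 1)]
        for phase in range(3):
            # Count k-letter words that start at this codon position.
            counts = {}
            total = 0
            for start in range(phase, len(seq) - k + 1, 3):
                word = seq[start:start + k]
                if all(base in BASES for base in word):  # skip ambiguous bases
                    counts[word] = counts.get(word, 0) + 1
                    total += 1
            for prefix in prefixes:
                # Frequency of each word sharing this prefix, by final base.
                f = {b: (counts.get(prefix + b, 0) / total if total else 0.0)
                     for b in BASES}
                x = (f["A"] + f["G"]) - (f["C"] + f["T"])  # purine vs. pyrimidine
                y = (f["A"] + f["C"]) - (f["G"] + f["T"])  # amino vs. keto
                z = (f["A"] + f["T"]) - (f["G"] + f["C"])  # weak vs. strong H-bond
                features.extend([x, y, z])
    return features


if __name__ == "__main__":
    print(len(zcurve_features("ATGAAACCCGGGTTTTAA")))  # prints 765
```

The vector length depends only on `max_k`, so every ORF maps to a point in the same 765-dimensional space regardless of its length, which is what a classifier such as a support vector machine requires.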
|
Result
Based on cross-validation on 422 prokaryotic genomes, ZCURVE 3.0 has slightly higher accuracy than
Glimmer 3.02. Its most prominent advantage is that it can automatically select essential
genes from the list of protein-coding genes, a convenience that no other ab initio program
provides. We hope this feature will help drug target selection when the annotated genome belongs to a pathogen.
|
References
When you use the web services you should cite:
- ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes.
Hua ZG, Lin Y, Yuan YZ, Yang DC, Wei W, Guo FB.
Nucleic Acids Res. 2015 Jul 1;43(W1):W85-90. doi: 10.1093/nar/gkv491. Epub 2015 May 14
- or ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes.
Guo FB, Ou HY and Zhang CT.
Nucleic Acids Res. 31, 1780-1789, 2003.
- and Geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny.
Wei W, Ning LW, Ye YN and Guo FB.
PLoS One. 8(8):e72343, 2013.
|
|