Bibliography

1
S. Audic and J. M. Claverie.
Detection of eukaryotic promoters using Markov transition matrices.
Comput. Chem., 21:223-227, 1997.

2
F. Avbelj and R. L. Baldwin.
Role of backbone solvation and electrostatics in generating preferred peptide backbone conformations: distributions of phi.
PNAS, 100(10):5742-7, 2003.

3
F. Avbelj and R. L. Baldwin.
Origin of the neighboring residue effect on peptide backbone conformation.
PNAS, 101(30):10967-10972, 2004.

4
G. Bejerano, Y. Seldin, H. Margalit, and N. Tishby.
Markovian domain fingerprinting: statistical segmentation of protein sequences.
Bioinformatics, 17(10):927-934, 2001.

5
M. Belkin.
Problems of Learning on Manifolds.
PhD thesis, University of Chicago, 2003.

6
P. Bucher.
Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences.
J. Mol. Biol., 212:564-575, 1990.

7
C. Burge.
Identification of Genes in Human Genome DNA.
PhD thesis, Stanford University, 1997.

8
M. Burset and R. Guigó.
Evaluation of gene structure prediction programs.
Genomics, 34:353-367, 1996.

9
H. Bussemaker and H. Li.
Regulatory element detection using a probabilistic segmentation model.
Proc. Int. Conf. Intell. Syst. Mol. Biol., 8:67-74, 2000.

10
H. Bussemaker, H. Li, and E. D. Siggia.
Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis.
Proc. Natl. Acad. Sci., 97:10096-10100, 2000.

11
H. Bussemaker, H. Li, and E. D. Siggia.
Regulatory element detection using correlation with expression.
Nature Genet., 27:167-171, 2001.

12
C. E. Shannon.
A mathematical theory of communication.
Bell System Technical Journal, 27:623-656, 1948.

13
K. Chakrabarti and S. Mehrotra.
Local dimensionality reduction: A new approach to indexing high dimensional spaces.
In Proceedings of 26th VLDB Conference, 2000.

14
J.-M. Claverie.
Computational methods for the identification of genes in vertebrate genomic sequences.
Human Molecular Genetics, 6(10):1735-1744, 1997.

15
J. M. Claverie, I. Sauvaget, and L. Bougueleret.
Computer generation and statistical analysis of a data bank of protein sequences translated from GenBank.
Biochimie, 67(5):437-43, 1985.

16
P. Clote and R. Backofen.
Computational Molecular Biology.
John Wiley & Sons Ltd, 2001.

17
C. de Marcken.
Unsupervised Language Acquisition.
PhD thesis, MIT, 1996.

18
A. P. Dempster, N. M. Laird, and D. B. Rubin.
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.

19
K. A. Dill.
Polymer principles and protein folding.
Protein Sci., 8:1166-1180, 1999.

20
S. Dong and D. Searls.
Gene structure prediction by linguistic methods.
Genomics, pages 540-551, 1994.

21
J. Drenth.
Principles of Protein X-ray Crystallography.
Springer, second edition, 1999.

22
R. M. Dudley.
Real Analysis and Probability.
Cambridge Univ. Press, second edition, 2002.

23
E. Eskin and M. S. Gelfand.
Genome-wide analysis of bacterial promoter regions.

24
E. Eskin and P. A. Pevzner.
Finding composite regulatory patterns in DNA sequence.
Bioinformatics, 1:1-9, 2002.

25
J. W. Fickett and A. Hatzigeorgiou.
Eukaryotic promoter recognition.
Genome Res., 7:861-878, 1997.

26
J. W. Fickett and C.-S. Tung.
Assessment of protein coding measures.
Nucleic Acids Research, 20:6441-6450, 1992.

27
C. Fields and C. Soderlund.
gm: a practical tool for automating DNA sequence analysis.
Computer Applications in Biosciences, pages 263-270, 1990.

28
P. J. Flory.
Principles of Polymer Chemistry.
Cornell University Press, Ithaca, N.Y., 1953.

29
P. J. Flory.
Statistical Mechanics of Chain Molecules.
Wiley, New York, 1969.

30
P. J. Flory.
Spatial configuration of macromolecular chains.
Nobel Lecture, 1974.

31
G. Yona.
Methods for global organization of all known protein sequences.
PhD thesis, Hebrew University, 1999.

32
M. Gelfand.
Computer prediction of exon-intron structure of mammalian pre-mRNAs.
Nucleic Acids Research, pages 5865-5869, 1990.

33
M. Gelfand, A. A. Mironov, and P. Pevzner.
Gene recognition via spliced sequence alignment.
Proc. Natl. Acad. Sci. USA, pages 9061-9066, 1996.

34
E. Giladi, M. G. Walker, J. Z. Wang, and W. Volkmuth.
Sst: An algorithm for searching sequence databases in time proportional to the logarithm of the database size.

35
R. Guigo, S. Knudsen, N. Drake, and T. Smith.
Prediction of gene structure.
Journal of Molecular Biology, pages 141-157, 1992.

36
J. Henderson, S. Salzberg, and K. Fasman.
Finding genes in human DNA with a hidden Markov model.
Journal of Computational Biology, pages 119-126, 1997.

37
E. de Hoffmann and V. Stroobant.
Mass Spectrometry: Principles and Applications.
John Wiley, second edition, 2001.

38
E. Hunt, M. P. Atkinson, and R. W. Irving.
A database index to large biological sequences.
In Proceedings of the 27th VLDB Conference, 2001.

39
G. Hutchinson.
The prediction of vertebrate promoter regions using differential hexamer frequency analysis.
Comput. Applic. Biosci., 12:391-398, 1996.

40
G. Hutchinson and M. Hayden.
The prediction of exons through an analysis of spliceable open reading frames.
Nucleic Acids Research, pages 3453-3462, 1992.

41
J.-M. Claverie et al.
k-tuple frequency analysis of sequence.
Methods In Enzymology, 183:237-252, 1990.

42
D. Jurafsky and J. H. Martin.
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition.
Prentice-Hall, 2000.

43
C. Kit and Y. Wilks.
Unsupervised learning of word boundary with description length gain.
In Proceedings CoNLL99 ACL Workshop, Bergen, 1999.

44
S. Knudsen.
Promoter 2.0: for the recognition of pol II promoter sequences.
Bioinformatics, 15:356-361, 1999.

45
Y. Kondrakhin, Y. Shamir, and N. Kolchanov.
Construction of a generalized consensus matrix for recognition of vertebrate pre-mRNA 3' terminal processing sites.
Comput. Applic. Biosci., 10:597-603, 1994.

46
T. Kortemme, A. V. Morozov, and D. Baker.
An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes.
Journal of Molecular Biology, 326:1239-1259, 2003.

47
J. B. Kruskal and M. Wish.
Multidimensional Scaling.
Sage Publications, Beverly Hills, CA, 1978.

48
D. Kulp, D. Haussler, M. G. Reese, and F. H. Eeckman.
A generalized hidden Markov model for the recognition of human genes in DNA.
Proc Int Conf Intell Syst Mol Biol, 4, 1996.

49
D. Kulp, D. Haussler, M. G. Reese, and F. H. Eeckman.
Integrating database homology in a probabilistic gene structure model.
Pac Symp Biocomput, 1997.

50
D. C. Kulp.
Protein-coding Gene Structure Prediction Using Generalized Hidden Markov Models.
PhD thesis, University of California, Santa Cruz, 2003.

51
L. E. Baum.
An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes.
Inequalities, 3:1-8, 1972.

52
C. Leslie, E. Eskin, A. Cohen, and W. S. Noble.
Mismatch string kernels for SVM protein classification.

53
W. Li.

54
S. Lifson and A. Roig.
On the theory of helix-coil transition in polypeptides.
J. Chem. Phys., 34:1963-1974, 1961.

55
J. Liu.
Unsupervised learning of protein sequences, 2002.
Master's thesis.

56
H. Lu, L. Lu, and J. Skolnick.
Development of unified statistical potentials describing protein-protein interactions.
Biophysical Journal, 84:1895-1901, 2003.

57
A. Magalhaes, B. Maigret, J. Hoflack, J. A. N. F. Gomes, and H. A. Scheraga.
Contribution of unusual arginine-arginine short-range interactions to stabilization and recognition in proteins.
J. Protein Chem., 13(2):195-215, 1994.

58
J. D. McAuliffe, L. Pachter, and M. I. Jordan.
Multiple-sequence functional annotation and the generalized hidden Markov phylogeny.
Bioinformatics, 20(12):1850-1860, 2004.

59
C. Melodelima, L. Gueguen, D. Piau, and C. Gautier.
Modeling the length distribution of exons by sums of geometric laws. Analysis of the structure of genes and G+C influence.

60
L. Milanesi, N. Kolchanov, I. Rogozin, A. Kel, and I. Titov.
Sequence functional inference.
pages 249-312, 1993.

61
D. R. Morrison.
PATRICIA: practical algorithm to retrieve information coded in alphanumeric.
Journal of the Association for Computing Machinery, 15(4):514-534, 1968.

62
C. Mészáros.
The efficient implementation of interior point methods for linear programming and their applications.
PhD thesis, Eötvös Loránd University of Sciences, 1996.

63
N. A. Chomsky.
Aspects of the Theory of Syntax.
MIT Press, Cambridge, MA, 1965.

64
G. Navarro and R. A. Baeza-Yates.
Improving an algorithm for approximate pattern matching.
Algorithmica, 30(4):473-502, 2001.

65
R. V. Pappu, R. Srinivasan, and G. D. Rose.
The Flory isolated-pair hypothesis is not valid for polypeptide chains: Implications for protein folding.
Proc. Natl. Acad. Sci. USA, 97:12565-12570, 2000.

66
H. Qian and J. A. Schellman.
Helix-coil theories: a comparative study for finite length polypeptides.
J. Phys. Chem., 96(10):3987-3994, 1992.

67
M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler.
Improved splice site detection in Genie.
J Comput Biol, 4(3), 1997.

68
J. Rissanen.
Modeling by shortest data description.
Automatica, 14:465-471, 1978.

69
S. Salzberg, A. Delcher, K. Fasman, and J. Henderson.
A decision tree system for finding genes in DNA, 1997.

70
M. Scherf, A. Klingenhoff, and T. Werner.
Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: A novel context analysis approach.
J. Mol. Biol., 297:599-606, 2000.

71
S. G. Lambrakos and J. P. Boris.
Geometric properties of the monotonic Lagrangian grid algorithm for near neighbour calculations.
Journal of Computational Physics, pages 183-202, 1987.

72
J. Singh and J. M. Thornton.
Atlas of protein side-chain interactions.
IRL Press at Oxford University Press, Oxford and New York, 1992.

73
E. Snyder and G. Stormo.
Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks.
Nucleic Acids Research, pages 607-613, 1993.

74
E. Snyder and G. Stormo.
Identification of protein coding regions in genomic DNA.
Journal of Molecular Biology, pages 1-18, 1995.

75
V. Solovyev and C. Lawrence.
Identification of human gene functional regions based on oligonucleotide composition.
Proceedings of the first international conference on Intelligent Systems for Molecular Biology (AAAI Press), pages 371-379, 1993.

76
V. Solovyev and A. Salamov.
The gene-finder computer tools for analysis of human and model organisms genome sequences.
In ISMB97, pages 294-302, 1997.

77
V. Solovyev, A. Salamov, and C. Lawrence.
The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
Proceedings of the second international conference on Intelligent Systems for Molecular Biology (AAAI Press), pages 354-36, 1994.

78
V. Solovyev, A. Salamov, and C. Lawrence.
Prediction of human gene structure using linear discriminant functions and dynamic programming.
Proceedings of the third international conference on Intelligent Systems for Molecular Biology (AAAI Press), pages 367-375, 1995.

79
A. Thomas and M. Skolnick.
A probabilistic model for detecting coding regions in DNA sequences.
IMA Journal of Mathematics Applied in Medicine and Biology, pages 149-160, 1994.

80
E. Ukkonen.
Approximate string matching with q-grams and maximal matches.
Theoretical Computer Science, pages 191-211, 1992.

81
J. A. Vila, D. R. Ripoll, M. E. Villegas, Y. N. Vorobjev, and H. A. Scheraga.
Role of hydrophobicity and solvent-mediated charge-charge interactions in stabilizing alpha-helices.
Biophysical Journal, 75:2637-2646, 1998.

82
M. S. Waterman.
Introduction to Computational Biology.
Chapman & Hall/CRC Press, 1995.

83
Y. Xu, R. Mural, M. Shah, and E. Uberbacher.
Recognizing exons in genomic sequence using GRAIL II.
Genet. Eng. (NY), pages 241-53, 1994.

84
Y. Xu, R. Mural, and E. Uberbacher.
Constructing gene models from accurately predicted exons: an application of dynamic programming.
Computer Applications in Biosciences, 1994.

85
C. Yu, B. C. Ooi, K.-L. Tan, and H. V. Jagadish.
Indexing the distance: An efficient method to KNN processing.
In The VLDB Journal, pages 421-430, 2001.

86
M. H. Zaman, M.-Y. Shen, R. S. Berry, K. F. Freed, and T. R. Sosnick.
Investigations into sequence and conformational dependence of backbone entropy, inter-basin dynamics and the Flory isolated-pair hypothesis for peptides.
J. Mol. Biol., 331:693-771, 2003.

87
M. Zhang.
Identification of protein coding regions in the human genome based on quadratic discriminant analysis.
Proceedings of the National Academy of Sciences, pages 559-564, 1997.

88
B. H. Zimm and J. K. Bragg.
Theory of the phase transition between helix and random coil in polypeptide chains.
J. Chem. Phys., 31:526-535, 1959.


Jing Liu 2005-11-17