MONDAY, March 3, 2008
Time: 1:30 - 2:30 PM
Constant 1009

Title: Improving Motif Identification in Biological Sequences

Nak-Kyeong Kim
National Center for Biotechnology Information
NLM and NIH

Identifying common local segments, also called motifs, in biological sequences plays an important role in understanding gene regulation and function. Many computational methods for motif identification use a likelihood ratio between a motif model and a background model. One way to improve this approach is to reconsider the background model and use a higher-order Markov model. At least two different Markov background models have been proposed to increase the accuracy of predicting motif sites. Both have theoretical drawbacks, so I present a third, context-dependent Markov background model.
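As a rough illustration of the likelihood-ratio scoring mentioned above, the sketch below scores a candidate site under a position weight matrix against a Markov background whose context is the preceding bases of the sequence. It is a generic textbook-style construction, not the context-dependent model presented in the talk; the function names, the pseudocount, and the dictionary-based matrix layout are all assumptions made for this example.

```python
import math
from collections import defaultdict


def train_markov_background(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Hypothetical helper: returns a function prob(context, base).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        c = counts.get(context, {})
        # Pseudocounts over the 4 DNA bases keep every probability nonzero.
        total = sum(c.values()) + 4 * pseudocount
        return (c.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood_ratio(seq, start, pwm, bg_prob, order):
    """Log of P(site | motif) / P(site | background) for a site at `start`.

    `pwm` is a list of dicts, one per motif column, mapping base -> probability
    (all entries assumed > 0). Requires start >= order so the first base has
    a full background context.
    """
    score = 0.0
    for j, column in enumerate(pwm):
        i = start + j
        base = seq[i]
        context = seq[i - order:i]
        score += math.log(column[base]) - math.log(bg_prob(context, base))
    return score


def best_site(seq, pwm, bg_prob, order):
    """Start position with the highest log-likelihood ratio."""
    starts = range(order, len(seq) - len(pwm) + 1)
    return max(starts, key=lambda a: log_likelihood_ratio(seq, a, pwm, bg_prob, order))
```

With `order=0` and no training data the background reduces to a uniform 1/4 per base, so the score becomes the familiar sum of log(p/0.25) terms; raising `order` lets the background absorb local composition such as dinucleotide bias.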

The locations of transcription factor binding sites (TFBSs) show at least some positional preference with respect to transcription start sites (TSSs). Notably, many known TFBSs occur within several hundred base pairs upstream of the TSS. Although some programs for identifying TFBSs have exploited this positional preference, most model it only implicitly and in an ad hoc manner. I propose a systematic probabilistic model for combining sequence and positional information in a Bayesian paradigm.
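One simple way to picture combining sequence and positional information is to treat the positional preference as a prior over candidate site positions and multiply it by the sequence likelihood ratio. The sketch below does this under the simplifying assumption of exactly one site per sequence; it is not the model proposed in the talk, and `site_posterior` and its arguments are hypothetical names chosen for illustration.

```python
import math


def site_posterior(llr_scores, position_prior):
    """Posterior over candidate site positions, assuming exactly one site.

    llr_scores[a]     : log-likelihood ratio (motif vs. background) at position a
    position_prior[a] : prior probability that the site starts at position a,
                        e.g. derived from distance to the TSS (assumed input)
    """
    if len(llr_scores) != len(position_prior):
        raise ValueError("scores and prior must have the same length")

    # Work in log space: log posterior = log-likelihood ratio + log prior.
    log_post = [
        s + math.log(p) if p > 0 else float("-inf")
        for s, p in zip(llr_scores, position_prior)
    ]
    m = max(log_post)
    if m == float("-inf"):
        raise ValueError("prior is zero at every position")

    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(x - m) for x in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

When every position has the same sequence score the posterior equals the prior, and when the prior is flat the posterior is driven by the sequence scores alone, which makes explicit how the two sources of evidence trade off.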