Transcription Factor Binding Site Prediction

Using TF matrices to predict TF binding sites (TFBS) in regions of interest.

a. Obtain TF matrices

I have seen TRANSFAC and JASPAR mentioned in relation to TF matrices. I have found some text files in the JASPAR database that seem to be what I need, and I will probably use these. Does anybody know whether they differ from the TRANSFAC matrices? Are there any other resources for matrices?
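For reference, JASPAR text files usually store one count matrix per record: a `>` header with the matrix ID and name, followed by four rows of counts for A, C, G and T. TRANSFAC lists one row per position instead. Below is a minimal Python sketch of a parser for the JASPAR layout. It assumes exactly that structure: bracketed `A [ ... ]` rows, or bare rows of numbers in A, C, G, T order. `parse_jaspar` is a hypothetical helper, not part of any library.

```python
import re

BASES = "ACGT"


def _rows_to_matrix(rows):
    """Check that we have four equal-length rows and key them by base."""
    if len(rows) != 4 or len({len(r) for r in rows}) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return dict(zip(BASES, rows))


def parse_jaspar(text, default_name="motif"):
    """Parse JASPAR-style count matrices into {name: {base: [counts per position]}}.

    Assumed input: '>ID NAME' headers followed by rows like 'A  [ 0  3 79 ]',
    or bare rows of numbers in A, C, G, T order (headerless raw .pfm files).
    """
    matrices = {}
    name, rows = default_name, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if rows:  # close the previous record
                matrices[name] = _rows_to_matrix(rows)
            name, rows = line[1:].strip(), []
            continue
        # Blank out a leading base label and any square brackets, keep the numbers.
        numbers = re.sub(r"^[ACGT]|[\[\]]", " ", line)
        rows.append([float(x) for x in numbers.split()])
    if rows:
        matrices[name] = _rows_to_matrix(rows)
    return matrices


# A made-up record for illustration, not a real JASPAR matrix.
example = """>MA0000.1 HYPOTHETICAL
A  [ 3  0  0 ]
C  [ 0  4  0 ]
G  [ 1  0  4 ]
T  [ 0  0  0 ]
"""
print(parse_jaspar(example))
```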

b. Predict TFBS in the sequences of interest

For each TF matrix, predict where TFBS could be found in the sequences of interest. I have looked at the TFBS module for Perl, and although I don't doubt that it does the right thing, the way it searches for TFBS is not clear to me, so I wouldn't want to use it in a serious analysis.
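Most matrix-based scanners follow the same basic recipe:

1. Convert the counts to log-odds scores against a background.
2. Slide the matrix along both strands.
3. Sum the per-position scores and keep windows above a cutoff.

Below is a minimal Python sketch of that recipe. The uniform background, the pseudocount of 0.25 and the score threshold are illustrative choices, not the defaults of any particular tool. `log_odds` and `scan` are hypothetical helper names.

```python
import math

BASES = "ACGT"


def log_odds(counts, background=None, pseudocount=0.25):
    """Turn a count matrix {base: [counts per position]} into log2-odds columns.

    Each column is normalized to probabilities (the pseudocount keeps unseen
    bases from scoring -inf) and divided by the background frequency.
    """
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][i] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Report (start, strand, score) for windows scoring >= threshold on either strand.

    `start` is always the 0-based window start on the + strand.
    """
    complement = str.maketrans("ACGT", "TGCA")
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(complement)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return hits


# Toy motif built from four identical "ACG" sites (illustrative only).
counts = {"A": [4, 0, 0], "C": [0, 4, 0], "G": [0, 0, 4], "T": [0, 0, 0]}
pwm = log_odds(counts)
print(scan("TTACGTT", pwm, threshold=5.0))  # → [(2, '+', 5.297), (3, '-', 5.297)]
```

The threshold is the weak point of this scheme. A raw score cutoff means different things for matrices of different length and information content, which is why tools that convert scores to P-values are easier to use consistently.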

Are there any easy ways to bulk-download TF matrices for all known TFs (vertebrates, fly, nematode), separately for each species?

Is there a fast and usable TFBS prediction program?

Since I am completely at a loss and TF prediction is not exactly my area of expertise, I don't know whether what I'm asking for is irrelevant or has already been solved 100 times. Feel free to just point me to some relevant reviews and/or your favourite programs. It seems that all the resources I find are from the early 2000s, and many of them no longer work.

I have a Gist for exactly this. You can clone/download it

It uses the MOODS package (paper here) to process JASPAR-style TFBS matrices and any normal sequence-interval format … but with ~5 minutes of work you could switch it over to use FASTA files.
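For anyone wanting to make that switch, reading FASTA input takes very little code. Below is a minimal sketch in plain Python (no Biopython), assuming ordinary multi-line FASTA records. `read_fasta` is a hypothetical helper, not something taken from the gist or from MOODS.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file object.

    The header is the text after '>' on the definition line; sequence lines
    are stripped of whitespace and concatenated.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Usage with an in-memory file; with a real file use open("promoters.fa").
example = io.StringIO(">promoter_1\nACGTAC\nGTTT\n>promoter_2\nTTGACA\n")
for name, seq in read_fasta(example):
    print(name, len(seq))  # promoter_1 10, then promoter_2 6
```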

It runs blisteringly fast … I can usually annotate all upstream promoters of a genome within ~10 minutes.

Feel free to fork the repository and make any changes … I always welcome pull requests.

Great package! I had been looking for something like this for some time.

Nice to hear … let me know if it's useful.

This seems interesting. I'm not very good at Python, but maybe I could use it.

Hi, I have a problem like yours. I would like to know whether you managed to solve your problem with the Gist. I don't know how to run it, as it doesn't have a user guide. Thanks a lot in advance.

Is it accurate enough to use TFBS matrices from humans to predict TFBS in other vertebrates? Is there any relevant paper you can point me to? Is there also an up-to-date database of TFBS matrices? Thanks a lot.

Please check this paper and the related database, CIS-BP: Weirauch, M. T., et al. (2014). Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158(6): 1431-1443.

A nice source of PWMs is UniPROBE. These are PWMs obtained using protein binding microarrays (check out the Bulyk lab page here). You can download a large number of PWMs freely, for all sorts of organisms (mouse, yeast, nematode, etc.). The format is a little unusual, but you can convert it to the standard TRANSFAC format (accepted by most tools) using RSA-tools convert-matrix: select tab as the input format and transfac as the output format. I also suggest using RSA-tools matrix-scan.

Try using the INSECT server. It will help you with the TFBS search: you can add your own TFBS and run the search either on FASTA files or on genes from Ensembl, by entering their IDs.

I am not a great fan of using matrices to represent TFBS (I prefer IUPAC patterns), because it is difficult to know at what cutoff a match counts as good. When forced to use them, I have used matrix-scan at RSA Tools, which at least allows the use of P-value thresholds.
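To make the P-value idea concrete: for a log-odds matrix, you can compute exactly how likely a random background sequence is to reach a given score. Round the scores to a fixed resolution, then convolve the per-position score distributions. Below is a minimal Python sketch of that general technique. It is not RSAT's actual code. It assumes an i.i.d. background, a score resolution of 0.01, and a PWM given as a list of {base: log-odds score} dicts, one per position. The helper names are hypothetical.

```python
from collections import defaultdict

BASES = "ACGT"


def score_distribution(pwm, background=None, resolution=0.01):
    """Exact null distribution of PWM scores under an i.i.d. background.

    Scores are rounded to multiples of `resolution` so the distribution stays
    finite. Returns {integer score bin: probability}.
    """
    background = background or {b: 0.25 for b in BASES}
    dist = {0: 1.0}
    for column in pwm:
        new = defaultdict(float)
        for total, prob in dist.items():
            for base in BASES:
                step = round(column[base] / resolution)
                new[total + step] += prob * background[base]
        dist = dict(new)
    return dist


def p_value(score, dist, resolution=0.01):
    """P(a random site scores >= score), using the binned distribution."""
    cutoff = round(score / resolution)
    return sum(p for s, p in dist.items() if s >= cutoff)


def threshold_for(p, dist, resolution=0.01):
    """Smallest score cutoff whose upper-tail P-value does not exceed p."""
    tail = 0.0
    for s in sorted(dist, reverse=True):
        if tail + dist[s] > p:
            return (s + 1) * resolution  # one bin above s keeps the tail <= p
        tail += dist[s]
    return min(dist) * resolution


# Toy 2-position PWM with made-up log-odds values.
pwm = [{"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0}] * 2
dist = score_distribution(pwm)
print(p_value(2.0, dist))  # 0.0625: both positions must be A
print(threshold_for(0.1, dist))
```

This also shows why a P-value cutoff transfers between matrices better than a raw score cutoff. The same P-value corresponds to different raw scores for a short, degenerate matrix and for a long, specific one.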

Matrices can be downloaded directly in bulk from JASPAR; I downloaded them and extracted the non-redundant matrices for vertebrates. I converted the JASPAR format to TRANSFAC format, as I know matrix-scan handles it well.
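The conversion itself is mostly a transposition: JASPAR stores one row per base, whereas TRANSFAC stores one row per position. Below is a minimal Python sketch of a writer for a TRANSFAC-like record. It assumes the matrix has already been parsed into a {base: [counts]} dict. `to_transfac` is a hypothetical helper, and the field layout is a simplified assumption. Real tools (for example RSA-tools convert-matrix) may write or expect additional fields.

```python
BASES = "ACGT"


def to_transfac(name, counts):
    """Format a count matrix {base: [counts per position]} as a TRANSFAC-style record.

    Simplified layout: AC and ID lines, a 'P0' header, one numbered row per
    position with the A, C, G, T counts, then 'XX' and the '//' terminator.
    The last column is the most frequent base, a simplification of the IUPAC
    consensus letter that TRANSFAC files normally carry.
    """
    lines = [f"AC  {name}", "XX", f"ID  {name}", "XX", "P0      A      C      G      T"]
    for i in range(len(counts["A"])):
        row = "".join(f"{counts[b][i]:>7g}" for b in BASES)
        consensus = max(BASES, key=lambda b: counts[b][i])
        lines.append(f"{i + 1:02d}{row}      {consensus}")
    lines += ["XX", "//"]
    return "\n".join(lines)


# A made-up two-position matrix, as the rows of a JASPAR file would give it.
print(to_transfac("MA0000.1", {"A": [3, 0], "C": [0, 4], "G": [1, 0], "T": [0, 0]}))
```

Writing several such records one after another into the same file gives a multi-matrix file, with each record closed by `//`.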

