Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetic information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak caller designed not to be specific to a particular *-seq protocol. We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact that the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performance on DNase-seq data. The results show that the CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease of use.

In order to identify functional elements in a genome, a number of experimental high-throughput techniques have been developed for investigating specific interactions between proteins and DNA. These protocols provide us with a deeper understanding of gene-regulatory and epigenetic mechanisms by identifying, for example, Transcription-Factor Binding Sites (TFBS), open chromatin regions or the location of epigenetic marks. In broad terms, these techniques chemically cross-link proteins to those stretches of DNA they are bound to in vivo. After shearing the DNA, a protein of interest is extracted along with the cross-linked DNA fragments from the cell lysate using specific antibodies. Following this Chromatin Immuno-Precipitation (ChIP) step, the short stretches of DNA attached to the protein of interest are identified by high-throughput sequencing.

For any targeted protein and a given cell line or condition, this results in several million reads of raw sequencing data. Usually a control experiment is performed where the immuno-precipitation step is left out or an antibody that does not bind specifically to the target genome is used. For example, the ENCODE Project (ENCyclopedia Of DNA Elements) has produced data on hundreds of regulatory factors in mouse and human. For more in-depth information we recommend the “ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia”.

Furthermore, there are already numerous experimental protocols related to ChIP-seq available and new protocols are published all the time. To name a few prominent examples, ChIP-exo is a derivative of ChIP-seq where exonucleases are used to identify the genomic location of DNA-protein binding sites with higher resolution. DNase-seq (DNase I hypersensitive site sequencing), ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) and FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements sequencing) are used to identify accessible regions in the genome, and MNase-seq (Micrococcal Nuclease sequencing) is used to identify nucleosome positioning. Although each experimental technique uses different procedures for fragmentation and enrichment, the computational processing in terms of mapping the sequencing data and analysing the resulting signal in genomic context is similar to processing ChIP-seq data. Hence it is unsurprising to see pipelines developed for ChIP-seq analysis routinely being applied to data produced with other protocols. Therefore, we initially discuss the analysis of ChIP-seq data and later investigate how algorithms developed for ChIP-seq perform on DNase-seq data.

ChIP-seq experiments are also increasingly used to investigate histone modifications. In contrast to transcription factors, most histone marks are of variable length and can span across entire gene bodies. Although the experimental procedures are similar, the resulting data needs to be treated accordingly. Some existing approaches such as HOMER stitch together narrow peaks to avoid the computational cost of finding regions of variable length. Since histone marks tend to be associated with genes, we opted for the use of existing annotations to classify genes according to peak shape as a practical trade-off between computational complexity and biological sensitivity.
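To make the stitching strategy mentioned for histone marks more concrete, the following is a minimal sketch of how narrow peaks on one chromosome might be merged into broader regions when they lie close together. It is an illustration only and does not reproduce HOMER's or CLC's actual implementation; the function name `stitch_peaks`, the `max_gap` parameter and the example coordinates are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def stitch_peaks(peaks: List[Interval], max_gap: int) -> List[Interval]:
    """Merge peaks that overlap or lie within max_gap bases of the previous region."""
    stitched: List[Interval] = []
    for start, end in sorted(peaks):
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous region: extend it instead of starting a new one.
            prev_start, prev_end = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end))
        else:
            stitched.append((start, end))
    return stitched


# Made-up narrow peaks; max_gap=100 is an arbitrary illustrative threshold.
narrow_peaks = [(100, 200), (250, 400), (2000, 2100), (2150, 2300), (9000, 9100)]
broad_regions = stitch_peaks(narrow_peaks, max_gap=100)
print(broad_regions)
```

A single pass over sorted intervals keeps the cost low, which is the appeal of stitching over searching for variable-length regions directly. The trade-off is that the result depends heavily on the chosen gap threshold.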