<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.2 20190208//EN" "JATS-archivearticle1-mathml3.dtd"> <article xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn pub-type="epub" publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">67403</article-id><article-id pub-id-type="doi">10.7554/eLife.67403</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research Article</subject></subj-group><subj-group subj-group-type="heading"><subject>Computational and Systems Biology</subject></subj-group><subj-group subj-group-type="heading"><subject>Genetics and Genomics</subject></subj-group></article-categories><title-group><article-title>Information content differentiates enhancers from silencers in mouse photoreceptors</article-title></title-group><contrib-group><contrib contrib-type="author" id="author-225952"><name><surname>Friedman</surname><given-names>Ryan Z</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0001-9013-8676</contrib-id><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="other" rid="fund1"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-167784"><name><surname>Granas</surname><given-names>David M</given-names></name><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref 
ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-142547"><name><surname>Myers</surname><given-names>Connie A</given-names></name><xref ref-type="aff" rid="aff3">3</xref><xref ref-type="fn" rid="con3"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-43124"><name><surname>Corbo</surname><given-names>Joseph C</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0002-9323-7140</contrib-id><xref ref-type="aff" rid="aff3">3</xref><xref ref-type="other" rid="fund4"/><xref ref-type="other" rid="fund5"/><xref ref-type="fn" rid="con4"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-13668"><name><surname>Cohen</surname><given-names>Barak A</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0002-3350-2715</contrib-id><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="other" rid="fund3"/><xref ref-type="fn" rid="con5"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" corresp="yes" id="author-227377"><name><surname>White</surname><given-names>Michael A</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0001-8511-6026</contrib-id><email>mawhite@wustl.edu</email><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="other" rid="fund2"/><xref ref-type="fn" rid="con6"/><xref ref-type="fn" rid="conf1"/></contrib><aff id="aff1"><label>1</label><institution>Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine</institution><addr-line><named-content content-type="city">St. 
Louis</named-content></addr-line><country>United States</country></aff><aff id="aff2"><label>2</label><institution>Department of Genetics, Washington University School of Medicine</institution><addr-line><named-content content-type="city">St. Louis</named-content></addr-line><country>United States</country></aff><aff id="aff3"><label>3</label><institution>Department of Pathology and Immunology, Washington University School of Medicine</institution><addr-line><named-content content-type="city">St. Louis</named-content></addr-line><country>United States</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Barkai</surname><given-names>Naama</given-names></name><role>Reviewing Editor</role><aff><institution>Weizmann Institute of Science</institution><country>Israel</country></aff></contrib><contrib contrib-type="senior_editor"><name><surname>Barkai</surname><given-names>Naama</given-names></name><role>Senior Editor</role><aff><institution>Weizmann Institute of Science</institution><country>Israel</country></aff></contrib></contrib-group><pub-date date-type="publication" publication-format="electronic"><day>06</day><month>09</month><year>2021</year></pub-date><pub-date pub-type="collection"><year>2021</year></pub-date><volume>10</volume><elocation-id>e67403</elocation-id><history><date date-type="received" iso-8601-date="2021-02-09"><day>09</day><month>02</month><year>2021</year></date><date date-type="accepted" iso-8601-date="2021-09-03"><day>03</day><month>09</month><year>2021</year></date></history><pub-history><event><event-desc>This manuscript was published as a preprint at bioRxiv.</event-desc><date date-type="preprint" iso-8601-date="2021-02-07"><day>07</day><month>02</month><year>2021</year></date><self-uri content-type="preprint" xlink:href="https://doi.org/10.1101/2021.02.05.429997"/></event></pub-history><permissions><copyright-statement>© 2021, Friedman et 
al</copyright-statement><copyright-year>2021</copyright-year><copyright-holder>Friedman et al</copyright-holder><ali:free_to_read/><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><ali:license_ref>http://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife-67403-v2.pdf"/><self-uri content-type="figures-pdf" xlink:href="elife-67403-figures-v2.pdf"/><abstract><p>Enhancers and silencers often depend on the same transcription factors (TFs) and are conflated in genomic assays of TF binding or chromatin state. To identify sequence features that distinguish enhancers and silencers, we assayed massively parallel reporter libraries of genomic sequences targeted by the photoreceptor TF cone-rod homeobox (CRX) in mouse retinas. Both enhancers and silencers contain more TF motifs than inactive sequences, but relative to silencers, enhancers contain motifs from a more diverse collection of TFs. We developed a measure of information content that describes the number and diversity of motifs in a sequence and found that, while both enhancers and silencers depend on CRX motifs, enhancers have higher information content. The ability of information content to distinguish enhancers and silencers targeted by the same TF illustrates how motif context determines the activity of <italic>cis</italic>-regulatory sequences.</p></abstract><abstract abstract-type="executive-summary"><title>eLife digest</title><p>Different cell types are established by activating and repressing the activity of specific sets of genes, a process controlled by proteins called transcription factors. 
Transcription factors work by recognizing and binding short stretches of DNA in parts of the genome called cis-regulatory sequences. A cis-regulatory sequence that increases the activity of a gene when bound by transcription factors is called an enhancer, while a sequence that causes a decrease in gene activity is called a silencer.</p><p>To establish a cell type, a particular transcription factor will act on both enhancers and silencers that control the activity of different genes. For example, the transcription factor cone-rod homeobox (CRX) is critical for specifying different types of cells in the retina, and it acts on both enhancers and silencers. In rod photoreceptors, CRX activates rod genes by binding their enhancers, while repressing cone photoreceptor genes by binding their silencers. However, CRX always recognizes and binds to the same DNA sequence, known as its binding site, making it unclear why some cis-regulatory sequences bound by CRX act as silencers, while others act as enhancers.</p><p>Friedman et al. sought to understand how enhancers and silencers, both bound by CRX, can have different effects on the genes they control. Since both enhancers and silencers contain CRX binding sites, the difference between the two must lie in the sequence of the DNA surrounding these binding sites.</p><p>Using retinas that have been explanted from mice and kept alive in the laboratory, Friedman et al. tested the activity of thousands of CRX-binding sequences from the mouse genome. This showed that both enhancers and silencers have more copies of CRX-binding sites than sequences of the genome that are inactive. Additionally, the results revealed that enhancers have a diverse collection of binding sites for other transcription factors, while silencers do not. Friedman et al. developed a new metric they called information content, which captures the diverse combinations of different transcription factor binding sites that cis-regulatory sequences can have. 
Using this metric, Friedman et al. showed that it is possible to distinguish enhancers from silencers based on their information content.</p><p>It is critical to understand how the DNA sequences of cis-regulatory regions determine their activity, because mutations in these regions of the genome can cause disease. However, since every person has thousands of benign mutations in cis-regulatory sequences, it is a challenge to identify specific disease-causing mutations, which are relatively rare. One long-term goal of models of enhancers and silencers, such as Friedman et al.’s information content model, is to understand how mutations can affect cis-regulatory sequences, and, in some cases, lead to disease.</p></abstract><kwd-group kwd-group-type="author-keywords"><kwd>enhancers</kwd><kwd>silencers</kwd><kwd>information theory</kwd><kwd>massively parallel reporter assays</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>Mouse</kwd></kwd-group><funding-group><award-group id="fund1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>F31HG011431</award-id><principal-award-recipient><name><surname>Friedman</surname><given-names>Ryan Z</given-names></name></principal-award-recipient></award-group><award-group id="fund2"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>R01GM121755</award-id><principal-award-recipient><name><surname>White</surname><given-names>Michael A</given-names></name></principal-award-recipient></award-group><award-group id="fund3"><funding-source><institution-wrap><institution-id 
institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>R01EY027784</award-id><principal-award-recipient><name><surname>Cohen</surname><given-names>Barak A</given-names></name></principal-award-recipient></award-group><award-group id="fund4"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>R01EY025196</award-id><principal-award-recipient><name><surname>Corbo</surname><given-names>Joseph C</given-names></name></principal-award-recipient></award-group><award-group id="fund5"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>R01EY030075</award-id><principal-award-recipient><name><surname>Corbo</surname><given-names>Joseph C</given-names></name></principal-award-recipient></award-group><funding-statement>The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Silencers and enhancers targeted by a common transcription factor in photoreceptors are distinguished by the number and diversity of transcription factor binding sites they contain.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Active <italic>cis</italic>-regulatory sequences in the genome are characterized by accessible chromatin and specific histone modifications, which reflect the action of DNA-binding 
transcription factors (TFs) that recognize specific sequence motifs and recruit chromatin-modifying enzymes (<xref ref-type="bibr" rid="bib44">Klemm et al., 2019</xref>). These epigenetic hallmarks of active chromatin are routinely used to train machine learning models that predict <italic>cis</italic>-regulatory sequences, based on the assumption that such epigenetic marks are reliable predictors of genuine <italic>cis</italic>-regulatory sequences (<xref ref-type="bibr" rid="bib13">Ernst and Kellis, 2012</xref>; <xref ref-type="bibr" rid="bib19">Ghandi et al., 2014</xref>; <xref ref-type="bibr" rid="bib27">Hoffman et al., 2012</xref>; <xref ref-type="bibr" rid="bib41">Kelley et al., 2016</xref>; <xref ref-type="bibr" rid="bib50">Lee et al., 2011</xref>; <xref ref-type="bibr" rid="bib77">Sethi et al., 2020</xref>; <xref ref-type="bibr" rid="bib90">Zhou and Troyanskaya, 2015</xref>). However, results from functional assays show that many predicted <italic>cis</italic>-regulatory sequences exhibit little or no <italic>cis</italic>-regulatory activity. 
Typically, 50% or more of predicted <italic>cis</italic>-regulatory sequences fail to drive expression in massively parallel reporter assays (MPRAs) (<xref ref-type="bibr" rid="bib58">Moore et al., 2020</xref>; <xref ref-type="bibr" rid="bib48">Kwasnieski et al., 2014</xref>), indicating that an active chromatin state is not sufficient to reliably identify <italic>cis</italic>-regulatory sequences.</p><p>Another challenge is that enhancers and silencers are difficult to distinguish by chromatin accessibility or epigenetic state (<xref ref-type="bibr" rid="bib11">Doni Jayavelu et al., 2020</xref>; <xref ref-type="bibr" rid="bib20">Gisselbrecht et al., 2020</xref>; <xref ref-type="bibr" rid="bib62">Pang and Snyder, 2020</xref>; <xref ref-type="bibr" rid="bib66">Petrykowska et al., 2008</xref>; <xref ref-type="bibr" rid="bib76">Segert et al., 2021</xref>), and thus computational predictions of <italic>cis</italic>-regulatory sequences often do not differentiate between enhancers and silencers. 
Silencers are often enhancers in other cell types (<xref ref-type="bibr" rid="bib5">Brand et al., 1987</xref>; <xref ref-type="bibr" rid="bib11">Doni Jayavelu et al., 2020</xref>; <xref ref-type="bibr" rid="bib20">Gisselbrecht et al., 2020</xref>; <xref ref-type="bibr" rid="bib30">Huang et al., 2021</xref>; <xref ref-type="bibr" rid="bib37">Jiang et al., 1993</xref>; <xref ref-type="bibr" rid="bib61">Ngan et al., 2020</xref>; <xref ref-type="bibr" rid="bib62">Pang and Snyder, 2020</xref>), reside in open chromatin (<xref ref-type="bibr" rid="bib11">Doni Jayavelu et al., 2020</xref>; <xref ref-type="bibr" rid="bib29">Huang et al., 2019</xref>; <xref ref-type="bibr" rid="bib30">Huang et al., 2021</xref>; <xref ref-type="bibr" rid="bib62">Pang and Snyder, 2020</xref>), sometimes bear epigenetic marks of active enhancers (<xref ref-type="bibr" rid="bib14">Fan et al., 2016</xref>; <xref ref-type="bibr" rid="bib30">Huang et al., 2021</xref>), and can be bound by TFs that also act on enhancers in the same cell type (<xref ref-type="bibr" rid="bib1">Alexandre and Vincent, 2003</xref>; <xref ref-type="bibr" rid="bib21">Grass et al., 2003</xref>; <xref ref-type="bibr" rid="bib30">Huang et al., 2021</xref>; <xref ref-type="bibr" rid="bib35">Iype et al., 2004</xref>; <xref ref-type="bibr" rid="bib37">Jiang et al., 1993</xref>; <xref ref-type="bibr" rid="bib52">Liu et al., 2014</xref>; <xref ref-type="bibr" rid="bib53">Martínez-Montañés et al., 2013</xref>; <xref ref-type="bibr" rid="bib65">Peng et al., 2005</xref>; <xref ref-type="bibr" rid="bib69">Rachmin et al., 2015</xref>; <xref ref-type="bibr" rid="bib70">Rister et al., 2015</xref>; <xref ref-type="bibr" rid="bib80">Stampfel et al., 2015</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). 
As a result, enhancers and silencers share similar sequence features, and understanding how they are distinguished in a particular cell type remains an important challenge (<xref ref-type="bibr" rid="bib76">Segert et al., 2021</xref>).</p><p>The TF cone-rod homeobox (CRX) controls selective gene expression in a number of different photoreceptor and bipolar cell types in the retina (<xref ref-type="bibr" rid="bib6">Chen et al., 1997</xref>; <xref ref-type="bibr" rid="bib17">Freund et al., 1997</xref>; <xref ref-type="bibr" rid="bib18">Furukawa et al., 1997</xref>; <xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>). These cell types derive from the same progenitor cell population (<xref ref-type="bibr" rid="bib45">Koike et al., 2007</xref>; <xref ref-type="bibr" rid="bib83">Wang et al., 2014</xref>), but they exhibit divergent, CRX-directed transcriptional programs (<xref ref-type="bibr" rid="bib9">Corbo et al., 2010</xref>; <xref ref-type="bibr" rid="bib25">Hennig et al., 2008</xref>; <xref ref-type="bibr" rid="bib31">Hughes et al., 2017</xref>; <xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>). 
CRX cooperates with cell type-specific co-factors to selectively activate and repress different genes in different cell types and is required for differentiation of rod and cone photoreceptors (<xref ref-type="bibr" rid="bib7">Chen et al., 2005</xref>; <xref ref-type="bibr" rid="bib23">Hao et al., 2012</xref>; <xref ref-type="bibr" rid="bib25">Hennig et al., 2008</xref>; <xref ref-type="bibr" rid="bib28">Hsiau et al., 2007</xref>; <xref ref-type="bibr" rid="bib34">Irie et al., 2015</xref>; <xref ref-type="bibr" rid="bib43">Kimura et al., 2000</xref>; <xref ref-type="bibr" rid="bib51">Lerner et al., 2005</xref>; <xref ref-type="bibr" rid="bib55">Mears et al., 2001</xref>; <xref ref-type="bibr" rid="bib56">Mitton et al., 2000</xref>; <xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>; <xref ref-type="bibr" rid="bib65">Peng et al., 2005</xref>; <xref ref-type="bibr" rid="bib75">Sanuki et al., 2010</xref>; <xref ref-type="bibr" rid="bib79">Srinivas et al., 2006</xref>). However, the sequence features that define CRX-targeted enhancers vs. silencers in the retina are largely unknown.</p><p>We previously found that a significant minority of CRX-bound sequences act as silencers in an MPRA conducted in live mouse retinas (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>), and that silencer activity requires CRX (<xref ref-type="bibr" rid="bib86">White et al., 2016</xref>). Here, we extend our analysis by testing thousands of additional candidate <italic>cis</italic>-regulatory sequences. We show that while regions of accessible chromatin and CRX binding exhibit a range of <italic>cis</italic>-regulatory activity, enhancers and silencers contain more TF motifs than inactive sequences, and that enhancers are distinguished from silencers by a higher diversity of TF motifs. 
We capture the differences between these sequence classes with a new metric, motif information content (Boltzmann entropy), that considers only the number and diversity of TF motifs in a candidate <italic>cis</italic>-regulatory sequence. Our results suggest that CRX-targeted enhancers are defined by a flexible regulatory grammar and demonstrate how differences in motif information content encode functional differences between genomic loci with similar chromatin states.</p></sec><sec id="s2" sec-type="results"><title>Results</title><p>We tested the activities of 4844 putative CRX-targeted <italic>cis</italic>-regulatory sequences (CRX-targeted sequences) by MPRA in live retinas. The MPRA libraries consist of 164 bp genomic sequences centered on the best match to the CRX position weight matrix (PWM) (<xref ref-type="bibr" rid="bib49">Lee et al., 2010</xref>) whenever a CRX motif is present, and matched sequences in which all CRX motifs were abolished by point mutation (Materials and methods). The MPRA libraries include 3299 CRX-bound sequences identified by ChIP-seq in the adult retina (<xref ref-type="bibr" rid="bib9">Corbo et al., 2010</xref>) and 1545 sequences that do not have measurable CRX binding in the adult retina but reside in accessible chromatin in adult photoreceptors (<xref ref-type="bibr" rid="bib31">Hughes et al., 2017</xref>) and have the H3K27ac enhancer mark in postnatal day 14 (P14) retina (<xref ref-type="bibr" rid="bib72">Ruzycki et al., 2018</xref>) (‘ATAC-seq peaks’). We split the sequences across two plasmid libraries, each of which contained the same 150 scrambled sequences as internal controls (<xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref>). 
We cloned sequences upstream of the rod photoreceptor-specific <italic>Rhodopsin</italic> (<italic>Rho</italic>) promoter and a <italic>DsRed</italic> reporter gene, electroporated libraries into explanted mouse retinas at P0 in triplicate, harvested the retinas at P8, and then sequenced the RNA and input DNA plasmid pool. The data are highly reproducible across replicates (R<sup>2</sup> &gt; 0.96, <xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1</xref>). After activity scores were calculated and normalized to the basal <italic>Rho</italic> promoter, the two libraries were well calibrated and were merged (two-sample Kolmogorov-Smirnov test p = 0.09, <xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>, <xref ref-type="supplementary-material" rid="supp3">Supplementary file 3</xref>, and Materials and methods).</p><sec id="s2-1"><title>Strong enhancers and silencers have high CRX motif content</title><p>The <italic>cis</italic>-regulatory activities of CRX-targeted sequences vary widely (<xref ref-type="fig" rid="fig1">Figure 1a</xref>). We defined enhancers and silencers as those sequences that have statistically significant activity that is at least twofold above or below the activity of the basal <italic>Rho</italic> promoter (Welch’s t-test, Benjamini-Hochberg false discovery rate (FDR) q &lt; 0.05, <xref ref-type="supplementary-material" rid="supp3">Supplementary file 3</xref>). We defined inactive sequences as those whose activity is both within a twofold change of basal activity and not significantly different from the basal <italic>Rho</italic> promoter. We further stratified enhancers into strong and weak enhancers based on whether or not they fell above the 95th percentile of scrambled sequences. Using these criteria, 22% of CRX-targeted sequences are strong enhancers, 28% are weak enhancers, 19% are inactive, and 17% are silencers; the remaining 13% were considered ambiguous and removed from further analysis. 
To test whether these sequences function as CRX-dependent enhancers and silencers in the genome, we examined genes differentially expressed in <italic>Crx<sup>-/-</sup></italic> retina (<xref ref-type="bibr" rid="bib71">Roger et al., 2014</xref>). Genes that are de-repressed are more likely to be near silencers (Fisher’s exact test p = 0.001, odds ratio = 2.1, n = 206) and genes that are down-regulated are more likely to be near enhancers (Fisher’s exact test p = 0.02, odds ratio = 1.5, n = 344, Materials and methods), suggesting that our reporter assay identified sequences that act as enhancers and silencers in the genome. We sought to identify features that would accurately classify these different classes of sequences.</p><fig-group><fig id="fig1" position="float"><label>Figure 1.</label><caption><title>Activity of putative <italic>cis</italic>-regulatory sequences with cone-rod homeobox (CRX) motifs.</title><p>(<bold>a</bold>) Volcano plot of activity scores relative to the <italic>Rho</italic> promoter alone. Sequences are grouped as strong enhancers (dark blue), weak enhancers (light blue), inactive (green), silencers (red), or ambiguous (gray). Horizontal line, false discovery rate (FDR) q = 0.05. Vertical lines, twofold above and below <italic>Rho</italic>. (<bold>b</bold>) Fraction of ChIP-seq and ATAC-seq peaks that belong to each activity group. (<bold>c</bold>) Predicted CRX occupancy of each activity group. Horizontal lines, medians; enh., enhancer. Numbers at top of (<bold>b and c</bold>) indicate n for groups.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig1.jpg"/></fig><fig id="fig1s1" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 1.</label><caption><title>Reproducibility of massively parallel reporter assay (MPRA) measurements.</title><p>Each row represents a different library and experiment. 
For each column, the first replicate in the title is the x-axis and the second replicate is the y-axis.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig1-figsupp1.jpg"/></fig><fig id="fig1s2" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 2.</label><caption><title>Calibration of massively parallel reporter assay (MPRA) libraries with the <italic>Rho</italic> promoter.</title><p>Probability density histogram of the same 150 scrambled sequences in two libraries after normalizing to the basal <italic>Rho</italic> promoter.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig1-figsupp2.jpg"/></fig></fig-group><p>Neither CRX ChIP-seq-binding status nor DNA accessibility as measured by ATAC-seq strongly differentiates between these four classes (<xref ref-type="fig" rid="fig1">Figure 1b</xref>). Compared to CRX ChIP-seq peaks, ATAC-seq peaks that lack CRX binding in the adult retina are slightly enriched for inactive sequences (Fisher’s exact test p = 2 × 10<sup>–7</sup>, odds ratio = 1.5) and slightly depleted for strong enhancers (Fisher’s exact test p = 1 × 10<sup>–21</sup>, odds ratio = 2.2). However, sequences with ChIP-seq or ATAC-seq peaks span all four activity categories, consistent with prior reports that DNA accessibility and TF binding data are not sufficient to identify functional enhancers and silencers (<xref ref-type="bibr" rid="bib11">Doni Jayavelu et al., 2020</xref>; <xref ref-type="bibr" rid="bib29">Huang et al., 2019</xref>; <xref ref-type="bibr" rid="bib30">Huang et al., 2021</xref>; <xref ref-type="bibr" rid="bib62">Pang and Snyder, 2020</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>).</p><p>We examined whether the number and affinity of CRX motifs differentiate enhancers, silencers, and inactive sequences by computing the predicted CRX occupancy (i.e. 
expected number of bound molecules) for each sequence (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). Consistent with our previous work (<xref ref-type="bibr" rid="bib86">White et al., 2016</xref>), both strong enhancers and silencers have higher predicted CRX occupancy than inactive sequences (Mann-Whitney U test, p = 6 × 10<sup>–10</sup> and 6 × 10<sup>–17</sup>, respectively, <xref ref-type="fig" rid="fig1">Figure 1c</xref>), suggesting that total CRX motif content helps distinguish silencers and strong enhancers from inactive sequences. However, predicted CRX occupancy does not distinguish strong enhancers from silencers: a logistic regression classifier trained with fivefold cross-validation only achieves an area under the receiver operating characteristic (AUROC) curve of 0.548 ± 0.023 and an area under the precision recall (AUPR) curve of 0.571 ± 0.020 (<xref ref-type="fig" rid="fig2">Figure 2a</xref> and <xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>). We thus sought to identify sequence features that distinguish strong enhancers from silencers.</p><fig-group><fig id="fig2" position="float"><label>Figure 2.</label><caption><title>Strong enhancers contain a diverse array of motifs.</title><p>(<bold>a</bold>) Receiver operating characteristic for classifying strong enhancers from silencers. Solid black, 6-mer support vector machine (SVM); orange, eight transcription factors (TFs) predicted occupancy logistic regression; aqua, predicted cone-rod homeobox (CRX) occupancy logistic regression; dashed black, chance; shaded area, 1 standard deviation based on fivefold cross-validation. (<bold>b and c</bold>) Total predicted TF occupancy (<bold>b</bold>) and frequency of TF motifs (<bold>c</bold>) in each activity class. (<bold>d</bold>) Frequency of co-occurring TF motifs in strong enhancers. Lower triangle is expected co-occurrence if motifs are independent. 
(<bold>e</bold>) Frequency of activity classes, colored as in (<bold>b</bold>), for sequences in CRX, NRL, and/or MEF2D ChIP-seq peaks. (<bold>f</bold>) Frequency of TF ChIP-seq peaks in activity classes. TFs in (<bold>c</bold>) are sorted by feature importance of the logistic regression model in (<bold>a</bold>).</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig2.jpg"/></fig><fig id="fig2s1" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 1.</label><caption><title>Precision recall curve for strong enhancer vs. silencer classifiers.</title><p>Solid black, 6-mer support vector machine (SVM); orange, eight transcription factors (TFs) predicted occupancy logistic regression; aqua, predicted cone-rod homeobox (CRX) occupancy logistic regression; dashed black, chance; shaded area, 1 standard deviation based on fivefold cross-validation.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig2-figsupp1.jpg"/></fig><fig id="fig2s2" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 2.</label><caption><title>Results from de novo motif analysis.</title><p>Motifs enriched in strong enhancers (<bold>a</bold>) and silencers (<bold>b</bold>). Bottom, de novo motif identified with DREME; top, matched known motif identified with TOMTOM.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig2-figsupp2.jpg"/></fig><fig id="fig2s3" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 3.</label><caption><title>Additional validation of the eight transcription factors (TFs) predicted occupancy logistic regression model.</title><p>(<bold>a and b</bold>) Predictions of the 6-mer support vector machine (SVM) (black) and eight TFs predicted occupancy logistic regression model (orange) on an independent test set. 
(<bold>c and d</bold>) Null distribution of 100 logistic regression models trained using randomly selected motifs (gray) compared to the true features (orange). Shaded area, 1 standard deviation based on fivefold cross-validation. (<bold>a and c</bold>) Receiver operating characteristic, (<bold>b and d</bold>) precision recall curve. Dashed black line represents chance in all panels.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig2-figsupp3.jpg"/></fig></fig-group></sec><sec id="s2-2"><title>Lineage-defining TF motifs differentiate strong enhancers from silencers</title><p>We performed a de novo motif enrichment analysis to identify motifs that distinguish strong enhancers from silencers and found several differentially enriched motifs matching known TFs. For motifs that matched multiple TFs, we selected one representative TF for downstream analysis, since TFs from the same family have PWMs that are too similar to meaningfully distinguish between motifs for these TFs (<xref ref-type="fig" rid="fig2s2">Figure 2—figure supplement 2</xref>, Materials and methods). Strong enhancers are enriched for several motif families that include TFs that interact with CRX or are important for photoreceptor development: NeuroD1/NDF1 (E-box-binding bHLH) (<xref ref-type="bibr" rid="bib59">Morrow et al., 1999</xref>), RORB (nuclear receptor) (<xref ref-type="bibr" rid="bib36">Jia et al., 2009</xref>; <xref ref-type="bibr" rid="bib79">Srinivas et al., 2006</xref>), MAZ or Sp4 (C2H2 zinc finger) (<xref ref-type="bibr" rid="bib51">Lerner et al., 2005</xref>), and NRL (bZIP) (<xref ref-type="bibr" rid="bib55">Mears et al., 2001</xref>; <xref ref-type="bibr" rid="bib56">Mitton et al., 2000</xref>). 
Sp4 physically interacts with CRX in the retina (<xref ref-type="bibr" rid="bib51">Lerner et al., 2005</xref>), but we chose to represent the zinc finger motif with MAZ because it has a higher quality score in the HOCOMOCO database (<xref ref-type="bibr" rid="bib46">Kulakovskiy et al., 2018</xref>). Silencers were enriched for a motif that resembles a partial K50 homeodomain motif but instead matches the zinc finger TF GFI1, a member of the Snail repressor family (<xref ref-type="bibr" rid="bib8">Chiang and Ayyanathan, 2013</xref>) expressed in developing retinal ganglion cells (<xref ref-type="bibr" rid="bib88">Yang et al., 2003</xref>). Therefore, while strong enhancers and silencers are not distinguished by their CRX motif content, strong enhancers are uniquely enriched for several lineage-defining TFs.</p><p>To quantify how well these TF motifs differentiate strong enhancers from silencers, we trained two different classification models with fivefold cross-validation. First, we trained a 6-mer support vector machine (SVM) (<xref ref-type="bibr" rid="bib19">Ghandi et al., 2014</xref>) and achieved an AUROC of 0.781 ± 0.013 and AUPR of 0.812 ± 0.020 (<xref ref-type="fig" rid="fig2">Figure 2a</xref> and <xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>). The SVM considers all 2080 non-redundant 6-mers and provides an upper bound to the predictive power of models that do not consider the exact arrangement or spacing of sequence features. We next trained a logistic regression model on the predicted occupancy for eight lineage-defining TFs (<xref ref-type="supplementary-material" rid="supp4">Supplementary file 4</xref>) and compared it to the upper bound established by the SVM. 
In this model, we considered CRX, the five TFs identified in our motif enrichment analysis, and two additional TFs enriched in photoreceptor ATAC-seq peaks (<xref ref-type="bibr" rid="bib31">Hughes et al., 2017</xref>): RAX, a Q50 homeodomain TF that contrasts with CRX, a K50 homeodomain TF (<xref ref-type="bibr" rid="bib34">Irie et al., 2015</xref>), and MEF2D, a MADS box TF that co-binds with CRX (<xref ref-type="bibr" rid="bib2">Andzelm et al., 2015</xref>). The logistic regression model performs nearly as well as the SVM (AUROC 0.698 ± 0.036, AUPR 0.745 ± 0.032, <xref ref-type="fig" rid="fig2">Figure 2a</xref> and <xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>) despite a 260-fold reduction from 2080 to 8 features. To determine whether the logistic regression model depends specifically on the eight lineage-defining TFs, we established a null distribution by fitting 100 logistic regression models with randomly selected TFs (Materials and methods). Our logistic regression model outperforms the null distribution (one-tailed Z-test for AUROC and AUPR, p &lt; 0.0008, <xref ref-type="fig" rid="fig2s3">Figure 2—figure supplement 3</xref>), indicating that the performance of the model specifically requires the eight lineage-defining TFs. To determine whether the SVM identified any additional motifs that could be added to the logistic regression model, we generated de novo motifs using the SVM <italic>k</italic>-mer scores and found no additional motifs predictive of strong enhancers. Finally, we found that our two models perform similarly on an independent test set of CRX-targeted sequences (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>; <xref ref-type="fig" rid="fig2s3">Figure 2—figure supplement 3</xref>). 
Since the logistic regression model performs near the upper bound established by the SVM and depends specifically on the eight selected motifs, we conclude that these motifs comprise nearly all of the sequence features captured by the SVM that distinguish strong enhancers from silencers among CRX-targeted sequences.</p></sec><sec id="s2-3"><title>Strong enhancers are characterized by diverse total motif content</title><p>To understand how these eight TF motifs differentiate strong enhancers from silencers, we first calculated the total predicted occupancy of each sequence by all eight lineage-defining TFs and compared the different activity classes. Strong enhancers and silencers both have higher total predicted occupancies than inactive sequences, but total predicted occupancies do not distinguish strong enhancers and silencers from each other (<xref ref-type="fig" rid="fig2">Figure 2b</xref>, <xref ref-type="supplementary-material" rid="supp5">Supplementary file 5</xref>). Since strong enhancers are enriched for several motifs relative to silencers, this suggests that strong enhancers are distinguished from silencers by the diversity of their motifs, rather than the total number.</p><p>We considered two hypotheses for how the more diverse collection of motifs functions in strong enhancers: either strong enhancers depend on specific combinations of TF motifs (‘TF identity hypothesis’) or they instead must be co-occupied by multiple lineage-defining TFs, regardless of TF identity (‘TF diversity hypothesis’). To distinguish between these hypotheses, we examined which specific motifs contribute to the total motif content of strong enhancers and silencers. We considered motifs for a TF present in a sequence if the TF predicted occupancy was above 0.5 molecules (<xref ref-type="supplementary-material" rid="supp4">Supplementary file 4</xref>), which generally corresponds to at least one motif with a relative <italic>K</italic><sub><italic>D</italic></sub> above 3%. 
This threshold captures the effect of low affinity motifs that are often biologically relevant (<xref ref-type="bibr" rid="bib10">Crocker et al., 2015</xref>; <xref ref-type="bibr" rid="bib15">Farley et al., 2015</xref>; <xref ref-type="bibr" rid="bib16">Farley et al., 2016</xref>; <xref ref-type="bibr" rid="bib63">Parker et al., 2011</xref>). As expected, 97% of strong enhancers and silencers contain CRX motifs since the sequences were selected based on CRX binding or significant matches to the CRX PWM within open chromatin (<xref ref-type="fig" rid="fig2">Figure 2c</xref>). Compared to silencers, strong enhancers contain a broader diversity of motifs for the eight lineage-defining TFs (<xref ref-type="fig" rid="fig2">Figure 2c</xref>). However, while strong enhancers contain a broader range of motifs, no single motif occurs in a majority of strong enhancers: NRL motifs are present in 23% of strong enhancers, NeuroD1 and RORB in 18% each, and MAZ in 16%. Additionally, none of the motifs tend to co-occur as pairs in strong enhancers: no specific pair occurred in more than 5% of sequences (<xref ref-type="fig" rid="fig2">Figure 2d</xref>). We also did not observe a bias in the linear arrangement of motifs in strong enhancers (Materials and methods). Similarly, no single motif occurs in more than 15% of silencers (<xref ref-type="fig" rid="fig2">Figure 2c</xref>). These results suggest that strong enhancers are defined by the diversity of their motifs, and not by specific motif combinations or their linear arrangement.</p><p>The results above predict that strong enhancers are more likely to be bound by a diverse but degenerate collection of TFs, compared with silencers or inactive sequences. We tested this prediction by examining in vivo TF binding using published ChIP-seq data for NRL (<xref ref-type="bibr" rid="bib23">Hao et al., 2012</xref>) and MEF2D (<xref ref-type="bibr" rid="bib2">Andzelm et al., 2015</xref>). 
Consistent with the prediction, sequences bound by CRX and either NRL or MEF2D are approximately twice as likely to be strong enhancers compared to sequences only bound by CRX (<xref ref-type="fig" rid="fig2">Figure 2e</xref>). Sequences bound by all three TFs are the most likely to be strong or weak enhancers rather than silencers or inactive sequences. However, most strong enhancers are not bound by either NRL or MEF2D (<xref ref-type="fig" rid="fig2">Figure 2f</xref>), indicating that binding of these TFs is not required for strong enhancers. Our results support the TF diversity hypothesis: CRX-targeted enhancers are co-occupied by multiple TFs, without a requirement for specific combinations of lineage-defining TFs.</p></sec><sec id="s2-4"><title>Strong enhancers have higher motif information content than silencers</title><p>Our results indicate that both strong enhancers and silencers have a higher total motif content than inactive sequences, while strong enhancers contain a more diverse collection of motifs than silencers. To quantify these differences in the number and diversity of motifs, we computed the information content of CRX-targeted sequences using Boltzmann entropy. The Boltzmann entropy of a system is related to the number of ways the system’s molecules can be arranged, which increases with either the number or diversity of molecules (<xref ref-type="bibr" rid="bib67">Phillips et al., 2012</xref>, Chapter 5). In our case, each TF is a different type of molecule and the number of each TF is represented by its predicted occupancy for a <italic>cis</italic>-regulatory sequence. 
The number of molecular arrangements is thus <italic>W</italic>, the number of distinguishable permutations in which the TFs can be ordered on the sequence, and the information content of a sequence is then log<sub>2</sub><italic>W</italic> (Materials and methods).</p><p>We found that on average, strong enhancers have higher information content than both silencers and inactive sequences (Mann-Whitney U test, p = 1 × 10<sup>–23</sup> and 7 × 10<sup>–34</sup>, respectively, <xref ref-type="fig" rid="fig3">Figure 3a</xref>, <xref ref-type="supplementary-material" rid="supp5">Supplementary file 5</xref>), confirming that information content captures the effect of both the number and diversity of motifs. Quantitatively, the average silencer and inactive sequence contain 1.6 and 1.4 bits, respectively, which represents approximately three total motifs for two TFs. Strong enhancers contain on average 2.4 bits, representing approximately three total motifs for three TFs or four total motifs for two TFs. To compare the predictive value of our information content metric to the model based on all eight motifs, we trained a logistic regression model and found that information content classifies strong enhancers from silencers with an AUROC of 0.634 ± 0.008 and an AUPR of 0.663 ± 0.014 (<xref ref-type="fig" rid="fig3">Figure 3b</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref>). This is only slightly worse than the model trained on eight TF occupancies despite an eightfold reduction in the number of features, which is itself comparable to the SVM with 2080 features. The difference between the two logistic regression models suggests that the specific identities of TF motifs make some contribution to the eight TF model, but that most of the signal captured by the SVM can be described with a single metric that does not assign weights to specific motifs. 
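</p><p>The correspondence between bits and motif counts quoted above can be checked with a short calculation. The following is a minimal sketch and not the authors' implementation: it assumes that <italic>W</italic> is the multinomial coefficient N!/(n_1! n_2! ...), where n_i is the predicted occupancy of TF i and N is the sum over all TFs, and it assumes that the log-gamma function is an acceptable way to extend the factorial to non-integer occupancies. The function name information_content_bits and the example occupancies are hypothetical.

```python
from math import lgamma, log


def information_content_bits(occupancies):
    """Return log2(W) for one sequence, with W = N! / (n_1! * n_2! * ...).

    occupancies: predicted occupancy of each TF on the sequence (assumed input).
    """
    total = sum(occupancies)
    # lgamma(x + 1) equals ln(x!) for integers and extends it to fractions.
    ln_w = lgamma(total + 1) - sum(lgamma(n + 1) for n in occupancies)
    return ln_w / log(2)  # convert natural log to bits


# Hypothetical integer examples mirroring the motif counts described in the text.
examples = {
    "two motifs, two TFs": [1, 1],
    "three motifs, two TFs": [2, 1],
    "three motifs, three TFs": [1, 1, 1],
    "four motifs, two TFs": [2, 2],
}

for label, occupancies in examples.items():
    print(f"{label}: {information_content_bits(occupancies):.2f} bits")
```

Under these assumptions, three motifs split between two TFs give log<sub>2</sub>3, about 1.6 bits, while three motifs for three TFs and four motifs split evenly between two TFs each give log<sub>2</sub>6, about 2.6 bits, in line with the approximate values reported for silencers and strong enhancers.</p><p>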
Information content also distinguishes strong enhancers from inactive sequences (AUROC 0.658 ± 0.012, AUPR 0.675 ± 0.019, <xref ref-type="fig" rid="fig3">Figure 3b</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref>). These results indicate that strong enhancers are characterized by higher information content, which reflects both the total number and diversity of motifs.</p><fig-group><fig id="fig3" position="float"><label>Figure 3.</label><caption><title>Information content classifies strong enhancers.</title><p>(<bold>a</bold>) Information content for different activity classes. (<bold>b</bold>) Receiver operating characteristic of information content to classify strong enhancers from silencers (orange) or inactive sequences (indigo).</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig3.jpg"/></fig><fig id="fig3s1" position="float" specific-use="child-fig"><label>Figure 3—figure supplement 1.</label><caption><title>Precision recall curve of logistic regression classifier using information content.</title><p>Orange, strong enhancer vs. silencer; indigo, strong enhancer vs. inactive; shaded area, 1 standard deviation based on fivefold cross-validation.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig3-figsupp1.jpg"/></fig></fig-group></sec><sec id="s2-5"><title>Strong enhancers require high information content but not NRL motifs</title><p>Our results show that except for CRX, none of the lineage-defining motifs occur in a majority of strong enhancers. However, all sequences were tested in reporter constructs with the <italic>Rho</italic> promoter, which contains an NRL motif and three CRX motifs (<xref ref-type="bibr" rid="bib9">Corbo et al., 2010</xref>; <xref ref-type="bibr" rid="bib47">Kwasnieski et al., 2012</xref>). 
Since NRL is a key co-regulator with CRX in rod photoreceptors, we tested whether strong enhancers generally require NRL, which would be inconsistent with our TF diversity hypothesis. We removed the NRL motif by recloning our MPRA library without the basal <italic>Rho</italic> promoter. If strong enhancers require an NRL motif for high activity, then only CRX-targeted sequences with NRL motifs will drive reporter expression. If information content (i.e. total motif content and diversity) is the primary determinant of strong enhancers, only CRX-targeted sequences with sufficient motif diversity, measured by information content, will drive reporter expression regardless of whether or not NRL motifs are present.</p><p>We replaced the <italic>Rho</italic> promoter with a minimal 23 bp polylinker sequence between our libraries and <italic>DsRed</italic>, and repeated the MPRA (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1</xref>, <xref ref-type="supplementary-material" rid="supp3">Supplementary file 3</xref>). CRX-targeted sequences were designated as ‘autonomous’ if they retained activity in the absence of the <italic>Rho</italic> promoter (log<sub>2</sub>(RNA/DNA) > 0, Materials and methods). We found that 90% of autonomous sequences are from the enhancer class, while less than 3% of autonomous sequences are from the silencer class (<xref ref-type="fig" rid="fig4">Figure 4a</xref>). This confirms that the distinction between silencers and enhancers does not depend on the <italic>Rho</italic> promoter, which is consistent with our previous finding that CRX-targeted silencers repress other promoters (<xref ref-type="bibr" rid="bib32">Hughes et al., 2018</xref>; <xref ref-type="bibr" rid="bib86">White et al., 2016</xref>). However, while most autonomous sequences are enhancers, only 39% of strong enhancers and 9% of weak enhancers act autonomously. 
Consistent with a role for information content, autonomous strong enhancers have higher information content (Mann-Whitney U test p = 4 × 10<sup>–8</sup>, <xref ref-type="fig" rid="fig4">Figure 4b</xref>) and higher predicted CRX occupancy (Mann-Whitney U test p = 9 × 10<sup>–12</sup>, <xref ref-type="fig" rid="fig4">Figure 4c</xref>) than non-autonomous strong enhancers. We found no evidence that specific lineage-defining motifs are required for autonomous activity, including NRL, which is present in only 25% of autonomous strong enhancers (<xref ref-type="fig" rid="fig4">Figure 4d</xref>). Similarly, NRL ChIP-seq binding (<xref ref-type="bibr" rid="bib23">Hao et al., 2012</xref>) occurs more often among autonomous strong enhancers (41% vs. 19%, Fisher’s exact test p = 2 × 10<sup>–14</sup>, odds ratio = 3.0), yet NRL binding still only accounts for a minority of these sequences. We thus conclude that strong enhancers require high information content, rather than any specific lineage-defining motifs.</p><fig id="fig4" position="float"><label>Figure 4.</label><caption><title>Sequence features of autonomous and non-autonomous strong enhancers.</title><p>(<bold>a</bold>) Activity of library in the presence (x-axis) or absence (y-axis) of the <italic>Rho</italic> promoter. Dark blue, strong enhancers; light blue, weak enhancers; green, inactive; red, silencers; gray, ambiguous; horizontal line, cutoff for autonomous activity. Points on the far left and/or very bottom are sequences that were present in the plasmid pool but not detected in the RNA. 
(<bold>b–d</bold>) Comparison of autonomous and non-autonomous strong enhancers for information content (<bold>b</bold>), predicted cone-rod homeobox (CRX) occupancy (<bold>c</bold>), and frequency of transcription factor (TF) motifs (<bold>d</bold>).</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig4.jpg"/></fig></sec><sec id="s2-6"><title>TF motifs contribute independently to strong enhancers</title><p>Our results indicate that information content distinguishes strong enhancers from silencers and inactive sequences. Information content only takes into account the total number and diversity of motifs in a sequence and not any potential interactions between them. The classification success of information content thus suggests that each TF motif will contribute independently to enhancer activity. We tested this prediction with CRX-targeted sequences where all CRX motifs were abolished by point mutation (<xref ref-type="supplementary-material" rid="supp3">Supplementary file 3</xref>). Consistent with our previous work (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>), mutating CRX motifs causes the activities of both enhancers and silencers to regress toward basal levels (Pearson’s <italic>r</italic> = 0.608, <xref ref-type="fig" rid="fig5">Figure 5a</xref>), indicating that most enhancers and silencers show some dependence on CRX. However, 40% of wild-type strong enhancers show low CRX dependence and remain strong enhancers with their CRX motifs abolished. Although strong enhancers with high and low CRX dependence have similar wild-type information content (<xref ref-type="fig" rid="fig5">Figure 5b</xref>), strong enhancers with low CRX dependence have lower predicted CRX occupancy than those with high CRX dependence (Mann-Whitney U test p = 2 × 10<sup>–9</sup>, <xref ref-type="fig" rid="fig5">Figure 5c</xref>), and also have higher ‘residual’ information content (i.e. 
information content without CRX motifs, Mann-Whitney U test p = 1 × 10<sup>–7</sup>, <xref ref-type="fig" rid="fig5">Figure 5d</xref>). Low CRX dependence sequences have an average of 1.5 residual bits, which corresponds to three motifs for two TFs, while high CRX dependence sequences have an average of 1.0 residual bits, which corresponds to two motifs for two TFs (<xref ref-type="fig" rid="fig5">Figure 5e</xref>).</p><fig id="fig5" position="float"><label>Figure 5.</label><caption><title>Independence of transcription factor (TF) motifs in strong enhancers.</title><p>(<bold>a</bold>) Activity of sequences with and without cone-rod homeobox (CRX) motifs. Points are colored by the activity group with CRX motifs intact: dark blue, strong enhancers; light blue, weak enhancers; green, inactive; red, silencers; gray, ambiguous; horizontal dotted lines and color bar represent the cutoffs for the same groups when CRX motifs are mutated. Solid black line is the y = x line. (<bold>b–d</bold>) Comparison of strong enhancers with high and low CRX dependence for information content (<bold>b</bold>), predicted CRX occupancy (<bold>c</bold>), and residual information content (<bold>d</bold>). (<bold>e</bold>) Representative strong enhancers with high (top) or low (bottom) CRX dependence.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="temp.xml.media/fig5.jpg"/></fig><p>Strong enhancers with low and high CRX dependence have similar wild-type information content and similar total predicted occupancy (<xref ref-type="fig" rid="fig5">Figure 5b and e</xref>). As a result, sequences with more CRX motifs have fewer motifs for other TFs, suggesting that there is no evolutionary pressure for enhancers to contain additional motifs beyond the minimum amount of information content required to be active. 
To test this idea, we calculated the minimum number and diversity of motifs necessary to specify a relatively unique location in the genome (<xref ref-type="bibr" rid="bib87">Wunderlich and Mirny, 2009</xref>) and found that a 164 bp sequence only requires five motifs for three TFs (Materials and methods). These motif requirements can be achieved in two ways with similar information content that differ only in the quantitative number of motifs for each TF. In other words, the number of motifs for any particular TF is not important so long as there is sufficient information content. Taken together, we conclude that each TF motif provides an independent contribution toward specifying strong enhancers.</p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>Many regions in the genome are bound by TFs and bear the epigenetic hallmarks of active <italic>cis</italic>-regulatory sequences, yet fail to exhibit <italic>cis</italic>-regulatory activity when tested directly. The discrepancy between measured epigenomic state and <italic>cis</italic>-regulatory activity indicates that enhancers and silencers consist of more than the minimal sequence features necessary to recruit TFs and chromatin-modifying factors. Our results show that enhancers, silencers, and inactive sequences in developing photoreceptors can be distinguished by their motif content, even though they are indistinguishable by CRX binding or chromatin accessibility. We show that both enhancers and silencers contain more TF motifs than inactive sequences, and that enhancers also contain more diverse sets of motifs for lineage-defining TFs. These differences are captured by our measure of information content. 
Information content, as a single metric, identifies strong enhancers nearly as well as an unbiased set of 2080 non-redundant 6-mers used for an SVM, indicating that a simple measure of motif number and diversity can capture the key sequence features that distinguish enhancers from other sequences that lie in open chromatin.</p><p>The results of our information content classifier are consistent with the TF collective model of enhancers (<xref ref-type="bibr" rid="bib39">Junion et al., 2012</xref>; <xref ref-type="bibr" rid="bib78">Spitz and Furlong, 2012</xref>): globally, active enhancers are specified by the combinatorial action of lineage-defining TFs with little constraint on which motifs must co-occur. We show that CRX-targeted enhancers are distinguished from inactive CRX-targeted sequences by a larger, more diverse collection of TF motifs, and not any specific combination of motifs. This indicates that enhancers are active because they have acquired the necessary number of TF binding motifs, and not because they are defined by a strict regulatory grammar. Sequences with fewer motifs may be bound by CRX and reside within open chromatin, but they lack sufficient TF binding for activity. Such loose constraints would facilitate the de novo emergence of tissue-specific enhancers and silencers over evolution and explain why critical cell type-specific TF interactions, such as CRX and NRL in rod photoreceptors, occur at only a minority of the active enhancers in that cell type (<xref ref-type="bibr" rid="bib28">Hsiau et al., 2007</xref>; <xref ref-type="bibr" rid="bib32">Hughes et al., 2018</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>).</p><p>Like enhancers, CRX-targeted silencers require higher motif content and are dependent on CRX motifs, but they lack the TF diversity of enhancers. 
The lack of TF diversity in silencers parallels the architecture of signal-responsive <italic>cis</italic>-regulatory sequences, which are silencers in the absence of a signal and require multiple activators for induction (<xref ref-type="bibr" rid="bib4">Barolo and Posakony, 2002</xref>). Consistent with this, we previously showed using synthetic sequences that high occupancy of CRX alone is sufficient to encode silencers while the addition of a single NRL motif converts synthetic silencers to enhancers, and that genomic sequences with very high CRX motif content repress a basal promoter that lacks NRL motifs (<xref ref-type="bibr" rid="bib86">White et al., 2016</xref>). We found that photoreceptor genes which are de-repressed upon loss of CRX are located near <italic>cis</italic>-regulatory sequences with high CRX motif content, and that genes near regions that are bound only by CRX are expressed at lower levels than genes near regions co-bound by CRX and NRL (<xref ref-type="bibr" rid="bib86">White et al., 2016</xref>). In the current study, we find that silencers in our MPRA library are more likely to occur near de-repressed photoreceptor genes, while strong enhancers are enriched near genes that lose expression in <italic>Crx<sup>-/-</sup></italic> retina. These findings suggest that the low TF diversity and high CRX motif content that characterize silencers in our MPRA library are also important for silencing in the genome.</p><p>The contrast in motif diversity between enhancers and silencers that we observe could explain how CRX achieves selective activation and repression of its target genes in multiple cell types and across developmental time points (<xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>; <xref ref-type="bibr" rid="bib72">Ruzycki et al., 2018</xref>). 
CRX itself is required for silencing, and we previously showed that some silencers become active enhancers in <italic>Crx<sup>-/-</sup></italic> retina (<xref ref-type="bibr" rid="bib86">White et al., 2016</xref>). The mechanism of CRX-based silencing is unknown; however, CRX cooperates with other TFs that can sometimes act as repressors of cell type-specific genes (<xref ref-type="bibr" rid="bib7">Chen et al., 2005</xref>; <xref ref-type="bibr" rid="bib65">Peng et al., 2005</xref>; <xref ref-type="bibr" rid="bib84">Webber et al., 2008</xref>), while other repressors can directly inhibit activation by CRX or its co-activators (<xref ref-type="bibr" rid="bib12">Dorval et al., 2006</xref>; <xref ref-type="bibr" rid="bib26">Hlawatsch et al., 2013</xref>; <xref ref-type="bibr" rid="bib57">Mitton et al., 2003</xref>; <xref ref-type="bibr" rid="bib75">Sanuki et al., 2010</xref>). In <italic>Drosophila</italic> photoreceptors, selective silencing of opsin genes is controlled by cell type-specific expression of a repressor, Dve, which acts on the same K50 homeodomain-binding sites as a universally expressed activator, Otd, a homolog of CRX (<xref ref-type="bibr" rid="bib70">Rister et al., 2015</xref>). Other transcriptional activators selectively act as repressors in the same cell type. GATA-1 represses the <italic>GATA-2</italic> promoter by displacing CREB-binding protein (CBP), while at other genes GATA-1 binds CBP to activate transcription (<xref ref-type="bibr" rid="bib21">Grass et al., 2003</xref>). 
Selective repression by GATA-1 is also mediated by chromatin occupancy levels and interaction with co-regulators (<xref ref-type="bibr" rid="bib38">Johnson et al., 2006</xref>), which is consistent with our finding that sequence context enables a TF to both activate and repress genes in the same cell type.</p><p>Given the central role of CRX in selectively regulating genes in multiple closely related cell types (<xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>), we speculate that CRX-targeted silencers may contain sufficient information to act as enhancers in other cell types in which a different set of co-activating TFs are expressed. This hypothesis would be consistent with the finding that many silencers are enhancers in other cell types (<xref ref-type="bibr" rid="bib11">Doni Jayavelu et al., 2020</xref>; <xref ref-type="bibr" rid="bib20">Gisselbrecht et al., 2020</xref>; <xref ref-type="bibr" rid="bib61">Ngan et al., 2020</xref>). Our work suggests that characterizing sequences by their motif information content offers a way to identify these different classes of <italic>cis</italic>-regulatory sequences in the genome.</p></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><table-wrap id="keyresource" position="anchor"><label>Key resources table</label><table frame="hsides" rules="groups"><thead><tr><th align="left" valign="top">Reagent type (species) or resource</th><th align="left" valign="top">Designation</th><th align="left" valign="top">Source or reference</th><th align="left" valign="top">Identifiers</th><th align="left" valign="top">Additional information</th></tr></thead><tbody><tr><td align="left" valign="top">Strain, strain background (<italic>Mus musculus</italic>, male and female)</td><td align="left" valign="top">CD-1</td><td align="left" valign="top">Charles River</td><td align="left" valign="top">Strain code 022</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Recombinant DNA 
reagent</td><td align="left" valign="top">Library1</td><td align="left" valign="top">This paper</td><td align="left" valign="top"/><td align="left" valign="top">Listed in <xref ref-type="supplementary-material" rid="supp1">Supplementary file 1</xref></td></tr><tr><td align="left" valign="top">Recombinant DNA reagent</td><td align="left" valign="top">Library2</td><td align="left" valign="top">This paper</td><td align="left" valign="top"/><td align="left" valign="top">Listed in <xref ref-type="supplementary-material" rid="supp2">Supplementary file 2</xref></td></tr><tr><td align="left" valign="top">Recombinant DNA reagent</td><td align="left" valign="top">pJK01_Rhominprox-DsRed</td><td align="left" valign="top"><xref ref-type="bibr" rid="bib47">Kwasnieski et al., 2012</xref></td><td align="left" valign="top">Addgene plasmid #173489</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Recombinant DNA reagent</td><td align="left" valign="top">pJK03_<italic>Rho_basal_</italic>DsRed</td><td align="left" valign="top"><xref ref-type="bibr" rid="bib47">Kwasnieski et al., 2012</xref></td><td align="left" valign="top">Addgene plasmid #173490</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Sequence-based reagent</td><td align="left" valign="top">Primers</td><td align="left" valign="top">IDT</td><td align="left" valign="top"/><td align="left" valign="top">Listed in <xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref></td></tr><tr><td align="left" valign="top">Commercial assay or kit</td><td align="left" valign="top">Monarch PCR Cleanup Kit</td><td align="left" valign="top">New England Biolabs</td><td align="left" valign="top">T1030S</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Commercial assay or kit</td><td align="left" valign="top">Monarch DNA Gel Extraction Kit</td><td align="left" valign="top">New England Biolabs</td><td align="left" valign="top">T1020L</td><td 
align="left" valign="top"/></tr><tr><td align="left" valign="top">Commercial assay or kit</td><td align="left" valign="top">TURBO DNA-free</td><td align="left" valign="top">Invitrogen</td><td align="left" valign="top">AM1907</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Commercial assay or kit</td><td align="left" valign="top">SuperScript III Reverse Transcriptase</td><td align="left" valign="top">Invitrogen</td><td align="char" char="." valign="top">18080044</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">Bedtools</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://bedtools.readthedocs.io/en/latest/">https://bedtools.readthedocs.io/en/latest/</ext-link></td><td align="left" valign="top">RRID:<ext-link ext-link-type="uri" xlink:href="https://identifiers.org/RRID/RRID:SCR_006646">SCR_006646</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">MEME Suite</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://meme-suite.org/">https://meme-suite.org/</ext-link></td><td align="left" valign="top">RRID:<ext-link ext-link-type="uri" xlink:href="https://identifiers.org/RRID/RRID:SCR_001783">SCR_001783</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">ShapeMF</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://github.com/h-samee/shape-motif">https://github.com/h-samee/shape-motif</ext-link>, <xref ref-type="bibr" rid="bib74">Samee, 2021</xref></td><td align="left" valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.cels.2018.12.001">10.1016/j.cels.2018.12.001</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td 
align="left" valign="top">Numpy</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://numpy.org/">https://numpy.org/</ext-link></td><td align="left" valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41586-020-2649-2">10.1038/s41586-020-2649-2</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">Scipy</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://www.scipy.org/">https://www.scipy.org/</ext-link></td><td align="left" valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41592-019-0686-2">10.1038/s41592-019-0686-2</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">Pandas</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://pandas.pydata.org/">https://pandas.pydata.org/</ext-link></td><td align="left" valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.3509134">10.5281/zenodo.3509134</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">Matplotlib</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://matplotlib.org/">https://matplotlib.org/</ext-link></td><td align="left" valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1482099">10.5281/zenodo.1482099</ext-link></td><td align="left" valign="top"/></tr><tr><td align="left" valign="top">Software, algorithm</td><td align="left" valign="top">Logomaker</td><td align="left" valign="top"><ext-link ext-link-type="uri" xlink:href="https://github.com/jbkinney/logomaker">https://github.com/jbkinney/logomaker</ext-link>, <xref ref-type="bibr" rid="bib40">Justin, 2021</xref></td><td align="left" 
valign="top">DOI:<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btz921">10.1093/bioinformatics/btz921</ext-link></td><td align="left" valign="top"/></tr></tbody></table></table-wrap><sec id="s4-1"><title>Library design</title><p>CRX ChIP-seq peaks re-processed by <xref ref-type="bibr" rid="bib72">Ruzycki et al., 2018</xref> were intersected with previously published CRX MPRA libraries (<xref ref-type="bibr" rid="bib32">Hughes et al., 2018</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>) and one unpublished library to select sequences that had not been previously tested by MPRA. These sequences were scanned for instances of CRX motifs using FIMO version 4.11.2 (<xref ref-type="bibr" rid="bib3">Bailey et al., 2009</xref>), a p-value cutoff of 2.3 × 10<sup>–3</sup> (see below), and a CRX PWM derived from an electrophoretic mobility shift assay (<xref ref-type="bibr" rid="bib49">Lee et al., 2010</xref>). We centered 2622 sequences on the highest scoring CRX motif. For 677 sequences without a CRX motif, we instead centered them using the Gibbs sampler from ShapeMF (Github commit abe8421) (<xref ref-type="bibr" rid="bib73">Samee et al., 2019</xref>) and a motif size of 10.</p><p>For sequences unbound in CRX ChIP-seq but in open chromatin, we took ATAC-seq peaks collected in 8-week FACS-purified rods, green cones, and <italic>Nrl<sup>-/-</sup></italic> blue cones (<xref ref-type="bibr" rid="bib31">Hughes et al., 2017</xref>) and removed sequences that overlapped with CRX ChIP-seq peaks. The remaining sequences were scanned for instances of CRX motifs using FIMO with a p-value cutoff of 2.5 × 10<sup>–3</sup> and the CRX PWM. Sequences with a CRX motif were kept and the three ATAC-seq data sets were merged together, intersected with H3K27ac and H3K4me3 ChIP-seq peaks collected in P14 retinas (<xref ref-type="bibr" rid="bib72">Ruzycki et al., 2018</xref>), and centered on the highest scoring CRX motifs. 
We randomly selected 1004 H3K27ac<sup>+</sup>H3K4me3<sup>-</sup> sequences and 541 H3K27ac<sup>+</sup>H3K4me3<sup>+</sup> sequences to reflect the fact that ~35% of CRX ChIP-seq peaks are H3K4me3<sup>+</sup>. After synthesis of our library, we discovered that 11% of these sequences do not actually overlap H3K27ac ChIP-seq peaks (110/1004 of the H3K4me3<sup>-</sup> group and 60/541 of the H3K4me3<sup>+</sup> group), but we still included them in the analysis because they contain CRX motifs in ATAC-seq peaks.</p><p>All data were converted to mm10 coordinates using the UCSC liftOver tool (<xref ref-type="bibr" rid="bib22">Haeussler et al., 2019</xref>) and processed using Bedtools version 2.27.1 (<xref ref-type="bibr" rid="bib68">Quinlan and Hall, 2010</xref>). All sequences in our library design were adjusted to 164 bp and screened for instances of EcoRI, SpeI, SphI, and NotI sites. In total, our library contains 4844 genomic sequences (2622 CRX ChIP-seq peaks with motifs, 677 CRX ChIP-seq peaks without motifs, 1004 CRX<sup>-</sup>ATAC<sup>+</sup>H3K27ac<sup>+</sup>H3K4me3<sup>-</sup> CRX motifs, and 541 CRX<sup>-</sup>ATAC<sup>+</sup>H3K27ac<sup>+</sup>H3K4me3<sup>+</sup> CRX motifs), a variant of each sequence with all CRX motifs mutated, 150 scrambled sequences, and a construct for cloning the basal promoter alone.</p><p>For sequences centered on CRX motifs, all CRX motifs with a p-value of 2.5 × 10<sup>–3</sup> or less were mutated by changing the core TAAT to TACT (<xref ref-type="bibr" rid="bib49">Lee et al., 2010</xref>) on the appropriate strand, as described previously (<xref ref-type="bibr" rid="bib32">Hughes et al., 2018</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). We then re-scanned sequences and mutated any additional motifs inadvertently created.</p><p>To generate scrambled sequences, we randomly selected 150 CRX ChIP-seq peaks spanning the entire range of GC content in the library. 
We then scrambled each sequence while preserving dinucleotide content as previously described (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). We used FIMO to confirm that none of the scrambled sequences contain CRX motifs.</p><p>We unintentionally used a FIMO p-value cutoff of 2.3 × 10<sup>–3</sup> to identify CRX motifs in CRX ChIP-seq peaks, rather than the slightly less stringent 2.5 × 10<sup>–3</sup> cutoff used for ATAC-seq peaks and for mutating CRX motifs. Due to this anomaly, there may be sequences centered using ShapeMF that should have been centered on a CRX motif, and these motifs would not have been mutated because CRX motifs were not mutated in sequences centered using ShapeMF. However, any intact CRX motifs would still be captured in the residual information content of the mutant sequence.</p></sec><sec id="s4-2"><title>Plasmid library construction</title><p>We generated two libraries of 15,000 oligonucleotides (oligos) each, 230 bp in length, from Agilent Technologies (Santa Clara, CA) through a limited licensing agreement. Our library was split across the two oligo pools, ensuring that both the genomic and mutant forms of each sequence were placed in the same oligo pool (<xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref>). Both oligo pools contain all 150 scrambled sequences as an internal control. All sequences were assigned three unique barcodes as previously described (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). In each oligo pool, the basal promoter alone was assigned 18 unique barcodes. Oligos were synthesized as follows: 5’ priming sequence (<named-content content-type="sequence">GTAGCGTCTGTCCGT</named-content>)/EcoRI site/Library sequence/SpeI site/C/SphI site/Barcode sequence/NotI site/3’ priming sequence (<named-content content-type="sequence">CAACTACTACTACAG</named-content>). 
To clone the basal promoter into barcoded oligos without any upstream <italic>cis</italic>-regulatory sequence, we placed the SpeI site next to the EcoRI site, which allowed us to place the promoter between the EcoRI site and the 3’ barcode.</p><p>We cloned the synthesized oligos as previously described by our group (<xref ref-type="bibr" rid="bib47">Kwasnieski et al., 2012</xref>; <xref ref-type="bibr" rid="bib86">White et al., 2016</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). Specifically, for each oligo pool, we used 50 femtomoles of template and four cycles of PCR in each of multiple 50 µl reactions (Phusion, New England Biolabs [NEB], Ipswich, MA) using primers MO563 and MO564 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>), 2% DMSO, and an annealing temperature of 57°C. PCR amplicons were purified from a 2% agarose gel (NEB), digested with EcoRI-HF and NotI-HF (NEB), and then cloned into the EagI and EcoRI sites of the plasmid pJK03 with multiple 20 µl ligation reactions (NEB T4 ligase). The libraries were transformed into 5-alpha electrocompetent cells (NEB) and grown in liquid culture. Next, 2 µg of each library was digested with SphI-HF and SpeI-HF (NEB) and then treated with Antarctic phosphatase (NEB).</p><p>The <italic>Rho</italic> basal promoter and <italic>DsRed</italic> reporter gene were amplified from the plasmid pJK01 using primers MO566 and MO567 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>). The Polylinker and <italic>DsRed</italic> reporter gene were amplified from the plasmid pJK03 using primers MO610 and MO567 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>). The Polylinker is a short 23 bp multiple cloning site with no known core promoter motifs. Inserts were purified from a 1% agarose gel (NEB), digested with NheI-HF and SphI-HF (NEB), and cloned into the libraries using multiple 20 µl ligations (NEB T4 ligase). 
The libraries were transformed into 5-alpha electrocompetent cells (NEB) and grown in liquid culture.</p></sec><sec id="s4-3"><title>Retinal explant electroporation</title><p>Animal procedures were performed in accordance with a Washington University in St Louis Institutional Animal Care and Use Committee-approved vertebrate animals protocol. Electroporation into retinal explants and RNA extraction were performed as described previously (<xref ref-type="bibr" rid="bib28">Hsiau et al., 2007</xref>; <xref ref-type="bibr" rid="bib32">Hughes et al., 2018</xref>; <xref ref-type="bibr" rid="bib47">Kwasnieski et al., 2012</xref>; <xref ref-type="bibr" rid="bib86">White et al., 2016</xref>; <xref ref-type="bibr" rid="bib85">White et al., 2013</xref>). Briefly, retinas were isolated from P0 newborn CD-1 mice and electroporated in a solution with 30 µg library and 30 µg <italic>Rho</italic>-GFP. Electroporated retinas were cultured for 8 days, at which point they were harvested, washed three times with HBSS (ThermoFisher Scientific/Gibco, Waltham, MA), and stored in TRIzol (ThermoFisher Scientific/Invitrogen, Waltham, MA) at –80°C. Five retinas were pooled for each biological replicate and three replicates were performed for each library. RNA was extracted from TRIzol according to the manufacturer’s instructions and treated with TURBO DNase (Invitrogen). cDNA was prepared using SuperScript RT III (Invitrogen) with oligo dT primers. Barcodes from both the cDNA and the plasmid DNA pool were amplified for sequencing (described below). The resulting products were mixed at equal concentration and sequenced on the Illumina NextSeq platform. 
We obtained greater than 1300× coverage across all samples.</p><p><italic>Rho</italic> libraries were amplified using primers MO574 and MO575 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>) for six cycles at an annealing temperature of 66°C followed by 18 cycles with no annealing step (NEB Phusion) and then purified with the Monarch PCR kit (NEB). PCR amplicons were digested using MfeI-HF and SphI-HF (NEB) and ligated to custom-annealed adaptors with PE2 indexing barcodes and phased P1 barcodes (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>). The final enrichment PCR used primers MO588 and MO589 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>) for 20 cycles at an annealing temperature of 66°C (NEB Phusion), followed by purification with the Monarch PCR kit. Polylinker libraries were amplified using primers BC_CRX_Nested_F and BC_CRX_R (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>) for 30 cycles (NEB Q5) at an annealing temperature of 67°C and then purified with the Monarch PCR kit. Illumina adaptors were then added via two further rounds of PCR. First, P1 indexing barcodes were added using forward primers P1_inner_A through P1_inner_D and reverse primer P1_inner_nested_rev (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>) for five cycles at an annealing temperature of 55°C followed by five cycles with no annealing step (NEB Q5). 
PE2 indexing barcodes were then added by amplifying 2 µl of the previous reaction with forward primer P1_outer and reverse primers PE2_outer_SIC69 and PE2_outer_SIC70 (<xref ref-type="supplementary-material" rid="supp6">Supplementary file 6</xref>) for five cycles at an annealing temperature of 66°C followed by five cycles with no annealing step (NEB Q5) and then purified with the Monarch PCR kit.</p></sec><sec id="s4-4"><title>Data processing</title><p>All data processing, statistical analysis, and downstream analyses were performed in Python version 3.6.5 using Numpy version 1.15.4 (<xref ref-type="bibr" rid="bib24">Harris et al., 2020</xref>), Scipy version 1.1.0 (<xref ref-type="bibr" rid="bib82">Virtanen et al., 2020</xref>), and Pandas version 0.23.4 (<xref ref-type="bibr" rid="bib54">McKinney, 2010</xref>), and visualized using Matplotlib version 3.0.2 (<xref ref-type="bibr" rid="bib33">Hunter, 2007</xref>) and Logomaker version 0.8 (<xref ref-type="bibr" rid="bib81">Tareen and Kinney, 2020</xref>). All statistical analyses used two-sided tests unless noted otherwise.</p><p>Sequencing reads were filtered to ensure that the barcode sequence perfectly matched the expected sequence (>93% of reads in a sample for the <italic>Rho</italic> libraries, >86% of reads for the Polylinker libraries). For the <italic>Rho</italic> libraries, barcodes that had fewer than 10 raw counts in the DNA sample were considered missing and removed from downstream analysis. Barcodes that had fewer than five raw counts in any cDNA sample were considered present in the input plasmid pool but below the detection limit and thus set to zero in all samples. Barcode counts were normalized by reads per million (RPM) for each sample. Barcode expression was calculated by dividing the cDNA RPM by the DNA RPM. Replicate-specific expression was calculated by averaging the barcodes corresponding to each library sequence. 
After performing statistical analysis (see below), expression levels were normalized by replicate-specific basal mean expression and then averaged across biological replicates.</p><p>For the Polylinker assay, the expected lack of expression of many constructs required different processing. Barcodes that had fewer than 50 raw counts in the DNA sample were removed from downstream analysis. Barcodes were normalized by RPM for each replicate. Barcodes that had less than 8 RPM in any cDNA sample were set to zero in all samples. cDNA RPM were then divided by DNA RPM as above. Within each biological replicate, barcodes were averaged as above but were not normalized to basal expression because there is no basal construct. Expression values were then averaged across biological replicates. Due to the low expression of scrambled sequences and the lack of a basal construct, we were unable to assess data calibration with the same rigor as above.</p></sec><sec id="s4-5"><title>Assignment of activity classes</title><p>Activity classes were assigned by comparing expression levels to basal promoter expression levels across replicates. The null hypothesis is that the expression of a sequence is the same as basal levels. Expression levels were approximately log-normally distributed, so we computed the log-normal parameters for each sequence and then performed Welch’s t-test. We corrected for multiple hypotheses using the Benjamini-Hochberg FDR procedure, applied to each library separately to account for any potential batch effects between libraries. The log<sub>2</sub> expression was calculated after adding a pseudocount of 1 × 10<sup>–3</sup> to every sequence.</p><p>Sequences were classified as enhancers if they were at least twofold above basal and the q-value was below 0.05. Silencers were similarly defined as at least twofold below basal with a q-value below 0.05. Inactive sequences were defined as within a twofold change and q-value greater than or equal to 0.05. 
All other sequences were classified as ambiguous and removed from further analysis. We used scrambled sequences to further stratify enhancers into strong and weak enhancers, using the rationale that scrambled sequences give an empirical distribution for the activity of random sequences. We defined strong enhancers as enhancers that are above the 95th percentile of scrambled sequences and all other enhancers as weak enhancers.</p><p>For the Polylinker assay, we did not have a basal construct as a reference point. Instead, we defined a sequence to have autonomous activity if the average cDNA barcode counts were higher than average DNA barcode counts, and non-autonomous otherwise. The log<sub>2</sub> expression was calculated after adding a pseudocount of 1 × 10<sup>–2</sup> to every sequence.</p></sec><sec id="s4-6"><title>RNA-seq analysis</title><p>We obtained RNA-seq data from WT and Crx<sup>-/-</sup> P21 retinas (<xref ref-type="bibr" rid="bib71">Roger et al., 2014</xref>) processed into a counts matrix for each gene by <xref ref-type="bibr" rid="bib72">Ruzycki et al., 2018</xref>. Each sample was normalized by read counts per million and replicates were averaged together. Genes with at least a twofold change between genotypes were considered differentially expressed. We determined which differentially expressed genes are near a member of our library using previously published associations between retinal ATAC-seq peaks and genes (<xref ref-type="bibr" rid="bib60">Murphy et al., 2019</xref>). For de-repressed genes, we determined how often the nearest library member is a silencer; for down-regulated genes, we determined how often the nearest library member is a strong or weak enhancer.</p></sec><sec id="s4-7"><title>Motif analysis</title><p>We performed motif enrichment analysis using the MEME Suite version 5.0.4 (<xref ref-type="bibr" rid="bib3">Bailey et al., 2009</xref>). 
We searched for motifs that were enriched in one group of sequences relative to another group using DREME-py3 with the parameters -mink 6 -maxk 12 -e 0.05 and compared the de novo motifs to known motifs using TOMTOM on default parameters. We ran DREME using strong enhancers as positives and silencers as negatives, and vice versa. For TOMTOM, we used version 11 of the full mouse HOCOMOCO database (<xref ref-type="bibr" rid="bib46">Kulakovskiy et al., 2018</xref>) with the following additions from the JASPAR human database (<xref ref-type="bibr" rid="bib42">Khan et al., 2018</xref>): NRL (accession MA0842.1), RORB (accession MA1150.1), and RAX (accession MA0718.1). We added these PWMs because they have known roles in the retina, but the mouse PWMs were not in the HOCOMOCO database. We also used the CRX PWM that we used to design the library. Motifs were selected for downstream analysis based on their matches to the de novo motifs, whether the TF had a known role in retinal development, and the quality of the PWM. Because PWMs from TFs of the same family were so similar, we used one TF for each DREME motif, recognizing that these motifs may be bound by other TFs that recognize similar motifs. We did not use any PWMs with a quality of ‘D’. We excluded DREME motifs without a match to the database from further analysis; most of these resemble dinucleotides.</p></sec><sec id="s4-8"><title>Predicted occupancy</title><p>We computed predicted occupancy as previously described (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>; <xref ref-type="bibr" rid="bib89">Zhao et al., 2009</xref>). Briefly, we normalized each letter probability matrix by the most probable letter at each position. We took the negative log of this matrix and multiplied by 2.5, which corresponds to the ideal gas constant times 300 K, to obtain an energy weight matrix. We used a chemical potential <italic>μ</italic> of 9 for all TFs. 
At this value, the probability of a site being bound is at least 0.5 if the relative <italic>K</italic><sub><italic>D</italic></sub> is at least 0.03 of the optimal binding site. We computed the predicted occupancy for every site on the forward and reverse strands and summed them together to get a single value for each TF.</p><p>To determine if there is a bias in the linear arrangement of motifs, we selected strong enhancers with exactly one site occupied by CRX and exactly one site occupied by a second TF. We counted the number of times the position of the second TF was 5’ and 3’ of the CRX site and then performed a binomial test. We did not observe a statistically significant bias for any TF at an FDR q-value cutoff of 0.05. We also performed this analysis for silencers with exactly one site occupied by CRX and exactly one site occupied by NRL and did not observe a significant difference in the 5’ vs. 3’ bias of strong enhancers vs. silencers (Fisher’s exact test p = 0.17).</p></sec><sec id="s4-9"><title>Information content</title><p>To capture the effects of TF predicted occupancy and diversity in a single metric, we calculated the motif information content using Boltzmann entropy. 
Boltzmann’s equation states that the entropy of a system <inline-formula><mml:math id="inf1"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> is related to the number of ways the molecules can be arranged (microstates) <inline-formula><mml:math id="inf2"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> via the equation <inline-formula><mml:math id="inf3"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mi>B</mml:mi></mml:msub><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>, where <inline-formula><mml:math id="inf4"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> is Boltzmann’s constant (<xref ref-type="bibr" rid="bib67">Phillips et al., 2012</xref>, Chapter 5). 
The number of microstates is defined as <inline-formula><mml:math id="inf5"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>N</mml:mi><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo>∏</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>!</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math></inline-formula> where <inline-formula><mml:math id="inf6"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> is the total number of particles and <inline-formula><mml:math id="inf7"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> is the number of particles of the <italic>i</italic>-th type. In our case, the system is the collection of predicted binding motifs for different TFs in a <italic>cis</italic>-regulatory sequence. We assume each TF is a different type of molecule because the DNA-binding domain of each TF belongs to a different subfamily. The number of molecular arrangements <inline-formula><mml:math id="inf8"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> represents the number of distinguishable ways that the TFs can be ordered on the sequence. 
Thus, <inline-formula><mml:math id="inf9"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> is the predicted occupancy of the <inline-formula><mml:math id="inf10"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>-th TF and <inline-formula><mml:math id="inf11"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> is the total predicted occupancy of all TFs on the <italic>cis</italic>-regulatory sequence. Because the predicted occupancies are continuous values, we exploit the definition of the Gamma function, <inline-formula><mml:math id="inf12"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>N</mml:mi><mml:mo>!</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> to rewrite <inline-formula><mml:math id="inf13"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo>∏</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math></inline-formula> .</p><p>If we assume that each arrangement of motifs is equally likely, then we can write the probability of arrangement 
<inline-formula><mml:math id="inf14"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> as <inline-formula><mml:math id="inf15"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>W</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math></inline-formula> and rewrite the entropy as <inline-formula><mml:math id="inf16"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>W</mml:mi></mml:mfrac><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, where we have dropped Boltzmann’s constant since the connection between molecular arrangements and temperature is not important. 
Because each arrangement is equally likely, <inline-formula><mml:math id="inf17"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>W</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math></inline-formula> is also the expected value of <inline-formula><mml:math id="inf18"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> and we can write the entropy as <inline-formula><mml:math id="inf19"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:munder><mml:mo>∑</mml:mo><mml:mi>w</mml:mi></mml:munder><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> , which is Shannon entropy. 
By definition, Shannon entropy is also the expected value of the information content: <inline-formula><mml:math id="inf20"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>E</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mi>I</mml:mi><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:munder><mml:mo>∑</mml:mo><mml:mi>w</mml:mi></mml:munder><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>∑</mml:mo><mml:mi>w</mml:mi></mml:munder><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> where the information content <inline-formula><mml:math id="inf21"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> of a particular state is <inline-formula><mml:math id="inf22"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>. Since we assumed each arrangement is equally likely, the expected value of the information content is also the information content of each arrangement. 
Therefore, the information content of a <italic>cis</italic>-regulatory sequence can be written as <inline-formula><mml:math id="inf23"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:mrow><mml:mi mathvariant="normal">l</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">g</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">l</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">g</mml:mi></mml:mrow><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>. We use log base 2 to express the information content in bits.</p><p>With this metric, <italic>cis</italic>-regulatory sequences with higher predicted TF occupancies generally have higher information content. Sequences with higher TF diversity have higher information content than lower diversity sequences with the same predicted occupancy. Thus, our metric captures the effects of both TF diversity and total TF occupancy. For example, consider hypothetical TFs A, B, and C. If motifs for only one TF are in a sequence, then <inline-formula><mml:math id="inf24"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> is always one and the information content is always zero (regardless of total occupancy). The simplest case for non-zero information content is one motif for A, one motif for B, and zero motifs for C (1-1-0). 
Then <inline-formula><mml:math id="inf25"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf26"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bit. If we increase predicted occupancy by adding a motif for A (2-1-0), then <inline-formula><mml:math id="inf27"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>3</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf28"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>1.6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bits, which is approximately the information content of silencers and inactive sequences. 
If we increase predicted occupancy again and add a second motif for B (2-2-0), then <inline-formula><mml:math id="inf29"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>4</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>!</mml:mo><mml:mn>2</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf30"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>2.6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bits, which is approximately the information content of strong enhancers. If, rather than increasing predicted occupancy, we increase diversity by replacing a motif for A with a motif for C (1-1-1), then <inline-formula><mml:math id="inf31"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>3</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and once again <inline-formula><mml:math id="inf32"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>2.6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bits, which is higher than the lower diversity case (2-1-0).</p><p>According to <xref ref-type="bibr" rid="bib87">Wunderlich and Mirny, 2009</xref>, the probability of observing <inline-formula><mml:math id="inf33"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> total motifs for <inline-formula><mml:math id="inf34"><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> different TFs in a <inline-formula><mml:math id="inf35"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> bp window is <inline-formula><mml:math id="inf36"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>∼</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo>;</mml:mo><mml:mi>λ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, where <inline-formula><mml:math id="inf37"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>λ</mml:mi><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mi>m</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf38"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> is the probability of finding a spurious motif in the genome. 
The expected number of windows with <inline-formula><mml:math id="inf39"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula> total motifs in a genome of length <italic>N</italic> is thus <inline-formula><mml:math id="inf40"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>⋅</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>. In mammals, <inline-formula><mml:math id="inf41"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>N</mml:mi><mml:mo>≈</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:math></inline-formula> and Wunderlich and Mirny find that <inline-formula><mml:math id="inf42"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0025</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> for multicellular eukaryotes. 
For <inline-formula><mml:math id="inf43"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> TFs and a <italic>w</italic> = 164 bp window (which is the size of our sequences), <inline-formula><mml:math id="inf44"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>λ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.123</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf45"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mn>1.6</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>, meaning that five total motifs for three different TFs specify an approximately unique 164 bp location in a mammalian genome. Five total motifs for three different TFs can be achieved in two ways: three motifs for A, one for B, and one for C (3-1-1), or two motifs for A, two for B, and one for C (2-2-1). In the case of 3-1-1, <inline-formula><mml:math id="inf46"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>5</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>20</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf47"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>4.3</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bits. 
In the case of 2-2-1, <inline-formula><mml:math id="inf48"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>W</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>5</mml:mn><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>!</mml:mo><mml:mn>2</mml:mn><mml:mo>!</mml:mo><mml:mn>1</mml:mn><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>30</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf49"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn>4.9</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> bits.</p></sec><sec id="s4-10"><title>Machine learning</title><p>The <italic>k</italic>-mer SVM was fit using gkmSVM (<xref ref-type="bibr" rid="bib19">Ghandi et al., 2014</xref>). All other machine learning, including cross-validation, logistic regression, and computing ROC and PR curves, was performed using scikit-learn version 0.19.1 (<xref ref-type="bibr" rid="bib64">Pedregosa et al., 2011</xref>). We wrote custom Python wrappers for gkmSVM to allow for interfacing between the C++ binaries and the rest of our workflow. We ran gkmSVM with the parameters -l 6 -k 6 -m 1. To estimate model performance, all models were fit with stratified fivefold cross-validation after shuffling the order of sequences. For the TF occupancy logistic regression model, we used L2 regularization. We selected the regularization parameter C by performing grid search with fivefold cross-validation on the values 10<sup>–4</sup>, 10<sup>–3</sup>, 10<sup>–2</sup>, 10<sup>–1</sup>, 1, 10<sup>1</sup>, 10<sup>2</sup>, 10<sup>3</sup>, 10<sup>4</sup> and selecting the value that maximized the F1 score. 
The optimal value of C was 0.01, which we used as the regularization strength when assessing the performance of the model with other feature sets.</p><p>To assess the performance of the logistic regression model, we randomly sampled eight PWMs from the HOCOMOCO database and computed the predicted occupancy of each TF on each sequence. We then fit a new logistic regression model with these features and repeated this procedure 100 times to generate a background distribution of model performances.</p><p>To generate de novo motifs from the SVM, we generated all 6-mers and scored them against the SVM. We then ran the svmw_emalign.py script from gkmSVM on the <italic>k</italic>-mer scores with the parameters -n 10 -f 2 -m 4 and a PWM length of 6, and then used TOMTOM to compare them to the database from our motif analysis.</p></sec><sec id="s4-11"><title>Other data sources</title><p>We used our previously published library (<xref ref-type="bibr" rid="bib85">White et al., 2013</xref>) as an independent test set for our machine learning models. We defined strong enhancers as ChIP-seq peaks that were above the 95th percentile of all scrambled sequences. There was no basal promoter construct in this library, so instead we defined silencers as ChIP-seq peaks that were at least twofold below the log<sub>2</sub> mean of all scrambled sequences.</p><p>To annotate sequences for in vivo TF binding, we used previously published ChIP-seq data for NRL (<xref ref-type="bibr" rid="bib23">Hao et al., 2012</xref>), as re-processed by <xref ref-type="bibr" rid="bib31">Hughes et al., 2017</xref>, and for MEF2D (<xref ref-type="bibr" rid="bib2">Andzelm et al., 2015</xref>). 
We converted peaks to mm10 coordinates using the UCSC liftOver tool and then used Bedtools to intersect peaks with our library.</p></sec></sec></body><back><sec id="s5" sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="COI-statement" id="conf1"><p>No competing interests declared</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con2"><p>Investigation</p></fn><fn fn-type="con" id="con3"><p>Investigation</p></fn><fn fn-type="con" id="con4"><p>Funding acquisition, Supervision, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con5"><p>Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con6"><p>Conceptualization, Funding acquisition, Supervision, Writing – original draft, Writing – review and editing</p></fn></fn-group><fn-group content-type="ethics-information"><title>Ethics</title><fn fn-type="other"><p>This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to protocol # A-3381-01 approved by the Institutional Animal Care and Use Committee of Washington University in St. Louis. Euthanasia of mice was performed according to the recommendations of the American Veterinary Medical Association Guidelines on Euthanasia. 
Appropriate measures were taken to minimize pain and discomfort to the animals during experimental procedures.</p></fn></fn-group></sec><sec id="s6" sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="supp1"><label>Supplementary file 1.</label><caption><title>FASTA file of all sequences in library 1.</title><p>Sequences were named using the following nomenclature: ‘chrom-start-stop_annotations_variant’. ‘Chrom’, ‘start’, and ‘stop’ correspond to the mm10 genomic coordinates of the sequences in BED format. ‘Annotations’ is a four-letter string where the first position indicates CRX-binding status (<bold><underline>C</underline></bold>hIP-seq peak or <bold><underline>U</underline></bold>nbound), the second position indicates CRX motif status (<bold><underline>P</underline></bold>WM hit, <bold><underline>S</underline></bold>hape motif, or <bold><underline>B</underline></bold>oth PWM and shape motif), the third position indicates ATAC-seq status (peak in <bold><underline>R</underline></bold>ods but not cones, peak in <bold><underline>C</underline></bold>ones but not rods, peak in both rod and cone <bold><underline>P</underline></bold>hotoreceptors, or peak in <bold><underline>N</underline></bold>one of the above), and the fourth position indicates histone ChIP-seq status (‘<bold><underline>E</underline></bold>nhancer marked’ with H3K27Ac<sup>+</sup>H3K4me3<sup>-</sup>, ‘<bold><underline>P</underline></bold>romoter marked’ with H3K27Ac<sup>+</sup>H3K4me3<sup>+</sup>, <bold><underline>Q</underline></bold> for H3K27Ac<sup>-</sup>H3K4me3<sup>+</sup>, or <bold><underline>N</underline></bold>either mark). 
‘Variant’ indicates whether the sequence is genomic (‘WT’), mutated CRX motifs (‘MUT-allCrxSites’), scrambled shape motif (‘MUT-shape’), or a scrambled control (‘scrambled’).</p></caption><media mime-subtype="plain" mimetype="text" xlink:href="elife-67403-supp1-v2.txt"/></supplementary-material><supplementary-material id="supp2"><label>Supplementary file 2.</label><caption><title>FASTA file of all sequences in library 2.</title><p>Sequences were named as in <xref ref-type="supplementary-material" rid="supp1">Supplementary file 1</xref>.</p></caption><media mime-subtype="plain" mimetype="text" xlink:href="elife-67403-supp2-v2.txt"/></supplementary-material><supplementary-material id="supp3"><label>Supplementary file 3.</label><caption><title>Expression measurements and annotations of all sequences.</title><p>Values are tab-delimited. Rows are named based on the sequence name from <xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref> without the ‘variant’ information. Columns ending in ‘_WT’ indicate the wild-type sequence with the <italic>Rho</italic> promoter, ‘_MUT’ as the CRX motif mutant sequence with the <italic>Rho</italic> promoter, and ‘_POLY’ as the wild-type sequence with the Polylinker. Sequences with the scrambled shape motif were excluded from the ‘_MUT’ columns. 
Columns are named as follows: label, the sequence name from <xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref> without the ‘variant’ information; expression, average activity of the sequence, NaN indicates sequence was missing from the plasmid pool; expression_std, standard deviation of activity; expression_reps, number of replicates in which the sequence was measured; expression_pvalue, p-value from Welch’s t-test of log-normal data for the null hypothesis that the activity of the sequence with <italic>Rho</italic> is no different than the <italic>Rho</italic> promoter alone; expression_qvalue, FDR-correction of the p-values; library, which library contains the sequence; expression_log2, log2 average activity of the sequence; group_name, activity classification of the sequence with the <italic>Rho</italic> promoter; plot_color, hex code for visualization; variant, the ‘variant’ portion of the sequence identifier; wt_vs_mut_log2, log2 fold change between the wild-type and mutant version of the sequence, NaN indicates the wild-type and/or mutant version was not measured; wt_vs_mut_pvalue, p-value from Welch’s t-test for the null hypothesis that the wild-type and mutant sequences have the same activity; wt_vs_mut_qvalue, FDR-correction of the p-values; autonomous_activity, Boolean value for if the wild-type sequence is autonomous with the Polylinker; crx_bound, nrl_bound, and mef2d_bound, Boolean values for if the sequence overlaps a ChIP-seq peak for the corresponding TF; binding_group, string denoting each of the eight possible combinations of CRX, NRL, and MEF2D binding.</p></caption><media mime-subtype="plain" mimetype="text" xlink:href="elife-67403-supp3-v2.txt"/></supplementary-material><supplementary-material id="supp4"><label>Supplementary file 4.</label><caption><title>Predicted occupancy scores for each transcription factor (TF) and each sequence.</title><p>Values are tab-delimited. 
Rows are named based on the sequence name from <xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref> including the ‘variant’ information. Columns are the predicted occupancy scores for the denoted TF.</p></caption><media mime-subtype="plain" mimetype="text" xlink:href="elife-67403-supp4-v2.txt"/></supplementary-material><supplementary-material id="supp5"><label>Supplementary file 5.</label><caption><title>Information content and related metrics for each sequence.</title><p>Values are tab-delimited. Rows are named based on the sequence name from <xref ref-type="supplementary-material" rid="supp1">Supplementary files 1 and 2</xref>, including the ‘variant’ information. Columns are named as follows: total_occupancy, total predicted occupancy of all eight transcription factors (TFs); diversity, number of TFs with predicted occupancy above 0.5; entropy, information content (which is also entropy).</p></caption><media mime-subtype="plain" mimetype="text" xlink:href="elife-67403-supp5-v2.txt"/></supplementary-material><supplementary-material id="supp6"><label>Supplementary file 6.</label><caption><title>Primers used in this study.</title></caption><media mime-subtype="xlsx" mimetype="application" xlink:href="elife-67403-supp6-v2.xlsx"/></supplementary-material><supplementary-material id="transrepform"><label>Transparent reporting form</label><media mime-subtype="docx" mimetype="application" xlink:href="elife-67403-transrepform1-v2.docx"/></supplementary-material></sec><sec id="s7" sec-type="data-availability"><title>Data availability</title><p>The pJK01 and pJK03 plasmids have been deposited with AddGene (IDs 173489, 173490). Raw sequencing data and barcode counts have been uploaded to the NCBI GEO database under accession GSE165812. All processed activity data, predicted occupancy, and information content values are available in the supplementary material. 
All code for data processing, analysis, and visualization is available on Github at <ext-link ext-link-type="uri" xlink:href="https://github.com/barakcohenlab/CRX-Information-Content">https://github.com/barakcohenlab/CRX-Information-Content</ext-link> (copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:rev:ca108a6fb1d30c9476521eeb7e77f921a4c99323">https://archive.softwareheritage.org/swh:1:rev:ca108a6fb1d30c9476521eeb7e77f921a4c99323</ext-link>).</p><p>The following dataset was generated:</p><p><element-citation id="dataset1" publication-type="data" specific-use="isSupplementedBy"><person-group person-group-type="author"><name><surname>Friedman</surname><given-names>RZ</given-names></name><name><surname>Granas</surname><given-names>DM</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name><name><surname>White</surname><given-names>MA</given-names></name></person-group><year iso-8601-date="2021">2021</year><data-title>Information Content Differentiates Enhancers From Silencers in Mouse Photoreceptors</data-title><source>NCBI Gene Expression Omnibus</source><pub-id pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165812">GSE165812</pub-id></element-citation></p><p>The following previously published datasets were used:</p><p><element-citation id="dataset2" publication-type="data" specific-use="references"><person-group person-group-type="author"><name><surname>Langmann</surname><given-names>T</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2010">2010</year><data-title>Deciphering the cis-regulatory architecture of mammalian photoreceptors</data-title><source>NCBI Gene Expression Omnibus</source><pub-id pub-id-type="accession" 
xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20012">GSE20012</pub-id></element-citation></p><p><element-citation id="dataset3" publication-type="data" specific-use="references"><person-group person-group-type="author"><name><surname>Hughes</surname><given-names>AE</given-names></name><name><surname>Enright</surname><given-names>JM</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Shen</surname><given-names>SQ</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2016">2016</year><data-title>ATAC-seq and RNA-seq of adult mouse rods and cones</data-title><source>NCBI Gene Expression Omnibus</source><pub-id pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE83312">GSE83312</pub-id></element-citation></p><p><element-citation id="dataset4" publication-type="data" specific-use="references"><person-group person-group-type="author"><name><surname>Ruzycki</surname><given-names>PA</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2018">2018</year><data-title>CRX directs photoreceptor differentiation by accelerating chromatin remodeling at specific target sites</data-title><source>Additional file 1</source><pub-id pub-id-type="accession" xlink:href="https://static-content.springer.com/esm/art%3A10.1186%2Fs13072-018-0212-2/MediaObjects/13072_2018_212_MOESM1_ESM.xlsx">static-content.springer.com/esm/art%3A10.1186%2Fs13072-018-0212-2/MediaObjects/13072_2018_212_MOESM1_ESM.xlsx</pub-id></element-citation></p><p><element-citation id="dataset5" publication-type="data" specific-use="references"><person-group 
person-group-type="author"><name><surname>Hao</surname><given-names>H</given-names></name><name><surname>Kim</surname><given-names>DS</given-names></name><name><surname>Klocke</surname><given-names>B</given-names></name><name><surname>Johnson</surname><given-names>KR</given-names></name><name><surname>Cui</surname><given-names>K</given-names></name><name><surname>Gotoh</surname><given-names>N</given-names></name><name><surname>Zang</surname><given-names>C</given-names></name><name><surname>Gregorski</surname><given-names>J</given-names></name><name><surname>Gieser</surname><given-names>L</given-names></name><name><surname>Peng</surname><given-names>W</given-names></name><name><surname>Fann</surname><given-names>Y</given-names></name><name><surname>Seifert</surname><given-names>M</given-names></name><name><surname>Zhao</surname><given-names>K</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2012">2012</year><data-title>Transcriptional Regulation of Rod Photoreceptor Homeostasis Revealed by In Vivo NRL Targetome Analysis</data-title><source>NEI Data Share</source><pub-id pub-id-type="accession" xlink:href="https://datashare.nei.nih.gov/nnrlMain.jsp">Hong PLoS-Genet-2012</pub-id></element-citation></p><p><element-citation id="dataset6" publication-type="data" specific-use="references"><person-group 
person-group-type="author"><name><surname>Andzelm</surname><given-names>MM</given-names></name><name><surname>Cherry</surname><given-names>TJ</given-names></name><name><surname>Harmin</surname><given-names>DA</given-names></name><name><surname>Boeke</surname><given-names>AC</given-names></name><name><surname>Lee</surname><given-names>C</given-names></name><name><surname>Hemberg</surname><given-names>M</given-names></name><name><surname>Pawlyk</surname><given-names>B</given-names></name><name><surname>Malik</surname><given-names>AN</given-names></name><name><surname>Flavell</surname><given-names>SW</given-names></name><name><surname>Sandberg</surname><given-names>MA</given-names></name><name><surname>Raviola</surname><given-names>E</given-names></name><name><surname>Greenberg</surname><given-names>ME</given-names></name></person-group><year iso-8601-date="2015">2015</year><data-title>MEF2D drives photoreceptor development through a genome-wide competition for tissue-specific enhancers</data-title><source>NCBI Gene Expression Omnibus</source><pub-id pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61392">GSE61392</pub-id></element-citation></p></sec><ack id="ack"><title>Acknowledgements</title><p>We thank Gary Stormo and members of the Cohen Lab for critically reading the manuscript and helpful discussions; Philip A Ruzycki, Andrew EO Hughes, and Timothy J Cherry for providing processed ChIP-seq and RNA-seq data; and Jessica Hoisington-Lopez and MariaLynn Crosby from the DNA Sequencing Innovation Lab for assistance with high-throughput sequencing.</p></ack><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alexandre</surname><given-names>C</given-names></name><name><surname>Vincent</surname><given-names>JP</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>Requirements for transcriptional 
repression and activation by engrailed in <italic>Drosophila</italic> embryos</article-title><source>Development</source><volume>130</volume><fpage>729</fpage><lpage>739</lpage><pub-id pub-id-type="doi">10.1242/dev.00286</pub-id><pub-id pub-id-type="pmid">12506003</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Andzelm</surname><given-names>MM</given-names></name><name><surname>Cherry</surname><given-names>TJ</given-names></name><name><surname>Harmin</surname><given-names>DA</given-names></name><name><surname>Boeke</surname><given-names>AC</given-names></name><name><surname>Lee</surname><given-names>C</given-names></name><name><surname>Hemberg</surname><given-names>M</given-names></name><name><surname>Pawlyk</surname><given-names>B</given-names></name><name><surname>Malik</surname><given-names>AN</given-names></name><name><surname>Flavell</surname><given-names>SW</given-names></name><name><surname>Sandberg</surname><given-names>MA</given-names></name><name><surname>Raviola</surname><given-names>E</given-names></name><name><surname>Greenberg</surname><given-names>ME</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>MEF2D drives photoreceptor development through a genome-wide competition for tissue-specific enhancers</article-title><source>Neuron</source><volume>86</volume><fpage>247</fpage><lpage>263</lpage><pub-id pub-id-type="doi">10.1016/j.neuron.2015.02.038</pub-id><pub-id pub-id-type="pmid">25801704</pub-id></element-citation></ref><ref id="bib3"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Boden</surname><given-names>M</given-names></name><name><surname>Buske</surname><given-names>FA</given-names></name><name><surname>Frith</surname><given-names>M</given-names></name><name><surname>Grant</surname><given-names>CE</given-names></name><name><surname>Clementi</surname><given-names>L</given-names></name><name><surname>Ren</surname><given-names>J</given-names></name><name><surname>Li</surname><given-names>WW</given-names></name><name><surname>Noble</surname><given-names>WS</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>MEME SUITE: Tools for motif discovery and searching</article-title><source>Nucleic Acids Research</source><volume>37</volume><fpage>W202</fpage><lpage>W208</lpage><pub-id pub-id-type="doi">10.1093/nar/gkp335</pub-id><pub-id pub-id-type="pmid">19458158</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Barolo</surname><given-names>S</given-names></name><name><surname>Posakony</surname><given-names>JW</given-names></name></person-group><year iso-8601-date="2002">2002</year><article-title>Three habits of highly effective signaling pathways: Principles of transcriptional control by developmental cell signaling</article-title><source>Genes & Development</source><volume>16</volume><fpage>1167</fpage><lpage>1181</lpage><pub-id pub-id-type="doi">10.1101/gad.976502</pub-id><pub-id pub-id-type="pmid">12023297</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brand</surname><given-names>AH</given-names></name><name><surname>Micklem</surname><given-names>G</given-names></name><name><surname>Nasmyth</surname><given-names>K</given-names></name></person-group><year iso-8601-date="1987">1987</year><article-title>A 
yeast silencer contains sequences that can promote autonomous plasmid replication and transcriptional activation</article-title><source>Cell</source><volume>51</volume><fpage>709</fpage><lpage>719</lpage><pub-id pub-id-type="doi">10.1016/0092-8674(87)90094-8</pub-id><pub-id pub-id-type="pmid">3315230</pub-id></element-citation></ref><ref id="bib6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Wang</surname><given-names>QL</given-names></name><name><surname>Nie</surname><given-names>Z</given-names></name><name><surname>Sun</surname><given-names>H</given-names></name><name><surname>Lennon</surname><given-names>G</given-names></name><name><surname>Copeland</surname><given-names>NG</given-names></name><name><surname>Gilbert</surname><given-names>DJ</given-names></name><name><surname>Jenkins</surname><given-names>NA</given-names></name><name><surname>Zack</surname><given-names>DJ</given-names></name></person-group><year iso-8601-date="1997">1997</year><article-title>Crx, a novel Otx-like paired-homeodomain protein, binds to and transactivates photoreceptor cell-specific genes</article-title><source>Neuron</source><volume>19</volume><fpage>1017</fpage><lpage>1030</lpage><pub-id pub-id-type="doi">10.1016/s0896-6273(00)80394-3</pub-id><pub-id pub-id-type="pmid">9390516</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>J</given-names></name><name><surname>Rattner</surname><given-names>A</given-names></name><name><surname>Nathans</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>The rod photoreceptor-specific nuclear receptor Nr2e3 represses transcription of multiple cone-specific genes</article-title><source>The Journal of 
Neuroscience</source><volume>25</volume><fpage>118</fpage><lpage>129</lpage><pub-id pub-id-type="doi">10.1523/JNEUROSCI.3571-04.2005</pub-id><pub-id pub-id-type="pmid">15634773</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chiang</surname><given-names>C</given-names></name><name><surname>Ayyanathan</surname><given-names>K</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>SNAIL/GFI-1 (SNAG) family zinc finger proteins in transcription regulation, chromatin dynamics, cell signaling, development, and disease</article-title><source>Cytokine &amp; Growth Factor Reviews</source><volume>24</volume><fpage>123</fpage><lpage>131</lpage><pub-id pub-id-type="doi">10.1016/j.cytogfr.2012.09.002</pub-id><pub-id pub-id-type="pmid">23102646</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Corbo</surname><given-names>JC</given-names></name><name><surname>Lawrence</surname><given-names>KA</given-names></name><name><surname>Karlstetter</surname><given-names>M</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Abdelaziz</surname><given-names>M</given-names></name><name><surname>Dirkes</surname><given-names>W</given-names></name><name><surname>Weigelt</surname><given-names>K</given-names></name><name><surname>Seifert</surname><given-names>M</given-names></name><name><surname>Benes</surname><given-names>V</given-names></name><name><surname>Fritsche</surname><given-names>LG</given-names></name><name><surname>Weber</surname><given-names>BHF</given-names></name><name><surname>Langmann</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors</article-title><source>Genome 
Research</source><volume>20</volume><fpage>1512</fpage><lpage>1525</lpage><pub-id pub-id-type="doi">10.1101/gr.109405.110</pub-id><pub-id pub-id-type="pmid">20693478</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crocker</surname><given-names>J</given-names></name><name><surname>Abe</surname><given-names>N</given-names></name><name><surname>Rinaldi</surname><given-names>L</given-names></name><name><surname>McGregor</surname><given-names>AP</given-names></name><name><surname>Frankel</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Alsawadi</surname><given-names>A</given-names></name><name><surname>Valenti</surname><given-names>P</given-names></name><name><surname>Plaza</surname><given-names>S</given-names></name><name><surname>Payre</surname><given-names>F</given-names></name><name><surname>Mann</surname><given-names>RS</given-names></name><name><surname>Stern</surname><given-names>DL</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Low affinity binding site clusters confer Hox specificity and regulatory robustness</article-title><source>Cell</source><volume>160</volume><fpage>191</fpage><lpage>203</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2014.11.041</pub-id><pub-id pub-id-type="pmid">25557079</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Doni Jayavelu</surname><given-names>N</given-names></name><name><surname>Jajodia</surname><given-names>A</given-names></name><name><surname>Mishra</surname><given-names>A</given-names></name><name><surname>Hawkins</surname><given-names>RD</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Candidate silencer elements for the human and mouse 
genomes</article-title><source>Nature Communications</source><volume>11</volume><elocation-id>1061</elocation-id><pub-id pub-id-type="doi">10.1038/s41467-020-14853-5</pub-id><pub-id pub-id-type="pmid">32103011</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dorval</surname><given-names>KM</given-names></name><name><surname>Bobechko</surname><given-names>BP</given-names></name><name><surname>Fujieda</surname><given-names>H</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Zack</surname><given-names>DJ</given-names></name><name><surname>Bremner</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>Chx10 targets a subset of photoreceptor genes</article-title><source>The Journal of Biological Chemistry</source><volume>281</volume><fpage>744</fpage><lpage>751</lpage><pub-id pub-id-type="doi">10.1074/jbc.M509470200</pub-id><pub-id pub-id-type="pmid">16236706</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ernst</surname><given-names>J</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>ChromHMM: automating chromatin-state discovery and characterization</article-title><source>Nature Methods</source><volume>9</volume><fpage>215</fpage><lpage>216</lpage><pub-id pub-id-type="doi">10.1038/nmeth.1906</pub-id><pub-id pub-id-type="pmid">22373907</pub-id></element-citation></ref><ref id="bib14"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Fan</surname><given-names>R</given-names></name><name><surname>Toubal</surname><given-names>A</given-names></name><name><surname>Goñi</surname><given-names>S</given-names></name><name><surname>Drareni</surname><given-names>K</given-names></name><name><surname>Huang</surname><given-names>Z</given-names></name><name><surname>Alzaid</surname><given-names>F</given-names></name><name><surname>Ballaire</surname><given-names>R</given-names></name><name><surname>Ancel</surname><given-names>P</given-names></name><name><surname>Liang</surname><given-names>N</given-names></name><name><surname>Damdimopoulos</surname><given-names>A</given-names></name><name><surname>Hainault</surname><given-names>I</given-names></name><name><surname>Soprani</surname><given-names>A</given-names></name><name><surname>Aron-Wisnewsky</surname><given-names>J</given-names></name><name><surname>Foufelle</surname><given-names>F</given-names></name><name><surname>Lawrence</surname><given-names>T</given-names></name><name><surname>Gautier</surname><given-names>JF</given-names></name><name><surname>Venteclef</surname><given-names>N</given-names></name><name><surname>Treuter</surname><given-names>E</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Loss of the co-repressor GPS2 sensitizes macrophage activation upon metabolic stress induced by obesity and type 2 diabetes</article-title><source>Nature Medicine</source><volume>22</volume><fpage>780</fpage><lpage>791</lpage><pub-id pub-id-type="doi">10.1038/nm.4114</pub-id><pub-id pub-id-type="pmid">27270589</pub-id></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Farley</surname><given-names>EK</given-names></name><name><surname>Olson</surname><given-names>KM</given-names></name><name><surname>Zhang</surname><given-names>W</given-names></name><name><surname>Brandt</surname><given-names>AJ</given-names></name><name><surname>Rokhsar</surname><given-names>DS</given-names></name><name><surname>Levine</surname><given-names>MS</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Suboptimization of developmental enhancers</article-title><source>Science</source><volume>350</volume><fpage>325</fpage><lpage>328</lpage><pub-id pub-id-type="doi">10.1126/science.aac6948</pub-id><pub-id pub-id-type="pmid">26472909</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Farley</surname><given-names>EK</given-names></name><name><surname>Olson</surname><given-names>KM</given-names></name><name><surname>Zhang</surname><given-names>W</given-names></name><name><surname>Rokhsar</surname><given-names>DS</given-names></name><name><surname>Levine</surname><given-names>MS</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers</article-title><source>PNAS</source><volume>113</volume><fpage>6508</fpage><lpage>6513</lpage><pub-id pub-id-type="doi">10.1073/pnas.1605085113</pub-id><pub-id pub-id-type="pmid">27155014</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Freund</surname><given-names>CL</given-names></name><name><surname>Gregory-Evans</surname><given-names>CY</given-names></name><name><surname>Furukawa</surname><given-names>T</given-names></name><name><surname>Papaioannou</surname><given-names>M</given-names></name><name><surname>Looser</surname><given-names>J</given-names></name><name><surname>Ploder</surname><given-names>L</given-names></name><name><surname>Bellingham</surname><given-names>J</given-names></name><name><surname>Ng</surname><given-names>D</given-names></name><name><surname>Herbrick</surname><given-names>JAS</given-names></name><name><surname>Duncan</surname><given-names>A</given-names></name><name><surname>Scherer</surname><given-names>SW</given-names></name><name><surname>Tsui</surname><given-names>LC</given-names></name><name><surname>Loutradis-Anagnostou</surname><given-names>A</given-names></name><name><surname>Jacobson</surname><given-names>SG</given-names></name><name><surname>Cepko</surname><given-names>CL</given-names></name><name><surname>Bhattacharya</surname><given-names>SS</given-names></name><name><surname>McInnes</surname><given-names>RR</given-names></name></person-group><year iso-8601-date="1997">1997</year><article-title>Cone-rod dystrophy due to mutations in a novel photoreceptor-specific homeobox gene (CRX) essential for maintenance of the photoreceptor</article-title><source>Cell</source><volume>91</volume><fpage>543</fpage><lpage>553</lpage><pub-id pub-id-type="doi">10.1016/s0092-8674(00)80440-7</pub-id><pub-id pub-id-type="pmid">9390563</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Furukawa</surname><given-names>T</given-names></name><name><surname>Morrow</surname><given-names>EM</given-names></name><name><surname>Cepko</surname><given-names>CL</given-names></name></person-group><year iso-8601-date="1997">1997</year><article-title>Crx, 
a novel otx-like homeobox gene, shows photoreceptor-specific expression and regulates photoreceptor differentiation</article-title><source>Cell</source><volume>91</volume><fpage>531</fpage><lpage>541</lpage><pub-id pub-id-type="doi">10.1016/s0092-8674(00)80439-0</pub-id><pub-id pub-id-type="pmid">9390562</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ghandi</surname><given-names>M</given-names></name><name><surname>Lee</surname><given-names>D</given-names></name><name><surname>Mohammad-Noori</surname><given-names>M</given-names></name><name><surname>Beer</surname><given-names>MA</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features</article-title><source>PLOS Computational Biology</source><volume>10</volume><elocation-id>e1003711</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pcbi.1003711</pub-id><pub-id pub-id-type="pmid">25033408</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gisselbrecht</surname><given-names>SS</given-names></name><name><surname>Palagi</surname><given-names>A</given-names></name><name><surname>Kurland</surname><given-names>J</given-names></name><name><surname>Rogers</surname><given-names>JM</given-names></name><name><surname>Ozadam</surname><given-names>H</given-names></name><name><surname>Zhan</surname><given-names>Y</given-names></name><name><surname>Dekker</surname><given-names>J</given-names></name><name><surname>Bulyk</surname><given-names>ML</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Transcriptional silencers in <italic>Drosophila</italic> serve a dual role as transcriptional enhancers in alternate cellular contexts</article-title><source>Molecular 
Cell</source><volume>77</volume><fpage>324</fpage><lpage>337</lpage><pub-id pub-id-type="doi">10.1016/j.molcel.2019.10.004</pub-id><pub-id pub-id-type="pmid">31704182</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Grass</surname><given-names>JA</given-names></name><name><surname>Boyer</surname><given-names>ME</given-names></name><name><surname>Pal</surname><given-names>S</given-names></name><name><surname>Wu</surname><given-names>J</given-names></name><name><surname>Weiss</surname><given-names>MJ</given-names></name><name><surname>Bresnick</surname><given-names>EH</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling</article-title><source>PNAS</source><volume>100</volume><fpage>8811</fpage><lpage>8816</lpage><pub-id pub-id-type="doi">10.1073/pnas.1432147100</pub-id><pub-id pub-id-type="pmid">12857954</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Haeussler</surname><given-names>M</given-names></name><name><surname>Zweig</surname><given-names>AS</given-names></name><name><surname>Tyner</surname><given-names>C</given-names></name><name><surname>Speir</surname><given-names>ML</given-names></name><name><surname>Rosenbloom</surname><given-names>KR</given-names></name><name><surname>Raney</surname><given-names>BJ</given-names></name><name><surname>Lee</surname><given-names>CM</given-names></name><name><surname>Lee</surname><given-names>BT</given-names></name><name><surname>Hinrichs</surname><given-names>AS</given-names></name><name><surname>Gonzalez</surname><given-names>JN</given-names></name><name><surname>Gibson</surname><given-names>D</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><name><surname>Clawson</surname><given-names>H</given-names></name><name><surname>Casper</surname><given-names>J</given-names></name><name><surname>Barber</surname><given-names>GP</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name><name><surname>Kuhn</surname><given-names>RM</given-names></name><name><surname>Kent</surname><given-names>WJ</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>The UCSC Genome Browser database: 2019 update</article-title><source>Nucleic Acids Research</source><volume>47</volume><fpage>D853</fpage><lpage>D858</lpage><pub-id pub-id-type="doi">10.1093/nar/gky1095</pub-id><pub-id pub-id-type="pmid">30407534</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Hao</surname><given-names>H</given-names></name><name><surname>Kim</surname><given-names>DS</given-names></name><name><surname>Klocke</surname><given-names>B</given-names></name><name><surname>Johnson</surname><given-names>KR</given-names></name><name><surname>Cui</surname><given-names>K</given-names></name><name><surname>Gotoh</surname><given-names>N</given-names></name><name><surname>Zang</surname><given-names>C</given-names></name><name><surname>Gregorski</surname><given-names>J</given-names></name><name><surname>Gieser</surname><given-names>L</given-names></name><name><surname>Peng</surname><given-names>W</given-names></name><name><surname>Fann</surname><given-names>Y</given-names></name><name><surname>Seifert</surname><given-names>M</given-names></name><name><surname>Zhao</surname><given-names>K</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Transcriptional regulation of rod photoreceptor homeostasis revealed by in vivo NRL targetome analysis</article-title><source>PLOS Genetics</source><volume>8</volume><elocation-id>e1002649</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pgen.1002649</pub-id><pub-id pub-id-type="pmid">22511886</pub-id></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname><given-names>CR</given-names></name><name><surname>Millman</surname><given-names>KJ</given-names></name><name><surname>van der 
Walt</surname><given-names>SJ</given-names></name><name><surname>Gommers</surname><given-names>R</given-names></name><name><surname>Virtanen</surname><given-names>P</given-names></name><name><surname>Cournapeau</surname><given-names>D</given-names></name><name><surname>Wieser</surname><given-names>E</given-names></name><name><surname>Taylor</surname><given-names>J</given-names></name><name><surname>Berg</surname><given-names>S</given-names></name><name><surname>Smith</surname><given-names>NJ</given-names></name><name><surname>Kern</surname><given-names>R</given-names></name><name><surname>Picus</surname><given-names>M</given-names></name><name><surname>Hoyer</surname><given-names>S</given-names></name><name><surname>van Kerkwijk</surname><given-names>MH</given-names></name><name><surname>Brett</surname><given-names>M</given-names></name><name><surname>Haldane</surname><given-names>A</given-names></name><name><surname>Del Río</surname><given-names>JF</given-names></name><name><surname>Wiebe</surname><given-names>M</given-names></name><name><surname>Peterson</surname><given-names>P</given-names></name><name><surname>Gérard-Marchant</surname><given-names>P</given-names></name><name><surname>Sheppard</surname><given-names>K</given-names></name><name><surname>Reddy</surname><given-names>T</given-names></name><name><surname>Weckesser</surname><given-names>W</given-names></name><name><surname>Abbasi</surname><given-names>H</given-names></name><name><surname>Gohlke</surname><given-names>C</given-names></name><name><surname>Oliphant</surname><given-names>TE</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Array programming with NumPy</article-title><source>Nature</source><volume>585</volume><fpage>357</fpage><lpage>362</lpage><pub-id pub-id-type="doi">10.1038/s41586-020-2649-2</pub-id><pub-id pub-id-type="pmid">32939066</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Hennig</surname><given-names>AK</given-names></name><name><surname>Peng</surname><given-names>GH</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Regulation of photoreceptor gene expression by Crx-associated transcription factor network</article-title><source>Brain Research</source><volume>1192</volume><fpage>114</fpage><lpage>133</lpage><pub-id pub-id-type="doi">10.1016/j.brainres.2007.06.036</pub-id><pub-id pub-id-type="pmid">17662965</pub-id></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hlawatsch</surname><given-names>J</given-names></name><name><surname>Karlstetter</surname><given-names>M</given-names></name><name><surname>Aslanidis</surname><given-names>A</given-names></name><name><surname>Lückoff</surname><given-names>A</given-names></name><name><surname>Walczak</surname><given-names>Y</given-names></name><name><surname>Plank</surname><given-names>M</given-names></name><name><surname>Böck</surname><given-names>J</given-names></name><name><surname>Langmann</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Sterile alpha motif containing 7 (SAMD7) is a novel Crx-regulated transcriptional repressor in the retina</article-title><source>PLOS ONE</source><volume>8</volume><elocation-id>e60633</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pone.0060633</pub-id><pub-id pub-id-type="pmid">23565263</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Hoffman</surname><given-names>MM</given-names></name><name><surname>Buske</surname><given-names>OJ</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Weng</surname><given-names>Z</given-names></name><name><surname>Bilmes</surname><given-names>JA</given-names></name><name><surname>Noble</surname><given-names>WS</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Unsupervised pattern discovery in human chromatin structure through genomic segmentation</article-title><source>Nature Methods</source><volume>9</volume><fpage>473</fpage><lpage>476</lpage><pub-id pub-id-type="doi">10.1038/nmeth.1937</pub-id><pub-id pub-id-type="pmid">22426492</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hsiau</surname><given-names>THC</given-names></name><name><surname>Diaconu</surname><given-names>C</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Lee</surname><given-names>J</given-names></name><name><surname>Cepko</surname><given-names>CL</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>The cis-regulatory logic of the mammalian photoreceptor transcriptional network</article-title><source>PLOS ONE</source><volume>2</volume><elocation-id>e643</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pone.0000643</pub-id><pub-id pub-id-type="pmid">17653270</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Huang</surname><given-names>D</given-names></name><name><surname>Petrykowska</surname><given-names>HM</given-names></name><name><surname>Miller</surname><given-names>BF</given-names></name><name><surname>Elnitski</surname><given-names>L</given-names></name><name><surname>Ovcharenko</surname><given-names>I</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Identification of human silencers by correlating cross-tissue epigenetic profiles and gene expression</article-title><source>Genome Research</source><volume>29</volume><fpage>657</fpage><lpage>667</lpage><pub-id pub-id-type="doi">10.1101/gr.247007.118</pub-id><pub-id pub-id-type="pmid">30886051</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>Z</given-names></name><name><surname>Liang</surname><given-names>N</given-names></name><name><surname>Goñi</surname><given-names>S</given-names></name><name><surname>Damdimopoulos</surname><given-names>A</given-names></name><name><surname>Wang</surname><given-names>C</given-names></name><name><surname>Ballaire</surname><given-names>R</given-names></name><name><surname>Jager</surname><given-names>J</given-names></name><name><surname>Niskanen</surname><given-names>H</given-names></name><name><surname>Han</surname><given-names>H</given-names></name><name><surname>Jakobsson</surname><given-names>T</given-names></name><name><surname>Bracken</surname><given-names>AP</given-names></name><name><surname>Aouadi</surname><given-names>M</given-names></name><name><surname>Venteclef</surname><given-names>N</given-names></name><name><surname>Kaikkonen</surname><given-names>MU</given-names></name><name><surname>Fan</surname><given-names>R</given-names></name><name><surname>Treuter</surname><given-names>E</given-names></name></person-group><year 
iso-8601-date="2021">2021</year><article-title>The corepressors GPS2 and SMRT control enhancer and silencer remodeling via eRNA transcription during inflammatory activation of macrophages</article-title><source>Molecular Cell</source><volume>81</volume><fpage>953</fpage><lpage>968</lpage><pub-id pub-id-type="doi">10.1016/j.molcel.2020.12.040</pub-id><pub-id pub-id-type="pmid">33503407</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hughes</surname><given-names>AEO</given-names></name><name><surname>Enright</surname><given-names>JM</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Shen</surname><given-names>SQ</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Cell type-specific epigenomic analysis reveals a uniquely closed chromatin architecture in mouse rod photoreceptors</article-title><source>Scientific Reports</source><volume>7</volume><elocation-id>43184</elocation-id><pub-id pub-id-type="doi">10.1038/srep43184</pub-id><pub-id pub-id-type="pmid">28256534</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hughes</surname><given-names>AEO</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>A massively parallel reporter assay reveals context-dependent activity of homeodomain binding sites in vivo</article-title><source>Genome Research</source><volume>28</volume><fpage>1520</fpage><lpage>1531</lpage><pub-id pub-id-type="doi">10.1101/gr.231886.117</pub-id><pub-id pub-id-type="pmid">30158147</pub-id></element-citation></ref><ref 
id="bib33"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname><given-names>JD</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Matplotlib: A 2D Graphics Environment</article-title><source>Computing in Science &amp; Engineering</source><volume>9</volume><fpage>90</fpage><lpage>95</lpage><pub-id pub-id-type="doi">10.1109/MCSE.2007.55</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Irie</surname><given-names>S</given-names></name><name><surname>Sanuki</surname><given-names>R</given-names></name><name><surname>Muranishi</surname><given-names>Y</given-names></name><name><surname>Kato</surname><given-names>K</given-names></name><name><surname>Chaya</surname><given-names>T</given-names></name><name><surname>Furukawa</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Rax Homeoprotein Regulates Photoreceptor Cell Maturation and Survival in Association with Crx in the Postnatal Mouse Retina</article-title><source>Molecular and Cellular Biology</source><volume>35</volume><fpage>2583</fpage><lpage>2596</lpage><pub-id pub-id-type="doi">10.1128/MCB.00048-15</pub-id><pub-id pub-id-type="pmid">25986607</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Iype</surname><given-names>T</given-names></name><name><surname>Taylor</surname><given-names>DG</given-names></name><name><surname>Ziesmann</surname><given-names>SM</given-names></name><name><surname>Garmey</surname><given-names>JC</given-names></name><name><surname>Watada</surname><given-names>H</given-names></name><name><surname>Mirmira</surname><given-names>RG</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>The transcriptional 
repressor Nkx6.1 also functions as a deoxyribonucleic acid context-dependent transcriptional activator during pancreatic beta-cell differentiation: evidence for feedback activation of the nkx6.1 gene by Nkx6.1</article-title><source>Molecular Endocrinology</source><volume>18</volume><fpage>1363</fpage><lpage>1375</lpage><pub-id pub-id-type="doi">10.1210/me.2004-0006</pub-id><pub-id pub-id-type="pmid">15056733</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname><given-names>L</given-names></name><name><surname>Oh</surname><given-names>ECT</given-names></name><name><surname>Ng</surname><given-names>L</given-names></name><name><surname>Srinivas</surname><given-names>M</given-names></name><name><surname>Brooks</surname><given-names>M</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name><name><surname>Forrest</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Retinoid-related orphan nuclear receptor RORbeta is an early-acting factor in rod photoreceptor development</article-title><source>PNAS</source><volume>106</volume><fpage>17534</fpage><lpage>17539</lpage><pub-id pub-id-type="doi">10.1073/pnas.0902425106</pub-id><pub-id pub-id-type="pmid">19805139</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname><given-names>J</given-names></name><name><surname>Cai</surname><given-names>H</given-names></name><name><surname>Zhou</surname><given-names>Q</given-names></name><name><surname>Levine</surname><given-names>M</given-names></name></person-group><year iso-8601-date="1993">1993</year><article-title>Conversion of a dorsal-dependent silencer into an enhancer: evidence for dorsal corepressors</article-title><source>The EMBO 
Journal</source><volume>12</volume><fpage>3201</fpage><lpage>3209</lpage><pub-id pub-id-type="doi">10.1002/j.1460-2075.1993.tb05989.x</pub-id><pub-id pub-id-type="pmid">8344257</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname><given-names>KD</given-names></name><name><surname>Kim</surname><given-names>SI</given-names></name><name><surname>Bresnick</surname><given-names>EH</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>Differential sensitivities of transcription factor target genes underlie cell type-specific gene expression profiles</article-title><source>PNAS</source><volume>103</volume><fpage>15939</fpage><lpage>15944</lpage><pub-id pub-id-type="doi">10.1073/pnas.0604041103</pub-id><pub-id pub-id-type="pmid">17043224</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Junion</surname><given-names>G</given-names></name><name><surname>Spivakov</surname><given-names>M</given-names></name><name><surname>Girardot</surname><given-names>C</given-names></name><name><surname>Braun</surname><given-names>M</given-names></name><name><surname>Gustafson</surname><given-names>EH</given-names></name><name><surname>Birney</surname><given-names>E</given-names></name><name><surname>Furlong</surname><given-names>EEM</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>A transcription factor collective defines cardiac cell fate and reflects lineage history</article-title><source>Cell</source><volume>148</volume><fpage>473</fpage><lpage>486</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2012.01.030</pub-id><pub-id pub-id-type="pmid">22304916</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="software"><person-group 
person-group-type="author"><name><surname>Kinney</surname><given-names>JB</given-names></name></person-group><year iso-8601-date="2021">2021</year><data-title>Logomaker</data-title><source>GitHub</source><ext-link ext-link-type="uri" xlink:href="https://github.com/jbkinney/logomaker">https://github.com/jbkinney/logomaker</ext-link></element-citation></ref><ref id="bib41"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kelley</surname><given-names>DR</given-names></name><name><surname>Snoek</surname><given-names>J</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks</article-title><source>Genome Research</source><volume>26</volume><fpage>990</fpage><lpage>999</lpage><pub-id pub-id-type="doi">10.1101/gr.200535.115</pub-id><pub-id pub-id-type="pmid">27197224</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname><given-names>A</given-names></name><name><surname>Fornes</surname><given-names>O</given-names></name><name><surname>Stigliani</surname><given-names>A</given-names></name><name><surname>Gheorghe</surname><given-names>M</given-names></name><name><surname>Castro-Mondragon</surname><given-names>JA</given-names></name><name><surname>van der 
Lee</surname><given-names>R</given-names></name><name><surname>Bessy</surname><given-names>A</given-names></name><name><surname>Chèneby</surname><given-names>J</given-names></name><name><surname>Kulkarni</surname><given-names>SR</given-names></name><name><surname>Tan</surname><given-names>G</given-names></name><name><surname>Baranasic</surname><given-names>D</given-names></name><name><surname>Arenillas</surname><given-names>DJ</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Vandepoele</surname><given-names>K</given-names></name><name><surname>Lenhard</surname><given-names>B</given-names></name><name><surname>Ballester</surname><given-names>B</given-names></name><name><surname>Wasserman</surname><given-names>WW</given-names></name><name><surname>Parcy</surname><given-names>F</given-names></name><name><surname>Mathelier</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework</article-title><source>Nucleic Acids Research</source><volume>46</volume><elocation-id>D1284</elocation-id><pub-id pub-id-type="doi">10.1093/nar/gkx1188</pub-id><pub-id pub-id-type="pmid">29161433</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kimura</surname><given-names>A</given-names></name><name><surname>Singh</surname><given-names>D</given-names></name><name><surname>Wawrousek</surname><given-names>EF</given-names></name><name><surname>Kikuchi</surname><given-names>M</given-names></name><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Shinohara</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2000">2000</year><article-title>Both PCE-1/RX and OTX/CRX interactions are necessary for photoreceptor-specific gene 
expression</article-title><source>The Journal of Biological Chemistry</source><volume>275</volume><fpage>1152</fpage><lpage>1160</lpage><pub-id pub-id-type="doi">10.1074/jbc.275.2.1152</pub-id><pub-id pub-id-type="pmid">10625658</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Klemm</surname><given-names>SL</given-names></name><name><surname>Shipony</surname><given-names>Z</given-names></name><name><surname>Greenleaf</surname><given-names>WJ</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Chromatin accessibility and the regulatory epigenome</article-title><source>Nature Reviews. Genetics</source><volume>20</volume><fpage>207</fpage><lpage>220</lpage><pub-id pub-id-type="doi">10.1038/s41576-018-0089-8</pub-id><pub-id pub-id-type="pmid">30675018</pub-id></element-citation></ref><ref id="bib45"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koike</surname><given-names>C</given-names></name><name><surname>Nishida</surname><given-names>A</given-names></name><name><surname>Ueno</surname><given-names>S</given-names></name><name><surname>Saito</surname><given-names>H</given-names></name><name><surname>Sanuki</surname><given-names>R</given-names></name><name><surname>Sato</surname><given-names>S</given-names></name><name><surname>Furukawa</surname><given-names>A</given-names></name><name><surname>Aizawa</surname><given-names>S</given-names></name><name><surname>Matsuo</surname><given-names>I</given-names></name><name><surname>Suzuki</surname><given-names>N</given-names></name><name><surname>Kondo</surname><given-names>M</given-names></name><name><surname>Furukawa</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Functional roles of Otx2 transcription factor in postnatal mouse retinal 
development</article-title><source>Molecular and Cellular Biology</source><volume>27</volume><fpage>8318</fpage><lpage>8329</lpage><pub-id pub-id-type="doi">10.1128/MCB.01209-07</pub-id><pub-id pub-id-type="pmid">17908793</pub-id></element-citation></ref><ref id="bib46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kulakovskiy</surname><given-names>IV</given-names></name><name><surname>Vorontsov</surname><given-names>IE</given-names></name><name><surname>Yevshin</surname><given-names>IS</given-names></name><name><surname>Sharipov</surname><given-names>RN</given-names></name><name><surname>Fedorova</surname><given-names>AD</given-names></name><name><surname>Rumynskiy</surname><given-names>EI</given-names></name><name><surname>Medvedeva</surname><given-names>YA</given-names></name><name><surname>Magana-Mora</surname><given-names>A</given-names></name><name><surname>Bajic</surname><given-names>VB</given-names></name><name><surname>Papatsenko</surname><given-names>DA</given-names></name><name><surname>Kolpakov</surname><given-names>FA</given-names></name><name><surname>Makeev</surname><given-names>VJ</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis</article-title><source>Nucleic Acids Research</source><volume>46</volume><fpage>D252</fpage><lpage>D259</lpage><pub-id pub-id-type="doi">10.1093/nar/gkx1106</pub-id><pub-id pub-id-type="pmid">29140464</pub-id></element-citation></ref><ref id="bib47"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kwasnieski</surname><given-names>JC</given-names></name><name><surname>Mogno</surname><given-names>I</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Complex effects of nucleotide variants in a mammalian cis-regulatory element</article-title><source>PNAS</source><volume>109</volume><fpage>19498</fpage><lpage>19503</lpage><pub-id pub-id-type="doi">10.1073/pnas.1210678109</pub-id><pub-id pub-id-type="pmid">23129659</pub-id></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kwasnieski</surname><given-names>JC</given-names></name><name><surname>Fiore</surname><given-names>C</given-names></name><name><surname>Chaudhari</surname><given-names>HG</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>High-throughput functional testing of ENCODE segmentation predictions</article-title><source>Genome Research</source><volume>24</volume><fpage>1595</fpage><lpage>1602</lpage><pub-id pub-id-type="doi">10.1101/gr.173518.114</pub-id><pub-id pub-id-type="pmid">25035418</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>J</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Williams</surname><given-names>N</given-names></name><name><surname>Abdelaziz</surname><given-names>M</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Quantitative fine-tuning of 
photoreceptor cis-regulatory elements through affinity modulation of transcription factor binding sites</article-title><source>Gene Therapy</source><volume>17</volume><fpage>1390</fpage><lpage>1399</lpage><pub-id pub-id-type="doi">10.1038/gt.2010.77</pub-id><pub-id pub-id-type="pmid">20463752</pub-id></element-citation></ref><ref id="bib50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>D</given-names></name><name><surname>Karchin</surname><given-names>R</given-names></name><name><surname>Beer</surname><given-names>MA</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Discriminative prediction of mammalian enhancers from DNA sequence</article-title><source>Genome Research</source><volume>21</volume><fpage>2167</fpage><lpage>2180</lpage><pub-id pub-id-type="doi">10.1101/gr.121905.111</pub-id><pub-id pub-id-type="pmid">21875935</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lerner</surname><given-names>LE</given-names></name><name><surname>Peng</surname><given-names>GH</given-names></name><name><surname>Gribanova</surname><given-names>YE</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Farber</surname><given-names>DB</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>Sp4 is expressed in retinal neurons, activates transcription of photoreceptor-specific genes, and synergizes with Crx</article-title><source>The Journal of Biological Chemistry</source><volume>280</volume><fpage>20642</fpage><lpage>20650</lpage><pub-id pub-id-type="doi">10.1074/jbc.M500957200</pub-id><pub-id pub-id-type="pmid">15781457</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Liu</surname><given-names>YR</given-names></name><name><surname>Laghari</surname><given-names>ZA</given-names></name><name><surname>Novoa</surname><given-names>CA</given-names></name><name><surname>Hughes</surname><given-names>J</given-names></name><name><surname>Webster</surname><given-names>JRM</given-names></name><name><surname>Goodwin</surname><given-names>PE</given-names></name><name><surname>Wheatley</surname><given-names>SP</given-names></name><name><surname>Scotting</surname><given-names>PJ</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Sox2 acts as a transcriptional repressor in neural stem cells</article-title><source>BMC Neuroscience</source><volume>15</volume><elocation-id>95</elocation-id><pub-id pub-id-type="doi">10.1186/1471-2202-15-95</pub-id><pub-id pub-id-type="pmid">25103589</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Martínez-Montañés</surname><given-names>F</given-names></name><name><surname>Rienzo</surname><given-names>A</given-names></name><name><surname>Poveda-Huertes</surname><given-names>D</given-names></name><name><surname>Pascual-Ahuir</surname><given-names>A</given-names></name><name><surname>Proft</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Activator and repressor functions of the Mot3 transcription factor in the osmostress response of <italic>Saccharomyces cerevisiae</italic></article-title><source>Eukaryotic Cell</source><volume>12</volume><fpage>636</fpage><lpage>647</lpage><pub-id pub-id-type="doi">10.1128/EC.00037-13</pub-id><pub-id pub-id-type="pmid">23435728</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="confproc"><person-group 
person-group-type="author"><name><surname>McKinney</surname><given-names>W</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Data structures for statistical computing in Python</article-title><conf-name>Proceedings of the 9th Python in Science Conference</conf-name><fpage>51</fpage><lpage>56</lpage></element-citation></ref><ref id="bib55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mears</surname><given-names>AJ</given-names></name><name><surname>Kondo</surname><given-names>M</given-names></name><name><surname>Swain</surname><given-names>PK</given-names></name><name><surname>Takada</surname><given-names>Y</given-names></name><name><surname>Bush</surname><given-names>RA</given-names></name><name><surname>Saunders</surname><given-names>TL</given-names></name><name><surname>Sieving</surname><given-names>PA</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2001">2001</year><article-title>Nrl is required for rod photoreceptor development</article-title><source>Nature Genetics</source><volume>29</volume><fpage>447</fpage><lpage>452</lpage><pub-id pub-id-type="doi">10.1038/ng774</pub-id><pub-id pub-id-type="pmid">11694879</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mitton</surname><given-names>KP</given-names></name><name><surname>Swain</surname><given-names>PK</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Xu</surname><given-names>S</given-names></name><name><surname>Zack</surname><given-names>DJ</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2000">2000</year><article-title>The leucine zipper of NRL interacts with the CRX homeodomain. 
A possible mechanism of transcriptional synergy in rhodopsin regulation</article-title><source>The Journal of Biological Chemistry</source><volume>275</volume><fpage>29794</fpage><lpage>29799</lpage><pub-id pub-id-type="doi">10.1074/jbc.M003658200</pub-id><pub-id pub-id-type="pmid">10887186</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mitton</surname><given-names>KP</given-names></name><name><surname>Swain</surname><given-names>PK</given-names></name><name><surname>Khanna</surname><given-names>H</given-names></name><name><surname>Dowd</surname><given-names>M</given-names></name><name><surname>Apel</surname><given-names>IJ</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>Interaction of retinal bZIP transcription factor NRL with Flt3-interacting zinc-finger protein Fiz1: possible role of Fiz1 as a transcriptional repressor</article-title><source>Human Molecular Genetics</source><volume>12</volume><fpage>365</fpage><lpage>373</lpage><pub-id pub-id-type="doi">10.1093/hmg/ddg035</pub-id><pub-id pub-id-type="pmid">12566383</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Moore</surname><given-names>JE</given-names></name><name><surname>Purcaro</surname><given-names>MJ</given-names></name><name><surname>Pratt</surname><given-names>HE</given-names></name><name><surname>Epstein</surname><given-names>CB</given-names></name><name><surname>Shoresh</surname><given-names>N</given-names></name><name><surname>Adrian</surname><given-names>J</given-names></name><name><surname>Kawli</surname><given-names>T</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Dobin</surname><given-names>A</given-names></name><name><surname>Kaul</surname><given-names>R</given-names></name><name><surname>Halow</surname><given-names>J</given-names></name><name><surname>Van Nostrand</surname><given-names>EL</given-names></name><name><surname>Freese</surname><given-names>P</given-names></name><name><surname>Gorkin</surname><given-names>DU</given-names></name><name><surname>Shen</surname><given-names>Y</given-names></name><name><surname>He</surname><given-names>Y</given-names></name><name><surname>Mackiewicz</surname><given-names>M</given-names></name><name><surname>Pauli-Behn</surname><given-names>F</given-names></name><name><surname>Williams</surname><given-names>BA</given-names></name><name><surname>Mortazavi</surname><given-names>A</given-names></name><name><surname>Keller</surname><given-names>CA</given-names></name><name><surname>Zhang</surname><given-names>XO</given-names></name><name><surname>Elhajjajy</surname><given-names>SI</given-names></name><name><surname>Huey</surname><given-names>J</given-names></name><name><surname>Dickel</surname><given-names>DE</given-names></name><name><surname>Snetkova</surname><given-names>V</given-names></name><name><surname>Wei</surname><given-names>X</given-names></name><name><surname>Wang</surname><given-names>X</given-names></name><name><surname>Rivera-Mulia</surname><given-names>JC</given-names></name><name><surname>Rozowsky</surname><g
iven-names>J</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Chhetri</surname><given-names>SB</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Victorsen</surname><given-names>A</given-names></name><name><surname>White</surname><given-names>KP</given-names></name><name><surname>Visel</surname><given-names>A</given-names></name><name><surname>Yeo</surname><given-names>GW</given-names></name><name><surname>Burge</surname><given-names>CB</given-names></name><name><surname>Lécuyer</surname><given-names>E</given-names></name><name><surname>Gilbert</surname><given-names>DM</given-names></name><name><surname>Dekker</surname><given-names>J</given-names></name><name><surname>Rinn</surname><given-names>J</given-names></name><name><surname>Mendenhall</surname><given-names>EM</given-names></name><name><surname>Ecker</surname><given-names>JR</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Klein</surname><given-names>RJ</given-names></name><name><surname>Noble</surname><given-names>WS</given-names></name><name><surname>Kundaje</surname><given-names>A</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name><name><surname>Farnham</surname><given-names>PJ</given-names></name><name><surname>Cherry</surname><given-names>JM</given-names></name><name><surname>Myers</surname><given-names>RM</given-names></name><name><surname>Ren</surname><given-names>B</given-names></name><name><surname>Graveley</surname><given-names>BR</given-names></name><name><surname>Gerstein</surname><given-names>MB</given-names></name><name><surname>Pennacchio</surname><given-names>LA</given-names></name><name><surname>Snyder</surname><given-names>MP</given-names></name><name><surname>Bernstein</surname><given-names>BE</given-names></name><name><surname>Wold</surname><given-names>B</given-names></name><name><surname>Hardison</surname>
<given-names>RC</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Stamatoyannopoulos</surname><given-names>JA</given-names></name><name><surname>Weng</surname><given-names>Z</given-names></name><collab>ENCODE Project Consortium</collab></person-group><year iso-8601-date="2020">2020</year><article-title>Expanded encyclopaedias of DNA elements in the human and mouse genomes</article-title><source>Nature</source><volume>583</volume><fpage>699</fpage><lpage>710</lpage><pub-id pub-id-type="doi">10.1038/s41586-020-2493-4</pub-id><pub-id pub-id-type="pmid">32728249</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Morrow</surname><given-names>EM</given-names></name><name><surname>Furukawa</surname><given-names>T</given-names></name><name><surname>Lee</surname><given-names>JE</given-names></name><name><surname>Cepko</surname><given-names>CL</given-names></name></person-group><year iso-8601-date="1999">1999</year><article-title>NeuroD regulates multiple functions in the developing neural retina in rodent</article-title><source>Development</source><volume>126</volume><fpage>23</fpage><lpage>36</lpage><pub-id pub-id-type="doi">10.1242/dev.126.1.23</pub-id><pub-id pub-id-type="pmid">9834183</pub-id></element-citation></ref><ref id="bib60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Murphy</surname><given-names>DP</given-names></name><name><surname>Hughes</surname><given-names>AE</given-names></name><name><surname>Lawrence</surname><given-names>KA</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Cis-regulatory basis of sister cell type divergence in the vertebrate 
retina</article-title><source>eLife</source><volume>8</volume><elocation-id>e48216</elocation-id><pub-id pub-id-type="doi">10.7554/eLife.48216</pub-id><pub-id pub-id-type="pmid">31633482</pub-id></element-citation></ref><ref id="bib61"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ngan</surname><given-names>CY</given-names></name><name><surname>Wong</surname><given-names>CH</given-names></name><name><surname>Tjong</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name><name><surname>Goldfeder</surname><given-names>RL</given-names></name><name><surname>Choi</surname><given-names>C</given-names></name><name><surname>He</surname><given-names>H</given-names></name><name><surname>Gong</surname><given-names>L</given-names></name><name><surname>Lin</surname><given-names>J</given-names></name><name><surname>Urban</surname><given-names>B</given-names></name><name><surname>Chow</surname><given-names>J</given-names></name><name><surname>Li</surname><given-names>M</given-names></name><name><surname>Lim</surname><given-names>J</given-names></name><name><surname>Philip</surname><given-names>V</given-names></name><name><surname>Murray</surname><given-names>SA</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Wei</surname><given-names>CL</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Chromatin interaction analyses elucidate the roles of PRC2-bound silencers in mouse development</article-title><source>Nature Genetics</source><volume>52</volume><fpage>264</fpage><lpage>272</lpage><pub-id pub-id-type="doi">10.1038/s41588-020-0581-x</pub-id><pub-id pub-id-type="pmid">32094912</pub-id></element-citation></ref><ref id="bib62"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pang</surname><given-names>B</given-names></name><name><surname>Snyder</surname><given-names>MP</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Systematic identification of silencers in human cells</article-title><source>Nature Genetics</source><volume>52</volume><fpage>254</fpage><lpage>263</lpage><pub-id pub-id-type="doi">10.1038/s41588-020-0578-5</pub-id><pub-id pub-id-type="pmid">32094911</pub-id></element-citation></ref><ref id="bib63"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Parker</surname><given-names>DS</given-names></name><name><surname>White</surname><given-names>MA</given-names></name><name><surname>Ramos</surname><given-names>A</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name><name><surname>Barolo</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>The cis-regulatory logic of Hedgehog gradient responses: key roles for Gli binding affinity, competition, and cooperativity</article-title><source>Science Signaling</source><volume>4</volume><elocation-id>ra38</elocation-id><pub-id pub-id-type="doi">10.1126/scisignal.2002077</pub-id></element-citation></ref><ref id="bib64"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pedregosa</surname><given-names>F</given-names></name><name><surname>Varoquaux</surname><given-names>G</given-names></name><name><surname>Gramfort</surname><given-names>A</given-names></name><name><surname>Michel</surname><given-names>V</given-names></name><name><surname>Thirion</surname><given-names>B</given-names></name><name><surname>Grisel</surname><given-names>O</given-names></name><name><surname>Blondel</surname><given-names>M</given-names></name><name><surname>Prettenhofer</surname><given-names>P</given-names></name><name><surname>Weiss</surname><given-names>R</given-names></name><name><surname>Dubourg</surname><given-names>V</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Scikit-learn: Machine learning in Python</article-title><source>The Journal of Machine Learning Research</source><volume>12</volume><fpage>2825</fpage><lpage>2830</lpage></element-citation></ref><ref id="bib65"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname><given-names>GH</given-names></name><name><surname>Ahmad</surname><given-names>O</given-names></name><name><surname>Ahmad</surname><given-names>F</given-names></name><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>The photoreceptor-specific nuclear receptor Nr2e3 interacts with CRX and exerts opposing effects on the transcription of rod versus cone genes</article-title><source>Human Molecular Genetics</source><volume>14</volume><fpage>747</fpage><lpage>764</lpage><pub-id pub-id-type="doi">10.1093/hmg/ddi070</pub-id><pub-id pub-id-type="pmid">15689355</pub-id></element-citation></ref><ref id="bib66"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Petrykowska</surname><given-names>HM</given-names></name><name><surname>Vockley</surname><given-names>CM</given-names></name><name><surname>Elnitski</surname><given-names>L</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Detection and characterization of silencers and enhancer-blockers in the greater CFTR locus</article-title><source>Genome Research</source><volume>18</volume><fpage>1238</fpage><lpage>1246</lpage><pub-id pub-id-type="doi">10.1101/gr.073817.107</pub-id><pub-id pub-id-type="pmid">18436892</pub-id></element-citation></ref><ref id="bib67"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Phillips</surname><given-names>R</given-names></name><name><surname>Kondev</surname><given-names>J</given-names></name><name><surname>Theriot</surname><given-names>J</given-names></name><name><surname>Garcia</surname><given-names>HG</given-names></name><name><surname>Orme</surname><given-names>N</given-names></name></person-group><year iso-8601-date="2012">2012</year><source>Physical Biology of the Cell</source><publisher-name>Garland Science</publisher-name><pub-id pub-id-type="doi">10.1201/9781134111589</pub-id></element-citation></ref><ref id="bib68"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname><given-names>AR</given-names></name><name><surname>Hall</surname><given-names>IM</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Bedtools: A flexible suite of utilities for comparing genomic features</article-title><source>Bioinformatics</source><volume>26</volume><fpage>841</fpage><lpage>842</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id><pub-id pub-id-type="pmid">20110278</pub-id></element-citation></ref><ref id="bib69"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Rachmin</surname><given-names>I</given-names></name><name><surname>Amsalem</surname><given-names>E</given-names></name><name><surname>Golomb</surname><given-names>E</given-names></name><name><surname>Beeri</surname><given-names>R</given-names></name><name><surname>Gilon</surname><given-names>D</given-names></name><name><surname>Fang</surname><given-names>P</given-names></name><name><surname>Nechushtan</surname><given-names>H</given-names></name><name><surname>Kay</surname><given-names>G</given-names></name><name><surname>Guo</surname><given-names>M</given-names></name><name><surname>Yiqing</surname><given-names>PL</given-names></name><name><surname>Foo</surname><given-names>RSY</given-names></name><name><surname>Fisher</surname><given-names>DE</given-names></name><name><surname>Razin</surname><given-names>E</given-names></name><name><surname>Tshori</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>FHL2 switches MITF from activator to repressor of Erbin expression during cardiac hypertrophy</article-title><source>International Journal of Cardiology</source><volume>195</volume><fpage>85</fpage><lpage>94</lpage><pub-id pub-id-type="doi">10.1016/j.ijcard.2015.05.108</pub-id><pub-id pub-id-type="pmid">26025865</pub-id></element-citation></ref><ref id="bib70"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Rister</surname><given-names>J</given-names></name><name><surname>Razzaq</surname><given-names>A</given-names></name><name><surname>Boodram</surname><given-names>P</given-names></name><name><surname>Desai</surname><given-names>N</given-names></name><name><surname>Tsanis</surname><given-names>C</given-names></name><name><surname>Chen</surname><given-names>H</given-names></name><name><surname>Jukam</surname><given-names>D</given-names></name><name><surname>Desplan</surname><given-names>C</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Single-base pair differences in a shared motif determine differential Rhodopsin expression</article-title><source>Science</source><volume>350</volume><fpage>1258</fpage><lpage>1261</lpage><pub-id pub-id-type="doi">10.1126/science.aab3417</pub-id><pub-id pub-id-type="pmid">26785491</pub-id></element-citation></ref><ref id="bib71"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Roger</surname><given-names>JE</given-names></name><name><surname>Hiriyanna</surname><given-names>A</given-names></name><name><surname>Gotoh</surname><given-names>N</given-names></name><name><surname>Hao</surname><given-names>H</given-names></name><name><surname>Cheng</surname><given-names>DF</given-names></name><name><surname>Ratnapriya</surname><given-names>R</given-names></name><name><surname>Kautzmann</surname><given-names>MAI</given-names></name><name><surname>Chang</surname><given-names>B</given-names></name><name><surname>Swaroop</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>OTX2 loss causes rod differentiation defect in CRX-associated congenital blindness</article-title><source>The Journal of Clinical Investigation</source><volume>124</volume><fpage>631</fpage><lpage>643</lpage><pub-id pub-id-type="doi">10.1172/JCI72722</pub-id><pub-id 
pub-id-type="pmid">24382353</pub-id></element-citation></ref><ref id="bib72"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ruzycki</surname><given-names>PA</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>CRX directs photoreceptor differentiation by accelerating chromatin remodeling at specific target sites</article-title><source>Epigenetics &amp; Chromatin</source><volume>11</volume><elocation-id>42</elocation-id><pub-id pub-id-type="doi">10.1186/s13072-018-0212-2</pub-id><pub-id pub-id-type="pmid">30068366</pub-id></element-citation></ref><ref id="bib73"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Samee</surname><given-names>MAH</given-names></name><name><surname>Bruneau</surname><given-names>BG</given-names></name><name><surname>Pollard</surname><given-names>KS</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>A De Novo Shape Motif Discovery Algorithm Reveals Preferences of Transcription Factors for DNA Shape Beyond Sequence Motifs</article-title><source>Cell Systems</source><volume>8</volume><fpage>27</fpage><lpage>42</lpage><pub-id pub-id-type="doi">10.1016/j.cels.2018.12.001</pub-id><pub-id pub-id-type="pmid">30660610</pub-id></element-citation></ref><ref id="bib74"><element-citation publication-type="software"><person-group person-group-type="author"><name><surname>Samee</surname><given-names>MAH</given-names></name></person-group><year iso-8601-date="2021">2021</year><data-title>Shape-motif</data-title><source>Github</source><ext-link ext-link-type="uri" xlink:href="https://github.com/h-samee/shape-motif">https://github.com/h-samee/shape-motif</ext-link></element-citation></ref><ref id="bib75"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Sanuki</surname><given-names>R</given-names></name><name><surname>Omori</surname><given-names>Y</given-names></name><name><surname>Koike</surname><given-names>C</given-names></name><name><surname>Sato</surname><given-names>S</given-names></name><name><surname>Furukawa</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Panky, a novel photoreceptor-specific ankyrin repeat protein, is a transcriptional cofactor that suppresses CRX-regulated photoreceptor genes</article-title><source>FEBS Letters</source><volume>584</volume><fpage>753</fpage><lpage>758</lpage><pub-id pub-id-type="doi">10.1016/j.febslet.2009.12.030</pub-id><pub-id pub-id-type="pmid">20026326</pub-id></element-citation></ref><ref id="bib76"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Segert</surname><given-names>JA</given-names></name><name><surname>Gisselbrecht</surname><given-names>SS</given-names></name><name><surname>Bulyk</surname><given-names>ML</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Transcriptional silencers: Driving gene expression with the brakes on</article-title><source>Trends in Genetics</source><volume>37</volume><fpage>514</fpage><lpage>527</lpage><pub-id pub-id-type="doi">10.1016/j.tig.2021.02.002</pub-id><pub-id pub-id-type="pmid">33712326</pub-id></element-citation></ref><ref id="bib77"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Sethi</surname><given-names>A</given-names></name><name><surname>Gu</surname><given-names>M</given-names></name><name><surname>Gumusgoz</surname><given-names>E</given-names></name><name><surname>Chan</surname><given-names>L</given-names></name><name><surname>Yan</surname><given-names>K-K</given-names></name><name><surname>Rozowsky</surname><given-names>J</given-names></name><name><surname>Barozzi</surname><given-names>I</given-names></name><name><surname>Afzal</surname><given-names>V</given-names></name><name><surname>Akiyama</surname><given-names>JA</given-names></name><name><surname>Plajzer-Frick</surname><given-names>I</given-names></name><name><surname>Yan</surname><given-names>C</given-names></name><name><surname>Novak</surname><given-names>CS</given-names></name><name><surname>Kato</surname><given-names>M</given-names></name><name><surname>Garvin</surname><given-names>TH</given-names></name><name><surname>Pham</surname><given-names>Q</given-names></name><name><surname>Harrington</surname><given-names>A</given-names></name><name><surname>Mannion</surname><given-names>BJ</given-names></name><name><surname>Lee</surname><given-names>EA</given-names></name><name><surname>Fukuda-Yuzawa</surname><given-names>Y</given-names></name><name><surname>Visel</surname><given-names>A</given-names></name><name><surname>Dickel</surname><given-names>DE</given-names></name><name><surname>Yip</surname><given-names>KY</given-names></name><name><surname>Sutton</surname><given-names>R</given-names></name><name><surname>Pennacchio</surname><given-names>LA</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Supervised enhancer prediction with epigenetic pattern recognition and targeted validation</article-title><source>Nature Methods</source><volume>17</volume><fpage>807</fpage><lpage>814</lpage><pub-id 
pub-id-type="doi">10.1038/s41592-020-0907-8</pub-id><pub-id pub-id-type="pmid">32737473</pub-id></element-citation></ref><ref id="bib78"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Spitz</surname><given-names>F</given-names></name><name><surname>Furlong</surname><given-names>EEM</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Transcription factors: From enhancer binding to developmental control</article-title><source>Nature Reviews. Genetics</source><volume>13</volume><fpage>613</fpage><lpage>626</lpage><pub-id pub-id-type="doi">10.1038/nrg3207</pub-id><pub-id pub-id-type="pmid">22868264</pub-id></element-citation></ref><ref id="bib79"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Srinivas</surname><given-names>M</given-names></name><name><surname>Ng</surname><given-names>L</given-names></name><name><surname>Liu</surname><given-names>H</given-names></name><name><surname>Jia</surname><given-names>L</given-names></name><name><surname>Forrest</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>Activation of the blue opsin gene in cone photoreceptor development by retinoid-related orphan receptor beta</article-title><source>Molecular Endocrinology</source><volume>20</volume><fpage>1728</fpage><lpage>1741</lpage><pub-id pub-id-type="doi">10.1210/me.2005-0505</pub-id><pub-id pub-id-type="pmid">16574740</pub-id></element-citation></ref><ref id="bib80"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Stampfel</surname><given-names>G</given-names></name><name><surname>Kazmar</surname><given-names>T</given-names></name><name><surname>Frank</surname><given-names>O</given-names></name><name><surname>Wienerroither</surname><given-names>S</given-names></name><name><surname>Reiter</surname><given-names>F</given-names></name><name><surname>Stark</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Transcriptional regulators form diverse groups with context-dependent regulatory functions</article-title><source>Nature</source><volume>528</volume><fpage>147</fpage><lpage>151</lpage><pub-id pub-id-type="doi">10.1038/nature15545</pub-id><pub-id pub-id-type="pmid">26550828</pub-id></element-citation></ref><ref id="bib81"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tareen</surname><given-names>A</given-names></name><name><surname>Kinney</surname><given-names>JB</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Logomaker: beautiful sequence logos in Python</article-title><source>Bioinformatics</source><volume>36</volume><fpage>2272</fpage><lpage>2274</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btz921</pub-id><pub-id pub-id-type="pmid">31821414</pub-id></element-citation></ref><ref id="bib82"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Virtanen</surname><given-names>P</given-names></name><name><surname>Gommers</surname><given-names>R</given-names></name><name><surname>Oliphant</surname><given-names>TE</given-names></name><name><surname>Haberland</surname><given-names>M</given-names></name><name><surname>Reddy</surname><given-names>T</given-names></name><name><surname>Cournapeau</surname><given-names>D</given-names></name><name><surname>Burovski</surname><given-names>E</given-names></name><name><surname>Peterson</surname><given-names>P</given-names></name><name><surname>Weckesser</surname><given-names>W</given-names></name><name><surname>Bright</surname><given-names>J</given-names></name><name><surname>van der Walt</surname><given-names>SJ</given-names></name><name><surname>Brett</surname><given-names>M</given-names></name><name><surname>Wilson</surname><given-names>J</given-names></name><name><surname>Millman</surname><given-names>KJ</given-names></name><name><surname>Mayorov</surname><given-names>N</given-names></name><name><surname>Nelson</surname><given-names>ARJ</given-names></name><name><surname>Jones</surname><given-names>E</given-names></name><name><surname>Kern</surname><given-names>R</given-names></name><name><surname>Larson</surname><given-names>E</given-names></name><name><surname>Carey</surname><given-names>CJ</given-names></name><name><surname>Polat</surname><given-names>İ</given-names></name><name><surname>Feng</surname><given-names>Y</given-names></name><name><surname>Moore</surname><given-names>EW</given-names></name><name><surname>VanderPlas</surname><given-names>J</given-names></name><name><surname>Laxalde</surname><given-names>D</given-names></name><name><surname>Perktold</surname><given-names>J</given-names></name><name><surname>Cimrman</surname><given-names>R</given-names></name><name><surname>Henriksen</surname><given-names>I</given-names></name><name><surname>Quintero</surname><given-names>EA</given-names></name><name><surname>Harris
</surname><given-names>CR</given-names></name><name><surname>Archibald</surname><given-names>AM</given-names></name><name><surname>Ribeiro</surname><given-names>AH</given-names></name><name><surname>Pedregosa</surname><given-names>F</given-names></name><name><surname>van Mulbregt</surname><given-names>P</given-names></name><collab>SciPy 1.0 Contributors</collab></person-group><year iso-8601-date="2020">2020</year><article-title>SciPy 1.0: Fundamental algorithms for scientific computing in Python</article-title><source>Nature Methods</source><volume>17</volume><fpage>261</fpage><lpage>272</lpage><pub-id pub-id-type="doi">10.1038/s41592-019-0686-2</pub-id><pub-id pub-id-type="pmid">32015543</pub-id></element-citation></ref><ref id="bib83"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Sengel</surname><given-names>C</given-names></name><name><surname>Emerson</surname><given-names>MM</given-names></name><name><surname>Cepko</surname><given-names>CL</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>A gene regulatory network controls the binary fate decision of rod and bipolar cells in the vertebrate retina</article-title><source>Developmental Cell</source><volume>30</volume><fpage>513</fpage><lpage>527</lpage><pub-id pub-id-type="doi">10.1016/j.devcel.2014.07.018</pub-id><pub-id pub-id-type="pmid">25155555</pub-id></element-citation></ref><ref id="bib84"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Webber</surname><given-names>AL</given-names></name><name><surname>Hodor</surname><given-names>P</given-names></name><name><surname>Thut</surname><given-names>CJ</given-names></name><name><surname>Vogt</surname><given-names>TF</given-names></name><name><surname>Zhang</surname><given-names>T</given-names></name><name><surname>Holder</surname><given-names>DJ</given-names></name><name><surname>Petrukhin</surname><given-names>K</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Dual role of Nr2e3 in photoreceptor development and maintenance</article-title><source>Experimental Eye Research</source><volume>87</volume><fpage>35</fpage><lpage>48</lpage><pub-id pub-id-type="doi">10.1016/j.exer.2008.04.006</pub-id><pub-id pub-id-type="pmid">18547563</pub-id></element-citation></ref><ref id="bib85"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>White</surname><given-names>MA</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks</article-title><source>PNAS</source><volume>110</volume><fpage>11952</fpage><lpage>11957</lpage><pub-id pub-id-type="doi">10.1073/pnas.1307449110</pub-id><pub-id pub-id-type="pmid">23818646</pub-id></element-citation></ref><ref id="bib86"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>White</surname><given-names>MA</given-names></name><name><surname>Kwasnieski</surname><given-names>JC</given-names></name><name><surname>Myers</surname><given-names>CA</given-names></name><name><surname>Shen</surname><given-names>SQ</given-names></name><name><surname>Corbo</surname><given-names>JC</given-names></name><name><surname>Cohen</surname><given-names>BA</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>A Simple Grammar Defines Activating and Repressing cis-Regulatory Elements in Photoreceptors</article-title><source>Cell Reports</source><volume>17</volume><fpage>1247</fpage><lpage>1254</lpage><pub-id pub-id-type="doi">10.1016/j.celrep.2016.09.066</pub-id><pub-id pub-id-type="pmid">27783940</pub-id></element-citation></ref><ref id="bib87"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wunderlich</surname><given-names>Z</given-names></name><name><surname>Mirny</surname><given-names>LA</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Different gene regulation strategies revealed by analysis of binding motifs</article-title><source>Trends in Genetics</source><volume>25</volume><fpage>434</fpage><lpage>440</lpage><pub-id pub-id-type="doi">10.1016/j.tig.2009.08.003</pub-id><pub-id pub-id-type="pmid">19815308</pub-id></element-citation></ref><ref id="bib88"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>Z</given-names></name><name><surname>Ding</surname><given-names>K</given-names></name><name><surname>Pan</surname><given-names>L</given-names></name><name><surname>Deng</surname><given-names>M</given-names></name><name><surname>Gan</surname><given-names>L</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>Math5 determines the competence state of retinal ganglion cell 
progenitors</article-title><source>Developmental Biology</source><volume>264</volume><fpage>240</fpage><lpage>254</lpage><pub-id pub-id-type="doi">10.1016/j.ydbio.2003.08.005</pub-id><pub-id pub-id-type="pmid">14623245</pub-id></element-citation></ref><ref id="bib89"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname><given-names>Y</given-names></name><name><surname>Granas</surname><given-names>D</given-names></name><name><surname>Stormo</surname><given-names>GD</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Inferring binding energies from selected binding sites</article-title><source>PLOS Computational Biology</source><volume>5</volume><elocation-id>e1000590</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pcbi.1000590</pub-id><pub-id pub-id-type="pmid">19997485</pub-id></element-citation></ref><ref id="bib90"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>J</given-names></name><name><surname>Troyanskaya</surname><given-names>OG</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Predicting effects of noncoding variants with deep learning-based sequence model</article-title><source>Nature Methods</source><volume>12</volume><fpage>931</fpage><lpage>934</lpage><pub-id pub-id-type="doi">10.1038/nmeth.3547</pub-id><pub-id pub-id-type="pmid">26301843</pub-id></element-citation></ref></ref-list></back><sub-article article-type="decision-letter" id="sa1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.67403.sa1</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Barkai</surname><given-names>Naama</given-names></name><role>Reviewing Editor</role><aff><institution>Weizmann Institute of 
Science</institution><country>Israel</country></aff></contrib></contrib-group><contrib-group><contrib contrib-type="reviewer"><name><surname>Ponting</surname><given-names>Chris P</given-names></name><role>Reviewer</role><aff><institution>University of Edinburgh</institution><country>United Kingdom</country></aff></contrib></contrib-group></front-stub><body><boxed-text id="box1"><p>In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.</p></boxed-text><p><bold>Acceptance summary:</bold></p><p>This manuscript will be of interest to geneticists seeking to establish rules that govern gene regulation. To explain why a sequence enhances, rather than silences, gene transcription the authors draw our attention away from the binding of a single transcription factor, to focus instead on the number and diversity of transcription factor molecules that bind to it. Using a relatively simple metric called sequence information content they appear to be able to improve the prediction of enhancer over silencer sequences. A concern is whether the silencers are true silencers, or whether they only act as such in this specific experimental paradigm.</p><p><bold>Decision letter after peer review:</bold></p><p>[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]</p><p>Thank you for submitting your work entitled "Information content differentiates enhancers from silencers in mouse photoreceptors" for consideration by <italic>eLife</italic>. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. 
The following individual involved in review of your submission has agreed to reveal their identity: Chris P Ponting (Reviewer #1).</p><p>Based on the reviews we received, I am sorry that we cannot offer publication in <italic>eLife</italic>.</p><p>While one of the reviewers was very positive, the other reviewer pointed out substantial weaknesses. We acknowledge that your title clearly makes narrow claims, but for <italic>eLife</italic> we would have hoped that the findings could be generalized.</p><p><italic>Reviewer #1:</italic></p><p>Why can silencers be enhancers in other cell types? Why is it that active chromatin epigenetic marks or binding of a single transcription factor do not reliably predict active enhancers? These are thorny issues in genomics because they hinder our mechanistic understanding of gene transcription regulation.</p><p>In this well-written submission the authors go beyond their previous publications using the same experimental system (White et al., 2013, 2016). They use MPRA for the CRX transcription factor (TF) in explanted mouse retinas to show that epigenetically indistinguishable sequences are classified more accurately as enhancers or silencers by the number and diversity of lineage-specific transcription factor binding motifs that they contain. They separate enhancers from silencers by enhancers' more diverse collection of TF motifs. This distinction is captured in a metric called sequence information content calculated from both TF motif count and diversity. This single metric is slightly worse at predicting strong enhancers over silencers than a model considering the PWMs for 8 TFs.</p><p>1. Whether the authors observe a bias in the linear arrangement of these TFs' motifs that might assist in distinguishing enhancers from silencers?</p><p>2. p10 the choice of the 8 lineage-defining TFs was somewhat arbitrary because of the arbitrary nature of PWM significance thresholds. 
Please justify their choice and number, and comment on how well the model performs when this TF set is altered?</p><p>Does an evolutionary change in information content calculated between orthologous <italic>Mus musculus</italic> and, say, Mus spretus sequence help to separate active (enhancer/silencer) sequence from inactive sequence? (https://doi.org/10.1038/sdata.2016.75).</p><p>p9 please explain further "we chose to represent the zinc finger motif with MAZ based on the PWM qualities".</p><p>p21 why were different FIMO p-value thresholds applied?</p><p>p27 line 562 "silencers as negatives"?</p><p>line 132 "For motifs that matched multiple TFs, we selected one representative TF for downstream analysis (Figure 2—figure supplement 2, Methods)". Please explain further.</p><p><italic>Reviewer #2:</italic></p><p>The authors state that enhancers and silencers often have the same epigenomic profiles and attempt to identify sequence-based information to differentiate between the two types of elements. They use massively parallel reporter assays to test elements that bind CRX for activity in retinal explants. The authors then look for differences in motif content between the elements that act as silencers vs. those that act as enhancers of gene expression from a basal promoter. They find that although enhancers and silencers have motifs for the same transcription factor – CRX, the number of sites and diversity of other TF sites is greater within enhancers. They suggest motif content is a way to distinguish between the two types of elements. I'm not convinced that anything can be determined about silencers using this experimental design.</p><p>Strengths:</p><p>The authors test many putative enhancers in mouse retinas and identify elements whose function requires CRX sites.</p><p>Interestingly, different behaviors of functional elements could not be predicted based on differences in DNA accessibility or ATAC-seq peak or CRX occupancy. 
This is a nice systematic example of how difficult it is to predict an enhancer strength or activity based on differences in epigenomic data and highlights the need for sequence-based approaches to identify the specific activity of an element.</p><p>They do a nice analysis of the inert vs. weak and strong enhancers. The data and analysis of these experiments could be really informative for understanding why not all regions that bind CRX and are within open chromatin are active enhancers.</p><p>Weaknesses:</p><p>I'm concerned that the silencers they detect could be an artifact of the experimental design. The promoter contains CRX sites and NRL sites, so there is some level of basal expression; the silencers are enriched in repressors, so is it just that the elements containing a repressor are silencing the basal transcription? Moreover, what does this mean relative to the elements in the endogenous locus, if an endogenous promoter doesn't have CRX or NRL sites within its promoter or basal transcription does this mean the silencers as described in this assay are not really silencers within the genome. I don't think it is possible to make conclusions about a cis-reg element's silencer capacity based on these experiments.</p><p>In line with this, they find that the silencers bind CRX in combination with a repressive TF. Would they find that enhancers as they define them bind a combination of transcriptional activators and that silencers bind some activators such as CRX in combination with a transcriptional repressor expressed in the cell type where the element acts as a silencer?</p><p>I am also not convinced that silencers and enhancers are different things. If a genomic element controls the time and location of gene expression, then it is an enhancer – enhancers can bind activators and repressors and restrict expression to only particular cell types. 
I think trying to call things enhancers and silencers makes things overly complex, especially considering that the authors point out that the same element can be an enhancer in one tissue type and a silencer in another. I am also concerned about this in relation to my previous comments on the experimental design and the issues demonstrating that a silencer really works this way within the genome.</p><p>As silencers are decreasing expression from a basal promoter and the whole paper is centered on this, I think the choice of promoter in this experiment is critical. I don't think one promoter can be used to draw such conclusions. The authors use a basal promoter for the Rho gene, which contains one NRL site and CRX sites, and these could work in combination with the sites within the elements being tested. I wonder if the elements they're testing would behave the same way with their endogenous promoters or with another promoter that does not contain CRX or NRL sites. Testing these elements with no promoter does not really address this question. It would be helpful to test these elements with the Rho promoter where you mutate the NRL and CRX sites. It would also be helpful to test these elements with a different promoter for another gene that is expressed in the retina, ideally one that doesn't contain CRX and NRL sites, to see if there are enhancer-promoter interactions that are influencing your results and thus your conclusions. If the endogenous promoters don't contain CRX or NRL sites or don't have a basal level of transcription, would your element really be a silencer? 
I don't think it would, and I'm concerned that the results you're seeing are an artifact of your experimental design.</p><p>The paper claims that information content can be used to distinguish between these two classes of element. I would like to see how this compares to the prevalence of transcriptional repressors and activators found in silencers and enhancers, as the only TF mentioned to work with CRX in the silencer is a well-known repressor, Snail. Would the presence of a repressive TF be a better predictor of silencer vs. enhancer activity?</p><p>I would like to see that the information content model performs better than measuring the prevalence of activator and repressor motifs.</p><p>In line with this, the difference in information content between a silencer and an enhancer is 3 motifs for 2 TFs vs. 3 motifs for 3 TFs or 4 motifs for 2 TFs. For the silencers and enhancers, what % of these TF motifs are repressors, and what % are activators?</p><p>In my opinion, the authors should focus on using their data to work out what makes regions under their genomic marks functional enhancers vs. inert elements.</p></body></sub-article><sub-article article-type="reply" id="sa2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.67403.sa2</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p>[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]</p><disp-quote content-type="editor-comment"><p>Reviewer #1:</p><p>Why can silencers be enhancers in other cell types? Why is it that active chromatin epigenetic marks or binding of a single transcription factor do not reliably predict active enhancers? 
These are thorny issues in genomics because they hinder our mechanistic understanding of gene transcription regulation.</p><p>In this well-written submission the authors go beyond their previous publications using the same experimental system (White et al., 2013, 2016). They use MPRA for the CRX transcription factor (TF) in explanted mouse retinas to show that epigenetically indistinguishable sequences are classified more accurately as enhancers or silencers by the number and diversity of lineage-specific transcription factor binding motifs that they contain. They separate enhancers from silencers by enhancers' more diverse collection of TF motifs. This distinction is captured in a metric called sequence information content calculated from both TF motif count and diversity. This single metric is slightly worse at predicting strong enhancers over silencers than a model considering the PWMs for 8 TFs.</p><p>1. Whether the authors observe a bias in the linear arrangement of these TFs' motifs that might assist in distinguishing enhancers from silencers?</p></disp-quote><p>We do not observe a bias: (1) A biased arrangement of non-CRX TFs would be evident as co-occurrence of TF pairs, which we did not observe as shown in Figure 2d. (2) We tested whether TF motifs tended to occur 5’ or 3’ relative to CRX sites and found no significant bias for any TF in either enhancers or silencers.</p><p>We added a sentence on p. 11, lines 239-240 stating that “We also did not observe a bias in the linear arrangement of motifs in strong enhancers (Methods).” We describe the analysis in the methods on p. 31, lines 665-671 in the “Predicted Occupancy” section of the methods.</p><disp-quote content-type="editor-comment"><p>2. p10 The choice of the 8 lineage-defining TFs was somewhat arbitrary because of the arbitrary nature of PWM significance thresholds. 
Please justify their choice and number, and comment on how well the model performs when this TF set is altered?</p></disp-quote><p>The 8 TFs were identified by a de novo enrichment analysis using DREME, as described in the methods on p. 29-30, lines 637-653 in the “Motif Analysis” section of the methods. Aside from these motifs, DREME found no other significant motifs enriched in silencers or enhancers. As described on p. 9, lines 204-206, we also used high-scoring k-mers in the SVM to search for additional motifs that we might have missed in the DREME analysis, but found none.</p><p>Sometimes DREME motifs matched PWMs for multiple TFs. We empirically found that replacing a PWM with one that is highly similar (such as MAZ and Sp4, detailed below in 4) does not change model performance, so we chose PWMs that were both high quality and biologically relevant. Following common practice in motif analysis, we excluded low-information content DREME motifs that did not match known TFs, as well as low-specificity dinucleotide motifs not well represented in our sequence classes. While it is always possible that we missed relevant motifs, we performed a thorough search using the standard methods.</p><p>We also tested the performance of the model trained on the 8 retina TFs against 100 ‘null hypothesis’ models from randomly selected sets of 8 TFs. In each case the retina TF model outperformed the randomly selected TF model. These results are described on p. 9, lines 199-204 and shown in Figure 2—figure supplement 3.</p><disp-quote content-type="editor-comment"><p>3) Does an evolutionary change in information content calculated between orthologous <italic>Mus musculus</italic> and, say, Mus spretus sequence help to separate active (enhancer/silencer) sequence from inactive sequence? (https://doi.org/10.1038/sdata.2016.75)</p></disp-quote><p>We thank the reviewer for this interesting suggestion. We identified orthologous sequences using an alignment between <italic>M. 
musculus</italic> and <italic>M. spretus</italic> from the UCSC browser, and generated information content scores for them. However, the 164 bp orthologous sequences have a median of 98% nucleotide similarity, and 75% of orthologs have at least 94% nucleotide similarity. Because the sequence similarity is so high, there is almost never an evolutionary change in information content between these two species.</p><disp-quote content-type="editor-comment"><p>p9 Please explain further "we chose to represent the zinc finger motif with MAZ based on the PWM qualities". line 132 "For motifs that matched multiple TFs, we selected one representative TF for downstream analysis (Figure 2—figure supplement 2, Methods)". Please explain further.</p></disp-quote><p>We have clarified this point with two changes. On p. 8, lines 175-178 we now state that “we chose to represent the zinc finger motif with MAZ because it has a higher quality score in the HOCOMOCO database (Kulakovskiy et al., 2018).”</p><p>On p. 30, lines 649-651 in the methods we explain why we selected one motif per TF family, with the sentence “Because PWMs from TFs of the same family were so similar, we used one TF for each DREME motif, recognizing that these motifs may be bound by other TFs that recognize similar sites.”</p><p>Rationale: Sp4 and MAZ are both C2H2 zinc fingers with nearly identical PWMs in the HOCOMOCO database. The Sp4 PWM has a “B” grade quality in the HOCOMOCO database, while the MAZ PWM has an “A” grade, indicating that the MAZ motif is higher quality. Furthermore, the MAZ PWM has a length of 11, while the Sp4 PWM has a length of 16; the extra 5 positions all have very low information content. We therefore chose to represent the C2H2 zinc finger motif with MAZ, though the sites may be bound by Sp4 in photoreceptors.</p><p>More generally, TFs that belong to the same family often have PWMs that are highly similar, if not indistinguishable. 
For example, the de novo motif identified with DREME matching NeuroD1 (Figure 2—figure supplement 2a, bottom of second column) also matches ATOH1, NeuroD2, NGN2, OLIG2, ASCL2, and 40 other PWMs, most of which are E-box binding TFs in the “Tal-related factors” family in the HOCOMOCO database. Although paralogous TFs can have differential binding specificities, these differences are not always captured by PWMs. Additionally, our predicted occupancy metric is a sigmoidal function, so it is insensitive to subtle differences between similar PWMs. Including these subtle differences would have yielded redundant information, so we selected only one motif for each de novo motif identified with DREME.</p><disp-quote content-type="editor-comment"><p>p21 Why were different FIMO p-value thresholds applied?</p></disp-quote><p>This is an error we made during the library design process, though in practice it has no impact on the overall motif content of the selected CRX ChIP-seq peaks versus ATAC-seq peaks. We intended to use a FIMO p-value threshold of 2.5 x 10<sup>-3</sup> throughout library design, but did not notice the error until after the library was ordered.</p><p>We added an additional paragraph on p. 24, lines 497-503 of the “Library Design” section in the methods to explain this error:</p><p>“We unintentionally used a FIMO p-value cutoff of 2.3 x 10<sup>-3</sup> to identify CRX motifs in CRX ChIP-seq peaks, rather than the slightly less stringent 2.5 x 10<sup>-3</sup> cutoff used with ATAC-seq peaks or mutating CRX motifs. Due to this anomaly, there may be sequences centered using ShapeMF that should have been centered on a CRX motif, and these motifs would not have been mutated because CRX motifs were not mutated in sequences centered using ShapeMF. However, any intact CRX motifs would still be captured in the residual information content of the mutant sequence.”</p><p>We also reference this new paragraph on p. 
22, lines 457-460 by clarifying the language to read “These sequences were scanned for instances of CRX motifs using FIMO version 4.11.2 (Bailey et al., 2009), a p-value cutoff of 2.3 x 10<sup>-3</sup> (see below), and a CRX PWM derived from an electrophoretic mobility shift assay (J. Lee et al., 2010).”</p><disp-quote content-type="editor-comment"><p>p27 line 562 "silencers as negatives"?</p></disp-quote><p>We have corrected the text accordingly to read “silencers as negatives” rather than “silencers and negatives.”</p><disp-quote content-type="editor-comment"><p>Reviewer #2:</p><p>2.1. “I'm not convinced that anything can be determined about silencers using this experimental design.”</p><p>“I'm concerned that the silencers they detect could be an artifact of the experimental design. The promoter contains CRX sites and NRL sites, so there is some level of basal expression; the silencers are enriched in repressors, so is it just that the elements containing a repressor are silencing the basal transcription?”</p><p>“I don't think it is possible to make conclusions about a cis-reg element's silencer capacity based on these experiments.”</p><p>“If the endogenous promoters don't contain CRX or NRL sites or don't have a basal level of transcription, would your element really be a silencer? I don't think it would, and I'm concerned that the results you're seeing are an artifact of your experimental design.”</p></disp-quote><p>These comments suggest that reporter assays with an active basal promoter are not a valid method for identifying silencers, and that our results are due to the presence of general repressor motifs in some sequences. Since this critique questions the entire basis of our study, we have an extensive response below, along with a list of our revisions.</p><p>1. For both enhancers and silencers, reporter gene assays are a widely accepted method used in high-throughput screens and validation of predicted enhancers and silencers. 
Reporter gene assays, whether for enhancers or silencers, work by testing the effect of a sequence on the activity of a basal promoter. The reviewer raises an important point that a limitation of these assays is that they are conducted with a general basal promoter, rather than the native promoters on which enhancers and silencers act. In spite of this limitation, reporter assays are a well-established method for discovering and validating both enhancers and silencers. Silencers have been identified using reporter gene assays since they were first discovered over 35 years ago (Brand, et al., <italic>Cell</italic> 41:41-48).</p><p>Unlike many reporter assays in the literature, our basal promoter is a natural photoreceptor promoter, which we assayed in living, explanted retina, not in a cell line. Our reporter system is more physiologically natural than many in the literature. We have also observed silencing with other basal promoters in our prior work (see point 2.2 below).</p><p>Finally, our reporter gene experiments are in line with recent, high-profile studies of silencers. A moderately high level of basal transcription from a minimal promoter is needed to detect silencing, and all of these studies use an active basal promoter, often a synthetic one:</p><p>1. EF-1a promoter driving caspase9 (Pang and Snyder, Nat. Genetics 52:253-263).</p><p>2. “Super core promoter” (SCP1) driving GFP (Doni Jayavelu, et al., Nat. Comm. 11:1061).</p><p>3. Strong enhancer upstream of the HSP70 promoter driving GFP (Gisselbrecht, et al. <italic>Mol Cell</italic> 77:324-37).</p><p>4. “HS2” from the human β globin LCR upstream of the SV40 promoter driving luciferase (Petrykowska, et al. Genome Res. 18:1238-46).</p><p>5. An enhancer placed upstream of the SV40 promoter driving luciferase (Huang, et al., Genome Res. 
29:657-67).</p><p>We added two paragraphs of discussion of the literature on silencers in the introduction (pages 3-4, 2nd and 3rd paragraphs, lines 42-75, beginning with “Another challenge…”). We cite all of the reporter gene assays above, and note that their findings are similar to ours: silencers are difficult to distinguish from enhancers because they often reside in open chromatin and bear the epigenetic marks of enhancers.</p><p>We explicitly reference our prior papers showing silencing is not dependent on the <italic>Rho</italic> promoter, on p. 14, lines 307-310, beginning with “This confirms that the distinction between silencers and enhancers does not depend on the <italic>Rho</italic> promoter…” On the same page, lines 306-307, we also added text noting that fewer than 3% of autonomous sequences are from the silencer class.</p><p>Additionally, we have added a new analysis to support the idea that these silencers function as such in the genome (see below).</p><p>2. The presence of general repressor sites does not account for our results – there is extensive evidence that CRX itself is necessary for silencing. The reviewer suggests that silencing may be due to a general effect of repressor motifs acting on the basal promoter, but only 13% of our silencer sequences contain the GFI1 repressor motif (Figure 2c). We found no enrichment of other repressor motifs. Rather than silencing being the effect of repressor motifs, there is strong evidence that CRX itself is required for silencing.</p><p>CRX is expressed in photoreceptors and bipolar cells, which encompass more than 15 different cell types with divergent transcriptional programs (Murphy et al., <italic>eLife</italic> 8:e48216). The role of CRX is to selectively activate and repress cell type-specific genes, in cooperation with different cofactors. For example, CRX interacts with other TFs to repress cone photoreceptor genes in rods (Peng et al. <italic>Hum Mol. Genet.</italic> 14:747-64). 
A massively parallel reporter gene assay in live retina, with a native photoreceptor basal promoter, is a physiologically relevant way to study CRX-dependent silencing at scale.</p><p>We previously showed that CRX is necessary for silencing: silencers in our reporter assay were de-repressed in <italic>Crx<sup>-/-</sup></italic> retina (White, et al. <italic>Cell Rep.</italic> 17:1247-54). In the same study we found that CRX ChIP-seq peaks with sequence features of silencers were near genes that are de-repressed in <italic>Crx<sup>-/-</sup></italic> retina. We showed then and in the current study that the silencing effect depends on intact CRX sites (Figure 5a).</p><p>The specific mechanism of CRX-directed repression is unclear, though there are other examples of transcriptional activators behaving as repressors in the same cell type, acting at binding sites that mediate both activation and repression. GATA-1 is an activator in K562 cells, positively interacting with the co-activator CBP, but GATA-1 also represses the <italic>GATA-2</italic> promoter by displacing CBP and GATA-2 (which auto-activates its own promoter). Sequence context is critical here: only a subset of GATA binding sites mediate GATA-1 based repression, for unknown reasons (Grass, et al. <italic>PNAS</italic> 100:8811-16). Our work bears directly on this question of how sequence context governs whether TF binding sites act to enhance or silence transcription.</p><p>To test the genomic relevance of our reporter assay results, we added a new analysis. Similar to our previous work (White, et al. <italic>Cell Rep.</italic> 17:1247-54), we looked at the proximity of our MPRA silencers and enhancers to genes that were up-regulated or down-regulated in <italic>Crx<sup>-/-</sup></italic> retina. 
Consistent with the reporter assay results, we found that genes that are <italic>down-regulated</italic> upon loss of CRX are more likely to be near sequences we classed as enhancers, while <italic>up-regulated</italic> genes are more likely to be near silencers (odds ratio 2.1). These results support our finding that silencers in the reporter assay act as CRX-dependent silencers in the genome. We describe this result on p. 6-7, lines 133-139, beginning with “To test whether these sequences function as CRX-dependent enhancers and silencers in the genome…” We added details on the analysis to the methods on p. 29, lines 626-635 under the heading “RNA-seq Analysis”.</p><p>We added a more extensive description of our prior work on CRX-dependent repression in the discussion on p. 17-18, lines 387-402, beginning with “Consistent with this, we previously showed…”</p><p>3. Repressors in the retina act on CRX motifs and other homeodomain motifs. Rather than repressor motifs, it is CRX motifs in a particular sequence context that are critical for silencing. Other repressors (such as Vsx2) have been shown to bind CRX or RAX homeodomain sites (Dorval, et al. <italic>J Biol Chem.</italic> 281:744-51; Clark, et al. <italic>Brain Res</italic> 1192:99-113), or interact with CRX itself (Sanuki, et al., <italic>FEBS Lett.</italic> 584:753-8; Hlawatsch, et al. <italic>PLOS One</italic> 8:e60633). A combination of CRX and RAX sites in the <italic>Gnb3</italic> promoter selectively silences this promoter in bipolar cells (Murphy et al., <italic>eLife</italic> 8:e48216). Additionally, studies of photoreceptor repressors often use a reporter gene with the <italic>Rho</italic> basal promoter (for example, Sanuki, et al., <italic>FEBS Lett.</italic> 584:753-8; Cheng, et al., <italic>Hum Mol Genet</italic> 13:1563-75; Mitton, et al., <italic>Hum Mol Genet</italic> 12:365-73). 
Our use of the <italic>Rho</italic> basal reporter to study CRX-dependent activation and repression has deep support in the literature.</p><p>Silencers in our assay have high CRX/homeodomain motif content, which could make them targets for other photoreceptor-specific repressors. In <italic>Drosophila</italic> photoreceptors, selective silencing of opsin genes is achieved by a repressor, Dve, which binds the motifs of Otd, an ortholog of CRX (Rister, et al. <italic>Science</italic> 350:1258-61). A similar mechanism may operate at some genes in the mammalian retina.</p><p>We added a discussion of the potential mechanisms of repression by CRX, and of the evidence that other TFs act through CRX motifs, on p. 18, lines 404-421, the paragraph beginning with “The contrast in motif diversity…”.</p><disp-quote content-type="editor-comment"><p>2.2. “Moreover, what does this mean relative to the elements in the endogenous locus, if an endogenous promoter doesn't have CRX or NRL sites within its promoter or basal transcription does this mean the silencers as described in this assay are not really silencers within the genome.”</p><p>“As silencers are decreasing expression from a basal promoter and the whole paper is centered on this, I think the choice of promoter in this experiment is critical. I don't think one promoter can be used to draw such conclusions… It would be helpful to test these elements with the Rho promoter where you mutate the NRL and CRX sites. 
It would also be helpful to test these elements with a different promoter for another gene that is expressed in the retina, ideally one that doesn't contain CRX and NRL sites, to see if there are enhancer-promoter interactions that are influencing your results and thus your conclusions.”</p></disp-quote><p>As noted in 2.1, we now report supporting evidence that CRX-dependent silencers act as such in their genomic context: silencers are enriched near genes that are de-repressed in <italic>Crx<sup>-/-</sup></italic> retina.</p><p>Additionally, we have two responses to the reasonable suggestion to try other promoters:</p><p>1. We tested other promoters in our prior work and observed silencing. In White, et al. <italic>Cell Rep.</italic> 17:1247-54, Figure 2B, we showed an NRL motif is not necessary for silencing. Sequences with high CRX predicted occupancy were inactive with a promoter lacking an NRL motif. In Hughes, et al. <italic>Genome Res.</italic> 28:1520-31, Figure 2B and Supplemental Figure 8, two of us (J.C.C. and C.A.M.) showed that silencing occurs with the <italic>Crx</italic> promoter.</p><p>2. CRX motifs are very highly enriched in photoreceptor promoters. Testing our silencers with a promoter lacking CRX sites is not relevant, since we make no claim that these are general silencers. Our aim is to understand how CRX selectively regulates its targets. As we discussed above in 2.1, silencing depends on CRX, or possibly on repressors that bind CRX sites. CRX motifs are very highly enriched in open chromatin in photoreceptors and bipolar cells (Murphy et al., <italic>eLife</italic> 8:e48216) and common in photoreceptor promoters (Hsiau, et al., <italic>PLOS One</italic> 2:e643).</p><p>We explicitly cite our prior work showing silencing with promoters other than <italic>Rho</italic> on p. 
14, lines 307-310, the sentence: “This confirms that the distinction between silencers and enhancers does not depend on the <italic>Rho</italic> promoter, and the result is consistent with our previous finding that CRX-targeted silencers repress other promoters (Hughes et al., 2018; White et al., 2016).”</p><p>As described in 2.1, we have now included more discussion of prior work and the evidence for the role of CRX to both enhance and silence target genes. This new material is in the introduction (p. 3-4) and the discussion (p. 17-18). In 2.1 above we describe the new analysis added to the paper (p. 6-7) finding that silencers are more likely to be near de-repressed genes in <italic>Crx<sup>-/-</sup></italic> retina, which supports the claim that these are CRX-dependent silencers.</p><disp-quote content-type="editor-comment"><p>2.3. “I am also not convinced that silencers and enhancers are different things. If a genomic element controls the time and location of gene expression, then it is an enhancer – enhancers can bind activators and repressors and restrict expression to only particular cell types. I think trying to call things enhancers, and silencers makes things overly complex, especially considering the fact the authors point out that the same element can be an enhancer in one tissue type and a silencer in another.”</p></disp-quote><p>We use the standard silencer/enhancer terminology in the field. We agree with the reviewer that silencers and enhancers are not necessarily different things. We suggest that the silencers we identify may be enhancers in other retina cell types (p. 18-19, lines 423-429), and we cite multiple recent papers that show this (Pang and Snyder, <italic>Nat. Genetics</italic> 52:253-263; Doni Jayavelu, et al., <italic>Nat. Comm.</italic> 11:1061; Ngan, et al., <italic>Nat. Genetics</italic> 52:264-72; Halfon, <italic>Trends Genet.</italic> 36:149-51; Gisselbrecht, et al., <italic>Mol. Cell</italic> 77:324-337). 
The idea that enhancers also act as silencers has existed in the literature since silencers were first discovered (Brand, et al., <italic>Cell</italic> 51:709-719; Jiang, et al., <italic>EMBO J</italic> 12:3201-3209).</p><p>Our use of ‘silencer’ and ‘enhancer’ is consistent with these studies. A 2021 review of silencers (Segert, et al., <italic>Trends Genet.</italic> 6:514-27) describes them as follows:</p><p>“Silencers are regulatory DNA elements that reduce transcription from their target promoters; they are the repressive counterparts of enhancers. Although discovered decades ago, and despite evidence of their importance in development and disease, silencers have been much less studied than enhancers. Recently, however, a series of papers have reported systematic studies of silencers in various model systems. Silencers are often bifunctional regulatory elements that can also act as enhancers, depending on cellular context…”</p><disp-quote content-type="editor-comment"><p>2.4. “Would they find that enhancers as they define them bind a combination of transcriptional activators and that silencers bind some activators such as CRX in combination with a transcriptional repressor expressed in the cell type where the element acts as a silencer?”</p><p>“The paper claims that information content can be used to distinguish between these two classes of element, I would like to see how this compares to prevalence to transcriptional repressors, and activators found in silencers and enhancers as the only TF mentioned to work with CRX in the silencer is a well-known repressor Snail. Would the presence of a repressive TF be a better predictor of silencer vs. enhancer activity?</p><p>I would like to see that the information content model performs better than measuring the prevalence of activator and repressor motifs.</p><p>In line with this, the difference in information content between a silencer and an enhancer is 3 motifs for 2 tfs vs. 3 motifs for 3 tfs or 4 motifs for 2 tfs. 
For the silencers and enhancers, what % of these TF motifs are repressors, and what % are activators.”</p></disp-quote><p>Repressor motifs are not critical for silencers: We searched extensively for enriched motifs but found only one repressor motif, GFI1 (noted on p. 8, line 178-181, with details on p. 29-30 under “Motif Analysis”). This motif is present in only 13% of silencers (shown in Figure 2c). The presence of this single repressor motif is not enough to classify silencers.</p><p>We compared the performance of our information content model (AUROC 0.634, Figure 3b) to the performance of a model based on the 8 enriched TF motifs (AUROC 0.698, Figure 2a). Both models perform somewhat worse than the SVM (AUROC 0.781, Figure 2a). In the manuscript on p. 12-13, lines 279-284 we describe why the information content model is significant, despite its lower performance than the two other models:</p><p>“[The information content model] is only slightly worse than the model trained on eight TF occupancies despite an eight-fold reduction in the number of features, which is itself comparable to the SVM with 2,080 features. The difference between the two logistic regression models suggests that the specific identities of TF motifs make some contribution to the eight TF model, but that most of the signal captured by the SVM can be described with a single metric that does not assign weights to specific motifs.”</p><p>On p. 16, lines 366-371 we provide further interpretation:</p><p>“These differences [between enhancers and silencers] are captured by our measure of information content. 
Information content, as a single metric, identifies strong enhancers nearly as well as an unbiased set of 2,080 non-redundant 6-mers used for an SVM, indicating that a simple measure of motif number and diversity can capture the key sequence features that distinguish enhancers from other sequences that lie in open chromatin.”</p><p>TFs are often bifunctional and motifs can’t easily be classified into repressors and activators: As we discuss above in 2.1, silencing depends on CRX protein and CRX motifs (which may be bound by other repressors). It is well known that many TFs, like CRX, function to both activate and repress. It is not effective to simply classify sequences based on an inventory of supposed activator or repressor motifs.</p><p>Sequence context matters, which is one of the central claims of our paper. This is why our information content model is significant: it suggests that silencers, at least in photoreceptors, are not defined by repressor motifs, but rather by a lack of diverse binding sites for expressed TFs, which provides the sequence context for CRX-dependent silencing.</p><p>As mentioned above, we added more discussion of the evidence for the role of CRX in both repression and activation. This is found in the introduction (p. 3-4, lines 64-75, beginning with “The TF cone-rod homeobox (CRX) controls selective gene expression…”), and in the discussion (p. 17-18, lines 387-421).</p><p>We added results for RAX to the motif co-occurrence matrix (Figure 2d) so that the figure now represents all enriched TF motifs.</p></body></sub-article></article>