- Open Access
Mutational signature distribution varies with DNA replication timing and strand asymmetry
© The Author(s). 2018
- Received: 6 February 2018
- Accepted: 16 August 2018
- Published: 10 September 2018
DNA replication plays an important role in mutagenesis, yet little is known about how it interacts with other mutagenic processes. Here, we use somatic mutation signatures—each representing a mutagenic process—derived from 3056 patients spanning 19 cancer types to quantify the strand asymmetry of mutational signatures around replication origins and between early and late replicating regions.
We observe that most of the detected mutational signatures are significantly correlated with the timing or direction of DNA replication. The properties of these associations are distinct for different signatures and shed new light on several mutagenic processes. For example, our results suggest that oxidative damage to the nucleotide pool substantially contributes to the mutational landscape of esophageal adenocarcinoma.
Together, our results indicate an interaction between DNA replication, the associated damage repair, and most mutagenic processes.
- DNA replication
- DNA repair
Understanding the mechanisms of mutagenesis in cancer is important for the prevention and treatment of the disease [1, 2]. Mounting evidence suggests replication itself contributes to cancer risk . Copying of DNA is intrinsically asymmetrical, with leading and lagging strands being processed by distinct sets of enzymes , and different genomic regions replicating at defined times during S phase . Previous analyses have focused either on the genome-wide distribution of mutation rate or on the strand specificity of individual base changes. These studies revealed that the average mutation frequency is increased in late-replicating regions [6, 7], and that the asymmetric synthesis of DNA during replication leads to strand-specific frequencies of base changes [8–11]. However, the extent to which DNA replication influences distinct mutational mechanisms, with their manifold possible causes, remains incompletely understood.
Mutational signatures have been established as a powerful approach to quantify the presence of distinct mutational mechanisms in cancer . A mutational signature is a unique combination of the frequencies of all base-pair mutation types (C:G > A:T, T:A > G:C, etc.) and their flanking nucleotides. Since it is usually not known which base in a pair was the source of a mutation, the convention is to annotate mutations from the pyrimidine (C > A, T > A, etc.), leading to 96 possible combinations of mutation types and neighboring bases. Non-negative matrix factorization is used to extract mutational signatures from somatic mutations in cancer samples . This approach has the important advantage of being able to distinguish between processes that have the same major mutation type (such as C > T transitions) but differ in their sequence context. We built upon this feature of mutational signatures and developed a computational framework to identify the replication-strand-specific impact of distinct mutational processes. Using this system, we quantified the replication strand and timing bias of mutational signatures across 19 cancer types. We show that replication affects the distribution of nearly all mutational signatures across the genome, including those that represent chemical mutagens. The unique strand asymmetry and replication timing profile of different signatures reveal novel aspects of the underlying mechanism. For example, we discovered a strong lagging strand bias of T > G mutations in esophageal adenocarcinoma, suggesting an involvement of oxidative damage to the nucleotide pool in the etiology of the disease. Together, our results highlight the critical role of DNA replication and the associated repair in the accumulation of somatic mutations.
Replication bias of mutational signatures
Interestingly, the lack of correlation with replication timing in signature 13 seems to be specific to breast cancers, as other cancer types (such as lung squamous, esophageal adenocarcinoma, and pancreas) show a significant correlation (Additional file 2: Figure S12). Nevertheless, the replication strand asymmetry of both signatures 2 and 13 is significantly present in all these tissues.
Processes directly involving DNA replication or repair
It has previously been proposed that the correlation of overall mutation rate with replication timing (as shown in Fig. 2b) is a direct result of the activity of MMR . In contrast, when splitting the signal by mutational signatures, we observed a more complex relationship. Some MMR signatures in MMR-deficient patients do not significantly correlate with replication timing (sig. 15, 21, 26) or do so only in one direction of replication (such as a negative correlation in the leading direction in sig. 20), whereas others show a clear increase in the late replicated regions (sig. 6 and N4; Additional file 2: Figure S13). We next explored the effect of MMR on correlation with replication timing (irrespective of the replication strand) in all signatures with exposure > 10 in at least four MSI samples. Four signatures exhibited a significant correlation with replication timing in MSI samples (a positive correlation in sig. 6, 18, and N4; negative correlation in sig. 20; Additional file 2: Figure S25). Interestingly, in all these four signatures, the slope of the correlation was significantly steeper in MSI than MSS samples (Additional file 2: Figure S26). Furthermore, some of the other signatures significantly correlated with replication timing in MSS (e.g., sig. 17 and 8) also showed a weak but consistent correlation in MSI. In other signatures (e.g., sig. 1) the correlation is lost. Altogether, these results indicate that MMR is only one of several factors influencing mutagenesis in a timing-dependent manner.
Unexpectedly, two MMR signatures (sig. 6 and N4) showed increased exposures around ORIs (Fig. 4, Additional file 2: Figures S13, S14, S27). Based on experiments in yeast, it has been suggested that MMR is involved in balancing the differences in fidelity of the leading and lagging polymerases , in particular repairing errors made by Pol α , which primes the leading strand at ORIs and each lagging strand Okazaki fragment  and lacks intrinsic proofreading capabilities . It has been recently shown that error-prone Pol α-synthesized DNA is retained in vivo, causing an increase of mutations on the lagging strand . Since regions around ORIs have a higher density of Pol α-synthesized DNA (as discussed, e.g., in ), it is possible that increased exposure to signatures 6 and N4 around ORIs is caused by incomplete repair of Pol α-induced errors. The most common Pol α-induced mismatches normally repaired by MMR are G-dT and C-dT, leading to C > T mutations on the leading strand and C > A mutations on the lagging strand , matching our observations in the MMR-linked signatures. Notably, we also detected weaker but still significant exposure to MMR signatures in samples with seemingly intact mismatch repair (Additional file 2: Figure S14). Replication strand asymmetry in these samples was substantially smaller, but the higher exposure to signatures 6 and N4 around ORIs remained (Additional file 2: Figure S27). These findings are compatible with a model in which one of the functions of mismatch repair is to balance the effect of mis-incorporation of nucleotides by Pol α. Signatures 6, N4, and possibly 26 appear to reflect this mechanism, while the other MMR signatures might be a result of unrelated functions of MMR, such as its involvement in balancing errors made by other polymerases, e.g., Pol δ.
POLE-MUT samples were previously reported to be “ultra-hypermutated” with excessive C > A and C > T mutations on the leading strand [8, 11, 21]. Mutational signature 10 has been associated with mutations in the proofreading domain of Pol ε, the main leading strand polymerase [23, 27]. We noticed that mutational signature 14 is also strongly associated with POLE mutations in the data sets by Shlien et al. , Alexandrov et al. , and in The Cancer Genome Atlas (TCGA). Since then, Andrianova et al. confirmed this observation, showing that signature 14 is enriched in POLE-MUT samples with mismatch repair deficiency . As expected, we observed very strong strand asymmetry for these two signatures in all POLE-MUT samples, with an increase of C > A, C > T, and T > G mutations on the leading strand (Fig. 4, Additional file 2: Figure S15). As with MMR signatures, we also found weak but significant evidence of signature 10 and 14 in samples without Pol ε defects (POLE-WT). Strikingly, however, in these samples the strand asymmetry was in the inverse orientation compared to the POLE-MUT samples, i.e., more C > A, C > T, and T > G mutations on the lagging strand (Fig. 4, Additional file 2: Figure S16). Conversely, we detected the presence of two signatures of unknown etiology, signatures 18 and 28, in POLE-MUT samples, but in the inverse orientation compared to POLE-WT samples. We performed two additional analyses in order to validate that this is not an artifact of spurious/wrong associations in the signature exposures decomposition. First, we removed exposures not robust to perturbations (see “Methods”) and confirmed that all four signatures (10, 14, 18, and 28) remained significantly strand asymmetric in both POLE-MUT and POLE-WT samples (Additional file 2: Figures S17–S18). Second, we directly compared the frequencies of the most prominent mutation types for each of the four signatures (sig. 10, 14, 18, and 28) in POLE-MUT and POLE-WT samples on the leading and lagging strands. The inverse strand preference observed in the signatures was also detected for individual mutation types. For example, the frequency of mutations in TCT > A, TCG > T, and TTT > G, the three major components of signature 10, is higher on the lagging strand than on the leading strand in POLE-WT samples, whereas it is higher on the leading strand in POLE-MUT (Additional file 2: Figures S28–S31). We therefore hypothesize that POLE-linked signatures are originally caused by a process that affects both strands and, under normal circumstances, is slightly enriched on the lagging strand. This could be caused by certain types of DNA lesions which under normal circumstances are less accurately replicated when on the template of the lagging strand (e.g., due to a lower fidelity of Pol δ or Pol α compared to wild-type Pol ε when replicating these lesions). In POLE-MUT samples the lack of replication-associated proofreading would then lead to a strong relative increase in these mutations on the leading strand, explaining the flipped orientation of signatures.
Apart from replication strand asymmetry, we also observed a significant correlation with replication timing in signatures 10, 14, 18, and 28 in POLE-WT samples (Additional file 2: Figure S16). The correlation was significant only in signature 28 in POLE-MUT samples, but at least 75% of samples showed a positive slope of the correlation also in signatures 10, 14, and 18, and the slope was significantly increased in POLE-MUT compared to MSS POLE-WT samples in signatures 10, 18, and 28 (Additional file 2: Figures S15, S16, S32, S33). Future studies with larger sample size are needed to evaluate the effect of POLE mutations and MMR on the replication timing characteristics on this group of mutational signatures.
Signatures linked to environmental mutagens
We next focused on signatures that have not previously been reported to be connected to replication, or for which the causal mechanism is unknown. Our data show a link between DNA replication and exogenous mutagens such as UV light (signature 7), tobacco smoke (signature 4), or aristolochic acid (AA; signature 22) . In these signatures, we observed marked correlation with replication timing (Fig. 4, Additional file 2: Figures S19 and S20). Higher mutation frequency late in replication has been observed in mouse embryonic fibroblast (MEFs) treated with AA or Benzo[a]pyrene (B[a]P; a mutagen in tobacco smoke) . This increased mutagenicity might be attributed to different DNA damage tolerance pathways being active during early and late replication. Regions replicated early in S-phase are thought to prefer high-fidelity template switching, whereas regions replicated late are more likely to require translesion synthesis (TLS), which has a higher error rate [31–37]. This is consistent with the observation in yeast that a disruption of TLS leads to decreased mutation frequency in late-replicating regions and therefore a more even distribution of mutation frequency between early and late-replicating regions . In particular, TLS has been observed to increase in activity and mutagenicity later in the cell cycle when replicating DNA damaged by B[a]P . Alternatively, differences in chromatin accessibility could be responsible for the decreased mutagenicity in early-replicated regions. Open chromatin is, on average, replicated earlier and is also more accessible to repair enzymes, which could contribute to the decreased mutation frequency in early-replicating regions .
We also observed weak but significant replication strand asymmetry in the mutagen-induced signatures in the tissues associated with the respective mutagen (Additional file 2: Figure S19). Signature 4 has a significant strand asymmetry in lung cancers, similar to signature 7 in skin cancers. Signature 22 in kidney cancers has a small sample size but shows the same trend. This matches a previously observed lower efficiency of bypass of DNA damage on the lagging strand  and strong mutational strand asymmetry in cells lacking Pol η, the main TLS polymerase responsible for the replication of UV-induced photolesions . Altogether, our data highlight the importance of replication in converting DNA damage into actual mutations and suggest that bypass of DNA damage occurring on the lagging template results in detectably lower fidelity on this strand.
Signature 17 had the largest median strand asymmetry (sixth largest log2 ratio of the two strands) and also was one of the signatures with the strongest correlations with replication timing (Figs. 2 and 4, Additional file 2: Figure S7). The mutational process causing this signature is unclear. We noted that the timing asymmetry and exposure distribution around ORIs to signature 17 closely resembled that of signatures 4 and 7, suggesting a possible link to DNA damage. Signature 17 is most prominent in gastric cancers and esophageal adenocarcinoma (EAC), where it appears early during disease development , and it is also present in Barrett’s esophagus (BE), a precursor to EAC . Due to the importance of gastro-esophageal and duodeno-gastric reflux in the development of BE and EAC [44–46] and the resulting oxidative stress [47–50], it has been speculated that oxidative damage could cause the mutation patterns characteristic for signature 17 [51, 52]. Increased oxidative damage to guanine has been reported in the epithelial cells of dysplastic BE as well as after incubation of BE tissue with a cocktail mimicking bile reflux . Oxidative stress affects bases in not only the DNA but also the nucleotide pool, such as the oxidation of dGTP to 8-oxo-dGTP. This oxidized dGTP derivative has been shown to induce T > G transversions [53–55] through incorporation by TLS polymerases into DNA opposite A on the template strand . In contrast, oxidation of guanine in the DNA produces 8-oxo-G, which has been shown to result in C > A mutations when paired with adenine during replication . These C > A mutations are normally prevented by DNA glycosylases in the base excision repair pathway, such as MUTYH and OGG1, which repair 8-oxo-G:A pairs to G:C. However, if an 8-oxo-G:A mismatch resulted from incorporation of 8-oxo-dGTP in the de novo synthesized strand, the “repair” to G:C would actually lead to a T > G mutation . Consequently, depletion of MUTYH led to an increase of C > A mutations [57, 58] but a decrease of T > G mutations induced by 8-oxo-dGTP . Importantly, the mismatch of 8-oxo-G and A has been shown in yeast to be more efficiently repaired into G:C when 8-oxo-G is on the lagging strand template [60, 61], resulting in an enrichment of T > G mutations on the lagging strand template if the 8-oxoG:A mismatch originated from incorporation of 8-oxo-dGTP opposite A. Our data show strong lagging-strand bias of T > G mutations and overall higher exposure to signature 17 on the lagging strand, supporting the hypothesis that signature 17 is a by-product of oxidative damage.
DNA methylation-linked mutagenesis
A small but significant strand asymmetry was detected in signature 1 (Additional file 2: Figure S22). This observation is difficult to explain with spontaneous deamination of 5-methylcytosine, the assumed etiology of signature 1. However, the observation would be in line with a previously hypothesized model in which Pol ε has decreased fidelity of replicating 5-methylcytosine, causing an enrichment of C > T mutations in methylated cytosines on the leading strand, especially in samples with deficiency in MMR or Pol ε proofreading . Our analysis shows that signature 1 is slightly enriched on the leading strand, even in samples proficient for post-replicative proofreading and repair. Moreover, we observed that signature 1 is significantly correlated with replication timing in MSS, but not in MSI samples (Additional file 2: Figure S26), in line with the possibility that MMR repairs C > T errors in a CpG context introduced by Pol ε, as MMR is thought to be active primarily in the early-replicated regions .
Our findings demonstrate how the relationship between mutational signatures and DNA replication can help to illuminate the mechanisms underlying several currently unexplained mutational processes, as exemplified by signature 17 in esophageal cancer. Crucially, our computational analysis produces testable hypotheses which we anticipate to be experimentally validated in the future; for instance, that bypass of external-mutagen-induced DNA damage (such as UV light, tobacco smoking, and aristolochic acid) is more error prone during synthesis of the lagging strand, or that oxidative damage to the dNTP pool contributes to the etiology of signature 17. Our results also add a new perspective to the recent debate regarding the correlation of tissue-specific cell division rates with cancer risk . It has been argued that this correlation is primarily attributable to “bad luck” in the form of random errors that are introduced during replication by DNA polymerases. Critics of this theory have pointed out that the range of mutational signatures observed in cancer samples makes a purely replication-driven etiology of cancer mutations unlikely [63, 64]. Our analysis at least partially reconciles the two arguments, showing that most mutational signatures are themselves affected by DNA replication, including signatures linked to environmental mutagens. The presence of mutational signatures on the one hand and a strong relationship between replication and the risk of cancer on the other therefore need not be mutually exclusive. In conclusion, our results provide evidence that DNA replication interacts with most processes that introduce mutations in the genome, suggesting that differences amongst DNA polymerases and post-replicative repair enzymes might play a larger part in the accumulation of mutations than previously appreciated.
Cancer somatic mutations in 3056 whole-genome sequencing samples (Additional file 1: Table S1, Additional file 5: Table S4) were obtained from the data portal of TCGA, the data portal of the International Cancer Genome Consortium (ICGC), and previously published data in peer review journals [12, 21, 51, 65, 66]. For TCGA samples, aligned reads of paired tumor and normal samples were downloaded from the Genomic Data Commons Data Portal website under TCGA access request #10140 and somatic variants were called using Strelka (version 1.0.14)  with default parameters. The status of POLE-MUT and MSI samples were obtained from the supplementary data of [11, 21, 66].
Direction of replication
Left- and right-replicating domains were taken from  where replication timing profiles were generated in six lymphoblastoid cell lines , valleys and peaks (defined as regions with a slope with a magnitude lower than 250 rtu per Mb) were removed, after which left- and right-replicating domains were defined as timing transition regions with a negative and positive slope, respectively . In the left-replicated regions, the reference strand is used as a template for the leading strand, while the opposite strand is used as a template for the lagging strand, and vice versa for the right-replicated regions. Each domain (called territory in the original source code and data) is 20 kbp wide and annotated with the direction of replication and with replication timing.
The following regions were excluded: regions with low unique mappability of sequencing reads (positions with mean mappability in 100-bp sliding windows below 0.99 from UCSC mappability track: alignability of 50mers, accession number wgEncodeEH000320), gencode protein coding genes, and blacklisted regions defined by Anshul Kundaje  (Anshul_Hg19UltraHighSignalArtifactRegions.bed, Duke_Hg19SignalRepeatArtifactRegions.bed, and wgEncodeHg19ConsensusSignalArtifactRegions.bed from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg19-human/).
Mutation frequency analysis
All variants were classified by the pyrimidine of the mutated Watson-Crick base pair (C or T), strand of this base pair (C or T), and the immediate 5′ and 3′ sequence context into 96 possible mutation types as described by Alexandrov et al. . The frequency of trinucleotides on each strand was computed for each replication domain. Then the mutation frequency of each mutation type in each replication domain on the leading (plus = Watson strand in left-replicating domains; minus = Crick strand in right-replicating domains) and lagging strand (vice versa) was computed for each sample.
Extraction of mutational signatures
Matlab code  was used for extraction of strand-specific mutational signatures. The input data were the mutation counts on the leading and lagging strands (summed from all replicating domains together, but without the excluded regions) in each sample. The 192-elements-long mutational signatures (example in Fig. 1b) were extracted in each cancer type separately (for K number of signatures between 2 and 7). The best K with minimal error and maximal stability (minimizing errorK/max(error) + (1 − stabilityK) and with stability of at least 0.8) was selected for each cancer type. The stability metric (computed as average silhouette width of clusters of signatures computed by Non-negative Matrix Factorisation) represents reproducibility of the model, while the error (computed as the average Frobenius reconstruction error) evaluates the accuracy with which the deciphered mutational signatures and their respective exposures describe the original matrix of mutations. Signatures present in only a small number of samples with very low exposures were excluded ((95th percentile of exposures of this signature)/(Mean total exposure per samples) < 0.2). The remaining signatures were then normalized by the frequency of trinucleotides in the leading and lagging strands and subsequently multiplied by the frequency of trinucleotides in the genome. This made them comparable with the 30 previously identified whole-genome-based COSMIC signatures (http://cancer.sanger.ac.uk/cosmic/signatures). Signatures extracted in each cancer type and COSMIC signatures were all pooled together (with equal values in the leading and lagging parts in the COSMIC signatures) and were clustered using unsupervised hierarchical clustering (with cosine distance and complete linkage). A threshold of 27 signatures was selected to identify clusters of similar signatures. Mis-clustering was avoided by manual examination (and whenever necessary re-assignment) of all signatures in all clusters (Additional file 2: Figure S6). The resulting 29 signatures (representing the detected clusters) contained 25 previously observed (COSMIC) and four new signatures. For the subsequent analysis, the signatures were converted back to 96 values: the 25 COSMIC signatures were used in their original form (i.e., having the same values as on the COSMIC website) and for the four newly identified signatures we used an average of the leading and lagging parts of the 192-elements-long signatures.
Annotation of signatures with leading and lagging direction
Each signature was annotated with the dominant strand direction (leading vs lagging) in each of the 96 mutation types (Fig. 1c). This was based on the dominant strand direction within the signature’s cluster. Types with unclear direction and small values were assigned according to the predominant direction of other trinucleotides of the same mutation group, such as C > T.
Calculating strand-specific exposures in individual samples
The matrix SLD has 96 rows and 29 columns and represents the leading parts of the signatures, i.e., the elements of the lagging parts contain zeros in this matrix. Similarly, SLG has the same size but contains zeros in the leading parts. The vector mLD of length 96 contains mutations on the leading strand (again normalized by trinucleotides in leading strand/whole genome), and similarly mLG contains mutations from the lagging strand. Finally, lsqnonneg finds a non-negative vector of exposures e such that it minimizes a function |m – S · e|. A similar approach has been used in  for finding exposures to a given set of signatures. Our extension includes the strand-specificity of the signatures. The interpretation of the model is that the matching exposure ematching represents exposure of the leading part of the signature on the leading strand and exposure of the lagging part of the signature on the lagging strand, whereas einverse represents the two remaining options. It is important to note that the direction of the mutation is relative to the nucleotide in the base pair chosen as the reference, i.e., mutations of a pyrimidine on the leading strand correspond to mutations of a purine on the lagging strand. In order to minimize the number of spurious signature exposures, the least exposed signature was incrementally removed (in both leading and lagging parts) as long as the resulting error did not exceed the original error by 0.5%. The resulting reported values in each sample and signature were the difference (or fold change) of ematching and einverse. In each signature, the signtest was used to compare matching and inverse exposures across samples with sufficient minimal exposure (at least 10) to the signature. Benjamini-Hochberg correction was applied to correct for multiple testing.
The left/right transitions of the replication domains represent regions with, on average, higher density of replication origins. In order to get better resolution of the replication origins, and to validate the results using an independent estimates of left- and right-replicating domains, genome-wide maps of human replication origins from SNS-seq by  were used. Eight fastq files (HeLa, iPS, hESC, IMR; each with two replicates) were downloaded and mapped to hg19 using bowtie2 (version 2.1.0). To control for the inefficient digestion of λ-exo step of SNS-seq, reads from non-replicating genomic DNA (LexoG0) were used as a control . Peaks were called using “macs callpeak” with parameters --gsize = hs --bw = 200 --qvalue = 0.05 --mfold 5 50 and LexoG0 mapped reads as a control. Only peaks covered in at least seven of the eight samples were used. We generated 1000 1-kbp bins to the left and right of each origin, as long as they did not reach half the distance to the next origin. We then used these replication direction annotations in the 1-kbp bins to calculate strand-specific exposures in individual samples as above and ascertained that both approaches lead to qualitatively very similar mutational strand asymmetries in individual signatures (Additional file 2: Figure S9).
Quantification of exposures with respect to replication timing, left/right transitions, and replication origins
Replication domains were divided into four quartiles by their average replication timing. The entire exposure quantification was computed separately in each quartile, or bin around left/right transition or bin around replication origin. In replication timing plots, a linear regression model (function fitlm in MatLab) was fitted to the mean exposure in each quartile (separately for matching and inverse exposures) and signtest on slopes of the fits in individual samples was used to evaluate significance of correlation with replication timing across the cohort. Benjamini-Hochberg correction was applied to correct for multiple testing.
Evaluation of robustness to noise in signature exposures
For each sample, 1000 perturbations of the input mutation frequency vector were performed, adding noise generated from normal distribution with mean 0 and standard deviation as 5% of the original mutation frequency. Exposures to signatures were quantified for each perturbation. Signatures with extremely variable exposures ((qtl75 − x)/x ≥ 0.3 or (x − qtl25)/x) ≥ 0.3) were removed for this sample and then the exposures were re-computed using only the signatures that passed the filtering.
We thank Dr. Mary Muers for comments on the manuscript.
S.K. and B.S.-B. are funded by Ludwig Cancer Research. S.K. received funding from BBSRC grant BB/M001873/1. M.T. and J.T. are funded by EPSRC (EP/F500394/1).
Availability of data and materials
All data used in this study were downloaded from public repositories. A complete list of whole-genome sequencing data sets and their accession information can be found in Additional file 1: Table S1 and the list of all whole-genome sequencing samples and signature exposures are in Additional file 5: Table S4. The replication direction  was taken from http://software.broadinstitute.org/cancer/cga/AsymTools, table per_base_territories_20kb.mat. The replication origins measured by SNS-seq  were based on the supplementary files of National Center for Biotechnology Information Gene Expression Omnibus accession GSE37757. The strand-specific signatures are in Additional file 6: Table S5. The code is available at https://bitbucket.org/bsblabludwig/replicationasymmetry  and Figshare (https://figshare.com/s/21174d60d594ddb9b83d, DOI https://doi.org/10.6084/m9.figshare.6941456)  under GPL3 license.
BS-B and MT designed the study; MT performed the analysis with contributions from JT; BS-B and MT wrote the manuscript with contributions from SK and JT. All authors read and approved the final manuscript.
The review history is available as Additional file 7.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Secrier M, Li X, de Silva N, Eldridge MD, Contino G, Bornschein J, et al. Mutational signatures in esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance. Nat Genet. 2016;2016:1131–41. Available from: http://0-www.nature.com.brum.beds.ac.uk/doifinder/10.1038/ng.3659.View ArticleGoogle Scholar
- Stenzinger A, Pfarr N, Endris V, Penzel R, Jansen L, Wolf T, et al. Mutations in POLE and survival of colorectal cancer patients--link to disease stage and treatment. Cancer Med. 2014;3:1527–38.View ArticlePubMedPubMed CentralGoogle Scholar
- Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015;347:78–81. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/25554788.View ArticlePubMedPubMed CentralGoogle Scholar
- Lujan SA, Williams JS, Kunkel TA. DNA polymerases divide the labor of genome replication. Trends Cell Biol. 2016;26:640–54. Available from: https://doi.org/10.1016/j.tcb.2016.04.012.View ArticlePubMedPubMed CentralGoogle Scholar
- Fragkos M, Ganier O, Coulombe P, Méchali M. DNA replication origin activation in space and time. Nat Rev Mol Cell Biol. 2015;16:360–74. Available from: https://doi.org/10.1038/nrm4002%5Cnhttp://www.ncbi.nlm.nih.gov/pubmed/25999062.View ArticlePubMedGoogle Scholar
- Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. Human mutation rate associated with DNA replication timing. Nat Genet. 2009;41:393–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Lawrence MS, Stojanov P, Polak P, Kryukov G V, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. Available from: https://doi.org/10.1038/nature12213.
- Shinbrot E, Henninger EE, Weinhold N, Covington KR, Göksenin AY, Schultz N, et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res. 2014:1740–50. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/25228659.
- Lujan SA, Williams JS, Pursell ZF, Abdulovic-Cui AA, Clark AB, Nick McElhinny SA, et al. Mismatch repair balances leading and lagging strand DNA replication Fidelity. PLoS Genet. 2012;8:e1003016.View ArticlePubMedPubMed CentralGoogle Scholar
- Reijns MAM, Kemp H, Ding J, de Procé SM, Jackson AP, Taylor MS. Lagging-strand replication shapes the mutational landscape of the genome. Nature. 2015;518:502–6. Available from: http://0-www.nature.com.brum.beds.ac.uk/nature/journal/v518/n7540/full/nature14183.html.
- Haradhvala NJ, Polak P, Stojanov P, Covington KR, Shinbrot E, Hess JM, et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell. 2016;164:538–49. Available from: http://0-linkinghub.elsevier.com.brum.beds.ac.uk/retrieve/pii/S0092867415017146.View ArticlePubMedPubMed CentralGoogle Scholar
- Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–21. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3776390&tool=pmcentrez&rendertype=abstract.
- Besnard E, Babled A, Lapasset L, Milhavet O, Parrinello H, Dantec C, et al. Unraveling cell type-specific and reprogrammable human replication origin signatures associated with G-quadruplex consensus motifs. Nat Struct Mol Biol. 2012;19:837–44.View ArticlePubMedGoogle Scholar
- Foulk MS, Urban JM, Casella C, Gerbi SA. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25:725–35.View ArticlePubMedPubMed CentralGoogle Scholar
- Morganella S, Alexandrov LB, Glodzik D, Zou X, Davies H, Staaf J, et al. The topography of mutational processes in breast cancer genomes. Nat Commun. 2016;7:11383. Available from: http://0-www.nature.com.brum.beds.ac.uk/doifinder/10.1038/ncomms11383.View ArticlePubMedPubMed CentralGoogle Scholar
- Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013;3:246–59. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3588146&tool=pmcentrez&rendertype=abstract
- Hoopes JI, Cortez LM, Mertz TM, Malc EP, Mieczkowski PA, Roberts SA. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep. 2016;14:1273–82. Available from: http://0-linkinghub.elsevier.com.brum.beds.ac.uk/retrieve/pii/S2211124716000425.
- Green AM, Landry S, Budagyan K, Avgousti DC, Shalhout S, Bhagwat AS, et al. APOBEC3A damages the cellular genome during DNA replication. Cell Cycle. 2016;15:998–1008. Available from: https://doi.org/10.1080/15384101.2016.1152426.View ArticlePubMedPubMed CentralGoogle Scholar
- Seplyarskiy VB, Soldatov RA, Popadin KY, Antonarakis SE, Bazykin GA, Nikolaev SI. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res. 2016;26:174–82.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhao H, Thienpont B, Yesilyurt BT, Moisse M, Reumers J, Coenegrachts L, et al. Mismatch repair deficiency endows tumors with a unique mutation signature and sensitivity to DNA double-strand breaks. elife. 2014;3:e02725.View ArticlePubMedPubMed CentralGoogle Scholar
- Shlien A, Campbell BB, de Borja R, Alexandrov LB, Merico D, Wedge D, et al. Combined hereditary and somatic mutations of replication error repair genes result in rapid onset of ultra-hypermutated cancers. Nat Genet. 2015;47:257–62. Available from: http://0-www.nature.com.brum.beds.ac.uk/doifinder/10.1038/ng.3202.View ArticlePubMedGoogle Scholar
- Supek F, Lehner B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature. 2015;521:81–4.View ArticlePubMedPubMed CentralGoogle Scholar
- Stillman B. DNA polymerases at the replication fork in eukaryotes. Mol Cell. 2008;30:259–60.View ArticlePubMedPubMed CentralGoogle Scholar
- McCulloch SD, Kunkel TA. The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 2008;18:148–61.View ArticlePubMedPubMed CentralGoogle Scholar
- Waisertreiger IS-R, Liston VG, Menezes MR, Kim H-M, Lobachev KS, Stepchenkova EI, et al. Modulation of mutagenesis in eukaryotes by DNA replication fork dynamics and quality of nucleotide pools. Environ Mol Mutagen. 2012;53:699–724. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/23055184.
- Nick McElhinny SA, Kissling GE, Kunkel TA. Differential correction of lagging-strand replication errors made by DNA polymerases α and δ. Proc Natl Acad Sci U S A. 2010;107:21070–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Georgescu RE, Schauer GD, Yao NY, Langston LD, Yurieva O, Zhang D, et al. Reconstitution of a eukaryotic replisome reveals suppression mechanisms that define leading/lagging strand operation. elife. 2015;2015:1–20.Google Scholar
- Andrianova MA, Bazykin GA, Nikolaev SI, Seplyarskiy VB. Human mismatch repair system balances mutation rates between strands by removing more mismatches from the lagging strand. Genome Res. 2017;27:1336–43.View ArticlePubMedPubMed CentralGoogle Scholar
- Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet. 2014;15:585–98. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/24981601.View ArticlePubMedPubMed CentralGoogle Scholar
- Nik-Zainal S, Kucab JE, Morganella S, Glodzik D, Alexandrov LB, Arlt VM, et al. The genome as a record of environmental exposure. Mutagenesis. 2015;30:763–70.PubMedPubMed CentralGoogle Scholar
- Waters LS, Walker GC. The critical mutagenic translesion DNA polymerase Rev1 is highly expressed during G(2)/M phase rather than S phase. Proc Natl Acad Sci U S A. 2006;103:8971–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Lang GI, Murray AW. Mutation rates across budding yeast chromosome VI are correlated with replication timing. Genome Biol Evol. 2011;3:799–811.View ArticlePubMedPubMed CentralGoogle Scholar
- Karras GI, Fumasoni M, Sienski G, Vanoli F, Branzei D. Article noncanonical role of the 9-1-1 clamp in the error-free DNA damage tolerance pathway. Mol Cell. 2013;49:536–46. Available from: https://doi.org/10.1016/j.molcel.2012.11.016.View ArticlePubMedGoogle Scholar
- Gonzalez-Huici V, Szakal B, Urulangodi M, Psakhye I, Castellucci F, Menolfi D, et al. DNA bending facilitates the error-free DNA damage tolerance pathway and upholds genome integrity. EMBO J. 2014;33:327–40.View ArticlePubMedPubMed CentralGoogle Scholar
- Bi X. Mechanism of DNA damage tolerance. World J Biol Chem. 2015;6:48. Available from: http://www.wjgnet.com/1949-8454/full/v6/i3/48.htm.View ArticlePubMedPubMed CentralGoogle Scholar
- Branzei D, Szakal B. DNA damage tolerance by recombination: Molecular pathways and DNA structures. DNA Repair (Amst). 2016;44:68–75. Available from: https://doi.org/10.1016/j.dnarep.2016.05.008.View ArticleGoogle Scholar
- D’Souza S, Yamanaka K, Walker GC. Non mutagenic and mutagenic DNA damage tolerance. Cell Cycle. 2016;15:314–5. Available from: https://doi.org/10.1080/15384101.2015.1132909.View ArticlePubMedPubMed CentralGoogle Scholar
- Diamant N, Hendel A, Vered I, Carell T, Reißner T, De Wind N, et al. DNA damage bypass operates in the S and G2 phases of the cell cycle and exhibits differential mutagenicity. Nucleic Acids Res. 2012;40:170–80.View ArticlePubMedGoogle Scholar
- Adar S, Hu J, Lieb JD, Sancar A. Genome-wide kinetics of DNA excision repair in relation to chromatin state and mutagenesis. Proc Natl Acad Sci U S A. 2016:201603388. Available from: http://www.pnas.org/lookup/doi/10.1073/pnas.1603388113.
- Cordeiro-Stone M, Nikolaishvili-Feinberg N. Asymmetry of DNA replication and translesion synthesis of UV-induced thymine dimers. Mutat Res. 2002;510:91–106.View ArticlePubMedGoogle Scholar
- McGregor WG, Wei D, Maher VM, McCormick JJ. Abnormal, error-prone bypass of photoproducts by xeroderma pigmentosum variant cell extracts results in extreme strand bias for the kinds of mutations induced by UV light. Mol Cell Biol. 1999;19:147–54. Available from: http://mcb.asm.org/content/19/1/147.abstract.View ArticlePubMedPubMed CentralGoogle Scholar
- Murugaesu N, Wilson GA, Birkbak NJ, Watkins TBK, McGranahan N, Kumar S, et al. Tracking the genomic evolution of esophageal adenocarcinoma through neoadjuvant chemotherapy. Cancer Discov. 2015;5:821–32.View ArticlePubMedPubMed CentralGoogle Scholar
- Ross-Innes CS, Becq J, Warren A, Cheetham RK, Northen H, O’Donovan M, et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett’s esophagus and esophageal adenocarcinoma. Nat Genet. 2015;47:1–11. Available from: http://0-www.nature.com.brum.beds.ac.uk/doifinder/10.1038/ng.3357.View ArticleGoogle Scholar
- Souza RF. The role of acid and bile reflux in oesophagitis and Barrett’s metaplasia. Biochem Soc Trans. 2010;38:348–52. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3072824&tool=pmcentrez&rendertype=abstract.View ArticlePubMedPubMed CentralGoogle Scholar
- Erichsen R, Robertson D, Farkas DK, Pedersen L, Pohl H, Baron JA, et al. Erosive reflux disease increases risk for esophageal adenocarcinoma, compared with nonerosive reflux. Clin Gastroenterol Hepatol. 2012;10:475–480.e1. Available from: https://doi.org/10.1016/j.cgh.2011.12.038.View ArticlePubMedGoogle Scholar
- Fein M, Maroske J, Fuchs KH. Importance of duodenogastric reflux in gastro-oesophageal reflux disease. Br J Surg. 2006;93:1475–82.View ArticlePubMedGoogle Scholar
- Kauppi J, Räsänen J, Sihvo E, Nieminen U, Arkkila P, Ahotupa M, et al. Increased oxidative stress in the proximal stomach of patients with Barrett’s esophagus and adenocarcinoma of the esophagus and Esophagogastric junction. Transl Oncol. 2016;9:336–9. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/27567957.View ArticlePubMedPubMed CentralGoogle Scholar
- Rasanen JV, Sihvo EIT, Ahotupa MO, Färkkilä MA, Salo JA. The expression of 8-hydroxydeoxyguanosine in oesophageal tissues and tumours. Eur J Surg Oncol. 2007;33:1164–8.View ArticlePubMedGoogle Scholar
- Jimenez P, Piazuelo E, Sanchez MT, Ortego J, Soteras F, Lanas A. Free radicals and antioxidant systems in reflux esophagitis and Barrett’s esophagus. World J Gastroenterol. 2005;11:2697–703. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/15884106.View ArticlePubMedPubMed CentralGoogle Scholar
- Dvorak K, Payne CM, Chavarria M, Ramsey L, Dvorakova B, Bernstein H, et al. Bile acids in combination with low pH induce oxidative stress and oxidative DNA damage: relevance to the pathogenesis of Barrett’s oesophagus. Gut. 2007;56:763–71.View ArticlePubMedGoogle Scholar
- Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013;45:478–86. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3678719&tool=pmcentrez&rendertype=abstract.View ArticlePubMedPubMed CentralGoogle Scholar
- Nones K, Waddell N, Wayte N, Patch A-M, Bailey P, Newell F, et al. Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis. Nat Commun. 2015;5:1–9. Available from: https://doi.org/10.1038/ncomms6224%5Cnpapers2://publication/doi/10.1038/ncomms6224.Google Scholar
- Inoue M, Kamiya H, Fujikawa K, Ootsuyama Y, Murata-Kamiya N, Osaki T, et al. Induction of chromosomal gene mutations in Escherichia coli by direct incorporation of oxidatively damaged nucleotides: new evaluation method for mutagenesis by damaged dna precursors in vivo. J Biol Chem. 1998;273:11069–74.View ArticlePubMedGoogle Scholar
- Satou K, Kawai K, Kasai H, Harashima H, Kamiya H. Mutagenic effects of 8-hydroxy-dGTP in live mammalian cells. Free Radic Biol Med. 2007;42:1552–60. Available from: https://doi.org/10.1016/j.freeradbiomed.2007.02.024.View ArticlePubMedGoogle Scholar
- Satou K, Hori M, Kawai K, Kasai H, Harashima H, Kamiya H. Involvement of specialized DNA polymerases in mutagenesis by 8-hydroxy-dGTP in human cells. DNA Repair (Amst). 2009;8:637–42.View ArticleGoogle Scholar
- Kamiya H. Mutations induced by oxidized DNA precursors and their prevention by nucleotide pool sanitization enzymes. Genes Environ. 2007;29:133–40.View ArticleGoogle Scholar
- Suzuki T, Kamiya H. Mutations induced by 8-hydroxyguanine (8-oxo-7,8-dihydroguanine), a representative oxidized base, in mammalian cells. Genes Environ. 2017;39:2. Available from: https://doi.org/10.1186/s41021-016-0051-y.View ArticlePubMedGoogle Scholar
- Rashid M, Fischer A, Wilson CH, Tiffen J, Rust AG, Stevens P, et al. Adenoma development in familial adenomatous polyposis and MUTYH-associated polyposis: somatic landscape and driver genes. J Pathol. 2016;238:98–108.View ArticlePubMedGoogle Scholar
- Suzuki T, Harashima H, Kamiya H. Effects of base excision repair proteins on mutagenesis by 8-oxo-7,8-dihydroguanine (8-hydroxyguanine) paired with cytosine and adenine. DNA Repair (Amst). 2010;9:542–50. Available from: https://doi.org/10.1016/j.dnarep.2010.02.004.View ArticleGoogle Scholar
- Pavlov YI, Mian IM, Kunkel TA. Evidence for preferential mismatch repair of lagging strand DNA replication errors in yeast. Curr Biol. 2003;13:744–8.View ArticlePubMedGoogle Scholar
- Mudrak SV, Welz-Voegele C, Jinks-Robertson S. The polymerase eta translesion synthesis DNA polymerase acts independently of the mismatch repair system to limit mutagenesis caused by 7,8-dihydro-8-oxoguanine in yeast. Mol Cell Biol. 2009;29:5316–26. Available from: http://0-www.scopus.com.brum.beds.ac.uk/inward/record.url?eid=2-s2.0-70349329456&partnerID=tZOtx3y1.View ArticlePubMedPubMed CentralGoogle Scholar
- Tomkova M, McClellan M, Kriaucionis S, Schuster-Böckler B. DNA replication and associated repair pathways are involved in the mutagenesis of methylated cytosine. DNA Repair (Amst). 2017. Available from: https://0-www-sciencedirect-com.brum.beds.ac.uk/science/article/pii/S1568786417303464.
- Gao Z, Wyman MJ, Sella G, Przeworski M. Interpreting the dependence of mutation rates on age and time. PLoS Biol. 2016;14:e1002355. Available from: http://dx.plos.org/10.1371/journal.pbio.1002355.
- Crossan GP, Garaycoechea JI, Patel KJ. Do mutational dynamics in stem cells explain the origin of common cancers? Cell Stem Cell. 2015;16:111–2. Available from: https://doi.org/10.1016/j.stem.2015.01.009.View ArticlePubMedGoogle Scholar
- Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet. 2011;43:964–8. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3802528&tool=pmcentrez&rendertype=abstract.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang K, Yuen ST, Xu J, Lee SP, Yan HHN, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet. 2014;46:573–82. Available from: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/pubmed/24816253.View ArticlePubMedGoogle Scholar
- Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811–7.View ArticlePubMedGoogle Scholar
- Koren A, Polak P, Nemesh J, Michaelson JJ, Sebat J, Sunyaev SR, et al. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am J Hum Genet. 2012;91:1033–40. Available from: https://doi.org/10.1016/j.ajhg.2012.10.018.View ArticlePubMedPubMed CentralGoogle Scholar
- Encode Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. Available from: http://0-www.nature.com.brum.beds.ac.uk/nature/journal/v489/n7414/full/nature11247.html.
- Rosenthal R, McGranahan N, Herrero J, Taylor BS, Swanton C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 2016;17:31. Available from: http://0-genomebiology.biomedcentral.com.brum.beds.ac.uk/articles/10.1186/s13059-016-0893-4.View ArticlePubMedPubMed CentralGoogle Scholar
- Tomkova M, Tomek J, Kriaucionis S, Schuster-Bockler B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Source code. Bitbucket. https://bitbucket.org/bsblabludwig/replicationasymmetry (2018).Google Scholar
- Tomkova M, Tomek J, Kriaucionis S, Schuster-Bockler B. Mutational signature distribution varies with DNA replication timing and strand asymmetry Source code figshare. 2018; https://doi.org/10.6084/m9.figshare.6941456.