
Bibliographic and Educational Resources in Cytogenomics

This platform is designed to serve as a comprehensive educational and bibliographic resource for healthcare professionals involved in cytogenomics. Covering a wide range of up-to-date topics within the field, it offers structured access to recent scientific literature and a variety of pedagogical tools tailored to clinicians, educators, and trainees.

Each topic is grounded in a curated selection of recent publications, accompanied by in-depth summaries that go far beyond traditional abstracts—offering clear, clinically relevant insights without the time burden of reading full articles. These summaries act as gateways to the original literature, helping users identify which articles warrant deeper exploration.

In addition to these detailed reviews, users will find a rich library of supplementary materials: topic overviews, FAQs, glossaries, synthesis sheets, thematic podcasts, fully structured course outlines adaptable for teaching, and ready-to-use PowerPoint slide decks. All resources are open access and formatted for easy integration into academic or clinical training programs.

By providing practical, well-structured content, the platform enables members of the cytogenomics community to efficiently update their knowledge on selected topics. It also offers educational materials that are easily adaptable for instructional use.

Evolution, Variation, and Pathology: The Dual Role of Structural Variants in Human Genomes

Dr Cécile Dupont

Dr Pierre Durand

1. Evolution, Variation, and Pathology: The Dual Role of Structural Variants in Human Genomes

Structural variants (SVs)—large-scale genomic alterations that include deletions, duplications, insertions, inversions, and translocations—represent one of the most dynamic and consequential forms of genetic variation in humans. Far from being rare genomic accidents, they are integral to genome evolution, a major source of inter-individual diversity, and a frequent cause of genetic disease. Recent advances in next-generation sequencing, high-resolution cytogenetics, and chromosome conformation mapping have profoundly changed how cytogeneticists understand the impact of SVs, revealing that the same mechanisms driving genomic innovation can also generate pathogenic rearrangements. The dual role of structural variation—fueling evolutionary change while predisposing to disease—illustrates the delicate balance between genome plasticity and stability that underlies human biology.

2. The Nature and Formation of Structural Variants

SVs are defined as genomic alterations typically larger than 50 base pairs, encompassing copy number variants (CNVs, including deletions and duplications), inversions, insertions, and more complex rearrangements. Unlike single-nucleotide polymorphisms, which substitute a single base, SVs remove, add, or rearrange contiguous stretches of DNA, often involving thousands to millions of base pairs. Their formation is intimately linked to the architectural features of the genome, especially low-copy repeats (LCRs), segmental duplications, and repetitive elements such as Alu and LINE sequences, which provide homologous substrates for aberrant recombination.

As described by Carvalho and Lupski (2016), two main mechanistic classes underlie SV formation. Recombination-based mechanisms, particularly non-allelic homologous recombination (NAHR), occur when misaligned repeats engage in crossover during meiosis or mitosis, generating recurrent deletions and duplications of predictable size and breakpoint location. By contrast, replication-based mechanisms, such as Fork Stalling and Template Switching (FoSTeS) and Microhomology-Mediated Break-Induced Replication (MMBIR), involve errors during DNA synthesis and repair, producing complex, nonrecurrent rearrangements often accompanied by microhomology at junctions. Non-homologous end joining (NHEJ) further contributes to this diversity through blunt-end or microhomology-mediated DNA repair. These molecular pathways, though essential for genomic maintenance, inadvertently create the substrates for both genomic diversity and disease.

3. Structural Variants as Engines of Evolutionary Innovation

From an evolutionary perspective, SVs are not merely sources of instability but key drivers of genome evolution and species divergence. Copy number variation—segments of DNA present in variable copy numbers across individuals—can alter gene dosage, create novel gene combinations, and facilitate adaptive traits. Studies like those reviewed by Gamazon and Stranger (2015) emphasize that CNVs contribute substantially to human phenotypic diversity and adaptation. For instance, variation in the AMY1 gene, encoding salivary amylase, correlates with dietary starch intake across human populations, demonstrating how CNV-mediated dosage changes can confer selective advantages.

Duplications provide raw material for gene innovation. Initially redundant copies can accumulate mutations, giving rise to new gene functions (neofunctionalization) or partitioning of ancestral functions (subfunctionalization). Newman et al. (2015) revealed that most duplication CNVs are tandem and in direct orientation, a configuration that facilitates gene copy expansion without disrupting regulatory context. Yet, some duplications result in fusion genes at breakpoints—novel chimeric sequences that can either drive evolutionary novelty or cause disease. The same processes that produced beneficial gene families, such as those for olfactory receptors or immune system components, can also underlie deleterious rearrangements associated with developmental disorders.

Moreover, the large-scale organization of the genome into topologically associating domains (TADs) provides an additional evolutionary layer. As Lupiáñez et al. (2015) demonstrated, these chromatin domains delineate regions within which genes and their enhancers interact. TAD boundaries, typically enriched in CTCF and cohesin, appear remarkably conserved across species, suggesting strong evolutionary constraints. Alterations of TAD structure can create novel enhancer-promoter interactions, potentially driving morphological diversification during evolution. However, the same rearrangements that might contribute to evolutionary innovation can also disrupt normal gene regulation, leading to congenital malformations. Thus, genome topology functions as both a scaffold for evolutionary flexibility and a safeguard against pathological misexpression.

4. Structural Variation as a Source of Pathology

While SVs can be beneficial or neutral, they are also a major cause of human disease. Clinical cytogenetics increasingly recognizes SVs as responsible for a substantial proportion of congenital anomalies, neurodevelopmental disorders, and cancers. Pathogenic effects arise through several mechanisms: gene dosage imbalance, gene disruption, position effects altering regulatory context, or the creation of fusion genes.

In deletions, haploinsufficiency results when a single remaining allele cannot produce adequate gene product. Duplications can lead to triplosensitivity, where excess gene dosage perturbs normal cellular pathways. As Newman et al. detailed, complex duplication events can also generate in-frame gene fusions, creating aberrant proteins with altered or novel functions. Such rearrangements are hallmarks of oncogenesis but are increasingly recognized in congenital disease.

Beyond dosage effects, SVs can exert regulatory pathologies through disruption of chromatin architecture. The work of Lupiáñez et al. revealed that rearrangements crossing TAD boundaries can “rewire” enhancer–promoter interactions. In limb malformations such as F-syndrome and brachydactyly, duplications or deletions near the EPHA4 locus reposition limb-specific enhancers, causing ectopic activation of neighboring genes like PAX3 or IHH. This “enhancer hijacking” illustrates how structural rearrangements confined to noncoding DNA can have profound developmental consequences. It also underscores a major paradigm shift in cytogenetics: pathogenicity cannot be predicted solely by coding sequence content but must consider 3D genome organization.

5. The Continuum Between Evolutionary Adaptation and Disease

A unifying insight from these studies is that the line separating adaptive and pathological variation is remarkably thin. The same molecular mechanisms—NAHR, replication-based template switching, and chromatin domain reorganization—that promote genetic diversity are also responsible for disease-causing mutations. This evolutionary–pathological continuum reflects the inherent trade-off between genome flexibility and stability.

Genomic regions that facilitate rapid structural innovation, such as those rich in segmental duplications or repetitive elements, are simultaneously hotspots for deleterious rearrangements. For example, the duplication that causes Charcot–Marie–Tooth disease type 1A and the reciprocal deletion that causes hereditary neuropathy with liability to pressure palsies are both mediated by NAHR between the same flanking repeats. Such loci exemplify the “two-edged sword” of genomic architecture: repeats that enable adaptive copy number expansion also predispose to recurrent disease-causing rearrangements.

Similarly, the functional redundancy introduced by gene duplication can mask deleterious effects in the short term, allowing variants to persist in populations and potentially contribute to adaptation. However, under different environmental or developmental contexts, these same CNVs can become pathogenic. The UGT2B17 deletion, for instance, influences steroid metabolism and drug response but has also been implicated in osteoporosis and graft-versus-host disease, highlighting how the same structural variant can have pleiotropic outcomes depending on tissue context and environmental interactions.

6. Cytogenetic and Genomic Perspectives: Mapping and Interpreting Structural Variation

Modern cytogenetics has evolved from karyotyping to a multi-scale, integrative discipline combining molecular, genomic, and 3D chromatin approaches. The studies discussed collectively illustrate the necessity of such integration. High-resolution array CGH and SNP arrays provide quantitative assessments of CNVs, while next-generation and whole-genome sequencing, as applied by Newman et al., enable nucleotide-level resolution of breakpoints. FISH and long-range PCR remain indispensable for validating complex rearrangements.

Equally transformative has been the inclusion of chromatin conformation capture technologies—such as Hi-C and 4C-seq—exemplified by Lupiáñez et al. These techniques reveal how structural variants reshape higher-order genomic topology, offering a bridge between classical cytogenetics and functional genomics. Integrating such data with transcriptomic analyses (Gamazon & Stranger) enables researchers to link structural variation directly to expression quantitative trait loci (eQTLs) and clinical phenotypes, ultimately supporting personalized genomic medicine.

This convergence of cytogenetic visualization, molecular sequencing, and 3D architecture mapping defines a new era of integrative cytogenomics, where the interpretation of SVs extends beyond the linear genome to encompass its spatial and functional dimensions.

7. Conclusion

Structural variants embody the paradox of genomic evolution: they are simultaneously architects of diversity and agents of disease. Through recombination and replication-based mechanisms, the human genome continuously reshapes itself, generating CNVs, inversions, and translocations that contribute to both adaptability and pathology. As shown across the four foundational studies, the impact of SVs is multifaceted—affecting gene dosage, regulatory networks, and chromatin architecture.

Understanding this duality is central to modern cytogenetics and human genetics. It reframes disease not as an aberration of an otherwise static genome, but as an inherent consequence of the same dynamic processes that shaped our species. By integrating molecular mechanisms, evolutionary theory, and three-dimensional genome biology, researchers and clinicians can better predict the consequences of structural variation, distinguishing when the genome’s plasticity is a source of resilience—and when it becomes a source of vulnerability.

In essence, the study of structural variants bridges the evolutionary and medical dimensions of genetics. It demonstrates that the forces driving the birth of new genes, adaptive traits, and complex regulatory systems are inseparable from those that give rise to human disorders. The genome, ever dynamic, evolves and errs through the same molecular grammar—a grammar that cytogenetics now seeks not only to decode, but to interpret in the context of both our past and our pathology.

1. What exactly qualifies as a structural variant (SV)?

A structural variant is any genomic alteration larger than roughly 50 base pairs that changes the structure or organization of the DNA. This includes deletions, duplications, insertions, inversions, and translocations. Some definitions extend to complex rearrangements involving multiple breakpoints or combined events.

2. How are structural variants different from single nucleotide variants (SNVs)?

SNVs affect one or a few bases, while SVs involve breaks and rearrangements in the DNA backbone, often spanning thousands to millions of base pairs. SVs therefore tend to have much larger effects on gene dosage, chromatin organization, and genome architecture.

3. What are the main molecular mechanisms that generate structural variants?

The principal mechanisms are:

Non-allelic homologous recombination (NAHR) during meiosis or mitosis between repetitive sequences.
Replication-based mechanisms such as FoSTeS (Fork Stalling and Template Switching) and MMBIR (Microhomology-Mediated Break-Induced Replication).
Non-homologous end joining (NHEJ) during double-strand break repair.

4. What determines where SVs occur in the genome?

SV hotspots often coincide with low-copy repeats (LCRs) or segmental duplications, which provide homologous substrates for misalignment during recombination. Repetitive elements such as Alu and LINEs also create instability. These architectural features make certain regions inherently prone to rearrangement.

5. What is a copy number variant (CNV)?

A CNV is a type of structural variant that changes the number of copies of a DNA segment compared to a reference genome. It includes both deletions (copy loss) and duplications (copy gain) ranging from kilobases to megabases in size.

6. How can CNVs influence gene expression?

CNVs can alter gene dosage directly by changing copy number or indirectly by modifying chromatin environment and regulatory contacts. Gamazon and Stranger (2015) showed that CNVs act as expression quantitative trait loci (eQTLs), affecting transcription levels in tissue-specific and population-specific ways.

7. Why are duplications considered “engines of evolution”?

Duplications provide redundant genetic material that can acquire new functions. Over time, duplicated genes can diverge through mutation, leading to neofunctionalization (new roles) or subfunctionalization (partition of original functions). This process drives innovation in gene families, immunity, and sensory systems.

8. What are tandem duplications and why are they important?

Tandem duplications occur when the duplicated segment is placed adjacent to the original locus in the same orientation. Newman et al. (2015) found that most human duplication CNVs are tandem and can create fusion genes at the breakpoints, offering both evolutionary potential and disease risk.

9. How can structural variants create fusion genes?

When breakpoints join sequences from two distinct genes in the same orientation and reading frame, a new chimeric gene can form. Such fusions can yield new proteins with hybrid functions or, in some cases, cause misregulation and disease.

10. How are structural variants detected today?

Techniques include:

Array comparative genomic hybridization (aCGH) and SNP arrays for CNV mapping.
Next-generation sequencing (NGS) and long-read sequencing for breakpoint resolution.
Fluorescence in situ hybridization (FISH) for spatial visualization.
Hi-C and 4C-seq for studying 3D chromatin topology and enhancer-promoter contacts.

11. What is the role of topologically associating domains (TADs) in structural variation?

TADs are 3D chromatin units that constrain enhancer-promoter interactions. Lupiáñez et al. (2015) demonstrated that SVs disrupting TAD boundaries can “rewire” enhancer contacts, leading to ectopic gene expression and developmental disorders such as limb malformations.

12. How can noncoding structural variants cause disease?

Even without affecting coding sequences, SVs can misplace enhancers relative to their target genes. When a TAD boundary is deleted or duplicated, enhancers may activate genes in neighboring domains, a phenomenon known as enhancer hijacking.
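As a minimal sketch of the idea in this answer, the Python snippet below asks whether a structural variant interval contains any TAD boundary, which is the situation in which enhancer contacts might be rewired. Representing each boundary as a single coordinate, the function name `boundaries_spanned`, and the coordinates used are all assumptions for illustration; in practice boundaries are called from Hi-C data at a given resolution, and spanning a boundary does not by itself establish pathogenicity.

```python
from typing import List


def boundaries_spanned(sv_start: int, sv_end: int, boundaries: List[int]) -> List[int]:
    """Return the boundary positions lying strictly inside the SV interval."""
    if sv_start > sv_end:
        raise ValueError("sv_start must not exceed sv_end")
    return [b for b in sorted(boundaries) if sv_start < b < sv_end]


# Hypothetical boundary coordinates on one chromosome, for illustration only.
tad_boundaries = [1_000_000, 2_500_000, 4_000_000]

# An SV that crosses the boundary at 2.5 Mb: candidate for altered enhancer contacts.
print(boundaries_spanned(2_200_000, 2_700_000, tad_boundaries))  # [2500000]

# An SV contained within a single domain: no boundary is removed or duplicated.
print(boundaries_spanned(1_200_000, 2_000_000, tad_boundaries))  # []
```

A variant flagged this way would still need functional follow-up, for example chromatin conformation capture or expression analysis in the relevant tissue.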

13. What are recurrent versus nonrecurrent rearrangements?

Recurrent rearrangements have identical size and breakpoints in unrelated individuals, often mediated by NAHR between the same repeats.
Nonrecurrent rearrangements are unique events, usually caused by replication-based or repair-based mechanisms, showing variable breakpoints.
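The distinction above can be sketched in a few lines of Python: a set of rearrangements from unrelated individuals is treated as recurrent when all start coordinates, and all end coordinates, cluster within a tolerance. The tolerance value, the function name `looks_recurrent`, and the example coordinates are assumptions chosen for illustration; a realistic tolerance depends on the resolution of the detection platform and on the size of the flanking repeats.

```python
from typing import List, Tuple

Breakpoints = Tuple[int, int]  # (start, end) of one rearrangement


def looks_recurrent(events: List[Breakpoints], tolerance: int = 10_000) -> bool:
    """True if the starts and the ends of all events each cluster within `tolerance` bp."""
    if len(events) < 2:
        return False  # recurrence requires at least two independent observations
    starts = [start for start, _ in events]
    ends = [end for _, end in events]
    return (max(starts) - min(starts) <= tolerance
            and max(ends) - min(ends) <= tolerance)


# Hypothetical breakpoints from unrelated individuals.
clustered = [(14_100_000, 15_500_000), (14_102_000, 15_498_500), (14_099_000, 15_503_000)]
scattered = [(14_100_000, 15_500_000), (13_650_000, 15_900_000), (14_400_000, 15_120_000)]

print(looks_recurrent(clustered))  # True
print(looks_recurrent(scattered))  # False
```

Clustering of breakpoints is only suggestive of NAHR; confirming the mechanism requires showing that the breakpoints fall within paired repeats.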

14. Why are structural variants both adaptive and pathogenic?

The same mutational processes that produce adaptive changes (e.g., gene duplication, regulatory innovation) can also disrupt essential genes or regulatory structures, causing disease. Genome plasticity thus provides evolutionary flexibility at the cost of stability.

15. Can you give examples of adaptive CNVs in humans?

Yes — notable examples include:

AMY1 gene duplications increasing starch digestion efficiency in high-starch diets.
CNVs in immune-related or olfactory receptor genes contributing to environmental adaptation.
UGT2B17 deletion affecting steroid metabolism and drug response.

16. How do structural variants contribute to neurodevelopmental and psychiatric disorders?

Large, rare CNVs (e.g., at 16p11.2, 1q21.1, or 22q11.2) are linked to autism spectrum disorder, schizophrenia, and intellectual disability. These rearrangements affect dosage-sensitive genes and can alter neuronal development and synaptic signaling pathways.

17. What is meant by “mirror phenotypes” in CNV syndromes?

Reciprocal deletion and duplication of the same genomic region can produce opposite traits, for instance obesity versus underweight, or microcephaly versus macrocephaly, reflecting gene dosage sensitivity. The pattern is consistent with NAHR generating both the deletion and the reciprocal duplication of the same interval.

18. How stable are TAD boundaries evolutionarily?

TAD boundaries are highly conserved across species and cell types, implying strong selective pressure to maintain genomic topology. However, small shifts or losses in boundaries may contribute to species-specific gene regulation and morphological evolution.

19. How does structural variation interact with single-nucleotide variation in disease risk?

SVs and SNPs often co-occur or are in linkage disequilibrium. Sometimes an apparent SNP association in GWAS actually tags an underlying CNV. Integrating both types of variation is essential for accurately identifying causal mechanisms in complex traits.

20. What is the emerging role of cytogenetics in the era of genome architecture?

Modern cytogenetics has evolved into integrative cytogenomics—combining classical visualization (karyotyping, FISH) with genome-wide assays (WGS, Hi-C, RNA-seq). This holistic approach links the linear sequence with the 3D chromatin context, enabling researchers to interpret how structural variants simultaneously shape evolution, phenotypic diversity, and human disease.

Gamazon ER, Stranger BE. The impact of human copy number variation on gene expression. Briefings in Functional Genomics. 2015;14(5):352–357. doi:10.1093/bfgp/elv017
Newman S, Hermetz KE, Weckselblatt B, Rudd MK. Next-generation sequencing of duplication CNVs reveals that most are tandem and some create fusion genes at breakpoints. American Journal of Human Genetics. 2015;96(2):208–220. doi:10.1016/j.ajhg.2014.12.017
Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene–enhancer interactions. Cell. 2015;161(5):1012–1025. doi:10.1016/j.cell.2015.04.004
Carvalho CMB, Lupski JR. Mechanisms underlying structural variant formation in genomic disorders. Nature Reviews Genetics. 2016;17(4):224–238. doi:10.1038/nrg.2015.25

Gamazon ER, Stranger BE. The impact of human copy number variation on gene expression.

This paper provides a foundational understanding of the relationship between genomic structural variation and gene regulation — a cornerstone in modern human genetics and cytogenomics.

Introduction: Structural Variation as a Major Source of Genetic Diversity

For decades, human genetic diversity was primarily attributed to single-nucleotide polymorphisms (SNPs). However, the sequencing revolution has revealed that copy number variants (CNVs) — deletions or duplications of DNA segments larger than about 1 kilobase — represent a significant fraction of genomic variability between individuals. CNVs collectively encompass more nucleotides than all SNPs combined and can affect thousands of genes.

The study explores how CNVs shape gene expression variation within and across populations. While it was already known that CNVs can cause rare genetic diseases by altering gene dosage, their broader contribution to normal transcriptional variation and complex traits was less clear. This paper sought to systematically link CNVs with changes in RNA expression levels to understand their functional consequences and evolutionary implications.

Data Sources and Analytical Framework

The authors integrated data from genome-wide CNV maps and gene expression profiles derived from lymphoblastoid cell lines (LCLs) collected in diverse human populations (including HapMap individuals). These resources allowed them to identify correlations between CNV genotypes (presence, absence, or copy number) and expression levels of nearby genes.

To ensure accuracy, the authors applied high-resolution array comparative genomic hybridization (aCGH) and SNP-based microarrays to define CNVs across individuals. Expression data were obtained from microarrays and RNA-seq. The resulting datasets enabled expression quantitative trait locus (eQTL) analyses, specifically tests of whether CNVs themselves act as eQTLs (CNV-eQTLs).

Importantly, they also evaluated whether CNVs influence only genes they directly overlap or whether they exert trans effects on distant genes through regulatory or network-level mechanisms.
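To make the CNV-eQTL idea in this section concrete, the following Python sketch regresses expression on integer copy number for one CNV and gene pair and reports the slope and Pearson correlation. It is a deliberately simplified illustration, not the statistical model used in the paper, which is not specified here: the function name `cnv_expression_fit` and the data values are hypothetical, and a real analysis would also adjust for covariates such as population structure and correct for multiple testing.

```python
from math import sqrt
from typing import Sequence, Tuple


def cnv_expression_fit(copy_number: Sequence[float],
                       expression: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, Pearson r) for expression regressed on copy number."""
    n = len(copy_number)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(copy_number) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in copy_number)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(copy_number, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("copy number and expression must both vary")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: copy number per individual and expression in arbitrary units.
copies = [1, 2, 2, 3, 3, 4]
expr = [5.1, 9.8, 10.3, 15.2, 14.7, 20.1]
slope, r = cnv_expression_fit(copies, expr)
print(f"slope = {slope:.2f} expression units per copy, r = {r:.3f}")
```

A positive slope with a strong correlation is what a cis dosage effect would look like under this simple model; trans effects would be tested the same way against genes located elsewhere in the genome.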

Distribution and Nature of CNVs in the Human Genome

Genome-wide CNV analyses revealed that about 5–10% of the human genome is structurally variable. CNVs range from small (<10 kb) to very large (>1 Mb) segments and often overlap gene-rich regions, segmental duplications, or regulatory elements such as enhancers. Some CNVs are rare or even private (unique to an individual or family), while others are common polymorphisms segregating in human populations.

Interestingly, CNVs are not uniformly distributed: hotspots often coincide with low-copy repeats (LCRs) that promote non-allelic homologous recombination (NAHR), a key mechanism generating recurrent deletions and duplications. In contrast, other CNVs arise through replication-based errors (FoSTeS, MMBIR) and tend to be nonrecurrent.

The authors distinguish between genic CNVs (those overlapping coding regions) and intergenic CNVs, noting that the former are more likely to have direct effects on transcript abundance.

Local Effects of CNVs on Gene Expression

The first major finding is that CNVs frequently correlate with local changes in expression of the genes they encompass — a clear demonstration of dosage sensitivity. When a gene’s copy number increases, its transcript level often rises proportionally; deletions generally reduce expression. However, the relationship is not always linear: some duplicated genes exhibit compensatory regulation, while others show amplified expression beyond copy number expectations, possibly due to chromatin context or promoter dosage.

Approximately 15–20% of expressed genes in the analyzed cell lines were significantly affected by nearby CNVs. The strength of correlation was highest for CNVs directly overlapping exons, promoters, or untranslated regions, confirming that physical inclusion within a CNV is the strongest predictor of expression impact.

The paper provides examples of specific CNV-expression pairs, such as CNVs affecting GSTT1 (glutathione S-transferase theta 1) and CYP2D6 — both important in xenobiotic metabolism — illustrating how CNV-driven dosage differences can influence pharmacogenetic responses.
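The dosage relationship described in this section can be illustrated with a small Python sketch. It assumes a naive proportional model in which expression relative to two-copy individuals equals copies / 2, then labels an observation as proportional, compensated, or amplified. The 20% tolerance, the function name `dosage_response`, and the labels are assumptions for illustration and are not taken from the paper.

```python
def dosage_response(copies: int, observed_ratio: float, tolerance: float = 0.2) -> str:
    """Compare an observed expression ratio (vs. two-copy samples) with copies / 2."""
    if copies < 0 or observed_ratio < 0:
        raise ValueError("inputs must be non-negative")
    expected = copies / 2  # naive proportional model for a diploid autosomal locus
    if abs(observed_ratio - expected) <= tolerance * max(expected, 1e-9):
        return "proportional"
    if copies > 2:
        # Duplications: less than expected suggests compensation, more suggests amplification.
        return "compensated" if observed_ratio < expected else "amplified"
    return "below expectation" if observed_ratio < expected else "above expectation"


print(dosage_response(3, 1.5))   # proportional
print(dosage_response(3, 1.05))  # compensated
print(dosage_response(3, 2.2))   # amplified
print(dosage_response(1, 0.5))   # proportional
```

The model ignores allele-specific regulation and tissue context, so it is best read as a first-pass expectation against which departures can be noticed.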

Distant (Trans) Effects and Regulatory Network Impact

Beyond local (cis) effects, the authors found evidence that CNVs can influence the expression of distant genes, implying network-level or regulatory impacts. For instance, CNVs affecting transcription factors, chromatin regulators, or noncoding RNAs can cascade through gene networks. Such trans effects are less common but potentially more biologically significant, as they may alter pathways or entire cellular programs.

One example involves a CNV affecting a transcription factor gene whose altered dosage correlates with expression changes in dozens of downstream targets. These findings support the concept that structural variation contributes to gene regulatory network diversity, not merely local gene dosage.

Population and Evolutionary Perspectives

By comparing CNV–expression associations across populations (e.g., European, African, Asian HapMap samples), the authors observed both shared and population-specific CNV-eQTLs. This suggests that CNVs participate in population-level adaptation by modulating gene expression in response to environmental pressures.

Several well-known adaptive CNVs are highlighted. The AMY1 gene, which encodes salivary amylase, varies in copy number among populations with different diets: groups with starch-rich diets tend to have more AMY1 copies and higher salivary amylase expression. This is a prime example of a CNV with clear functional and adaptive consequences.

Another example concerns immune-related gene clusters, where CNV variation influences gene dosage and expression, potentially providing population-level differences in pathogen resistance.

The authors note that most CNVs are under weak or moderate selection, but a subset shows strong signals of positive selection — particularly those linked to immune response, sensory perception, and metabolism.

Functional Categories and Gene Sensitivity

The study finds that not all genes tolerate copy number changes equally. Dosage-sensitive genes, such as those involved in transcription, development, or signal transduction, are underrepresented within variable regions. This implies that purifying selection acts to prevent copy number changes in essential or tightly regulated genes.

In contrast, genes encoding environmental response proteins (immune receptors, metabolic enzymes, odorant receptors) show extensive CNV variability, consistent with their adaptive potential and functional redundancy.

This pattern reflects a fundamental evolutionary principle: the genome maintains stability for critical developmental genes while allowing flexibility in genes that mediate environmental interactions.

Methodological Insights

The authors discuss the challenges of measuring CNV effects on expression:

Many CNVs are complex or multiallelic, complicating genotype–phenotype correlations.
CNV detection sensitivity varies across platforms.
Cell type–specific expression patterns can obscure effects observable only in particular tissues.

To address these issues, the study employed rigorous statistical models that controlled for population structure and technical noise, and validated findings using multiple independent datasets.

Implications for Human Health and Disease

Although the majority of CNVs analyzed were benign or adaptive, the same mechanisms underlie pathogenic CNVs implicated in disease. The study emphasizes the continuum between polymorphism and pathology: the difference often lies in gene content and dosage sensitivity.

CNVs contribute to complex trait variation, including metabolic efficiency, immune responsiveness, and drug metabolism, and are increasingly recognized in the etiology of neurodevelopmental disorders. Understanding their effect on expression provides insight into how structural variation contributes to both normal phenotypic diversity and disease susceptibility.

Conclusions

The paper concludes that copy number variation is a pervasive and functional layer of human genomic diversity. CNVs influence transcription both locally and at a distance, shaping the expression landscape of the human genome. They serve as important evolutionary substrates for adaptation while also representing a major category of genomic risk factors for disease.

Three overarching messages emerge:

CNVs are common and functional, contributing measurably to inter-individual expression variability.
Dosage effects are context-dependent, modulated by gene function, chromatin environment, and regulatory networks.
Structural variation bridges evolution and pathology, embodying the same genomic plasticity that enables both innovation and disease.

By integrating population genetics, transcriptomics, and structural genomics, this study redefines how we understand the relationship between genome architecture and gene regulation — a cornerstone for both evolutionary biology and medical cytogenetics.

Newman S, Hermetz KE, Weckselblatt B, Rudd MK. Next-generation sequencing of duplication CNVs reveals that most are tandem and some create fusion genes at breakpoints.

A seminal study that dissects the molecular architecture of duplication-type copy number variants (CNVs) using next-generation sequencing (NGS) to understand how these rearrangements form and how they affect genes and genome function.

Introduction: The Enigma of CNV Structure

Copy number variants (CNVs) — segments of DNA that are deleted or duplicated relative to a reference genome — are a major component of human genetic variation. While deletions are relatively easy to interpret because they remove genomic material, duplications are far more complex. Their architecture — whether the duplicated segment is inserted next to its original location, inverted, or relocated elsewhere — is crucial to understanding both how these CNVs form and how they impact gene function.

Prior to this study, the structural details of human duplication CNVs were largely inferred from array-based or low-resolution methods. This paper used high-throughput sequencing to directly map duplication breakpoints at nucleotide-level resolution. The main goals were to (1) determine the structural organization of duplication CNVs, (2) infer their mutational mechanisms, and (3) explore their functional and evolutionary consequences — including the formation of novel fusion genes.

Study Design and Methodology

The researchers selected a set of duplication CNVs identified from large-scale genomic datasets, including individuals from the HapMap project and other cohorts. These duplications ranged from a few kilobases to several megabases in length. Using targeted next-generation sequencing and long-range PCR, the team mapped duplication junctions — the breakpoints where the duplicated DNA joins to the genome — with single-base precision.

They also compared the CNV structures with local genomic architecture, focusing on segmental duplications (low-copy repeats) and repetitive elements such as Alu and LINE sequences, which are known to mediate recombination. Sequence analyses allowed them to classify duplication types (tandem, inverted, or displaced) and to identify features like microhomology or insertions at breakpoints, which serve as signatures of specific mutational mechanisms (e.g., non-allelic homologous recombination vs. replication-based events).
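The classification step described above can be sketched as follows. This Python example assumes a simplified junction model in which one novel join is described by the reference position and strand on each side; it is not the authors' pipeline. A direct tandem duplication is expected to join the end of the duplicated segment back to its start on the same strand, a strand switch is treated as inverted, and anything else as displaced. The `Junction` type, the `slop` parameter, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    from_pos: int     # last reference base before the novel join
    from_strand: str  # "+" or "-"
    to_pos: int       # first reference base after the novel join
    to_strand: str    # "+" or "-"


def classify_duplication(dup_start: int, dup_end: int, j: Junction, slop: int = 50) -> str:
    """Label a duplication as tandem direct, inverted, or displaced from one junction."""
    if dup_start >= dup_end:
        raise ValueError("dup_start must be smaller than dup_end")
    if j.from_strand not in {"+", "-"} or j.to_strand not in {"+", "-"}:
        raise ValueError("strands must be '+' or '-'")
    if j.from_strand != j.to_strand:
        return "inverted"
    # Tandem direct: the copy's end is joined back onto the segment's start.
    end_to_start = (abs(j.from_pos - dup_end) <= slop
                    and abs(j.to_pos - dup_start) <= slop)
    return "tandem direct" if end_to_start else "displaced"


# Hypothetical duplicated segment spanning 100,000 to 180,000.
print(classify_duplication(100_000, 180_000, Junction(180_000, "+", 100_000, "+")))  # tandem direct
print(classify_duplication(100_000, 180_000, Junction(180_000, "+", 179_500, "-")))  # inverted
print(classify_duplication(100_000, 180_000, Junction(260_000, "+", 100_000, "+")))  # displaced
```

Complex rearrangements with several junctions would need each junction evaluated in turn, and the single-junction labels here would no longer be sufficient.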

Major Finding 1: Most Duplication CNVs Are Tandem

The study’s most striking conclusion is that the vast majority of human duplication CNVs are tandem and in direct orientation, meaning that the duplicated DNA is inserted immediately adjacent to the original segment in the same 5′–3′ direction.

This finding resolves a long-standing question: previous cytogenetic models had proposed that duplications might frequently be inserted elsewhere in the genome or in inverted orientation. Instead, sequencing showed that tandem arrangement is the predominant configuration — reflecting a relatively simple rearrangement process.

This has important implications for genome stability and mutation mechanisms: tandem duplications suggest local DNA misalignment or template-switching rather than long-distance transposition. Furthermore, tandem orientation facilitates unequal crossing-over events in future generations, making these regions hotspots for further copy number expansion and genomic instability.

Major Finding 2: A Minority of Duplications Are Complex or Inverted

While most duplications were simple and tandem, a subset displayed complex architectures. Some involved inverted orientations or insertions of additional small sequences at the junctions. A few duplications were displaced, inserted several kilobases away from the source locus, occasionally on the opposite strand. These complex rearrangements often exhibited microhomology (2–20 base pairs) at the junctions, implicating replication-based mechanisms such as FoSTeS (Fork Stalling and Template Switching) or MMBIR (Microhomology-Mediated Break-Induced Replication).

In several cases, the junctions showed small insertions or deletions not derived from either parental sequence — hallmarks of imperfect DNA repair. Together, these data reveal that while NAHR (non-allelic homologous recombination) explains some tandem duplications, replication errors play a major role in creating the more complex and unique duplication patterns.
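Microhomology at a junction can be measured directly once both breakpoints are known at base-pair resolution. The sketch below is a simplified illustration rather than the study's method; the function name, argument names, and toy sequences are hypothetical. It counts the bases that are identical in both parental sequences immediately around the join, that is, the window over which the breakpoint could be shifted without changing the junction sequence.

```python
def microhomology_length(ref_a: str, bp_a: int, ref_b: str, bp_b: int) -> int:
    """Count bases shared by both reference sequences around a junction.

    The junction is assumed to read ref_a[:bp_a] + ref_b[bp_b:]. Bases that
    match on both references next to the join could have come from either
    side, so they form the microhomology.
    """
    # Walk left: bases before bp_a that also precede bp_b in ref_b.
    left = 0
    while (left < bp_a and left < bp_b
           and ref_a[bp_a - 1 - left] == ref_b[bp_b - 1 - left]):
        left += 1
    # Walk right: bases after bp_b that also follow bp_a in ref_a.
    right = 0
    while (bp_a + right < len(ref_a) and bp_b + right < len(ref_b)
           and ref_a[bp_a + right] == ref_b[bp_b + right]):
        right += 1
    return left + right


# Toy sequences: both references carry "CGT" at the join.
proximal = "AAAACGTGGGG"
distal = "TTTTCGTCCCC"
print(microhomology_length(proximal, 7, distal, 7))  # 3
```

Because the function walks in both directions, it gives the same answer wherever the caller happens to place the breakpoint inside the shared bases.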

Major Finding 3: Breakpoints Are Enriched in Repetitive and Homologous Sequences

Mapping of the breakpoint regions showed that many duplication CNVs occur within or near low-copy repeats (LCRs) or repetitive elements such as Alu and LINE-1. These sequences provide homologous substrates for mispairing or template switching. When misalignment occurs during meiosis or DNA replication, unequal recombination can lead to the duplication of intervening segments.

Interestingly, duplications that formed in areas without extensive homology displayed breakpoint microhomology — short matching sequences — suggesting that multiple mechanisms can act depending on local genomic architecture.

NAHR dominates in regions flanked by long, highly similar repeats.
FoSTeS/MMBIR predominates where repeats are short or imperfect.

This dual-mechanism model explains both recurrent (identical in multiple individuals) and nonrecurrent (unique) duplication CNVs in the human genome.

Major Finding 4: Some Duplications Create Fusion Genes at Breakpoints

A particularly novel finding was that some duplication breakpoints fall within genes, creating chimeric or fusion transcripts. In such cases, one duplicated gene segment fuses with a neighboring gene, potentially producing a new mRNA that encodes a hybrid protein.

These fusion genes represent a potential source of evolutionary innovation, echoing the gene fusions seen in cancer and developmental disorders. The study identified several examples where exons from adjacent genes were juxtaposed, forming open reading frames predicted to be transcribed and translated. Some of these fusions were supported by RNA expression data, indicating functional activity.

The formation of fusion genes demonstrates that structural variation does not simply change dosage — it can create entirely new coding sequences. While most fusion events are likely deleterious or neutral, a few could provide novel functions subject to natural selection, offering a possible mechanism for the emergence of new genes in evolution.
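The geometry of a fusion created by a tandem duplication can be sketched in a few lines of Python. This is an illustrative model, not the study's annotation pipeline: the `Gene` record, the function name, and the gene names are hypothetical, and the sketch ignores exon boundaries and reading frame. It relies on one fact about tandem direct duplications: the new junction joins the end of the duplicated segment to its start, so two same-strand genes that each span one breakpoint are brought together there.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # '+' or '-'

    def contains(self, pos: int) -> bool:
        return self.start < pos < self.end


def predict_fusion(dup_start: int, dup_end: int,
                   genes: List[Gene]) -> Optional[Tuple[str, str]]:
    """Return (5' partner, 3' partner) for a tandem direct duplication."""
    at_start = [g for g in genes if g.contains(dup_start)]
    at_end = [g for g in genes if g.contains(dup_end)]
    for g_end in at_end:
        for g_start in at_start:
            # A fusion needs two different genes transcribed the same way.
            if g_end.name == g_start.name or g_end.strand != g_start.strand:
                continue
            if g_end.strand == "+":
                # Reading left to right, the junction runs from the gene at
                # the end breakpoint into the gene at the start breakpoint.
                return (g_end.name, g_start.name)
            # On the minus strand transcription runs the other way.
            return (g_start.name, g_end.name)
    return None


genes = [Gene("GENE_A", 100, 400, "+"), Gene("GENE_B", 600, 900, "+")]
print(predict_fusion(250, 750, genes))  # ('GENE_B', 'GENE_A')
```

A candidate returned by such a screen would still need transcript evidence, as in the study, before being called a functional fusion.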

Major Finding 5: Mechanistic Insights from Breakpoint Signatures

Detailed sequence analysis of breakpoints revealed mechanistic “footprints” characteristic of different mutational processes:

Homology of >100 bp between breakpoint regions → strong evidence for NAHR.
Short microhomology (2–15 bp) → indicative of replication-based mechanisms (FoSTeS/MMBIR).
Small insertions or deletions → consistent with non-homologous end joining (NHEJ).

These mechanistic patterns correlate with CNV recurrence: recurrent duplications generally result from NAHR between pre-existing repeats, while nonrecurrent, complex duplications stem from replication stress and template switching.

Thus, the study unifies distinct classes of duplication events under a mechanistic continuum, showing how genome architecture and replication dynamics jointly shape CNV formation.
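As a rough illustration, the breakpoint footprints listed above can be turned into a rule-of-thumb classifier. The thresholds come from this summary; the function name, the `junction_indel_bp` parameter, and the "unclassified" label for junctions that fall outside the stated ranges are assumptions. Real junctions often need manual review, so this should be read as a teaching sketch, not a diagnostic tool.

```python
def infer_mechanism(homology_bp: int, junction_indel_bp: int = 0) -> str:
    """Suggest a mutational mechanism from junction features.

    homology_bp: length of identical sequence shared at the two breakpoints.
    junction_indel_bp: bases inserted or deleted at the join (assumed input).
    """
    if homology_bp < 0 or junction_indel_bp < 0:
        raise ValueError("lengths must be non-negative")
    if homology_bp > 100:
        return "NAHR"
    if 2 <= homology_bp <= 15:
        return "FoSTeS/MMBIR"
    if homology_bp < 2 and junction_indel_bp > 0:
        return "NHEJ"
    # Anything else is not covered by the three footprints above.
    return "unclassified"


for homology, indel in [(350, 0), (4, 0), (0, 3), (40, 0)]:
    print(homology, indel, infer_mechanism(homology, indel))
```

Returning "unclassified" rather than forcing a label keeps the sketch honest about the gap between 15 bp and 100 bp, which the list does not address.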

Major Finding 6: Functional and Evolutionary Implications

From a functional standpoint, tandem duplications can increase gene dosage, influencing transcript abundance and potentially altering cellular pathways. Over evolutionary timescales, such duplications supply raw material for gene diversification.
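Dosage changes are commonly expressed as a log2 ratio relative to the normal two copies, which is also how array-based assays usually report copy number. The sketch below assumes a diploid autosomal baseline and a signal that scales linearly with copy number; the function name and its default are illustrative.

```python
import math


def expected_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Expected log2(test/reference) signal for a given copy number."""
    if copies < 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be non-negative, reference positive")
    if copies == 0:
        # Homozygous loss: the ratio is zero, so the log is unbounded.
        return float("-inf")
    return math.log2(copies / reference_copies)


print(expected_log2_ratio(1))            # -1.0  (heterozygous deletion)
print(round(expected_log2_ratio(3), 3))  # 0.585 (heterozygous duplication)
```

The asymmetry is worth noting: a single-copy loss shifts the signal by a full log2 unit, whereas a single-copy gain shifts it by only about 0.58, which is one reason duplications are harder to detect than deletions.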

Because tandem duplications place gene copies side by side, they facilitate unequal crossing-over in meiosis, promoting further expansion — a mechanism thought to underlie the birth of many multigene families (e.g., olfactory receptors, immune genes).

However, the same properties make these regions unstable and disease-prone. Tandem duplications overlapping dosage-sensitive genes or disrupting regulatory domains can cause developmental or neurogenetic disorders. The paper therefore underscores the continuum between adaptive evolution and pathogenic rearrangement, depending on which genes or regulatory elements are affected.

Conclusion: A New View of Duplication CNVs

This study clarified the structure and origin of duplication CNVs at nucleotide resolution. The key conclusions are:

Most duplication CNVs are tandem and direct, forming simple adjacent repeats rather than distant insertions.
Multiple mutational mechanisms contribute to their formation — recombination between repeats for recurrent events, and replication-based errors for unique ones.
Breakpoints often occur within repetitive DNA, emphasizing the central role of genome architecture in structural variation.
Some duplications create novel fusion genes, providing a source of evolutionary novelty and, occasionally, pathogenic rearrangement.
Duplication CNVs link molecular mechanism, genome instability, and functional consequence, illustrating the interplay between DNA repair, replication, and evolution.

By combining next-generation sequencing with cytogenetic interpretation, this paper provided one of the first base-pair–level maps of human duplication CNVs, revealing the genome’s remarkable capacity for rearrangement. It bridged the gap between molecular mechanism and biological outcome, showing that the forces generating structural variants are not purely destructive — they are also creative forces that shape genomic architecture, gene regulation, and human diversity.

Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene–enhancer interactions.

This landmark study revealed how alterations of 3D genome organization, rather than changes in DNA sequence itself, can lead to human developmental disorders. It introduced a mechanistic framework linking structural variants (SVs) to chromatin topology and enhancer misregulation, profoundly reshaping how cytogeneticists interpret genomic rearrangements.

Introduction: The 3D Genome as a Regulatory Framework

Classical genetics explained disease through gene mutations or copy-number imbalance. However, many structural variants — including inversions, duplications, or balanced translocations — occur without disrupting coding regions, yet cause severe developmental phenotypes. The missing explanatory layer lies in the three-dimensional folding of chromosomes.

Mammalian genomes are partitioned into topologically associating domains (TADs) — megabase-scale chromatin regions within which genes and their regulatory elements (enhancers) interact frequently, while insulation boundaries restrict cross-talk between neighboring domains. These TADs are largely conserved across cell types and species, suggesting a stable regulatory architecture.

Lupiáñez and colleagues hypothesized that disruptions of TAD integrity caused by structural variants could rewire enhancer–promoter communication, thereby activating genes inappropriately and causing congenital malformations. To test this, they combined human genetic studies, mouse CRISPR engineering, and chromatin conformation analyses.

Background: Structural Variants and Limb Malformations

The authors focused on congenital limb malformations, a class of disorders known to be associated with chromosomal rearrangements yet often lacking identifiable coding mutations. Several of these disorders map to chromosome 2q35–36, encompassing the WNT6/IHH/EPHA4/PAX3 gene cluster. These loci contain numerous developmental genes and extensive noncoding regulatory landscapes.

Previous cytogenetic mapping had identified deletions, duplications, and inversions in this region among patients with distinct limb abnormalities (brachydactyly, syndactyly, or polydactyly). However, the causal mechanism remained unclear since most variants spared coding exons. The researchers proposed that these rearrangements alter the 3D regulatory topology around key developmental genes, particularly EPHA4.

Methods: From Genomic Mapping to Chromatin Interaction Analysis

The study combined array comparative genomic hybridization (CGH) and whole-genome sequencing to define the rearrangements in affected families. To explore the regulatory architecture, the authors used Hi-C and 4C-seq to map chromatin contacts within the region in both human and mouse tissues.

They then used CRISPR/Cas9 genome editing to recreate the human-like rearrangements in mouse embryonic stem cells and generated mutant mice carrying the engineered structural variants. This allowed direct testing of whether rearrangement-induced TAD disruption could reproduce the observed limb malformations.

Organization of the EPHA4 Locus

Hi-C data revealed that the extended WNT6/IHH/EPHA4/PAX3 locus is organized into three adjacent TADs. The largest TAD contains EPHA4, flanked by smaller domains containing WNT6/IHH on one side and PAX3 on the other. Each TAD acts as an independent regulatory unit insulated by CTCF-binding boundary regions.

During normal limb development, Epha4 is expressed in limb tissue, whereas Pax3 and Ihh have distinct spatial expression domains. Enhancers within the Epha4 TAD drive its proper limb expression. The boundaries between these domains ensure that enhancers do not activate neighboring genes.

Structural Variants Identified in Patients

The study examined several distinct structural variants found in unrelated patients or families:

Heterozygous deletions (~1.8 Mb) removing the boundary between the EPHA4 and PAX3 TADs — observed in families with brachydactyly (shortened digits).
Inversions (~1.1 Mb) and duplications (~1.4 Mb) affecting the same interval — found in families with F-syndrome, characterized by syndactyly and polydactyly.
A large duplication (~900 kb) overlapping the IHH and EPHA4 domains — associated with polysyndactyly and craniofacial abnormalities.

All variants disrupted at least one predicted TAD boundary, suggesting a unifying mechanism: rearrangements that merge adjacent TADs or shift boundaries lead to abnormal enhancer–gene pairing.
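A first-pass screen for this kind of variant is to ask which annotated boundaries fall inside the rearranged interval. The sketch below is a minimal illustration with made-up coordinates; the function name, the boundary names, and the positions are hypothetical and do not correspond to the real locus.

```python
from typing import Dict, List


def boundaries_hit(sv_start: int, sv_end: int,
                   boundaries: Dict[str, int]) -> List[str]:
    """Names of TAD boundaries located inside [sv_start, sv_end)."""
    ordered = sorted(boundaries.items(), key=lambda item: item[1])
    return [name for name, pos in ordered if sv_start <= pos < sv_end]


# Hypothetical boundary positions along a toy locus.
toy_boundaries = {"boundary_1": 1_000_000, "boundary_2": 3_000_000}

print(boundaries_hit(2_500_000, 4_300_000, toy_boundaries))  # ['boundary_2']
print(boundaries_hit(1_200_000, 2_800_000, toy_boundaries))  # []
```

An interval overlap only flags candidates. Whether the boundary is removed (deletion), moved (inversion), or copied (duplication) depends on the variant type and still has to be interpreted.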

Mouse CRISPR Models Recapitulate Human Phenotypes

To experimentally validate this hypothesis, the authors engineered mice carrying deletions, duplications, or inversions analogous to the human alleles, and also examined a pre-existing mutant line:

The DelB mouse line, modeling the human brachydactyly deletion, showed shortened digits and partial syndactyly, precisely mimicking the human phenotype.
A 1 Mb inversion corresponding to the F-syndrome rearrangement was introduced but produced no overt limb defects, implying that not every boundary shift is sufficient for misexpression.
The doublefoot (Dbf) mouse mutant, a pre-existing line with a large deletion affecting the Ihh–Epha4 interval, displayed massive polydactyly, paralleling human duplication cases.

These animal models provided direct functional evidence that structural variants altering TAD organization can cause predictable limb patterning defects.

Chromatin Interaction Mapping and Enhancer Miswiring

Using 4C-seq with promoter viewpoints from Epha4, Wnt6, Ihh, and Pax3, the authors analyzed chromatin contacts in embryonic day 11.5 (E11.5) mouse limb buds. Each promoter exhibited interactions confined within its TAD, consistent with Hi-C maps from other tissues. However, in rearranged alleles where TAD boundaries were disrupted, enhancers normally associated with Epha4 began contacting and activating neighboring genes.

For example, in the DelB deletion, limb enhancers that should target Epha4 now interacted with Pax3, located in the adjacent TAD. As a result, Pax3 became ectopically expressed in limb tissue, producing brachydactyly. Similarly, rearrangements that juxtaposed Ihh with the Epha4 enhancer cluster drove abnormal Ihh activation, leading to polydactyly.

Crucially, these enhancer rewiring events occurred only when the CTCF-associated boundary was removed. Variants that left boundaries intact did not cause misexpression, highlighting the causal importance of domain insulation in gene regulation.
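The logic of enhancer rewiring can be reproduced with a toy model in which a locus is an ordered list of genes, enhancers, and boundary markers. This is a conceptual sketch only: the element order, the single `enh_limb` enhancer, the `"|"` boundary marker, and all function names are assumptions, and real enhancer-promoter contacts are not all-or-nothing within a TAD.

```python
from typing import List, Set, Tuple

BOUNDARY = "|"  # marker for a CTCF-associated boundary (toy notation)


def domains(elements: List[str]) -> List[List[str]]:
    """Split an ordered list of elements into TADs at each boundary."""
    tads: List[List[str]] = [[]]
    for element in elements:
        if element == BOUNDARY:
            tads.append([])
        else:
            tads[-1].append(element)
    return tads


def enhancer_gene_pairs(elements: List[str]) -> Set[Tuple[str, str]]:
    """Pair each enhancer (name starts with 'enh') with genes in its TAD."""
    pairs: Set[Tuple[str, str]] = set()
    for tad in domains(elements):
        enhancers = [e for e in tad if e.startswith("enh")]
        genes = [g for g in tad if not g.startswith("enh")]
        pairs.update((e, g) for e in enhancers for g in genes)
    return pairs


def delete(elements: List[str], start: int, end: int) -> List[str]:
    return elements[:start] + elements[end:]


def invert(elements: List[str], start: int, end: int) -> List[str]:
    return elements[:start] + elements[start:end][::-1] + elements[end:]


wild_type = ["Ihh", BOUNDARY, "enh_limb", "Epha4", BOUNDARY, "Pax3"]

# Deleting only the boundary between the Epha4 and Pax3 domains.
boundary_deletion = delete(wild_type, 4, 5)
gained = enhancer_gene_pairs(boundary_deletion) - enhancer_gene_pairs(wild_type)
print(gained)  # {('enh_limb', 'Pax3')}

# An inversion confined to one TAD leaves the pairing unchanged.
intra_tad_inversion = invert(wild_type, 2, 4)
print(enhancer_gene_pairs(intra_tad_inversion) == enhancer_gene_pairs(wild_type))  # True
```

Comparing the pair sets before and after a rearrangement gives the gained contacts, which in this toy case is the limb enhancer reaching Pax3 once the boundary is gone.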

Key Mechanistic Insights

TADs act as functional regulatory units that constrain enhancer–promoter communication.
Structural variants can merge, split, or reposition TADs, thereby changing which enhancers are physically proximal to a gene.
Pathogenic gene activation occurs through enhancer hijacking — enhancers drive expression of the wrong gene when boundaries are lost.
CTCF and cohesin binding sites demarcate the boundaries whose disruption triggers regulatory rewiring.
The 3D genome organization is conserved between mouse and human, enabling cross-species modeling of structural variant consequences.

Broader Implications for Cytogenetics and Disease

This study marked a paradigm shift in medical genetics by demonstrating that noncoding structural variants can cause disease through topological mechanisms, even when gene sequences remain intact. It provided a molecular rationale for previously unexplained chromosomal rearrangement syndromes and emphasized that genome folding is as critical as sequence for proper gene regulation.

The implications extend beyond limb malformations. Many developmental disorders, cancers, and congenital syndromes involve inversions, duplications, or translocations that may disrupt chromatin domain boundaries. This study offered a framework for predicting pathogenicity: if a rearrangement breaks a TAD boundary and places strong enhancers near dosage-sensitive genes, it is likely deleterious.

Moreover, the conservation of TAD structure across tissues implies that these regulatory domains form a stable scaffold for gene expression, yet rearrangements can have tissue-specific outcomes depending on where affected enhancers are active.

Technological and Methodological Contributions

Lupiáñez et al. combined cytogenetic mapping, next-generation sequencing, and chromatin conformation capture to dissect the spatial consequences of structural variants. Their use of CRISPR/Cas9 to engineer large genomic rearrangements in mice represented a methodological milestone, allowing direct causal testing of topological hypotheses.

The integration of genomic and epigenomic data established a blueprint for 3D cytogenomics, bridging the gap between linear DNA alterations detected by arrays or sequencing and the spatial organization revealed by Hi-C. This approach now informs the clinical assessment of structural variant pathogenicity.

Conclusions

The central discovery of this study is that genome topology, not just sequence, is essential for normal development. When structural variants disrupt the architectural boundaries of TADs, they can reconnect enhancers with inappropriate target genes, leading to pathogenic misexpression and congenital malformations.

Key take-home messages include:

TADs define regulatory neighborhoods that maintain specificity of enhancer–gene interactions.
CTCF-associated boundaries act as genetic insulators protecting genes from ectopic enhancer influence.
Disruption of these boundaries by deletions, duplications, or inversions leads to enhancer hijacking and abnormal developmental gene expression.
3D genome architecture is highly conserved, allowing predictive modeling across species.
Integrative 3D cytogenomics provides a new diagnostic lens for interpreting structural variants that appear “balanced” at the sequence level.

In summary, Lupiáñez et al. revealed that structural variation affects not only gene dosage but also the spatial logic of gene regulation. Their work established a conceptual and experimental foundation for understanding how the folding of the genome into topological domains orchestrates development — and how its disruption underlies human disease.

Carvalho CMB, Lupski JR. Mechanisms underlying structural variant formation in genomic disorders.

This authoritative review synthesizes two decades of cytogenetic and genomic research to explain how structural variants (SVs)—including deletions, duplications, inversions, and complex rearrangements—arise in the human genome. It connects molecular mechanisms of DNA repair and replication with the architecture of the genome and the clinical manifestations of genomic disorders.

Introduction: Structural Variation as a Central Feature of the Human Genome

Structural variants (SVs) are genomic alterations typically larger than 50 base pairs, encompassing copy number variants (CNVs), inversions, translocations, and complex rearrangements. They are a major source of genetic diversity and disease. While small mutations change individual nucleotides, SVs reshape entire genomic segments—sometimes altering megabases of DNA.

Carvalho and Lupski argue that understanding SVs requires a mechanistic perspective: how does the physical structure of DNA and its repair machinery lead to rearrangements? Their review organizes the various mutational mechanisms into a unified framework that links genome architecture, replication dynamics, and DNA repair pathways to both recurrent and nonrecurrent SVs.

The authors emphasize that these mechanisms are not random accidents but consequences of the genome’s inherent design—rich in repeated sequences, fragile sites, and structural motifs that predispose to rearrangement.

Genome Architecture and Recombination Substrates

The human genome contains abundant repetitive elements, including Alu elements, LINE-1 sequences, and segmental duplications (also known as low-copy repeats, or LCRs). These duplications, typically 10–500 kilobases in size and sharing 90–99% sequence identity, provide substrates for ectopic recombination—recombination between similar sequences located in non-allelic positions.

The existence of these homologous regions underpins the most common mechanism of recurrent structural variation: non-allelic homologous recombination (NAHR). When homologous chromosomes misalign during meiosis because of repeated sequences, recombination between mispaired LCRs can delete, duplicate, or invert the intervening genomic segment.

Genomic disorders such as Charcot–Marie–Tooth disease type 1A (CMT1A) and hereditary neuropathy with liability to pressure palsies (HNPP) exemplify this mechanism. Both arise from NAHR between the same flanking repeats on chromosome 17p12 but result in reciprocal duplication (CMT1A) or deletion (HNPP). This “recurrent reciprocal rearrangement” pattern is a hallmark of NAHR-mediated SVs.
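The reciprocal outcome of NAHR follows directly from the geometry of the crossover, which a small Python model can make concrete. The chromosome is represented as an ordered list of labels; the labels, the function name, and the index-based interface are illustrative assumptions, and the model covers only repeats in direct orientation.

```python
from typing import List, Tuple


def nahr_products(chromosome: List[str], repeat_a: int,
                  repeat_b: int) -> Tuple[List[str], List[str]]:
    """Unequal crossover between two direct repeats on misaligned homologs.

    repeat_a and repeat_b are the list indices of the proximal and distal
    repeat. Copy A on one homolog pairs with copy B on the other, and the
    exchange happens inside the mispaired repeats.
    """
    if not 0 <= repeat_a < repeat_b < len(chromosome):
        raise ValueError("repeat_a must precede repeat_b within the list")
    # Product 1: homolog 1 up to repeat A, then homolog 2 after repeat B.
    deletion = chromosome[:repeat_a + 1] + chromosome[repeat_b + 1:]
    # Product 2: homolog 2 up to repeat B, then homolog 1 after repeat A.
    duplication = chromosome[:repeat_b + 1] + chromosome[repeat_a + 1:]
    return deletion, duplication


toy = ["proximal", "LCR", "segment", "LCR", "distal"]
deleted, duplicated = nahr_products(toy, 1, 3)
print(deleted)     # ['proximal', 'LCR', 'distal']
print(duplicated)  # ['proximal', 'LCR', 'segment', 'LCR', 'segment', 'LCR', 'distal']
```

The two products are exact reciprocals: one lacks the intervening segment and one repeat copy, the other gains both, which mirrors the deletion and duplication pair described above.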

Mechanistic Categories of Structural Variant Formation

Carvalho and Lupski categorize SV mechanisms into three broad classes:

Recombination-based mechanisms, such as NAHR and single-strand annealing (SSA).
Replication-based mechanisms, including FoSTeS (Fork Stalling and Template Switching) and MMBIR (Microhomology-Mediated Break-Induced Replication).
Repair-based mechanisms, primarily non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ).

Each mechanism leaves distinct molecular signatures at the rearrangement breakpoints—patterns of homology or microhomology that serve as “mutational fingerprints.”

Non-Allelic Homologous Recombination (NAHR)

NAHR is mediated by long stretches of high sequence identity, typically greater than 300 base pairs, often found in segmental duplications. When homologous chromosomes or sister chromatids misalign, crossing over between these repeats produces recurrent rearrangements of predictable size and orientation.

Because NAHR relies on the same repeat pairs in different individuals, breakpoints are almost identical across patients. This explains the recurrent nature of syndromes such as:

Williams–Beuren syndrome (7q11.23 deletion)
Smith–Magenis and Potocki–Lupski syndromes (17p11.2)
CMT1A/HNPP (17p12)

NAHR can occur in meiosis or mitosis, and it generates reciprocal products: a deletion in one chromatid and a duplication in the other. The mechanism depends heavily on genomic architecture—without homologous repeats, NAHR cannot occur.

Non-Homologous End Joining (NHEJ)

NHEJ repairs DNA double-strand breaks (DSBs) without requiring extensive homology. It simply ligates DNA ends after limited end processing. This mechanism can create unique, nonrecurrent rearrangements characterized by blunt or microhomologous joins (1–5 bp) and occasional small insertions.

NHEJ operates throughout the cell cycle, especially in G1, and contributes to structural variation in somatic cells, including in cancer. In the germ line, it can produce deletions, duplications, and translocations at fragile sites where replication stress leads to DSBs. Because it lacks sequence constraints, NHEJ-mediated SVs are highly heterogeneous in size and position.

Replication-Based Mechanisms (FoSTeS and MMBIR)

One of the review’s most influential contributions is its description of replication-based mechanisms, first proposed by Lupski’s group to explain complex, nonrecurrent CNVs.

When replication forks stall—due to secondary structures, DNA lesions, or transcription conflicts—the lagging strand can disengage and anneal to a new template elsewhere in the genome using microhomology (2–15 bp). Replication then resumes from this ectopic template, creating duplications, triplications, or complex rearrangements. This is the essence of Fork Stalling and Template Switching (FoSTeS).

A related mechanism, Microhomology-Mediated Break-Induced Replication (MMBIR), occurs when a collapsed replication fork invades another DNA molecule using short homology tracts, similarly generating complex rearrangements.

Replication-based mechanisms explain SVs that are nonrecurrent, complex, and contain multiple templated segments, such as triplication-within-duplication structures or inverted insertions. Breakpoint sequencing in patients with genomic disorders often reveals these signatures.
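A single template switch can be mimicked on a string to show why it produces a duplication or a deletion. This is a deliberately simplified sketch: it uses one template rather than two forks, performs one switch only, and the function name, parameters, and toy sequence are hypothetical. The only rule it enforces is that the bases synthesized just before the stall must match the bases just before the resume point, standing in for microhomology-primed annealing.

```python
def template_switch(template: str, stall: int, resume: int,
                    min_homology: int = 2) -> str:
    """Copy template[:stall], then continue copying from template[resume:]."""
    if stall < min_homology or resume < min_homology:
        raise ValueError("not enough sequence before the switch point")
    primer = template[stall - min_homology:stall]
    landing = template[resume - min_homology:resume]
    if primer != landing:
        raise ValueError("no microhomology at the requested switch")
    return template[:stall] + template[resume:]


toy = "AAGCTTTTCCGCTGGA"  # 'GC' occurs before position 4 and before position 12

# Switching backwards re-copies the segment between the two sites.
print(template_switch(toy, stall=12, resume=4))  # AAGCTTTTCCGCTTTTCCGCTGGA

# Switching forwards skips it instead.
print(template_switch(toy, stall=4, resume=12))  # AAGCTGGA
```

Chaining several such switches, or allowing the resume point to lie on the opposite strand, is how the more complex structures mentioned above would be modeled.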

Complex and Chromothriptic Rearrangements

The authors discuss chromothripsis—a catastrophic event in which a chromosome shatters into dozens of fragments that are then stitched back together in random order. Although originally observed in cancer, chromothripsis-like events have been found in congenital disorders.

Such complex rearrangements likely involve multiple DSBs and replication stress, engaging combinations of NHEJ and MMBIR mechanisms. These phenomena blur the distinction between “simple” SVs and massive genome restructuring, highlighting the continuum of mutational complexity.
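The "shatter and rejoin" picture can be simulated in a few lines. The sketch below is a toy model, not a reconstruction algorithm: fragment loss and random orientation are added assumptions (features commonly described for chromothripsis but not stated in this summary), and the function name, parameters, and seed are hypothetical.

```python
import random
from typing import List, Tuple


def shatter_and_rejoin(segments: List[str], loss_rate: float,
                       seed: int) -> List[Tuple[str, str]]:
    """Randomly drop, reorder, and orient the fragments of one chromosome."""
    rng = random.Random(seed)  # seeded so the toy result is reproducible
    # Each fragment is lost with probability loss_rate.
    kept = [s for s in segments if rng.random() >= loss_rate]
    # Surviving fragments are rejoined in random order...
    rng.shuffle(kept)
    # ...and each one in a random orientation.
    return [(s, rng.choice("+-")) for s in kept]


derivative = shatter_and_rejoin(list("ABCDEFGH"), loss_rate=0.25, seed=1)
print(derivative)
```

Running it with different seeds gives an idea of how many distinct derivative chromosomes can arise from the same set of breaks.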

Determinants of Structural Variant Hotspots

Genome architecture strongly influences where SVs form:

Segmental duplications promote NAHR.
AT-rich regions and replication origins are prone to fork stalling.
Palindromic or repetitive motifs can form secondary structures (e.g., cruciforms, hairpins) that trigger DSBs.
Replication timing and chromatin state affect accessibility and susceptibility to breakage.

Furthermore, some regions are “reused” in multiple independent rearrangements across different syndromes, emphasizing that genomic context, not just random error, defines mutational landscapes.

Recurrent vs. Nonrecurrent Rearrangements

The review distinguishes:

Recurrent SVs: near-identical size, with breakpoints clustering in the same repeats in unrelated individuals (mediated by NAHR).
Nonrecurrent SVs: variable size and breakpoint positions (arising from NHEJ, FoSTeS, or MMBIR).

This distinction is clinically relevant. Recurrent events cause well-defined genomic syndromes with consistent phenotypes, whereas nonrecurrent events often underlie sporadic or unique cases.
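In practice the distinction can be drawn by comparing breakpoints across unrelated individuals. The sketch below is a minimal illustration; the function name, the coordinate tuples, and the `tolerance_bp` parameter (which stands in for the width of the flanking repeats, inside which NAHR breakpoints may vary) are assumptions.

```python
from typing import List, Tuple


def label_recurrence(events: List[Tuple[int, int]],
                     tolerance_bp: int = 0) -> str:
    """Label a set of (start, end) rearrangements from unrelated individuals."""
    if len(events) < 2:
        raise ValueError("need at least two unrelated individuals")
    starts = [start for start, _ in events]
    ends = [end for _, end in events]
    # Recurrent: every start and every end falls within the same window.
    clustered = (max(starts) - min(starts) <= tolerance_bp
                 and max(ends) - min(ends) <= tolerance_bp)
    return "recurrent" if clustered else "nonrecurrent"


print(label_recurrence([(100, 900), (110, 905)], tolerance_bp=20))  # recurrent
print(label_recurrence([(100, 900), (250, 1400)]))                  # nonrecurrent
```

Setting the tolerance too generously would merge distinct nonrecurrent events into one apparent hotspot, so the window should reflect the known repeat size at the locus.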

Clinical and Evolutionary Implications

From a medical standpoint, these mechanisms explain both genomic disorders (due to recurrent rearrangements) and individual pathogenic CNVs (due to replication-based events). Understanding the mechanism can guide diagnostic interpretation: for example, if breakpoints fall within LCRs, NAHR is likely; if the rearrangement is unique and complex, a replication-based error is suspected.

From an evolutionary perspective, the same mechanisms that cause disease also generate beneficial variation. NAHR-driven duplications can create raw material for new gene functions, while replication-based processes contribute to gene family expansion. Thus, genome instability is both a source of disease and of innovation.

Molecular Signatures for Mechanism Inference

Carvalho and Lupski outline diagnostic clues visible at breakpoint junctions:

>100 bp homology → NAHR
2–15 bp microhomology → FoSTeS/MMBIR
blunt or minimal overlap → NHEJ
insertions of non-templated bases → end joining errors

Sequencing of CNV junctions thus reveals the underlying mutational pathway, offering mechanistic insights in both research and clinical diagnostics.

Unifying Model: Replication–Recombination–Repair (RRR) Interplay

The review concludes by proposing an integrative Replication–Recombination–Repair (RRR) model, recognizing that these processes operate concurrently and sometimes sequentially. DNA replication stress can initiate DSBs, recombination resolves them, and repair pathways finalize the rearrangement. The balance among these processes determines whether genome maintenance succeeds or mutational catastrophe ensues.

Conclusions

Carvalho and Lupski provide a comprehensive mechanistic framework for understanding how structural variants form and why certain genomic regions are unstable. Their key messages are:

Genome architecture predisposes to rearrangement by providing homologous or repetitive substrates.
Multiple molecular mechanisms—NAHR, NHEJ, and replication-based pathways—underlie SV formation.
Breakpoint signatures enable inference of mutational origin.
Recurrent and nonrecurrent SVs represent two ends of a mechanistic continuum.
The same processes drive both pathology and evolution, reflecting the dual nature of genome instability.

This review transformed cytogenetic thinking: structural variation is no longer viewed as a random anomaly but as an intrinsic feature of a dynamic genome shaped by its own repair and replication machinery. It laid the conceptual groundwork for modern mechanistic cytogenomics, where the origin of a rearrangement is as important as its outcome.

Glossary

Allelic variation
Differences in DNA sequence between alleles of the same gene, ranging from single nucleotide changes to larger structural rearrangements.

Array Comparative Genomic Hybridization (aCGH)
A microarray-based technique that detects copy number gains and losses across the genome by comparing hybridization intensity between a test and a reference DNA sample.

Adaptive CNV
A copy number variant that confers a selective advantage in specific environmental or physiological contexts (e.g., AMY1 duplication in high-starch diets).

Breakpoint
The exact genomic position where a structural variant begins or ends; analyzing breakpoints helps determine the mutational mechanism (e.g., NAHR, FoSTeS, NHEJ).

Balanced rearrangement
A structural change (e.g., inversion, translocation) that alters chromosome structure without net gain or loss of DNA; may still cause disease through positional or regulatory effects.

Chromothripsis
A catastrophic genomic event involving chromosome shattering and random reassembly, resulting in complex rearrangements within a single cell cycle.

Chromatin topology
The three-dimensional spatial organization of chromatin within the nucleus, influencing which genes interact with which regulatory elements.

CNV (Copy Number Variant)
A DNA segment (typically >1 kb) that varies in copy number between individuals due to duplication or deletion events; a major source of genetic diversity.

Cohesin
A protein complex that maintains sister chromatid cohesion and helps shape chromatin loops that define TAD boundaries.

Complex structural variant
A rearrangement involving multiple breakpoints and combinations of deletions, duplications, inversions, and insertions—often generated by replication-based mechanisms.

CTCF (CCCTC-binding factor)
An architectural protein that binds specific DNA motifs and demarcates TAD boundaries, insulating gene–enhancer domains.

Deletion
Loss of a DNA segment from the genome, resulting in copy number reduction and potential haploinsufficiency of affected genes.

Duplication
Repetition of a DNA segment, which can be tandem (adjacent) or dispersed elsewhere in the genome; may increase gene dosage or create new genes.

Dosage sensitivity
The phenomenon in which changes in gene copy number alter phenotype or viability, due to strict requirements for balanced gene expression.

Double-strand break (DSB)
A severe DNA lesion involving cleavage of both strands of the double helix; central to SV formation when misrepaired.

Enhancer
A regulatory DNA element that increases transcription of target genes, often acting over long genomic distances via 3D chromatin looping.

Enhancer hijacking
A pathogenic mechanism where structural variants reposition enhancers, causing them to inappropriately activate genes in neighboring TADs.

Evolutionary innovation
The acquisition of novel gene functions or regulatory patterns through processes like duplication, fusion, or rearrangement.

Expression quantitative trait locus (eQTL)
A genomic variant that statistically correlates with changes in gene expression level; CNVs often act as strong eQTLs.

Fusion gene
A chimeric gene created when a structural variant joins exons from different genes, producing a hybrid transcript that may have new functions.

FoSTeS (Fork Stalling and Template Switching)
A replication-based mechanism where a stalled DNA polymerase switches templates, generating complex duplications or insertions.

Genomic disorder
A disease caused by pathogenic structural variation that disrupts gene dosage, structure, or regulatory topology (e.g., 22q11.2 deletion syndrome).

Genome architecture
The large-scale structural organization of DNA, including repeats, segmental duplications, and 3D chromatin folding—all factors influencing SV formation.

Gene dosage
The number of functional copies of a gene; deviations from the normal dosage can lead to altered expression and phenotype.

Haploinsufficiency
A condition in which a single functional copy of a gene is insufficient to maintain normal function, often due to deletions.

Hi-C / 4C-seq
Chromosome conformation capture techniques used to map physical DNA–DNA interactions and visualize 3D chromatin structure.

Homology
Sequence similarity between DNA regions; essential for recombination-based mechanisms such as NAHR.

Insertion
Addition of a DNA fragment into a new genomic location; can be small (a few bases) or large (megabase scale).

Inversion
Reversal of the orientation of a DNA segment within the chromosome; can disrupt genes or regulatory domains if breakpoints lie within functional regions.

Low-copy repeat (LCR) / Segmental duplication
Large (10–500 kb), highly homologous genomic regions that predispose to NAHR-mediated rearrangements.

Locus architecture
The physical and regulatory arrangement of genes, enhancers, and boundaries within a genomic region.

Microhomology
Short stretches (2–20 bp) of identical sequence at rearrangement junctions, characteristic of replication-based or repair-based mechanisms (FoSTeS/MMBIR/NHEJ).

MMBIR (Microhomology-Mediated Break-Induced Replication)
A replication restart process using short homologies, often generating complex or templated rearrangements.

Mirror phenotypes
Opposing clinical traits resulting from reciprocal CNVs (e.g., deletion vs. duplication at the same locus producing opposite size or metabolic outcomes).

NAHR (Non-Allelic Homologous Recombination)
A recombination process between similar but non-allelic sequences, leading to recurrent deletions or duplications with nearly identical breakpoints.

NHEJ (Non-Homologous End Joining)
A DNA repair mechanism that directly ligates broken ends without extensive sequence homology, producing unique, nonrecurrent rearrangements.

Polymorphism
A genetic variation present at a frequency greater than 1% in the population; CNVs can be polymorphic or pathogenic depending on context.

Population stratification
Differences in allele frequencies across populations due to ancestry, influencing the distribution of adaptive CNVs.

Replication stress
A cellular condition where DNA replication slows or stalls, often triggering template switching or chromosomal breakage leading to SV formation.

Recurrent rearrangement
An SV of the same size, with breakpoints clustering within the same repeats, found independently in multiple unrelated individuals; typically mediated by NAHR between the same LCRs.

Segmental duplication
A large duplicated genomic block with high sequence identity, often acting as a substrate for NAHR and as a reservoir for gene family expansion.

Structural variant (SV)
Any genomic rearrangement altering the structure, orientation, or copy number of DNA segments, typically larger than 50 bp.

Structural variant hotspot
A genomic region repeatedly affected by rearrangements due to local sequence architecture, such as clusters of segmental duplications.

TAD (Topologically Associating Domain)
A self-interacting chromatin domain where genes and enhancers interact frequently; boundaries prevent cross-domain activation.

TAD boundary
A genomic region enriched in CTCF and cohesin that insulates neighboring domains; its disruption can cause enhancer miswiring and disease.

Tandem duplication
A duplication placed adjacent to the original locus in the same orientation; the most common structural configuration among human duplication CNVs.

Template switching
A process during replication in which a DNA polymerase jumps between templates, producing rearranged or duplicated sequences.

Variation–Pathology continuum
The concept that genomic instability drives both beneficial evolutionary change and deleterious disease-causing mutations.

Whole-Genome Sequencing (WGS)
A comprehensive sequencing approach that detects single-nucleotide variants and structural variants genome-wide, often with base-pair resolution.

Evolution, Variation, and Pathology: The Dual Role of Structural Variants in Human Genomes

Learning Objectives

Describe the types and mechanisms of structural variants (SVs) and their molecular signatures.

Explain how CNVs and other SVs influence gene dosage, expression, and chromatin topology.

Distinguish between evolutionary and pathogenic outcomes of structural variation.

Interpret primary figures from modern cytogenomics papers (sequencing, expression, Hi-C, etc.).

Recognize the integrative nature of current cytogenetic methodologies.

Duration: 1 hour 30 minutes

0:00–0:10 — Introduction and Context

Concepts: What are structural variants? Why do they matter? How do they link evolution and disease?

Figures & Illustrations:

Carvalho & Lupski (2016), Fig. 1 – Classification of structural variants.
→ Use as the opening visual: deletions, duplications, inversions, insertions, and translocations illustrated schematically.
🔹 Pedagogical aim: define SVs visually and emphasize their genomic scale compared to SNPs.
Gamazon & Stranger (2015), Fig. 1/2 – CNVs and gene expression variation.
→ Display after the first schematic.
🔹 Purpose: introduce the idea that CNVs have functional consequences — gene dosage alters transcription.

Teaching sequence:

Start with the visual taxonomy of SVs.
Transition to how these structural changes affect gene expression and phenotypes.
State the central paradox: the same rearrangements that foster adaptation can cause disease.

0:10–0:25 — Mechanisms of Structural Variant Formation

Concepts: Molecular mechanisms that generate SVs — recombination, replication, repair.

Figures & Illustrations:

Carvalho & Lupski (2016), Fig. 2 – Mechanisms of SV formation (NAHR, NHEJ, FoSTeS/MMBIR).
→ Step through each mechanism, showing the DNA-level process and typical breakpoint signature.
🔹 Pedagogical aim: illustrate that different molecular pathways create distinct SV patterns.
Carvalho & Lupski (2016), Fig. 4 – Replication–Recombination–Repair (RRR) continuum model.
→ Present at the end of this section as a synthesis slide.
🔹 Purpose: convey that these mechanisms overlap dynamically, not as isolated events.

Teaching method:
Use colored overlays on Fig. 2 to highlight breakpoint microhomology, inserted bases, or homologous repeats.
Invite students to infer the underlying mechanism from example breakpoints.
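
For the breakpoint exercise above, a small classroom helper can make the decision rule explicit. The following is a minimal sketch, not a validated classifier: the function name and arguments are hypothetical, and the thresholds are simply the ones quoted in this resource (more than 300 bp of homology for NAHR, 2–20 bp of microhomology for FoSTeS/MMBIR-type junctions, little or no homology with possible small insertions for NHEJ). Real junctions often need more context than these two numbers.

```python
def infer_mechanism(homology_bp: int, inserted_bp: int = 0) -> str:
    """Suggest a candidate SV mechanism from junction features.

    Teaching heuristic only. `homology_bp` is the length of sequence shared
    by both sides of the junction; `inserted_bp` is the number of
    non-templated bases found at the junction.
    """
    if homology_bp < 0 or inserted_bp < 0:
        raise ValueError("lengths must be non-negative")
    if homology_bp > 300:
        # Long homology, as provided by low-copy repeats.
        return "NAHR"
    if 2 <= homology_bp <= 20:
        # Microhomology range quoted in the glossary.
        return "FoSTeS/MMBIR or microhomology-mediated end joining"
    if homology_bp < 2:
        # Blunt or near-blunt junction, sometimes with a few added bases.
        return "NHEJ with small insertion" if inserted_bp > 0 else "NHEJ"
    # 21-300 bp: not covered by the thresholds used in this course.
    return "unclassified"


if __name__ == "__main__":
    examples = [(1500, 0), (4, 0), (0, 3), (100, 0)]
    for homology, inserted in examples:
        print(homology, inserted, "->", infer_mechanism(homology, inserted))
```

Students can be asked to propose junctions that the rule misclassifies, which leads naturally into the overlap between mechanisms shown in the RRR continuum figure.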

0:25–0:45 — Structural Variants as Engines of Evolution

Concepts: How CNVs and duplications drive evolutionary innovation, gene family expansion, and adaptation.

Figures & Illustrations:

Gamazon & Stranger (2015), Fig. 3 – Genome-wide correlation between CNV and expression.
→ Demonstrates that CNVs act as eQTLs (expression modifiers).
🔹 Purpose: quantitative link between structure and transcription.
Gamazon & Stranger (2015), Fig. 4 – Specific CNV–gene expression examples (e.g., GSTT1, CYP2D6).
→ Used as concrete cases showing dosage sensitivity.
Newman et al. (2015), Fig. 2 – Structural classification of duplication CNVs (tandem vs. complex).
→ Transition from functional to structural dimension of CNVs.
🔹 Purpose: explain why tandem duplications dominate and how they fuel evolution.
Newman et al. (2015), Fig. 5 – Examples of duplication-mediated gene fusions.
→ Conclude this segment by showing how duplication can generate de novo fusion genes.

Teaching sequence:
Expression-level variation → duplication structure → gene fusion innovation → population adaptation.

0:45–1:05 — Structural Variants in Pathology

Concepts: Mechanisms by which SVs disrupt gene function or regulation, focusing on 3D genome topology.

Figures & Illustrations:

Lupiáñez et al. (2015), Fig. 1 – Genomic organization of the WNT6/IHH/EPHA4/PAX3 region and patient rearrangements.
→ Opening image: real-world cases of deletions, duplications, inversions.
🔹 Purpose: contextualize SVs clinically.
Lupiáñez et al. (2015), Fig. 2 – Hi-C or 4C-seq chromatin contact map defining TADs around EPHA4.
→ Explain concept of TADs and their boundaries.
🔹 Purpose: introduce 3D genome architecture.
Lupiáñez et al. (2015), Fig. 3 – CRISPR-engineered mouse rearrangements and resulting limb malformations.
→ Show functional validation of TAD disruption.
Lupiáñez et al. (2015), Fig. 4 – 4C-seq enhancer rewiring (EPHA4–PAX3 interaction).
→ Used to illustrate enhancer hijacking leading to ectopic expression.

Teaching sequence:
Start from cytogenetic overview → spatial organization → functional misexpression → phenotype.

1:05–1:20 — The Evolution–Pathology Continuum

Concepts: Connecting the dual roles of structural variants — mechanisms that create both innovation and disease.

Figures & Illustrations:

Carvalho & Lupski (2016), Fig. 5 – Comparison of recurrent (NAHR) and nonrecurrent (replication-based) rearrangements.
→ Demonstrates how the same mechanistic classes underpin both adaptive and pathogenic CNVs.
Gamazon & Stranger (2015), Summary schematic (final figure) – Continuum from CNV formation → expression variation → phenotype.
→ Visualizes the bridge from molecular mechanism to population phenotype.
Newman et al. (2015), summary table or example panel of fusion genes – Some neutral/beneficial, others deleterious.
→ Perfect illustration of genomic trade-offs.

Teaching sequence:
Use these visuals to guide a discussion contrasting evolutionary benefits (gene duplication) and clinical costs (dosage imbalance).
End this segment with an interactive question: “At what point does variation become disease?”

1:20–1:25 — Integrative Cytogenomics

Concepts: Modern methods uniting molecular and spatial analysis to interpret SVs.

Figures & Illustrations:

Carvalho & Lupski (2016), schematic (Box figure) – Integration of detection methods (FISH, aCGH, WGS, Hi-C).
→ Illustrates evolution from classical to genomic cytogenetics.
Lupiáñez et al. (2015), Hi-C contact map re-used – Demonstrate how topological data complement sequencing.

Teaching sequence:
Display both side by side.
Explain how today’s cytogenetics integrates physical (microscopy), molecular (sequencing), and spatial (chromatin conformation) data.

Slide 1 to Slide 5

Slide 1 — Evolution, Variation, and Pathology: The Dual Role of Structural Variants in Human Genomes

Structural variants (SVs) reshape large genomic regions, driving both adaptation and disease.
Understanding their mechanisms bridges cytogenetics, genomics, and evolution.
This lecture explores how SVs form, function, and disrupt.

Slide 2 — Structural Variants: A Broader View of Genetic Diversity

SVs are rearrangements > 50 bp: deletions, duplications, inversions, insertions, and translocations.
They affect more total base pairs per genome than all single-nucleotide variants (SNVs) combined.
SVs can alter copy number, gene context, or 3D chromatin organization.
(Insert: Carvalho & Lupski Fig. 1 – classification of SVs.)

Slide 3 — Why Structural Variants Matter

SVs underlie both normal phenotypic diversity and many genomic disorders.
CNVs (copy number variants) are a major subset of SVs affecting expression levels.
SVs are the genome’s instruments of flexibility—and its sources of fragility.
(Insert: Gamazon & Stranger Fig. 1/2 – CNVs influence gene expression.)

Slide 4 — Mechanisms Generating Structural Variants

Structural changes arise through recombination, replication errors, or faulty repair.
Each mechanism leaves distinct breakpoint signatures detectable by sequencing.
Understanding mechanisms helps predict recurrence and clinical risk.
(Insert: Carvalho & Lupski Fig. 2 – NAHR, NHEJ, FoSTeS/MMBIR.)

Slide 5 — Non-Allelic Homologous Recombination (NAHR)

Occurs between misaligned low-copy repeats during meiosis or mitosis.
Produces recurrent deletions, duplications, or inversions of predictable size.
Explains classical genomic syndromes like CMT1A and Williams–Beuren.
Requires long (> 300 bp) stretches of sequence homology for crossover.
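
The predictability of NAHR outcomes can be illustrated with a short sketch. It assumes a simplified model in which each low-copy repeat copy is described only by its coordinates and orientation on a single chromosome; the `Repeat` class, the function name, and the example coordinates are hypothetical. Repeat length stands in for the length of shared homology, and the reported size is approximate because the crossover can fall anywhere within the repeats.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Repeat:
    """One copy of a low-copy repeat on a single chromosome (toy model)."""
    start: int
    end: int
    strand: str  # "+" or "-": orientation of this repeat copy


def nahr_outcome(proximal: Repeat, distal: Repeat,
                 min_homology_bp: int = 300) -> Optional[Tuple[str, int]]:
    """Return (expected rearrangement type, approximate size in bp), or None."""
    if distal.start < proximal.end:
        raise ValueError("proximal repeat must lie entirely before distal repeat")
    # Repeat length is a crude proxy for the stretch of shared homology.
    shortest = min(proximal.end - proximal.start, distal.end - distal.start)
    if shortest < min_homology_bp:
        return None
    if proximal.strand == distal.strand:
        # Directly oriented repeats: reciprocal deletion and duplication of
        # one repeat unit plus the intervening sequence.
        return ("deletion or duplication", distal.start - proximal.start)
    # Inverted repeats: the intervening segment is flipped.
    return ("inversion", distal.start - proximal.end)


if __name__ == "__main__":
    lcr_a = Repeat(1_000, 25_000, "+")
    lcr_b = Repeat(1_400_000, 1_424_000, "+")
    print(nahr_outcome(lcr_a, lcr_b))
```

Because the repeats are fixed features of the reference genome, the same repeat pair gives the same expected size in every individual, which is why NAHR-mediated syndromes are recurrent.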

Slide 6 to Slide 10

Slide 6 — Replication- and Repair-Based Mechanisms

FoSTeS and MMBIR act when replication forks stall and template switching occurs.
Generate complex, nonrecurrent CNVs with microhomology (2–15 bp) at junctions.
NHEJ/MMEJ join broken ends with little or no homology, often adding small inserts.
(Insert: Carvalho & Lupski Fig. 4 – Replication–Recombination–Repair continuum.)

Slide 7 — Genome Architecture and Rearrangement Hotspots

Repetitive elements (Alu, LINE-1, segmental duplications) predispose to instability.
Structural features such as palindromes and AT-rich sequences trigger breakage.
Fragile sites are often reused in multiple disorders and in adaptive duplications.
Genome architecture itself shapes where SVs arise.

Slide 8 — CNVs as Modifiers of Gene Expression

CNVs change gene dosage, influencing mRNA abundance and phenotypic traits.
In early cell-line studies, CNVs accounted for roughly 15–20% of the detected genetic variation in gene expression.
CNVs act as expression quantitative trait loci (eQTLs) across populations.
(Insert: Gamazon & Stranger Fig. 3 – CNV vs. expression correlation.)
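
To make the eQTL idea concrete, the relationship between copy number and expression can be summarized as a simple correlation. The sketch below uses invented values for one hypothetical gene in eight individuals; it is not data from Gamazon & Stranger, and published analyses use far larger samples, covariate correction, and multiple-testing control.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: integer copy number and normalized expression
# of one gene in eight individuals.
copy_number = [0, 1, 1, 2, 2, 2, 3, 3]
expression = [0.1, 1.0, 1.2, 2.1, 1.9, 2.0, 3.2, 2.8]

if __name__ == "__main__":
    r = pearson_r(copy_number, expression)
    print(f"copy number vs expression: r = {r:.2f}")
```

A strong positive correlation is the pattern expected for a dosage-dependent gene; a weak or negative one would suggest buffering or a regulatory effect rather than simple dosage.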

Slide 9 — Examples of CNV-Expression Relationships

Genes like GSTT1 and CYP2D6 show dosage-dependent transcription.
Some CNVs exert trans-effects by altering transcription factor dosage.
CNVs highlight the continuum from sequence variation to expression diversity.
(Insert: Gamazon & Stranger Fig. 4 – case examples.)

Slide 10 — Duplication CNVs: Structural Patterns

Most human duplications are tandem and in direct orientation.
Tandem architecture facilitates unequal crossing-over and copy expansion.
Complex or inverted duplications are less common but mechanistically revealing.
(Insert: Newman et al. Fig. 2 – tandem vs. complex duplications.)

Slide 11 to Slide 15

Slide 11 — Gene Innovation Through Duplication

Duplications provide extra gene copies free to diverge and evolve new functions.
Breakpoints within genes can create novel fusion transcripts and proteins.
Some fusions are deleterious, others confer adaptive potential.
(Insert: Newman et al. Fig. 5 – examples of duplication-mediated fusions.)
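
How a tandem duplication can create a fusion gene follows from its geometry: the new junction joins the distal end of the first copy to the proximal start of the second. The sketch below applies that logic under simplifying assumptions. The `Gene` type, the function names, and the coordinates are hypothetical, and the model ignores exon structure and reading frame, which determine whether a real fusion is expressed or functional.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def gene_at(pos: int, genes: List[Gene]) -> Optional[Gene]:
    """Return the first gene whose span contains pos, if any."""
    for gene in genes:
        if gene.start <= pos <= gene.end:
            return gene
    return None


def tandem_dup_fusion(dup_start: int, dup_end: int,
                      genes: List[Gene]) -> Optional[Tuple[str, str]]:
    """Return (5' partner, 3' partner) for a direct tandem duplication.

    A candidate fusion needs both breakpoints inside two different genes
    that are transcribed from the same strand.
    """
    proximal = gene_at(dup_start, genes)
    distal = gene_at(dup_end, genes)
    if proximal is None or distal is None:
        return None
    if proximal.name == distal.name or proximal.strand != distal.strand:
        return None
    if distal.strand == "+":
        # Transcription runs left to right: it starts in the gene cut at the
        # distal breakpoint and continues into the gene cut at the proximal one.
        return (distal.name, proximal.name)
    # On the minus strand the roles are mirrored.
    return (proximal.name, distal.name)


if __name__ == "__main__":
    toy_genes = [Gene("geneA", 100, 300, "+"), Gene("geneB", 600, 900, "+")]
    print(tandem_dup_fusion(200, 700, toy_genes))
```

In this toy case the promoter and 5' portion come from geneB and the 3' portion from geneA, even though geneA lies upstream in the reference, a point that often surprises students.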

Slide 12 — Population-Level CNV Adaptation

AMY1 gene duplication correlates with high-starch diets across human groups.
CNVs in immune and sensory genes reflect environmental and pathogen pressures.
Dosage variation underlies differences in metabolism and drug response.
The same mechanisms that cause disease drive adaptation in evolution.

Slide 13 — Structural Variants in Disease

SVs disrupt genes directly or by altering regulatory landscapes.
Deletions can cause disease through haploinsufficiency; duplications can cause disease when the gene is triplosensitive.
Inversions or translocations can break genes or create pathogenic fusions.
Some “balanced” rearrangements cause disease through 3D mis-regulation.

Slide 14 — The EPHA4 Locus: A Model of Topological Disruption

Rearrangements at 2q35–36 involve WNT6, IHH, EPHA4, and PAX3.
Patients show limb malformations despite intact coding sequences.
(Insert: Lupiáñez et al. Fig. 1 – genomic map of rearrangements.)
Demonstrates that genome topology, not only coding sequence, can determine pathology.

Slide 15 — TADs and Regulatory Insulation

Topologically Associating Domains (TADs) restrict enhancer–promoter contact.
Boundaries marked by CTCF and cohesin maintain regulatory specificity.
Structural variants deleting boundaries merge neighboring TADs.
(Insert: Lupiáñez et al. Fig. 2 – Hi-C contact map showing TADs.)

Slide 16 to Slide 20

Slide 16 — Enhancer Hijacking and Mis-expression

Deletion of the EPHA4–PAX3 boundary allows EPHA4 enhancers to activate PAX3.
Results in ectopic PAX3 expression in the limb and a brachydactyly phenotype.
CRISPR mouse models reproduce human malformations.
(Insert: Lupiáñez et al. Figs. 3–4 – mouse phenotype & 4C enhancer rewiring.)
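
The logic of boundary deletion and enhancer hijacking can be sketched on a one-dimensional toy locus. The example assumes that TADs are fully defined by a sorted list of boundary positions and that an enhancer can reach any gene in the same domain. All names and coordinates are hypothetical and are not the real EPHA4 or PAX3 positions; real contact frequencies are graded rather than all-or-nothing.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def domain_index(pos: int, boundaries: List[int]) -> int:
    """Index of the domain containing pos, given sorted boundary positions."""
    return bisect_right(boundaries, pos)


def gained_contacts(boundaries: List[int],
                    enhancers: Dict[str, int],
                    genes: Dict[str, int],
                    deletion: Tuple[int, int]) -> List[Tuple[str, str]]:
    """List enhancer-gene pairs that share a domain only after the deletion."""
    del_start, del_end = deletion
    # Boundaries inside the deleted interval are lost.
    kept = [b for b in boundaries if not del_start <= b <= del_end]
    gained = []
    for enh_name, enh_pos in enhancers.items():
        if del_start <= enh_pos <= del_end:
            continue  # the enhancer itself is deleted
        for gene_name, gene_pos in genes.items():
            if del_start <= gene_pos <= del_end:
                continue  # the gene itself is deleted
            before = domain_index(enh_pos, boundaries) == domain_index(gene_pos, boundaries)
            after = domain_index(enh_pos, kept) == domain_index(gene_pos, kept)
            if after and not before:
                gained.append((enh_name, gene_name))
    return sorted(gained)


if __name__ == "__main__":
    toy_boundaries = [100, 500, 900]            # three boundaries, toy units
    toy_enhancers = {"limb_enhancer": 300}
    toy_genes = {"gene_left": 400, "gene_right": 700}
    print(gained_contacts(toy_boundaries, toy_enhancers, toy_genes, (450, 550)))
```

Here the deletion removes the boundary at position 500, so the enhancer and gene_right end up in one merged domain. A deletion of the same size that spares the boundary would return an empty list, which matches the observation that boundary loss, not deletion size alone, drives the misexpression.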

Slide 17 — The Evolution–Pathology Continuum

The same SV mechanisms produce both adaptive and pathogenic outcomes.
Recurrent NAHR events yield stable syndromes; replication errors create diversity.
Gene families expand through duplications that sometimes destabilize regulation.
(Insert: Carvalho & Lupski Fig. 5 – recurrent vs. nonrecurrent rearrangements.)

Slide 18 — Balancing Flexibility and Stability

Genome evolution depends on mutational experimentation balanced by repair.
Dosage-sensitive genes are protected; tolerant genes evolve rapidly.
CNVs illustrate the trade-off between innovation and risk.
(Insert: Gamazon & Stranger summary figure – CNV → expression → phenotype.)

Slide 19 — Integrative Cytogenomics: Tools and Perspectives

Modern cytogenetics merges sequencing, imaging, and 3D chromatin mapping.
Techniques: aCGH, WGS, long-read sequencing, FISH, Hi-C, 4C-seq.
Integration reveals both linear and spatial effects of SVs.
(Insert: Carvalho & Lupski detection pipeline figure + Lupiáñez Hi-C map.)

Slide 20 — Synthesis and Take-Home Messages

Structural variants link molecular mechanism, genome architecture, and phenotype.
Evolutionary innovation and genomic disorder share the same mutational roots.
Understanding SVs requires both sequence and 3D chromatin perspectives.
The genome evolves and errs through the same molecular grammar.