Bibliographic and Educational Resources in Cytogenomics

This platform is designed to serve as a comprehensive educational and bibliographic resource for healthcare professionals involved in cytogenomics. Covering a wide range of up-to-date topics within the field, it offers structured access to recent scientific literature and a variety of pedagogical tools tailored to clinicians, educators, and trainees.

Each topic is grounded in a curated selection of recent publications, accompanied by in-depth summaries that go far beyond traditional abstracts, offering clear, clinically relevant insights without the time burden of reading full articles. These summaries act as gateways to the original literature, helping users identify which articles warrant deeper exploration.

In addition to these detailed reviews, users will find a rich library of supplementary materials: topic overviews, FAQs, glossaries, synthesis sheets, thematic podcasts, fully structured course outlines adaptable for teaching, and ready-to-use PowerPoint slide decks. All resources are open access and formatted for easy integration into academic or clinical training programs.

By providing practical, well-structured content, the platform enables members of the cytogenomics community to efficiently update their knowledge on selected topics. It also offers educational materials that are easily adaptable for instructional use.

Optical Genome Mapping

Overview

Optical Genome Mapping (OGM) is an advanced technology used for genetic analysis, particularly for the detection and characterization of structural variations (SVs) across the whole genome. It offers a robust alternative to traditional cytogenetics and complements next-generation sequencing methods for comprehensive genomic analysis.

OGM involves the analysis of ultrahigh-molecular-weight (UHMW) DNA molecules stretched in nanochannels. The process typically includes:

  • DNA Labeling: DNA is fluorescently labeled at specific sequence motifs. Conventionally, this involves Direct Labeling Enzyme (DLE-1) which tags CTTAAG hexamer motifs with green fluorophores.
  • Multicolor Mapping Strategy: A more advanced approach combines conventional sequence-motif labeling with Cas9-mediated target-specific labeling. This allows for the creation of custom labels by targeting any 20-base sequence (20mer). These 20mers are typically labeled with red fluorophores. This DLE-Cas9 strategy is considered universal and versatile, enabling the simultaneous analysis of multiple targets in a single reaction tube.
  • Nanochannel Imaging: The labeled DNA molecules are linearized within silicon chips containing hundreds of thousands of parallel nanochannels and then imaged using a high-throughput system like the Bionano Saphyr system.
  • Data Analysis: Raw molecule images are converted into BNX files, and a de novo assembly is performed based on the green channel (DLE-1) reference. Red labels are then identified based on their expected genomic locations.
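
The motif-labeling step can be illustrated in silico. The following is a minimal sketch, not part of any Bionano software: it scans a toy sequence for the CTTAAG motif described above and reports the label positions and the spacing between neighboring labels, which is the pattern an optical map encodes. The function names and the toy sequence are assumptions made for illustration only.

```python
def find_label_sites(sequence: str, motif: str = "CTTAAG") -> list[int]:
    """Return 0-based start positions of every occurrence of the motif.

    CTTAAG is its own reverse complement, so scanning one strand is enough.
    """
    sequence = sequence.upper()
    sites = []
    start = sequence.find(motif)
    while start != -1:
        sites.append(start)
        start = sequence.find(motif, start + 1)
    return sites


def label_spacings(sites: list[int]) -> list[int]:
    """Distances (in bases) between consecutive label sites."""
    return [b - a for a, b in zip(sites, sites[1:])]


# Hypothetical toy sequence containing three CTTAAG motifs.
toy_sequence = "AACTTAAGGGGTTTCTTAAGACGTACGTCTTAAGTT"
toy_sites = find_label_sites(toy_sequence)
print(toy_sites)                  # label positions
print(label_spacings(toy_sites))  # inter-label distances
```

On real data the spacing pattern is measured from images rather than computed from sequence, and regions with few or no motifs produce no usable pattern, which is the gap that custom Cas9 labels are meant to fill.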

OGM addresses several limitations of other genomic analysis technologies:

  • Long-Range Information Capture: OGM utilizes single molecules that average 300 kbp in length, allowing it to capture long-range genomic information, which is crucial for characterizing large SVs. This is significantly longer than the average read lengths of most long-read sequencing methods.
  • Cost-Effectiveness and High Throughput: Compared to long-read sequencing technologies, OGM offers a cost advantage, providing extensive genomic coverage (e.g., 200x coverage for about $500) more economically and quickly.
  • Improved DNA Integrity: The DLE-Cas9 labeling strategy is superior to older nickase-based methods (like Nt.BspQI-Cas9) as it better preserves DNA integrity, resulting in longer DNA molecules critical for studying long structural variants. The protocol is also simpler and has higher efficiency for second-label incorporation.
  • Detection of New Features: The multicolor mapping strategy allows OGM to interrogate features not accessible to motif-labeling alone, precisely locate breakpoints, and accurately estimate copy numbers of genomic repeats. This is particularly useful in repetitive regions where conventional motifs are often absent.
  • Bypasses Cell Culture: OGM, especially when coupled with Whole Genome Sequencing (WGS), can bypass the need for time-consuming cell culture typically required by karyotyping and FISH.

OGM has diverse applications in genetic analysis and diagnostics:

  • Structural Variation (SV) Discovery and Characterization: OGM is extensively used to detect SVs, which are crucial for understanding mutations underlying genetic disorders and pathogenic conditions. It can detect various types of SVs, including deletions, insertions, inversions, translocations, duplications, and array expansions/contractions, ranging from 1 kbp to several Mbp.
  • Quantification of D4Z4 Copy Numbers: OGM can accurately quantify D4Z4 repeats on chromosome 4q, which are a known biomarker for facioscapulohumeral muscular dystrophy (FSHD). It overcomes challenges such as high sequence homology with other regions and a lack of labeling motifs, enabling copy numbers to be estimated to within less than a single repeat unit.
  • Telomere Length Estimation: Telomere length is a recognized clinical biomarker for aging and aging-related diseases, as well as malignant cancers. OGM can label and measure telomeric intensities in most chromosome arms.
  • Detection of LINE-1 Insertions: Long interspersed nuclear element 1 (LINE-1) sequences are transposable elements whose insertions are associated with various cancers, hemophilia, and muscular dystrophy. OGM with DLE-Cas9 can fluorescently tag specific sequences to differentiate LINE-1 insertions from other insertions and to characterize their zygosity and orientation.
  • Automated Karyotyping (OMKar): OMKar is a method that uses OGM data to create a virtual karyotype, providing descriptions of aneuploidies and other rearrangements. It can reconstruct karyotypes with high precision and recall, identifying constitutional disorders like Cri-du-chat, Wolf-Hirschhorn, Prader-Willi deletions, Down, and Turner syndromes. OMKar helps bridge the gap between cytogenetics and SV calling, identifying disrupted genes and providing plausible genetic mechanisms for previously undetected cases.
  • Leukemia Analysis: OGM is a powerful tool for identifying complex chromosomal SVs in acute leukemia samples, detecting more SVs than conventional tools like karyotyping, FISH, microarrays, or MLPA. It can reveal previously unknown gene fusions and complex rearrangements.

While powerful, OGM has certain limitations:

  • Lack of Base-Level Information: OGM typically lacks base-level information, making it challenging to precisely locate SV breakpoints or provide exact sequence information.
    • Solution: This limitation is effectively addressed by combining OGM with Cas9-assisted targeted nanopore sequencing or short-read Whole Genome Sequencing (WGS). OGM can pinpoint the SV region, and then targeted sequencing can resolve the breakpoints at base-level resolution. WGS is specifically noted as efficient for closing gaps at breakpoints identified by OGM.
  • Challenges in Specific Genomic Regions: Certain regions, such as telomeres and short arms of acrocentric chromosomes, can be difficult to characterize due to a lack of labeling motifs or reference sequences.
  • Reduced Sensitivity for Certain Events: OGM shows reduced sensitivity in detecting mosaic chromosomal abnormalities, events in low-complexity regions, and rearrangements arising from non-allelic recombination between segmental duplications, such as Robertsonian translocations.
  • CNV Call Thresholds: The automated CNV caller may miss smaller duplications (e.g., those below 500 kbp).
  • Not a Primary High-Throughput Diagnostic: Currently, OGM is suggested for a well-defined set of questions rather than as a first-line method for all routine genetic testing, because its throughput does not yet match general high-throughput requirements.

OGM can be compared with other genomic analysis technologies as follows:

  • Short-Read Next-Generation Sequencing (NGS): While highly accurate for small SVs, short reads are less suited for long and complex SV characterizations, and generally have low sensitivity for detecting SVs across the whole genome.
  • Long-Read Sequencing Technologies: These offer advantages for investigating large SVs but often suffer from low throughput, high error rates, and are prohibitively expensive for widespread adoption in routine clinical settings.
  • Conventional Cytogenetic Methods (Karyotyping, FISH, Microarrays, MLPA):
    • Karyotyping: Limited resolution (3-10 Mbp), labor-intensive, and requires significant expertise.
    • FISH: Requires a priori knowledge of loci and has limited throughput.
    • Chromosomal Microarray (CMA): Provides good resolution (few kb) but cannot detect balanced chromosomal aberrations (translocations, inversions) or low-percentage clones.
    • MLPA and RT-PCR: Fast but only target specific regions.
    • OGM generally provides higher resolution for SVs and can detect a broader range of abnormalities that these conventional methods may miss, especially balanced rearrangements. OGM is increasingly being recognized as a fundamental tool that complements G-banding analysis.

FAQ

OGM is a technology that uses long single DNA molecules (typically averaging over 300 kbp) to capture long-range genomic information and create motif-label-based maps of the whole genome. Its main purpose is to provide high-resolution, genome-wide assessments of structural anomalies from ultrahigh-molecular-weight DNA, identifying large DNA lesions in a cost-effective manner.

OGM is well-suited for detecting a broad range of structural variations, including those ranging from 1 kbp to several Mbp. Specifically, it can identify deletions, insertions, inversions, translocations, array expansions/contractions, and duplications. It is also effective at calling aneuploidies and both balanced and unbalanced rearrangements.

OGM typically involves fluorescently labeling DNA at specific sequence motifs (e.g., CTTAAG hexamer motifs) using a Direct Labeling Enzyme (DLE), resulting in a sequence-specific pattern of signals. The labeled DNA is then loaded onto silicon chips containing hundreds of thousands of parallel nanochannels, which linearize the individual DNA molecules for imaging and digitization. The Bionano Saphyr system is commonly used for high-throughput imaging of these labeled DNA molecules.

OGM offers a medium-resolution, robust alternative to traditional cytogenetic methods, capable of detecting a broader range of SVs in clinical settings. It captures large-scale rearrangements, bridging the gap between low-resolution cytogenetics and high-resolution sequencing. It is cheaper and quicker for SV characterization and can detect SVs as small as 500 base pairs. Furthermore, OGM can bypass the need for cell culture, unlike traditional methods that can take weeks. It also provides a significant cost advantage over long-read whole-genome sequencing.

OGM lacks base-level sequence resolution, making it impossible to precisely locate SV breakpoints with motif-based mapping alone. It cannot call single nucleotide substitutions or small insertions and deletions. Labeling motifs can be unevenly spread or absent in repetitive regions. OGM also shows reduced sensitivity for mosaic chromosomal abnormalities and struggles with events in low-complexity regions or segmental duplications that lead to non-allelic recombination, such as Robertsonian translocations. It cannot currently detect variations within centromeres or the short arms of acrocentric chromosomes. Additionally, OGM coverage can be low in certain regions like those with D4Z4 repeats, affecting analysis.

Short-read next-generation sequencing (NGS) generally has low sensitivity for detecting structural variations due to short read lengths. While long-read sequencing methods (e.g., PacBio, Nanopore) are more capable of identifying SVs, they traditionally suffer from low throughput, high error rates, and high costs, discouraging widespread adoption in clinical settings. OGM provides a cost-effective alternative, offering 200x coverage for about $500, significantly less than the $10,000-$20,000 for long-read whole-genome sequencing.

Traditional karyotyping has a low resolution (3-10 Mbp) and requires considerable manual expertise. Chromosomal microarray (CMA) offers higher resolution (a few kilobase pairs) but cannot detect balanced chromosomal aberrations. FISH requires a priori knowledge of specific probes and has limited throughput. In contrast, OGM can detect copy number neutral rearrangements like balanced translocations, which are often missed by CMA or exome sequencing. OGM can detect more SVs in leukemia samples as a single test than multiple conventional tools and serves as a fundamental tool complementing G-banding analysis for identifying complex SVs.

The multicolor mapping strategy combines conventional sequence-motif labeling (e.g., with green fluorophores) with Cas9-mediated target-specific labeling of any 20-base sequences (20mers, e.g., with red fluorophores). This universal strategy allows for the creation of custom labels to interrogate features not accessible to motif-labeling alone, precisely locate breakpoints, and estimate copy numbers of genomic repeats. It also enables the simultaneous detection of multiple targets in a single tube reaction.

Cas9-mediated labeling leverages the specificity of the Cas9 enzyme (specifically Cas9D10A nickase) for target-specific labeling of any 20 bases across the whole genome. This approach can be combined with conventional motif-mapping and is particularly useful in repetitive genomic regions lacking DLE motifs. It allows for custom labels to precisely locate breakpoints and interrogate features previously inaccessible to motif-only labeling.
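
To make the idea of a targetable 20mer concrete, here is a minimal sketch that lists candidate Cas9 target sites in a sequence. It assumes the nickase is derived from Streptococcus pyogenes Cas9, which requires an NGG motif (the PAM) immediately 3' of the 20-base target; the source text does not state the PAM requirement, so treat it as an assumption. The function names are hypothetical, and real guide design would also screen for uniqueness across the genome.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_targets(seq: str, protospacer_len: int = 20) -> list[tuple[str, int, str]]:
    """List (strand, start, protospacer) for every 20mer followed by NGG.

    `start` is the 0-based position of the leftmost protospacer base
    in forward-strand coordinates. Assumes an SpCas9-style NGG PAM.
    """
    seq = seq.upper()
    # Lookahead keeps overlapping candidates; group 1 captures the 20mer.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})[ACGT]GG)")
    hits = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        forward_start = len(seq) - m.start() - protospacer_len
        hits.append(("-", forward_start, m.group(1)))
    return hits


# Hypothetical toy input: a 20-base run of A followed by the PAM "TGG".
example = "A" * 20 + "TGG"
for strand, start, protospacer in find_cas9_targets(example):
    print(strand, start, protospacer)
```

Because almost any 20mer next to a PAM can be addressed this way, labels can be placed inside repeats and other regions that lack DLE motifs.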

DLE, such as DLE-1, is an enzyme used in motif-based optical mapping labeling schemes (e.g., that of Bionano Genomics). It globally tags DNA at DLE-specific motifs (e.g., CTTAAG) with green fluorophores. DLE-1-based whole-genome mapping is increasingly replacing older Nt.BspQI-based mapping. Combining DLE-1 with Cas9 labeling offers advantages over Nt.BspQI-based methods, including better preservation of DNA integrity (resulting in longer DNA molecules) and higher efficiency of second-label incorporation.

FSHD is a genetic disorder associated with D4Z4 copy number variation. OGM, particularly when combined with DLE-Cas9 labeling, can accurately quantify D4Z4 copy numbers. This is crucial because D4Z4 repeats are located in regions that often lack motifs targeted by conventional mapping. The DLE-Cas9 method demonstrated the ability to quantify D4Z4 repeats with an accuracy of less than a single copy, which is vital for differentiating pathogenic phenotypes in FSHD cases. This offers a more precise alternative to conventional Southern blotting, which often yields semi-quantitative or indeterminate results.
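
The arithmetic behind a copy-number estimate with a standard deviation can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: it assumes each molecule yields a measured length for the labeled repeat array, divides that length by the 3.36 kb unit size quoted elsewhere in this resource, and summarizes the per-molecule estimates. The function names and the example lengths are hypothetical.

```python
from statistics import mean, stdev


def estimate_copies(array_length_kbp: float, unit_kbp: float = 3.36) -> float:
    """Copy number implied by one molecule's measured repeat-array length."""
    return array_length_kbp / unit_kbp


def summarize_copies(array_lengths_kbp: list[float], unit_kbp: float = 3.36) -> tuple[float, float]:
    """Mean and sample standard deviation of per-molecule copy estimates."""
    copies = [estimate_copies(length, unit_kbp) for length in array_lengths_kbp]
    return mean(copies), stdev(copies)


# Hypothetical array lengths (kbp) measured on three molecules.
example_lengths = [33.6, 36.96, 30.24]
avg, sd = summarize_copies(example_lengths)
print(f"{avg:.2f} ± {sd:.2f} copies")
```

A standard deviation below one repeat unit is what allows alleles that differ by a single copy to be told apart.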

Yes, OGM, specifically with DLE-Cas9, can be used for telomere length estimation. Telomere length is a recognized clinical biomarker for aging and aging-related diseases, and its unregulated length has been correlated with malignant cancers. Telomeres are (TTAGGG)n repeats and occur in genomic regions that also lack labeling motifs for conventional mapping. The DLE-Cas9 methodology offers significant advantages over previous nickase-labeling with Cas9 approaches, which were limited by fragile sites and DNA damage, allowing characterization of more telomeres.

LINE-1 insertions are transposable elements (approximately 6 kbp) that vary between individuals and are linked to various genetic disorders and cancers. While optical mapping with DLE alone can detect insertions, it cannot differentiate LINE-1s from other 6 kbp insertions due to its lack of base-by-base sequence information. The DLE-Cas9 method overcomes this by allowing specific fluorescent tagging of LINE-1 sequences, enabling their differentiation from other insertions. This approach successfully identified 55 LINE-1 insertion sites in the NA12878 sample, confirming most previously reported ones and discovering new locations.

OMKar is a method designed to use OGM data to create a virtual karyotype. It processes structural (SV) and copy number (CN) variants derived from OGM, encoding them into a compact breakpoint graph. It then recomputes copy numbers using Integer Linear Programming and identifies Eulerian paths to represent entire donor chromosomes. OMKar aims to automate karyotyping and bridge the gap between low-resolution cytogenetics and high-resolution sequencing. It can also automatically identify Genotype-to-Phenotype (G2P) mechanisms by correlating reconstructed karyotypes with information in databases like DDG2P.

In tests using whole-genome simulations, OMKar reconstructed karyotypes with 88% precision and 95% recall on SV concordance, and a 95% Jaccard score on CN concordance. When applied to 154 clinical samples, OMKar correctly reconstructed the karyotype in 144 cases (93.5%), achieving 100% detection for aneuploidies and balanced translocations, and 87.8% for unbalanced variations. It showed high accuracy (96% for low-complexity and 89% for high-complexity clusters) in identifying SV edges. OMKar can also infer missing SV calls, even if they are only partially supported by SVs or CNVs, and resolve conflicting boundaries.
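
For readers less familiar with these metrics, the short sketch below shows how precision, recall, and a Jaccard score are computed, and reproduces the 93.5% figure from the 144 of 154 clinical samples. The helper names and the toy inputs are assumptions; only the 144/154 ratio comes from the text above.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return true_pos / (true_pos + false_pos), true_pos / (true_pos + false_neg)


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Share of clinical samples with a correctly reconstructed karyotype.
correct, total = 144, 154
print(f"{100 * correct / total:.1f}%")

# Hypothetical toy counts and sets, for illustration only.
print(precision_recall(true_pos=8, false_pos=2, false_neg=1))
print(jaccard({1, 2, 3}, {2, 3, 4}))
```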

While OGM effectively detects chromosomal translocations, deletions, duplications, and inversions, it often does not resolve the exact sequences at the breakpoints. Short-read WGS can be used to “close the gap” left by OGM by precisely defining these breakpoints once OGM has narrowed down the region to a few kilobases. This combined approach is noted to be inexpensive and quick, allowing the entire process to be completed in days. Notably, short-read WGS alone has low sensitivity for SV detection and typically generates a high number of difficult-to-interpret SVs.

Optical mapping technology relies on long DNA molecules, typically averaging 300 kbp or larger, because these molecules allow it to capture long-range genomic information. This is essential for spanning repetitive and low-complexity regions and for characterizing large structural variations. The DLE-Cas9 labeling strategy is particularly advantageous because it helps preserve DNA integrity better than other methods, resulting in longer DNA molecules, which is critical for studying long structural variants.

OGM is crucial for understanding genetic risk factors, diagnosis, and treatment decisions related to constitutional genetic disorders arising from large chromosomal rearrangements like aneuploidies and translocations. In prenatal diagnosis, OGM can provide actionable reports, especially when clinical data are insufficient or familial carrier analysis is not feasible, reducing patient distress. In leukemia, OGM identifies complex SVs and gene fusions, complementing G-banding analysis and offering new insights into disease mechanisms. It has also provided novel Genotype-to-Phenotype (G2P) explanations for previously undiagnosed postnatal phenotypes, particularly those involving neurodevelopmental genes disrupted by balanced events.

High-molecular-weight (HMW) genomic DNA is required for OGM. This DNA can be purified from various cellular sources, for example from cells embedded in agarose-gel plugs or by Nanobind disk-based solid-phase extraction. Specific examples of clinical samples used in studies include peripheral blood cells, frozen bone marrow samples, cultured amniotic cells, and chorionic villi specimens (CVS).

Data collection for OGM typically aims for >300 Gbp of ultra-high-molecular-weight (UHMW) DNA, with molecules of 150 kbp or larger, corresponding to >80x coverage of the genomic reference and >50x coverage of the genomic assembly. After imaging, raw molecule images are converted into BNX files. De novo assembly and structural variant (SV) calling are performed using software modules like Bionano Access Suite or Bionano Solve pipeline. While automated calls are available, direct visual assessment by experienced human evaluators is also relied upon for data analysis. Additionally, tools like OMKar process OGM’s SV and CNV calls, filtering low-confidence variations, constructing breakpoint graphs, and reconstructing chromosomes.
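
The relationship between collected data volume and coverage is simple division, sketched below. The haploid genome size of about 3.1 Gbp and the `mapped_fraction` parameter are assumptions introduced here; the source states only the >300 Gbp and >80x figures. With these assumptions, 300 Gbp of raw data corresponds to roughly 97x, so the >80x figure is consistent with most, but not all, molecules aligning to the reference.

```python
def fold_coverage(total_bp: float, genome_bp: float = 3.1e9, mapped_fraction: float = 1.0) -> float:
    """Fold coverage = (collected bases x fraction that maps) / genome size.

    genome_bp defaults to an assumed ~3.1 Gbp haploid human genome.
    """
    return total_bp * mapped_fraction / genome_bp


raw = fold_coverage(300e9)
effective = fold_coverage(300e9, mapped_fraction=0.85)  # hypothetical map rate
print(f"raw: {raw:.1f}x, effective at 85% mapping: {effective:.1f}x")
```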

Bibliography

Uppuluri, L., Jadhav, T., Wang, Y., & Xiao, M. (2021).
Multicolor Whole-Genome Mapping in Nanochannels for Genetic Analysis. Analytical Chemistry, 93(28), 9808–9816. doi:10.1021/acs.analchem.1c01373

Uppuluri, L., Wang, Y., Young, E., Wong, J. S., Abid, H. Z., & Xiao, M. (2022).
Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. Scientific Reports, 12(1), 6561. doi:10.1038/s41598-022-10483-7

Raeisi Dehkordi, S., Jia, Z., Estabrook, J., Hauenstein, J., Miller, N., Güleray-Lafci, N., Neesen, J., Hastie, A., Chaubey, A., Pang, A. W. C., Dremsek, P., & Bafna, V. (2024).
OMKar: optical map based automated karyotyping of genomes to identify constitutional abnormalities. bioRxiv. doi:10.1101/2024.03.11.584500

Tsai, M.-J. M., Kao, H.-J., Chen, H.-H., Yu, C.-H., Chien, Y.-H., Hwu, W.-L., Kwok, P.-Y., Lee, N.-C., & Yang, Y.-L. (2025).
Optical genome mapping with whole genome sequencing identifies complex chromosomal structural variations in acute leukemia. Frontiers in Genetics, 16, 1496847. doi:10.3389/fgene.2025.1496847

Dremsek, P., Schachner, A., Reischer, T., Krampl-Bettelheim, E., Bettelheim, D., Vrabel, S., Delissen, Z., Pfeifer, M., Weil, B., Bajtela, R., Hengstschläger, M., Laccone, F., & Neesen, J. (2024).
Retrospective study on the utility of optical genome mapping as a follow-up method in genetic diagnostics. Journal of Medical Genetics. Epub ahead of print. doi:10.1136/jmg-2024-110265

The research by Uppuluri et al. (2021) aims to overcome the limitations of existing technologies in characterizing structural variations (SVs), which are crucial for understanding genetic disorders and pathogenic conditions.

Challenges with Existing Technologies
Characterizing SVs has been challenging. Short-read, high-throughput sequencing technologies struggle with SV detection due to their short read lengths. While long-read sequencing technologies offer longer reads suitable for larger SVs, they suffer from low throughput and high costs, which discourage their widespread adoption. For instance, analyzing D4Z4 repeat copy numbers effectively requires read lengths exceeding 300 kbp, including flanking sequences, to differentiate haplotypes, a capability that long-read sequencing often lacks.

Optical mapping technology uses long single DNA molecules, averaging 300 kbp, enabling it to capture long-range genomic information more economically and quickly than long-read sequencing. However, traditional optical mapping, which relies on mapping specific 6- to 8-base sequence motifs (e.g., CTTAAG), has its own set of limitations:

  • These motifs are unevenly distributed across the genome and are often absent in repetitive regions.
  • It lacks base-level information, making it difficult to precisely locate SV breakpoints or estimate copy numbers accurately.
  • It cannot differentiate between different types of insertions of similar size, such as full-length LINE-1 insertions from other 6 kbp insertions.

The Proposed Multicolor Mapping Strategy
The authors present a novel approach that integrates two labeling systems:

  • Direct Labeling Enzyme (DLE-1) is used to label conventional sequence motifs (like CTTAAG-motifs) with green fluorophores.
  • Cas9D10A nickase is employed for target-specific labeling of any 20-base sequences (20mers), which are custom labeled with red fluorophores.

This universal and versatile strategy allows for:

  • Detecting SVs.
  • Utilizing custom labels to interrogate genomic features not accessible via motif-labeling.
  • Precisely locating breakpoints.
  • Accurately estimating copy numbers of genomic repeats.
  • Fluorescently tagging specific sequences (e.g., LINE-1) to differentiate them from other insertions.
  • Simultaneous detection of multiple targets within a single-tube reaction.

Advantages over Previous Optical Mapping Approaches
The new DLE–Cas9 approach offers significant improvements over prior methods like Nt.BspQI–Cas9-based multicolor labeling:

  • Better DNA integrity preservation: DLE–Cas9 labeling minimizes DNA breakage, resulting in longer DNA molecules, which is crucial for studying long structural variants like the D4Z4 locus.
  • Simpler and more efficient protocol: Nt.BspQI–Cas9 requires removing first-color nucleotides before introducing a second color, which makes it tedious and less efficient. The DLE–Cas9 protocol is simpler and results in higher efficiency of second-label incorporation.
  • Expanded characterization capability: This method enabled the characterization of five telomeres (16p, 17p, 19q, 22q, and 23p) that were previously uncharacterized due to DNA breakage at fragile sites with Nt.BspQI–Cas9.

Experimental Methodology
High-molecular-weight genomic DNA was purified, quantified, and then subjected to a two-step labeling process:

  1. DLE-1 labeling: Genomic DNA was first labeled with the DLS labeling kit using DLE enzyme and DL-green labeling mix.
  2. Cas9-mediated nick-labeling: The DLE-1 labeled DNA was subsequently nicked using Cas9D10A guided by specific crRNA or sgRNA (for telomere, D4Z4, or LINE-1 targets). The nicked sites were then labeled with Taq DNA polymerase using red fluorophores (Atto647 dUTP, etc.).

The labeled DNA samples were then loaded onto Bionano Saphyr G1.2 chips and imaged using a dual-labeled sample workflow, sequentially exciting red, green, and DNA backbone stains. Data were assembled based on the green channel, and red labels were identified by their expected genomic locations.

Key Applications and Results

  1. Quantification of D4Z4 Copy Numbers for FSHD Diagnosis:
    • The D4Z4 locus on chromosome 4q35 is a biomarker for facioscapulohumeral muscular dystrophy (FSHD) but is difficult to quantify due to high homology with other regions and lack of DLE motifs.
    • By using two guide RNAs (4qD4Z4 and 10qD4Z4) to target the D4Z4 repeat array with red fluorophores, the method achieved a sensitivity of half a repeat unit, detecting a 1.68 kbp repeating unit. This contrasts with conventional motif-based optical mapping which relies on less accurate estimations based on flanking DLE sites.
    • The study quantified D4Z4 copies in NA12878, estimating 4qA to have 48 ± 0.94 copies of 3.36 kb units and 4qB to have 19 ± 0.29 copies of 3.36 kb units. This marks the first time standard deviation for D4Z4 quantification (0.97 repeats for 4qA) was reported, allowing differentiation of less than one repeat unit, which is critical for FSHD diagnosis.
  2. Telomere Length Estimation:
    • Telomere length is a clinical biomarker for aging-related diseases and cancers.
    • The DLE–Cas9 methodology successfully labeled telomeric repeats with red fluorescent dye at molecule ends.
    • The method was able to label and measure telomeric intensities on all chromosome arms except the short arms of the five acrocentric chromosomes, significantly improving upon previous methods that could only map 36 out of 46 telomeres due to fragile sites and laborious protocols.
  3. Detection of Long Interspersed Elements 1 (LINE-1) Insertions:
    • LINE-1 elements make up about 17% of the human genome, and their insertions are associated with various diseases. Traditional optical mapping cannot differentiate LINE-1 from other 6 kbp insertions.
    • The authors designed four specific sgRNAs targeting distinct 20-base sequences within the LINE-1 reference, labeled with red fluorophores. The observed distances between these red labels confirmed the identity and orientation of LINE-1 insertions.
    • The method discovered 55 LINE-1 insertion sites in the NA12878 human genome, identifying 51 of the 52 sites reported by a recent PacBio sequencing study and four additional, previously unidentified locations.
    • This demonstrated the method’s ability to provide haplotype-resolved and structurally accurate LINE-1 consensus maps.
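
The way inter-label distances can confirm the identity and orientation of an insertion is sketched below. This is a simplified illustration, not the authors' analysis code: the sgRNA offsets within the LINE-1 reference, the 500 bp tolerance, and the function names are all hypothetical. The observed gaps between red labels are compared with the expected gaps read in the forward and in the reverse direction.

```python
def gaps(positions: list[int]) -> list[int]:
    """Distances between consecutive positions after sorting."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def classify_orientation(observed_bp: list[int], expected_offsets_bp: list[int],
                         tolerance_bp: int = 500) -> str:
    """Return 'forward', 'reverse', 'ambiguous', or 'no match'."""
    expected = gaps(expected_offsets_bp)
    observed = gaps(observed_bp)
    if len(observed) != len(expected):
        return "no match"

    def close(a: list[int], b: list[int]) -> bool:
        return all(abs(x - y) <= tolerance_bp for x, y in zip(a, b))

    forward = close(observed, expected)
    reverse = close(observed, expected[::-1])
    if forward and reverse:
        return "ambiguous"
    if forward:
        return "forward"
    if reverse:
        return "reverse"
    return "no match"


# Hypothetical offsets (bp) of four sgRNA targets within a LINE-1 reference.
sgrna_offsets = [500, 1500, 4000, 5800]
# Hypothetical red-label positions (bp) observed on two molecules.
print(classify_orientation([100500, 101450, 104050, 105800], sgrna_offsets))
print(classify_orientation([200000, 201800, 204300, 205300], sgrna_offsets))
```

An insertion of similar size that is not LINE-1 would carry no red labels, or labels with unrelated spacing, and would fall into the "no match" case.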

Conclusions
The multicolor whole-genome mapping strategy combining DLE and Cas9-mediated labeling offers a universal and versatile solution for genetic analysis. It addresses the limitations of short-read and long-read sequencing technologies by providing long-range information (single molecules averaging 300 kbp) at high throughput and low cost (e.g., ~$500 for 200x coverage compared to $10,000–$20,000 for long-read whole-genome sequencing). The ability to synthesize custom sgRNAs significantly reduces assay costs, and the entire assay is built on a commercial instrument and kit, making it accessible for general laboratory use. This approach allows for simultaneous analysis of multiple targets, leading to more precise detection of breakpoints and comprehensive characterization of genomic features for various applications, including breakpoint detection, repetitive sequence characterization, and mutagenesis investigations.

Challenges with Existing Technologies
Characterizing structural variations (SVs) is inherently complex due to their diverse sizes (from 50 base pairs to several megabases) and classifications (deletions, insertions, translocations, inversions, and copy number variations).

  • Short-read, high-throughput sequencing technologies are accurate for smaller SVs but are less suitable for long and complex SV characterizations.
  • Long-read sequencing technologies offer longer reads that are advantageous for large SVs, but they are hampered by low throughput, high error rates, and prohibitive costs, which restrict their widespread adoption. For instance, in a recent long-read sequencing study of Icelandic populations, four flowcells cost $5600, or $1400 per flowcell.
  • Single-molecule optical mapping (OM) technology utilizes long DNA molecules (averaging 300 kbp), making it cheaper and quicker for whole-genome mapping and initial SV detection. It is widely used for creating maps of long-range SVs and complex rearrangements, including repeats or segmental duplications. However, traditional optical mapping lacks base-level sequence information, so it cannot precisely locate SV breakpoints.

To overcome these limitations, multi-platform and computational approaches are increasingly employed for comprehensive SV discovery and analysis, but they are expensive, resource-intensive, and add complexity, limiting their routine use.

The Proposed Combined Strategy
The authors propose a strategy that leverages the strengths of both optical mapping and Cas9-assisted targeted nanopore sequencing.

  1. Optical Mapping for SV Discovery: Whole-genome optical mapping is first performed to economically and quickly detect SVs across the entire genome, identifying their general location and type. Optical maps are created based on motif-specific labeling, where SVs 1 kbp or larger can be economically detected based on label density and distribution. For instance, in the NA12878 sample, 2280 SVs of 2 kbp or longer were detected, including 1265 insertions, 759 deletions, and 256 inversions.
  2. Cas9-assisted Targeted Nanopore Sequencing for Breakpoint Resolution: Once SVs are discovered by optical mapping, a subset of these SVs, those known to affect biology, are selected for targeted nanopore sequencing to precisely resolve their breakpoints. This targeted approach makes the overall characterization more economical.

Experimental Methodology
The methodology involves several key steps:

  • DNA Preparation: High-molecular weight genomic DNA is used.
  • Optical Mapping: DNA is labeled with DLE (Direct Labeling Enzyme) and imaged on a Bionano Saphyr system to generate whole-genome maps. Data is de novo assembled based on these DLE labels.
  • Targeted Nanopore Sequencing:
    • Guide RNA (gRNA) Design: For each SV of interest, unique gRNA pairs are designed to target specific cleavage sites. A single sgRNA mix can be synthesized and used for all targeted deletions and insertions in a single-tube Cas9-cleavage reaction. The authors previously reported synthesizing and using up to 200 sgRNAs in a single reaction.
    • Blocking Steps for Efficiency: To efficiently generate the target fragments (which are a tiny fraction of the whole genome, e.g., ~0.03%), the protocol is optimized to suppress non-target fragments. This involves two blocking steps:
      1. 3′ End Blocking: Dideoxynucleotides are incorporated to block 3′ ends at internal nick and break sites.
      2. 5′ End Dephosphorylation: 5′ ends at internal nick and break sites are dephosphorylated to discourage non-specific dA-tailing and adapter ligation.
    • Targeted Fragmentation and Sequencing: Following Cas9 digestion at the targeted sites, dA-tailing is performed, and universal Y-adapters are ligated to the exposed DNA ends. The adapter-ligated fragments are then PCR amplified and sequenced on a Nanopore Flongle. PCR enrichment is used to counteract low data yield on a Flongle, making the assay more economical, though it limits target sizes to under 25-30 kbp.
    • Data Analysis: Raw sequencing reads are base-called with MinKNOW and Guppy software, then aligned to the hg38 reference genome using minimap2. Integrative Genomics Viewer (IGV) is used for visualization and further analysis of aligned reads at SV regions.
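Because PCR enrichment caps the usable target size, it can help to estimate the expected fragment length for each guide pair before ordering guides. The following is a minimal sketch under stated assumptions: the function names, the choice of 25 kbp as the limit, and the cut-site coordinates are hypothetical. Only the 13.2 kbp deletion size and the resulting ~5.5 kbp fragment echo figures quoted in this summary.

```python
PCR_LIMIT_BP = 25_000  # lower end of the 25-30 kbp range quoted above (assumed cutoff)


def expected_fragment_bp(cut_left, cut_right, sv_type="none", sv_size=0):
    """Expected fragment length between two Cas9 cut sites.

    Cut sites are given in reference coordinates and are assumed to flank the SV.
    """
    ref_span = cut_right - cut_left
    if ref_span <= 0:
        raise ValueError("cut_right must lie downstream of cut_left")
    if sv_type == "none":
        return ref_span
    if sv_type == "deletion":
        return ref_span - sv_size
    if sv_type == "insertion":
        return ref_span + sv_size
    raise ValueError(f"unsupported sv_type: {sv_type}")


def fits_pcr_limit(fragment_bp, limit_bp=PCR_LIMIT_BP):
    """True if the fragment is short enough for the PCR enrichment step."""
    return 0 < fragment_bp <= limit_bp


# Hypothetical cut sites 18.7 kbp apart on the reference, spanning a 13.2 kbp deletion.
cut_left, cut_right = 1_000_000, 1_018_700
reference_len = expected_fragment_bp(cut_left, cut_right)
deleted_len = expected_fragment_bp(cut_left, cut_right, "deletion", 13_200)
print(reference_len, fits_pcr_limit(reference_len))  # 18700 True
print(deleted_len, fits_pcr_limit(deleted_len))      # 5500 True
```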

Key Applications and Results

The study demonstrates the application of this methodology by resolving the breakpoints of fourteen different SVs identified by optical mapping, including six deletions, seven insertions, and an inversion.

  1. Deletion Resolution (e.g., Heterozygous Deletion on Chr 12):
    • Optical mapping initially detected a 13.2 kbp deletion in a heterozygous state on chromosome 12.
    • Specific gRNAs were designed: one pair for the undeleted haplotype (matching hg38) and another pair spanning the deletion for the deleted haplotype.
    • Sequencing confirmed the 3.5 kbp fragment for the undeleted haplotype and a ~5.5 kbp fragment that aligned discontinuously for the deletion-containing haplotype. This discontinuous alignment revealed the precise breakpoint at 45,509,371 bp, effectively validating the approach and differentiating the heterozygous deletion.
  2. LINE-1 Insertion Detection and Breakpoint Resolution (e.g., Homozygous Insertion on Chr 12):
    • Optical mapping detected a 12.9 kbp insertion, suspected to be a LINE-1 insertion based on extra labels and size.
    • Two gRNAs were designed: one targeting the flanking region on hg38 and another specifically within the LINE-1 reference sequence (GenBank L1.2/L19088).
    • Sequenced reads aligned to hg38 at the flanking gRNA site and extended into the insertion region. The portion of each read that did not align to hg38 aligned perfectly to the putative LINE-1 reference, confirming the insertion’s identity and precisely locating its breakpoint at 33,864,403 bp. This demonstrates the ability to identify specific types of insertions that traditional optical mapping cannot differentiate from other insertions of similar size.
  3. Inversion Breakpoint Resolution (e.g., Homozygous Inversion on Chr 12):
    • Optical mapping estimated a ~90 kbp homozygous inversion on chromosome 12.
    • Two gRNAs were designed: one in the inversion-flanking region and one inside the inversion. Due to the inversion, the gRNAs were expected to be closer, yielding a ~6.5 kbp fragment.
    • Sequenced fragments aligned to hg38 from both expected cut sites, demonstrating partial alignment to the inverted region and the flanking region. This dual alignment allowed the detection of both breakpoints of the inversion, at 17,768,358 bp and 17,861,570 bp. This approach addresses the complexity of detecting inversions, especially those in regions with segmental duplications.
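The deletion case above rests on one idea: a read that is continuous in itself but aligns to the reference in two blocks separated by a large gap marks the deleted interval. A minimal sketch of that inference follows. It assumes the aligner output has already been parsed into blocks; the `AlignedBlock` type, the thresholds, and all coordinates are hypothetical and not taken from the study.

```python
from typing import NamedTuple, Optional, Tuple


class AlignedBlock(NamedTuple):
    """One aligned piece of a read (half-open coordinates, assumed convention)."""
    ref_start: int
    ref_end: int
    read_start: int
    read_end: int


def deletion_from_split_read(blocks, min_ref_gap=1_000, max_read_gap=50) -> Optional[Tuple[int, int]]:
    """Return (del_start, del_end) on the reference, or None.

    A deletion is called when two consecutive blocks are adjacent in the read
    but at least min_ref_gap bp apart on the reference.
    """
    ordered = sorted(blocks, key=lambda b: b.read_start)
    for left, right in zip(ordered, ordered[1:]):
        read_gap = right.read_start - left.read_end
        ref_gap = right.ref_start - left.ref_end
        if abs(read_gap) <= max_read_gap and ref_gap >= min_ref_gap:
            return left.ref_end, right.ref_start
    return None


# A hypothetical 5.5 kbp read that skips 13.2 kbp of reference sequence.
read_blocks = [
    AlignedBlock(ref_start=100_000, ref_end=102_000, read_start=0, read_end=2_000),
    AlignedBlock(ref_start=115_200, ref_end=118_700, read_start=2_000, read_end=5_500),
]
breakpoints = deletion_from_split_read(read_blocks)
print(breakpoints)                        # (102000, 115200)
print(breakpoints[1] - breakpoints[0])    # 13200
```

In practice the blocks would come from the primary and supplementary alignments reported by the aligner, and inversions would additionally require tracking strand.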

Advantages and Conclusions
This combined methodology offers several significant advantages:

  • Comprehensive SV Analysis: It provides a universal and flexible methodology for SV discovery and precise breakpoint localization, addressing limitations of single technologies.
  • Cost-Effectiveness: Optical mapping is cheaper and quicker than long-read sequencing technologies. The proposed approach further reduces costs by using targeted nanopore sequencing only for specific SVs, synthesizing sgRNAs economically (<$5), and using a Nanopore Flongle flow cell (~$70). This makes it accessible for routine diagnostics and large-scale population studies.
  • High Throughput and Specificity: While optical mapping detects a high number of SVs genome-wide, targeted sequencing focuses on biologically relevant SVs, leading to efficient and accurate breakpoint resolution.
  • Multiplexing Capability: The ability to use a single sgRNA mix for multiple targets in one reaction enables the efficient analysis of multiple SVs in one or more samples simultaneously.

The authors conclude that this method can lead to accelerated SV discovery and typing screens for routine diagnostics and association studies by providing precise detection of breakpoints economically. The simplicity of the analysis workflow (using standard tools like minimap2 and IGV) also supports its general adoption. While the gRNA design for certain insertions like LINE-1 relies on known reference sequences, the approach remains highly valuable, especially for enriching insertion-containing loci.

Limitations of Current Karyotyping Methods

Traditional karyotyping relies on microscopic examination of chromosomes, a complex process requiring high expertise and offering only megabase (Mb) scale resolution (3-10 Mbp). While highly precise for smaller SVs, short-read sequencing struggles with long and complex SV characterizations due to short read lengths. Long-read sequencing technologies, though advantageous for large SVs due to longer reads, are hindered by low throughput, high error rates, and prohibitive costs, preventing widespread adoption in routine clinical settings. Existing methods like Chromosomal Microarray (CMA) and FISH (Fluorescence In Situ Hybridization) also have limitations: CMA does not easily detect copy number neutral rearrangements, and FISH requires a priori knowledge of probes, making it limited in detecting novel variations. For instance, approximately 50% of all reciprocal translocations are de novo.

Optical Genome Mapping (OGM) as an Alternative

OGM technology, which uses long single DNA molecules (over 300 kbp), is an exciting alternative positioned between cytogenetics and exome sequencing in terms of resolution. It is cost-effective and efficient for detecting long-range SVs, complex rearrangements, and copy number changes, including aneuploidies, inversions, and deletions. However, standard OGM alone lacks base-level resolution, meaning it cannot precisely locate SV breakpoints or estimate copy numbers. This necessitates multi-platform approaches for comprehensive SV discovery and analysis, which are often expensive and resource-intensive.

OMKar’s Purpose and Methodology

OMKar aims to automate karyotyping using OGM data, bridging the gap between low-resolution cytogenetics and high-resolution sequencing methods. It processes Structural Variant (SV) and Copy Number (CN) Variant calls from the Bionano Solve pipeline as inputs.

Key Definitions and Steps:
OMKar defines the following terms for its analysis:

  • Segment: An oriented, continuous genomic interval from the reference genome.
  • Breakpoint: A pair of non-adjacent coordinates denoting a transition between segments.
  • Chromosome Group: All homologous donor chromosomes sharing the same chromosomal identity.
  • Chromosome Cluster: A connected component of dependent chromosome groups, often defined by a set of canonical SVs.
  • Molecular Karyotype: A file format describing the karyotype as an ordered sequence of segments with nucleotide-level resolution.
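These definitions map naturally onto small data structures. The sketch below is one possible in-memory representation, not OMKar's actual classes or file format: the class names, field names, half-open coordinate convention, and label syntax are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """An oriented, continuous interval of the reference genome."""
    chrom: str
    start: int            # 0-based, inclusive (assumed convention)
    end: int              # exclusive
    forward: bool = True

    @property
    def length(self) -> int:
        return self.end - self.start

    def inverted(self) -> "Segment":
        """Same interval, opposite orientation."""
        return Segment(self.chrom, self.start, self.end, not self.forward)

    def label(self) -> str:
        strand = "+" if self.forward else "-"
        return f"{self.chrom}:{self.start}-{self.end}({strand})"


@dataclass(frozen=True)
class Breakpoint:
    """A pair of non-adjacent reference coordinates joined in the sample."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def describe_path(path: List[Segment]) -> str:
    """Render one reconstructed chromosome as an ordered list of segments."""
    return " ".join(seg.label() for seg in path)


# A hypothetical chromosome whose middle segment is inverted.
path = [
    Segment("chr7", 0, 100),
    Segment("chr7", 100, 200).inverted(),
    Segment("chr7", 200, 300),
]
print(describe_path(path))  # chr7:0-100(+) chr7:100-200(-) chr7:200-300(+)
```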

The OMKar algorithm follows a multi-step process:

  1. Pre-processing and Filtering: SV and CNV calls are filtered to ensure data quality and relevance. This includes removing low-confidence calls, masking problematic genomic regions (e.g., centromeres, telomeres), and filtering CNVs smaller than 200 kbp (unless supported by SVs). Breakpoints are merged within a 50 kbp window to simplify the graph, and CNV segments are split if breakpoints occur within them, ensuring uniform copy numbers per segment.
  2. Breakpoint Graph Construction: A directed multi-graph G(V,Es ∪ Er ∪ Eb) is created, where vertices represent segment boundaries. Edges include segment-edges (Es), reference-edges (Er) for adjacent segments, and breakpoint-edges (Eb) for rearrangements.
  3. Smoothing Edge Multiplicities (Integer Linear Programming – ILP): An ILP formulation constrains the copy number of each genomic segment and assigns copy numbers to reference and breakpoint edges. This step aims to make the graph Eulerian (where all vertices have an even degree) by minimizing discrepancies between observed and expected copy numbers, penalizing non-zero slack, rewarding breakpoint usage, and penalizing odd-degree vertices.
  4. Computing Eulerian Tours: A Breadth-First Search (BFS) algorithm identifies all connected components (chromosome clusters). For components with odd-degree non-telomeric vertices, dummy edges are added to create an Eulerian structure. Eulerian tours are then computed starting from telomeric vertices.
  5. Chromosomal Segregation and Identification: Each chromosome is represented as a subpath of alternating segment and non-segment edges. The algorithm forces this alternation and splits subpaths at boundaries between homologous chromosome pairs. A heuristic refinement is applied to ensure each reconstructed chromosome contains a single centromere.
  6. Event Interpretation: SVs are re-coded using the International System for Human Cytogenomics Nomenclature (ISCN). The module aligns reconstructed chromosomes with their wild-type (WT) counterparts, identifies concordant, insertion, and deletion blocks, and assigns ISCN based on unique block-type signatures, favoring compound SVs over multiple simpler ones.
  7. Report Generation: OMKar compiles an HTML report including decomposed paths (chromosomes), visualizations in cytoband and segment views, interpreted SVs in ISCN, and a table of disrupted developmental genes (from the DDG2P database).
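Steps 2 to 4 can be illustrated on a toy graph. The sketch below is a simplification under stated assumptions: it treats the breakpoint graph as an undirected multigraph given as a plain edge list, skips the ILP smoothing entirely, and uses Hierholzer's algorithm for the traversal, which is a standard choice but not confirmed here as the authors' implementation. Vertex names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each vertex to a list of (neighbor, edge_index) pairs."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    return adj


def connected_components(edges):
    """Group vertices into components with breadth-first search."""
    adj = build_adjacency(edges)
    seen, components = set(), []
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        queue, component = deque([root]), set()
        while queue:
            u = queue.popleft()
            component.add(u)
            for v, _ in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components


def odd_degree_vertices(edges):
    """Vertices with odd degree; an Eulerian trail must start and end at them."""
    adj = build_adjacency(edges)
    return {v for v, nbrs in adj.items() if len(nbrs) % 2 == 1}


def eulerian_trail(edges, start):
    """Hierholzer's algorithm: visit every edge of start's component once."""
    adj = build_adjacency(edges)
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()              # discard edges already traversed
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            trail.append(stack.pop())
    return trail[::-1]


# Toy cluster: a chromosome in which the block between A and B is duplicated.
cluster = [("tel_p", "A"), ("A", "B"), ("B", "A"), ("A", "B"), ("B", "tel_q")]
other = [("tel2_p", "tel2_q")]        # an unrelated, unrearranged chromosome

print(connected_components(cluster + other))
print(odd_degree_vertices(cluster))   # only the two telomeres
print(eulerian_trail(cluster, "tel_p"))
```

Here only the telomeric vertices have odd degree, so the trail starts at one telomere and ends at the other, traversing the duplicated block twice.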

Supporting Modules (KarSim and KarCheck):

  • KarSim (Karyotype Simulator) generates random karyotypes, FASTA files, and event logs for simulating common genetic disorders and various sequencing technologies. It allows for parameterized SV generation and masking of genomic regions.
  • KarCheck (Karyotype Checker) compares two karyotypes (simulated vs. reconstructed) by measuring SV and CNV similarities. It preprocesses chromosome groups into clusters, partitions segments for comparability, and computes Jaccard similarity scores for SV edges and CNVs. It can accommodate varying resolution thresholds.
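The Jaccard similarity that KarCheck reports is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming SV edges can be represented as hashable labels; the edge labels are hypothetical and the actual KarCheck preprocessing (clustering and segment partitioning) is not reproduced.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 1.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def precision_recall(truth, predicted):
    """Precision and recall of predicted items against the truth set."""
    truth, predicted = set(truth), set(predicted)
    hits = len(truth & predicted)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical SV edges from a simulated and a reconstructed karyotype.
simulated = {"e1", "e2", "e3", "e4"}
reconstructed = {"e1", "e2", "e3", "e5"}

print(jaccard(simulated, reconstructed))           # 0.6
print(precision_recall(simulated, reconstructed))  # (0.75, 0.75)
```

Jaccard similarity penalizes both missed and spurious edges, which is why it sits below both precision and recall in this example.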

Results and Performance

  • Efficiency: OMKar runs efficiently on a standard desktop, with a median runtime of 8.4 seconds for analysis and 21.3 seconds including image generation for the HTML report. Runtime is correlated with the number of rearrangements.
  • Simulated Data Accuracy:
    • OMKar reconstructed karyotypes from 38 simulated datasets (552 SVs) with high accuracy.
    • True-negative rate for non-event clusters was 98.8%.
    • It correctly reconstructed 13 out of 14 simulated aneuploidies.
    • The average Jaccard Similarity for SVs was 84.8% (recall 94.7%; precision 87.5%), with better performance on low-complexity clusters (89.9% Jaccard) compared to high-complexity ones (80.8% Jaccard).
    • CNV comparison showed high accuracy with a 96.0% average Jaccard Similarity.
    • Overall SV edge accuracy was 96% for low-complexity and 89% for high-complexity clusters. Balanced reciprocal translocations were reconstructed with 100% accuracy in low-complexity clusters.
  • Clinical Data Accuracy:
    • OMKar was applied to 154 clinical samples (50 prenatal, 41 postnatal, 63 parental) previously diagnosed by traditional methods.
    • It fully reconstructed 129 (91%) of 141 previously detected variations, including 100% of aneuploidies (25/25) and balanced reciprocal translocations (32/32).
    • OMKar consistently reconstructed the correct karyotype in biological replicates processed at different sites.
    • It identified additional SVs not caught by other techniques, averaging 2.8 deletions, 3.3 amplifications, and 0.44 inversions per sample as novel events.
  • Handling Partially Missing Calls: OMKar can reconstruct karyotypes even when supported only by SVs or only by CNVs. For example, it inferred missing SV calls using support from other SV and CNV data in a complex inter-chromosomal duplicated insertion, which was previously only partially reported by karyotyping and CMA.
  • Genetic Basis Explanations (G2P): OMKar automatically identified all 20 previously diagnosed Genotype-to-Phenotype (G2P) mechanisms in prenatal/postnatal samples, particularly aneuploidies. It also provided novel G2P explanations for 5 out of 21 previously undiagnosed postnatal samples, including disruptions of neurodevelopmental genes by balanced events like translocations and transpositions.

Discussion and Conclusion

OMKar effectively bridges the gap between low-resolution cytogenetics and high-resolution sequencing by automating karyotype inference and capturing large-scale rearrangements. It reduces manual workload, enhances speed, and improves diagnostic accuracy. A key strength is its ability to resolve conflicts between SV and CNV calls, inferring variations even with partially missing or conflicting information.

Despite its capabilities, OMKar (and OGM) has limitations: reduced sensitivity for mosaic chromosomal abnormalities, for events in low-complexity regions, for rearrangements arising from segmental duplications that lead to non-allelic recombination (e.g., Robertsonian translocations), and for events in centromeres or acrocentric short arms. However, the core algorithm of OMKar, based on Eulerian graphs, is technology-agnostic and could be adapted to other sequencing platforms as more datasets become available.

The study concludes that OMKar offers a robust, scalable, and high-resolution approach for detecting constitutional genetic abnormalities using OGM data. While ongoing improvements are needed, OMKar has demonstrated significant potential to become an increasingly important method in research and clinical diagnostics, complementing and potentially surpassing traditional methods in accuracy and efficiency.

Introduction and Background
Chromosomal SVs, defined as DNA regions larger than 1 kb with changes in copy number, orientation, or chromosomal location, are crucial in the formation of human cancers, including leukemias. Traditional methods for SV analysis, such as karyotyping, fluorescence in situ hybridization (FISH), microarrays, multiplex ligation-dependent probe amplification (MLPA), polymerase chain reaction (PCR), and reverse transcription PCR (RT-PCR), each have significant limitations. For example, karyotyping offers a maximum resolution of approximately 5 Mb, FISH requires a priori knowledge of loci and has limited throughput, and microarrays, while providing higher resolution (a few kb), cannot detect balanced chromosomal aberrations like translocations and inversions, nor are they effective for low-percentage clones in cancer cells. Short-read next-generation sequencing (NGS), including WGS, is widely used for sequence variations but exhibits low sensitivity for SV detection. While long-read sequencing methods like PacBio or Nanopore are more capable of identifying SVs, their costs remain prohibitively high for widespread adoption.

OGM emerges as a cutting-edge technology for analyzing ultrahigh-molecular-weight DNA molecules, providing high-resolution and long-range genome-wide assessments of structural anomalies. It works by fluorescently labeling DNA at specific hexamer motifs (CTTAAG), resulting in sequence-specific patterns across the genome. The labeled DNA is then linearized in nanochannels and imaged to assess SVs as small as 500 base pairs. The integration of OGM into diagnostic pipelines can potentially reduce personnel and overall costs, enhancing the understanding of disease mechanisms in hematologic malignancies.

Methods
The study utilized bone marrow aspiration samples from five leukemia patients and peripheral blood samples from five healthy donors as controls. These choices were based on clinical relevance, with bone marrow being the standard diagnostic sample for leukemia due to its higher yield of leukemic cells.

  • DNA Extraction: Ultrahigh-molecular-weight DNA was extracted from 1.0–1.5 million cells using Bionano Prep™ kits, optimized for fresh peripheral blood or frozen bone marrow samples.
  • OGM Procedure: DNA was labeled with DLE-1 enzyme (Bionano Genomics) using the Direct Label and Stain kit. After removing excess fluorophores, the labeled DNA was loaded onto a Saphyr Chip® and analyzed on a Bionano Saphyr system. OGM data collection aimed for >300 Gbp of UHMW DNA, corresponding to >80x genomic reference coverage. De novo analysis and SV calling were performed using Bionano Access Suite software v3.7, annotating SVs (deletions, insertions, inversions, translocations) against the hg38 reference.
  • WGS Procedure: WGS was performed on an Illumina NovaSeq 6000 system to an average coverage depth of 30X. Raw reads were aligned to hg38 using the BWA-GATK-ANNOVAR pipeline, and SVs identified by the Manta program were used to close gaps left by OGM alignments.
  • Statistical Analysis: Mann–Whitney U test was used for comparisons between leukemia and control groups, with p < 0.05 indicating statistical significance.
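With five samples per group, a Mann-Whitney U comparison can be computed exactly by enumerating every way of splitting the ten values into two groups. The sketch below shows that calculation in pure Python. It is an illustration only: the summary does not say whether the study used an exact or an asymptotic p-value, the function names are hypothetical, and the per-sample SV counts are invented.

```python
from itertools import combinations


def u_statistic(x, y):
    """Number of (x_i, y_j) pairs with x_i > y_j; ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating all group relabelings."""
    pooled = list(x) + list(y)
    n_x = len(x)
    mean_u = len(x) * len(y) / 2
    observed = abs(u_statistic(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(group_x, group_y) - mean_u) >= observed - 1e-9:
            hits += 1
    return hits / total


# Invented per-sample SV counts (not the study's data).
leukemia = [980, 1010, 1044, 1100, 1150]
control = [600, 630, 650, 670, 700]

p_value = exact_two_sided_p(leukemia, control)
print(u_statistic(leukemia, control))  # 25.0
print(f"{p_value:.4f}")                # 0.0079
```

With complete separation of two groups of five, only 2 of the 252 possible relabelings are as extreme as the observed one, which gives the smallest two-sided p-value attainable at this sample size.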

Results
General SVs in Leukemia and Control Samples:
OGM analysis achieved high effective coverage (>300x) in all leukemia samples. On average, 1,044 SVs were identified per leukemia sample, comprising 477 insertions, 457 deletions, 32 inversions, and 73 duplications. This was significantly higher than the average of 650 SVs in control samples (315 insertions, 284 deletions, 32 inversions, 17 duplications), with statistical significance for insertions (p = 0.016), deletions (p = 0.028), and inversions (p = 0.028). Interchromosomal translocations were observed exclusively in leukemia samples.

Comparison between OGM and Conventional Diagnostic Tools (Case Studies):
OGM successfully detected all previously known SVs from conventional tools in the leukemia samples, with the exception of one specific fusion (IGH::DUX4), and additionally identified more SVs.

  • Case 1 (B-ALL):
    • Conventional tests indicated a complex karyotype and an ETV6::RUNX1 fusion. MLPA detected duplications and deletions in specific regions.
    • OGM revealed an even more intricate karyotype, showing sequential translocations between chromosomes 5, 8, 12, and 21. It also detected a deletion in chromosome 12 (p12.1 to p13.2) missed by conventional methods.
    • The complex changes in chromosome 21 involved translocation to 12p13.2, then to 5q23, followed by duplication into an isodicentric chromosome 21.
    • WGS successfully closed all gaps at breakpoints mapped by OGM, including a novel BCAT1::BAALC fusion between chromosomes 8 and 12. BCAT1 and BAALC are both associated with chronic myeloid leukemia, highlighting the clinical significance of this new finding.
  • Case 2 (B-ALL):
    • Conventional tests found a normal karyotype, BCR::ABL1 fusion, and several deletions (IKZF1, CDKN2A/2B, PAX5).
    • OGM revealed a translocation between chromosomes 9 and 22, explaining the BCR::ABL1 fusion. It also detected a large deletion of chromosome 9p (>16 Mb), encompassing 245 genes including CDKN2A/2B and PAX5, and a smaller deletion on chromosome 7 involving IKZF1.
    • The large 9p deletion was missed by conventional karyotyping, possibly because it was mosaic or because the affected cells had reduced proliferative capacity.
    • WGS precisely defined the breakpoint for the Philadelphia chromosome.
  • Case 3 (B-ALL):
    • The only previously known abnormality was an IGH::DUX4 fusion detected by RNA-seq.
    • OGM revealed a terminal translocation between chromosomes 6p25.2 and 14q32.3, and an IGH::DUSP22 fusion, which has been associated with chronic myeloid leukemia and lymphoma.
    • However, the IGH::DUX4 fusion could not be confirmed by OGM or WGS because of low OGM label density in the region and the presence of multiple DUX4 pseudogenes, which caused poor WGS alignment. This suggests a limitation for regions with high genomic complexity or low label density.
  • Case 4 (AML):
    • Conventional tests identified an inv(16)(p13q22) and a CBFB::MYH11 fusion.
    • OGM/WGS confirmed both the inversion breakpoint and the fusion, identifying 676 genes within the inverted region.
  • Case 5 (T-ALL):
    • Conventional karyotyping was normal, but an STIL::TAL1 fusion and deletions of CDKN2A/2B and STIL were detected by RT-PCR and MLPA.
    • OGM confirmed an 81,839 bp deletion on chromosome 1 causing the STIL::TAL1 fusion and a 115,539 bp deletion on chromosome 9 involving CDKN2A/2B.
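Several of the cases above report which genes fall inside a deleted or inverted interval. That annotation step reduces to an interval-overlap query, sketched below. The function name, gene names, and coordinates are hypothetical placeholders; only the 115,539 bp deletion length echoes a figure from Case 5.

```python
def overlapping_genes(sv_start, sv_end, genes):
    """Names of genes that overlap [sv_start, sv_end), ordered by start position.

    genes is an iterable of (name, start, end) tuples with half-open coordinates.
    """
    hits = [(start, name) for name, start, end in genes
            if start < sv_end and end > sv_start]
    return [name for _, name in sorted(hits)]


# Placeholder annotation on a single chromosome (not real gene coordinates).
genes = [
    ("GENE_A", 950_000, 990_000),       # entirely upstream of the deletion
    ("GENE_B", 995_000, 1_010_000),     # straddles the proximal breakpoint
    ("GENE_C", 1_050_000, 1_080_000),   # fully inside the deletion
    ("GENE_D", 1_120_000, 1_150_000),   # entirely downstream
]

deletion_start = 1_000_000
deletion_end = deletion_start + 115_539
print(overlapping_genes(deletion_start, deletion_end, genes))  # ['GENE_B', 'GENE_C']
```

A gene that straddles a breakpoint, like the placeholder GENE_B here, is the kind of hit that would prompt a closer look for a disruption or fusion.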

Discussion
The study emphasizes OGM’s power as a single, efficient test for SV detection in leukemia, in contrast to the time-consuming battery of conventional methods. While the cost of OGM is comparable to combined conventional tests, it is significantly less than long-read sequencing, making it a viable alternative for comprehensive analysis. OGM offers higher resolution than karyotyping or FISH, enabling the identification of novel or rare SVs and streamlining diagnostic workflows.

Coupling OGM with WGS
Coupling OGM with WGS provides a powerful solution to overcome OGM’s limitation in resolving exact breakpoint sequences. OGM can pinpoint SVs within a few kilobases, allowing WGS to efficiently define precise breakpoint sequences, particularly for novel gene fusions or when reading frames are critical. This combined approach is inexpensive, quick, and bypasses the need for cell culture, which often takes weeks for conventional methods.
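One way to picture this hand-off is a window search: take the approximate OGM position and look for the nearest base-resolution breakpoint reported by the WGS caller within a few kilobases. The sketch below assumes a 5 kbp window and a flat list of candidate positions on the same chromosome; the function name, the window size, and the coordinates are hypothetical.

```python
def refine_breakpoint(ogm_pos, wgs_positions, window=5_000):
    """Return the WGS breakpoint closest to ogm_pos within +/- window bp, else None."""
    candidates = [p for p in wgs_positions if abs(p - ogm_pos) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - ogm_pos))


# Hypothetical positions on one chromosome.
ogm_estimate = 23_290_000
wgs_calls = [23_100_512, 23_291_784, 23_296_003, 24_000_000]

print(refine_breakpoint(ogm_estimate, wgs_calls))  # 23291784
print(refine_breakpoint(50_000_000, wgs_calls))    # None
```

Returning `None` when nothing falls inside the window mirrors the situation described for IGH::DUX4, where neither method could supply a confirming call.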

Limitations
A key limitation observed was the inability to confirm the IGH::DUX4 fusion in Case 3. This was attributed to low OGM probe density in the DUX4 region and the presence of multiple DUX4 pseudogenes, hindering short-read WGS alignment. The authors suggest that long-read sequencing might be necessary for challenging regions like these, especially those with repetitive elements. While freeze-thaw procedures for frozen bone marrow samples can cause DNA breakage, the study found that DNA fragment lengths remained sufficiently long for accurate OGM analysis.

Conclusion
The study concludes that OGM is a fundamental tool that complements G-banding analysis, providing characterization of chromosomal rearrangements and involved genes. Its ability to reveal complex SVs and new fusion genes, coupled with WGS for precise breakpoint resolution, significantly advances the understanding of disease biology and its application in precision medicine for leukemia patients.

The authors highlight that while current standard-of-care (SOC) methods, such as karyotyping, chromosomal microarray (CMA), and whole-exome sequencing (WES), are proficient at detecting deletions and sequence variants, they frequently fall short in providing precise breakpoint information for duplications and balanced SVs. This limitation can severely impede clinical assessment, especially when patient phenotypes are unclear or when family segregation analysis is not feasible. The study posits OGM as a promising and efficient solution to unlock this clinically relevant information.

Background and Limitations of Conventional Methods
Genetic testing in clinical practice currently employs a diverse array of SOC methods. However, these techniques often struggle with comprehensive characterization of SVs. For instance, they may fail to resolve the breakpoints of copy-number neutral (balanced) aberrations and copy-number gains, which is a significant disadvantage when accurate variant classification is critical. This issue is particularly pronounced in prenatal diagnostics or for diseases with non-specific phenotypes, where clinical data might be insufficient. While segregation analysis of close relatives can sometimes provide tentative clinical assessments, this approach is limited to pathogenic variants causing observable phenotypes and requires informative carriers. This excludes numerous scenarios, such as cases involving egg or sperm donation, unavailable relatives, or late-onset, low-penetrance, or recessive (especially X-linked recessive) diseases, all of which necessitate precise topological characterization of the SV.

Study Objective and Methods
The primary objective of this retrospective study was to systematically evaluate OGM as an additional tool in the routine diagnostic pipeline to gain crucial clinical insights into SVs that SOC methods could not fully characterize.

The study involved a cohort of prenatal and postnatal samples collected over a one-year period in 2023 at the Institute of Medical Genetics. Sample materials included blood, amniotic fluid, or chorionic villi specimens (CVS), based on the clinical indication. Initial genetic tests were performed using SOC methods such as WES, CMA, or karyotyping.

For OGM, ultra-high-molecular-weight (UHMW) DNA was extracted from 1.0 to 1.5 million viable cells using Bionano Prep kits. The DNA was labeled with DLE-1 enzyme (Bionano Genomics) using a Direct Label and Stain kit, then loaded onto a Saphyr Chip® and imaged on a Bionano Saphyr system. Data collection aimed for over 300 Gbp of UHMW DNA, corresponding to more than 80x coverage of the genomic reference. While automated SV calling algorithms were used, the study primarily relied on direct visual assessment by an experienced human evaluator for regions of interest, often bypassing automated calls to avoid potential errors.
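The relationship between collected DNA and coverage is simple arithmetic. The sketch below assumes a haploid reference of about 3.1 Gbp and models effective coverage as collected DNA multiplied by the fraction of molecules that map to the reference. The 0.85 map rate is a hypothetical value chosen for illustration, and the function name is not from any vendor software.

```python
def effective_coverage(collected_gbp, map_rate=1.0, genome_gbp=3.1):
    """Approximate coverage: collected DNA x fraction mapped / reference size."""
    if not 0.0 <= map_rate <= 1.0:
        raise ValueError("map_rate must be between 0 and 1")
    return collected_gbp * map_rate / genome_gbp


raw = effective_coverage(300)            # every molecule assumed to map
mapped = effective_coverage(300, 0.85)   # hypothetical 85% map rate
print(f"{raw:.1f}x raw, {mapped:.1f}x effective")  # 96.8x raw, 82.3x effective
```

Under these assumptions, 300 Gbp of collected DNA is consistent with the >80x target quoted above, and a sample yielding only about 25x, as in Case P2 below, falls well short of it.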

Follow-up analyses using OGM were specifically considered under certain criteria: (1) to differentiate true SVs from potential detection artifacts, (2) to assess the SV’s clinical relevance, or (3) to estimate the risk of its recurrence in the patient’s offspring.

Results: OGM’s Impact on Challenging Cases
Out of 3021 patients referred for genetic analysis in 2023, follow-up analyses for potentially pathogenic abnormalities were requested for 41 cases using secondary SOC methods. However, for a small subset of seven patient cases (P1–P7), conventional follow-up approaches proved unfeasible, necessitating the use of OGM. In all seven cases, OGM proved crucial. The data obtained either allowed direct interpretation of the corresponding SV or enabled region-specific downstream analysis to achieve that goal. These cases presented highly individual genetic and clinical constellations, yet all qualified for OGM for the same reason: to investigate potential gene disruptions by observed or suspected SVs, either by pinpointing the location and orientation of duplicated material or by characterizing breakpoints in copy-number neutral aberrations.

The study detailed three instructive cases (P1-P3) and briefly mentioned others (P4-P7):

  • Case P1: Localization of Additional Genetic Material
    • Initial Findings: A male fetus, sonographically inconspicuous, showed two X-chromosomal duplications by CMA: a 193 kbp region in Xp22.31 (involving STS gene) and a 654 kbp region in Xp21.2p21.1 (involving DMD gene), both with potential severe implications. MLPA confirmed maternal origin.
    • OGM Contribution: OGM was performed to localize the duplicated material. It suggested a complex rearrangement on the X chromosome’s p-arm, involving consecutive insertion of material from both duplications downstream of the STS gene. The automated SV caller missed one breakpoint, and the CNV caller did not detect the smaller duplication due to its size.
    • Subsequent Targeted Sequencing: Long-range PCR followed by Sanger sequencing was performed to identify exact breakpoints. This revealed a single homologous base at one breakpoint (consistent with non-homologous end joining) and a 714 bp LINE (Long Interspersed Nuclear Element) region at the other, fusing STS intron 10 with DMD intron 55.
    • Clinical Outcome: The presence of an intact DMD copy and the insertion of duplicated material outside gene-containing regions led to the classification of both duplications as likely benign. This assessment was confirmed by a healthy male relative carrying the aberration, and the pregnancy continued successfully.
  • Case P2: Breakpoint Clarification of a Balanced SV
    • Initial Findings: Fetal karyotyping suggested a paracentric inversion of the q-arm on chromosome 7 (q11.2q22), but CMA showed no CNVs in the region. While inherited inversions are often benign, similar SVs on chromosome 7 have been linked to hematological malignancies.
    • OGM Contribution: OGM was used to characterize the inversion. It depicted an inversion involving cytobands 7q11.23 and 7q22.1, with breakpoints mapping to large regions of homologous segmental duplications (0.75 and 3.25 Mbp). Low effective coverage from OGM (25x instead of >80x recommended) likely hindered automated identification.
    • Parental OGM and FISH: Parental karyotyping revealed maternal inheritance. OGM on maternal DNA with adequate quality metrics confirmed the inversion’s breakpoints and allowed automated identification. FISH analysis also confirmed the inversion, consistent with OGM data.
    • Clinical Outcome: The breakpoint intervals were sufficiently defined to rule out disruption of known genes, classifying the variant as likely benign. The pregnancy continued to term, resulting in a healthy girl.
  • Case P3: Confirmation of a Cryptic SV
    • Initial Findings: WES of a newborn girl with suspected hereditary pseudohypoaldosteronism showed no coverage of exon 13 of the SCNN1B gene, leading to suspicion of a cryptic SV. Previous PCR attempts to confirm the SV were unsuccessful.
    • OGM Contribution: Years later, OGM became available. The patient’s OGM data revealed a homozygous paracentric inversion on chromosome 16 (16p13.13 and 16p12.2). The distal breakpoint interval overlapped with SCNN1B exon 13, strongly suggesting gene disruption. The automated SV caller identified it as two separate intrachromosomal fusions. Parental OGM showed the mother was heterozygous and the father homozygous for the inversion, consistent with his phenotype.
    • Subsequent Long-Read Sequencing: PCR based on OGM-identified breakpoint intervals yielded an amplicon. Long-read sequencing of this amplicon definitively determined the exact breakpoint within exon 13, confirming the disruptive and pathogenic effect of the inversion. FISH analysis also confirmed the inversion.
    • Clinical Outcome: The confirmed breakpoint knowledge enabled the development of a customized PCR test for convenient family carrier testing.

The study also noted other cases where OGM proved useful: P4, a translocation breakpoint located approximately 300 kbp upstream of the FOXL2 gene, not completely ruling out impairment of regulatory elements; and cases P5-P7, where OGM demonstrated that suspected aberrations found by fetal karyotyping were likely artifacts or benign, reducing unnecessary patient distress.

Discussion and Future Directions
The study concludes that OGM offers crucial information not obtainable through conventional SOC methods, specifically concerning the precise locations of SV breakpoints. It serves as a low-threshold method for following up on uncertain results, thereby alleviating patient distress. The authors propose a diagnostic strategy where OGM is primarily utilized as a follow-up analysis for well-defined questions, rather than a first-line method. This approach leverages OGM’s strengths to complement existing SOC methods, leading to more clinically actionable reports.

Clinical scenarios that particularly benefit from OGM include cases with a lack of informative family history, situations where carrier analysis is unfeasible (e.g., egg/sperm donation, unavailable relatives, or suspected late-onset/low-penetrance/recessive diseases), and challenging prenatal settings where unclear results can cause significant parental distress. In the prenatal cases presented, OGM provided reports that facilitated parental decision-making and were perceived as reassuring, leading to the continuation of all pregnancies.

However, the study also acknowledges certain limitations of OGM. It shows reduced sensitivity in detecting mosaic chromosomal abnormalities, events in regions of low complexity, and segmental duplications that lead to non-allelic recombination (e.g., Robertsonian translocations). Furthermore, OGM technologies cannot currently detect variations within centromeres or the short arms of acrocentric chromosomes. While long-read sequencing technologies are emerging as a potential solution for these challenging regions, their widespread clinical availability is still developing. The core OMKar algorithm, which reconstructs karyotypes from OGM data, is described as agnostic to specific sequencing technologies and could potentially be adapted to other platforms as more datasets become available.

In conclusion, this retrospective study strongly advocates for OGM as a valuable tool for enhancing the accuracy and comprehensiveness of constitutional genetic diagnostics. By providing detailed characterization of complex SVs, particularly breakpoint locations, OGM significantly complements existing SOC methods, enabling more precise diagnoses and guiding clinical management, especially in challenging prenatal cases.

Summary sheet

Optical Genome Mapping (OGM) is an advanced genomic analysis technology primarily used for the detection and characterization of structural variations (SVs) across the whole genome. It offers a robust alternative or complement to traditional cytogenetics and next-generation sequencing methods for comprehensive genomic analysis.

Methodology and Key Features

OGM involves analyzing ultrahigh-molecular-weight (UHMW) DNA molecules (averaging 300 kbp in length) stretched in nanochannels. The process utilizes a multicolor mapping strategy that combines Direct Labeling Enzyme (DLE-1) tagging of specific sequence motifs (typically CTTAAG hexamer motifs with green fluorophores) with Cas9-mediated target-specific labeling of any 20-base sequences (20mers) with red fluorophores. This DLE-Cas9 strategy is considered universal and versatile, allowing simultaneous analysis of multiple targets in a single reaction. Labeled DNA molecules are linearized within silicon chips and imaged using high-throughput systems like the Bionano Saphyr system. Data analysis involves de novo assembly based on the green channel, followed by identification of red labels based on their expected genomic locations.
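
As a rough illustration of the two labeling channels described above, the following Python sketch lists where green (CTTAAG motif) and red (custom 20mer) labels would be expected on a short sequence. The function names and the toy sequence are hypothetical and are not part of any OGM software; real label detection works on imaged molecules rather than on sequence strings.

```python
import re

DLE1_MOTIF = "CTTAAG"  # motif named in the text; it equals its own reverse complement


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(seq: str, motif: str = DLE1_MOTIF) -> list[int]:
    """0-based start of every (possibly overlapping) occurrence of motif."""
    pattern = f"(?={re.escape(motif.upper())})"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def target_positions(seq: str, target20: str) -> list[tuple[int, str]]:
    """Expected red-label sites: where a 20mer occurs on either strand."""
    if len(target20) != 20:
        raise ValueError("target must be exactly 20 bases")
    hits = [(p, "+") for p in motif_positions(seq, target20)]
    rc = reverse_complement(target20)
    if rc != target20.upper():  # avoid double counting palindromic targets
        hits += [(p, "-") for p in motif_positions(seq, rc)]
    return sorted(hits)


# Toy sequence (made up): two CTTAAG motifs flanking one custom 20mer target.
TARGET = "GATTACAGATTACAGATTAC"
toy = "AAACTTAAGGG" + TARGET + "TTCTTAAGAA"

green = motif_positions(toy)           # expected green-channel label sites
red = target_positions(toy, TARGET)    # expected red-channel label sites
print("green:", green)
print("red:", red)
```

Searching both strands matters for the custom 20mer because, unlike CTTAAG, an arbitrary target is usually not identical to its reverse complement.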

A key advantage of OGM is its ability to capture long-range genomic information, which is crucial for characterizing large SVs. The DLE-Cas9 labeling approach also preserves DNA integrity better and is simpler and more efficient than older nickase-based methods, resulting in longer DNA molecules crucial for studying large structural variants. OGM offers a cost-effective and high-throughput solution compared to long-read sequencing technologies, providing extensive genomic coverage more economically and quickly. The multicolor strategy specifically enables interrogation of features not accessible by motif-labeling alone, precise breakpoint localization, and accurate copy number estimation of genomic repeats.

Applications

OGM has diverse applications:

  • SV Discovery and Characterization: It is extensively used to detect various SVs, including deletions, insertions, inversions, translocations, duplications, and array expansions/contractions, ranging from 1 kbp to several Mbp.
  • Quantification of D4Z4 Copy Numbers: OGM can accurately quantify D4Z4 repeats on chromosome 4q, a biomarker for facioscapulohumeral muscular dystrophy (FSHD), overcoming challenges in repetitive regions.
  • Telomere Length Estimation: OGM can label telomeres and measure their fluorescence intensity to estimate telomere length, a recognized clinical biomarker for aging and cancer.
  • Detection of LINE-1 Insertions: It can fluorescently tag specific sequences to differentiate and characterize LINE-1 insertions, which are associated with various cancers and genetic disorders.
  • Automated Karyotyping (OMKar): OMKar is a method that uses OGM data to create a virtual karyotype, providing descriptions of aneuploidies and other rearrangements with high precision and recall. It can identify disrupted genes and plausible genetic mechanisms for previously undetected cases.
  • Leukemia Analysis: OGM is a powerful tool for identifying complex chromosomal SVs in acute leukemia samples, often detecting more SVs than conventional tools. It can reveal previously unknown gene fusions and complex rearrangements.

Limitations and Complementary Technologies

While powerful, OGM has limitations. It typically lacks base-level information, making it challenging to precisely locate SV breakpoints or provide exact sequence information. This limitation is effectively addressed by combining OGM with short-read Whole Genome Sequencing (WGS) or Cas9-assisted targeted nanopore sequencing. OGM can pinpoint the SV region, and then targeted sequencing can resolve the breakpoints at base-level resolution. Certain genomic regions, like telomeres and short arms of acrocentric chromosomes, can be difficult to characterize due to lack of labeling motifs or reference sequences. OGM may also have reduced sensitivity for mosaic chromosomal abnormalities or events in low-complexity regions. Compared to conventional cytogenetic methods (karyotyping, FISH, microarrays), OGM generally offers higher resolution for SVs and can detect a broader range of abnormalities, especially balanced rearrangements, that these methods may miss. However, it is not yet a primary high-throughput diagnostic for all routine genetic testing.

Podcast

Course Outline:
Optical Genome Mapping for Advanced Genetic Analysis

  • 5 minutes: Importance of Structural Variations (SVs)
    • Analysis of structural variations is crucial for understanding mutations underlying genetic disorders and pathogenic conditions.
    • SVs are associated with complex and multifactorial disorders.
    • They vary in size from 50 base pairs (bp) to several megabases (Mb) and include deletions, insertions, translocations, inversions, and copy number variations.
    • SVs play a pivotal role in the pathogenesis of leukemia, with chromosomal aberrations detected in up to 65% of adult and 75% of pediatric patients.
  • 5 minutes: Limitations of Conventional Genetic Analysis Tools
    • Short-read, high-throughput sequencing: Difficult to characterize SVs. While accurate for smaller SVs, short reads are less suited for long and complex SV characterizations.
    • Long-read sequencing technologies: Increasingly employed for SV characterization. However, they suffer from low throughput, high error rates, and high costs, discouraging widespread adoption. For instance, whole-genome long-read sequencing is expensive, and targeted sequencing for long regions like D4Z4 repeats remains infeasible.
    • Karyotyping: Current standard based on microscopic examination. It is a complex process requiring high expertise and offers resolution only at the Mb scale (typically 3–10 Mb, with a maximum banding resolution of ~5 Mb). It misses rearrangements, including copy-number-neutral ones, that fall below this resolution.
    • Fluorescence in situ hybridization (FISH): Requires a priori knowledge of probes and has limited throughput.
    • Chromosomal Microarray (CMA): Resolution of a few kbp. Does not detect copy-number-neutral rearrangements (balanced chromosomal aberrations like translocations and inversions). Limited in detecting low-percentage clones or subclones.
    • Summary: These limitations often necessitate multiple platforms and computational approaches for comprehensive SV analysis, which can be resource-intensive and expensive.
  • 5 minutes: What is Optical Genome Mapping (OGM)?
    • A cutting-edge technology for analyzing ultrahigh-molecular-weight DNA molecules.
    • Uses single DNA molecules averaging about 300 kbp, allowing it to capture long-range information.
    • DNA is fluorescently labeled through covalent modification at specific motifs, typically CTTAAG hexamer motifs.
    • Labeled DNA is loaded onto silicon chips with hundreds of thousands of parallel nanochannels, where individual DNA molecules are linearized, imaged, and digitized.
    • The fluorescent labeling pattern of individual DNA molecules is evaluated for unbiased, genome-wide structural variant assessment.
  • 5 minutes: Traditional Motif-Based OGM and its Limitations
    • Typically based on mapping specific 6-base to 8-base sequence motifs across the whole genome.
    • Limitations:
      • Motifs are unevenly spread along the genome.
      • Often absent in repetitive regions.
      • Lacks base-level information, making it impossible to precisely locate SV breakpoints or estimate copy numbers.
      • Cannot differentiate insertions, such as LINE-1s, from other insertions without sequence-level information.
  • 10 minutes: Multicolor OGM (DLE-Cas9) – The Innovation
    • Developed to overcome limitations of motif-based mapping.
    • Core Strategy: Combines a conventional sequence-motif labeling system (using Direct Label Enzyme (DLE-1) with green fluorophores for CTTAAG motifs) with Cas9-mediated target-specific labeling of any 20-base sequences (20mers) using red fluorophores.
    • Cas9 Role: The specificity of the Cas9 enzyme is combined with traditional motif mapping in a nickase-based strategy for target-specific labeling of any 20 bases across the whole genome. The Cas9D10A nickase creates nicks at the target sites, and Taq DNA polymerase then incorporates red fluorescent nucleotides at those nicks.
    • This strategy allows targeting and fluorescently labeling any 20mer or combination of multiple 20mers, especially in repetitive regions lacking DLE motifs.
  • 5 minutes: Advantages of DLE-Cas9 OGM
    • Enhanced Precision: Not only detects SVs but also utilizes custom labels to interrogate features not accessible to motif-labeling, precisely locate breakpoints, and accurately estimate copy numbers of genomic repeats.
    • DNA Integrity: Preserves DNA integrity better than previous nickase-based (Nt.BspQI) methods, resulting in longer DNA molecules.
    • Simplicity and Efficiency: The DLE-Cas9 labeling approach is simpler and has higher efficiency of second-label incorporation compared to Nt.BspQI-Cas9.
    • Multiplexing: Enables simultaneous detection of multiple targets within a single reaction. Custom synthesizing single guide RNA (sgRNA) significantly reduces assay costs.
    • Cost-Effectiveness: Optical mapping offers a cost advantage, allowing 200x coverage for about $500, compared to $10,000-$20,000 for whole-genome sequencing with long-read technologies.
  • 10 minutes: D4Z4 Copy Number Quantification in FSHD
    • Background: D4Z4 is a 3.3 kbp repeat sequence on chromosome 4q35, a known biomarker for facioscapulohumeral muscular dystrophy (FSHD). It is difficult to detect due to high sequence homology (99.9%) with 10q26 and a region on Chr Y. Conventional optical mapping is inaccurate due to the lack of motifs within the D4Z4 array.
    • DLE-Cas9 Approach: DLE-1 labels CTTAAG motifs with green fluorophores. Two guide RNAs (4qD4Z4 and 10qD4Z4) target the D4Z4 repeat array with red fluorophores, expected to generate 1.68 kbp and 3.3 kbp repetitive label patterns, respectively.
    • Results: The method can accurately estimate D4Z4 copy numbers by dividing the total length of D4Z4 from the first to last detected red labels by the 1.68 kbp repeating unit. It achieved an accuracy of less than a single copy (a standard deviation of 0.97 repeats for the 4qA haplotype), which is critical for differentiating FSHD phenotypes.
    • Comparison to SOC: Southern blotting, the conventional diagnostic method, only offers semi-quantitative results and can produce indeterminate results in many cases.
  • 15 minutes: Telomere Length Estimation
    • Background: Telomere length is a recognized clinical biomarker for aging and aging-related diseases, and unregulated telomere length correlates with malignant cancers. Previous Cas9-mediated nickase-based labeling could only map 36 out of 46 telomeres due to fragile sites and laborious two-successive nicking reactions.
    • DLE-Cas9 Approach: DLE-1 globally tags DNA with green fluorophores. Cas9 nickase, directed by a 20-base synthetic guide RNA (TTAGGGTTAGGGTTAGGGTT), specifically labels telomere repeats with a red fluorescent dye.
    • Results: The method allowed labeling and measurement of telomeric intensities in all chromosome arms except the short arms of the five acrocentric chromosomes (due to lack of hg38 reference sequences). It could characterize five of the previously missing telomeres (16p, 17p, 19q, 22q, and 23p, i.e., Xp).
    • Comparison to SOC: Common assays like Terminal Restriction Fragment (TRF) and qPCR estimate average telomere length. Single Telomere Length Analysis (STELA) and Quantitative Fluorescence In Situ Hybridization (Q-FISH) have limitations, such as being restricted to specific chromosomes or cells in metaphase.
  • 15 minutes: LINE-1 Insertion Detection
    • Background: Long Interspersed Nuclear Elements 1 (LINE-1s) make up ~17% of the human genome and are associated with various cancers, hemophilia, and muscular dystrophy. Active LINE-1s are ~6 kbp in length and differ between individuals. Optical mapping with DLE alone cannot differentiate LINE-1s from other 6 kbp insertions.
    • DLE-Cas9 Approach: Four specific single guide RNAs are designed to target 20-base sequences on the LINE-1 reference and are labeled with red fluorescent nucleotides. This allows fluorescent tagging of specific sequences to differentiate LINE-1 insertions from others.
    • Results: The DLE-Cas9 methodology detected LINE-1 insertions across the NA12878 human genome. It identified 55 LINE-1 insertion sites: 51 matched the 52 previously reported sites, and four were new, previously unidentified locations.
    • Utility: This approach can benefit clinical investigations by providing haplotype-resolved and structurally accurate LINE-1 consensus maps for genomic analysis.
  • 10 minutes: OGM + Nanopore Sequencing for Breakpoint Resolution
    • Rationale for Combination: Optical mapping economically detects SVs but lacks sequence-level resolution and cannot pinpoint exact breakpoints. Nanopore sequencing provides targeted, long-read sequencing to resolve specific SV loci and locate exact breakpoints. This multi-platform approach maximizes haplotype-resolved SV discovery and rare SV characterization.
    • Combined Workflow:
      1. OGM for SV Discovery: Whole-genome optical mapping identifies SVs (deletions, inversions, insertions, translocations, duplications) 2 kbp or longer.
      2. Cas9-assisted Targeted Nanopore Sequencing: Based on OGM-detected SVs, unique guide RNA pairs are designed to target specific loci.
      3. DNA Preparation: Genomic DNA fragments are blocked at 5′ and 3′ ends to suppress non-specific ligation.
      4. Cas9 Cleavage: Cas9-sgRNA complex cleaves at target sites, exposing fresh DNA ends.
      5. Ligation and Amplification: dA-tailing and universal Y-adapter ligation are performed, followed by PCR amplification.
      6. Nanopore Sequencing: Purified amplicons are sequenced on a Nanopore Flongle flow cell.
    • Demonstrated Successes: The approach precisely detected base-level breakpoints for five deletions, five insertions, and an inversion. This included resolving heterozygous deletions, homozygous LINE-1 insertions (validating insertion sequence against a reference), and homozygous inversions.
    • Advantages: Universal and flexible methodology for targeting multiple loci efficiently and economically. Simplicity of analysis benefits accelerated SV discovery and typing screens for routine diagnostics. Cost-competitive (OGM <$500, sgRNA <$5, Flongle ~$70).
  • 10 minutes: OGM for Automated Karyotyping (OMKar)
    • Addressing Karyotyping Limitations: Traditional karyotyping is complex, low-resolution, and struggles with balanced rearrangements. OMKar provides an exciting alternative that bridges cytogenetics and exome sequencing in terms of resolution.
    • OMKar Functionality:
      1. Input: Processes structural variant (SV), copy number variation (CNV), and contig alignment data from OGM.
      2. Breakpoint Graph: Segments chromosomes based on CNV boundaries and breakpoints, constructing a breakpoint graph where vertices are segment boundaries and edges represent segment continuity, reference adjacencies, and rearrangements.
      3. Copy Number Balancing: Uses Integer Linear Programming (ILP) to recompute copy numbers and estimate edge multiplicities, ensuring consistency and maintaining CN balance.
      4. Karyotype Reconstruction: Identifies constrained Eulerian paths representing entire donor chromosomes. It also refines chromosomes based on known biology, such as ensuring paths contain a single centromere.
      5. Interpretation: Describes SVs using the International System for Human Cytogenomics Nomenclature (ISCN) and identifies disrupted genes.
    • OMKar Performance:
      • Reconstructed the correct karyotype in 144 out of 154 clinical samples, covering all aneuploidies (25/25) and balanced translocations (32/32).
      • Identified a plausible genetic mechanism for five cases of constitutional disorder not detected by other technologies.
      • Demonstrated high precision (88%) and recall (95%) for SV concordance and 95% Jaccard score for CN concordance in simulations.
      • Can reconstruct variations supported only by SVs or only by CNVs.
      • Efficient, with a median runtime of 8.4 seconds.
  • 10 minutes: OGM in Clinical Diagnostics (Leukemia & Follow-up)
    • Leukemia Diagnosis:
      • OGM can identify complex chromosomal SVs and gene fusions in acute leukemia samples that conventional tools often miss.
      • It detected significantly more insertions, deletions, and inversions in leukemia samples compared to normal controls.
      • OGM identified all previously known SVs in several leukemia cases and detected additional SVs, including complex sequential translocations and various gene fusions (e.g., ETV6::RUNX1, BCAT1::BAALC, BCR::ABL1, IGH::DUSP22, CBFB::MYH11, STIL::TAL1).
      • OGM + Whole Genome Sequencing (WGS): OGM provides SV localization, and short-read WGS effectively closes the gaps at breakpoints, providing precise sequence information. This combined approach is inexpensive, quick, and bypasses the need for time-consuming cell culture.
    • Follow-up in Genetic Diagnostics:
      • Current standard-of-care (SOC) methods often fail to provide breakpoint information for duplications and balanced SVs, which is critical for clinical assessment, especially in complex cases or when familial analysis is not feasible (e.g., prenatal setting, late-onset diseases, egg/sperm donation).
      • OGM is a valuable follow-up method for characterizing these ambiguous SVs.
      • Case Examples:
        • P1 (DMD/STS duplication): OGM revealed a complex rearrangement of X-chromosomal duplications that CMA could not fully characterize. This information, combined with subsequent sequencing, helped classify the variant as likely benign.
        • P2 (Chr 7 inversion): OGM clarified the exact breakpoints of a balanced paracentric inversion, ruling out gene disruption and classifying it as likely benign.
        • P3 (Chr 16 SCNN1B inversion): OGM confirmed a cryptic, homozygous inversion affecting an exon (missed by WES coverage) and enabled subsequent long-read sequencing to pinpoint the pathogenic breakpoint.
      • Utility: OGM can confirm or rule out suspected SVs (distinguishing true SVs from artifacts), assess clinical relevance, and estimate recurrence risk, leading to more actionable clinical reports.
  • 5 minutes: Current Limitations of OGM
    • Breakpoint Resolution: OGM alone does not provide sequence-level information, so exact breakpoint sequences are not resolved without complementary sequencing.
    • Specific Genomic Regions: Cannot detect variations within centromeres or the short arms of acrocentric chromosomes, where labeling motifs or reference sequences are lacking. Robertsonian translocations, which involve these regions, are currently not detected by OGM.
    • Low-Complexity and Repetitive Regions: Motifs can be unevenly spread or absent in repetitive regions. Reduced sensitivity for segmental duplications and regions prone to non-allelic recombination.
    • Mosaicism: Reduced sensitivity for mosaic chromosomal abnormalities (variations in chromosomal numbers within different cell populations).
    • Automated Calling: Automated SV calling by OGM software can miss certain breakpoints or CNVs, especially for variants smaller than recommended thresholds (e.g., 500 kbp for CNVs).
    • Terminal SVs: Lower accuracy in capturing terminal SVs in peri-telomeric regions due to challenges in mapping.
  • 5 minutes: Future Directions
    • Integration with Long-Read Sequencing: Emerging long-read sequencing technologies (like Oxford Nanopore) may be used for purposes similar to OGM, potentially addressing its shortcomings, such as resolving D4Z4 repeat contractions or DUX4 fusions in regions with low OGM coverage.
    • Algorithmic Improvements: Future research will focus on algorithms for identifying telomeric abnormalities (e.g., ring chromosomes). OMKar’s core algorithm is agnostic to specific sequencing technology, allowing adaptation to other platforms as more datasets become available.
    • AI in Cytogenetics: Artificial intelligence (AI) is already aiding both conventional cytogenetics and new technologies like OGM to improve cancer cytogenetic analysis.
    • Broader Adoption: While currently not a high-throughput first-line method for all diagnostics, OGM is gaining recognition, especially in hematologic malignancies, and its incorporation into ISCN indicates increasing clinical relevance. Its use as a follow-up method for specific, challenging cases is a practical strategy for its entry into diagnostic laboratories.
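
The karyotype reconstruction step listed for OMKar in the outline (finding Eulerian paths through a breakpoint graph once edge multiplicities are fixed) can be illustrated with a much simplified sketch. The code below is not OMKar's implementation: it treats whole segments as nodes of a directed multigraph, ignores orientation and the centromere constraint, and uses the textbook Hierholzer algorithm. The segment names and the toy tandem duplication are made up.

```python
from collections import defaultdict


def eulerian_path(edges, start):
    """Hierholzer's algorithm on a directed multigraph.

    edges: list of (from_segment, to_segment) adjacencies; an adjacency that
           is used twice appears twice in the list (its multiplicity).
    Returns the list of segments visited, or None if no single path from
    `start` uses every adjacency exactly once.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    for u in adj:
        adj[u].reverse()  # so pop() returns adjacencies in input order

    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:
            stack.append(adj[u].pop())  # follow an unused adjacency
        else:
            path.append(stack.pop())    # dead end: emit segment
    path.reverse()
    return path if len(path) == len(edges) + 1 else None


# Toy chromosome with a tandem duplication of segment B (copy number 2):
toy_edges = [("A", "B"), ("B", "B"), ("B", "C")]
print(eulerian_path(toy_edges, "A"))
```

In this toy case the duplicated segment appears twice in the reconstructed order, which is the kind of information a copy-number-balanced graph is meant to carry into the final karyotype description.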

PowerPoint Slides

Slide 1: Understanding Genomic Structural Variations (SVs)

  • Structural Variations (SVs) are regions of DNA larger than 1 kilobase (kb) that exhibit changes in copy number (deletions and duplications), orientation (inversions), or chromosomal location (insertions and translocations).
  • They play a crucial role in understanding mutations underlying genetic disorders and pathogenic conditions.
  • SVs can significantly affect gene expression and are associated with a wide range of genetic and cancer-related conditions.
  • Analysis of SVs is fundamental for establishing genetic risk factors, facilitating diagnoses, guiding treatment decisions, and informing genetic counseling.
  • Many complex SVs, especially those found in cancers like leukemia, cannot be effectively identified by conventional diagnostic tools, which makes analysis difficult and often leaves breakpoints unresolved.

Slide 2: Limitations of Conventional SV Detection Methods

  • Short-read, high-throughput sequencing technologies struggle with characterizing large and complex SVs, despite their high accuracy for smaller variations.
  • Long-read sequencing technologies offer longer read lengths for investigating large SVs but are currently limited by low throughput, high error rates, and prohibitively high costs, discouraging widespread adoption.
  • Traditional karyotyping methods, based on microscopic examination, require considerable manual expertise, offer low resolution (typically 3-10 Mbp), and may miss complex rearrangements or mosaicism.
  • Chromosomal Microarrays (CMA) provide higher resolution (a few kb) for copy number changes but are unable to detect balanced chromosomal aberrations like translocations and inversions, or low-percentage clones.
  • FISH (Fluorescence In Situ Hybridization) is limited by its need for a priori knowledge of loci and low throughput, making it unsuitable for detecting novel or unpredicted variations.

Slide 3: Introduction to Optical Genome Mapping (OGM)

  • Optical Genome Mapping (OGM) is a cutting-edge technology for analyzing ultrahigh-molecular-weight DNA molecules, offering a high-resolution, long-range, and unbiased genome-wide assessment of structural anomalies.
  • OGM utilizes fluorescent labeling of DNA at specific sequence motifs (e.g., CTTAAG hexamers via DLE-1 enzyme) to create distinct patterns across the genome.
  • These labeled DNA molecules are then linearized in silicon nanochannels and imaged on systems like the Bionano Saphyr, allowing for the digitized evaluation of their fluorescent patterns.
  • OGM can economically and quickly detect structural variations as small as 500 base pairs and capture long-range information from single molecules averaging 300 kbp, about 10 times longer than the average read length of long-read sequencing.
  • It serves as an “exciting alternative” to conventional diagnostic technologies, bridging the gap between low-resolution cytogenetics and high-resolution sequencing methods, and its demand is increasing in clinical settings.
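
To give a feel for the scale these numbers imply, the short calculation below estimates how many 300 kbp molecules are needed for a given depth of coverage. The 200x depth is the figure quoted in the course outline; the 3.1 Gbp genome size is an assumed round value for the haploid human genome, and the function name is hypothetical.

```python
def molecules_for_coverage(coverage: int, genome_bp: int, mean_molecule_bp: int) -> int:
    """Smallest number of molecules whose combined length reaches the target depth."""
    total_bp = coverage * genome_bp
    return -(-total_bp // mean_molecule_bp)  # integer ceiling division


GENOME_BP = 3_100_000_000   # assumption: approximate haploid human genome size
MEAN_MOLECULE_BP = 300_000  # average molecule length stated in the slide
COVERAGE = 200              # depth quoted in the course outline

n = molecules_for_coverage(COVERAGE, GENOME_BP, MEAN_MOLECULE_BP)
print(f"{n:,} molecules for {COVERAGE}x coverage")
```

Under these assumptions the answer is on the order of two million molecules, which is why the chips rely on large numbers of parallel nanochannels.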

Slide 4: Evolving OGM: From Basic to Multicolor Mapping

  • Traditional OGM, typically based on mapping specific 6-base to 8-base sequence motifs with enzymes like DLE-1, is efficient for detecting large insertions and SVs.
  • However, these conventional motifs are unevenly spread along the genome and often absent in repetitive regions, making it difficult to precisely locate breakpoints or estimate copy numbers.
  • To overcome this, a universal multicolor mapping strategy was developed, combining conventional sequence-motif labeling with Cas9-mediated target-specific labeling.
  • This enhanced approach allows custom labels for any 20-base sequence (20mer), detecting new features and interrogating genomic regions previously inaccessible to motif-labeling alone.
  • In this multicolor scheme, sequence motifs are typically labeled with green fluorophores, while the Cas9-mediated 20mers are labeled with red fluorophores, enabling dual-color visualization and analysis.

Slide 5: Multicolor Whole-Genome Mapping: DLE-Cas9 Methodology

  • The DLE-Cas9 method combines Direct Label Enzyme (DLE-1), which provides global haplotyping characteristics, with a programmable Cas9-mediated nick-labeling reaction.
  • This enzymatic strategy allows for the fluorescent labeling of any 20mer or combination of multiple 20mers across the entire genome, proving especially useful in repetitive regions that lack DLE motifs.
  • Custom maps can be generated to enable precise detection of breakpoints and to interrogate repetitive sequences, providing more in-depth analysis of SVs than previously possible.
  • A significant advantage is that DLE-Cas9 preserves DNA integrity better and results in longer DNA molecules compared to the older Nt.BspQI-Cas9 method, which could cause DNA breakage and had a tedious protocol.
  • The DLE-Cas9 approach is simpler and has a higher efficiency of second-label incorporation compared to the two successive nicking reactions of the Nt.BspQI-Cas9 method.
  • This universal strategy enables the simultaneous detection of multiple targets within a single tube reaction, making it highly versatile for various applications such as breakpoint detection and repetitive sequence characterization.

Slide 6: Validating DLE-Cas9: A Multifaceted Approach

  • The DLE-Cas9 methodology was validated by quantifying D4Z4 copy numbers, a known biomarker for facioscapulohumeral muscular dystrophy (FSHD).
  • It also proved highly effective in estimating telomere length, a crucial clinical biomarker for assessing disease risk factors in aging-related diseases and malignant cancers.
  • Furthermore, the methodology demonstrated its application in discovering insertions of transposable Long Interspersed Nuclear Elements 1 (LINE-1) across the whole genome.
  • These validation cases represent challenging genomic regions that are often inaccessible or difficult to characterize accurately with conventional motif-based optical mapping alone or short-read sequencing due to their repetitive nature or lack of labeling motifs.
  • The approach confirmed its ability to precisely locate breakpoints and estimate copy numbers of genomic repeats, directly addressing key limitations of prior optical mapping techniques.

Slide 7: Application: D4Z4 Copy Number Quantification for FSHD

  • The D4Z4 locus on chromosome 4q35 is composed of tandemly repeating 3.3 kbp units, and variations in its copy number are directly linked to facioscapulohumeral muscular dystrophy (FSHD).
  • Conventional diagnosis for FSHD often relies on Southern blotting, which provides semi-quantitative results and can yield indeterminate outcomes in up to 23% of cases.
  • The DLE-Cas9 method allows for direct quantification of D4Z4 repeats by fluorescently tagging specific sequences, overcoming inaccuracies associated with conventional optical mapping where D4Z4 repeats lack labeling motifs.
  • By using two guide RNAs (4qD4Z4 and 10qD4Z4), a 1.68 kbp repeating unit can be detected, increasing sensitivity and accuracy to resolve differences as small as half a repeat unit.
  • The method achieved an accuracy of less than a single copy (e.g., a standard deviation of 0.97 repeats for 4qA haplotype), which is critically important for FSHD cases where differentiating fewer than 8-10 repeats is necessary for phenotype assessment.
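
The span-divided-by-unit calculation behind this quantification can be sketched as follows. This is a minimal illustration under stated assumptions: label positions are given in kbp along each molecule, the repeating label spacing is passed in by the caller, and the result is expressed in multiples of that spacing. The function names and the toy positions are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def repeat_units_from_labels(label_positions_kbp, unit_kbp):
    """Span from first to last red label, expressed in repeating units."""
    if len(label_positions_kbp) < 2:
        raise ValueError("need at least two labels to measure a span")
    if unit_kbp <= 0:
        raise ValueError("unit length must be positive")
    span = max(label_positions_kbp) - min(label_positions_kbp)
    return span / unit_kbp


def summarize_molecules(molecules, unit_kbp):
    """Mean and sample standard deviation of the estimate across molecules."""
    counts = [repeat_units_from_labels(m, unit_kbp) for m in molecules]
    spread = stdev(counts) if len(counts) > 1 else 0.0
    return mean(counts), spread


# Toy molecules (made-up label positions in kbp), 1.68 kbp label spacing.
toy_molecules = [
    [10.0, 11.68, 13.36, 15.04],
    [52.1, 53.8, 55.5, 57.1],
    [7.3, 9.0, 10.6, 12.4],
]
avg, sd = summarize_molecules(toy_molecules, unit_kbp=1.68)
print(f"mean units: {avg:.2f}, standard deviation: {sd:.2f}")
```

Averaging over many molecules spanning the same array is what makes a sub-repeat standard deviation meaningful; a single molecule only gives one noisy measurement.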

Slide 8: Application: Telomere Length Estimation

  • Telomeres are chromosome-capping (TTAGGG)n repeats with varying lengths, serving as a recognized clinical biomarker for aging, aging-related diseases, and malignant cancers.
  • Previous optical mapping methods faced limitations such as DNA breakage at fragile sites and labor-intensive protocols, preventing comprehensive telomere characterization across all chromosomes.
  • The DLE-Cas9 assay globally tags DNA with green fluorophores at DLE-specific motifs, while Cas9 nick-labeling uses a 20-base synthetic guide RNA to specifically label telomere repeats with a red fluorescent dye.
  • This improved approach enabled the labeling and measurement of telomeric intensities in almost all chromosome arms (all but 5 acrocentric p-arms), offering significant advancement over prior methods.
  • Unlike traditional assays like Terminal Restriction Fragment (TRF) and qPCR that only estimate average telomere length, OGM-based methods can provide insights into single telomere length analysis, addressing a critical gap in characterization.

Slide 9: Application: Detecting LINE-1 Insertions

  • Long Interspersed Nuclear Elements 1 (LINE-1) constitute approximately 17% of the human genome, and their insertions are implicated in various human diseases, including cancers, hemophilia, and muscular dystrophy.
  • While conventional optical mapping using DLE is efficient at detecting 6 kbp insertions, it cannot differentiate LINE-1 insertions from other 6 kbp insertions because it does not provide base-by-base information.
  • The DLE-Cas9 method utilizes multiple single guide RNAs (sgRNAs) specifically designed to target distinct 20-base sequences on the LINE-1 reference, which are then labeled with red fluorescent nucleotides.
  • This allows for the fluorescent tagging of specific LINE-1 sequences, enabling their differentiation from other insertions, and providing sequence-level insights into these mobile elements.
  • The methodology successfully identified 55 LINE-1 insertion sites in the NA12878 human genome, confirming most previously reported insertions and discovering four additional, previously unidentified locations.
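
Comparing detected insertion sites against a previously reported list, as in the last bullet, amounts to matching positions within some tolerance. The sketch below shows one simple way to do this; the greedy nearest-site matching, the 10 kbp tolerance, and the coordinates are all assumptions for illustration and are not taken from the study.

```python
def match_sites(detected, reported, tolerance_bp=10_000):
    """Greedy one-to-one matching of (chrom, pos) sites on the same chromosome.

    Returns (matched_pairs, novel_detected, missed_reported).
    """
    unused = set(reported)
    matched, novel = [], []
    for chrom, pos in sorted(detected):
        candidates = [
            r for r in unused
            if r[0] == chrom and abs(r[1] - pos) <= tolerance_bp
        ]
        if candidates:
            # nearest reported site wins; position breaks ties deterministically
            best = min(candidates, key=lambda r: (abs(r[1] - pos), r[1]))
            unused.remove(best)
            matched.append(((chrom, pos), best))
        else:
            novel.append((chrom, pos))
    return matched, novel, sorted(unused)


# Toy coordinates (made up, not NA12878 calls).
detected = [("chr1", 1_000_500), ("chr2", 5_000_000), ("chr3", 42_000)]
reported = [("chr1", 1_000_000), ("chr3", 40_000), ("chr4", 7_000_000)]

matched, novel, missed = match_sites(detected, reported)
print(len(matched), "matched;", novel, "novel;", missed, "missed")
```

The tolerance should reflect the positional uncertainty of optical labels, which is far coarser than base-level coordinates from sequencing.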

Slide 10: Bridging OGM with Nanopore Sequencing for Breakpoint Resolution

  • Optical Genome Mapping (OGM) is highly effective for economical and rapid SV discovery across the whole genome, but it often lacks the sequence-level information needed to precisely resolve breakpoints or exact sequences.
  • The precise location of SV breakpoints is critically important for diagnostics, target association studies, and understanding the functional impacts of mutations.
  • Next-generation sequencing (NGS), including long-read sequencing, has advantages for large SVs but can be limited by low throughput, high error rates, and high costs, making comprehensive SV analysis resource-intensive.
  • A powerful hybrid strategy involves combining whole-genome optical mapping for initial SV discovery with Cas9-assisted targeted nanopore sequencing to resolve specific SV loci.
  • This integrated approach ensures that post-discovery, SVs can be extensively and accurately characterized for breakpoint detection, genotyping, and SNP detection in a more economical manner.

Slide 11: Cas9-Assisted Targeted Nanopore Sequencing Workflow

  • The workflow begins with high-molecular-weight genomic DNA preparation, followed by critical blocking steps to suppress non-target fragments.
  • First, 3′ ends at internal nick and break sites are blocked by incorporating dideoxynucleotides to prevent non-specific ligation.
  • Next, 5′ ends at internal nick and break sites are dephosphorylated to further discourage non-specific dA-tailing and adapter ligation, optimizing target enrichment.
  • Cas9-sgRNA complexes are then used to cleave at specific target sites, creating fresh DNA ends amenable to universal adapter ligation.
  • Following dA-tailing and ligation of universal Y-adapters, the target fragments are amplified via PCR and subsequently purified.
  • Finally, the purified amplicons are sequenced on a nanopore Flongle flow cell, generating base-level data for precise breakpoint detection at the target sites, with a median coverage of 17x.

Slide 12: Validation: Resolving Deletions with OGM-Nanopore

  • Optical mapping effectively detects deletions, such as a 13.2 kbp heterozygous deletion on chromosome 12 in the NA12878 sample.
  • However, OGM alone cannot pinpoint exact breakpoints or differentiate between heterozygous haplotypes at base-level resolution, often showing missing motifs in the deleted region.
  • For validation, specific gRNA pairs were designed to target both the undeleted haplotype (expected to generate a fragment of reference length) and the deletion-containing haplotype (expected to generate a shorter fragment).
  • Nanopore sequencing of the targeted fragments successfully confirmed the presence of both haplotypes, with one set of reads matching the undeleted reference and another showing discontinuity.
  • The aligned reads with a gap in between precisely identified the breakpoint at base-level resolution (e.g., at 45,509,371 bp), validating the approach.
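
The idea of reading a deletion breakpoint from "aligned reads with a gap in between" can be sketched as follows. This is a minimal illustration, assuming each read has already been aligned and reduced to a list of reference intervals; the Segment type, the function name, the min_gap threshold, and the toy coordinates are all hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    ref_start: int  # 1-based, inclusive reference coordinate
    ref_end: int    # 1-based, inclusive reference coordinate

def deletion_from_split_read(
    segments: list[Segment], min_gap: int = 50
) -> Optional[tuple[int, int, int]]:
    """Return (left_breakpoint, right_breakpoint, deleted_bases) for the
    largest reference gap between consecutive aligned segments of one read,
    or None if no gap reaches min_gap."""
    ordered = sorted(segments)
    best = None
    for left, right in zip(ordered, ordered[1:]):
        gap = right.ref_start - left.ref_end - 1  # unaligned reference bases
        if gap >= min_gap and (best is None or gap > best[2]):
            best = (left.ref_end, right.ref_start, gap)
    return best

# Toy read from the deletion-carrying haplotype (coordinates are made up).
deleted_read = [Segment(1_000, 5_000), Segment(18_200, 22_000)]
# Toy read from the undeleted haplotype: one continuous alignment.
reference_read = [Segment(1_000, 22_000)]

print(deletion_from_split_read(deleted_read))    # (5000, 18200, 13199)
print(deletion_from_split_read(reference_read))  # None
```

Seeing both kinds of reads in the same sample is what supports a heterozygous call: one set of reads is continuous across the region and the other skips it.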

Slide 13: Validation: Resolving Insertions with OGM-Nanopore

  • Optical mapping can detect insertions and may suggest their type (e.g., a 12.9 kbp homozygous insertion on chromosome 12 suspected as a LINE-1 due to extra-label patterns).
  • However, OGM lacks the base-by-base information to definitively identify the inserted sequence or its precise breakpoints, as mapping does not provide sequence-level detail.
  • To resolve this suspected LINE-1 insertion, two gRNAs were designed: one outside the insertion on hg38 and another within a known LINE-1 reference sequence.
  • Sequencing of the targeted region confirmed the presence of the insertion, with one segment aligning to hg38 and the other aligning perfectly with the putative LINE-1 reference sequence.
  • The approach accurately identified the breakpoint of the insertion (e.g., at 33,864,403 bp), demonstrating its capability to characterize inserted sequences and their exact locations.

Slide 14: Validation: Resolving Inversions with OGM-Nanopore

  • Optical mapping is capable of detecting inversions, such as a ~90 kbp homozygous inversion on chromosome 12, by identifying regions where labels map in an inverted (3′ to 5′) orientation.
  • However, precisely defining inversion breakpoints with optical mapping alone can be challenging, particularly in regions densely populated with long, highly similar segmental duplications.
  • To resolve this, a pair of gRNAs was designed: one targeting the inversion-flanking region and another targeting a site inside the inversion.
  • This design aimed to generate a specific fragment where the sequenced portion would reflect both the inverted region and the flanking region.
  • The nanopore sequencing successfully identified both breakpoints of the inversion (e.g., at 17,768,358 bp and 17,861,570 bp), confirming the complex structural rearrangement at base-level resolution.
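
The label-orientation signal that flags an inversion can be illustrated with a short sketch. It assumes a molecule's labels have already been matched to reference coordinates and are listed in the order they appear along the molecule; an inverted segment then shows up as a run of decreasing coordinates. The function name and the toy values are hypothetical, and real optical-map aligners score label spacing as well as order.

```python
from typing import Optional

def inverted_run(ref_positions: list[int]) -> Optional[tuple[int, int]]:
    """Return (first_index, last_index) of the longest run of consecutive
    labels whose reference coordinates decrease, or None if the labels are
    collinear with the reference."""
    best = None
    start = None
    n = len(ref_positions)
    for i in range(1, n + 1):
        decreasing = i < n and ref_positions[i] < ref_positions[i - 1]
        if decreasing:
            if start is None:
                start = i - 1          # run begins at the previous label
        elif start is not None:
            run = (start, i - 1)       # run ended at the previous label
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    return best

# Toy molecule: labels at 40, 50, 60 appear in reverse order.
labels = [10, 20, 30, 60, 50, 40, 70, 80]
run = inverted_run(labels)
print(run)                                # (3, 5)
print(labels[run[1]], labels[run[0]])     # 40 60
```

The run only brackets the inversion between neighboring labels, which is why the exact breakpoints in the slide above come from the targeted sequencing reads rather than from the map itself.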

Slide 15: Strategic Advantages of the OGM-Nanopore Hybrid Approach

  • This combined methodology offers a more economical and efficient way to characterize multiple SVs compared to relying solely on extensive whole-genome sequencing for breakpoint resolution.
  • Optical mapping provides the broad discovery and localization of SVs (often >1 kbp), acting as a cost-effective initial screen for long-range rearrangements.
  • Cas9-assisted targeted nanopore sequencing then precisely resolves breakpoints and provides sequence-level information only for the SVs of biological interest, reducing computational and economic burdens.
  • The sample preparation protocol includes blocking steps (5′ dephosphorylation and 3′ dideoxynucleotide incorporation) that efficiently suppress the sequencing of non-target fragments, optimizing data yield at specific SV loci.
  • The approach is flexible and universal, allowing for the targeting of multiple SVs in a single sample or a single SV across multiple samples (up to 200 sgRNAs in a single reaction), enabling multiplex analysis.
  • This strategy helps circumvent the technological limitations of any single genome analysis technology, maximizing the accuracy and comprehensiveness of SV characterization for routine diagnostics and association studies.

Slide 16: Introducing OMKar: Automated Karyotyping with OGM

  • OMKar (Optical Map based Automated Karyotyping) is a novel computational method that leverages Optical Genome Mapping (OGM) data to generate a virtual karyotype, bridging the gap between cytogenetics and sequencing.
  • It addresses the challenge that traditional karyotyping methods, based on microscopic examination, are complex, require high expertise, and offer only Mb-scale resolution.
  • OMKar takes structural variant (SV) and copy number (CN) variant calls from the Bionano Solve pipeline as inputs and processes them into a compact breakpoint graph.
  • The method then recomputes copy numbers using Integer Linear Programming to maintain CN balance and identifies constrained Eulerian paths that represent entire donor chromosomes.
  • OMKar aims to automate karyotype inference, reducing manual workload and enhancing the speed and scalability of genomic analysis for constitutional disorders.

Slide 17: The Need for Automated Molecular Karyotyping

  • The whole-genome karyotype describes the sequence of large chromosomal segments that constitute an individual’s genotype, crucial for understanding genetic risk factors and constitutional disorders.
  • Current standard-of-care methods for genetic diagnosis (like CMA or whole exome sequencing) often miss copy number neutral rearrangements such as balanced translocations.
  • Balanced rearrangements are found in 0.2% of individuals. Carriers may not present with a phenotype, but the rearrangement can lead to fertility issues or unbalanced copy numbers in offspring.
  • Traditional SV calling pipelines typically do not capture the complete larger karyotype, making it difficult to assign clinical significance to translocation events or determine the exact locations of amplified segments.
  • OMKar provides an unambiguous description of the karyotype using a custom file format (Molecular karyotype) and presents chromosomal clusters using ISCN language, which enhances interpretability compared to traditional SV calls.

Slide 18: OMKar Methodology: Core Steps

  • Pre-processing and Filtering: OMKar rigorously filters SV and CNV calls, removing low-confidence variants and merging adjacent breakpoints to create a minimal set of segments for each chromosome.
  • Breakpoint Graph Construction: A directed multigraph is built where vertices represent segment boundaries and edges denote segment continuity, reference adjacencies, or breakpoint rearrangements, including their orientations.
  • Smoothing Edge Multiplicities (Integer Linear Programming – ILP): An ILP formulation constrains the copy number of each genomic segment, ensuring consistency of copy number constraints and transforming the graph into an Eulerian structure for tour computation.
  • Computing Eulerian Tours: A Breadth-First Search (BFS) algorithm identifies connected chromosome clusters and computes Eulerian tours originating from telomeric vertices, ensuring each edge is traversed exactly once.
  • Chromosomal Segregation and Identification: The Eulerian paths are segregated to represent individual chromosomes, with heuristic refinements (e.g., ensuring a single centromere per path) and standardization of orientation.
  • Event Interpretation and Reporting: OMKar interprets structural variations using ISCN notation, aligns reconstructed chromosomes with wild-type counterparts, classifies blocks (concordant, insertion, deletion), identifies disrupted genes, and compiles a comprehensive HTML report.
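
To make the Eulerian-tour step concrete, here is a minimal sketch of Hierholzer's algorithm on a small directed multigraph. It is a generic textbook routine, not OMKar's constrained tour computation, and it omits the ILP smoothing, centromere heuristics, and orientation handling listed above. The vertex names (t0, j1, j2, t1) and the toy graph, a three-segment chromosome whose middle segment is tandemly duplicated, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

def eulerian_path(edges: list[tuple[str, str]], start: str) -> Optional[list[str]]:
    """Hierholzer's algorithm on a directed multigraph.

    edges: (u, v) pairs; a repeated pair is a parallel edge (copy number > 1).
    Returns the vertex sequence of a path from start that uses every edge
    exactly once, or None if no such path exists.
    """
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())   # follow an unused outgoing edge
        else:
            path.append(stack.pop())     # dead end: emit vertex
    path.reverse()
    # Verify that the walk uses exactly the input edges, each once.
    if Counter(zip(path, path[1:])) != Counter(edges):
        return None
    return path

# t0 and t1 are telomeric ends; j1 and j2 are segment boundaries.
toy_edges = [
    ("t0", "j1"),                # segment A
    ("j1", "j2"), ("j1", "j2"),  # segment B, copy number 2
    ("j2", "j1"),                # breakpoint edge created by the tandem duplication
    ("j2", "t1"),                # segment C
]
print(eulerian_path(toy_edges, "t0"))
# ['t0', 'j1', 'j2', 'j1', 'j2', 't1']
```

The tour reads as A, B, B, C, that is, one reconstructed chromosome carrying the duplication. A path of this kind only exists when in- and out-degrees balance at internal vertices, which is the role the copy-number smoothing step plays before tours are computed.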

Slide 19: OMKar Performance: High Accuracy in Simulations

  • In tests using 38 whole-genome simulations of constitutional disorders, OMKar demonstrated robust performance in reconstructing karyotypes.
  • It achieved 88% precision and 95% recall on SV concordance and a 95% Jaccard score on CN concordance, indicating high-quality karyotype reconstruction.
  • OMKar showed a true-negative rate of 98.8% for non-event clusters, demonstrating its ability to reliably identify samples without structural variations.
  • The method successfully reconstructed 13 out of 14 simulated aneuploidies, demonstrating high accuracy in detecting gains or losses of entire chromosomes.
  • Performance was higher for low-complexity clusters (Jaccard 89.9%) compared to high-complexity clusters (Jaccard 80.8%), where closely spaced SV edges created challenges.
  • Accuracy varied by SV type; for example, balanced reciprocal translocations were reconstructed with 100% accuracy, while Tandem Duplications and Duplication Inversions had slightly lower rates.
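
For readers less familiar with the metrics quoted above, the sketch below shows their standard set-based definitions. How the study matches SV calls to simulated events and how it computes the Jaccard score on copy-number segments is not specified here, so the set representation, the function names, and the toy event labels are assumptions for illustration only.

```python
def precision_recall(called: set, truth: set) -> tuple[float, float]:
    """Precision = TP / calls made; recall = TP / true events."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy example: four calls, four true events, three shared.
called = {"del1", "dup2", "inv3", "tra4"}
truth = {"del1", "dup2", "inv3", "ins5"}

print(precision_recall(called, truth))  # (0.75, 0.75)
print(jaccard(called, truth))           # 0.6
```

Precision penalizes spurious calls, recall penalizes missed events, and the Jaccard score penalizes both at once, which is why it is always at most the smaller of the other two.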

Slide 20: OMKar Performance: Clinical Validation

  • OMKar was applied to 154 clinical samples (50 prenatal, 41 postnatal, 63 parental) from ten different sites, with prior diagnoses from combinations of traditional cytogenetic methods.
  • It successfully reconstructed the correct karyotype in 144 out of 154 samples, showcasing its robustness in real-world scenarios.
  • The method achieved 100% concordance for all 25 aneuploidies and 32 balanced reciprocal translocations, and high rates for deletions (97.4%) and amplifications (84.2%).
  • Importantly, OMKar improved upon individual conventional technologies (karyotyping, CMA, FISH) when considered in isolation, by detecting a higher percentage of SVs.
  • OMKar also identified additional SVs (e.g., 436 deletions, 506 amplifications, 67 inversions) not previously caught by other techniques, averaging 2.8 deletions, 3.3 amplifications, and 0.44 inversions as novel events per sample.
  • It provided plausible genetic mechanisms for five previously undiagnosed postnatal phenotypes that were not detected or fully explained by other technologies.
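
The figures on this slide can be cross-checked with simple arithmetic. The short sketch below uses only the counts stated above; the variable names are illustrative.

```python
cohort = {"prenatal": 50, "postnatal": 41, "parental": 63}
samples = sum(cohort.values())           # total clinical samples
correct = 144                            # samples with correct karyotype
novel = {"deletions": 436, "amplifications": 506, "inversions": 67}

success_rate = f"{correct / samples:.1%}"
per_sample = {kind: round(n / samples, 2) for kind, n in novel.items()}

print(samples)       # 154
print(success_rate)  # 93.5%
print(per_sample)    # {'deletions': 2.83, 'amplifications': 3.29, 'inversions': 0.44}
```

The per-sample values agree with the rounded averages quoted in the slide (2.8, 3.3, and 0.44).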

Slide 21: Integrated Approach: OGM + WGS in Leukemia

  • Optical Genome Mapping (OGM) and Whole Genome Sequencing (WGS) are employed together to identify complex chromosomal structural variations in acute leukemia, a field where SVs play a pivotal role in pathogenesis.
  • This combined approach has shown that leukemia samples have significantly more SVs (insertions, deletions, inversions, and translocations) compared to normal control samples, with translocations observed only in leukemia cases.
  • OGM can detect a broader spectrum of SVs as a single test than multiple conventional tools combined, and it identifies additional SVs beyond those already known from traditional methods.
  • While OGM efficiently detects SVs and provides their genomic locations, WGS is used to precisely define the exact breakpoint sequences, effectively “closing the gaps” left by OGM alignments.
  • This integration allows for the identification of novel gene fusion events (e.g., BCAT1::BAALC, IGH::DUSP22) and complex sequential translocations, providing invaluable insights for cancer research that conventional methods often miss.

Slide 22: OGM’s Utility as a Follow-up Method in Genetic Diagnostics

  • Current standard-of-care (SOC) methods often fail to provide crucial breakpoint information for duplications and balanced structural variants, which is necessary for their complete clinical assessment.
  • Optical Genome Mapping (OGM) serves as a valuable follow-up method to resolve such ambiguous cases, especially when the carrier’s phenotype is difficult to assess or familial analyses are not feasible (e.g., prenatal setting, egg/sperm donations, unavailable relatives).
  • In a retrospective study of seven challenging cases, OGM was crucial for determining the clinical relevance of detected SVs, solving six cases by OGM alone and enabling further sequencing for the seventh.
  • OGM helps distinguish true SVs from possible detection artifacts suspected by SOC methods (e.g., in cases P5 and P6, where OGM found the regions of the suspected inversions to be unremarkable), preventing unnecessary patient distress.
  • It can precisely characterize the location and orientation of duplicated material or breakpoints of copy-number neutral aberrations, which is essential for interrogating gene disruption, a common reason for OGM follow-up.
  • This approach uses OGM to complement existing SOC methods, with the goal of providing clinically actionable reports; it has been applied successfully in individual cases.

Slide 23: Case Studies: OGM Resolving Ambiguities (Dremsek et al.)

  • Case P1 (X-chromosomal duplications): CMA detected two duplications potentially disrupting STS and DMD genes. OGM revealed a complex rearrangement and identified the insertion site outside gene-containing regions, leading to a “likely benign” classification and rapid family counseling for a healthy pregnancy.
  • Case P2 (Chromosome 7 inversion): Karyotyping suspected a paracentric inversion potentially associated with hematological malignancies. OGM mapped breakpoints to large homologous segmental duplications and, after maternal OGM, confirmed the inversion and ruled out gene disruption, classifying it as “likely benign”.
  • Case P3 (Cryptic SCNN1B inversion): WES showed no coverage of a critical SCNN1B exon, raising suspicion of a cryptic SV. OGM definitively identified a homozygous paracentric inversion with a breakpoint within SCNN1B exon 13, confirming its pathogenic, disruptive effect and enabling targeted PCR and long-read sequencing.
  • These cases illustrate OGM’s capability to provide clinically actionable reports by clarifying the precise nature of SVs, even in complex or challenging genomic regions where traditional methods are insufficient, ultimately guiding crucial patient decisions.

Slide 24: Benefits and Limitations of OGM in Clinical Practice

  • Benefits:
    • Higher resolution for SVs than conventional tools like karyotyping or FISH, leading to improved diagnostic accuracy, especially for large SVs, balanced and unbalanced rearrangements, and inversions.
    • Enables identification of novel or rare SVs, advancing understanding of disease mechanisms and biomarker discovery.
    • Generates comprehensive SV information from a single test, streamlining diagnostic workflows, reducing turnaround times, and potentially lowering overall costs.
    • Bypasses the need for cell culture required by karyotyping and FISH, reducing turnaround times from weeks to days.
    • With automated tools such as OMKar, karyotypes can be reconstructed even from partially missing SV or CNV calls, with the missing information inferred to complete the genomic picture.
  • Limitations:
    • Reduced sensitivity for mosaic chromosomal abnormalities, events in regions of low complexity, and segmental duplications that can lead to non-allelic recombination (e.g., Robertsonian translocations).
    • Cannot detect variations within centromeres or the short arms of acrocentric chromosomes.
    • Automated SV callers may miss some breakpoints or CNVs, necessitating manual visual assessment by experienced evaluators, especially for variants smaller than recommended detection thresholds.
    • Resolution can be limited in highly repetitive regions (e.g., D4Z4 repeats, DUX4 pseudogenes), potentially requiring follow-up with long-read sequencing.

Slide 25: The Future of SV Analysis: Integration and Automation

  • The rapid progress in optical genome mapping technologies, combined with sophisticated computational tools like OMKar and targeted sequencing methods, is transforming the landscape of structural variation analysis.
  • Moving forward, these integrated and multi-platform approaches promise to offer more accessible, comprehensive, and accurate SV characterization for routine diagnostics and large-scale population studies.
  • Automated karyotyping through OMKar not only reduces manual workload but also enhances the speed and scalability of genomic analysis, providing rapid insights into complex rearrangements.
  • The ability to precisely define breakpoints using complementary technologies (like WGS after OGM, or Cas9-assisted Nanopore sequencing) addresses critical diagnostic needs that single platforms cannot fulfill.
  • Future research will continue to focus on improving resolution in challenging genomic regions (e.g., telomeres, centromeres, highly repetitive sequences) and adapting these methodologies to emerging long-read sequencing platforms.
  • Ultimately, the goal is to provide haplotype-resolved and structurally accurate genomic maps for clinical investigations, advancing precision medicine and patient care in fields from constitutional genetics to oncology.