[MA 2023 28] Data-driven development of a genetic disease prioritization platform for CRISPR based gene correction therapy interventions.

Division of Metabolic Diseases, Wilhelmina Children’s Hospital, University Medical Center Utrecht, Uppsalalaan 8, 3584 CT, Utrecht
Proposed by: Paul Schürmann, MSc (PhD candidate) [p.j.l.schurmann-3@umcutrecht.nl]

Introduction


Genetic disorders affect millions of patients worldwide and often result in poor quality of life. Current treatments primarily target symptoms, leaving the causative errors in the DNA unaddressed. Recent advances in genome editing technologies, particularly CRISPR-based prime and base editing, offer a paradigm shift by directly correcting these pathogenic mutations at their source. Because of these advances, the FDA expects a large number of cell and gene therapy submissions in the next five years.

If the Netherlands aims to take a leading position in offering gene correction therapies, it is crucial to identify all targetable genetic diseases and prioritize those with the highest chance of approval by national and international health agencies. Because over 7,800 genetic diseases have been cataloged (Pavan et al., 2017), a comprehensive molecular investigation of each disease is not feasible. Our solution? Develop a data-driven screening tool to prioritize genetic diseases, genes, and pathogenic variants.

Developing such a tool is challenging. So far, we have structured our gene correction therapy platform around three core pillars: technical feasibility, medical necessity, and ethical considerations. While certain genetic interventions may be technically feasible within specific body regions, they might not align with medical demands or priorities. Furthermore, ethical dilemmas, such as undertaking CRISPR interventions in minors without their consent, add complexity to these decisions. Our tool seeks to integrate these diverse considerations. Consequently, we are looking for a student with broad coding skills (Python, JavaScript, machine learning) who can also incorporate stakeholder feedback and dynamically visualize the results.



Description of the SRP Project/Problem


Our research uses public databases such as OMIM (Hamosh et al., 2021) and Orphadata (Pavan et al., 2017; Orphadata, 2022) to address a diverse set of technical, medical, and ethical questions. Our goal is to prioritize genetic diseases suitable for gene correction therapy. By harnessing these databases and incorporating feedback from researchers, doctors, patients, and ethicists, we aim to differentiate key parameters from secondary ones (the parameters are defined in the research questions). From the resulting insights, we intend to generate a prioritized list of all known genetic diseases, associated genes, and pathogenic variants for preclinical gene correction therapy studies.
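As an illustration of how such a prioritized list could be produced, the sketch below combines one score per pillar into a weighted average and sorts diseases by it. This is only one possible approach: the class `DiseaseScores`, the functions `priority` and `rank`, the 0 to 1 score range, and the disease names are all hypothetical placeholders, not part of the existing platform.

```python
from dataclasses import dataclass


@dataclass
class DiseaseScores:
    """Hypothetical container: one score between 0 and 1 per pillar."""
    name: str
    technical: float
    medical: float
    ethical: float


def priority(disease, weights):
    """Weighted average of the three pillar scores (assumed scoring rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = (
        weights["technical"] * disease.technical
        + weights["medical"] * disease.medical
        + weights["ethical"] * disease.ethical
    )
    return weighted_sum / total_weight


def rank(diseases, weights):
    """Return diseases sorted from highest to lowest priority."""
    return sorted(diseases, key=lambda d: priority(d, weights), reverse=True)


# Placeholder data, not real scores.
diseases = [
    DiseaseScores("disease_A", technical=0.9, medical=0.2, ethical=0.5),
    DiseaseScores("disease_B", technical=0.6, medical=0.8, ethical=0.7),
]
weights = {"technical": 1.0, "medical": 2.0, "ethical": 1.0}
ranked_names = [d.name for d in rank(diseases, weights)]
```

Keeping the weights separate from the scores means that stakeholder input can change the ranking without recomputing the underlying database-derived scores.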

Required skills: Python for data analysis (mandatory), JavaScript for live visualization (a plus), machine learning (a plus)



Research questions


1. Medical RQs

Main Question 1: Which genetic diseases are medically relevant to treat with gene editing, and under what conditions?


Sub-question 1.1: Classification of the genetic diseases and their occurrence in Europe and the Netherlands: Could we start clinical trials locally?

Disease classification (e.g., metabolic) (Orphadata, 2022)

Birth and point prevalence geographic data (Orphadata, 2022)

Medical expertise centers (Orphadata, 2022)


Sub-question 1.2: What is the certainty of severe damage occurring, and can this be prevented with already developed medication?

Age at which the disease starts and whether we can diagnose it pre-symptomatically:

Average age of onset (Orphadata, 2022)

Type of inheritance (Orphadata, 2022)

Is the disease included in (Dutch) newborn screening? (Orphadata, 2022)

Will the disease cause severe damage?

Disease functional consequences that impact quality of life organized into frequency, temporality, degree of severity (Orphadata, 2022)

Is there a need for new treatment options?

Current treatment options & ongoing clinical trials (Orphadata, 2022)

Organ transplantation & enzyme replacement therapy? (Orphadata, 2022)


Sub-question 1.3: Should treatment occur before or after the onset of the disease?

Are disease phenotypes reversible? And what is the window of opportunity per disease?

Deep learning of human phenotypes? (Orphadata, 2022)

Link phenotype to genes/diseases (Orphadata, 2022) and link phenotypes to gene expression per tissue per age (GTEx Portal, 2023)

Associate phenotype with permanent damage (score with ‘likely reversible’ or ‘likely irreversible’).

Create an input tool that allows doctors and researchers to provide labels that can be used to train the machine learning model.
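Before any machine learning model is trained, the expert input for a phenotype could be summarized with a simple majority vote. The sketch below shows that simpler stand-in, not the deep learning approach itself; the function name `consensus_label`, the vote strings, and the "undetermined" fallback are assumptions made for illustration.

```python
from collections import Counter


def consensus_label(votes):
    """Summarize expert votes for one phenotype by simple majority.

    `votes` is a list of strings, each "reversible" or "irreversible"
    (assumed vocabulary). Ties and empty input give "undetermined".
    """
    counts = Counter(votes)
    reversible = counts["reversible"]
    irreversible = counts["irreversible"]
    if reversible == irreversible:
        return "undetermined"
    if reversible > irreversible:
        return "likely reversible"
    return "likely irreversible"


# Placeholder phenotypes and votes, not real expert input.
expert_votes = {
    "phenotype_1": ["reversible", "reversible", "irreversible"],
    "phenotype_2": ["irreversible", "irreversible"],
    "phenotype_3": [],
}
labels = {name: consensus_label(v) for name, v in expert_votes.items()}
```

Labels produced this way could later serve as training targets, with "undetermined" phenotypes sent back to the experts for further review.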


2. Technical RQs

Main Question 2: Under what conditions are genetic diseases technically relevant to treat with gene editing?


Sub-question 2.1: What are the pathogenic founder mutations of diseases, and how likely are they to cause the disease?

Gene-disease relationship (Orphadata, 2022)

Causative, modifier, susceptibility gene (Orphadata, 2022)

Pathogenic variants of diseases

Allele frequency (Kleywegt et al., 2021)

Severity status (Pathogenic vs benign) (Kleywegt et al., 2021)
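A minimal sketch of how allele frequency and severity status might be combined: keep only variants labelled pathogenic, then list the most frequent ones first, since a common pathogenic variant would be relevant to more patients. The record layout (keys `id`, `significance`, `allele_frequency`), the accepted labels, and the variant identifiers are assumptions, not the schema of any specific database.

```python
# Assumed set of labels that count as pathogenic for this sketch.
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic"}


def rank_pathogenic_variants(variants):
    """Keep pathogenic variants and sort them by allele frequency, highest first.

    Variants with an unknown frequency (None) are treated as frequency 0.
    """
    pathogenic = [
        v for v in variants
        if v["significance"].strip().lower() in PATHOGENIC_LABELS
    ]
    return sorted(
        pathogenic,
        key=lambda v: v["allele_frequency"] or 0.0,
        reverse=True,
    )


# Placeholder records, not real variants.
variants = [
    {"id": "variant_1", "significance": "Benign", "allele_frequency": 0.12},
    {"id": "variant_2", "significance": "Pathogenic", "allele_frequency": 0.0004},
    {"id": "variant_3", "significance": "Likely pathogenic", "allele_frequency": 0.002},
    {"id": "variant_4", "significance": "Pathogenic", "allele_frequency": None},
]
ranked_ids = [v["id"] for v in rank_pathogenic_variants(variants)]
```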


Sub-question 2.2: Which pathogenic variants can we target according to prediction algorithms?

Link published gene editing tools to published delivery tools.

Use chromosomal location to extract gene sequence (Orphadata, 2022)

Predict on- and off-target efficiency (Mathis et al., 2023; Kim et al., 2021)
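Before running efficiency prediction models, a first-pass triage could assign each variant to an editor class based on the nucleotide change needed to correct it. The sketch below relies on the general rule that adenine base editors convert A•T to G•C and cytosine base editors convert C•G to T•A; everything else is treated as a prime editing candidate. The function name `suggest_editor` is hypothetical, and the sketch deliberately ignores PAM availability, editing windows, and bystander edits, which a real assessment would need to consider.

```python
VALID_BASES = {"A", "C", "G", "T"}

# Corrections (pathogenic base -> healthy base) reachable by each editor class,
# listed for both strands.
ABE_CORRECTIONS = {("A", "G"), ("T", "C")}
CBE_CORRECTIONS = {("C", "T"), ("G", "A")}


def suggest_editor(ref, alt):
    """Suggest an editor class for correcting a pathogenic variant.

    `ref` is the healthy allele and `alt` the pathogenic allele, so the
    required correction is alt -> ref.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to correct")
    is_snv = (
        len(ref) == 1 and len(alt) == 1
        and ref in VALID_BASES and alt in VALID_BASES
    )
    if not is_snv:
        # Insertions, deletions, and multi-base changes.
        return "prime editing candidate"
    correction = (alt, ref)
    if correction in ABE_CORRECTIONS:
        return "adenine base editor"
    if correction in CBE_CORRECTIONS:
        return "cytosine base editor"
    # Transversions cannot be made by these two base editor classes.
    return "prime editing candidate"
```

For example, a pathogenic G-to-A change calls for an A-to-G correction, so `suggest_editor("G", "A")` returns the adenine base editor class.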


Sub-question 2.3: What is a good measure to determine damaged organs per disease?

Correlate age of gene expression per organ/cell type to known damaged organs (GTEx Portal, 2023)

Optional alternative: Involved organs (OMIM) and classifications (Orphadata, 2022)


Sub-question 2.4: What percentage of cells and cell types per organ can we target using the current delivery tools?

Literature study of delivery tools. (Paunovska et al., 2022)


Sub-question 2.5: What percentage of cells per organ per disease should be corrected to restore phenotype?

Approach is not clear yet. Possibly necessary to exclude.


3. Platform

Main Question 3: How can we align the input of stakeholders with technical and medical data to determine suitable genetic diseases for gene editing?


Allow visualization of the dataset via a JavaScript/HTML tool.

Allow scoring input from stakeholders by survey data.

Process and save the survey data.

Update the visualization based on the updated data.

Creating the survey won’t be part of your research questions.
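One way the processed survey data could feed back into the platform is by turning stakeholder importance ratings into normalized weights for the three pillars. The sketch below assumes each response rates every pillar on a numeric scale; the function name `weights_from_survey`, the response layout, and the example ratings are hypothetical.

```python
from statistics import mean

PILLARS = ("technical", "medical", "ethical")


def weights_from_survey(responses):
    """Average the ratings per pillar and normalize them to sum to 1."""
    if not responses:
        raise ValueError("at least one survey response is required")
    averages = {p: mean(r[p] for r in responses) for p in PILLARS}
    total = sum(averages.values())
    if total <= 0:
        raise ValueError("ratings must sum to a positive number")
    return {p: averages[p] / total for p in PILLARS}


# Placeholder ratings on an assumed 1 to 5 scale.
responses = [
    {"technical": 4, "medical": 5, "ethical": 3},
    {"technical": 2, "medical": 5, "ethical": 5},
]
survey_weights = weights_from_survey(responses)
```

Recomputing the weights whenever new responses are saved, and passing them to the ranking step, would give the live-updating behaviour described above.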


Expected results

You will take part in a dynamic, ambitious gene-editing group where you will investigate technical and medical questions. Your questions can be answered using publicly available databases; however, the bigger goal of this project requires consultation of stakeholders (doctors, researchers, ethicists, patients, etc.). Part of your goal will be to develop an interactive tool that can process external input (from stakeholders) and live-update its output. The output should be easily understandable for stakeholders: essentially, a list of diseases that comply with the preassigned rules, with an explanation of why each disease is included.
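To make the idea of rule-based output with explanations concrete, the sketch below checks a disease record against a small list of rules and returns both the decision and a readable reason. The two rules, the record keys, and the function name `evaluate` are invented for illustration; the actual rules would follow from the research questions and stakeholder input.

```python
# Each rule is a (description, check) pair; both rules are hypothetical examples.
RULES = [
    ("has at least one pathogenic variant",
     lambda d: d["n_pathogenic_variants"] > 0),
    ("no curative treatment available",
     lambda d: not d["curative_treatment_available"]),
]


def evaluate(disease):
    """Return (included, explanation) for one disease record."""
    failed = [label for label, check in RULES if not check(disease)]
    if failed:
        return False, "excluded, rule(s) not met: " + "; ".join(failed)
    passed = "; ".join(label for label, _ in RULES)
    return True, "included, all rules met: " + passed


# Placeholder records, not real diseases.
included, reason_in = evaluate(
    {"n_pathogenic_variants": 3, "curative_treatment_available": False}
)
excluded, reason_out = evaluate(
    {"n_pathogenic_variants": 3, "curative_treatment_available": True}
)
```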

Remaining obligations will depend on your master’s program, e.g., an oral presentation or a written master’s thesis.


Time period

November – June: 7 months (November ‘23 until June ‘24)


Contact

Paul Schürmann, MSc (PhD candidate)

Tel: +31611623047

Email: p.j.l.schurmann-3@umcutrecht.nl

Research group: Sabine Fuchs & Edward Nieuwenhuis


References

Butt, M. H., Zaman, M., Ahmad, A., Khan, R., Mallhi, T. H., Hasan, M. M., Khan, Y. H., Hafeez, S., Massoud, E. E. S., Rahman, M. H., & Cavalu, S. (2022). Appraisal for the Potential of Viral and Nonviral Vectors in Gene Therapy: A Review. In Genes (Vol. 13, Issue 8). MDPI. https://doi.org/10.3390/genes13081370

Zhao, Z., Shang, P., Mohanraju, P., & Geijsen, N. (2023). Prime editing: advances and therapeutic applications. In Trends in Biotechnology. Elsevier Ltd. https://doi.org/10.1016/j.tibtech.2023.03.004

Pavan, S., Rommel, K., Marquina, M. E. M., Höhn, S., Lanneau, V., & Rath, A. (2017). Clinical practice guidelines for rare diseases: The orphanet database. PLoS ONE, 12(1). https://doi.org/10.1371/journal.pone.0170365

Hamosh, A., Amberger, J. S., Bocchini, C., Scott, A. F., & Rasmussen, S. A. (2021). Online Mendelian Inheritance in Man (OMIM®): Victor McKusick’s magnum opus. American Journal of Medical Genetics, Part A, 185(11), 3259–3265.

Orphadata. (2022, December). Catalogue Orphadata. [Online] Available at: https://www.orphadata.com/docs/Catalogue_Orphadata.pdf [Accessed: September 26, 2023].

GTEx Portal. (2023). Retrieved September 26, 2023, from https://gtexportal.org/home/

Kleywegt, M., Elsayed, J., Jobst, A., & Smyth, S. (2021). The role of microorganisms in the degradation of pharmaceuticals in the aquatic environment. In Current Opinion in Environmental Science & Health (Vol. 12, Issue 1). ScienceDirect. https://doi.org/10.1016/j.coenvc.2021.100196

Mathis, N., Allam, A., Kissling, L. et al. Predicting prime editing efficiency and product purity by deep learning. Nat Biotechnol 41, 1151–1159 (2023). https://doi.org/10.1038/s41587-022-01613-7

Kim, H.K., Yu, G., Park, J. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat Biotechnol 39, 198–206 (2021). https://doi.org/10.1038/s41587-020-0677-y

Paunovska, K., Loughrey, D. & Dahlman, J.E. Drug delivery systems for RNA therapeutics. Nat Rev Genet 23, 265–280 (2022). https://doi.org/10.1038/s41576-021-00439-4