understanding genomics 3: SNPs and other esoterica
Canto: So SNPs are pretty essential to modern genomics I believe, so why, and what are they? I know that they’re ‘single nucleotide polymorphisms’ and that nucleotides are A, C, G, T and U, each of which has a slightly different structure. They’re all based on sugar structures – ribose in the case of RNA and deoxyribose in the case of DNA – attached to a phosphate group and a nitrogenous base. Here’s a diagram of thymine (T) filched from the USA’s National Human Genome Research Institute:
So that’s a nucleotide, one of the building blocks of DNA and RNA, but the real problem, for me anyway, is the connection between single and polymorphic, if there is one. I know that poly means many and that morphology is about shape and size and such…
Jacinta: You can only get so far with interrogating the words themselves. An SNP is a genetic variation in a single nucleotide between one person’s genome and another’s (I think). There are many of these variable spots, and each one comes in more than one form across a population, which is where the ‘poly’ (many) and the ‘morph’ (form) come in. I’ll quote this from an NIH website, and then try to make sense of it:
SNPs occur normally throughout a person’s DNA. They occur almost once in every 1,000 nucleotides on average, which means there are roughly 4 to 5 million SNPs in a person’s genome. These variations occur in many individuals; to be classified as a SNP, a variant is found in at least 1 percent of the population. Scientists have found more than 600 million SNPs in populations around the world.
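To make the 1 percent rule in that definition concrete, here’s a minimal Python sketch of how a single variant site might be checked against it. It assumes (our assumption, since the NIH page doesn’t spell it out) that the 1 percent is measured as a minor allele frequency, counting allele copies rather than people, which is the usual convention. The genotype data are invented and the function names are hypothetical.

```python
# A minimal sketch: checking one variant site against a 1% frequency cut-off.
# Assumption: the cut-off applies to the minor allele frequency (MAF),
# counting allele copies rather than people. All data here are made up.

def alt_allele_frequency(genotypes: list[int]) -> float:
    """Fraction of allele copies that carry the alternative base.

    Each genotype is the number of copies (0, 1 or 2) of the alternative
    allele a person carries; humans are diploid, so two copies per person.
    """
    total_copies = 2 * len(genotypes)
    return sum(genotypes) / total_copies


def passes_snp_threshold(genotypes: list[int], threshold: float = 0.01) -> bool:
    """True if the rarer (minor) allele reaches the frequency threshold."""
    freq = alt_allele_frequency(genotypes)
    minor_allele_freq = min(freq, 1 - freq)  # whichever allele is rarer
    return minor_allele_freq >= threshold


# 1,000 hypothetical people: 970 with no copies, 28 with one, 2 with two.
common = [0] * 970 + [1] * 28 + [2] * 2
# 1,000 hypothetical people: only 5 carry a single copy.
rare = [0] * 995 + [1] * 5

print(alt_allele_frequency(common))  # 0.016, i.e. (28 + 2*2) / 2000
print(passes_snp_threshold(common))  # True
print(passes_snp_threshold(rare))    # False: 5 / 2000 = 0.0025
```

Because copies are counted rather than people, the two people carrying two copies each count double in the first example.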
Canto: So they’re called ‘variants’ because they vary from the ‘normal’ pattern in 1% or more of those whose genomes are mapped? So there’s such a thing as a ‘normal’ human genome, but perhaps everyone differs from that normal pattern due to different SNPs? And why is 1% the cut-off? Isn’t that a bit arbitrary? Also, it says that these variations occur in many individuals, which sounds a bit vague. Does this mean that there are many individuals where they don’t occur at all? I mean, what is a normal human genome, if there are so many variants? Is it just some kind of aggregated value?
Jacinta: Uhh, maybe. And note – but I’m not sure if this is relevant to your question – that these SNPs mostly occur in non-coding DNA, where they’re less likely to affect the phenotype, though that seems to depend on how close they are to coding regions. Anyway, we’re just scratching the surface here. Look at this diagram, from Wikipedia.
As you can see, there are synonymous and non-synonymous SNPs. Synonymous with what, you might ask?
Canto: As a language teacher I know what a synonym is, obviously. My guess is that a synonymous SNP is associated with, ‘synonymous’ with, some kind of malfunction or defect, or maybe a different function or effect. A ‘missense’, as the diagram suggests.
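Jacinta: No, it’s the non-synonymous SNPs that can cause problems. ‘Synonymous’ means synonymous with the original: in coding DNA each three-letter codon specifies an amino acid, and since most amino acids have several codons, a synonymous SNP swaps one codon for another that codes for the same amino acid, so the protein comes out unchanged. A non-synonymous SNP either changes the amino acid (that’s the ‘missense’ kind) or turns the codon into a stop signal (a ‘nonsense’ mutation), and either can change how the protein works.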
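The synonymous/non-synonymous distinction can be sketched in a few lines of Python: translate a codon before and after a single-base change and compare the amino acids. This is a toy sketch, assuming the standard genetic code, codons written as DNA on the coding strand (T rather than U), and a change that falls inside one known codon. `classify_substitution` is a hypothetical helper, not part of any library.

```python
from itertools import product

# Standard genetic code, built compactly: codons enumerated in TCAG order
# (first position varying slowest), with one-letter amino acid codes and
# '*' marking a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) of one codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"  # same amino acid, so the protein is unchanged
    if after == "*":
        return "nonsense"    # non-synonymous: a premature stop codon
    if before == "*":
        return "stop-loss"   # non-synonymous: a stop codon now reads as an amino acid
    return "missense"        # non-synonymous: a different amino acid


print(classify_substitution("GAA", 2, "G"))  # synonymous (GAG: still glutamic acid)
print(classify_substitution("GAG", 1, "T"))  # missense (GTG: valine)
print(classify_substitution("TGG", 2, "A"))  # nonsense (TGA: stop)
```

Real annotation tools also have to work out reading frames, strands and splice sites, all of which this sketch ignores.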
Canto: What I’m learning about genetics/genomics is that the more I delve into the subject, the more there is to learn, and yet I don’t really want to specialise, I want to know a bit of everything. I’ve just learned, for example, that it’s not just a divide between coding and non-coding DNA, because a mutation near a coding region can have effects, deleterious or otherwise, I think.
Jacinta: I don’t know about that, but I’m learning some interesting random facts, for example that there appear to be more C-G base pairings in coding DNA than T-A. Just to get it in our heads, cytosine (a pyrimidine) always pairs with guanine (a purine), and the other pyrimidine, thymine, always pairs with adenine. It’s always a purine with a pyrimidine, and the purines are the larger molecules, with a two-ring structure rather than the pyrimidines’ single ring. Note the structure of thymine, above. Anyway, back to SNPs, which we’re interested in mainly for what they might tell us about earlier populations. I’ve just glanced through a 2020 research article titled ‘Genome-wide SNP typing of ancient DNA: Determination of hair and eye color of Bronze Age humans from their skeletal remains’. It’s generally way too technical for lay persons or dilettantes like us, but I did get some useful info from it. The researchers compared the SNP method with ‘single base extension (SBE) typing’, and what they found was interesting enough:
The DNA samples were extracted from the skeletal remains of 59 human individuals dating back to the Late Bronze Age. The 3,000 years old bones had been discovered in the Lichtenstein Cave in Lower Saxony, Germany.
It seems that this was a kind of proof-of-concept piece of research, and they were able to obtain good to excellent results from two thirds of the skeletal samples:
With the applied technique, it was for the first time possible to get information about major phenotypic traits—eye and hair color—of an entire prehistoric population. The range of traits, varying from blonde to brown hair and blue to green-hazel eye colors for the majority of individuals is a plausible result for a Central European population.
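Going back to the pairing rules Jacinta runs through (C with G, T with A, always a purine with a pyrimidine), here’s a small Python sketch that builds the partner strand for a made-up DNA sequence and measures its GC content, the share of G and C bases she mentions in connection with coding DNA. The sequence and function names are hypothetical. The one detail added here is that the partner strand is conventionally written reversed, because the two strands of the double helix run in opposite directions.

```python
# Watson-Crick pairing: each base maps to its partner on the other strand.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}  # two-ring bases; C and T are the one-ring pyrimidines

# Every allowed pair joins exactly one purine to one pyrimidine.
assert all((base in PURINES) != (partner in PURINES) for base, partner in PAIRS.items())


def reverse_complement(strand: str) -> str:
    """Partner strand, reversed so it reads in the same 5'-to-3' convention."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def gc_content(strand: str) -> float:
    """Fraction of bases that are G or C (i.e. sit in G-C pairs)."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


seq = "ATGGCGCTA"  # hypothetical sequence
print(reverse_complement(seq))    # TAGCGCCAT
print(round(gc_content(seq), 2))  # 0.56 (5 of 9 bases)
```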
Canto: Yes, that’s the exciting stuff – true, it’s only going back 3,000 years, and you could say that there were no surprises in the findings – but it brings the past back to life in such a vivid way… what can I say?
Jacinta: So you don’t want to know about haplotypes, and homozygous and heterozygous alleles? What’s wrong with you?
Canto: Okay, a haplotype – haven’t we gone through this? – is a set of variants, or polymorphisms, along a single chromosome, involving one or more genes, that tend to stick together, inheritance-wise. We know that being homozygous for a gene means you’ve inherited the same variant (allele) from both parents, whereas being heterozygous means you’ve inherited a different variant from each parent. A genetic marker is any ‘DNA sequence with a known location on a chromosome’, and markers may offer clues to inherited traits, such as diseases. All of this comes from the USA’s National Human Genome Research Institute, and I think I mostly understand it.
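Here’s a toy Python sketch of how those terms fit together for one person: each haplotype is the run of alleles along one copy of a chromosome, and at each SNP site the pair of alleles is either the same (homozygous) or different (heterozygous). The alleles are invented, and the sketch assumes we already know which allele came from which parent; with real genotype data that has to be inferred, a step usually called phasing.

```python
# Hypothetical alleles at three SNP sites along one chromosome pair.
# Each tuple is a haplotype: the alleles on ONE copy of the chromosome.
from_mother = ("A", "C", "G")
from_father = ("A", "T", "G")


def zygosity(allele_pair: tuple[str, str]) -> str:
    """Same allele from both parents -> homozygous; different -> heterozygous."""
    maternal, paternal = allele_pair
    return "homozygous" if maternal == paternal else "heterozygous"


# Pair up the two haplotypes site by site to get the genotype at each SNP.
genotypes = list(zip(from_mother, from_father))
print(genotypes)                         # [('A', 'A'), ('C', 'T'), ('G', 'G')]
print([zygosity(g) for g in genotypes])  # ['homozygous', 'heterozygous', 'homozygous']
```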
Jacinta: So SNPs can have all sorts of uses, regarding the present and the past, and tracing the present into the past, as with disease gene mapping. Their abundance within the genome has made them the go-to marker in bioinformatics. My guess, though, is we’ll never get to fully understand them without actually working with them. I mean, we can go through ScienceDirect, and jump from underlined term to underlined term (e.g. linkage disequilibrium, QTL mapping, PCR assays, point mutations and the like), but we’ll start to forget it all from the moment we have aha moments, because for us dilettantes, locked out of labs due to dumbness, shyness, laziness, poverty-ness etc, it’s all just book-larnin, sans even books. I suppose we just have to be grateful that we’ve, or they’ve, developed the technology to collect and analyse SNPs, to create libraries of them…
Canto: It seems like, as with so many fields, we’re at what Deutsch called ‘the beginning of infinity’ – but then didn’t they think that at the advent of string theory?
Jacinta: But we know this isn’t theory, this is about results. Tools producing results. Tools within the body, or rather natural phenomena made into tools by human ingenuity, like circles made into wheels, cubes into containers, triangles into struts. And we’re likely to get more and more out of DNA in the future. I recently learned about the petrous bone, though of course researchers have known about it for some years. It’s the part of the temporal bone at the base of the skull that houses the inner ear, one of the densest bones in the body, and that density seems to help preserve DNA, generally better than teeth do. So that means more analysis of fossil collections. As David Reich puts it, technologies for analysing ancient DNA have created an explosion of information to rival the invention of the microscope/telescope a few hundred years ago.
Canto: Yes, some of the developments he mentions are next-generation sequencing (which has vastly reduced sequencing costs), more efficient DNA extraction methods, improvements in separating human from microbial DNA, and again the use of the petrous bone for extraction – a bone which tends to remain intact longer than others.
Jacinta: Okay, so we might continue to blunder on in trying to make sense of this genomics stuff, or maybe not. Enough for now.
References
https://www.genome.gov/genetics-glossary/Nucleotide
https://medlineplus.gov/genetics/understanding/genomicresearch/snp/
https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/point-mutation
https://en.wikipedia.org/wiki/Coding_region
https://onlinelibrary.wiley.com/doi/full/10.1002/ajpa.23996