
Haplogroups

A haplogroup is all of the men who carry a particular SNP mutation. Haplogroups divide into subclades that carry additional mutations. Haplogroups and clades are commonly defined by more than one mutation that everyone in the group carries, but labs usually test for, and charts often show, just one or two SNPs per clade. Each haplogroup name begins with a capital letter, like A, J, or N. By convention, this is followed by alternating numbers and letters, as in R2b2a1.
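The naming convention can be made concrete with a short Python sketch. It assumes the simplified rule stated above (one capital letter, then alternating numbers and lowercase letters) and assumes that each added number or letter marks one further subclade level. The function name clade_lineage and the pattern are illustrative only, not part of any testing company's software.

```python
import re

# Simplified, assumed form of the naming convention: one capital letter,
# then alternating numbers and lowercase letters (for example R2b2a1).
NAME_PATTERN = re.compile(r"[A-Z](?:\d+(?:[a-z]\d+)*[a-z]?)?")


def clade_lineage(name: str) -> list[str]:
    """Return the chain of clade names from the major haplogroup down to `name`."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"not a recognised clade name: {name!r}")
    # Split into single letters and whole numbers: R2b2a1 -> R, 2, b, 2, a, 1
    tokens = re.findall(r"[A-Za-z]|\d+", name)
    lineage = []
    prefix = ""
    for token in tokens:
        prefix += token  # each extra token is assumed to be one level deeper
        lineage.append(prefix)
    return lineage


if __name__ == "__main__":
    print(" > ".join(clade_lineage("R2b2a1")))
    # prints: R > R2 > R2b > R2b2 > R2b2a > R2b2a1
```

Read this way, a man in R2b2a1 also belongs to R2b2a, R2b2, R2b, R2, and R, and carries the defining mutations of each.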

Here is a chart of the major Y DNA haplogroups that shows how they are interrelated. A is the ancestral haplogroup of Genetic Adam. Everyone now living is descended from him. (More on this later.) People in haplogroup I are descended from earlier haplogroups, and carry their SNP mutations as well as those of haplogroup I.

 

Following is a map of the movement of specific haplogroups through human history. This map illustrates how anthropologists use Y DNA. Some of the lines on this map are controversial. For instance, this diagram has ancestors of I1 and I2b1 moving due west from the Balkans, where haplogroup I originated, to France, and then going north to Scandinavia. Archeology clearly shows that descendants of the Gravettian culture of the Balkans, who included the first haplogroup I people, went north to Poland, then west to Germany and Denmark. Logic says they moved north and northwest as the climate warmed, following herds of large game. Those who think they went due west could be thinking that they also went to Britain before the English Channel was flooded, but no one now living in Britain belongs to clades of haplogroup I that originated before 6000 BC. The clades that are that old, which belong to I2a, carried agriculture from the Ukraine region along the Mediterranean coast to Spain, then by sea to the facing coasts of Britain and Ireland, bringing Megalithic culture along with agriculture. Haplogroups I1 and I2b1 are found in France in significant numbers only where large numbers of Germanic peoples have settled.

 

The Genographic Project, a joint project of National Geographic and IBM, is one of several major projects that attempt to piece together human migration on a large scale. It also has a map like this. In fact, Family Tree DNA does the DNA testing for the Genographic Project (and, for a small fee, enrolls in it anyone who tests with Family Tree DNA), and this map, which came from the Family Tree DNA web site, looks very much like a Genographic map.

Knowledge of SNP based clades has advanced greatly. Many subclades of the major haplogroups were originally constructed from STR haplotypes. Eventually SNP mutations were found that actually define many of these STR clusters. SNP mutations also often change scientists’ knowledge of how the subclades are related to each other. For instance, the structure of haplogroup I and the structure of haplogroup R changed completely. SNPs are always more accurate at determining subclades than STR haplotypes, sometimes a lot more accurate.

It is becoming more common for people interested in their deep (ancient) roots to order SNP tests instead of, or in addition to, STR testing. Family Tree DNA now offers “deep clade testing” and single SNPs at $30 apiece to people who have already done STR testing at Family Tree DNA.

Some genealogical testing services, like 23andMe, only test SNPs, and test for most or all known SNPs. 23andMe’s testing is very expensive and does not include STR markers. 23andMe uses technology that can only test for SNPs; it cannot report mutations that involve repeating segments of code. 23andMe also employs a newer method that purports to identify anyone who is related to you out to 5th cousins, by looking for shared blocks of DNA of a certain size that have not yet been scrambled by genetic recombination. That method isn’t really very helpful if you can’t identify common ancestors by genealogical research, but sometimes people can use it to help crack genealogical brick walls. I’ve found it particularly useful for identifying who shares my known Old New England 10th great grandparents. Some people on 23andMe appear to find the whole thing useful for looking for multiple women to marry. 23andMe offers, of all things, genetic social networking on its web site. Most of its customers are neither genealogically nor genetically sophisticated. Confused people on that site often think they are related if they share the same four or five thousand year old Y DNA SNPs. Y DNA has the advantage that it applies only to your paternal line, which is the one line of ancestry that most western people care about most. One knows what line to apply the results to. Usually the people who share your Y DNA also share your surname.

Estimating SNPs from STR haplotypes

SNP testing is expensive, and most people who order Y DNA testing are interested mainly in finding their recent male line ancestry. Originally even major haplogroups were usually estimated from STR markers. People still routinely estimate both major haplogroups and subclades from STR haplotypes. This is possible because some STR markers are slow to change, and thus stable even for large groups of people over long periods of time. Online haplogroup predictors apply statistical probability to fit a particular set of STR markers to a haplogroup or its subclades. Usually a haplogroup predictor gives one or several possible results, each with its probability of being correct. For Theophilus’s I2b1 M284 Isles-Scottish haplotype, the haplogroup predictors give a 100% probability of being correct, but it isn’t unusual to see a probability of only 50% that a predicted haplogroup is correct. The fact that SNP testing is a hundred percent correct, together with its falling cost, is making SNP testing more common.

Sometimes particular markers consistently have a particular value depending on whether or not one has a particular SNP mutation. For instance, in my brother Larry’s I1 haplogroup, with 99% consistency, DYS 462 is 13 if one has the Norse SNP L22, and DYS 462 is 12 if one does not. DYS 462 is not among Family Tree DNA’s routine markers. However, it costs $7 to order an STR marker, and $30 to order an SNP.
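How much a single marker like DYS 462 tells you can be worked out with Bayes’ rule. The following Python sketch is only an illustration: it takes the 99% consistency figure from the paragraph above, treats DYS 462 as having only the two values 12 and 13, and requires you to supply a prior (the share of I1 men assumed to carry L22), which is an assumption and not a figure from this article. The function name posterior_l22 is made up for the example.

```python
def posterior_l22(prior: float, dys462: int, consistency: float = 0.99) -> float:
    """Probability of carrying L22 given a DYS 462 value, by Bayes' rule.

    prior       -- assumed share of I1 men who are L22+ (hypothetical input)
    dys462      -- observed DYS 462 value, 12 or 13
    consistency -- how reliably 13 goes with L22+ and 12 with L22- (99% in the text)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be between 0 and 1")
    if dys462 == 13:
        like_pos, like_neg = consistency, 1.0 - consistency
    elif dys462 == 12:
        like_pos, like_neg = 1.0 - consistency, consistency
    else:
        raise ValueError("this sketch only handles DYS 462 values of 12 or 13")
    numerator = like_pos * prior
    return numerator / (numerator + like_neg * (1.0 - prior))


if __name__ == "__main__":
    # With an assumed even prior, the marker alone decides the answer.
    print(round(posterior_l22(0.5, 13), 4))  # prints 0.99
    print(round(posterior_l22(0.5, 12), 4))  # prints 0.01
```

Under these assumptions, the $7 STR marker gives nearly the same answer as the $30 SNP test, unless L22 is very rare or very common among the men being compared.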

People who have the M284 SNP of I2b1 always have DYS 425 = 0, or, in other words, null. Long ago someone with the SNP M284 lost that entire STR marker with all its repeats. Of all the lineages that have the M284 mutation, only those that also have null DYS 425 survive. DYS 425 occasionally disappears, and when it does, its loss is always passed on to one’s descendants, so not everyone who has null DYS 425 is I2b1 M284+. In fact, there are entire subclades of other haplogroups that are also defined by null DYS 425. However, within I2b1, the probability that someone who has null DYS 425 is M284+ is nearly 100%.

One reason why Theophilus’s upgrade from 37 to 67 markers took so long is that when Family Tree DNA encounters null DYS 425 they always check three times to see if it is really not there, even if it shouldn’t be there.

Often the STR haplotype gives one an idea of what SNPs to expect to find, and if one does want to know one’s SNPs, one can save a lot of money by ordering only one or two of them. For example, Theophilus McKinstry’s 37 marker haplotype predicted the M284 SNP and its daughter SNPs L126 and L137 with near certainty, so I only ordered L126 and L137 to confirm them. I could have spent more money to order a deep clade test of all SNPs downstream from those that define I2b1, but I did not need to.

I consulted with several experts on haplogroup I when making this decision, and the only disagreement was on whether I needed to test for L126 and L137. Their views ranged from saying that I didn’t need to confirm any SNPs to saying that I could test for L126 and L137 if I wanted to.