
Spearin Surname Project




Building a DNA Family Tree - using your genes to trace your ancestors back

Right! Let's take stock!

Suppose that several people have joined your project (they have), each of them has received their results, which have also been posted on the website (they have), your results are up there too (they are), and now you're faced with a gaggle of haplotypes (you are). You can see perhaps that some of them match you exactly, some are off by a marker here or there, and some are not even close to you at all. So ... what do you do next?

Let's try and answer that question ... bit by bit ...

Lineages, clusters, genetic families

As more people join a Y-DNA group and add their haplotype results to the pot, it becomes apparent that they match on certain markers and differ on others (usually by a value of 1 or 2 if they are closely related). Thus it becomes clear that some haplotypes are quite closely related to each other, and others not so much. As yet more people are tested, typically the haplotypes can be arranged into subgroups. Individuals in a particular subgroup will be more closely related to each other than to other people in the larger group. In this way, the entire collection of haplotypes can be subdivided into 'lineages', 'clusters' or 'genetic families'.
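As a rough illustration of what "matching on certain markers and differing on others" means in practice, here is a minimal sketch in Python. It assumes a simple stepwise count (the sum of the differences at each marker), which is one common way to summarise how far apart two haplotypes are; multi-copy markers are usually treated differently and are ignored here. The marker values are hypothetical and are not taken from the project's results.

```python
def genetic_distance(a, b):
    """Sum of the absolute differences over the markers both haplotypes share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)

# Hypothetical marker values, for illustration only
h1 = {"DYS390": 24, "DYS391": 11, "DYS392": 13, "DYS393": 13}
h2 = {"DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}
h3 = {"DYS390": 22, "DYS391": 10, "DYS392": 11, "DYS393": 14}

print(genetic_distance(h1, h2))  # off by one step on one marker
print(genetic_distance(h1, h3))  # differs on every marker
```

A small distance (as between h1 and h2) is what you would expect from close relatives; a large one (as between h1 and h3) suggests the two men belong to different subgroups.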

Chris Pomery has written an excellent step-by-step guide on how to sort a varied group of many different haplotypes into distinct genetic families or lineages, and I reproduce it here (see

The process of clustering DNA results takes place in several stages.

The first stage is to group all the participants by haplogroup. This is because men belonging to haplogroup R could only share a common ancestor with someone who is haplogroup Q some tens of thousands of years ago, well outside the range of genealogical research. The same is true of men defined as haplogroup R1a as compared to, say, members of haplogroup R1b. All members of a bona fide genetic family by definition will share the same haplogroup.

With the haplogroup-based clusters established, the second stage of the analysis process is to compare the DNA signatures of all the participants in each haplogroup using the most reliable markers. The most reliable markers are those that are the slowest to mutate.

The third stage is to use the remaining markers, generally described as fast-mutating, to further sub-divide the genetic families that became visible during the second stage.

The fourth stage is to use those markers which express their result as a pair of numbers or as a sequence. There are specific issues relating to these markers which make them best suited to be used to further define existing clusters rather than to create them.

The resulting genetic families are broad clusters of identical and near identical DNA results. The main conclusion one can draw from an individual's inclusion in a specific genetic family is that they are highly likely to share a common direct male-line ancestor with other members of that genetic family. Put another way, if they are able to research their family tree with perfect accuracy they should be able, eventually, to document the links that tie them to everyone else within the same genetic family and to end up looking at one big family tree.
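The first two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any particular project: the participant records, the choice of "slow" markers and the `max_distance` threshold are all hypothetical, and the grouping is a simple greedy pass that compares each participant with the first member of each existing group.

```python
from collections import defaultdict

def distance(a, b, markers):
    """Sum of step differences over the chosen markers."""
    return sum(abs(a[m] - b[m]) for m in markers)

def cluster(participants, slow_markers, max_distance=1):
    # Stage 1: group participants by haplogroup
    by_haplogroup = defaultdict(list)
    for p in participants:
        by_haplogroup[p["haplogroup"]].append(p)

    # Stage 2: within each haplogroup, group by closeness on the slow markers
    families = []
    for haplogroup, members in by_haplogroup.items():
        groups = []
        for p in members:
            for g in groups:
                if distance(p["markers"], g[0]["markers"], slow_markers) <= max_distance:
                    g.append(p)
                    break
            else:
                groups.append([p])  # no existing group was close enough
        families.extend((haplogroup, [m["id"] for m in g]) for g in groups)
    return families

# Hypothetical participants, for illustration only
participants = [
    {"id": "A", "haplogroup": "R1b", "markers": {"DYS393": 13, "DYS390": 24, "DYS392": 13}},
    {"id": "B", "haplogroup": "R1b", "markers": {"DYS393": 13, "DYS390": 24, "DYS392": 13}},
    {"id": "C", "haplogroup": "R1b", "markers": {"DYS393": 12, "DYS390": 22, "DYS392": 11}},
    {"id": "D", "haplogroup": "I1", "markers": {"DYS393": 13, "DYS390": 24, "DYS392": 13}},
]
slow = ["DYS393", "DYS390", "DYS392"]  # assumed to be slow-mutating for this sketch

print(cluster(participants, slow))
```

Note that D has exactly the same marker values as A and B but lands in a separate family, because the haplogroup split comes first. The third and fourth stages would then refine each family using the remaining markers.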

The Modal Haplotype

The 'modal haplotype', put simply, is a haplotype generated from the most frequently occurring values (marker by marker) in the group as a whole. The thinking behind this is that those marker values that occur most frequently in the group have probably been there the longest and come closest to the marker values in the 'Ancestral Haplotype' of the group, i.e. the haplotype of the 'founder' of the group, which in the case of the Irish Spearins was quite possibly George Sperynge from London (1558-1611) or one of his relations.

So the modal haplotype is the sequence of marker values that occur most commonly for each of the markers in the participants tested to date. Although this should reflect the most ancient haplotype from which the various branches of the genetic family have evolved, this may not be the case if the sample is biased by having an "over-representation" of members from one particular family branch. However, it is a good starting point, the assumption being that this is the haplotype from which branched out all the other haplotypes in the group as a whole.
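Working out a modal haplotype is just a matter of taking the most common value at each marker. Here is a minimal Python sketch; the marker values are hypothetical, and ties are broken by whichever value was seen first, which is an arbitrary choice made for this example.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Most frequent value at each marker across all haplotypes in the group."""
    markers = haplotypes[0].keys()
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in markers
    }

# Hypothetical results for four participants, for illustration only
group = [
    {"DYS390": 24, "DYS391": 11, "DYS392": 13},
    {"DYS390": 24, "DYS391": 10, "DYS392": 13},
    {"DYS390": 24, "DYS391": 11, "DYS392": 13},
    {"DYS390": 25, "DYS391": 11, "DYS392": 13},
]

print(modal_haplotype(group))
```

If most of the participants came from one branch that happened to carry, say, the value 10 at DYS391, the modal value would flip to 10. That is the over-representation bias described above.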

Cladograms, Phylograms, & 'Mutation History' Family Trees 

Ultimately, it should be possible to map the earliest haplotype and estimate when and how the other subgroups branched away from it. This map could be called a 'Mutation History' Family Tree, or a Y-DNA Family Tree, because it traces the 'descendants' of a particular Y-chromosome, but instead of identifying individuals by name and date of birth, it will characterise them by mutations in specific Y-DNA STR markers.

It is also possible to use special computer software programmes to generate cladograms, phylograms, or phylogenetic network diagrams. This type of diagram has also been used to map other evolutionary events, such as the evolution of animals, or the more distant changes in Y-DNA (haplogroups) going back to the earliest common male ancestor (in Africa, 60,000 years ago).

For some examples of phylograms, see Debbie Kennett's website and there are several videos on how to read phylograms and create your own at

Chris Pomery gives a very useful overview of how phylograms can assist traditional genealogy in a very well-produced video

He also provides a useful overview of the process of combining genetic genealogy with traditional genealogy

The best guidance on how to generate phylograms is provided by David Ewing on the Ewing Surname Y-DNA Project website


The first step in generating a DNA Family Tree is to calculate the haplotype of the MDKA (Most Distant Known Ancestor) for each of the branches in the Family. This is done by a process of 'triangulation', a concept borrowed from trigonometry and geometry. Basically, this process compares the haplotypes of known cousins and if they are identical, then one concludes that the identical DNA has been passed on to both of them via their most recent common ancestor (MRCA).

This is taken as confirmation that the haplotype of a pair of cousins has not mutated since the time of their MRCA. Here's how it would work in the Spearin Surname Project:

  • The first 4 participants each came from different unrelated Spearin branches. Three were exact matches on 43/43 markers, and one matched the others on 42/43 markers. This tells us that all 4 participants are closely related and probably descended from the London Sperings. That in itself is a great result.
  • For the subject with the one mismatch (let's call him Subject 4), we don't know when this mutation occurred. Did it happen 300 years ago? Or did it just happen in the previous generation and he actually inherited it from his father? We just don't know. But this is where triangulation can help us find out.
  • Suppose we take a Y-DNA sample from a second person from this subject's branch, ideally his most distant known relative (call him Subject 5). If the result shows the same mutation, then we can be reasonably certain that both participants inherited the mutation from their MRCA (Most Recent Common Ancestor). If we are lucky, the MRCA and the MDKA (most distant known ancestor) will be the same person and we can therefore say that we are 99% sure that the haplotype of the MDKA is the same as that of the two subjects tested.
  • But what if Subject 5's results show that he doesn't have the mutation? Suppose he matches the other 3 subjects who match each other exactly. This suggests that the mutation in Subject 4's line occurred sometime since their MRCA. But another possibility is that both Subjects 4 and 5 DID inherit the mutation from their MRCA, but a reverse mutation occurred in Subject 5's line and it reverted to the haplotype that was there before the first mutation! In this situation, a third sample needs to be taken from another distant cousin (again, the more distantly related to both of them the better), and the results from this subject (Subject 6) will hopefully resolve the question: the haplotype that he most closely matches can be considered the haplotype of the MRCA of all three of them.

And that, in essence, is triangulation. A video explanation of triangulation can be found in the section on Interpreting Results.
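The reasoning in the example above can be sketched in Python as a marker-by-marker vote among known cousins. This is a simplification made for illustration: it takes the majority value as the ancestral one and reports a marker as unresolved (`None`) when the cousins are evenly split, which is the cue to test another cousin. It cannot tell an ordinary mutation from a reverse mutation. The subjects and marker values are hypothetical.

```python
from collections import Counter

def triangulate(cousins):
    """Infer the ancestral value at each marker; None where cousins are evenly split."""
    inferred = {}
    for marker in cousins[0]:
        counts = Counter(c[marker] for c in cousins).most_common()
        top_value, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        inferred[marker] = None if tied else top_value
    return inferred

# Hypothetical values: Subject 4 carries the one-step mutation on DYS439
subject4 = {"DYS390": 24, "DYS439": 12}
subject5 = {"DYS390": 24, "DYS439": 11}
subject6 = {"DYS390": 24, "DYS439": 11}

print(triangulate([subject4, subject5]))            # DYS439 unresolved
print(triangulate([subject4, subject5, subject6]))  # third cousin settles it
```

With only Subjects 4 and 5, the disputed marker stays unresolved; adding Subject 6 tips the balance, just as in the third-sample scenario described above.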

Once the haplotype of the MDKA has been estimated for each of the various branches of the family, it should be possible to work out, through a logical process of deduction, which mutation came first and which mutations followed it. In this way a 'Mutation History' Family Tree can be developed which can be superimposed on top of the paper-based Family Trees (generated by traditional methods).
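To give a flavour of that deduction, here is a minimal Python sketch that lists each branch's mutations relative to an assumed ancestral haplotype and then looks for mutations that two branches share. The branch names, marker values and the ancestral haplotype itself are hypothetical; in a real project the ancestral values would come from the modal haplotype and from triangulation.

```python
def mutations_from(ancestral, branch):
    """Markers where a branch differs from the ancestral haplotype, with the step change."""
    return {m: branch[m] - ancestral[m] for m in ancestral if branch[m] != ancestral[m]}

# Hypothetical ancestral haplotype and MDKA haplotypes for three branches
ancestral = {"DYS390": 24, "DYS391": 11, "DYS439": 12}
branches = {
    "Branch A": {"DYS390": 24, "DYS391": 11, "DYS439": 12},
    "Branch B": {"DYS390": 24, "DYS391": 11, "DYS439": 13},
    "Branch C": {"DYS390": 24, "DYS391": 10, "DYS439": 13},
}

profile = {name: mutations_from(ancestral, h) for name, h in branches.items()}

# Branches with fewer mutations sit closer to the root of the tree
order = sorted(profile, key=lambda name: len(profile[name]))
for name in order:
    print(name, profile[name])

# A mutation shared by two branches probably arose before they split
shared = profile["Branch B"].items() & profile["Branch C"].items()
print(shared)
```

Here Branches B and C share the step up at DYS439, so the simplest reading is that this mutation came first, in an ancestor common to both, and the change at DYS391 followed later in Branch C's line alone. As with any 'best fit', that reading is a probability rather than a certainty.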

Thus ancestors earlier than each MDKA can be characterised by their mutations, and calculated guesstimates can be made regarding the nature of their relationship with other branches. Phylograms can also be generated (to double-check this work) but the software (although freely available on the net) is not user-friendly and defies interpretation by any but the most phylogenetically-orientated minds. Furthermore, phylograms serve as a guide rather than a definitive account of how and when mutations occurred.

For an interesting discussion of the pros and cons of phylograms vs mutation history trees, see

The other drawback is that this modelling exercise is based on probability and uses a 'best fit' approach. The problem here is that there are going to be times when it just doesn't fit, i.e. the 'best fit' is not always the correct fit.

If anyone can, Genghis can 

Despite these potential drawbacks, this type of phylogenetic approach has produced some startling revelations. 

In 2003, an article appeared in the American Journal of Human Genetics reporting that the same Y-chromosome haplotype had been identified in about 8% of men in a large region of Asia (about 0.5% of the male world population). The pattern of variation within the lineage was consistent with the theory that it originated in Mongolia about 1,000 years ago (several generations prior to the birth of Genghis Khan).

The rapid spread of the haplotype could not be reliably explained by genetic drift, and it was proposed that it was the result of social selection i.e. the male-line descendants of Genghis Khan and his close male relatives had spread the Y-DNA haplotype throughout Asia due to a) the power that Khan and his direct descendants held, b) polygamy being a social norm, and c) widespread rape in conquered cities. Genghis Khan is believed to have belonged to Haplogroup C3 (see 

The 25 Marker Y-DNA Profile of Genghis Khan according to Family Tree DNA is:

Y-STR Name 385a 385b 388 389i 389ii 390 391 392 393 394 426 437 439 447 448 449 454 455 458 459a 459b 464a 464b 464c 464d
Haplotype 12 13 14 13 29 25 10 11 13 16 11 14 10 26 22 27 12 11 18 8 8 11 11 12 16

Irish Warlord spreads more than just the word

In January 2006, geneticists in Trinity College Dublin suggested that the 5th-century warlord 'Niall of the Nine Hostages' may have been the most fertile male in Irish history. In northwest Ireland as many as 21.5% of men (8.3% in Ireland in total) have the same Y-chromosome haplotype and share a common male-line ancestor roughly 1500 years ago. Below is their 25-marker Y-DNA ancestral haplotype, and by extrapolation, the haplotype of Niall of the Nine Hostages. Thus the genetic evidence lends support to the ancient fables about Niall and suggests that he may be the forefather of approximately 3 million men in the world today (see

















































Join us today ... you could find out more than you ever imagined!

Maurice Gleeson
Oct 2011

Copyright 2011 ( Rights Reserved.  Creative Commons License
The Spearin Surname Project at is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Information and data obtained from the Spearin Surname Project must be attributed to the project as outlined in the Creative Commons License. Please notify administrator when using data for public or private research. 

Last update: Oct 2011
