


graphic courtesy of DNA Heritage 

About DNA Testing


The text of this section is included in the PDF document called The Warburton Surname DNA Project.

DNA and its Uses in Genealogy

The following is my attempt to provide a simple explanation of DNA and how it can be used in genealogy. It is my understanding of the subject and is meant as an introduction. The books and sites in the reference section that follows provide a more professional and authoritative explanation.

Part 1: What is DNA?

Human DNA is the blueprint for the human body. A copy is carried in every cell of our bodies. It is a set of instructions, carried in a string of molecules. Think of it as a string of letters. There are 4 possible letters - A, C, G and T (these are the initials of the chemical molecules that are represented by the letters).

This string of letters is divided up into chromosomes (so called because scientists use coloured dyes to identify them). Chromosomes in turn contain genes. A gene could be defined as the shortest string of letters that actually does something useful in our development. However between the genes are strings of useless or junk DNA that do nothing (maybe they did once in earlier stages of evolution). These useless strings of letters are important for genealogy but I will come back to that later.

The human genome (i.e. our DNA) consists of 46 chromosomes, or rather 23 pairs. We inherit one set of 23 from our father and one from our mother. When we come to pass on 23 chromosomes to our children we pass a mixture, some from our father and some from our mother. The mixture is different every time.

One of the pairs of chromosomes is the X and Y-chromosomes. A female has 2 X-chromosomes, one from each parent. However a male has an X-chromosome from his mother and a Y-chromosome from his father. It is a gene on the Y-chromosome that causes a baby to be a boy. In the absence of this gene the default is always to produce a girl.

The significance of this is that (unlike all the other chromosomes) the Y-chromosome is never mixed with a copy from the mother. It passes unchanged from father to son through the generations.

Now the body is very good at faithfully copying DNA from generation to generation, but it is not perfect (otherwise evolution wouldn't work). Very occasionally a copying mistake occurs. For example an A may become a T. If it happens in a gene it may cause disease, or rarely it improves the gene. But if it happens in junk DNA it has no effect and so the mistake continues to be copied from generation to generation. It is these differences that make DNA useful in historical and genealogical studies.

There is one other piece of DNA that is passed unchanged from generation to generation. It is separate from the 46 chromosomes and sits in the mitochondria, the structures that act as the energy source for a cell. This mitochondrial DNA is only passed down the female line. Males do have it, inherited from their mother, but don't pass it on.

Part 2: Genetic Clans

This section is not specific to genealogy. That will come later. Firstly I want to introduce SNIPs.

As I have said, very occasionally a DNA letter is copied wrongly e.g. an A becomes a T. This is known as a Single Nucleotide Polymorphism or SNP or SNIP. So a SNIP is when a letter is copied wrongly.
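To make the idea of a single-letter copying mistake concrete, here is a minimal sketch in Python. The two sequences and the function name find_snps are made up for illustration; real comparisons work on far longer stretches of DNA.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_letter, sample_letter) for every
    position where the two sequences differ by a single letter.
    Positions are counted from 1."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Hypothetical sequences: the seventh letter has been copied as T instead of A.
reference = "GATTACAGATTACA"
sample = "GATTACTGATTACA"
print(find_snps(reference, sample))  # [(7, 'A', 'T')]
```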

SNIPs occur very rarely. A specific letter may have only changed once in the whole of modern mankind's existence (150-180,000 years). By concentrating on certain specific sequences of junk, DNA researchers have classified the human population into a small number of classes. Of course they have a special name for them i.e. haplogroups, though it is easier to call them clans.

They have done this with both mitochondria and the Y-chromosome. For mitochondria there are 36 clans, 13 of them in Africa. There are even fewer Y-chromosome clans.

95% of Europeans fall into 7 mitochondrial clans. Professor Bryan Sykes has written a book called 'The Seven Daughters of Eve' which gives these 7 clans names. Your clan can be determined by looking at a specific 400 letter sequence of mitochondrial DNA. Just 2 copying mistakes, at positions 69 and 126, define me as being in clan J or Jasmine. I have only one other mistake, at 366, which means I am in the main group of Js, not in one of a number of sub-clans that have been defined.

Researchers do look at other bits of the mitochondria for more definition of the clans, but the 400 letters are all that is needed to position you.

How is this information used? Well firstly it is used to build family trees (again they have a fancy name - phylogenetic trees). These trees show the relationships between the various clans, and within them. It is done by working out the sequence in which SNIPs must have occurred. For example my SNIP at position 126 is shared with another clan (Tara) but that at 69 is unique to J so must have occurred later.
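The reasoning about which change came first can be sketched in a few lines of Python. This is only an illustration of the principle that a change shared by more clans is older. The function name is made up, and Tara's own defining changes are left out because they are not given here.

```python
from collections import Counter


def order_mutations(clans: dict[str, set[int]]) -> list[int]:
    """Order mutation positions from oldest to most recent, on the simple
    assumption that a change carried by more clans happened earlier."""
    carriers = Counter(pos for mutations in clans.values() for pos in mutations)
    return sorted(carriers, key=lambda pos: (-carriers[pos], pos))


# Positions taken from the text: 126 is shared by J and Tara, 69 is unique to J.
clans = {
    "J (Jasmine)": {69, 126},
    "T (Tara)": {126},  # Tara's own defining changes are omitted
}
print(order_mutations(clans))  # [126, 69]
```

Real phylogenetic trees are built from many more positions and use statistical methods, but the ordering logic is the same in spirit.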

Perhaps the most startling assertion, though it is logical when you think about it, is that each mitochondrial clan must be descended from a single woman. The copying error occurred just once, so everyone carrying the error must be descended from the first woman to carry the error. Not only that, but by linking the clans in a tree, one clan becomes the source of all the others. Not unnaturally the origin of this clan is called Eve. This doesn't mean that Eve was not one of a population of similar early humans. It's just that no descendants of her contemporaries exist today.

Interestingly, if the same thing is done with the Y-chromosome, 'Y-chromosome Adam' seems to have lived a lot later than Eve (60-80,000 years ago, rather than 150,000 years ago). Of course there must have been a male (or males) around 150,000 years ago to perpetuate the species, and one of them (pre-Adam) would have passed his Y-chromosome to the Adam of 80,000 years ago. The thing is that the Y-chromosomes of all pre-Adam's male contemporaries have got lost somewhere along the way, so only Adam's Y-chromosome is the precursor of all Y-chromosomes in today's population.

It seems to be a phenomenon of the Y-chromosome that a few powerful men spread their seed very widely at the expense of other men (sometimes called the Genghis Khan effect), while the female mitochondria are spread much more evenly. This probably explains the difference in dates.

It should be remembered that these dates are very approximate. It should also be remembered that mitochondria and the Y-chromosome form a small part of the DNA we inherit from a wide range of ancestors (mother's father, father's mother etc.). It's just that they are unique in being able to be traced, and I suppose the totally male and totally female lines of descent are in themselves unique.

I've not really looked into it but I believe there is another type of DNA study that tries to address these other lines of descent by determining if you have versions of specific genes which are typical of a particular ethnic background. It is obvious how genes that affect physical appearance (skin colour etc.) could be used in this way, and there are apparently other non-visible ones that can be used in the same way.

Returning to mitochondria and the Y-chromosome, researchers can determine the migrations of man around the world by looking at the distribution of the clans in the world, and the amount of change that has happened in various locations. Of course this has to be linked to the archaeological evidence to be meaningful. Also by building separate models using mitochondria and the Y-chromosome two separate, corroborating pictures are obtained.

Studies of this can get extremely complicated, and employ lots of probability calculations etc. The results though can be fascinating. For example my mitochondria, and my Y-chromosome have completely different histories. Mitochondrial Clan J originated in the Near East and only entered Europe after the last Ice Age with the first wave of farmers. It followed 2 tracks, one of which followed the coast around the Mediterranean, and eventually up to Britain. My Y-chromosome came with the first migrants to come to Europe 40,000 years ago (the Aurignacian culture). They spent the last Ice Age in a refuge near the Pyrenees, where the last mutations took place, and then moved back into Europe as the ice retreated. It is by far the most common clan in Western Europe, particularly along the western seaboard. My results are described in Commentary on Results in The DNA Project.

So to summarise, SNIPs, or letter copying errors, are sufficiently infrequent to allow the world population to be classified according to which SNIPs a person has. This is done both for mitochondria, which is passed down the female line, though present in men, and the Y-chromosome, which is passed down the male line and is not present in women. By working out sequences of change the clans can be linked into a family tree, and by looking at the distribution of clans, the amount of subsequent changes in the various localities, and at corroborating archaeological evidence the history of man's migration out of Africa and around the world can be determined.

Part 3: A DNA Test for Genealogists.

So far we have discussed SNIPs. Unfortunately these occur too infrequently to help with genealogy. Every related male Warburton will probably have an identical profile. Fortunately there is another test which is more helpful, but it can only be carried out on the Y-chromosome.

It so happens that there are some short DNA sequences that are repeated several times. Whereas with SNIPs we were dealing with a change to a single letter in the DNA sequence, you can think of these sequences as words that are repeated several times. Every now and then the number of copies of the word changes. For example one may be added so whereas there were 10 repeats before, there are now 11.

These strings of words are called Short Tandem Repeats (STRs), so a test for them is an STR test. There are a number of locations where they occur on the Y-chromosome. The test I am using for the Warburton project tests 43 of them. They are known as markers, hence the term 43 marker test. Some tests use as little as 10 markers, though they have limited use.
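As a rough illustration of what a marker value means, the sketch below counts how many times a short word is repeated back to back in a sequence. The word GATA, the flanking letters and the function name are assumptions chosen for the example, not details of any particular marker or of how a testing laboratory actually measures it.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the number of copies in the longest unbroken run of
    `motif` found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Keep stepping forward one motif at a time while it still matches.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


# Hypothetical father and son: one extra copy of the word has been added.
father = "CCT" + "GATA" * 10 + "TTC"
son = "CCT" + "GATA" * 11 + "TTC"
print(count_tandem_repeats(father, "GATA"))  # 10
print(count_tandem_repeats(son, "GATA"))     # 11
```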

The rate of change for each marker is estimated to be once in every 357 transmissions from father to son. With 43 markers this means one of them will change every 8.3 transmissions on average. The markers actually have different mutation rates and as more data becomes available the rates of change for the different markers may change, but for now 1 in 357 is the best estimate.

The number of repeats can change up or down, and occasionally by more than 1. It could be that due to random changes cancelling each other out, two people who are unrelated finish up with the same profile. Therefore matches are only considered meaningful when there is additional information to link two people. A shared surname is such a piece of additional information. This is why most STR studies are surname studies, though there are some locational ones.

Surnames were introduced around 1200-1300 AD, when feudal estates needed them for record keeping. This is roughly 25-30 generations (transmissions) ago. So if a change in the number of repeats occurs every 8.3 generations we would have 3-4 of them by now, and the chance of the same marker changing twice is low. Of course if you are comparing two people alive today the number of transmissions is roughly double because you need to consider the path from one of them back to their common ancestor, and then back down to the other to determine how many transmissions apart they are.
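The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as above, that all 43 markers share the single average rate of one change per 357 transmissions. The function name is made up for the example.

```python
MARKERS = 43
TRANSMISSIONS_PER_MARKER_CHANGE = 357  # average rate quoted in the text


def expected_changes(transmissions: int) -> float:
    """Average number of marker changes over a number of father-to-son
    transmissions, treating every marker as changing at the same rate."""
    return transmissions * MARKERS / TRANSMISSIONS_PER_MARKER_CHANGE


transmissions_per_change = TRANSMISSIONS_PER_MARKER_CHANGE / MARKERS
print(f"one change every {transmissions_per_change:.1f} transmissions")

for generations in (25, 30):
    one_line = expected_changes(generations)
    # Two living people are separated by the path up to the common
    # ancestor and back down again, so the transmissions double.
    two_people = expected_changes(2 * generations)
    print(f"{generations} generations: {one_line:.1f} changes in one line, "
          f"{two_people:.1f} between two living descendants")
```

These are averages only. Any individual line may have seen more or fewer changes.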

So far so good, but now it gets a bit statistical. A lot of maths goes into calculating the probability of two people having a common ancestor within a certain number of generations back, given the number of differences (i.e. marker changes) between them. Fortunately tables have been published to save having to calculate them. So for example given that my genetic cousin Clive and I have one difference in 43 markers, the tables tell me the probability of our common ancestor being within 25 generations is 98%. For 10 generations (and I am 10 generations from George, my earliest known ancestor) the probability is 68%. See My Genetic Links  for more about my search for this link.
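To give a feel for the kind of calculation involved, here is a rough sketch in Python. It is not the method behind the published tables. It is a simplified model of my own choosing which assumes every marker changes at the same rate of 1 in 357, ignores changes that cancel each other out, and treats every number of generations up to an arbitrary limit as equally likely beforehand. Under those assumptions it gives figures in the same general region as the table values quoted above, but it should not be expected to reproduce them exactly.

```python
from math import comb

MUTATION_RATE = 1 / 357  # per marker per transmission, as quoted in the text
MARKERS = 43


def likelihood(mismatches: int, generations: int) -> float:
    """Chance of seeing exactly `mismatches` differing markers if the
    common ancestor lived `generations` back. Two lines of descent are
    involved, so the number of transmissions is doubled."""
    transmissions = 2 * generations
    p_differs = 1 - (1 - MUTATION_RATE) ** transmissions
    return (comb(MARKERS, mismatches)
            * p_differs ** mismatches
            * (1 - p_differs) ** (MARKERS - mismatches))


def prob_ancestor_within(mismatches: int, limit: int,
                         max_generations: int = 500) -> float:
    """Probability that the common ancestor lived within `limit`
    generations, assuming (hypothetically) that every value from 1 to
    `max_generations` was equally likely before the test."""
    weights = [likelihood(mismatches, g) for g in range(1, max_generations + 1)]
    return sum(weights[:limit]) / sum(weights)


for limit in (10, 25):
    print(f"within {limit} generations: {prob_ancestor_within(1, limit):.0%}")
```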

Next I will talk about the test itself. It is actually very simple. The hardest part is parting with the money (currently $189 US for a 43 marker test). The process is that a participant is sent a couple of cotton buds in the post. He wipes these around inside his mouth for 30 seconds, puts them in the container provided, and posts them back. Results take about 3 weeks.

Although a male Warburton is needed to take the test there are many instances where the genealogist is a female relative. Provided the male is willing to provide his DNA there is no reason why the female genealogist cannot handle all other aspects of participation on his behalf.

Some people might be wary of the idea of giving their DNA for fear of unforeseen consequences. Most concerns are groundless. Firstly test samples are normally kept by DNA Ancestry for 3 years in case further tests (e.g. a SNIP test) are requested, but will be destroyed earlier if requested, so that further, unauthorised testing is not possible. The test itself is far too limited to uniquely identify an individual. After all we are looking to match people, not uniquely differentiate them. Also no medical information can be determined. Remember these tests target the useless junk DNA where changes have no bearing on the person's life, health, or ability to pass on the changed DNA.

However, there is one issue that participants should be aware of. Whilst a test is not detailed enough to prove paternity, it can prove two people are unrelated, which could be a problem if they thought they were. To mitigate this possibility two close relatives should not both participate. The results would not be particularly useful to the project anyway.

Part 4:  Using the Results

The last thing to discuss is how the results can be used. The best way to look at it is to view DNA testing as an additional tool in traditional genealogy. We are trying to add information to help us understand our past better.

The result for an individual participant will consist of a number, typically between 10 and 30, for each of the 43 markers tested. The first step is to group people into clans where all, or nearly all the numbers are the same for each clan member. The clan will probably have a common ancestor who bore the Warburton name. To be included in a clan a person should have no more than 5 mismatches from another clan member. More than that would raise uncertainties, though these might be resolved by more results that provided further links.
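The grouping step could be sketched as follows. This is only an illustration of the rule described above (a person joins a clan if he has no more than 5 mismatches from some existing member). The participant labels and marker values are invented, and the function names are my own.

```python
def count_mismatches(a: list[int], b: list[int]) -> int:
    """Number of markers at which two profiles have different values."""
    return sum(x != y for x, y in zip(a, b))


def group_into_clans(profiles: dict[str, list[int]],
                     max_mismatches: int = 5) -> list[set[str]]:
    """Group participants so that each one is within `max_mismatches`
    of at least one other member of his clan."""
    clans: list[set[str]] = []
    for name, profile in profiles.items():
        merged = {name}
        remaining = []
        for clan in clans:
            if any(count_mismatches(profile, profiles[member]) <= max_mismatches
                   for member in clan):
                merged |= clan  # the newcomer links to this clan
            else:
                remaining.append(clan)
        clans = remaining + [merged]
    return clans


# Invented 43 marker profiles for three hypothetical participants.
base = [14] * 43
profiles = {
    "P1": base,
    "P2": [15] + base[1:],        # one mismatch from P1
    "P3": [20] * 10 + base[10:],  # ten mismatches from P1 and P2
}
print(group_into_clans(profiles))
```

Here P1 and P2 fall into one clan and P3 stands alone. Note that with this rule a new result can join two previously separate clans together, which is the way further results might resolve uncertainties.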

With a lot of results the pattern of the clans would be instructive for Warburton history. For example if 50% or more of us fell into a single clan, with everyone else falling into relatively small clans, this would suggest that a single person adopted the name originally, and most of us are descended from him.

The small clans would result from 'non-paternal events'. These are occasions when a male receives the Warburton name from someone other than his biological father. Adoption and illegitimacy are obvious examples. The rate of such events is apparently about 2% per generation.
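To see how a small rate per generation adds up, the sketch below works out the chance that a male line stays unbroken over 25 or 30 generations. It assumes the 2% figure applies independently and equally in every generation, which is a simplification.

```python
NON_PATERNAL_RATE = 0.02  # per generation, the approximate figure quoted above


def chance_line_unbroken(generations: int) -> float:
    """Probability that no non-paternal event occurs in any of the given
    number of generations, assuming each generation is independent."""
    return (1 - NON_PATERNAL_RATE) ** generations


for generations in (25, 30):
    print(f"{generations} generations: "
          f"{chance_line_unbroken(generations):.0%} chance of an unbroken line")
```

On these assumptions the chance is roughly 60% over 25 generations and 55% over 30.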

Of course the original adoption of the name was a 'non-paternal event'. So except where we have a documented history like in the case of the Warburtons of Arley, we can never be sure of the exact details. An early illegitimacy could start a clan as big as one resulting from an original adoption of the name by a feudal serf.

Just as likely we may find several large clans, suggesting a number of men adopted the name around the same time.

Once we have identified one or more Warburton clans we can explore the clans themselves. Individual clan members are all distant cousins sharing a common ancestor so they will want to find where the link might be. The degree of relatedness will indicate the possibility of finding the link. If the probability of a link since around 1600 AD is high then it is worth looking for the link in parish and other records. Even if the link is earlier it may be possible to show a relationship between two previously unrelated branches.

As an example, I mentioned that there is a 68% or better probability that the common ancestor of my genetic cousin Clive and me is within 10 generations, or post 1600. From my knowledge of my family tree I know that if he is post 1600 then he must be John Warburton who lived 1608-91. He is the only one who had multiple sons (five in fact) who may themselves have had sons. This has focused my search for a link (see My Genetic Links). Even if there is none I could still find a link to one of the 4 or 5 other Warburton families we know were living in Bowdon parish at the same time as my oldest known ancestor.

Indeed  by looking at the earliest known ancestors of individual clan members we may find a clan seems to originate in a particular location. For example Cheshire Warburtons may be distinct from Lancashire ones. Such information may be particularly appealing to overseas Warburtons who want to know more precisely where they originate from.

As well as directing our more traditional genealogical research, study of the clans can employ some new techniques. For example the number of DNA changes between the various clan members will indicate how recently the original clan father lived. Again this employs maths and probabilities, but I think we will be able to develop a gut feel just from looking at the number of changes.

This will be helped further by building family trees of the changes. This is a simplified tree in which the location of changes is deduced as far as possible.

As an example, the one difference between Clive and me is that he has 17 repeats at marker DYS458 and I have 16. We don't know where in the chain from me to our common ancestor X and back to him the change occurred, or whether 16 or 17 was the original value.

Now suppose we had another result from someone we knew was linked to me by common ancestor Y. If that new participant had 16 repeats at DYS458, like me, it would indicate that the change was in the link between Clive and common ancestor Y.

If, however, the new participant has 17 repeats then the change to 16 occurred in the chain from common ancestor Y to me. In this case both common ancestor X and common ancestor Y would have had 17 repeats. Any other clan member with only 16 repeats must be fairly closely related to me. Furthermore, if the new participant has a different change from me,  any future clan member showing the same change would be closely related to him. See the example genetic family tree below.

                Common Ancestor X
                      =17
               |               |
               |               |
               |       Common Ancestor Y
               |              =17
               |          |          |
               |          |          |
               |      # 17->16   @ new change
               |          |          |
             Clive        Me    New Participant
              =17        =16    =17 + new change

=17 or =16   the value of DYS458

#   position of change from 17 to 16 (roughly: it could be anywhere between common ancestor Y and Me)

@   rough position of new change
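The deduction in this example can be written as a small Python function. It is a sketch of the reasoning only, for the three-person case described here, and it assumes the simplest explanation (a single change) is the right one. The function name is made up.

```python
def locate_change(clive: int, me: int, new_participant: int) -> str:
    """Say where a change at one marker most likely happened, given the
    values for Clive, Me and a new participant who shares common
    ancestor Y with Me. Assumes a single change explains the results."""
    if clive == me:
        return "no difference between Clive and Me to locate"
    if new_participant == me:
        return (f"ancestor Y had {me}; the change lies between "
                "Clive and ancestor Y")
    if new_participant == clive:
        return (f"ancestors X and Y had {clive}; the change lies between "
                "ancestor Y and Me")
    return "undetermined: the new participant matches neither value"


print(locate_change(clive=17, me=16, new_participant=17))
print(locate_change(clive=17, me=16, new_participant=16))
```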

As more and more participants are shown to be clan members the genetic family tree would become richer. Combine this with knowledge of the genealogical family tree and you can see how, over time a DNA test will provide more and more information about where a new participant fits in. It is my hope that I might reach this position over the next few years. But I can only achieve that if I get participants for the project.

An STR result will also include a haplotype prediction. This is a prediction of the likely result of a SNIP test. It can be used to understand more about the origins and ancient history of the male line, using one of the books written on the subject.



For a quick introduction to DNA and Genealogy I recommend you look at the DNA Heritage site (the testing site for this project). Firstly there is a Tutorial. You could then look at the Masterclass. You should also review the FAQs. To see the website of a more mature project I recommend the Davenport website. Davenports have similarities with Warburtons, with similar numbers, a Cheshire origin, and a story of Norman ancestry. I have also placed this site in a webring of similar sites. You can access it from the navigator at the top of this page. The International Society of Genetic Genealogy (ISOGG) also has an interesting site.

My own interest in this area grew out of an interest in Ancient History. I began with Ancient Egypt and then began reading about earlier subjects, including evolution and the history of climate. One book I came across was ‘Out of Eden’ by Stephen Oppenheimer. The subject of the book was what I now know to be a new ‘science’ called phylogeography. This combines phylogenetics with traditional archaeology to study the ancient migrations of peoples. The startling conclusion of the book was that all non-Africans in the world are descended from a small group of humans that left Africa 80,000 years ago.

I then came across two books by Professor Bryan Sykes, ‘The Seven Daughters of Eve’, and ‘Adam’s Curse’. These books have very readable discussions of the science, and lots of interesting anecdotes. For example he argues that Thor Heyerdahl’s Kon-Tiki expedition, intended to show that Polynesia could have been populated from South America, was a waste of time because genetics proves that the Polynesians came from China, probably via New Guinea. However, I found the pseudo-life descriptions of the seven European clan mothers a bit contrived.

If you are interested in the genetics and evolution, I also enjoyed ‘The Selfish Gene’ by Richard Dawkins.

Professor Bryan Sykes has set up a testing company called Oxford Ancestors to allow people to get their DNA tested, so I got a test. My mitochondria test was interesting. It is discussed on The DNA Project page. My Y-chromosome test, however, turned out to be an STR test on a limited number of bases. Even so, my deep ancestry was deduced from it. I seem to match the most common Western European type, known as the Atlantic Modal Haplotype (AMH). This is discussed in the DNA Heritage Masterclass.

I then read a book called ‘DNA and Family History’ by Chris Pomery and realised my test result is not terribly useful for genealogy. This is a very useful book and has an associated website (though it doesn’t seem to have been updated much recently).   I had a proper SNIP test done (see The DNA Project).

Recently both Sykes and Oppenheimer have produced books on the origins of the British peoples. Sykes' 'Blood of the Isles' is perhaps the more readable, but Oppenheimer's 'The Origins of the British' is the more detailed work and I have used it as the basis for my comments on the haplotype predictions for Warburton Surname DNA Project participants.


DNA Concerns

After reviewing all the available material you may still have concerns. Typical concerns  include cost, and fear or ignorance of what might be revealed.

Cost is unavoidable, though at under £100 sterling it is not unmanageable. Cost can be shared. Close family members (brothers, cousins, even second cousins) will yield the same or similar result, so only one test is required per extended family. Interested members of a family, including females, who are often the most committed genealogists, can share the cost.

Fear and ignorance covers fear of the test itself, and fear of what might be uncovered. The test itself should be no barrier. It is simple, painless and self administered. You are simply required to take a swab from the inside of your cheek.

Two concerns about the results are that the test is medically informative, or that it can identify someone as an individual. Neither is an issue. The bits of the Y-chromosome DNA tested are not part of any genes and contain no medical information. Thought of logically, a DNA change that affected health would be less likely to survive, and so would have no use for genealogy. Also the tests are looking at a small number of DNA sequences that can be expected to change only every 12 generations, so they will clearly not be unique for any individual.

The test indicates whether two or more people have a common ancestor within a given timeframe. Of course this might show that two individuals are unrelated, which might be a concern if they thought they were. This is why I recommend that two people who know they are related more closely than 3rd cousins do not both take the test.