graphic courtesy of DNA Heritage
The text of this section is included in the
PDF document called The Warburton Surname DNA Project.
The following is my attempt to provide a simple
explanation of DNA and how it can be used in genealogy. It is my understanding
of the subject and is meant as an introduction. The books and sites in the
reference section that follows provide a more professional and authoritative
Part 1: What is DNA?
Human DNA is the blueprint for the human body. A copy
is carried in every cell of our bodies. It is a set of instructions, carried in a
string of molecules. Think of it as a string of letters. There are 4 possible
letters - A, C, G and T (these are the initials of the chemical molecules that
are represented by the letters).
This string of letters is divided up into chromosomes
(so called because scientists use coloured dyes to identify them). Chromosomes
in turn contain genes. A gene could be defined as the shortest string of letters
that actually does something useful in our development. However between the
genes are strings of useless or junk DNA that do nothing (maybe they did once in
earlier stages of evolution). These useless strings of letters are important for
genealogy but I will come back to that later.
The human genome (i.e. our DNA) consists of 46
chromosomes, or rather 23 pairs. We inherit one set of 23 from our father and
one from our mother. When we come to pass on 23 chromosomes to our children we
pass a mixture, some from our father and some from our mother. The mixture is
different every time.
One of the pairs of chromosomes is the X and
Y-chromosomes. A female has 2 X-chromosomes, one from each parent. However a
male has an X-chromosome from his mother and a Y-chromosome from his father. It
is a gene on the Y-chromosome that causes a baby to be a boy. In the absence of
this gene the default is always to produce a girl.
The significance of this is that (unlike all the other
chromosomes) the Y-chromosome is never mixed with a copy from the mother. It
passes unchanged from father to son through the generations.
Now the body is very good at faithfully copying DNA
from generation to generation, but it is not perfect (otherwise evolution
wouldn't work). Very occasionally a copying mistake occurs. For example an A may
become a T. If it happens in a gene it may cause disease, or rarely it improves
the gene. But if it happens in junk DNA it has no effect and so the mistake
continues to be copied from generation to generation. It is these differences
that make DNA useful in historical and genealogical studies.
There is one other piece of DNA that is passed
unchanged from generation to generation. It is in addition to the 46 chromosomes
and sits in the mitochondria, which act as the energy source for a cell. It is
called mitochondrial DNA (I will simply say mitochondria) and is only
passed down the female line. Males do have it, inherited from their mother, but
don't pass it on.
Part 2: Genetic Clans
This section is not specific to genealogy. That will
come later. Firstly I want to introduce SNIPs.
As I have said, very occasionally a DNA letter is
copied wrongly e.g. an A becomes a T. This is known as a Single Nucleotide
Polymorphism or SNP or SNIP. So a SNIP is when a letter is copied wrongly.
SNIPs occur very rarely. A specific letter may have
only changed once in the whole of modern mankind's existence (150,000-180,000
years). By concentrating on certain specific sequences of junk DNA, researchers
have classified the human population into a small number of classes. Of course
they have a special name for them, i.e. haplogroups, though it is easier to call
them clans.
They have done this with both mitochondria, and the
Y-chromosome. For mitochondria there are 36 clans, 13 in Africa. There are even
fewer Y-chromosome clans.
95% of Europeans fall into 7 mitochondrial clans.
Professor Bryan Sykes has written a book called 'The Seven Daughters of Eve'
which gives these 7 clans names. Your clan can be determined by looking at a
specific 400 letter sequence of mitochondrial DNA. Just 2 copying mistakes, at
positions 69 and 126, define me as being in clan J or Jasmine. I have only one
other mistake, at 366, which means I am in the main group of Js, not in one of a
number of sub-clans that have been defined.
Researchers do look at other bits of the mitochondria
for more definition of the clans, but the 400 letters are all that is needed to
How is this information used? Well firstly it is used
to build family trees (again they have a fancy name - phylogenetic trees). These
trees show the relationships between the various clans, and within them. It is
done by working out the sequence in which SNIPs must have occurred. For example my SNIP
at position 126 is shared with another clan (Tara) but that at 69 is unique to J
so must have occurred later.
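The ordering argument can be sketched in a few lines of Python. This is only an illustration: the positions are the ones quoted above, and Tara's other defining mutations are left out because they are not given here.

```python
# Mutations (positions in the 400-letter sequence) carried by each clan.
# Only the positions mentioned in the text are listed, so Tara's set is
# deliberately incomplete.
clan_mutations = {
    "J": {69, 126},
    "Tara": {126},
}

# A mutation shared by both clans must have happened before they split.
shared = clan_mutations["J"] & clan_mutations["Tara"]

# A mutation found only in J must have happened after the split.
only_in_j = clan_mutations["J"] - clan_mutations["Tara"]

print("earlier (shared with Tara):", sorted(shared))
print("later (J only):", sorted(only_in_j))
```

Real phylogenetic trees are built from many more positions and many more clans, but the principle is the same: shared changes are older than unshared ones.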
Perhaps the most startling assertion, though it is
logical when you think about it, is that each mitochondrial clan must be
descended from a single woman. The copying error occurred just once, so everyone
carrying the error must be descended from the first woman to carry the error.
Not only that, but by linking the clans in a tree, one clan becomes the source
of all the others. Not unnaturally the origin of this clan is called Eve. This
doesn't mean that Eve was not one of a population of similar early humans. It's
just that no descendants of her contemporaries exist today.
Interestingly, if the same thing is done with the
Y-chromosome, 'Y-chromosome Adam' seems to have lived a lot later than Eve
(60-80,000 years ago, rather than 150,000 years ago). Of course there must have
been a male (or males) around 150,000 years ago to perpetuate the species, and one
of them (pre-Adam) would have passed his Y-chromosome to the Adam of 80,000
years ago. The thing is that the Y-chromosomes of all pre-Adam's contemporary
males have got lost somewhere along the way, so only Adam's Y-chromosome is
the precursor of all Y-chromosomes in today's population.
It seems to be a phenomenon of the Y-chromosome that a few powerful men spread their seed very widely at the expense of other men (sometimes called the Genghis Khan effect), while the female mitochondria are spread much more evenly. This probably explains the difference in dates.
It should be remembered that these dates are very approximate. It should also be remembered that mitochondria and the Y-chromosome form a small part of the DNA we inherit from a wide range of ancestors (mother's father, father's mother etc.). It's just that they are unique in being able to be traced, and I suppose the totally male and totally female lines of descent are in themselves unique.
I've not really looked into it but I believe there is another type of DNA study that tries to address these other lines of descent by determining if you have versions of specific genes which are typical of a particular ethnic background. It is obvious how genes that affect physical appearance (skin colour etc.) could be used in this way, and there are apparently other non-visible ones that can be used in the same way.
Returning to mitochondria and the Y-chromosome,
researchers can determine the migrations of man around the world by looking at
the distribution of the clans in the world, and the amount of change that has
happened in various locations. Of course this has to be linked to the
archaeological evidence to be meaningful. Also by building separate models using
mitochondria and the Y-chromosome two separate, corroborating pictures are
Studies of this can get extremely complicated, and employ lots of probability calculations etc. The results though can be fascinating. For example my mitochondria, and my Y-chromosome have completely different histories. Mitochondrial Clan J originated in the Near East and only entered Europe after the last Ice Age with the first wave of farmers. It followed 2 tracks, one of which followed the coast around the Mediterranean, and eventually up to Britain. My Y-chromosome came with the first migrants to come to Europe 40,000 years ago (the Aurignacian culture). They spent the last Ice Age in a refuge near the Pyrenees, where the last mutations took place, and then moved back into Europe as the ice retreated. It is by far the most common clan in Western Europe, particularly along the western seaboard. My results are described in Commentary on Results in The DNA Project.
So to summarise, SNIPs, or letter copying errors, are
sufficiently infrequent to allow the world population to be classified according
to which SNIPs a person has. This is done both for mitochondria, which is passed
down the female line, though present in men, and the Y-chromosome, which is
passed down the male line and is not present in women. By working out sequences
of change the clans can be linked into a family tree, and by looking at the
distribution of clans, the amount of subsequent changes in the various
localities, and at corroborating archaeological evidence the history of man's
migration out of Africa and around the world can be determined.
Part 3: A DNA Test for Genealogists.
So far we have discussed SNIPs. Unfortunately these
occur too infrequently to help with genealogy. Every related male Warburton will
probably have an identical profile. Fortunately there is another test which is
more helpful, but it can only be carried out on the Y-chromosome.
It so happens that there are some short DNA sequences
that are repeated several times. Whereas with SNIPs we were dealing with a
change to a single letter in the DNA sequence, you can think of these sequences
as words that are repeated several times. Every now and then the number of
copies of the word changes. For example one may be added so whereas there were
10 repeats before, there are now 11.
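To make the idea of a repeated word concrete, here is a small Python sketch that counts the repeats. The sequences and the word GATA are made up for illustration, and the function name is my own invention, not part of any testing company's software.

```python
import re


def longest_repeat_run(sequence: str, word: str) -> int:
    """Return the largest number of back-to-back copies of `word` in `sequence`."""
    # Find every unbroken run of the word, then measure each run in words.
    runs = re.findall(f"(?:{re.escape(word)})+", sequence)
    return max((len(run) // len(word) for run in runs), default=0)


# Made-up sequences: the word GATA repeated 10 times in the father and,
# after one copy has been added, 11 times in the son.
father = "CCTA" + "GATA" * 10 + "TTGC"
son = "CCTA" + "GATA" * 11 + "TTGC"

print("father:", longest_repeat_run(father, "GATA"))
print("son:", longest_repeat_run(son, "GATA"))
```

The number reported for a marker in a test result is a repeat count of this kind.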
These strings of words are called Short Tandem Repeats
(STRs), so a test for them is an STR test. There are a number of locations where
they occur on the Y-chromosome. The test I am using for the Warburton project
tests 43 of them. They are known as markers, hence the term 43 marker test. Some
tests use as few as 10 markers, though they have limited use.
The rate of change for each marker is estimated to be
once in every 357 transmissions from father to son. With 43 markers this means
one of them will change every 8.3 transmissions on average. The markers actually have different mutation rates and as more data
becomes available the rates of change for the different markers may change, but
for now 1 in 357 is the best estimate.
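The arithmetic behind the 8.3 figure can be checked with a few lines of Python. The sketch assumes, as a simplification, that all 43 markers change independently and at the same 1 in 357 rate.

```python
MARKERS = 43
TRANSMISSIONS_PER_CHANGE = 357  # per marker, the estimate quoted above

# Average number of transmissions between changes across the whole test.
transmissions_per_change = TRANSMISSIONS_PER_CHANGE / MARKERS

# Chance that a son's 43-marker profile is identical to his father's,
# i.e. that none of the 43 markers changes in a single transmission.
prob_identical = (1 - 1 / TRANSMISSIONS_PER_CHANGE) ** MARKERS

print(f"one change every {transmissions_per_change:.1f} transmissions")
print(f"chance of no change in one transmission: {prob_identical:.1%}")
```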
The number of repeats can change up or down, and
occasionally by more than 1. It could be that due to random changes cancelling
each other out, two people who are unrelated finish up with the same profile.
Therefore matches are only considered meaningful when there is additional
information to link two people. A shared surname is such a piece of additional
information. This is why most STR studies are surname studies, though there are
some locational ones.
Surnames were introduced around 1200-1300 AD, when
feudal estates needed them for record keeping. This is roughly 25-30 generations
(transmissions) ago. So if a change in the number of repeats occurs every 8.3
generations we would have 3-4 of them by now, and the chance of the same marker
changing twice is low. Of course if you are comparing two people alive today the
number of transmissions is roughly double because you need to consider the path
from one of them back to their common ancestor, and then back down to the other
to determine how many transmissions apart they are.
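As a rough check on these numbers, the sketch below works out the expected number of changes down a single line since surnames began, and between two people alive today. It uses the same 1 in 357 estimate and treats it as an average only; the function name is mine.

```python
def expected_changes(transmissions, markers=43, transmissions_per_change=357):
    """Average number of marker changes over a given number of transmissions."""
    return transmissions * markers / transmissions_per_change


for generations in (25, 30):
    down_one_line = expected_changes(generations)
    # Two living people are separated by the path up to the common ancestor
    # and back down again, so the number of transmissions doubles.
    between_two_people = expected_changes(2 * generations)
    print(
        f"{generations} generations: about {down_one_line:.1f} changes down one line, "
        f"about {between_two_people:.1f} between two living descendants"
    )
```

These are averages. Any particular pair of cousins could easily show fewer or more changes.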
So far so good, but now it gets a bit statistical. A
lot of maths goes into calculating the probability of two people having a common
ancestor within a certain number of generations back, given the number of
differences (i.e. marker changes) between them. Fortunately tables have been
published to save having to calculate them. So for example given that my genetic
cousin Clive and I have one difference in 43 markers, the tables tell me the
probability of our common ancestor being within 25 generations is 98%. For 10
generations (and I am 10 generations from George, my earliest known ancestor)
the probability is 68%. See My
Genetic Links for more about my search for this link.
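The published tables are not reproduced here, but a rough sketch shows the kind of calculation that lies behind them. The assumptions are mine and may not be the ones the tables use: every marker changes at 1 in 357 per transmission, no marker changes twice or changes back, and beforehand every number of generations is treated as equally likely. Under those assumptions the answers come out close to the figures quoted above.

```python
import math


def prob_common_ancestor_within(generations, differences, markers=43,
                                per_marker_rate=1 / 357):
    """Rough probability that the common ancestor lived within `generations`.

    Assumes the number of differences is a Poisson count of changes along the
    2 * generations transmissions separating two living people, with a flat
    prior on the number of generations.
    """
    # Expected changes between the two people for this many generations.
    x = 2 * generations * markers * per_marker_rate
    # With a flat prior the posterior is a gamma distribution with shape
    # differences + 1, whose cumulative probability has this closed form.
    tail = sum(x ** i / math.factorial(i) for i in range(differences + 1))
    return 1 - math.exp(-x) * tail


for generations in (10, 25):
    p = prob_common_ancestor_within(generations, differences=1)
    print(f"within {generations} generations: {p:.0%}")
```

The small gap between this sketch and the table figure for 10 generations is a reminder that the published tables may rest on a somewhat different model.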
Next I will talk about the test itself. It is actually
very simple. The hardest part is parting with the money (currently $189 US for a
43 marker test). The process is that a participant is sent a couple of cotton
buds in the post. He wipes these around inside his mouth for 30 seconds, puts
them in the container provided, and posts them back. Results take about 3 weeks.
Although a male Warburton is needed to take the test,
there are many instances where the genealogist is a female relative.
Provided the male is willing to provide his DNA there is no reason why
the female genealogist cannot handle all other aspects of participation on his
behalf.
Some people might be wary of the idea of giving their
DNA for fear of unforeseen consequences. Most concerns are groundless. Firstly test samples are normally kept by DNA Ancestry for 3 years in
case further tests (e.g. a SNIP test) are requested, but will be destroyed
earlier if requested, so that further, unauthorised testing is not possible.
The test itself is far too limited to uniquely identify an individual. After all
we are looking to match people, not uniquely differentiate them. Also no medical
information can be determined. Remember these tests target the useless junk DNA
where changes have no bearing on the person's life, health, or ability to pass on
the changed DNA.
However, there is one issue that participants should
be aware of. Whilst a test is not detailed enough to prove paternity, it can
prove two people are unrelated, which could be a problem if they thought they
were. To mitigate this possibility two close relatives should not both
participate. The results would not be particularly useful to the project anyway.
Part 4: Using
The last thing to discuss is how the results can be
used. The best way to look at it is to view DNA testing as an additional tool in
traditional genealogy. We are trying to add information to help us understand
our past better.
The result for an individual
participant will consist of a number, typically between 10 and 30, for each of
the 43 markers tested. The first step is to group people into clans where all,
or nearly all the numbers are the same for each clan member. The clan will
probably have a common ancestor who bore the Warburton name. To be included in a
clan a person should have no more than 5 mismatches from another clan member.
More than that would raise uncertainties, though these might be resolved by more
results that provided further links.
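The grouping rule can be sketched in Python as follows. The profiles are invented for illustration, the function names are my own, and a mismatch is counted simply as a marker where the two numbers differ.

```python
def mismatches(profile_a, profile_b):
    """Count the markers on which two profiles show different numbers."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def group_into_clans(results, max_mismatches=5):
    """Put a person in a clan if he is within the limit of ANY existing member."""
    clans = []
    for name, profile in results.items():
        # Every existing clan this person links to is merged into one.
        linked = [clan for clan in clans
                  if any(mismatches(profile, results[member]) <= max_mismatches
                         for member in clan)]
        merged = [name]
        for clan in linked:
            merged.extend(clan)
            clans.remove(clan)
        clans.append(merged)
    return clans


# Invented 43-marker results for four hypothetical participants.
base = [14] * 43
results = {
    "A": base,
    "B": [15] + base[1:],        # 1 mismatch from A
    "C": base[:-4] + [13] * 4,   # 4 mismatches from A, 5 from B
    "D": [20] * 43,              # differs from everyone on every marker
}

for clan in group_into_clans(results):
    print(sorted(clan))
```

Because membership only needs a link to one other member, a later result can join two groups that previously looked separate, which is the point made above about further results resolving uncertainties.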
With a lot of results the pattern of the clans would
be instructive for Warburton history. For example if 50% or more of us fell into
a single clan, with everyone else falling into relatively small clans, this
would suggest that a single person adopted the name originally, and most of us
are descended from him.
The small clans would result from 'non-paternal
events'. These are occasions when a male receives the Warburton name from
someone other than his biological father. Adoption and illegitimacy are obvious
examples. The rate of such events is apparently about 2% per generation.
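To get a feel for what 2% per generation adds up to over the life of a surname, the sketch below works out the chance that a male line has had no such event at all. It assumes, purely as a simplification, that the rate is constant and that each generation is independent of the others.

```python
def prob_unbroken_line(generations, rate_per_generation=0.02):
    """Chance that no non-paternal event occurred in any generation."""
    return (1 - rate_per_generation) ** generations


for generations in (25, 30):
    print(f"{generations} generations: "
          f"{prob_unbroken_line(generations):.0%} chance of an unbroken line")
```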
Of course the original adoption of the name was a
'non-paternal event'. So except where we have a documented history like in the
case of the Warburtons of Arley, we can never be sure of the exact details. An
early illegitimacy could start a clan as big as one resulting from an original
adoption of the name by a feudal serf.
Just as likely we may find several large clans,
suggesting a number of men adopted the name around the same time.
Once we have identified one or more Warburton clans we
can explore the clans themselves. Individual clan members are all distant
cousins sharing a common ancestor so they will want to find where the link might
be. The degree of relatedness will indicate the possibility of finding the link.
If the probability of a link since around 1600 AD is high then it is worth
looking for the link in parish and other records. Even if the link is earlier it
may be possible to show a relationship between two previously unrelated
As an example, I mentioned there is a 68% or better
possibility that the common ancestor of my genetic cousin Clive and me is within
10 generations, or post 1600. From
my knowledge of my family tree I know that if he is post 1600 then he must be
John Warburton who lived 1608-91. He is the only one who had multiple sons (five
in fact) who may themselves have had sons. This has
focused my search for a link (see My
Genetic Links ).
Even if there is none I could still find a link to one of the 4 or 5
other Warburton families we know were living in Bowdon parish at the same
time as my oldest known ancestor.
By looking at the earliest known ancestors of individual clan members we may find a
clan seems to originate in a particular location. For example Cheshire
Warburtons may be distinct from Lancashire ones. Such information may be
particularly appealing to overseas Warburtons who want to know more precisely
where they originate from.
As well as directing our more traditional genealogical
research, study of the clans can employ some new techniques. For example the
number of DNA changes between the various clan members will indicate how
recently the original clan father lived. Again this employs maths and
probabilities, but I think we will be able to develop a gut feel just from
looking at the number of changes.
This will be helped further by building family trees
of the changes. This is a simplified tree in which the location of changes is
deduced as far as possible.
As an example, the one difference between Clive and me
is that he has 17 repeats at marker DYS458 and I have 16. We don't know where in
the chain from me to our common ancestor X and back to him the change occurred,
or whether 16 or 17 was the original value.
Now suppose we had another result from someone we knew
was linked to me by common ancestor Y. If that new participant had 16 repeats at
DYS458, like me it would indicate that the change was in the link between Clive
and common ancestor Y.
If, however, the new participant has 17 repeats then
the change to 16 occurred in the chain from common ancestor Y to me. In this
case both common ancestor X and common ancestor Y would have had 17 repeats. Any
other clan member with only 16 repeats must be fairly closely related to me.
Furthermore, if the new participant has a different change from me,
any future clan member showing the same change would be closely related
to him. See the example genetic family tree below.
        Common Ancestor X
                |
        Common Ancestor Y
          |    =17     |
          #  17->16    @
          |            |
          Me      New Participant
                  + new change

=17 or =16   the value of DYS458
#            position of change from 17 to 16 (roughly: it could be anywhere between common ancestor Y and me)
@            rough position of new change
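The reasoning about where the DYS458 change sits can be written as a small Python function. It is a sketch only: the function name is my own, it looks for the explanation needing the fewest changes, and it ignores the possibility of a marker changing and then changing back.

```python
def locate_change(me, clive, new_participant):
    """Say where a single change in a marker most likely occurred.

    `new_participant` is linked to me by common ancestor Y, who lies on my
    line below common ancestor X (the ancestor I share with Clive).
    """
    if me == clive:
        return "no difference between Clive and me to place"
    if new_participant == me:
        # Ancestor Y most likely had my value, so the change is on the
        # rest of the path, through ancestor X and down to Clive.
        return "between Clive and common ancestor Y"
    if new_participant == clive:
        # Ancestors X and Y most likely had Clive's value, so the change
        # happened on my own line below ancestor Y.
        return "between common ancestor Y and me"
    return "cannot be placed from this result alone"


print(locate_change(me=16, clive=17, new_participant=16))
print(locate_change(me=16, clive=17, new_participant=17))
```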
As more and more participants are shown to be clan
members the genetic family tree would become richer. Combine this with knowledge
of the genealogical family tree and you can see how, over time a DNA test will
provide more and more information about where a new participant fits in. It
is my hope that I might reach this position over the next few years. But I can
only achieve that if I get participants for the project.
An STR result will also include a haplogroup
prediction. This is a prediction of the likely result of a SNIP test. It can be
used to understand more about the origins and ancient history of the male line,
using one of the books written on the subject.
For a quick introduction to DNA and Genealogy I
recommend you look at the DNA Heritage site (the testing site for this
project). Firstly there is a Tutorial.
You could then look at the Masterclass.
You should also review the FAQs.
To see the website of a more mature project I recommend the Davenport
website. Davenports have similarities with Warburtons, with similar numbers,
a Cheshire origin, and a story of Norman ancestry. I have also placed this site
in a webring of similar sites. You can access it from the navigator at the top
of this page. The International Society of Genetic Genealogy (ISOGG) also have
an interesting site.
My own interest in this area grew out of an interest
in Ancient History. I began with Ancient Egypt and then began reading about
earlier subjects, including evolution and the history of climate. One book I
came across was ‘Out of Eden’ by Stephen Oppenheimer. The subject of the book
was what I now know to be a new ‘science’ called phylogeography.
This combines phylogenetics with traditional archaeology to study the
ancient migrations of peoples. The startling conclusion of the book was that all
non-Africans in the world are descended from a small group of humans that left
Africa 80,000 years ago.
I then came across two books by Professor Bryan Sykes,
‘The Seven Daughters of Eve’ and ‘Adam’s Curse’. These books have very
readable discussions of the science, and lots of interesting anecdotes. For
example he shows how Thor Heyerdahl’s Kon-Tiki expedition, which set out to show
that Polynesia could have been populated from South America, was a waste of time because
genetics proves that the Polynesians came from China, probably via New Guinea.
However, I found the pseudo-life descriptions of the seven European clan mothers
a bit contrived.
If you are interested in genetics and evolution, I
also enjoyed ‘The
Selfish Gene’ by Richard Dawkins.
Professor Bryan Sykes has set up a testing company
Ancestors to allow people to get their DNA tested, so I got a test. My
mitochondria test was interesting. It is discussed on The
DNA Project page. However, my Y-chromosome test turned out to be an STR test
on a limited number of bases. Even so, my deep ancestry was deduced from this. I
seem to match the most common Western European type, known as the Atlantic Modal
Haplotype (AMH). This is discussed in the DNA Heritage Masterclass.
I then read a book called ‘DNA
and Family History’ by Chris Pomery and realised my test result is not
terribly useful for genealogy. This is a very useful book and has an associated website
(though it doesn’t seem to have been updated much recently). I had a
proper SNIP test done (see The DNA Project).
Recently both Sykes and Oppenheimer have produced books
on the origins of the British peoples. Sykes' 'Blood
of the Isles' is perhaps the more readable, but Oppenheimer's 'The
Origins of the British' is the more detailed work and I have used it as the
basis for my comments on the haplogroup predictions for Warburton Surname DNA
After reviewing all the available material you may
still have concerns. Typical concerns include cost, and fear or ignorance
of what might be revealed.
Cost is unavoidable, though at under £100 sterling it
is not unmanageable. Cost can be shared. Close family members (brothers,
cousins, even second cousins) will yield the same or similar result, so only one
test is required per extended family. Interested members of a family, including
females, who are often the most committed genealogists, can share the cost.
Fear and ignorance covers fear of the test itself, and
fear of what might be uncovered. The test itself should be no
barrier. It is simple, painless and self-administered. You are simply required
to take a swab from the inside of your cheek.
Two concerns about the results are that the test is medically informative, or it can
identify someone as an individual. Neither is an issue. The bits of
the Y-chromosome DNA tested are not
part of any genes and contain no medical information. Thought of logically a DNA
change that affected health would be less likely to survive, and so would have
no use for genealogy. Also the tests are looking at a small number of DNA
sequences that can be expected to change only every 12 generations, so they will
clearly not be unique for any individual.
The test indicates whether two or more people have a
common ancestor within a given timeframe. Of course this might show that two
individuals are unrelated, which might be a concern if they thought they were. This is why I recommend that two people who know they are related more closely than
3rd cousins do not both take the test.