Thursday, 12 June 2014

Introduction to Family Research

For much of my adult life I have had a curiosity about my ancestry.  This curiosity was based mainly upon family anecdotes that I had been told, or which I had overheard.  They provided enough clues to stimulate my interest but they did not, even in the remotest sense, constitute an account of my origins, or those of my family.  Mostly, these anecdotes lacked specific detail on timing, place or social context, but it is worth repeating some of them to give some understanding of the basis for my curiosity before I undertook serious family research.
I knew that both my grandfathers, Frederick Fox and Thomas Harriman, had been farmers but had ended their working lives as farm labourers.  Frederick was reputed to have lost his farm due to gambling and Thomas’ fall from grace was said to be due to alcohol addiction.  I also knew that my maternal grandmother, Mary Jane, was a first cousin of Thomas and that she also came from a farming family.  My other grandmother, Jane (nee McAlpine) was a Scottish lady who was reputed to be related to Sir Robert McAlpine, the founder of the construction company of the same name. 
These stories were in the back of my mind for decades as I pursued my career, first as an academic, teaching and researching in genetics and then as a university manager, before finally entering commercial life as the Chief Executive of Southampton Science Park Ltd.  During this varied life in work I learned many skills which would stand me in good stead once I found the resources and the inclination to undertake serious research into my family background.  That opportunity arose after my retirement in 2007.  Then I had the time to spare and the financial means to pursue the study, which quickly became a demanding and consuming hobby.  Also, family research had received a substantial boost from the increasing availability of on-line sources of information and many lines of enquiry could thus be conducted at home from my PC. 
How much truth was there in the family anecdotes and what else would I find out about my past along the way?  I did not start my study by reading “how to” books on genealogy but simply allowed myself to be seduced by the offer of a period of free use of the Ancestry.co.uk website and set about finding my grandparents in the 1901 Census.  This proved to be straightforward and within a few minutes I became hooked on family research, as many others have been, due to an addictive desire to know ever more about my origins and about my ancestors and their lives.
At this point I will digress to the present.  After almost five years of research I know a lot more about my antecedents and the forebears of my wife and my son-in-law.  I now have almost 4500 individuals (all but a few being relatives of my wider family) on my genealogical database.  The process of acquiring this information has also had the effect of stimulating the evolution of a personal philosophy of family research.  This is how I now view my activities.
The terms “family research”, “family history” and “genealogy” are used, loosely, as being interchangeable.  There does not seem to be any generally accepted set of definitions for the terms, which would allow us to say if they are exact equivalents.  “Genealogy” seems to be the least contentious.  The OED defines the term as “the study and tracing of lines of descent.”  This seems to imply a circumscribed discipline in which, once a line of descent from an ancestor has been identified with high probability, the study is essentially at an end.  In the initial stages of a study into one’s ancestors, establishing such relationships is both revealing and fascinating.  Learning that I emerged from generations of farmers in the East Riding of Yorkshire and North Lincolnshire was initially exciting for me but I now find that adding another name and another generation to these lineages is only mildly stimulating, since one can usually know little about the newly discovered ancestor, except name, year and place of birth.  Now I find it more important to have some understanding of relatives’ lives than simply to know who they were.
Before examining the terms “family research” and “family history” let us first look at how we might define “family”.  A narrow definition focuses on there being a genetic relationship between the members of a family and typically this includes parents and children, but occasionally encompasses other close relatives, such as grandparents, cousins, aunts and uncles.  More broadly, “family” can be defined to include everyone living together in a household.  This will typically include parents and children and perhaps other genetic relatives but may also include various categories of non-genetic family members, such as adoptees, new spouses and the children of new spouses from previous relationships. 
When used in a medical context, the term “family history” has a narrow definition, meaning the recording of family structure and relationships, including information about diseases in family members and typically going back about three generations.  This is an aid to identifying the presence of disease with a genetic causation.    Outwith a narrow, medical context, “family history” also has a wider definition encompassing all family matters, not just those dealing with health and disease.
So, what are the essential elements of my philosophy of “family research”, the term I shall use as shorthand to describe the framework of my activities?  Firstly, genetic relatedness, ultimately to me, is a predominant but not an exclusive property linking the people that I study.  This includes people who are my direct relatives, going forward as well as backwards in time, but also people who are collateral relatives, such as uncles and aunts, and the spouses of direct and collateral relatives.  Close collateral relatives are not very interesting if they have left little trace, but more distant collateral relatives are interesting if they have achievements in their lives.  Secondly, I study the relationships, activities and interactions within families, in the broadest sense, to whom I am linked genetically.  But families are not static entities.  They evolve, mainly through marriage, birth and death, and my family research covers this evolutionary process too.  Also, families do not exist in a vacuum, insulated from interactions with their neighbours, employers and landlords, and my research takes frequent forays into these external relationships of the family.  On a grander scale, families and the individuals within them live within societies which are themselves evolving, buffeted by social and technological change and fashioned by national and international events such as wars and natural disasters.  From time to time consideration of cultural, social and international events is essential in interpreting what happened within families, for example relatives who moved from the country to burgeoning towns as a consequence of the industrial revolution, or relatives caught up in WWI and WWII.
It is also important to say something about the research methods that I have employed.  When I started out on this journey I did so without preconceptions and with a frankly casual attitude to the collection, evaluation and recording of data.  It did not take long for me to realise that family research, as I have chosen to define it, is a serious academic discipline, which needs to adhere to certain principles.  In my opinion they include the following. 
Recording information.  Since the strongest theme running through my family research is that of genetic linkage, the employment of genealogical software to store these fundamentals is indispensable.  I use PAF5, which is freely available from the Latter Day Saints (LDS – the Mormon Church).  Not only does PAF5 allow you to record the fundamental statistics of people’s lives, such as date of birth/christening, marriage, death and burial, but it also allows you to record genetic linkages and to navigate with ease through this complex network of interrelationships.
Information sources.  As important as recording information is the need to record the source of a piece of information so that in the future you, or anyone else, can check the information and evaluate any conclusions based upon it.
Time sequence.  All information on an individual should be recorded in a time sequence.  The “Notes” section of an individual’s record on PAF5 can be used for this purpose.  This should include dates and places of births, marriages and deaths, census records, will and probate records, press mentions, etc.  Getting an insight into someone’s life is much easier when data are organised in such a time sequence.  Indeed, this is the start of a biography of the person, though that biography will prove to be sketchy to the point of triviality for most relatives born before 1750.
Understanding the nature of data.  Registration of a birth will usually only tell you when and where a birth was registered, not when and where the birth occurred.  Similarly, a christening would usually have occurred shortly after birth and in the same village, or one nearby, but there may only be a loose relationship between date and place of christening and date and place of birth.  It is important not to make unwarranted assumptions in evaluating such data.
Limitations of data sources.  Conventional family research is only possible once written records became routinely available.  The Subsidy Rolls of Edward I, used to raise taxes for his wars in Scotland and Wales, hardly meet that description.  The same is true of the Poll Tax records, which first date from 1377 in the reign of Edward III.  This was a tax levied at the rate of 4d on all adults to pay for military excursions into France.  The first records which were really useful were parish records of baptisms, marriages and burials.  These records were first introduced by the Catholic Church in Europe in the 15th century but were not introduced in Britain until 1538 in the reign of Henry VIII, after the split with Rome.  Thomas Cromwell, Henry’s Vicar General, ordered that each parish priest must keep these records.  However, this instruction was poorly observed and many records, in any case, were subsequently lost.  Statutory registration of births, marriages and deaths was introduced in England and Wales in 1837 but not in Scotland until 1855.  A census of the whole population was first carried out in 1801 and has been repeated every 10 years, except in 1941.  Sadly, the 1931 records for England and Wales were destroyed by fire.  The censuses for 1801 to 1831 were statistical in nature and did not give information on individuals or households, so it is only from 1841 that we are able to get a ten-yearly snapshot of family relationships.
In gathering data from official records, primary sources, eg microfiches of parish records or scans of census records, are superior to secondary sources, such as transcripts of parish records and census returns, especially machine-mediated transcripts.  Comparison of equivalent primary and secondary sources will show quite quickly how often transcription, by either man or machine, can introduce errors.
But primary sources can also contain errors, especially when data are collected orally and then rendered in writing, eg a census enumerator who was told that a child was called “Tom” but rendered the child’s name as “John” because he misheard what was said.  One of the weakest sources of secondary information is a source, such as a published family tree or an LDS record submitted by an individual (see original Family Search database), which presents “facts” on family relationships, eg parentage, but without supporting data sources.  Such “facts” should be treated as “opinions” until they can be independently verified.
Identifying distant relatives.  It is frequently the case that in identifying relatives before 1800 we are reduced to using only parish records and identification may then rely on someone having the right name, being born in a plausible time bracket and not far from the place of birth of his or her children.  But this is a dangerous practice, especially where a surname is frequent in a given locality, eg Harrison in the East Riding of Yorkshire.  The most plausible birth is not necessarily that of your ancestor.  Parish records are notoriously incomplete and your relative may have been christened or married in a parish whose records have not survived, eg in a non-conformist church.  Ideally you need to arrive at the same attribution by two independent routes.  If this is not possible, then I usually stop chasing a line back because the effort may be wasted.  It is important to remember that an error will normally result in all subsequent, more distant ancestors being wrongly identified too.  I find that potential ancestors for whom I do not have a high level of confidence of correct attribution are simply not interesting.
Can any facts be relied upon to be true?  The answer to this question, in an absolute sense, is “no”.  A brief consideration of any “fact” will soon uncover potential sources of error.  Attributed fatherhood may be unreliable because of hidden liaisons by the mother.  Attributed motherhood may be more reliable but even this apparent “fact” can be in error, eg from accidental baby-switching in a hospital setting.  But if we can never be 100% sure of our facts, how can we proceed with family research, or indeed any kind of research?  The answer is that we try to reduce the margin for error to acceptable levels.  Probability is usually expressed as a percentage or as a decimal fraction.  The probability that a particular statement is true may, in theory, range from 100%, ie it is certainly true, to 0%, ie it is certainly false.  While the extremes of this distribution of probabilities can never be reached, the closer we get to 100% the more likely we are to be correct in our deductions based on the statement.  In quantitative statistical tests the probability that a result was obtained by chance can be calculated, and a threshold of 5% (ie 95% confidence) is usually taken as a practical measure of significance, but this accepts that in about 1 in 20 such cases the conclusion will be in error.  Most of the time, when conducting family research, we cannot attribute a quantitative value to the probability that a statement is true but we can use perfectly legitimate qualitative approaches to increasing the probability that conclusions are correct.  For example, if we are trying to decide where someone was born we can collect as many independent statements as possible which give this datum, eg different census returns, birth certificate.  If out of three such sources only two agree with each other we would have a dilemma in concluding where someone was born, but if we had 10 such sources and 9 agreed, we would be very confident that we could identify that individual’s place of birth.
Essay writing.  Facts are of limited use without integration and interpretation.  For me, this process involves writing a series of extended essays covering coherent areas based on my overall family research database.  Essay writing is an essential process because it drives you to evaluate fully your thoughts on a topic.  It is firmly based on fact but blatantly and openly strays into interpretation and hypothesis building, starting from established facts (as qualified by high probability).  An essay might deal with a branch within my family research database, eg The Foxes or The Spurriers, or it might deal with an inanimate object, eg “Clipper Ship Conway”, or even an individual to whom we are not related but with whom our ancestors had interacted, eg “Captain WH Duguid”.  I like to keep my essays free from references within the body of the work, to help the flow of argument, uninterrupted by citations.  The reader can always refer to my family research database for the discovery of sources.
Autobiography.  I believe every genealogist should write his or her autobiography, being careful to distinguish fact from opinion and trying as far as possible to be balanced and accurate.  The autobiographer may wish to exclude some materials which might cause offence or embarrassment to living people and this approach is likely to be justified unless it bears upon a matter of substantial interest or importance, which would otherwise be lost.   
It has been pointed out above that the (almost) unifying theme of my family research is the existence of a web of genetic links.  My own career, which included a period of 20 years when I was a teacher and researcher in genetics, gave me a good insight into the nature and significance of these genetic links and it is useful to give a brief summary of the important aspects of the academic discipline of genetics which impact most directly on family research.
The characteristics of an individual are due to information acquired by two parallel routes.  We acquire genes from our two parents which are involved, often predominantly, in determining many of our characteristics, from hair colour to our ability to metabolise chemicals that we ingest.  On the other hand we acquire other characteristics almost entirely from the environment in which we are raised.  The ability to speak a language is genetically determined but the ability to speak Chinese or English is not.  The type of language that we speak is entirely learned during our upbringing, as are many other attributes, such as system of religious belief or style of cooking.  However, many characteristics are the result of an interaction between our genes and our environment.  For example, the disease Favism occurs when susceptible individuals eat broad beans.  Their red blood cells then break down due to an enzyme deficiency.  The deficiency is caused by a defective gene but the disease is only expressed if a deficient individual also eats broad beans (an environmental factor).  Genetic inheritance involves the physical passage of an information-bearing molecule, via egg and sperm, from one generation to the next, but cultural inheritance involves only the passage of ideas and information via written and spoken language and via observation of others within a family, tribe or community.
(Please skip the next few pages if you have an understanding of basic genetics!)
The information-bearing molecule alluded to above is deoxyribonucleic acid, universally known by its abbreviation, DNA.  Information is encoded in DNA using an alphabet of four letters and these letters can form 64 different 3-letter words (4^3).  The sequences of these words on linear DNA molecules provide the instructions which pass from the parents to the fertilised egg and determine the genetic component of inheritance.
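The count of 64 three-letter words can be checked by simply listing them.  The following is a minimal Python sketch; the letter names A, C, G and T are the conventional labels for the four DNA letters and are not given in the text above.

```python
from itertools import product

# Conventional labels for the four DNA letters (an addition, not named in the text).
LETTERS = "ACGT"

# Every possible 3-letter word that can be built from a 4-letter alphabet.
words = ["".join(p) for p in product(LETTERS, repeat=3)]

print(len(words))   # 64, ie 4^3
print(words[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```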
The fertilised egg cell divides repeatedly during development and the cells produced eventually differentiate to form our tissues and organs.  Before cell division the information in DNA is copied exactly and one copy transmitted to each of the two new cells.  All cells in our bodies (except unfertilised eggs and sperm – see below) thus contain the same genetic information and cells with different functions, eg liver cells, brain cells, are made by switching on and off different sets of instructions in the cell’s DNA.
Most of the DNA in a cell is present in the nucleus but some is present in the mitochondria (small bodies generating energy) in the cytoplasm.  Mitochondria copy their DNA and the mitochondria divide, just like the whole cell.  However, the copying of DNA and the division of the mitochondrion are not synchronised.  Each body cell contains ~100 mitochondria and each mitochondrion contains a variable number of DNA molecules, typically ~5.  When the cell divides the cytoplasm is pinched into two and the mitochondria are distributed roughly equally to the two daughter cells.  The mechanism for separating copied nuclear DNA into daughter cells is much more precise.  The DNA exists as several very long molecules which are condensed by coiling and looping into structures called chromosomes for ease of transmission at cell division.  A chromosome prior to cell division consists of two separate parts, each containing one copy of the DNA information and each daughter cell gets one of these copies, so that each contains exactly the same number of chromosomes and exactly the same nuclear DNA information.
When egg and sperm cells are formed there is a special kind of cell division which treats the chromosomal and mitochondrial DNA differently.  When an egg cell is formed most of the cytoplasm (and its contained mitochondria with their DNA molecules) goes to the unfertilised egg cell.  On the other hand, when a sperm cell is formed all the cytoplasm (and mitochondria) is excluded from the part of the sperm cell (the nucleus) involved in fertilisation.  Egg cells contain mitochondria (with their DNA molecules) but sperm cells do not transmit mitochondria, thus our mitochondrial DNA comes exclusively from our mothers. 
Chromosomes are of two types, those which occur in pairs in the body cells of males and females and those (sex chromosomes) which differ in number and/or type in the body cells of the two sexes.  Humans have 46 chromosomes, in total, in their body cells. These are made up of 24 different types. Twenty two of the types are not concerned with sex determination and each of these occurs as a pair in body cells of both males and females.  The other 2 types of chromosome are the sex chromosomes, called X and Y respectively.  Female body cells contain two X chromosomes and male body cells have one X and one Y chromosome.  X chromosomes contain lots of information in their DNA but Y chromosomes contain very little. 
There is a special kind of cell division which generates gametes (unfertilised egg and sperm cells).  It ensures that each gamete contains only one chromosome of each type not concerned with sex determination.  This is also true of the pair of X chromosomes in females.  Thus, unfertilised eggs all contain a single X chromosome.  In males the X and the Y chromosome separate from each other during sperm cell formation. Half of all sperm cells thus contain a single X chromosome and the others contain a single Y chromosome.  After fertilisation all eggs have 2 chromosomes of each type not concerned with sex determination.  Half of all fertilised eggs contain two X chromosomes and the others contain an X and a Y.  A fertilised egg which contains only X chromosomes develops into a female child.  If a Y chromosome is present it switches development from a pathway leading to femaleness to one leading to maleness.
A gene is a unit of inheritance.  It determines one particular character, for example the structure of a protein molecule.  The information in a gene is made up of the code words in a linear section of DNA and different genes are arranged in a linear sequence along a DNA molecule.  A change in the code words within a gene, eg swapping one word for another, may change the information in the gene in such a way that some body characteristic is changed.  Such a change in DNA information is called a mutation and almost all genes exist in a population in these different forms, or alleles.
Humans have ~20,000 different genes.  While all genes are made of DNA, not all DNA is used to make genes.  In humans quite a lot of the DNA in the chromosomes does not appear to contain any information but its sequence of letters is still copied and distributed accurately at cell division.  This so-called “junk” DNA can still accumulate changes in its sequence of code letters, which can be detected by determining the sequence of letters in DNA by chemical analysis.  There is far more “junk” DNA in our cells than DNA used to make genes.  “Junk” DNA has proved to be very important in studying the historical movement of human populations by plotting the geographical presence or absence of particular sequence variants.
Each of our body cells has two of each of the 22 different types of non-sex chromosomes.  A gene has a fixed position on a chromosome and each body cell has 2 copies of each of the ~20,000 different genes.  If, in a particular human population, there are two forms, or alleles, of a particular gene, which we will call A and a, then each individual may potentially have a genetic make-up of AA or Aa or aa for that gene.  If individuals with the make-up AA look identical to those with Aa but different from aa for a particular characteristic, we say that allele A is dominant over allele a.  Sometimes an allele will cause disease in a person possessing it.  Usually when this happens, eg in Sickle Cell Anaemia, where there is an abnormal haemoglobin molecule in the red blood cells, the allele causing the disease is recessive to the allele which determines normal haemoglobin structure.  So only people who have two “a” alleles show the full-blown disease.
The special kind of cell division which occurs in the formation of egg and sperm cells has another important function in addition to halving the number of chromosomes.  It also causes the different alleles of the many genes, both on the same pair of chromosomes and on different pairs of chromosomes, to be recombined to give combinations which did not occur in either the father or the mother.  Many of our characteristics are determined, not by single genes, but by many genes interacting with each other, so recombination is important in generating new combinations of alleles which may produce new characteristics. 
Disease-causing recessive alleles of genes usually only occur rarely in a population, because people with the disease often don’t survive to reproductive age, so these alleles tend to be eliminated.  If a disease-causing recessive allele has a frequency of 1/1000 in a population and mating is at random, only 1 in a million (1/1000 x 1/1000) people will have two copies of the allele and show the disease, though almost 2/1000 will possess one copy of the allele ([1/1000 x 999/1000] + [999/1000 x 1/1000] = approximately 2/1000).  However, in a family where a disease-causing recessive allele is present, if close relatives mate together their children have a much higher chance of suffering from the genetic disease, because the two relatives, though not suffering from the disease themselves, are much more likely both to be carrying the allele.  Where both parents are carriers, the chance of each child showing the disease is 1 in 4 (1/2 x 1/2).  This is why inbreeding, for example cousin marriage, can have unpleasant outcomes and should be avoided.
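The arithmetic in the paragraph above can be reproduced with exact fractions.  This is only a sketch under the same assumption of random mating; the function name genotype_frequencies is illustrative and not taken from any genetics library.

```python
from fractions import Fraction

def genotype_frequencies(q):
    """Expected genotype frequencies for a recessive allele of frequency q,
    assuming random mating (illustrative helper, not a library function)."""
    p = 1 - q  # frequency of the normal allele
    return {
        "affected": q * q,      # two copies of the recessive allele
        "carrier": 2 * p * q,   # exactly one copy
        "non_carrier": p * p,   # no copies
    }

freqs = genotype_frequencies(Fraction(1, 1000))
print(freqs["affected"])         # 1/1000000
print(float(freqs["carrier"]))   # 0.001998, ie almost 2/1000

# If both parents are carriers, each passes on the allele with probability 1/2.
both_carriers_child_affected = Fraction(1, 2) * Fraction(1, 2)
print(both_carriers_child_affected)  # 1/4
```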
All individuals of a species are genetically related to each other and this is true of humans as it is of other animals and plants.  The practical test of this relationship is that all individuals of a species are capable of mating with each other and producing fertile offspring.  All humans present on this planet are linked to each other genetically and, using techniques to identify DNA markers, we are able to follow this linkage back through time.  On the other hand, looking forward, there is a significant chance that any one of us will not leave descendants far into the future.
Looking back in time, each of us usually has 2 parents, 4 grand-parents, 8 great-grand-parents, etc, with the number of our direct relatives doubling with each successive generation going backwards.  Now, if this proposition were absolutely true, the number of our direct relatives would quickly exceed the number of people in the population of these islands.  If we assume that there are 3 generations per century and we go back 27 generations to a time shortly after the Norman Conquest, then the population would need to have been at least 2^27, which is greater than 134 million!  It is estimated that the population of Britain at that time was actually ~2 million.  The explanation for this discrepancy is that if we could trace our relationships back through 27 generations we would find that many, perhaps most, of our marrying ancestors in those generations were relatives, even if the relationship was often remote.
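The doubling argument is easy to check.  The sketch below uses an illustrative function name and takes the ~2 million population estimate from the paragraph above; it counts the "slots" in a family tree at a given generation, assuming no ancestor appears twice.

```python
def ancestor_slots(generations):
    """Number of ancestor positions in the tree at a given generation back,
    assuming every ancestor is a different person (illustrative helper)."""
    return 2 ** generations

ESTIMATED_POPULATION = 2_000_000  # figure quoted in the text for Britain at the time

slots = ancestor_slots(27)
print(slots)                          # 134217728
print(slots / ESTIMATED_POPULATION)   # about 67 slots per person then alive
```

Each person alive at the time would need to fill, on average, about 67 positions in the tree, which is only possible if the same ancestors appear many times over.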
How much genetic similarity is there between an individual and his/her parents, grandparents and more remote direct ancestors?  The answer depends upon the location of the genes in question.  All the genes which lie on the mitochondrial DNA of an individual come from his/her mother.  There are 37 genes on the mitochondrial DNA and most of the mitochondrial DNA is used to code for these genes.  All the genes which lie on the Y chromosome of a male individual come from his father.  The Y chromosome DNA has the capacity to code for several thousand genes but in fact only codes for ~27, most of which do not have vital functions.  Thus, only ~0.3% of our genes are found in these two locations.  As a result, they tend to be ignored so that we can make approximate calculations for the bulk of our genes, which lie on the chromosomes not concerned with sex determination and on the X chromosome.  (Strictly speaking the X chromosome ought to be treated separately because, while a mother passes her X chromosomes equally to her sons and daughters, a father only passes his X chromosome to his daughters.)
With these qualifications, we can say that each individual has half his/her genes (strictly speaking alleles of genes) in common with his/her father and the other half with his/her mother.  This halving continues with each further generation and the general rule for calculating the proportion of genes in common with a direct relative is (½)^n, where n is the number of generations.  If we go back 3 generations to our 8 great-grand-parents, we only have (½)^3 = 1/8 or 12.5% of our genes in common with each of them.  Frequently it is possible to trace some part of our family tree back for 12 generations (say 400 years).  While we may feel emotionally connected to individuals from so many generations back, we actually only share (½)^12 or <0.025% of our genes with them.  We are barely connected at all.
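The halving rule can be tabulated in a few lines.  This is a sketch of the average-case rule stated above, with an illustrative function name; it says nothing about the variation around that average.

```python
from fractions import Fraction

def shared_with_direct_ancestor(n):
    """Average proportion of genes shared with a direct ancestor
    n generations back: one half multiplied by itself n times."""
    return Fraction(1, 2) ** n

for n in (1, 2, 3, 12):
    share = shared_with_direct_ancestor(n)
    print(n, share, f"{float(share) * 100:.4f}%")
# 1 1/2 50.0000%
# 2 1/4 25.0000%
# 3 1/8 12.5000%
# 12 1/4096 0.0244%
```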
The genetic relationship with remote ancestors, even where a reliable family tree is present (and ignoring the possibility of hidden illegitimacy) may actually be more or less than the estimate (which is an average) and can even be zero, because of the way that recombination of alleles occurs during the special kind of cell division which precedes the formation of eggs and sperm.  This recombination of alleles occurs due to two different mechanisms, depending upon whether the genes being examined lie on the same chromosome or upon different chromosomes.
(Consider just 2 chromosome types in a child’s body cells, called A and B.  There are 2 chromosomes of type A present and 2 chromosomes of type B, which we will call A1, A2 and B1, B2, where chromosomes labelled “1” come from the mother and chromosomes labelled “2” come from the father.  When the child starts to produce egg or sperm cells each gamete receives either A1 or A2 and either B1 or B2.  A quarter of the gametes will have the chromosome complement A1 B2 and another quarter will have the complement A2 B1.  Neither of these combinations occurred in either parent, so there has been recombination.
When genes occur on the same chromosome type, they can also be recombined to give new combinations of alleles on the same chromosome type by the operation of a mechanism which swaps sections of chromosome.  This adds to the mechanism of recombination of genes which lie on different chromosome types.  However, the swaps do not occur very often, typically 1-3 per chromosome pair, some copies of the chromosomes have no swaps at all and swaps tend to occur in a small number of places, meaning that groups of genes tend to be inherited together without being recombined.)
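The first mechanism, the shuffling of whole chromosomes, can be illustrated by listing the possible gametes for the two chromosome types A and B used in the example.  This sketch assumes the two types are distributed independently, so that the four combinations are equally likely, and it ignores swaps within a chromosome.

```python
from itertools import product

# "1" = chromosome inherited from the mother, "2" = from the father.
type_a = ["A1", "A2"]
type_b = ["B1", "B2"]

# Each gamete receives one chromosome of each type.
gametes = list(product(type_a, type_b))

# A gamete is recombinant if it mixes a maternal and a paternal chromosome.
recombinant = [g for g in gametes if g[0][-1] != g[1][-1]]

print(gametes)      # [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
print(recombinant)  # [('A1', 'B2'), ('A2', 'B1')]
print(len(recombinant) / len(gametes))  # 0.5, ie a quarter for each new combination
```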
Ignoring the complications caused by genes in mitochondria and genes on sex chromosomes, we can calculate what proportion of other genes (the great majority) that two related individuals have in common due to recent common ancestors.  This value has the grand title of the Coefficient of Relatedness and the symbol R.  Two individuals may be direct relatives or collateral relatives.  The line of direct relatives is, for example, son-father-grandmother-great grandmother.  A collateral relationship is, for example, son-(father and mother)-son (ie brothers) or son-father-(grandfather and grandmother)-son (ie nephew-uncle).  R can easily be calculated.  Sons (or daughters) share half their genes with their fathers (or mothers).  We can calculate R by multiplying the proportions together for each link in the genealogical chain in travelling from one individual to the other.  In the case of two brothers there are two separate routes, each of two links (son-father-son, son-mother-son) and so R, the proportion of genes in common between two brothers, is (½ x ½) + (½ x ½) = ½, or 50%.  In the case of nephew-uncle there are two separate routes, each of 3 links (son-[father or mother]-grandfather-son, son-[father or mother]-grandmother-son) so they have (½ x ½ x ½) + (½ x ½ x ½) = ¼, or 25% of their genes in common.  The calculation of R can be extended to any relationship, no matter how remote.
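The route-counting rule for R lends itself to a short calculation: supply the number of links in each separate route between the two individuals and add up one half raised to each of those numbers.  The following sketch uses an illustrative function name and assumes no inbreeding; the first-cousin case is an extension of the same rule and is not worked through in the text.

```python
from fractions import Fraction

def relatedness(route_lengths):
    """Coefficient of relatedness R from the number of links in each
    separate route joining two individuals (no inbreeding assumed)."""
    return sum(Fraction(1, 2) ** links for links in route_lengths)

brothers = relatedness([2, 2])        # via father, via mother
nephew_uncle = relatedness([3, 3])    # via grandfather, via grandmother
first_cousins = relatedness([4, 4])   # extension: two routes of 4 links each

print(brothers)        # 1/2
print(nephew_uncle)    # 1/4
print(first_cousins)   # 1/8
```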
The above calculation of the coefficient of relatedness applies to a situation where there has been no inbreeding in the last few generations of a family.  Where there has been inbreeding the coefficient of relatedness will be increased in value.  The most common inbreeding situation encountered by family researchers is cousin marriage.  As we have seen, R for two brothers whose parents are unrelated is 0.5, ie they have 50% of their genes in common.  If their parents were first cousins, then the value of R increases to 0.5625.  If their parents were double first cousins, then the value for R increases again to 0.625.
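One way to reproduce the figures quoted above is to treat the parents' own relationship as opening two extra routes between the brothers, each running brother-parent-(parents' relatedness)-other parent-brother. The sketch below rests on that assumption, which is a reading of the numbers rather than a method given in the text; the function name `sibling_relatedness` is illustrative, and no correction is made for the brothers themselves being inbred.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def sibling_relatedness(parents_r):
    """Relatedness of two full siblings whose parents have relatedness parents_r."""
    # Ordinary routes: sibling-father-sibling and sibling-mother-sibling.
    direct = 2 * HALF ** 2
    # Assumed extra routes: sibling-father-...-mother-sibling and the reverse,
    # where the middle step carries the parents' own relatedness.
    cross = 2 * HALF * parents_r * HALF
    return direct + cross


unrelated_parents = sibling_relatedness(Fraction(0))
first_cousin_parents = sibling_relatedness(Fraction(1, 8))
double_first_cousin_parents = sibling_relatedness(Fraction(1, 4))

for value in (unrelated_parents, first_cousin_parents, double_first_cousin_parents):
    print(float(value))
```

Under this assumption the three cases give 0.5, 0.5625 and 0.625, matching the values stated in the text.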
Inbreeding is the mating of genetically-related individuals.  The degree of inbreeding can be quantified by measuring the Coefficient of Inbreeding, F.  It measures the probability that the two genes of a particular kind in an individual are identical due to inheritance from a common ancestor.  In an otherwise outbred population F is approximately half the value of R, the coefficient of relatedness, for the two parents.  The method for calculating F will not be given or explained.  Any good book on quantitative inheritance or plant/animal breeding will give the formula.
(That’s the basic genetics over!)  
Given names and surnames are a vital component of family research, because they are a major tool in identifying individuals at different times and places throughout their lives and, in the case of surnames, in identifying the parents and children of individuals.  Indeed, family research could not proceed if individuals changed their given names with a significant frequency and if surnames were not inherited between generations and, on marriage, if the female marriage partner did not routinely give up her parents’ surname and adopt the surname of her husband.  Also, if the big events in life were not recorded in writing, names would be nearly useless to the family researcher.  But the recording of life events, the maintenance of names within and between generations and the discarding of female surnames on marriage are practices which are only a few hundred years old.  It is worth looking at the origin of given and inherited names, so that we can understand the limitations that this history places on family researchers.
In primitive hunter/gatherer societies it was probably essential that each individual had a unique name, so that accurate communication could occur between members of the group, for example during hunting.  There was no need to have more than one name, provided that every member of the group had a unique name.  In Anglo-Saxon England there were so many given names that within a family or small group, such as a village, each is likely to have been unique.  With the coming of Christianity there was a trend to use biblical names as given names and thus the pool of given names became much smaller.  The Norman invasion of 1066 had a profound effect on Britain.  Given names introduced after the invasion tended to be either biblical names or names popular in France at that time.  Most Anglo-Saxon names were discarded.

In addition to a given name, many people acquired a second name, or byename, probably to differentiate them from other individuals who had the same given name.  When a byename became fixed, adopted by all members of a family and inherited between generations, it became transformed into a surname.  Byenames were ephemeral but surnames have survived, though byenames probably had many of the characteristics of surnames.  Surnames are classified into four principal subdivisions, those derived from place, eg Bielby, those derived from kinship, eg Harrison, those derived from nicknames, eg Fox and those derived from occupation, eg Harriman.

This fixing of byenames into surnames did not occur quickly or uniformly.  At the time of the Conquest, a few Norman barons had surnames which were derived from the names of their estates back in France, for example Warenne (modern Warren) was derived from the hamlet of Varenne near Dieppe.  Some barons took surnames derived from the names of their English estates as a means of establishing ownership.  However, even long after 1066 surnames were confined to the ruling barons.  By 1250 most knights (landowners at a lower level than the barons) had surnames. Also by this date some families in provincial towns had surnames and by about 1350 most families in England had surnames. The convention of women adopting their husband’s surname did not necessarily occur immediately on the adoption of surnames.  Sometimes women kept their old surname on marriage, or even changed it for one different to the surname of their husband.  The rules were not fixed until well after 1300 but had become fixed by 1400.  Once surnames had been adopted they did not necessarily remain immutable. Some surnames evolved by truncation and some by variation in spelling, which was often a variable rendering by clerks of local pronunciations. After the Conquest, French words were not understood and were often twisted due to English pronunciation, eg Bohun became Boon, Bone or Bown.  Thus whole families of similar surnames arose from the same original name.  Even as late as the mid-19th century variable spellings of surnames can be found in parish and census records.

One major stimulus for the fixing of byenames into surnames was the creation of written records for the purposes of taxation.  In the reign of Edward I (1272-1307) the whole country was taxed to raise money for the king’s wars in Wales and Scotland and the names of all those who paid their taxes were written down and survive today in the Subsidy Rolls in the Public Record Office.  A century later, in the time of Richard II, another tax, the Poll Tax, was introduced and names were written down again.  Some of these lists have also survived.  In the Subsidy Rolls, names such as “Hugh the Baker” probably indicate current occupation (ie “the Baker” is a byename) but “John Carter, draper” clearly shows that in some cases a surname had already evolved.  Ignoring 20th century immigration, at least 30,000 surnames are present in Great Britain.  Some surnames are much more common than others, Smith being the most frequent.

The practice of assuming the surname of the male partner on marriage parallels the mode of inheritance of the Y chromosome in one respect, ie it passes from father to son, to son, etc.  However, unlike genetic inheritance, social inheritance of surname is blind to such events as hidden illegitimacy and adoption. Individual surnames today have characteristic geographical distributions, even common names such as Smith, which may contain information about geographical origins of names.  It is likely that many of the rarer surnames had a single geographical origin and even that all bearers of that name are potentially genetically related.  However, in order to draw conclusions about the current geographical distributions of surnames it is essential first to trace the origin and evolution of that name in written records through the ages.  Also, studies of particular DNA sequences on the Y chromosome show that while such sequences are associated with some surnames, this is never exclusively so, presumably due to hidden illegitimacy.  In the case of very frequent surnames there is usually a diversity of sequence variants present, perhaps mainly due to the name having arisen independently on several or even many occasions.

Prior to the Marriage Act of 1753 marriage was governed entirely by church law and the only requirement was that it should be celebrated by a priest, normally, but not exclusively, after banns or the obtaining of a marriage licence.  The 1753 Act required for the first time in England and Wales, under the civil law, that marriages must take place in the parish church.  Curiously, the law applied to Anglicans and Roman Catholics but not to Jews or Quakers.  Before this enactment it was normal for couples to live and sleep together and for the bride to be pregnant at the time of marriage.  By the start of the 19th century it had become the social convention that there should be abstention from sexual intercourse before marriage.  However, premarital conception was at a level of ~40% at the start of the 19th century, though it dropped to ~20% during the Victorian era.  By the end of the 20th century it had risen to ~40% again.
Most extra-marital conceptions were actually pre-marital conceptions, since marriage usually followed.  In the late 1700s in England only 2-4% of births were illegitimate.  The illegitimate were often stigmatised, especially during the Victorian era.

It was not unusual for men to delay marriage until their late 20s in the 18th and 19th centuries, though women usually married at an earlier age.  Women then usually had children regularly, approximately every 2 years until death or infertility intervened.  In the 1730s, 24% of marriages were ended within 10 years and 56% within 25 years, due to the death of a spouse.  These rates fell substantially during the 19th and 20th centuries as health and longevity improved. While women could have 12 or more children, they typically had only 4 or 5.  Towards the end of the 19th century there is evidence that couples had started to limit conception voluntarily.

Family research depends crucially upon birth registration, both parish records and statutory registration, to identify the genetical parents of a child.  There is usually little doubt as to who was the mother, since her pregnancy was obvious, but the father may sometimes have been someone other than the registered father.  Modern genetical studies have shown that in the UK 1-2% of all births are associated with a so-called non-paternity event, ie the registered father is not the genetical father.  It is impossible to estimate the frequency of non-paternity events in the 17th, 18th and 19th centuries but we can state with confidence that they occurred.  A non-paternity event normally broke the link between Y-chromosome type and surname (but not if the child was conceived by the brother of the husband)!  Modern studies looking for the association of surnames with DNA markers on the Y chromosome support the presumption that non-paternity events did occur in the past.  In former times, in addition to extramarital liaisons by a wife, other possible mechanisms involved in non-paternity events were informal adoption and accidental baby swapping by nursemaids.  Contemporary society provides additional, more exotic mechanisms, such as egg and sperm donation and surrogacy, for non-paternity (and non-maternity) events.

It is worth going back over this introduction to family research to summarise the clear limitations under which such research labours.  Once we understand these limitations we can appreciate what it is possible to achieve with our own family studies and what is beyond knowing.

The evolution of surnames to create firm links between generations and the regular  recording of the great events of life, birth (baptism), marriage and death (burial) from the mid-16th century onwards, created the opportunity to pursue family research back from the present to that time.  However, there is little chance of pushing back knowledge of our individual ancestors to earlier times.  In the case of my own family researches, the earliest entries on my database are for the Maw family, farmers in Epworth, Lincolnshire from as early as 1467 to the present.  Although the Maws were not my direct ancestors, they intermarried with my Fox farming relatives extensively in the 19th century.  But they constitute a rare case where family research can penetrate back before the 1650s.  My earliest direct relatives who have been traced are various individuals, probably farmers, on the Yorkshire Wolds going back 9 generations to the mid 17th century. The earliest records tell us little about the lives of individuals other than names and dates.  It is difficult to get any appreciation of the lives they led.  Even if we are lucky and can trace back our relatives to the mid-16th century, we have to ask ourselves if such information is useful.  We also need to remember that the greater the number of presumed genetical links back into history, the poorer is the evidence on which that supposition of genetical linkage is based and the lower the probability that we have correctly identified the next set of genetical relatives in the chain as it disappears into the mists of time.

Genetical linkage, either direct or collateral, is a strong driver for most family researchers.  However, the approximate halving of genetical relatedness with each direct link going either forwards, backwards or collaterally, quickly takes us to individuals with whom we have little genetically in common.  Genetic dilution is a factor which is ignored by, or unknown to, most family researchers, but it is an important and inevitable fact.  In addition, it is also worth bearing in mind the more esoteric consideration that, in spite of a proven genealogical link to a remote ancestor, sampling error during the recombination of genes prior to gamete formation may have totally excised a genetic linkage.  We may be “related” to someone but have no genes in common, at least due to recent reproductive history!  The final point in relation to genetic relatedness is that there may be non-paternity events in our lineage.  If we assume that the frequency of non-paternity events is 2% per generation, then if we can trace a line of ancestors back for 11 generations the probability that there have been no non-paternity events in this chain of 10 genetic links is (0.98)^10, which is approximately 0.8, ie there would be a ~20% chance that there would be at least one non-paternity event in the chain.  A non-paternity event is, like a wrong attribution of parenthood based on records, a fundamental break in understanding our genetic link to the past.
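The non-paternity arithmetic above can be checked, and tried with other figures, using the small Python sketch below. The function name `chance_of_break` is illustrative, and the sketch assumes, as the text does, a fixed rate per generation with each link in the chain independent of the others.

```python
def chance_of_break(rate_per_generation, links):
    """Probability of at least one non-paternity event in a chain of links."""
    no_event = (1 - rate_per_generation) ** links
    return 1 - no_event


# The case in the text: a 2% rate over the 10 links joining 11 generations.
p_break = chance_of_break(0.02, 10)
print(f"No break: {1 - p_break:.2f}, at least one break: {p_break:.2f}")

# The lower end of the 1-2% range quoted earlier, for comparison.
p_break_low = chance_of_break(0.01, 10)
print(f"At a 1% rate: {p_break_low:.2f}")
```

At 2% per generation the chance of an unbroken chain comes out at about 0.82, in line with the rounded figure of 0.8 used above.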

Taking all these considerations together, I find the most interesting area of family research relates to direct ancestors born from ~1770 onwards and extending back from me by 5 generations, ie to my great, great grandparents.  These are people of whom we may come to know something significant about their lives.

Inheritance, as we noted at the start of this discourse, consists of two fundamental elements, cultural inheritance and genetical inheritance.  Cultural inheritance, like genetical inheritance, is diluted with the passage of time but not in such a precise and predictable way as genetical inheritance.  But in one way it endures and may have the power to influence people in future generations.  That is the written word, the creation of music or works of art, or any number of contemporary digital outpourings.  A book, such as “The Origin of Species” by Charles Darwin, could be cited as such a cultural work which has not lost its impact, even though it was first published more than 140 years ago.  Few of us will have a cultural inheritance as profound as that of Darwin’s writings lurking in our family history but through family research we usually find something that is insightful of its times and worthy of preservation.

In recent years it has become clear that each of us carries a precise information record within the 20,000 genes and associated “junk” DNA in our cells which will, in time, become the means to create a deep insight into our individual genetic origins.  Although this branch of genealogical research is in its infancy, it has already given some understanding of the migration of ancient human populations.  These DNA sequences provide a link with a past much more remote than can ever be achieved with conventional family research.  DNA information does not depend on the written word and is unaffected by name changes, sampling error and non-paternity events.

Human mitochondrial DNA has a 400-letter variable sequence which can be classified into a branching tree.  This leads to the recognition of ~36 different major branches in human populations throughout the world but only 8 in Europe.  Because of the exclusive mother-to-daughter transmission of mitochondrial DNA, these 8 European families represent only 8 original females in the original population which invaded Europe and left offspring generation after generation continuously to the present day.  These females have been called the “Seven Daughters of Eve” (even though we now recognise eight of them).  Their places and times of origin can be estimated and most seem to have originated in SW Europe 10 – 45 thousand years ago.

A similar exercise to that for mitochondrial DNA can be carried out for Y-chromosome DNA which is passed exclusively from father to son.  It turns out that there are ~21 major Y-chromosome clans in the world, only 8 of which occur in Europe and, of these, only 5 occur in the British Isles.

Analysis of minor variants in DNA sequence for both mitochondrial DNA and Y-DNA gives a finer-grained understanding of population movements in more recent times.  Of course these two DNA sources deal only with a small percentage of all human DNA and development of markers in the non-sex and X chromosomes has the potential to give an even greater insight into the geographical and racial origins of each of us, if we are prepared to pay the necessary fee!

Don Fox


20120629

donaldpfox@gmail.com
