00:00:12.05Well, I'm going to talk to you today about the most fascinating
00:00:18.17problems in biology that anyone can imagine.
00:00:23.25And the interesting thing is that you don't need a big lab to do this work.
00:00:31.28You can do it at home. You can do it wherever you are.
00:00:38.05But before I explain this, let me tell you about a very important development
00:00:47.29that took place in the last century (that's the 20th century)
00:00:52.18in regards to our knowledge of the history of the universe.
00:00:58.07See, what I'm going to talk about is the reconstruction of a past.
00:01:04.13And if we can find scientific evidence for that
00:01:09.27so that we could recreate the past at least in theory.
00:01:14.26That seems to me to be an enormously challenging problem for us to solve.
00:01:22.24Now what happened in physics was the following:
00:01:28.02it was soon realized after Einstein's principles of relativity were established
00:01:37.28that light, which has a constant speed in the universe,
00:01:44.17that light coming from afar actually takes time to reach us.
00:01:51.03So, the further out in the universe we can look,
00:01:57.18the earlier the events we are seeing.
00:02:00.27The light left there many, many millions of years ago.
00:02:07.16And so the question is: if you could measure the distance of these remote stars, galaxies,
00:02:17.21you could begin to measure exactly what they tell you about the early stages of the universe.
00:02:27.09And of course we now have a means of measuring them.
00:02:31.03It's called the red shift because this is a function of distance and
00:02:37.28it was noted that the lines of the elements in the spectrum were shifted to the red.
00:02:46.06The further the light came to us, the more the red shift took place.
00:02:53.29And so, we can therefore, knowing that distance,
00:02:58.23tie those two things together and
00:03:02.22we can now begin to reconstruct events that took place in the remote past.
00:03:09.26And so the question we can ask for our science of biology:
00:03:17.02Is there anything like that?
00:03:19.13Do we have within ourselves, so to speak, evidence of the remote past?
00:03:27.28Now, as all of you know, over the last few decades and particularly over the last few years,
00:03:37.12there has been enormous development of DNA sequencing techniques.
00:03:46.04And as a result of this, the genomes of a large number of organisms
00:03:52.19have been sequenced, including the genome of ourselves,
00:03:58.04of our nearby primate cousins (chimpanzee, the orangutan),
00:04:05.10and going all the way back through many invertebrates
00:04:11.04and going all the way back into the unicellular organisms and many bacteria as well.
00:04:19.16And by the study of these genomes we'd like to know two things, of course.
00:04:29.00The first is, could we actually read and understand these genomes?
00:04:36.00Could you pick up a genome, look at it and say, "Oh, yes. This is the genome of a zebra."?
00:04:43.16That would need an understanding of almost all biology.
00:04:48.17It is very remote for us to do this at the present moment.
00:04:54.01We can do a little in microorganisms. We can recognize some proteins that they may produce.
00:05:03.00But, for the main, that still remains an enormous challenge;
00:05:08.29the interpretation of the meaning, that is the function, of these genomes.
00:05:15.23This is important because as you all know biological systems contain
00:05:23.11a complete description of what they are and will be in the DNA code of their genomes.
00:05:34.23And of course, since that is what is propagated, that is the structure that
00:05:41.17is changing during evolution with the consequent changes in the phenotype.
00:05:47.28But the problem I'll deal with today is not so much that problem,
00:05:55.26which in itself is enormously challenging, and will occupy biologists for a long time to come,
00:06:03.29but with the other one,
00:06:06.01which is to ask: Can we see anything in these sequences that tell us of the remote past?
00:06:16.08And what has happened now, very frequently,
00:06:20.07is that a large number of people have begun to compare these sequences,
00:06:27.06notice and calculate the differences between them.
00:06:32.29So, for example, the chimpanzee genome can be compared with the human genome.
00:06:43.05They are extremely similar, about 99% the same.
00:06:48.20And we can also compare this with animals that are more distantly related to us
00:06:56.07such as the mouse and the cow and even down to frogs, lizards, fish and so on.
00:07:03.24What people then do is pick a region of the genome, pick a gene as they say.
00:07:11.04Then, make an alignment, that is, put the things into line,
00:07:16.11because amongst the many mutations that occur are insertions and deletions.
00:07:23.24And then one can compare the two and count the changes.
00:07:30.10And of course, it stands to reason that the greater the number of differences,
00:07:37.15the greater the number...the more distant these two organisms are.
00:07:45.00Now, you will see that if you do this comparison, you get out of this
00:07:53.15a number which is essentially compounded of two other numbers.
00:08:00.26The two other numbers are the time in the past at which the two organisms diverged
00:08:12.10and the rate at which they diverged.
00:08:17.07So you can see you get exactly the same if
00:08:21.15you go twice as far into the past and evolve at half the speed.
00:08:28.06The numbers will come out exactly the same.
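The point about the two compounded numbers can be put as a minimal sketch. It assumes, purely for illustration, that the accumulated difference is a plain product of rate and time; the function name and the rate units are hypothetical and do not come from the talk.

```python
def expected_divergence(rate_per_myr, time_myr):
    """Accumulated change along a lineage, modelled as a simple rate x time product.

    rate_per_myr: hypothetical rate (changes per 10,000 sites per million years).
    time_myr: time since divergence, in millions of years.
    """
    return rate_per_myr * time_myr


# Twice as far into the past at half the speed gives the same observed number.
slow_and_old = expected_divergence(rate_per_myr=1, time_myr=160)
fast_and_young = expected_divergence(rate_per_myr=2, time_myr=80)
print(slow_and_old, fast_and_young)
```

A single observed difference therefore cannot separate rate from time without some further information.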
00:08:31.20So how is this dilemma resolved?
00:08:36.19In most cases, it is by accepting a number of assumptions
00:08:43.18and one of the difficulties of this field is the number of hidden assumptions that
00:08:51.19lie behind all the programs and equations that people use to calculate these differences.
00:09:02.04Now, I think that unveiling these assumptions is one of the best ways of approaching this.
00:09:13.21And in order to do this, I think we need to focus on a particular case.
00:09:22.03What I'm going to do for the rest of this talk is to ask a lot of questions
00:09:27.27about mice and men. How distant are we?
00:09:31.26And what can we learn, really, about the comparison?
00:09:37.21Now, of course, I'd like to make some other, more general comments first.
00:09:45.26And those deal with the difference between genes that change under selection,
00:09:56.07that is a favorable mutation that enhances the reproductive success of the animals it is found in,
00:10:06.03will of course, come to then be, as we say, fixed in the population by selection.
00:10:16.00This can be extremely rapid and that is one form of change.
00:10:26.07There's another form of change where the actual change has no effect on
00:10:31.21the phenotype so far as it affects reproductive success, neither one way nor the other.
00:10:42.22These are called neutral mutations and it has been assumed that
00:10:51.05if there are genes under selection, then, of course, the genomes will change more quickly
00:10:59.05and if there are genes that are not under selection,
00:11:05.02then they will become established by a process that is called genetic drift.
00:11:14.11I will explain these in a moment but before I go on, I want to say one other thing.
00:11:23.06A perplexing problem in the study of the genomes of higher organisms like ourselves,
00:11:32.14is that most of the genome appears to be unnecessary. It is, if you like to call it, rubbish.
00:11:41.11Now, as you well know, there are two kinds of rubbish; the rubbish you keep,
00:11:46.13which we call junk and the rubbish you throw away which we call garbage.
00:11:51.27And the question has been debated fast and furious over many decades
00:11:59.28what were these meaningless sequences that we find in our genome and
00:12:09.20in fact in our genome it could be as much as 95% of the sequence.
00:12:15.22Now, in this study, you're not allowed to have the issue that
00:12:23.19genomes can plan their future because they obviously can't.
00:12:29.06You can imagine this if you think of the primordial bacterium in the primordial soup saying,
00:12:38.03"I can't change this gene because I'm going to need it in 2 billion years to make actin."
00:12:45.01So that is absolutely not possible to do.
00:12:49.16So, you have to ask: what does this junk do?
00:13:01.08The best answer I know is that it does nothing.
00:13:05.17It doesn't do you any harm, so you have it.
00:13:08.23It doesn't do you any good so it might as well...you could lose bits of it.
00:13:15.10It doesn't have any effect at all.
00:13:18.23It is certainly the balance between two processes that go on in all genomes
00:13:25.27which is the increase of DNA and the loss of DNA.
00:13:32.09And these are events which take place in our genomes
00:13:39.27in which genomes increase in size with one process and genomes reduce in size by another
00:13:49.03and usually whether the genome expands or contracts will depend on the balance between these.
00:14:00.17Now, in at least mammalian genomes, there are events
00:14:06.23that lead to massive expansion of the genome and these are, effectively, transposons;
00:14:17.10that is, pieces of DNA that can copy themselves more than once in each cycle.
00:14:24.00And there are events in our germ line that allow these expansions
00:14:30.24of a large number of elements, composing at least half the DNA.
00:14:37.08One element called the Alu sequence composes about 27% of your sequence,
00:14:47.02which is a lot. There's millions of copies of this strewn all over the genome.
00:14:54.00The fact that sometimes these will be used doesn't prove...
00:14:59.20in fact, they can be used but purely incidentally. If they are used, they will be used.
00:15:08.06And the problem then we have is of how we're going to analyze
00:15:13.21the parts of the genome that we want to get information about.
00:15:22.15So what I have been interested in is analyzing coding sequences.
00:15:31.04Now, coding sequences, of course, specify your proteins.
00:15:36.16They are the ones which are most conserved because
00:15:40.23you have to conserve them in order that you have a protein that is functional.
00:15:46.21But they change partly because some amino acids can easily be substituted for others.
00:15:54.25And indeed, sometimes they acquire new function as well.
00:16:00.29The interesting thing about the genetic code, is that parts of that sequence do not matter.
00:16:10.18That is, the codons are, as we say, degenerate.
00:16:16.06So you can make the same protein by using different genetic sequences
00:16:24.18and will make the identical protein.
00:16:27.15And, without getting into detail,
00:16:32.06you will know that some amino acids have two codons coding for them.
00:16:43.19They differ only in whether the third base is an A or a G or in other cases a C or a T.
00:16:54.17And there are some other amino acids whose codons can take any of the four bases in the third position.
00:17:00.20For example, a glycine is coded for by GGX where X can be A, G, C, or T.
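The degeneracy just described can be illustrated with a small lookup. This is only a sketch: the table is a partial excerpt of the standard genetic code (lysine, aspartate and glycine), and the function name is hypothetical.

```python
# Partial excerpt of the standard genetic code, DNA alphabet.
CODON_TO_AMINO_ACID = {
    "AAA": "Lys", "AAG": "Lys",                              # third base A or G
    "GAT": "Asp", "GAC": "Asp",                              # third base T or C
    "GGA": "Gly", "GGG": "Gly", "GGC": "Gly", "GGT": "Gly",  # GGX, any third base
}


def third_bases_for(amino_acid):
    """Return the sorted third bases of all codons in the table for one amino acid."""
    return sorted(
        codon[2]
        for codon, encoded in CODON_TO_AMINO_ACID.items()
        if encoded == amino_acid
    )


print(third_bases_for("Lys"))
print(third_bases_for("Gly"))
```

A change among the listed third bases leaves the amino acid, and so the protein, unchanged.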
00:17:11.10So that in the code itself, in the protein sequences that we find in the code,
00:17:21.26that we find in the genome, we are able to look through these
00:17:30.05at the frequency of neutral changes
00:17:34.16without having to have any postulated selection event.
00:17:41.28So let's ask ourselves whether we can see in rigorously conserved amino acids,
00:17:51.26let's say lysine, whether there are changes in a given lysine, in a given protein
00:18:00.24which have happened throughout the lineage of organisms
00:18:05.13whether in some it's coded by a G and in others it's coded by an A.
00:18:13.12Because that change is neutral.
00:18:17.25And in the same way, we can look at the four codon amino acids
00:18:26.02and ask whether we can have all four possibilities at the same place,
00:18:33.06keeping the protein completely the same, and whether there are these switches.
00:18:40.04Now, I want to draw your attention to one of these.
00:18:45.00If I look in the third base of a codon,
00:18:54.22like leucine or lysine, I will see that sometimes I have in one organism
00:19:09.28I have CTG and if I go down, I can have in another organism CTC.
00:19:21.04Of course, I can have A and T as well.
00:19:24.28But I only want to consider these two.
00:19:27.22The reason for this is that if you see: that switches to that.
00:19:35.19That mutation from G to C is technically a mutation in the DNA of a G-C pair to a C-G pair.
00:19:49.25And simply, the base pair orientation, which does not matter in this case,
00:19:58.22that is, it just depends whether G is on the Watson strand and C is on the Crick strand
00:20:06.20or C is on the Watson strand and G is on the Crick strand.
00:20:11.07So then the forward and reverse mutation rate is exactly the same.
00:20:20.02So that if we take every gene we want to compare in this case,
00:20:28.28and we now look at the switches from G to C and vice versa,
00:20:40.15that is we decompose every protein we compare into a string of Gs and Cs
00:20:52.00just looking at the third bases of comparative codons.
00:20:56.02So one protein might go something like this.
00:21:02.10We don't worry about the distances between them.
00:21:05.18And the other one might look like this.
00:21:23.08OK. So, in this one there's been an exchange of a C for a G. We'll call it a flip
00:21:31.06because that's essentially the base pairs flipped over.
00:21:34.28Here's another flip and there's another flip.
00:21:39.10So, if you like, these are separated by three flips.
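The counting described here can be sketched as follows. The two strings are hypothetical stand-ins for the ones drawn during the talk, which the transcript does not record; they are chosen so that they differ by three flips. The function names are also illustrative.

```python
def count_flips(seq_a, seq_b):
    """Count aligned positions where one string has G and the other has C."""
    if len(seq_a) != len(seq_b):
        raise ValueError("strings must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if {a, b} == {"G", "C"})


def match_fraction(seq_a, seq_b):
    """Fraction of aligned positions that carry the same base."""
    return 1 - count_flips(seq_a, seq_b) / len(seq_a)


# Hypothetical third-base strings for one protein in two organisms.
organism_1 = "GCCGGCGC"
organism_2 = "GCGGCCGG"

print(count_flips(organism_1, organism_2))
print(match_fraction(organism_1, organism_2))
```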
00:21:48.06So we can count the separation. This turns out to be, fortunately, a simple process.
00:21:55.21And when these are counted, we can compare genes from different organisms.
00:22:03.11We will find that, as expected, the further the organism is back in the evolutionary tree,
00:22:13.07the more frequent are these flips, and the lower the fraction of positions that still agree.
00:22:20.06So, for example, if I compare human with a cow (bos), then this number is 0.94.
00:22:40.14If I compare human with a mouse or the rat, you get the same number,
00:22:51.18which is fine because rats and mice...I mean after all, mouse is just a little rat.
00:22:58.15I get a number 0.90.
00:23:03.24If I take the human and I compared it with the chicken, that number is 0.70.
00:23:10.24And if I compare this with the fish, I get numbers that are 0.53.
00:23:24.17Now, you'll notice that 0.5 means that these are uncorrelated.
00:23:33.10You can't get lower than 0.5. It means then there's no history left in the genome.
00:23:41.04So every time you can see you can find, by comparing two organisms,
00:23:48.15you can find a value that is above 0.5 for this measure,
00:23:57.08only then can you say that we have information and this is what the information means.
00:24:06.18If it's down at 0.5, there's no information.
00:24:10.05It could be any time in the past. In fact, the history is lost in the mists of time.
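Why 0.5 is the floor can be sketched with a simple model. The talk gives no formula, so the following is an assumption: each site is treated as a symmetric two-state process (G or C) with equal forward and reverse rates, for which the probability that two copies still agree is 0.5 + 0.5 * exp(-2 * d), where d is the expected number of flips per site separating them. The function names are hypothetical.

```python
import math


def match_fraction_from_flips(flips_per_site):
    """Expected agreement under the assumed symmetric two-state model."""
    return 0.5 + 0.5 * math.exp(-2.0 * flips_per_site)


def flips_from_match_fraction(match):
    """Invert the model; undefined at or below 0.5, where the history is lost."""
    if not 0.5 < match <= 1.0:
        raise ValueError("no recoverable history at or below 0.5")
    return -0.5 * math.log(2.0 * match - 1.0)


# The values quoted in the talk: cow, mouse or rat, chicken, fish.
for observed in (0.94, 0.90, 0.70, 0.53):
    print(observed, flips_from_match_fraction(observed))
```

Under this assumed model the estimate grows very quickly as the observed value approaches 0.5, so small measurement errors there translate into large uncertainty in the separation.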
00:24:19.17So, now what do these numbers mean?
00:24:24.01And this is the sort of measure...
00:24:26.15this just happens to be a very particular one that I choose because
00:24:31.20I didn't want to get confused between differing rates of forward and reverse mutation.
00:24:39.05This is symmetrical, it doesn't matter anything about the organism
00:24:44.22and if the chemistry changes or there's a repair process
00:24:50.22that works in one direction, it has to work in the other direction at the same time.
00:24:56.08So, this is the actual measure that is free from all kinds of other interpretations
00:25:07.00and gives you this sort of measure.
00:25:09.13Very well. So, now, what does this mean?
00:25:12.23Now, in general, we want to find out how far back the separation time is.
00:25:21.10We want to do phylogeny. What do people normally do?
00:25:28.14What people normally do is just say, "Well, there are 3 differences.
00:25:35.19How many of them occurred in the human? And how many of them occurred in the mouse?"
00:25:42.08Because, this could have been a C in the precursor organism,
00:25:50.00and changed to a G in the mouse, and this could have been a G and changed to a C in the human.
00:25:58.08Well, what people normally do is split the difference.
00:26:03.18In other words, they'd say, "Well, of the changes here, there's 10% flip,
00:26:10.27we'll put 5% in the human (0.05) and we'll put 0.05 in the mouse."
00:26:22.26Now, you can't do that because, as I'll show you in a moment,
00:26:29.16mice and men are evolving at different rates.
00:26:34.04And this is the first time that this paradox can actually be resolved.
00:26:40.08And I'll go through the argument as it goes.
00:26:45.00I also point out that before we go to anything else,
00:26:55.11you can't tell from the absence of any information which direction evolution is proceeding.
00:27:06.22For example, if I compare a fish with a human genome,
00:27:13.16I see there are lots of differences.
00:27:16.11I see the human appears to have more genes and so on
00:27:21.15and I can have two equal hypotheses in the absence of any other information.
00:27:30.19One is, half a billion years ago there were a lot of people running around
00:27:36.23and they degenerated into fish. Some of them, of course, they left some humans behind.
00:27:44.14Or the other hypothesis is, half a billion years ago, there were lots of fish around,
00:27:52.29and they actually evolved the human genome.
00:28:00.25So you can't tell whether, so to speak, the sign is plus or minus
00:28:05.26in the absence of other information.
00:28:08.20And the information we have is the fossil record.
00:28:12.15And that's very important information
00:28:16.01because it tells us what was around there which is the origin of this
00:28:26.02and it also can give us a very good date.
00:28:29.22So, remember when you are doing this,
00:28:33.29you must have something that gives you a clock in a sense,
00:28:40.00but it can't be a clock in the organism as I will show you now.
00:28:45.17Many people thought that the rate of mutation remained constant over time.
00:28:51.29They thought there was a molecular clock but I think the molecular clock hypothesis is broken
00:28:59.20because the clock rates, as you will see in a moment, differ markedly.
00:29:05.10Very well. So, what I want to know is, of all those changes,
00:29:13.00and you can go through lots and lots of genes and collect these changes.
00:29:17.09It's a lot of work.
00:29:18.15But the question now arises: I can't just split them between the mouse and the human.
00:29:26.08I must have a way of telling how many of these changes occurred in the mouse.
00:29:32.29And so what I can do is take an organism that's further back in evolution
00:29:40.10for which I have good evidence lies further back,
00:29:44.01and take that organism and see what its gene is.
00:29:49.19So that if we have one that branched off earlier...
00:29:55.10So let's just say that what we're going to have here is the human (and I'll draw it here).
00:30:03.16Then we've got, at some point, the mouse branched off. I'll say that's the branch.
00:30:13.00And if I have another branch that's further back, giving me another organism over here,
00:30:21.23and I look in this organism's genes, then I can actually say that,
00:30:29.11if the human has got the background and it's also present in this organism,
00:30:36.01then the change happened in the mouse.
00:30:39.25Of course, I'm going to make a few mistakes
00:30:43.10because this one happened to change in one other direction but we can correct for that.
00:30:50.20Now, if you do this on this, going through now three sets of genes,
00:30:56.14one from the cow, one from the human, one from the mouse,
00:31:02.00you find that, in this case, 2.7 or so changes occurred in the mouse
00:31:17.27to every one change in the human.
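The assignment of each change to a lineage by means of an earlier-branching organism can be sketched as below. The sequences are hypothetical, the function name is illustrative, and the sketch deliberately omits the correction mentioned in the talk for cases where the outgroup itself has changed.

```python
def assign_flips(human, mouse, outgroup):
    """Attribute each human/mouse difference to a lineage using an outgroup.

    If the outgroup agrees with the human base, the change is charged to the
    mouse lineage, and vice versa. Anything else is left unresolved.
    """
    if not len(human) == len(mouse) == len(outgroup):
        raise ValueError("strings must be aligned to the same length")
    counts = {"human": 0, "mouse": 0, "unresolved": 0}
    for h, m, o in zip(human, mouse, outgroup):
        if h == m:
            continue  # no difference to assign
        if o == h:
            counts["mouse"] += 1
        elif o == m:
            counts["human"] += 1
        else:
            counts["unresolved"] += 1
    return counts


# Hypothetical third-base strings; the cow plays the role of the outgroup.
human = "GCCGGCGC"
mouse = "GCGGCCGG"
cow = "GCCGGCGG"

print(assign_flips(human, mouse, cow))
```

Summed over many genes, the ratio of the mouse count to the human count is the kind of figure the talk quotes as roughly 2.7 to 1.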
00:31:24.14In other words, mice are evolving...
00:31:27.20and this can be said... we don't want to give, we don't have to give any reasons for it,
00:31:36.14we don't have to say this is the way they behave
00:31:40.02because everything we know about modern mice
00:31:42.14would be very hard to project backwards in time to this branch for all we know.
00:31:48.20What branched off here, and what branched off here, were very different animals altogether.
00:31:55.01But, we can certainly say that the length of this for the mouse,
00:32:03.02is 2.7 times the length for the human.
00:32:08.10And you can do the same for the cow. You can do the same for a lot of organisms.
00:32:15.16In fact, what we can discover from this is now a very interesting paradox
00:32:24.09because we have strong evidence, because it doesn't matter...if I'll just do this now for you.
00:32:32.21We can then draw this line here.
00:32:48.21Which must be 2.7 times the length of this line. This is the human.
00:32:58.11So basically let us say that the fossil evidence suggests that
00:33:05.05this is -80 million years, that point of division is 80 million years.
00:33:13.27Then I have to conclude from this that mice are
00:33:19.12on the human scale plus 125 million years in the future.
00:33:35.03So here's today and what mice have done is evolved into the future
00:33:43.02to the extent that if we were to run the human line we would only catch up in 125 million years.
00:33:52.28And I think the mouse genome is a pretty good genome and so we can quite safely say,
00:34:01.00so far as the genome is concerned, we could guarantee this for 100 million years.
00:34:07.05And that emerges strictly by knowing the rate of evolution.
00:34:12.12And we can do the same for the cow
00:34:15.20and you'll find that the cow is about 30 million years into the future, it's evolving quite slowly.
00:34:28.00And we can do the same for all the animals.
00:34:31.18Now, of course this gives you a high resolution because if I'd have taken my numbers,
00:34:39.29(you'll remember I think they said the cow was this and the mouse was this).
00:34:47.03It actually turns out that for this flip rate calculation,
00:34:53.23I would have deduced that the mouse is a more ancient point of division than the cow,
00:35:02.19if I had assumed everything was going at the same rate.
00:35:06.15In fact it's not true because,
00:35:10.12as we've seen now, the cow division point is older than the separation from the mouse.
00:35:20.07Now, with this approach you can do absolute phylogeny
00:35:27.20with quite high resolution certainly for mammals.
00:35:31.28You need a lot of data but the data is now available.
00:35:37.14We can do the comparison with let us say marsupials which are very far back.
00:35:45.28And we can work out their rate of evolution in much the same way
00:35:52.08by doing the comparison with further and further back.
00:35:56.06The more distant you go back in time, the more difficult it is to actually make the measurements
00:36:07.15because you are getting closer and closer to the point of equilibrium.
00:36:13.03And where you have a very high resolution is in this region of 0.8 to 1.0.
00:36:23.24One is identity.
00:36:25.14Oh, by the way, on this scale, the chimpanzee,
00:36:29.09which actually is evolving at the same speed as ourselves,
00:36:33.02so is the macaque monkey,
00:36:35.08but the chimpanzee on this scale, this division is very close up here because the genomes are so close.
00:36:45.24So we can actually, by the same token, measure in a set of primates
00:36:53.05we can get the absolute point of departure
00:36:56.27because we can measure the rate of evolution in these organisms.
00:37:02.16And I've done it for the macaque, the chimpanzee, and of course ourselves.
00:37:12.01Well, this absolute phylogeny says that you can actually reconstruct the past
00:37:21.01simply from the study of the genome and simply by looking at these changes.
00:37:29.14Now, where I'd like to go on is to tell you about another problem
00:37:37.13which seems completely unable to be resolved unless you take a
00:37:49.11very clear-cut attitude towards laying bare all the assumptions that lie behind what people say.
00:38:00.24We'll see here that basically by saying we can't just simply assign all the changes to this,
00:38:08.25we must measure the rate.
00:38:10.17Once we've got the rate, then we can actually make a measure of distance
00:38:17.24and once we've fixed one time, let us say mouse-man separation,
00:38:25.27we can then put in all the other animals that we've compared on the absolute time of separation.
00:38:34.00And so you can construct a tree which is one of absolute phylogeny,
00:38:41.04which I think is a terribly important thing to do.
00:38:45.26So, by doing that, we can scale at least all the mammals
00:38:52.12together with the marsupials onto one scale and we can study them.