Teaching Hardy-Weinberg and Population Genetics using Spreadsheet Models–Part 1

(modified from a post that originally appeared at http://www.KABT.org)

Step 11

On numerous occasions I have argued that trying to model H-W equilibrium in classroom with activities such as the AP Biology H-W lab, the M & M’s labs (http://www.accessexcellence.org/AE/AEPC/WWC/1994/mmlab.php) or with beans suffer from too small of sample size (population) or the models are simply too tedious for the students to explore.  Computer spreadsheets provide a unique environment that allow students to build and test their own models on how a population’s gene pool can change.  The testing, in particular, provides for a powerful learning experience for teacher and student.

Most spreadsheets have a “Random” function that can generate random numbers to model stochastic events.  Like flipping a coin or drawing a card at random in the AP Biology Lab 8 H-W lab the “Random” function serves as the basis for our spreadsheet model. Unlike our physical models/simulations (like the M and M’s lab) the computer can generate thousands of samples in a very short time.  The benefits to learning are worth the challenge of trying to learn how to build the spreadsheet.

In this post, I’ll present the essential parts of an EXCEL spreadsheet (other spreadsheets will work as well) that can be used to explore some of the first principles of the effects of population size on genetic drift.  In addition, this post is long-winded because I’m attempting to provide the strategies and questions I would use to encourage my students to develop their own computer-based models.  This is not presented as the definitive spreadsheet model or approach but rather a rather simplistic model to be constructed and modified by your students.  The idea is that if the students can find their way through building this model it can serve as a foundation as they extend the model to explore more of the parameters that affect H-W.

I’ve tried to break down the more complex model into a series of manageable steps.  I’ll cover the extensions to this model and others in future posts.  BTW, it takes longer to read this post than it takes to make this relatively simple spreadsheet. I suggest that you bring up EXCEL or some other spreadsheet in a different window and try to create this worksheet as you follow follow along.  Be thinking what questions you will ask your students so they can develop their own version of this spreadsheet.  Once you’ve mastered this and can create or modify it at will, then try it out with your students–they can handle this level of difficulty. They just don’t know it, yet–that’s your challenge as their instructor. And when they do succeed, with your guidance, they will have an effective tool to explore the basic principles of H-W equilibrium–one they have created themselves.

Step by step instructions follow, below the fold:

There are a number of steps to model building.  Otto and Day (http://www.amazon.com/Biologists-Mathematical-Modeling-Ecology-Evolution/dp/0691123446), in their book on mathematical modelling suggest the following steps:

  1. Formulate the question.
  2. Determine the basic ingredients.
  3. Qualitatively describe the biological system.
  4. Quantitatively describe the biological system.
  5. Analyze the equations.
  6. Checks and balances.
  7. Relate the results back to the question.

Let’s get started working with the first four steps of model building.

Formulate the question:

Even before the computers enter the picture I would ask my students a simple, review question specific to Mendellian inheritance patterns.  Something like:

“Hypothetically speaking, you and your spouse recently decided to have your DNA analyzed by one of the new companies providing genetic testing.  Unknown to either of you since neither side of your family has had a known case of cystic fibrosis, you find out from the test that you are both carriers for CF.  What is the chance that your child will be born with CF?  What is the chance your child will be a carrier of the CF allele?”

As they work this out I’d ask a number of other questions that, step by step, move to this target–>”We’ve been studying inheritance patterns in families, how are inheritance patterns for the entire population different?”  One question that might help get us to this goal would be something like:  “How do recessive alleles stay in a population?”  “Don’t they gradually disappear?”  Obviously, I’m playing to a commonly held misconception but I want this on the table as the students investigate the question—”How do inheritance patterns or gene frequencies change in a population?

Determine the basic ingredients.

Next I present some questions that might  help the students simplify the model.  I’m shooting for a single generation, model with discrete time steps, with an infinite gene pool, gametes picked from the gene pool at random, single locus, sexual reproduction, two allele, simple dominance, limited population size, no mutation, no selection, and no migration.

“What’s gene system should we explore or model?  (ans:  single locus, two alleles)  Diploid vs Haploid?  Sexually reproducing?  What’s the inheritance pattern?  (ans: simple dominance) How will we represent this?  (ans:  A and B alleles instead of A and a) What do you expect to change?  How could we model the change?  Change implies…? (ans: time as a variable–model in discrete steps to avoid differential equations)  How should we study alleles in a population–what kind of term/variable? ” (ans:  gene or allele frequencies–an operational definition).

At this point I’d ask a series of questions that would lead to the conclusion that computers would make a good tool for simulating gene frequencies in a population.  Finally, the discussion would finish with a consensus from the students to explore the use of spreadsheets to model the population.  (This could be relatively easy if you have used spreadsheets in the past.)

Qualitatively describe the biological system.

This is a critical area to cover with students as they develop their models.  Here, you will want to emphasize simplifications, assumptions and the limitations of the model.  Again, using series of questions such as:

“Imagine a hypothetical organism, can you describe its life cycle?  Can you describe how it reproduces?  Can you diagram the stages of this life cycle?”

To make this initial exploration into an HW model there’s some important assumptions that need to be made:  All the gametes go into one, infinite pool and all have a chance of taking part in fertilization or formation of a zygote.  For now, all zygotes live to juveniles, all juveniles live to adults and none enter or leave the population and there’s no mutation.  For the first model attempt use questions to come to a class consensus that only one generation will be explored.  Here’s what a diagram might look like this.

cycle 2

I’m not sure if this is the time to cover processes like selection, migration or mutation but I think not.  In my mind, at this point we just want to create a single generation model–where the gametes are selected at random for each zygote.

Quantitatively describe the biological system.

Build the Basic Spreadsheet together as a class:

Here’s one possible script:

Where should we start?  Let’s start with the frequency of two alleles in the population.  (usually I wouldn’t answer this question but sometimes it’s just best to just get going)

Step 1

I’ve created a blue zone from A2 to D3—How do you do that?   Go ahead and create this blue zone.  (Highlight the cells, right click and select format cells) The blue zone on this spreadsheet represents the gene pool–you might want to label this and you can if you wish in row 1.  (At this point I’d tell the class that they don’t have to recreate the same spreadsheet—but it’s an easier way to begin.)

Note that the value for p is entered in cell “D2″ and the value for q is calculated by a formula in cell “D3″.  Make sure you enter a formula to calculate the value for q.  (you may need to offer some instruction, here.) Don’t forget to label these cells.

According to our model, the gene pool is assumed to be infinite and the selection for gametes for the next generation is assumed to be random. To accomplish this in the spreadsheet we call on the RANDOM function. Let’s see how it works.  If you were to pick an empty cell on the spreadsheet and enter the following function: =RAND() What do you get?  Hit the F9 key (manual recalculation) several times and tell me what you get. (a random number returned that was somewhere between 0 and 1.)  Our entire model is going to be based on this RANDOM function.  Unfortunately I’m not sure how random it really is but for our purposes it will work.

Step 2

In cell “E5″ let’s generate a random number, compare it to the value of p, and then place either an “A” gamete or an “B” gamete in the cell.  We’ll need two functions to do this:  The RANDOM function and the IF function.  The smart thing would be to have you look this up but I think we’ll just put this into the spreadsheet.

Note that the function that is entered in cell “E5″ is: =IF(RAND()<=D$2,”A”,”B”)

Be sure to include the “$” in front of the “2″ in the cell address “D2″ ,  it will save time later when building on to this spreadsheet.

Which basically says, if a random number between 0 and 1 is less than or equal to the value of p then put an “A” gamete in this cell or if it is not less than or equal to the value of p put an “B” gamete in this cell. “IF” functions and “RAND” functions are very powerful tools when you try to build models for biology. Now create the same formula in cell “F5″–don’t just copy it sideways or if you do make sure that it is formatted exactly like “E5″.  When you have this completed then press the “F9″ key on your windows keyboard to force a recalculation of your spreadsheet. If you have entered the functions correctly in the two cells you should see changing values in the two cells.  (This is part of the testing and retesting that you have to do while model building that tends to reinforce the general principles in the learner.)

Step 3

Finally, copy these two formulas down for about 16 rows that will represent 16 offspring for this generation.

Step 4

We’ll put the zygotes in cell “G5″ . The zygote is a combination of the two randomly selected gametes. In spreadsheet vernacular you want to concantenate the values in the two cells: In cell “G5″ enter the function: =CONCANTENATE(E5,F5) and then copy this formula down as far down as you have gametes.

Step 5

The next columns on the sheet, “H”, “I” and “J” are used for bookkeeping–keeping track of the numbers of each zygote’s genotype. They are rather complex functions that use IF functions to help us count the different genotypes of the zygotes.  Here goes:

The function in cell “H5″ is: =IF(G5=”AA”,1,0)

which basically states: if the value in cell “G5″ is “AA” then put a 1 in this cell, if not then put a 0.

Enter a very similar function in cell “J5″:  =IF(G5=”BB”),1,0)

Can you interpret this formula–what does it say in English?  (If the value in cell “G5″ is”BB” then put a 1 in this cell, if not then put a 0.

Step 6

Now let’s tackle the nested “IF” function. This is needed to test for either “AB” or “BA”

In cell “I5″ enter the nested function: =IF(G5=”AB”,1,(IF(G5=”BA”,1,0))

This example requires an extra set of parentheses–something that is necessary to nest functions. This function basically says: if the value in cell “”G5″ is exactly equal to “AB” then put a 1; if not then if the value in cell “G5″ is exactly “BA” then put a one; if it is neither then put a 0 in this cell….

Copy these three formulas down for all the rows you have produced gametes.

Enter the labels for the columns you’ve been working on, “gametes” in cell “E4″, the label “zygote” in cell “G5″, the label “AA” in cell “H4, the label “AB” in cell “I4″, and the label “BB” in cell “J4″

There you have it. Copy the cells “C5″ through “H5″ down for as many zygotes as you’d like in the first generation. Sum up the values in the “F”, “G”, and “H” columns to summarize the genotype frequencies in the next generation, make a histogram and you end up with something like this:

Step 11

With a model like this you can vary the number of offspring by inserting new rows and copying the formulas or by deleting rows to investigate the effect the size of the population has on the gene frequencies in the next generation. Those of you using the AP Biology Lab manual can use this spreadsheet to answer the questions in the Lab section 8B, Case I  and it should be pretty easy to modify it to answer others. That’s your challenge.  We’ll explore expanding this model in future posts.

In case you had trouble interpreting these instructions I’m also embedding a Zoho collaborative spreadsheet that you can click on to explore this model and how the cell formulas are entered:



No Comments »

RSS feed for comments on this post. TrackBack URL

Leave a Reply

You must be logged in to post a comment.