Teaching Hardy-Weinberg and Population Genetics using Spreadsheet Models–Part 1

(modified from a post that originally appeared at http://www.KABT.org)

Step 11

On numerous occasions I have argued that trying to model H-W equilibrium in classroom with activities such as the AP Biology H-W lab, the M & M’s labs (http://www.accessexcellence.org/AE/AEPC/WWC/1994/mmlab.php) or with beans suffer from too small of sample size (population) or the models are simply too tedious for the students to explore.  Computer spreadsheets provide a unique environment that allow students to build and test their own models on how a population’s gene pool can change.  The testing, in particular, provides for a powerful learning experience for teacher and student.

Most spreadsheets have a “Random” function that can generate random numbers to model stochastic events.  Like flipping a coin or drawing a card at random in the AP Biology Lab 8 H-W lab the “Random” function serves as the basis for our spreadsheet model. Unlike our physical models/simulations (like the M and M’s lab) the computer can generate thousands of samples in a very short time.  The benefits to learning are worth the challenge of trying to learn how to build the spreadsheet.

In this post, I’ll present the essential parts of an EXCEL spreadsheet (other spreadsheets will work as well) that can be used to explore some of the first principles of the effects of population size on genetic drift.  In addition, this post is long-winded because I’m attempting to provide the strategies and questions I would use to encourage my students to develop their own computer-based models.  This is not presented as the definitive spreadsheet model or approach but rather a rather simplistic model to be constructed and modified by your students.  The idea is that if the students can find their way through building this model it can serve as a foundation as they extend the model to explore more of the parameters that affect H-W.

I’ve tried to break down the more complex model into a series of manageable steps.  I’ll cover the extensions to this model and others in future posts.  BTW, it takes longer to read this post than it takes to make this relatively simple spreadsheet. I suggest that you bring up EXCEL or some other spreadsheet in a different window and try to create this worksheet as you follow follow along.  Be thinking what questions you will ask your students so they can develop their own version of this spreadsheet.  Once you’ve mastered this and can create or modify it at will, then try it out with your students–they can handle this level of difficulty. They just don’t know it, yet–that’s your challenge as their instructor. And when they do succeed, with your guidance, they will have an effective tool to explore the basic principles of H-W equilibrium–one they have created themselves.

Step by step instructions follow, below the fold:



An Extension to the Logistic Model

This post describes how my classes wrap up their preliminary exploration into population modeling using spreadsheets.

Earlier posts in this series:

Sparrow Lab

Exponential Growth

Logistic Growth

Once my students have successfully modeled logistic growth, we sometimes wrap things up with discussions on the power and limitations of models followed with a sequee into human population growth models and predictions.  However, as often as not, if the class has begun to embrace modeling I’ll take them on a momentary side trip.  To begin this exploration, I ask them to identify the parameters in the equation that have the most impact as the parameter changes.  I ask that they systematically try out changes to N, to K and to r.  Generally, the students conclude that changes in N and K are pretty straight forward and predictable but that changes to r change the overall shape of the growth curve more dramatically.  Interestingly, since the inititial values for r that the class have used to this point have been less than “1″, I usually have to persuade them to try out values greater than “1″.  I ask them to record in their notebook, the shape of the growth curve for various values of r between 1 and 4.  Pretty soon I hear some “ohhhh’s” and some “wow’s” coming from different students as they discover the same type of patterns that Robert May (R.M. May (1976). “Simple mathematical models with very complicated dynamics“. Nature 261: 459) did back in the early seventies when he explored the effects of different r values on the logistic.  You can find more info at:


There are some difference between their models and accepted models, but soon the students find that there is a point where there are two stable points that the “population” oscillates between–then four and then chaos…..


From Wikipedia

Now, I’ve got a problem—how far do we explore?  Generally, I leave this particular topic at this point with a few quick words on complexity and how small differences in intitial conditions can have profound effects on processes through time.  Mostly, my goal here is to introduce students to an entire other field that they might find fascinating and be able to link to their math, to their biology and to their physics.  To that end I challenge them to explore this topic on their own time–and many do, each year, bringing back all kinds of links, fractals, and applications of complexity theory in biology.


Using Spreadsheets to Introduce the Logistic Population Growth Model

This post is a continuation of exploring the use of spreadsheets in high school biology.  I’ve started with a rather obvious topic:  population growth.  What I present is only one possible scenario which is meant only as a starting point.  Two themes I hope are apparent as you read through these posts:  1.  I use questioning techniques to help the students connect to their previous knowledge while they are developing new understandings and 2.  I really work hard to have the patience to allow the students time to work out their own solutions on the spreadsheets with only a little intervention from me.  That’s the beauty of spreadsheets–they can quickly provide feedback to the students as to whether or not they’ve entered their formulas correctly or even if their proposed formulas work the way the student hoped.   In other words, making mistakes and fixing them is a critical part of these exercises.  Don’t cheat the students out of a learning opportunity by providing too much help/guidance.   In these posts I’ve suggested that you work out the spreadsheet yourself before checking out the embedded sheets.   In my experience, my mistakes help to inform my teaching as well.  I doubt that I’ve ever created an original spreadsheet model the first time from scratch that I didn’t subsequently correct or modify–that’s an essential part of the process.

Earlier posts in this series:

Sparrow Lab

Exponential Growth

At the end of the exponential growth post I mentioned that mathematical models can be additive–perhaps I should have said modular.  The exponential equation developed in the earlier sheet now serves as the core for more sophisticated models.


At this point with my students I enter a conversation that explores what they see in the real world.  Do populations continue to grow exponentially?  Why not?  What factors might limit population size?  Eventually, using guiding questions we follow a path that leads to a new concept:  carrying capacity.   At this point, with student input, I sketch a graph on the board that has the x-axis labeled time and the y-axis labeled population size.  I then draw a horizontal line across the top of the graph that I label carrying capacity.  I ask the student to do the same on a piece of paper and then challenge them to sketch a line that represents a population that grows exponentially at first but as the population size approaches the carrying capacity the population growth slows and the population size levels off.   Eventually, the class agrees that a likely scenario would be an S-shaped line, with an increasing slope early on, with a transition zone where the slope changes to a decreasing slope and an eventual leveling.

With the target in mind, I bring the class back to their earlier spreadsheet model of exponential growth that had two terms:  N and r.   I ask a number of question such as:  “Which of the two terms change as the exponential equation is recalculated”  “Which term is constant?”  “If we wanted to modify the exponential growth curve into the S-shaped curve what has to happen to r?” (no longer constant)   At this point I introduce a new variable to the work:  “K” which represents carrying capacity.  (Naturally, there is further discussion about carrying capacity in the real world and in the model.)

Now, for the hard part—having the students come up with the logistic expression themselves.  First I remind the students about the algebraic form of the exponential equation that they represented their earlier spreadsheet:

Nt = N(t-1) + r*N(t-1)

The discussion has already focused on the “r” term which is in the second expression.  I ask the questions such:  “What part of the graph is population growth maximal?  minimal?”  “How can we change ‘r’ to maximize growth? minimize growth?”  “Now if the spreadsheet has a constant value of ‘r’ how might we change that value during the calculations?”    At this point I will introduce the idea of adding another expression to the equation–the logistic.  “Is there some mathematical expression that we could add to this equation that maximizes ‘r’ early but minimizes ‘r’ in later generations?”  “Can you think of an expression that includes just the N variable and the K variable that can be multiplied times ‘r’ to fill the needs of the model?”  or  “Can you think of an expression that is approximately equal to “1″ when N (the population size) is small but approximately equal to “0″ when N approaches K in size?”  At this point I let the students “discover” this expression themselves.  I ask them to try out the expressions they think will work in their spreadsheet.  To evaluate their proposed expression put it in the spreadsheet and use the graph produced to evaluate whether the expression works as planned.

The first time I tried this, the students took most of an hour and went through quite a bit of frustration.  I’m not really sure why I thought they could “empirically” determine this expression or what I thought they’d get out of it but I realized part of the value of the exercise when all of a sudden, one of the girls jumped up and yelled “Yeaaaah, I’ve got it”.  I decided to not have her share her strategy with the others—but instead prompted them to keep trying.  Eventually the entire class came around to the logistic expression:  (K-N)/K    Definitely a powerful experience.  The students learn that they can solve seemingly impossible problems with hard work but they also learn how to think about mathematical models in of biology.  It’s fairly easy to discuss  now, the limitations and the power of the model.  BTW, that first student is now a professional biologist.

I hope that you try to create this spreadsheet yourself before you ask students to do so.  Here is an example of how the spreadsheet model might be formulated.

Link to the spreadsheet in case the embed feature is not working.



The new Data.gov web site

The new Data.gov web site

The federal government just launched a new web site:  www.data.gov.  It’s a compendium of pretty amazing (and amazingly huge!) data sets and tools – all free and available to the public.

In the “About” section of the site, they explain that the “open government” priority of the Obama administration is the driving force behind this new site.  It is an attempt to improve access to Federal data and expand the creative use of those data beyond the walls of government.

I’m thinking there might be some good stuff in here for biology teachers.  Here’s a sampling of a few intriguing data sets that I found on the site:

- FluView:  a natioanl flu activity map (there’s also a state-by-state map)

- Cancer Incidence:  Surveillance, epidemiology and end results

- Residential Energy Consumption Survey:  conducted every four years, provides national statistical survey data on the use of energy in residential housing

- American Census Data

- American FactFinder:  a tool designed to search the American Community Survey, Decennial Census, Economic Census, and population estimates.

Fun stuff.  There’s a handy search engine on the home page of the site where you can indicate the kind of data you’re interested in and which federal agency(ies) you want to query.

It’s not perfect (of course).  The data available is limited (they’re promising to add more) and, curiously, there’s nothing from the Securities and Exchange Commission (hmmm…).   I would love to hear ideas about how this might be used in teaching.


Spreadsheet Exponential Population Growth Model

Earlier I covered applying spreadsheets to the old BSCS sparrow lab-Part One.

Now for Part Two:

After the students build their own spreadsheet models of the hypothetical sparrow population, as a class we discuss the parameters that taken together determine population growth or decline.  I guide the discussion with questions until the students are able to articulate the four factors that determine population growth:  birth rate, death rate and migration (emmigration and immigration).  I am careful to make sure the discussion includes reviewing a working definition of a population and that the factors identified are rates and therefore have a time element to consider.  At this point we revisit the sparrow spreadsheet model and identify how these four parameters are taken into account in the actual cells of the spreadsheet.  Students quickly identify that there is no migration terms and that the birth rate is taken care of when each pair of sparrows produce 10 offspring (column D).  They have a more difficult time with the death rate.  There is no explicit cell with a death rate parameter but only the offspring in column D move to the next year–death is taken care of by omission.

The discussion then moves to asking the students to consolidate these four factors into one term–a per cent of increase or r“.  We also establish a variable for the population size at any particular time interval: “Nt“.  The students are now challenged to represent the exponential population growth in a single equation with the variables “r” and “Nt“.

Eventually the class arrives at the following:

The new population is equal to the previous population + rate of increase times the previous population


Nt = N(t-1) + r*N(t-1)

At this point the students are directed back to their spreadsheets and challenged to develop a new model based on this equation.  This time their instructions include assigning initial values to the variables.  On the board, again I help them set up a possible structure:

r = 0.1
N = 10
Time Interval Population
0 10
1 11

It’s now up to them to create the spreadsheet using the formula.  Interpreting normal algebraic notation into spreadsheet notation is a bit of a challenge but they usually figure it out.  By iterating the formula (using the results from one time interval as the basis for the next) the students can explore and create models that without computers would require a familiarity with calculus.  Once the students have created their simple model I have them expand it to 300 generations or time intervals and graph the results.  When trying this with your students make sure that you don’t get too explicit with your help—students have to work at building this model but it is doable for most.  You should try it as well before checking on the spreadsheet embedded here:

If the spreadsheet is not loading you can find it at: Exponential Model

There is one spreadsheet technique that you need to be aware of to make this model–the difference between relative reference and fixed reference in a cell’s formula. Since the reference to “r” always points to the same cell, it should not change. The default in a spreadsheet formula reference is “relative”, which changes. You can make a cell reference fixed by adding a dollar sign in front of both components of a cell’s address. For example referring to cell: B1 is a relative reference but referring to $B$1 is a fixed reference.

Normally when I explore this topic in my class, we can pretty easily get through the sparrow population model and the exponential model in one hour. Modeling is an additive process and this is only the start. Note that the procedure thus far has only added a bit of complexity at each step–with only rudimentary math operations. The next step will be to explore the logistic model. I try and reserve at least a day for it along with a homework assignment. I’ll cover the logistic in the next post.





A wee-bit of Math Geekery–Modeling Populations with Spreadsheets

I started teaching in the late 70′s but even then I was looking for math applications in Biology.  The BSCS Green version textbook had (and still has, I imagine) a very rudimentary exercise that introduced students to exponential models of population growth–we called it the sparrow lab.


Sparrow from Andreas Solberg's Flickr Photostream

The exercise introduces some of the main points of developing a model—deciding on your assumptions/simplifications, approximating but simplifying real world conditions and introducing the limitations of models, while introducing the power of models.  Secondarily, this lab provided for most students their first introduction to semi-log plots.  If you are not familiar with this exercise it doesn’t take long to explain or to model using a spreadsheet.


  • An island with unlimited resources and no limiting factors
  • Introduce 10 sparrows to the island:  1/2 male and 1/2 female
  • Each year, each pair of sparrow produce 10 offspring
  • All offspring survive to reproduce the next year
  • However, the parents all die before the next year (every sparrow reproduces and then dies)
  • No new sparrows immigrate to or emmigrate away from the island

It does not take long tease out these assumptions from a discussion with students and to write these on your board.  Note that the assumptions about births tend to balance out the survival assumption.  The students are then instructed to calculate how many birds will be alive on the island at the end of each year–for the next ten years.   Back before computers or calculators this calculation along with the graphing took the rest of the hour.  Scaling the graphs was particularly difficult.  When confronted with their inaccurate scales some of my students used to respond by taping 6 or 7 sheets of graph paper together.  Describing how to set up and plot on semi-log paper was another difficult challenge.  Today, with spreadsheets, this exercise seems almost trivial but it is not at all.  The exercise introduces a way of thinking about biological problems mathematically and helps to build a more intuitive sense of exponential processes that are fundamental to biology specifically and life skills in general.  Likewise, as trivial as this exercise may seem you’ll be amazed at the range of approaches that your students will take as they tackle this challenge.  Once you try this, I think you’ll agree it is time well spent.

After the discussions of our assumptions, as a class we sketch out how we might create a spreadsheet that would model the hypothetical sparrow population.  Generally, up on the board I sketch out a spreadsheet (don’t demonstrate this on an actual spreadsheet, just yet.)

We usually start something like this:

First we decide our labels:

year sparrows pairs offspring

(I usually prompt them for the “pairs” label–left over from the days when we did this by hand.  While it is not necessary to calculate “pairs” it slightly simplifies the offspring calculation.)

Next we simply calculate the first two rows in our collective heads and fill the rows in during a class discussion.

year sparrows pairs offspring
0 10 5 50
1 50 25 250

The conversation usually goes something like this:

  • “So, in year zero, how many birds do we put on the island?”
    • “10″ (I write down the 10 on the board.)
  • “How many pairs is that”"
    • “5″ (I fill in the 5.)
  • “How many offspring are produced?”
    • “50″  (You’d be surprised how often the student stumble here for a minute, remember it is all in their head at this point.)
  • “How many birds start in year 1?” (Again, the students sometimes stumble here but they get it pretty quickly.)
    • “50″ (I start and finish out the second row.)

At this point then I simply suggest that they create a spreadsheet that calculates and correctly graphs the hypothetical sparrow population on the island for 10 years.  In recent years, all of my students have had some sort of spreadsheet introduction but they’ve seldom had to create their own from scratch.  I provide very little guidance at this point.  It is important that each student has a chance to make missteps while constructing this spreadsheet–recognizing and correcting these errors are when valuable learning takes place.  Most often the first student constructed sheets have some fundamental error.  I use probing questions to help the students recognize these errors but generally let the students propose their own solutions.  What I’m trying to achieve is a mindset in the students whereby they propose possible solutions, enter them in the spreadsheet and then use the results from the spreadsheet calculations to evaluate their original proposed solution–sounds a bit like the essence of scientific thinking, eh?  As simple as this particular exercise is there are still a number of students in my classes that struggle a bit with this.  In fact, I’ve had teachers in workshops struggle a bit at this point as well.  It’s for this reason that it is important to not provide to much direct instruction in this exercise–it is accessible enough that the students can struggle a bit but still succeed.  This success is key to building the tenacity needed to solve problems and the  skill sets needed later in this modeling exercise.

Here’s some of the common problems.  Remember to use questions to the students to help them see these problems:

  1. Students generally just fill in the numbers from the board and fail to create formulas in the cells.
  2. Students copy cells incorrectly.
  3. Later, students fail to recognize fixed references vs. relative cell references.
  4. They have a difficult time evaluating their graph.

At this point you should create your own spreadsheet model.  In my experience, I’ve found that I can better teach the modeling process if I create the sheets from scratch myself.  Like doing any lab procedure you need that experience to teach it.  Once you’ve created your own, you can compare your spreadsheet to the following spreadsheet.  Since it is a collaborative spreadsheet you can check out the formulas in the different cells if you’d like.

You can find it here, if the actual spreadsheet isn’t showing up:


I ask the students to graph the model using a scatter plot and graph the model with the “y” axis logged.




So, what are questions that you’d want ask your students as they built this model? What are questions you might ask about the two types of graphs?  Please feel free to contribute some suggestions or questions in the comments. This is just the first installment of an multi-day lesson on population modeling and where it can take you.  I’ll cover more in a later post.




Math and High School Biology….

Way back in the mid-80′s I attended a summer NSF institute that was structured to include math, physics, chemistry and biology teachers.  Each day we’d concentrate on our separate disciplines but occasionally we’d have evening programs that brought us all together.  One of those evening programs included a panel discussion that explored math/science curricular integration.  Of course the folks that organized the panel discussion were looking for primarily math applications in physics and chemistry thinking there really was not that much “math” in biology.  That night I was asked at the last minute to sub in for the biologist representative on the panel.    As the biology teacher’s representative, I decided to represent what I thought math in biology education should be–not what it was.  It was my first public foray into trying to increase math applications in biology.  I don’t know why it is but there are times when controversy just seems to seek me out.

I dutifully waited my turn to pounce speak as the physics rep discussed the need for at least Algebra II skills and the chemistry teacher calling for at least Algebra I skills as prerequisites for their course–lamenting that even with these standard requirements, the students seem to have much difficulty “keeping track of units”, with proportional thinking, and with novel problem solving.  Generally, the argument presented was that physics and chemistry were math intensive–much like one long story problem.  I’ve taught all three courses and I didn’t really have any argument with most of their claims but it stuck in my craw as repeatedly the math, physics, and chem panel members kept referring to biology as the science that could be taught without a math emphasis—it still sticks in my craw. (I know, I know…a biologist shouldn’t really be implying that he has a crop–it’s just one of those homey, Kansas euphemisms.)

I went on to propose ideas for math applications across the broad scope of biology topics–Exponential functions/ equations, modeling, algebra in Hardy-Weinberg work, Fibonacci numbers, geometry, statistics and probability.  Not really demonstrating good political skills I went on in an accusatory fashion—”Why is it that the first exposure my students have to statistics and probability happens in my biology class?”  (Remember this was the 80′s.)  Obviously, the idea of math informing beginning biology instruction did not begin with me but you would have thought the audience had been suckered punched.  They were nodding their heads in agreement and about to start a constructive dialog when one of the old guard recovered quickly enough to dismiss my claims as being too unrealistic–I was jousting at windmills.  (This was also before Physics First or biotech investigations.)  Momentum lost for that round I learned my lesson and have been more politic in my approach.  To that end the landscape has changed a great deal, today.   However, despite supportive National Math Standards, Physics First curriucla, more AP courses taught, and numerous university or secondary level NSF projects funded to integrate more math in biology, it still seems that most biology teachers avoid math at all costs.  What’s my evidence?–no real hard data, just anecdotal experiences while trying to promote math and computer applications in the biology community.  Teachers are not, necessarily to blame.  I wish I could show you the looks on my student’s faces when they find out I expect some math application in biology.

My plan is to present a few posts that explore very basic math applications in biology–perhaps it will start a converstation….

In the meantime, here’s a warm-up from NABT member and former editor for the American Biology Teacher, John Junck:

10 Equations That Changed Biology (And That Should Change Biology Education)
Remember, there are only 10 kinds of people in the world—those that understand binary numbers and those that don’t.

Photo:  Brad Williamson, aka--ksbioteacher