This is a bleg.* A while back I asked your help in choosing a textbook for an introductory biostats course I co-teach. We settled on Whitlock & Schluter, which fits our needs quite well. The course covers a pretty traditional set of topics–basically, most (not all) of what’s in chapters 1-17 of the textbook.

Now I need to ask your advice again, to help my co-instructor and me improve the rest of the course. There are a couple of big things about the course that I would like to improve:

- Student grasp of the material. I think we do ok on this front, but I’d like to do better–to get more students pushed higher up Bloom’s taxonomy, if you want to think of it that way. Get them beyond just memorizing stuff.
- Student satisfaction and engagement. Not that these are ends in themselves–ultimately what I care about is that students learn the material, even if they don’t enjoy it. But we have various lines of evidence that many students just aren’t “into” the course, as compared to, say, how much they’re into their biology courses. The worry is that, if students aren’t sufficiently engaged with the course, at some point it starts affecting their performance. Further, even if student satisfaction and engagement aren’t ends in themselves, it sure would be nice if they all came out of the course feeling glad that they took it, excited about statistics, eager to learn more statistics, etc.

I’m not sure either of these can be improved substantially by tweaking either content or pedagogy. We’ve already done numerous tweaks to both over the past couple of years (and have ideas for more tweaks). But on the other hand, I’m reluctant to put in all the effort required to start completely from scratch (say, by flipping the classroom, and/or radically cutting back on the breadth of material for the sake of improved depth of understanding of core concepts) unless I’m confident that the result will be a big improvement.

My dream is that somebody out there is teaching a successful, popular intro biostats course, hopefully in a context similar to ours**, so that we can just shamelessly copy it! 🙂 But failing that, any success stories you have would be welcome. Tell me what you do in your intro biostats course that really works. And if you’re struggling with the same issues we’re struggling with, please tell us about that too.

**A summary of that context: It’s a large class–130 students who meet together for lectures and are divided among 6 lab sections. The students are mostly in their second year. Most are majoring in biology or some subfield thereof. Many take the course because it’s required for their major, but many others take it for other reasons. The labs are computer labs, which have the dual function of teaching students the basics of R, and teaching them to apply (and thus, better understand) the lecture material. We’re a large public research university, and so there’s a fair bit of among-student variance in any attribute you care to name.

Talk to Jen Pontius at UVM. I was her TA for two semesters and I am still blown away by her approach to introducing students to statistics — a fun book, great lectures, and a hands-on applied project with real USFS data.

Thanks for the tip!

I’d love to hear Caitlin expand on this or for Jeremy to post his communication with Jen.

My two cents – statistics is a life-long self-learning process. You will never teach even a minimal “core” set of stats in one semester. I’m not even sure what a minimal core set is. So I’d argue this is exactly the type of course that could be inquiry based. Take a fun dataset and work your way through the basics. Use R scripting to have the students teach themselves about random variables, standard deviations, standard errors, error intervals, correlations, multiple testing, false discovery rates, type I error, P values, and a frequentist definition of probability. Use R scripts to demonstrate the bootstrap and permutation. Use Monte Carlo simulation extensively. Then relate these back to the t-test/ANOVA/confidence interval exposition in their textbook. Then let your students go free at the end of the semester and encourage them to keep moving down any of the paths that you’ve opened up: Bayesian, general linear models, multivariate, whatever.
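To make the bootstrap suggestion concrete, here is a minimal sketch of estimating a standard error by resampling. It is written in Python (standard library only) as a stand-in for the R scripts the commenter has in mind; the function name `bootstrap_se` and the sleep-hours data are hypothetical, chosen only for illustration.

```python
import random
import statistics


def bootstrap_se(data, n_boot=5000, seed=1):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)  # fixed seed so the demo is reproducible
    n = len(data)
    boot_means = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # n draws with replacement
        boot_means.append(statistics.fmean(resample))
    # The SD of the bootstrap means estimates the SE of the mean
    return statistics.stdev(boot_means)


# Hypothetical data: hours slept last night by ten students
sleep = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 9.0, 6.5, 5.0, 7.0]

analytic_se = statistics.stdev(sleep) / len(sleep) ** 0.5  # textbook s / sqrt(n)
print(f"formula SE:   {analytic_se:.3f}")
print(f"bootstrap SE: {bootstrap_se(sleep):.3f}")
```

Students can compare the bootstrap estimate with the textbook formula s/√n and see that they agree reasonably closely; the plain bootstrap runs slightly low in small samples because resampling effectively divides by n rather than n − 1.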

Have I taught it this way? No, so I haven’t a clue if this will engage anyone. I suspect most students would like a dichotomous key like recipe approach, with an exam in the middle and at the end, which is how I was taught.

“I suspect most students would like a dichotomous key like recipe approach”

Yup, that’s my experience. I’m honestly not sure what’s the maximum percentage of students we could reasonably expect to get engaged and excited about the subject matter itself (as opposed to just wanting to be told *exactly* what they have to memorize/do to get their desired mark), no matter how we taught the course. I do think that just-give-me-the-recipe attitude can be overcome to some extent if you adopt the right teaching approach, but I don’t know that it can ever be completely overcome.

I was Jen’s TA when I was a master’s student, I think Spring of 2009 and 2010. This was a required course for undergrads in the School of Natural Resources and the Environment, but it hadn’t been taught in a couple years and Jen completely revamped the course for that 2009 semester. In addition to weekly labs, the students had a semester-long project in which they wrote a proposal to analyze USFS data, analyzed it, wrote a report, and presented their findings. This project made the class what UVM calls a “Service-Learning” class, and it was one of the best parts of the class. Students loved getting to work on real data, and they were more engaged in the weekly labs because they knew they would have to eventually apply these skills to the project dataset. Jen is also a fantastic lecturer — high energy, lots of examples drawn from current events, and lots of “out of your seat” group discussions in the middle of lecture.

Caitlin was my TA when I was in Jen’s class at UVM! So even from a student’s perspective I back up her statement. Jen’s approach to stats was fantastic. I am now a graduate student in ecology and greatly appreciate the way our class was taught. Developing projects to make it applied is a game-changer. Thanks, Caitlin and Jen!

Still working on mine – I really can’t wait to hear the responses! I’m a bit worried that while I’m taking a different approach to mine, it is too didactic and not exploratory enough. We shall see.

BUT – I love that Caitlin commented as a TA. I wonder, a parallel post would be what WORKED for readers of this blog when they took intro stats (and what did not)…

” I wonder, a parallel post would be what WORKED for readers of this blog when they took intro stats (and what did not)…”

Could be useful, sure. Though I worry a little that readers of this blog are a very non-random sample of all biostats undergrads. So what worked for people who went on to grad school in ecology may not work for other sorts of students.

For instance, I managed to pick up stats just fine despite not having *any* stats courses as an undergrad. When I was at Williams, individual statistical techniques and the associated software (Statview!) were taught here and there in the labs of various biology courses. But I doubt that approach would work at Calgary.

Chad Brassil and I trialed a “life science statistics” course for UG here at UNL using that same book and general strategy. I think it was successful, but small sample sizes preclude sweeping conclusions. And, our Statistics department stomped on further development pretty hard.

I think the answer to getting “engagement” is making sure that stats is used in all their “core” courses, so they see the relevance and importance. But implementing that takes a lot of cross-faculty interest …

The students who go on to major in ecology or environmental science do use the stats, experimental design ideas, and R commands we teach in their upper level courses (for which intro biostats is a prereq). Students in other majors, not so much.

Interesting that your stats dept. didn’t want to see biostats taught. Here at Calgary our math & stats dept. teaches their own intro stats course, which covers much of the same material as we do but from a more mathematical point of view and without covering applications, experimental design, or R (they still use Minitab in labs…). So we get a fair number of students in our biostats course who’ve already taken an intro stats course. I think some of them take it because they want to learn R, others because they want to learn how stats is used in biology, and probably a few because they think they can get an easy A by taking a second stats course (sadly, no…). But there’s never been any push from our math & stats dept. that we’re somehow intruding on their territory by teaching biostats.

If you have the time, comment here or drop me a line with more on exactly what you and Chad did and how you did it.

I’ve been teaching a lecture-only biostats course for about the past seven years. (It should have a lab, but it doesn’t. Long story.)

While ‘flipping’ would require a big up-front time investment, student-inquiry-driven lessons might increase understanding and engagement without requiring more time. Here are a couple of examples of things that I don’t lecture about, but that we do as inquiry-driven lessons.

Example 1: In the lesson on pseudoreplication, students work in groups of 3-4 and develop an experimental design for a scenario: a greenhouse with four benches and 100 seeds (ten seeds each from ten individuals of an endangered plant species), used to screen four different potting soils to figure out which supports maximum reproductive output. Working in groups for about ten minutes, they come up with their design, and groups quickly share out to the class, on the board if they want. There are some differences among groups, and I pair up groups with contradicting strategies to figure out which design is best. In nearly all cases, the prevailing designs are pseudoreplicated. I ask the class a few leading questions about whether X, Y, and Z are potential problems with the design; they agree, and then they fix the design to prevent pseudoreplication. Then I take a few minutes to tell a little story about Stu Hurlbert (whose work was done in the CSU, our university system).

Example 2: When we finally get around to running our first statistical test, about halfway through the semester, we do a t-test. I collect data from the class (this year, how many hours each person slept the previous night, separated by gender; participation optional). Then I give the class the equation for calculating the test statistic, and they figure out the probability value using a calculator. Then they go through whatever steps they’ve learned and make a decision about the null hypothesis, which may or may not be the right answer, but most can’t explain it well. I ask a few leading questions and let the groups discuss, mull over, and sort out the relationships among the test statistic, means, variance, degrees of freedom, and critical values; by the end, most of them have sorted out for themselves how the test works. I summarize it at the end, but I don’t want to do this until as many people as possible in the room have sorted it out. That turns out to be nearly everybody, because we’ve spent so much time on the underlying concepts up to that point.
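For reference, the by-hand calculation in a lesson like this might look like the following Python sketch of Student’s pooled-variance two-sample t statistic. The function name and data are invented for illustration; the critical value 2.306 is the standard two-tailed table value for α = 0.05 with 8 degrees of freedom, and with real class data you would look up the value for your own degrees of freedom.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Student's two-sample t statistic using a pooled variance."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.fmean(group1), statistics.fmean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 denominators
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference in means
    return (mean1 - mean2) / se_diff, df


# Hypothetical hours slept last night, two groups of five students
group_a = [7, 8, 6, 7, 9]
group_b = [6, 5, 7, 6, 6]

t, df = pooled_t(group_a, group_b)
T_CRIT = 2.306  # two-tailed, alpha = 0.05, df = 8 (standard t table)
print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```

For these made-up numbers t is about 2.333 on 8 degrees of freedom, just past the critical value, so H0 would be (narrowly) rejected.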

Example 3: This link describes my inquiry-based lesson for the central limit theorem: http://smallpondscience.com/2014/09/04/efficient-teaching-doing-active-learning-an-easy-way/
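A simulation in the same spirit as that lesson (not a reproduction of it) might draw many samples from a strongly skewed distribution and look at the distribution of their means. This Python sketch assumes an exponential population with rate 1 (mean 1, SD 1) and samples of size 30; both choices, and the helper names, are arbitrary and hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed population SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def sample_means(sample_size, n_samples, rate=1.0, seed=1):
    """Means of repeated samples drawn from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(2)
raw = [rng.expovariate(1.0) for _ in range(2000)]  # individual draws: strongly right-skewed
means = sample_means(sample_size=30, n_samples=2000)

print(f"raw draws:    skewness = {skewness(raw):.2f}")
print(f"sample means: mean = {statistics.fmean(means):.3f}, "
      f"SD = {statistics.stdev(means):.3f}, skewness = {skewness(means):.2f}")
print(f"theory:       SD of means = 1/sqrt(30) = {1 / 30 ** 0.5:.3f}")
```

The individual draws are strongly right-skewed (an exponential’s theoretical skewness is 2), while the sample means are much closer to symmetric and their SD lands near 1/√30 ≈ 0.183, as the central limit theorem predicts.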

I used to use a relatively simple freely available online textbook, but I just swapped out for Whitlock and Schluter this last year. I think it’s a wonderful book, so much that I’m requiring my students to buy/rent it despite the huge pressure from my university to reduce textbook costs. (I wish I could make them buy it, because nearly all of them will need it in the future, but still, some rent it, even though renting isn’t much cheaper than buying. Which pisses me off in a variety of ways.)

I’m not satisfied with this class, because after a couple of years, most of the students actually can’t do the stuff that the class was designed to teach them how to do. I don’t blame myself too much, because I realize that (as I put it in my syllabus) statistics isn’t like riding a bicycle or playing with a yo-yo, where once you learn you can always do it; it’s more like playing an instrument, where without practice your aptitude atrophies. And most of my students (grad students with a biomedical focus) aren’t called on to do more statistics, which is a matter of the curriculum as well.

Thanks Terry, this is very helpful. We currently have one class session in the course where we do something like this (it’s a case study on experimental design). We could probably do more.

And thanks for sharing your dissatisfaction, which I share. And I’m sure you’re right about the source: stats is like a language, you use it or lose it.

Terry – your examples are fabulous. What good is memorizing the whole panoply of tests in classical frequentist stats when many/most working biologists (much less students, and yes there is some empirical evidence for my statement) cannot really give a very good definition of a p-value or a t-statistic or a standard error (and I don’t mean the formula).

@Jeff:

Re: having to memorize a bunch of different tests, that’s the one big negative of Whitlock & Schluter in my mind. It retains the traditional separation of t-tests, ANOVA, regression, etc. (well, except in the advanced chapters tacked on at the end), rather than putting them all in a GLM framework. Students really do struggle to memorize all these different tests and keep straight when to use them. I think it’d be better to just teach GLMs, and to use the lm command in R rather than the t.test, aov, etc. commands. But we couldn’t find a textbook that’s “just like W&S, except structured around GLMs” (Alan Grafen’s GLM text is way too advanced for us). And I hesitate to teach the course in a way that’s seriously out of line with the textbook (or to teach it without a textbook), though maybe I shouldn’t.
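The GLM point can be demonstrated directly: a two-sample t-test is a linear regression on a 0/1 group indicator. In R this is the equivalence between `lm(y ~ group)` and `t.test(y ~ group, var.equal = TRUE)`, which give the same t statistic up to sign (note that `t.test` defaults to the Welch version, which does not match). The Python sketch below, with hypothetical data and a hypothetical helper `slope_t`, shows that the least-squares slope equals the difference in group means and that its t statistic equals the pooled-variance t.

```python
import math
import statistics


def slope_t(x, y):
    """Least-squares slope of y on x and its t statistic for H0: slope = 0."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, b1 / se_b1


# Hypothetical data: two groups, coded as a 0/1 dummy predictor
group_a = [7, 8, 6, 7, 9]  # x = 1
group_b = [6, 5, 7, 6, 6]  # x = 0
y = group_a + group_b
x = [1] * len(group_a) + [0] * len(group_b)

b1, t_slope = slope_t(x, y)

# Pooled-variance two-sample t, for comparison
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
t_pooled = mean_diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(f"slope = {b1:.3f}, difference in means = {mean_diff:.3f}")
print(f"t (regression) = {t_slope:.3f}, t (pooled two-sample) = {t_pooled:.3f}")
```

Seeing both numbers come out identical is one way to show students that "which test do I use?" often reduces to "which predictor goes in the model?"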

I’m teaching intro to biostatistics to second-year Biology students at a small university. I cover roughly the same subjects as you do, and I have about 50 students and one TA. In their third year, our students take another course which deals with more advanced stats, statistical software packages, and some experimental design, so I don’t have to touch on them in my course. This leaves me a lot of time to focus on the basics, and judging from the feedback I’ve been getting from students in the past two years, the teaching approach I’m using seems quite right. Disclaimer: I ‘inherited’ this course from a professor who had taught it successfully for more than a decade, so hat-tip to him for coming up with this approach.

The idea is to teach students how to ‘do statistics’ by hand, I guess in a similar manner to how a stats course might have looked before the age of computers and statistical software. I teach the same theory as everybody else, but require students to conduct statistical analyses using pen-and-paper with the help of a pocket calculator. So to run t-tests, regressions, or anything really, students have to calculate all of the necessary equations by hand, and use those ancient t, z, or Chi-square tables to locate the p-values! I realize that this sounds like a waste of time, but I find this old-school approach to really work in making students understand the rationale behind basic statistics. Once you get the rationale, I think, it’s much easier to learn more advanced methods which are covered by the advanced course.
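As an illustration of what "by hand" means for regression, here is a Python sketch of the calculator-style formulas, which need only the running sums Σx, Σy, Σxy, and Σx². The function name `hand_regression` and the five data points are hypothetical.

```python
def hand_regression(x, y):
    """Intercept and slope from the calculator-style 'running sums' formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sxy = sum_xy - sum_x * sum_y / n  # corrected sum of cross-products
    sxx = sum_x2 - sum_x ** 2 / n     # corrected sum of squares of x
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n   # line passes through (mean x, mean y)
    return b0, b1


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = hand_regression(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
```

These "computational" formulas are convenient with a pocket calculator but can lose precision when the values are large, because they subtract two nearly equal numbers; software such as R’s `lm()` uses a QR decomposition instead.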

I understand that in most institutions this approach might be irrelevant because they don’t have two undergrad stats courses, so the basic course has to cover the practicals/software as well as the theory. Still, I thought I’d share our approach because it seems to be working well.

By the way, this approach is really neat because it allows my classes to be structured in an unusual way. A typical class takes 3.5 academic hours (so 3 hours net). I usually spend 2/3 of each class teaching the theory and showing examples. I spend the rest of the time giving the students a class assignment (e.g., run a regression or conduct statistical tests on some data), which they try to solve while I’m around. This allows for some active group learning, as they’re free to help each other. Also, I walk around and answer specific questions from anyone who needs my help. I find this to be really effective as it allows me to help students who are shy or a bit slower than the others without slowing everybody else. At the end of the class assignment, I show them the correct step-by-step solution, and take questions. In the feedback I get at the end of each semester, students consistently list this type of active learning as something that really helps them to not get lost in the flow of the course, and to better understand the theory.

Again, I realize that this approach can only work with relatively small classes so it might not be suitable in your case (this semester, I’m running this sort of class structure with 50 students, which leaves me gasping for air by the time class ends…).

Finally, as for keeping students engaged, a bit of humor and social commentary never hurt. I give many cool examples of misuse of statistics in everyday life (as well as in research). The advertising industry never fails to provide me with dumb/neglectfully wrong/downright misleading examples…

Wow, that certainly is old school!

I wonder if part of the success of the approach isn’t so much what they’re doing on their assignments, as the fact that they *are* doing assignments in class with you there to answer questions. In the past I’ve devoted a few (50 minute) class sessions entirely to having students work in pairs on practice problems (basically, mock exam questions), and the students really appreciate that and get a lot out of it. It gives them feedback on what they don’t know, which is really important. My main idea for further tweaking my own course is to incorporate more of that sort of thing (which amounts to moving towards a partially-flipped classroom).

I totally agree. It’s their ability to get my feedback while trying to deal with the assignment that helps them the most. BTW, it also helps me, as it gives me a rough idea of how well students understand what I teach, and what’s good or bad about the way I do it. This instant feedback helps me to tweak the way I teach to better suit the strengths and weaknesses of each class.

I have ideas, big ideas about how biostats should be taught, but haven’t had a chance to test them on *real* students (but gosh, I can get people to agree with me about them on twitter a lot 😉 ). I wrote a blog post about my general thoughts on this a few weeks back: http://practicaldatamanagement.wordpress.com/2014/10/01/hey-lets-all-just-relax-about-statistics/ but I was thinking more in terms of a grad class when I wrote that. My hypothetical approach is very applied: working directly on real data, working through exploration, and de-emphasizing null-hypothesis significance testing as a rule. I’m honestly not sure if it’s possible to translate this to a large undergrad class setting.

One of the big takeaways from my undergrad, though, was the idea of propagation of error; this loomed large in my physics classes, but was never mentioned once I made the switch to biology. That approach focused on explaining variation in the data rather than testing for differences per se. I don’t know if this is helpful for developing a biostats course, but I do feel like it’s been very important in shaping my scientific thinking.
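For readers who haven’t met it, the standard first-order propagation-of-error rule is below, assuming independent errors in the inputs. The last line is the same rule applied to a difference of two independent sample means, which is where it connects to the standard error used in a two-sample t-test.

```latex
% First-order (linearized) propagation of error for f(x_1, ..., x_k),
% assuming the errors in the x_i are independent
\sigma_f^2 \approx \sum_{i=1}^{k} \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2

% Sum or difference, f = x \pm y (exact for independent x and y):
\sigma_f^2 = \sigma_x^2 + \sigma_y^2

% Product or quotient, f = xy or f = x / y (first-order approximation):
\left( \frac{\sigma_f}{f} \right)^2 \approx \left( \frac{\sigma_x}{x} \right)^2 + \left( \frac{\sigma_y}{y} \right)^2

% Difference of two independent sample means:
\operatorname{Var}(\bar{y}_1 - \bar{y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
```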

Hope you get to try out your ideas on what to teach someday! (EDIT: in case it wasn’t clear, I mean that–it sounds like you have a great vision for what you want to do. Hope my original comment didn’t read like sarcasm and apologies if it did, that wasn’t at all my intent.)

Yes, our class is old school in its emphasis on null hypothesis testing. I’m mostly fine with that, as I think many of the usual criticisms of null hypothesis testing are overblown. I think there are a lot of situations in biology (and other fields) in which it really is of scientific interest to first try to rule out whether sampling error alone could account for the observed data. The discovery of the Higgs boson is one prominent recent example. An emphasis on hypothesis testing also fits well with an emphasis on the importance of experiments and experimental design. And you have to start somewhere in any subject, and I’m not sure that hypothesis testing is all that much worse as a place to start than other possible starting points. And this way of teaching things is what fits with our upper level courses as currently structured, so frankly there’s an element of inertia here. We can’t radically change the content of this class without radically changing the content of a bunch of other courses for which this class is a prerequisite.

But I freely admit that I have no experience teaching or being taught stats in any other way, so I’m sure that limits my ability to imagine other ways of approaching the subject.

I teach a biostats course to about 20 students in fall semester and 30 or so students spring semester. i have been teaching it here since 2001 but taught intro stats elsewhere since 1984 and before that taught stats lab as a TA. my MS is in stats and my PhD in eco was about 1/2 stats dept classes. that’s only to show you that i am very comfortable with the topic and try to explain it in plain english.

i use gotelli’s “A Primer of Ecological Statistics” because it covers the usual tests, permutation tests, and bayesian tests. it’s a dense book and honestly i don’t know how much the students use it. they are able to supplement class discussions and lab with web resources and know how to find reputable ones, so a book may not be necessary, but i’m not ready to go bookless.

I use the eco stats book even though 1/2 the class is natural resources eco and the other half is ‘indoor bio’ such as pre-professional. prereq is college algebra.

i use several 20-minute essay exams in which they have to interpret results and an hour-long essay midterm in which they have to explain rationale. there’s a comprehensive final of three parts: interpretation, rationale, and selecting the test for the scenario. alas, essay tests may not work for your large classes.

we meet for two 1-hr lectures a week and one 2-hr lab (2 or 3 sections depending on enrollment). lab is very hands on: small but realistic datasets using excel and R (not R commander, just command line R). the labs are set up to guide them through a ‘tutorial’ approach in which they think through each step to reinforce the rationale but also so that they come away with real, hands-on work. i circulate around the lab checking on their understanding, so i really don’t have them turn in lab assignments (i.e., i don’t have to grade lab assignments).

there are also two ‘results section’ write ups in which they have to run the appropriate stats test and write up the results as if it were a results section of a technical report. by the time they redo the first one (which most of them have to) the second one is pretty good.

the term project is to design a study, carefully explaining the population, the response variable, the experimental variables, co-variates, what they will do about potential confounders, sampling scheme, measurement scheme and analysis procedures. it feeds into their jr seminar class, in which they do that for their senior projects (all of our students do a sr project).

as far as the stats itself goes, without follow-on classes, all they will remember from an intro stats class is the rationale of a statistical test. they will run some simple stats for their thesis (mostly nothing beyond simple AOV for groups or simple linear regression, though some get a bit more sophisticated). so i try to help them get that conceptual view: we take several weeks to get to the one-group t-test so they get a firm understanding of sampling distribution, null and alt hypotheses, critical values/p-values, and confidence intervals. then we go really rather quickly through AOV for groups (just up through interaction models and nothing beyond randomized complete blocks), simple linear regression, and chi sq for cross-classified data, and then we spend some time on the permutation approach again, with the emphasis on when it’s appropriate and when it is not. They seem to grasp the underlying frequentist approach that unifies all those tests, so that when they read a journal article they have an idea that there was a test statistic, it had a sampling distribution, and the null hypothesis was rejected or not based on where the result was on the null distribution (and what all that really means, or doesn’t). Then we spend a brief amount of time on bayesian statistics so that they can see that there are alternatives to null hypothesis significance testing. of course we do not actually do any bayesian analysis; we look at examples from both medical research and fisheries research. the idea is to show them that it is being used, so that they are at least able to read the results and see how those results came about in a very conceptual (well, OK, a very handwaving) way. i think it also helps them understand what frequentist nhst is trying to do and why it’s done that way.
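A minimal version of the permutation approach, sketched in Python with hypothetical data and a hypothetical function name: shuffle the group labels many times and ask how often a shuffled difference in means is at least as extreme as the observed one. It assumes the observations are exchangeable under the null hypothesis, which is exactly the "when is it appropriate" question the commenter emphasizes.

```python
import random
import statistics


def permutation_test(group1, group2, n_perm=10000, seed=1):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(group1) - statistics.fmean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random, as H0 allows
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Count the observed labelling as one of the permutations, so p is never 0
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data
group_a = [7, 8, 6, 7, 9]
group_b = [6, 5, 7, 6, 6]
observed, p = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.2f}, permutation p = {p:.3f}")
```

Because the null distribution is built by shuffling rather than taken from a table, students can see the "sampling distribution under H0" being constructed, which ties back to the frequentist logic described above.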

sorry for the long posting, but it takes a while to explain what we do. it seems to work… i try to engage them in class by asking them to think their way through each new test approach. again, the small class size helps with in-class discussion. the most fun class session is the last class, in which, in groups, they are assigned a statistical test and have to develop and present a cheesy, late-night tv infomercial about that test. they really enjoy that and i think it helps them understand when each test is appropriate and when it is not.

I have a single semester’s experience teaching an intro class (only 30 students) and also had good success with semester-long projects. In my case the students picked a question to ask which required a month’s worth of simple daily observations (e.g., what is the daily use pattern of our campus bike share program — what time of day are they busiest?) and then they’d collect that data every weekday for a month. I made them propose a topic (so I could troubleshoot their ideas), collect the data, analyze it, and then write a brief report and do a presentation that was focused on the analysis. I also built peer-review into the process, and though that was less successful I definitely want to tweak it and do it again.

I love the idea given above about projects based on a large, pre-existing dataset — would give students more time to focus on analysis and questions, rather than spending time collecting data.

For a big class I’m not sure how to do individual (or small-group) projects without being an onerous amount of work for the instructors, although if they are in lab sections and you have TAs it might be manageable.

No data on retention following the course, but the students seemed involved in their projects, and there seemed to be some correspondence between the analysis a student used in their project and their performance on final exam questions related to that analysis.

I also taught pretty old-school and made students do the equations by hand (mostly) and definitely use statistical tables, but this was mostly because of a lack of computer support for the class.

On the first class day, I collected anonymous data from all students — height, # of siblings, eye color, other continuous and categorical variables — so that in future lectures I could use this data, which the students might find more interesting than an example from my research or the textbook. For instance, when we got to t-tests we tested for a difference in height between males and females in the class rather than a hypothetical example of plants grown under different nutrient regimes or something.

Perry DeValpine at UC Berkeley teaches a course on Design and Analysis of Ecological Research that has one of the best reputations of any course in the department. While it draws heavily from a graduate student audience and covers a breadth of material, the coursework is surprisingly digestible*.

Perry takes an old-school approach, setting the groundwork for the course in null hypothesis testing and strictly frequentist interpretation**. The coursework eventually leads into model selection, hierarchical models, and generalized linear models. His emphasis on simulations, both in coding and interpretation, helps to clarify concepts that can otherwise feel theoretical or convoluted. In addition to the book (Quinn & Keough’s Experimental Design and Data Analysis for Biologists), Perry provides class outlines*** that lay out a succinct breakdown of the core ideas and any R-interpretation gone over in class. Additionally, Perry’s problem sets cover more intricate algebraic breakdowns of concepts, reiterate core ideas, and provide further context-based problems to work through. Each week, lab assignments follow related concepts in R, focusing on understanding how the software operates (under the hood) and interpreting the outputs for analyses. A final project (related to applied design and power analyses) is assigned to wrap up the course.

I found his class to be not only interesting, but downright fun and, dare I say… inspirational. Perry teaches another, more basic, biostatistics course in the department as well. And while I don’t know how keen he’ll be on having you “shamelessly copy” his course design, he may be a worthy contact to consult and – at the very least – bounce ideas off of. I am certainly not alone in my praise of his course and teaching methods.

* coming from an undergrad with zero background in statistics

** stressing the GLM framework through the progression from univariate regression all the way through split-plot models and ANCOVAs

*** typed in LaTeX!… because who doesn’t appreciate pretty notes

Thanks for this Connor, I know Perry a bit, I’ll definitely drop him a line. Probably about his more basic course, since as great as the course you’ve described sounds it’s probably too advanced for our 2nd year biology students.

I’ve heard good things about Quinn & Keough; commenters on that old thread were evenly split between that text and Whitlock & Schluter.

I’m the TA this term for a 200-level forestry biometrics course, which is basically intro to stats for foresters. We also use Whitlock & Schluter as our text. We have about 65 students and we spend 3 hours/week in lecture, 2 hours/week in labs. The labs are all in Excel, which is a double-edged sword; while in some ways I’d rather teach them R, the fact is that most of these students aren’t going to go on to graduate-level research, and we mostly want them to be able to interpret statistical information and establish the basics so that they can build on that in more advanced classes.

They seem to benefit immensely from the Excel education, not just in our class but in their other classes too (the professor of the soils & climate class that many of my students are also taking has specifically thanked us for improving their Excel abilities). I worked in a number of office jobs before I came back to school and I’m certain that the Excel skills they’ve learned will help them get jobs even if they don’t go on in forestry or ecology. I also think the Excel work helps them comprehend statistics because it gets them thinking in matrix terms; “enter a formula and apply it through to each row” gets them started thinking about matrices without even knowing it and seems to improve their intuition about how to approach new data. Since we’re using Excel and not a stats package, they have to use the old-school z- and t-tables, and we have them calculate regression slopes & intercepts by hand. It takes a lot of time and gives the course a tough learning curve early in the term, but I think it helps them understand the fundamentals better than they might if they had R doing the heavy lifting for them. Having taken statistics courses taught in both Excel and stats packages, I found that the stats package courses actually encouraged the “recipe”-style approach more just because it’s more of a black box to students; you learn to feed it the right inputs and it gives you an output. In Excel, you have to engage with the data in a more tangible way, and the mechanics are a little more visible so they get a better fundamental understanding of what they’re actually doing.

We tend to use real forest ecology data in our labs and examples, and that approach has its pros and cons. The main advantage is that it exposes them to real ecological questions, methods, and vocabulary that they might not have come across yet in other classes. The main disadvantage is that many of the examples are quite abstract to them at this early stage of their academic careers, so it can be hard for them to grasp the statistical principle because they get distracted trying to understand the context of the question. I think a mix of everyday and discipline-specific examples works well: we tend to use “everyday” examples in lectures to improve comprehension, and focus on discipline-specific applications in lab.

If I were designing the course (we’re teaching another professor’s class since he’s on sabbatical), I’d try replacing 2-3 of the weekly labs with a student project so that they get some experience trying to answer a real, compelling question without being spoonfed the recipe. I’d also have them write at least a few full lab reports about the methods they used and the conclusions they reached, although the course already has a reputation as a time-intensive “weeder” course, so that could be tough. Our labs mostly have quantitative answers, and I find that emphasizes the “follow the recipe” approach; often I can only spot big holes in their understanding once they’re forced to explain what the numbers in their spreadsheets actually mean.

I’m sure this is my bias as a TA, but I find that throwing as many person-hours as possible at one-on-one instruction has really helped too. I spend five or more office hours per week in the computer lab with students, and the sessions are very well attended. Many students are very intimidated by math and computers, and I’ve found that one-on-one instruction helps to de-escalate their anxiety enough for them to start building some confidence with the material. Many of my students have not been well served by their previous math education, to put it kindly, and many of them have fixed mindsets about “not being good at math” that hinder their success in the course. Reassuring them that statistical reasoning feels unintuitive to many people at first, and that they will get used to it with practice, really seems to go a long way toward improving their outcomes. They think that because this doesn’t come to them intuitively, they’re unsuited to it, and just a few words about how some of this stuff was challenging to all of us the first time we heard it can improve their attitudes a lot.

Thanks for posting this and starting such an interesting conversation! Avi Bar-Massada’s comments were especially thought-provoking, and I think the back-to-basics approach he describes would benefit many of my students. Best of luck with your biostats course.

I have also found work in Excel to be conducive to undergraduate learning. I always get weird looks when I say that, and trust me, I discourage use of Excel in real research. But the highly visual nature of Excel, the ability to lay out formulas for, e.g., sums of squares without actually having to calculate everything by hand, and the ability to do what-if analysis with instantaneous feedback make for a really effective learning environment. And once a concept is mastered, you can always build on it by using the “Data Analysis” add-in (free with Excel). So, for example, once you’ve worked through regression by hand, you can do it quickly with the regression option in the Data Analysis add-in.

I’ve had political science students build a maximum likelihood logistic regression in Excel and actually understand what they are doing.
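To make concrete what “building” a maximum likelihood logistic regression involves, here is a minimal Python sketch for a single predictor. In a spreadsheet this is often done by putting the summed log-likelihood in one cell and maximizing it with the Solver add-in; that is an assumption about a typical approach, not something the comment above describes. The sketch swaps in Newton-Raphson instead of Solver, and the function names and toy data are hypothetical.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the model P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Maximize the log-likelihood by Newton-Raphson, starting from (0, 0).

    Assumes the data are not perfectly separable (otherwise no finite MLE exists).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient (score) of the log-likelihood
        i00 = i01 = i11 = 0.0    # information matrix [[i00, i01], [i01, i11]]
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step = inverse(information) @ gradient, solved for the 2x2 case
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: the outcome becomes more likely as x increases
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 1, 0, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The per-row quantities inside the loop (p, y − p, and the weight p(1 − p)) are exactly the sort of columns students could lay out in a spreadsheet, which is presumably part of why the exercise builds understanding.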

Sounds like you’ve got a lot of responses so this might not be much use, but I teach an introductory stats module to about the same number of undergraduates as you have, here in the UK. Students are all from degree programmes in Biology, Zoology, Genetics or Medical Genetics, and the course is compulsory for them. The module is not a completely dedicated stats module – we also include material on experimental design, graphics and some other material on writing and presenting data. We use R with them from the start. Modestly, we (I teach it with one other person) think it works very well. We were nervous about teaching with R instead of one of the “easy” stats packages like Minitab, but so far it has been surprisingly successful – feedback for the module is good and student satisfaction is very high given that it’s a stats course. With respect to using R, the students like being taught to use “professional” software, and they particularly like the graphics capabilities. We have had students talk about how they have been freed from having to use Excel, which I imagine we can all sympathise with.

A big part of the success of the course is that we have a lot of computer workshops with good demonstrators (UK versions of TAs, I guess) plus at least one and often two staff members at each one, so the students get a lot of individual help if they need it. This means that the majority don’t struggle as much as you might think, even though the great majority have no coding experience at all. The ones who are not engaged and who miss the workshops mostly do very badly, though: the final grades for the module are distinctly bimodal. We use a book I wrote (Introductory R) as the textbook and so far haven’t had any complaints, possibly partly because I give them all a free copy. If you’d like to know more, I can happily supply some teaching materials to give an idea of the sort of things we get up to.

Rob Knell

Thank you Rob, this is helpful. I’ll take you up on that very kind offer to share some of your materials; please send me an email (jefox@ucalgary.ca)

Re: a bimodal distribution of marks driven by student attendance at workshops, we see something similar. Some students choose not to attend labs, instead doing the assignments on their own time. On average, those students get substantially lower marks than students who attend. (Of course, students who choose to attend are probably the sort of students who’d do pretty well in any case.) Which raises a problem I confess I struggle with: whether to treat them as grown-ups and let them make their own mistakes, or to take a paternalistic attitude and force them to attend lab because it will be good for them. I suppose the ideal is to not have this problem because the students are all excited about the course and want to attend lab. But if that’s not the case…

Andy Hector runs a stats course for biology undergraduates at Oxford and has written his own textbook that is due for publication in January (http://ukcatalogue.oup.com/product/9780198729068.do). I’m a TA on the course and, whilst for some students there is still the ever-present resistance to statistics that others have described, the emphasis on linear models as a coherent framework has resulted in improved student feedback.

Cheers for this Sean, Andy’s an old friend from my postdoc days, I’ll drop him a line.

Hey, I just found your blog and found aspects of this post interesting.

I realize this is an old post, but I thought I would toss in my two cents.

I teach a biostats course at a small liberal arts college with a class size of 16. I have taken a slightly different approach in that I don’t have any exams in the class. (We use the Whitlock & Schluter book, and I have two 1.5-hour lectures a week.) Instead of exams, the course is structured around a series of actual experimental four-hour labs that the students do (including going out in the field and collecting data). Each lab focuses on one of the major general statistical tests (e.g., t-test, ANOVA, two-way ANOVA). As a result, the students engage with the data and the analysis in a more meaningful way. The grades for the course come from the weekly lab reports they write (9 in all) and a final research project that they initiate and carry out, with feedback from the class and from me along the way on planning and execution. The serious drawback, of course, is that I have to grade and evaluate weekly 3-8 page reports. It also means I have to give copious amounts of instruction not just in statistics and experimental design but also in scientific communication (graphs, writing, and so on). While my students don’t always excel at every aspect of the class, they tend to leave saying that they should have taken my class much earlier in their careers and that they learned a lot. Oh, and we use Minitab as our stats software, mostly because of cost and ease of use (most of my students would be lost if I tried to teach them R on top of everything else). My goal is for them to come away knowing how to use and understand statistics, rather than being able to show how to calculate sums of squares. It’s my third year doing this, and there is a waiting list every time.

Just a different approach.

Thanks for sharing this, there’s a lot to like about this approach. I especially like that you’re making the students write a lot. Unfortunately, it’d be hard to scale it to 130 students like at Calgary.

A lot of great comments above, and I know this post was a while ago, so I’ll keep this short. I have been teaching biostatistics at a small liberal arts college for the last three years, and I’ve been fairly satisfied with the course so far. We use the same textbook, Whitlock and Schluter. The class is three one-hour lectures and one three-hour lab each week. During lecture time, I have them sit in groups of four, and they stay in the same groups for the whole semester. We spend two days a week working through mini-lectures and group activities, and then on Friday we usually read and try to digest the methods in a journal article. In lab, I get them started in R in the second week of class, and I have them use JGR/Deducer, which provides a GUI for using R. However, I have them record all of their commands in a script so that they learn to code while using the drop-down menus. JGR can be a little limiting, but it makes R much easier for them to grasp. The other aspect of the course that I think works well is their independent research projects. They work on these for the last half to third of the semester, and because each project is their own idea, they get much more invested in analyzing the data. I’m happy to share course materials with anyone interested! Email me for a link to my course website.

Yeah, I have sometimes wondered if it would be better to teach them R by using R Commander or some other menu-driven add-on package, and then having them look at the resulting code. But I worry that, if you show them menus first, that’s all they’ll want to use or learn to use. And I worry that too many of them will just click random menu buttons until R does something, never mind whether it’s what R should’ve done. I’d welcome your further thoughts on the best way to teach R to complete programming newbies (who are also stats newbies).