Algorithms & Pedagogy: Or, Why Is it Difficult to Use Big Pedagogical Data as an Individual Instructor?


Photo of two individuals in a dark room with multiple different colored lights from different directions. The individuals are only visible as dark shadows.

It is no secret that upper-level administrators in higher education are continuously prompted to participate in “data driven decision making” (e.g., Stratford, 2015; Sander, 2014). As their budgets shrink, various stakeholders want to know both what data is informing a decision and what data will be collected to measure a return on investment (ROI). This pressure trickles down to individual program administrators as well as individual instructors.

Time and energy are probably the biggest obstacles to individual faculty implementing data-driven pedagogical decisions. If I were to assess the usefulness of low-stakes, scaffolding activities in my writing courses (e.g., “Does this specific peer review prompt help students make higher-order, or global, as well as lower-order, or local, revisions in their writing project?” or “Does this particular synthesis activity help students develop more nuanced arguments?”), it would mean that I had carefully constructed the course with learning outcomes for the course, the major units/projects, and every scaffolded activity. I would need to use something like Wiggins & McTighe’s (2008) backwards instructional design method. It would mean I had a carefully designed course map that could track course, major project/assessment, and individual activity learning outcomes up and down, back and forth across the course curriculum. Once I had tested my scaffolded assignments, and trusted that they functioned in the curriculum as I thought they did, I could then use students’ work on those assignments to potentially forecast their project and/or course grades.

Time and energy aside, there are two other reasons it is extremely difficult for individual instructors to use algorithmic and/or big data to inform their pedagogy. First, it is technologically and/or mathematically difficult (i.e., most instructors have not been taught how to write code, even if it is “simply” medium to advanced calculation formulas in spreadsheet applications like Microsoft Excel or Google Sheets). Second, and this is much more slippery, individual faculty do not necessarily have access to all of the data that helps construct a more fleshed-out picture of the individual student outside of an individual course.

First, I must have a well-designed curriculum to adequately assess and then forecast student success. Second, I must then have the technologies to help me run the numbers. Again, with smaller classes, a single instructor might develop formulas in spreadsheets to look at how students did on certain activities and project how they are doing in the course. If faculty struggle with developing the spreadsheets and/or processing large numbers of students, they might use applications like the Google Sheets Add-ons Flubaroo and Goobric. Many learning management systems (LMSs) also provide this type of gradesheet analytics: some provide detailed charts and graphs of students, while some simply add color to the row of each student (for example, if Desire2Learn believes the student is doing well in the course, he or she will be assigned cooler colors like green and blue, and if other students are doing less well, they’ll be assigned warmer warning colors like red). On the one hand, this can be useful for an instructor, as she can quickly look at the calculated projections and then design an intervention to help the student reverse course and succeed in the class.
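To make the spreadsheet idea concrete, here is a minimal sketch in Python of the kind of projection an individual instructor might build. Everything in it is an assumption for illustration: the activity names, the weights, and the color thresholds are hypothetical, and it is not the formula used by any LMS or add-on mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    name: str               # hypothetical activity label
    weight: int             # points this activity contributes to the final grade
    score: Optional[float]  # percent earned (0-100), or None if not yet graded


def projected_grade(activities: List[Activity]) -> Optional[float]:
    """Weighted average of graded work only; ungraded work is ignored."""
    graded = [a for a in activities if a.score is not None]
    total_weight = sum(a.weight for a in graded)
    if total_weight == 0:
        return None  # nothing graded yet, so no projection
    return sum(a.weight * a.score for a in graded) / total_weight


def color_flag(grade: Optional[float],
               warn_below: float = 70.0,
               ok_at: float = 85.0) -> str:
    """Map a projection to a row color; thresholds are illustrative assumptions."""
    if grade is None:
        return "gray"
    if grade < warn_below:
        return "red"
    if grade < ok_at:
        return "yellow"
    return "green"


student = [
    Activity("peer review", 10, 90.0),
    Activity("synthesis activity", 10, 60.0),
    Activity("major project", 30, None),  # not yet submitted
]
projection = projected_grade(student)
print(projection, color_flag(projection))
```

Even in a sketch this small, the choices that matter (which activities count, how heavily, and where the warning line sits) are visible and adjustable, which is exactly what is lost when the calculation happens inside a closed system.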

The problem, however, emerges when the instructor tries to design an intervention; how can she if she does not know how the forecast is being calculated? Some retention scholarship heavily emphasizes student attributes prior to and/or outside the course (Bean & Metzner, 1985), while other work emphasizes the student’s social integration within the institution (Tinto, 2012); both types of data are difficult for an individual instructor to come by. If this data is collected, it is usually at an institutional level. Similarly, most LMSs track when, for how long, and how often students log on to the system. This is data an LMS can incorporate into its predictive analytics.

As with most social science research, the predictive analytics problem is context. The algorithm may know whether or not a student is the first generation in her family to go to college, works full time, or has taken an online course before. But the algorithm doesn’t know if that student has a friend and mentor who has gone to college, how many hours over 40 she usually works, or the obsessive nature of her time management skills. If the LMS tracks student access, the predictive algorithm does not know how the course is designed (how many deadlines per week does this instructor require? If only one, students may only log on once a week). Similarly, some students like to print everything; they may have spent a chunk of hours earlier in the course printing the course materials and then may not spend a lot of time in the LMS later during the term. The formula used to assess whether or not a student is sufficiently “active” in a course would privilege one course design or type of student over another.
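A small Python sketch can illustrate how an “activity” formula might privilege one course design over another. Both rules below are hypothetical, as are the login counts and thresholds; they are not drawn from any actual LMS, whose formulas are generally not published.

```python
def naive_is_active(logins_per_week: int, min_logins: int = 3) -> bool:
    """Flat rule: a student is 'active' only with at least min_logins per week.

    The default of 3 is an arbitrary, illustrative threshold.
    """
    return logins_per_week >= min_logins


def course_aware_is_active(logins_per_week: int, deadlines_per_week: int) -> bool:
    """Hypothetical alternative: expect at least one login per weekly deadline."""
    return logins_per_week >= max(1, deadlines_per_week)


# A student who logs on once a week in a course with one weekly deadline.
weekly_logins = 1
weekly_deadlines = 1

print(naive_is_active(weekly_logins))                           # flat rule
print(course_aware_is_active(weekly_logins, weekly_deadlines))  # course-aware rule
```

Under the flat rule this student looks inactive, while the rule that knows the course has a single weekly deadline treats the same behavior as sufficient. Neither rule accounts for the student who printed everything in week one.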

Recognizing that individual instructors might not have the time, out-of-course information, and/or technological/mathematical know-how, we already see more institutions of higher education prompting faculty to more carefully incorporate LMS analytics into their teaching processes. We also see an increasing number of companies like Civitas Learning selling predictive analytics packages to institutions, with products like Inspire, that prompt individual faculty to work within the analytics environment. In both cases, the predictive formula is like Colonel Sanders’s herbs and spices for Kentucky Fried Chicken: secret. I appreciate the help in trying to run “big data” style analytics on my individual courses; however, if I don’t understand what data is being used, and how it is being weighted, how can I possibly make meaning from the resulting analytics? With this data, all I see is the shadowed outline of the student cast by a variety of different lights, and I remain unclear about the significance of each light and the depth and weight of the shadow.

I would like to thank my colleagues in the field of writing studies who have already tackled this MediaCommons prompt as it specifically relates to writing, rhetoric, and composition (another concern of mine, and one they are better qualified to discuss). I would also like to thank Catrina Mitchum for studying a topic I have only dabbled in, online student retention, and for helping me to learn more.

Creative Commons licensed image posted at Flickr by Paint