7 Reflection and Iterations

SAIL is a formative assessment practice focused on improving student learning. Assessment practices used for formative purposes are underpinned by an engagement ethos and include multiple triangulated measures, both quantitative and qualitative, that are tracked over time. Assessment measures used for improvement stem from an established goal or objective that is defined by members of the community, and multiple communication channels and opportunities for dialogue exist, so that results can be used to stimulate change (Ewell, 2009). SAIL incorporates several avenues for scholarly knowledge dissemination, both within and beyond the university, to promote scholarly teaching and the scholarship of teaching and learning.

Reporting the Degree of Student Achievement of Institutional Learning Outcomes

Reporting of ILO-assessment findings occurs at three levels:

  • Course-level with a focus on using the results from a SAIL project to inform changes to course, assignment, and/or assessment design;
  • Program- or departmental-level with a focus on improving program learning outcomes through an effective, ongoing, and regular system of assessment. This is greatly facilitated when programs have clearly articulated program learning outcomes and curriculum maps; and,
  • Institutional level with a focus on documenting and demonstrating, at a high level, student achievement of ILOs.

Course Report

As previously mentioned, reporting at the course level is focused on using the results from a SAIL project to inform changes to course, assignment, and/or assessment design. Faculty co-researchers are provided with a course-specific report for their own use to reflect on and consider improvements to student learning. We recognize the central role of faculty in establishing curricula, assessing student learning, and improving educational programming. Therefore, the first level (i.e., the faculty members’ course) is the primary focus of any SAIL project.

Here is a Sample Course Report – Lifelong Learning (PDF) that faculty members receive from their peers. This example reflects results from a single assessor (faculty peer) who assessed a random sampling of 10 students.

Programmatic or Departmental Report

The second level is focused on improving program outcomes through an effective, ongoing, and regular system of assessment. To inform program-level planning, Deans and Chairs may be provided with an aggregate report based on the results of each ILO assessed within their department during a SAIL pilot. If there are sufficient faculty co-researchers from one program (e.g., Bachelor of Arts, Bachelor of Education) then a program-level report can be produced that provides aggregate results from multiple courses or sections of courses within a program that meet an ILO.

If only one course is assessed within a department, an aggregate report is not provided because the focus of SAIL is formative assessment for student learning and we want to avoid any potential for the evaluation of individual faculty members. During Pilots #1 and #2, we did not have sufficient representation from one program to produce a program-level report.

Institutional Report

Similarly, an institutional aggregate report based on the results of each ILO assessed during the SAIL pilot may be disseminated within the university if the SAIL Coordinators and faculty co-investigators determine that there is sufficient data to reliably demonstrate the degree of student achievement of the ILOs assessed.

Cautionary Considerations for Producing Aggregate Reports

Within the pilot phase, comments were raised about the comparability of results, the scalability of the process, and the level of interpretation required of assessors, which could affect inter-rater reliability.

Course reports provided the ratings and comments as submitted. Interpreted alongside strengths and areas for further development, the course reports provide faculty with insights into the level of ability of their students. However, discussions during the Debrief and Assessor Training reflected nuanced differences in interpretation of the rubric criteria and the descriptions when applied to specific assignments across courses. What appeared as differing ratings may be a result of differing interpretations of the rubric criteria.

At this stage, Debrief discussions indicated that the assessor ratings were more qualitative than quantitative because interpretations varied, so the ratings cannot, at this time, be aggregated. This is akin to how the average of two apples and three pears is not two-and-a-half apples: the two scores (the apple score and the pear score) are based on different concepts and thus cannot be aggregated.

To improve the value of reporting, we suggest two options:

1. Emphasize Qualitative Descriptions: We can focus on enhancing the qualitative contextual conversations within the ILO Pods about the assignment ratings and course reports. When there is variability in interpretation, there is a basis for interesting and insightful discussion that can inform both the faculty members’ ratings and their peers’ teaching. Leaning into these structured dialogues, encouraging faculty to discuss assignments and what they see in those specific assignments, could further course redesign and reflective teaching practice towards better student learning.

2. Seek Numerical Consistency: We can seek consistency in course level, assignment type, and between raters to arrive at a numerically consistent score that could be aggregated (i.e., numerically representative of a single understanding). Assessors’ scores that are to be averaged, summed, and compared over time need to represent a single consistent understanding applied to all assignments. This consistency, often called “inter-rater reliability”, requires that two raters arrive at the same score. Consistency is also necessary for validity. This requires that the descriptions and criteria are understood to mean a single concept regardless of discipline, assessor, and context. For examples of assessment focused on inter-rater reliability, see Simper et al. (2018) and Turbow and Evener (2016).
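As a rough illustration of what checking consistency between two raters could involve, the sketch below computes simple percent agreement and Cohen's kappa (a common chance-corrected agreement statistic) for two assessors who rated the same set of assignments. Cohen's kappa is offered here as one possible measure, not as the method used in the SAIL pilots. The function names, the four-level rubric scale, and the ratings are all hypothetical and invented for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of assignments on which both raters gave the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of assignments.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same level
    # if each rated independently according to their own level frequencies.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical level throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (rubric levels 1 to 4) for the same ten student assignments.
assessor_1 = [1, 2, 2, 3, 3, 3, 4, 4, 2, 1]
assessor_2 = [1, 2, 3, 3, 3, 2, 4, 4, 2, 2]

print(f"Percent agreement: {percent_agreement(assessor_1, assessor_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(assessor_1, assessor_2):.2f}")
```

A statistic like this only indicates how often scores match. It does not show whether the raters share a single understanding of the rubric criteria, which is why the structured discussions described in the first option remain relevant even when numerical consistency is the goal.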

The pilots also included a range of course levels from first through fourth year. As such, we would expect to see a range of skills across course assignments within an ILO Pod. To capture this range when evaluating across years, we must look for appropriate progression, not solely student achievement.

A comparison of options for adapting assessment of ILOs to an intended purpose is described below (Table 5.1). Note that the purposes are not mutually exclusive and could be sought within a single program or in parallel offerings within an assessment of learning outcomes ecosystem.

Table 5.1 Adapting Assessment of ILOs to Intended Purpose

Purpose of Assessing ILOs: Focus on measuring and comparing over time achievement at the highest course level
Approaches to Adapting:
  • Collection: only upper year courses
  • Assignments: similar or consistent assignments (e.g., capstone portfolio, written reports)
  • Assessing: focus on inter-rater reliability with mid-assessment check-ins and feedback discussions; can be faculty colleagues or research assistants
  • Coordinator Task: improve reliability, validity checks, and aggregate report creation

Purpose of Assessing ILOs: Focus on faculty interactions to support peer feedback and course redesign through facilitated conversations within an ILO Pod
Approaches to Adapting:
  • Collection: any relevant ILO course
  • Assignments: any relevant assignment
  • Assessing: focus on faculty engaging in collegial review of student assignments, documenting ratings and comments; structured discussions should focus on surfacing contextual factors that may impact assessor ratings or student performance with an emphasis on formative learning and reflection towards course redesign; recommended to be faculty colleagues
  • Coordinator Task: facilitating discussions, creation of structured time and space for dialogue, walking with faculty through the process of reflection and change and further review; working with faculty to contextualize findings and celebrate/report on changes to courses

Purpose of Assessing ILOs: Focus on student-driven curation and sense-making
Approaches to Adapting:
  • Collection: any relevant ILO course with significant experience or prior experience, typically capstone
  • Assignments: portfolio or reflection-based assignments that invite students to curate, present, and contextualize evidence of their ILO achievement or learning
  • Assessing: focus on confirming achievement with notes about nuances in strengths and weaknesses noted by the student or present in their work; can be rated by faculty teaching the course, by colleagues, or by research assistants
  • Coordinator Task: collecting examples of assignments by committee; identifying with faculty relevant courses and assignments; providing assessor training; working with faculty to contextualize findings; provide encouragement/feedback to students; celebrate/report on changes to courses or programs

If we seek to investigate student achievement alone then we suggest modifying SAIL to include capstone courses or final year courses as the defining student artifact, as was piloted in the third action research cycle. In addition, attention should be given to the overarching Research Questions, including whether the research questions should be modified to include investigation of student progression in addition to student achievement of institutional learning outcomes.

Ensuring reflection on and interaction with the how and why of results takes time, both for the faculty members involved and for the process of implementing changes in teaching and learning. This can be considered resource intensive, but change across dozens of courses, made by dozens of people who are learning and creating, takes time.

Finally, for scalability and consistency, we suggest shortening the time between training and assessment, incorporating a mid-assessment check-in or a two-day assessment institute (as was piloted in the third cycle), and discussing ratings to improve consistency and viability for faculty members.

It is also possible to include trained research assistants in a SAIL pilot. Research assistants can rate additional assignments, thus removing time constraints placed on faculty, and can provide feedback to instructors for review and contextualization.

Iterative Action Research Cycles

Critical reflection is embedded in the SAIL process. Feedback gathered during the Debrief on the efficacy and utility of the process is incorporated into a summary report or presentation, which includes recommendations for future iterations of SAIL.

In Fall 2021, the SAIL Coordinators engaged in a fulsome consultation process to gather feedback on the findings and recommendations from Pilot #1. The consultation included presentations to faculty councils and curriculum committees and the student union. In addition, an online survey was distributed through multiple channels. Results from the Fall 2021 consultation informed the second iteration of SAIL. For example, based on the feedback we received, we modified the student consent process (from opt-in to opt-out) to address low consent rates, selected a different platform (from MS Teams to Moodle) to increase efficiency and usability, and sought greater disciplinary diversity and variability in course assignments (e.g., oral presentations, round table discussions, projects) to test the efficacy of rubric-based assessment.

Findings from Pilot #2 suggested that we need to explore methods to modify and enhance ILO Pod discussions to improve inter-rater reliability (that is, consistency in ratings using the shared rubric), and that we need to explore the impact of context on the ratings.


Gosling, D., & D’Andrea, V. A. (2001). Quality development: A new concept for higher education. Quality in Higher Education, 7(1), 7–17. https://doi.org/10.1080/13538320120045049

Ewell, P. T. (2009). Assessment, accountability, and improvement: Revisiting the tension (Occasional Paper No. 1). National Institute for Learning Outcomes Assessment. http://www.learningoutcomeassessment.org/documents/PeterEwell_005.pdf

Simper, N., Frank, B., Scott, J., & Kaupp, J. (2018). Learning outcomes assessment and program improvement at Queen’s University (pp. 1–53). Higher Education Quality Council of Ontario (HEQCO).

Turbow, D. J., & Evener, J. (2016). Norming a VALUE rubric to assess graduate information literacy skills. Journal of the Medical Library Association, 104(3), 209–214. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915638/



Strategic Assessment of Institutional Learning Copyright © by Carolyn Hoessler and Alana Hoare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
