Why and Example Uses
There are many uses of the ProjectAssessment.App. In this section, we discuss some example uses by individual instructors, departments, or researchers.
There are many ways to analyze the results of multiple choice exams. Item Response Theory (see de Ayala 2022 for a review) can estimate student ability, item difficulty, and other parameters of exam items. Some pre- and post-test methods compare performance to a baseline (e.g., Hake 1998, Walstad and Wagner 2016), and some pre- and post-test models solve for underlying learning values (e.g., Smith and Wagner 2018, Smith and White 2021).
Unfortunately, there are far fewer ways to analyze data generated by a rubric. Smith and Wooten (2023) introduces a method to separate student ability from rubric-row difficulty while accounting for censoring in the data (see the "Data and Estimation" section for an explanation of the censoring problem). The Project Based Assessment web application makes this technique accessible to a much larger audience (including those with no programming ability) than the Python package included in the original paper. Even for those who have the technical expertise to use the Python package, the web application can be the more productive choice under some circumstances: it automatically produces tables and graphics that can easily be saved, as well as a print view designed to serve as an appendix in a report (see the academic assessment example below).
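To see why censoring matters, consider a minimal simulation (an illustration only, not the app's actual estimator): latent performance on a rubric row is unbounded, but recorded scores are bounded between zero and the row's maximum points, so summaries computed from raw scores understate the performance of the strongest students.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent performance on one rubric row: unbounded above,
# but the rubric can only record between 0 and the row's maximum points.
latent = rng.normal(loc=8.0, scale=3.0, size=1_000)
max_points = 10.0
observed = np.clip(latent, 0.0, max_points)  # what the grader records

print(f"Mean latent performance:   {latent.mean():.2f}")
print(f"Mean observed score:       {observed.mean():.2f}")  # biased downward
print(f"Share censored at maximum: {(latent > max_points).mean():.1%}")
```

The Smith and Wooten (2023) method accounts for this top-coding; naive averages like the one above do not.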
Because this software is fast and easy to use, it can serve instructors improving their classes, departments improving their degree programs, and researchers testing an intervention. Below are some example uses. The video shows how we generated these results and provides a short interpretation. The text below the video discusses the results in more detail.
This software can be used by instructors to diagnose issues in their classes and compare the performance of different groups of students. The results can inform course improvements or become part of a teaching portfolio for annual review or RPT (Review, Promotion, and Tenure). The author of this software used the web app to improve his own classes, and these results, along with his plans for mitigation, were included in his 2023 Annual Review.
Below are selected results from Data Analysis from Scratch, a graduate-level course where students learn to code traditional estimators (e.g., OLS, MLE), non-parametric techniques (e.g., KDEs), and machine learning techniques (e.g., random forests) from scratch (i.e., without using pre-built estimation functions). This gives students a strong understanding of exactly how these techniques work and how they relate to one another. This is the capstone class for many of the graduate analytics programs at the university.
The table below includes selected rubric-row estimates using data from the Fall 2023 section of Data Analysis from Scratch. These estimates were produced with this application. Note that twelve rubric rows were estimated in this course and that the software produces many more columns than shown here (see "Interpretation"). These selected rows and values explain the future actions planned by the instructor.
| Rubric Row | Estimate | Demeaned Estimate |
| --- | --- | --- |
| Test Identification | 0.420 | 0.333 |
| Pairwise Bootstrap | 0.042 | -0.045 |
| Clustered Errors | 0.000 | -0.087 |
There is a detailed explanation of how to interpret these values in the "Interpretation" section of this documentation. However, for this section, it is sufficient to know that greater values indicate that the rubric row was more difficult.
The "Test Identification" rubric row tested the students' ability to select the appropriate test given a specific situation. This involves critical thinking skills, so it isn't easy. But one would not think it is that hard either. "Pairwise Bootstrap" and "Clustered Errors" both test the students' ability to hand code specific algorithms. Again, not easy tasks.
What the estimates reveal is that the students found "Test Identification" exceptionally difficult and the two algorithms exceptionally easy. This could indicate that too much class time is dedicated to these algorithms, and that time and resources could be reallocated to the critical thinking skills involved in choosing the best statistical tool. These changes are planned for the Fall 2024 iteration of the class.
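For those who save the estimates for further analysis, ranking rows by difficulty is straightforward. Below is a small sketch using pandas and the three selected rows above; the column names are our own labels, not the app's output schema.

```python
import pandas as pd

# The three selected estimates from the table above; the column names
# here are illustrative labels, not the app's actual output schema.
rows = pd.DataFrame({
    "rubric_row": ["Test Identification", "Pairwise Bootstrap", "Clustered Errors"],
    "difficulty": [0.420, 0.042, 0.000],
})

# Greater values indicate a more difficult rubric row, so a descending
# sort ranks the rows from hardest to easiest.
print(rows.sort_values("difficulty", ascending=False))
```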
Another question of interest to the instructor was whether students in different degree programs performed statistically differently from one another. In the figure above, the proxies for student ability are grouped by degree program. This was generated in the web app simply by making files containing the student IDs for each of the degree programs that typically take this course. The question was important because of a persistent idea in the department that the analytics students in the MBA and MS in Data Science programs were not as well prepared as the MS Economics students. The figure shows no strong pattern, and this visual interpretation is in line with the statistical tests provided by the software. This suggests that the perception is not accurate, at least in this class.
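As a sketch of how such group files might be prepared programmatically: the roster filename, its column names, and the one-ID-per-line output format below are all assumptions for illustration, not the app's confirmed input specification.

```python
import pandas as pd

# Hypothetical roster: the filename and the "student_id" and "program"
# column names are assumptions for illustration only.
roster = pd.read_csv("roster.csv")

# Write one file per degree program, one student ID per line (also an
# assumed format), e.g. "MS_Economics.txt", "MBA.txt", "MS_Data_Science.txt".
for program, members in roster.groupby("program"):
    filename = f"{str(program).replace(' ', '_')}.txt"
    members["student_id"].to_csv(filename, index=False, header=False)
```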
Universities of all sizes are expected to demonstrate to their accrediting agencies that students are gaining knowledge from the college experience. In practice, this means that universities require programs to collect and analyze data about their students' performance and to take action based on the results. Often, specific student learning outcomes (SLOs) are collected from specific classes in the program (usually required classes late in the program).
The Project Based Assessment web app can be used for academic assessment. In fact, the author's department adopted this method to assess all learning goals in the MS Economics program. The program's learning goals are assessed in three courses. For the purposes of this documentation, we will discuss select items measured in graduate Econometrics. (Note that the full Econometrics results contain estimates for nine rubric rows.)
| Rubric Row | Estimate | Demeaned Estimate |
| --- | --- | --- |
| Metrics - 2.2 | 0.168 | 0.056 |
| Metrics - 2.4 | 0.020 | -0.092 |
"Metrics - 2.2" and "Metrics - 2.4" are traits of SLO 2: "Students will demonstrate understanding of regression assumptions, including violations of said assumptions." Both are at the same Bloom's level (application) and test similar concepts ("Students will identify regression assumption violations" [2.2] and "Students will demonstrate how to address regression assumption violations" [2.4]). Moreover, the instructor intended them to be at similar difficulty levels. Nonetheless, the results suggest they were not of equal difficulty. The instructor concluded that the question used for 2.4 was not at the intended difficulty and should be adjusted.
It is also common in assessment procedures to ask how results change over time. This was of particular interest in the two semesters in which this procedure was adopted, as one section of Econometrics was taught in person and the other over Zoom. The results suggest, both visually and by the statistical tests provided by the software, that students performed equally well in both semesters.
Note that in the case of all three courses, the department used the print feature discussed at the bottom of the "Interpretation" section to produce PDF appendices for the university's assessment committee.
Smith and Wooten (2023) includes a Python package that can be used by researchers who are interested in adopting the method presented in the paper. This can be a good option for researchers who are interested in integrating the estimation routine into a larger data pipeline and are familiar with Python.
However, even in the research context, the web application might be all the researcher needs to test their intervention. As highlighted in the two sections above, the web app can create separate groups of students and compare them both visually and statistically. p-values are provided for the following statistical tests: Mann-Whitney, Kruskal-Wallis, Anderson-Darling, and Kolmogorov-Smirnov. Thus, as in the example provided in the introductory video, if a treatment is implemented for some of the students, the researcher can statistically compare these students to those who did not receive the treatment.
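For researchers who want to reproduce these comparisons outside the app, all four tests are available in scipy.stats. Below is a minimal sketch with simulated ability proxies; the group values are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated ability proxies for treated and control students; in practice
# these would be the estimates produced by the app or the Python package.
treated = rng.normal(loc=0.10, scale=0.25, size=40)
control = rng.normal(loc=0.00, scale=0.25, size=45)

# The four tests whose p-values the web app reports.
print("Mann-Whitney:      ", stats.mannwhitneyu(treated, control).pvalue)
print("Kruskal-Wallis:    ", stats.kruskal(treated, control).pvalue)
# anderson_ksamp reports an approximate p-value capped to [0.001, 0.25].
print("Anderson-Darling:  ", stats.anderson_ksamp([treated, control]).significance_level)
print("Kolmogorov-Smirnov:", stats.ks_2samp(treated, control).pvalue)
```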