Chapter 2 in Boudett, Kathryn Parker, Elizabeth A. City, and Richard J. Murnane. Data Wise: A Step-by-Step Guide to Using Assessment Results to Improve Teaching and Learning. Cambridge, MA: Harvard Education, 2005. Print.
Principles for Interpreting Assessment Results:
- Sampling Principle of Testing – tests are not direct measures of mastery; conclusions about mastery of a domain are inferences drawn from a small sample of items from that domain
- Discrimination – items that discriminate separate students who know the material from students who don't; use these items to reveal differences that actually exist
- Measurement error – has many sources, e.g., which questions happen to appear on the test, a student's mood on test day
- Reliability – the degree to which a test yields consistent results under consistent conditions
- Score inflation – scores inflated by cheating, luck, etc.
Accounting for Sampling & Measurement Errors:
- Measure and attach error bands to test scores
- Keep test scores in perspective – tests have limitations: they don't assess all important skills, and course grades cannot be compared across schools the way standardized test scores can
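The "error band" idea above can be sketched with the standard error of measurement (SEM), a common way to turn a test's reliability into a band around an observed score. The scale SD, reliability value, and score below are made-up for illustration, not from the book.

```python
import math

def error_band(score, sd, reliability, z=1.96):
    """Return an approximate 95% band around an observed test score.

    Uses the standard error of measurement: SEM = SD * sqrt(1 - reliability).
    """
    sem = sd * math.sqrt(1 - reliability)
    return (score - z * sem, score + z * sem)

# Hypothetical test: scale SD of 15 points, reliability of 0.91.
low, high = error_band(score=72, sd=15, reliability=0.91)
print(f"Observed 72; true score is likely between {low:.1f} and {high:.1f}")
```

Reporting the band (here roughly 63 to 81) rather than the bare score keeps small year-to-year wiggles from being over-interpreted.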
Different Ways of Reporting Performance:
- Beware of raw scores – they are not adjusted for differences in question difficulty between tests
- Norm-referenced tests – PRO – results are reported relative to a norming group of all test takers; CON – gains are easier to show near the bottom of the distribution and harder near the top
- Criterion-referenced tests – measure mastery of specific skills; PRO – gives specific information; CON – vulnerable to sampling error due to the small number of questions per skill
- Standards-referenced tests – based on content/performance standards; PRO – gives specific information tied to standards; CON – vulnerable to sampling error due to the small number of questions per skill
- Reliability vs. detail – more detail means fewer questions per skill, and smaller samples tend to be less reliable
- It's risky to draw conclusions from small samples – e.g., standards assessed by only 1–2 questions on a test
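The reliability-vs-detail trade-off can be made concrete with a simple binomial model: for a student whose true mastery probability is p, the standard error of their observed percent-correct shrinks with the number of items. The p value and item counts below are illustrative assumptions, not from the book.

```python
import math

def pct_correct_se(p, n):
    """Standard error of a percent-correct estimate from n items,
    for a student whose true probability of answering correctly is p."""
    return math.sqrt(p * (1 - p) / n)

# A skill assessed by 2 items vs. 5 vs. 20 (true mastery p = 0.7):
for n in (2, 5, 20):
    print(f"{n:2d} items: percent-correct SE = {pct_correct_se(0.7, n):.2f}")
```

With only two items the standard error is around 0.32, so an individual student's "mastery" of that standard is mostly noise; at twenty items it drops to about 0.10.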
How to Measure Improvement
- Cohort-to-cohort – compares the same grade, different students; CON – susceptible to demographic differences between cohorts
- Value-added (longitudinal) – compares the same students at different times; CON – susceptible to many sources of error; PRO – works well for cumulative subjects
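The difference between the two improvement measures can be sketched with toy numbers (all scores below are made-up):

```python
# Toy illustration of cohort-to-cohort vs. value-added comparisons.
last_year_grade4 = [61, 70, 58, 66]      # last year's grade 4 (different students)
this_year_grade4 = [64, 72, 60, 70]      # this year's grade 4 cohort
same_students_grade3 = [55, 68, 52, 63]  # this year's cohort, tested last year

def avg(xs):
    return sum(xs) / len(xs)

# Cohort-to-cohort: same grade, different students.
cohort_change = avg(this_year_grade4) - avg(last_year_grade4)

# Value-added (longitudinal): same students, one year apart.
value_added = avg(this_year_grade4) - avg(same_students_grade3)

print(f"Cohort-to-cohort change: {cohort_change:+.2f}")
print(f"Value-added gain:        {value_added:+.2f}")
```

The two numbers answer different questions: the cohort comparison can move simply because the incoming class differs demographically, while the value-added gain tracks growth in the same students.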
Strategies for Interpreting Data
- Look beyond 1 year with cohort-to-cohort or value-added data
- Compare results to relevant district and state data
- Compare results to most recent assessments on similar standards
Knowing the sources of data error teaches one to select, summarize, and interpret data in ways that minimize its impact. Teachers planning to investigate data with students need to know these sources of error so they can teach them to students and show students how to interpret data carefully. The strategies above help highlight important data features through comparisons to relevant reference groups.
- Decide what data sources will be summarized.
- Select methods for summarizing data that compare it to relevant reference groups – these can include other schools, other districts, state scores, and the same students at another point in the school year
- Script the questions stakeholders (students) will use to investigate the data.
- Create data displays that provoke discussion of the data.
Early Implementation Steps
- Facilitate a data meeting in which students use reflection prompts to draw conclusions about data displays individually, in pairs, and as a whole group
- Brainstorm with students how to further explore key questions and next steps for improving key trends
Advanced Implementation Steps
- Gather feedback from students aimed at improving data conversations and use it to fine-tune future discussions
- Have students brainstorm data goals that can be measured with specific upcoming assessments
- Have students brainstorm methods for gathering more reliable data related to group academic goals