The Latest Research Available - At Your Fingertips!


Formative Assessment & Standards-Based Grading
Robert J. Marzano

The following tips from this book are designed to help you apply the latest research in tangible ways in your classroom, your school, or your district. Below each tip, you will find the book excerpt on which the tip is based.
Student-generated assessments are probably the most underutilized form of classroom assessment. As the name implies, a defining feature of student-generated assessments is that students generate ideas about the manner in which they will demonstrate their current status on a given topic. To do so, they might use any of the types of obtrusive assessments discussed in the preceding text (see pages 23–24).

For example, one student might say that she will provide oral answers to any of the twenty questions in the back of chapter 3 of the science textbook to demonstrate her knowledge of the topic of habitats. Another student might propose that he design and explain a model of the cell membrane to demonstrate his knowledge of the topic.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 25). Bloomington, IN: Marzano Research Laboratory.

A score of 0 is never recorded in the gradebook if a student has missed an assessment or has not completed an assignment. Many assessment researchers and theorists have addressed this issue in some depth (see Reeves, 2004; Guskey & Bailey, 2001). Briefly, no score should be entered into a gradebook that is not an estimate of a student’s knowledge status for a particular topic at a particular point in time.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 85). Bloomington, IN: Marzano Research Laboratory.

Scales that have been rewritten in student-friendly language should provide students with clear guidance as to what it would look like to demonstrate score 2.0, 3.0, and 4.0 competence (see Table 3.7 for an example of a student-friendly scale). It is much more likely that students have really considered and come to understand the goals when teachers give the class the opportunity to rewrite the scale(s) in their own words.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (pp. 46, 141). Bloomington, IN: Marzano Research Laboratory.

One fact that must be kept in mind in any discussion of assessment—formative or otherwise—is that all assessments are imprecise to one degree or another. This is explicit in a fundamental equation of classical test theory that can be represented as follows:

Observed score = true score + error score

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 13). Bloomington, IN: Marzano Research Laboratory.

One very important consideration when interpreting scores from assessments or making inferences about a student based on an assessment is the native language of the student. Christy Kim Boscardin, Barbara Jones, Claire Nishimura, Shannon Madsen, and Jae-Eun Park (2008) conducted a review of performance assessments administered in high school biology courses. They focused their review on English language learners, noting that “the language demand of content assessments may introduce construct-irrelevant components into the testing process for EL students” (p. 3).

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (pp. 14–15). Bloomington, IN: Marzano Research Laboratory.

In a standards-referenced system, a student’s status is reported (or referenced) relative to the performance standard for each area of knowledge and skill on the report card; however, even if the student does not meet the performance standard for each topic, he or she moves to the next level. Thus, the districts that claim to have standards-based systems in fact have standards-referenced systems.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (pp. 18–19). Bloomington, IN: Marzano Research Laboratory.

It is important to keep two things in mind when considering the practice of formative assessment. The first is that, by definition, formative assessment is intimately tied to the formal and informal processes in classrooms. The second thing to keep in mind is that while there is a good deal of agreement about its potential as a tool to enhance student achievement, the specifics of formative assessment are somewhat elusive.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 9). Bloomington, IN: Marzano Research Laboratory.

While one might characterize the work on learning progressions as relatively new and therefore relatively untested, it is related to a well-established and heavily researched area of curriculum design—learning goals. One might think of learning progressions as a series of related learning goals that culminate in the attainment of a more complex learning goal. Learning progressions can also be used to track student progress.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 11). Bloomington, IN: Marzano Research Laboratory.

At the classroom level, any discussion of assessment ultimately ends up in a discussion of grading. Not only are teachers responsible for evaluating a student’s level of knowledge or skill at one point in time through classroom assessments, they are also responsible for translating all of the information from assessments into an overall evaluation of a student’s performance over some fixed period of time (usually a quarter, trimester, or semester). This overall evaluation is in the form of some type of overall grade commonly referred to as an “omnibus grade.” Unfortunately, grades add a whole new layer of error to the assessment process.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 15). Bloomington, IN: Marzano Research Laboratory.

Grading that references student achievement to specific topics within each subject area is growing in popularity. This is called standards-based grading, and many consider this method to be the most appropriate method of grading. Where there is interest in this system, however, there is also quite a bit of poor practice on top of considerable confusion about its defining characteristics.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 17). Bloomington, IN: Marzano Research Laboratory.

The score of Emerging on the rubric has been equated to the scale score of 1.5 or 2.0. For an individual student’s composition, a teacher would have to determine whether the score of Emerging represented a complete demonstration of the simpler aspects of organization (score 2.0) or a partial demonstration of the simpler aspects of organization (score 1.5).

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 52). Bloomington, IN: Marzano Research Laboratory.

 

To design a standards-referenced system, a district must first reorganize or “reconstitute” state standards into a format that can be used to track student progress using formative and summative scores.

The process starts by rewriting the standards at each grade level into a series of learning goals. Each goal is accompanied by a scale that has been constructed using the guidelines provided. In effect, the school or district creates the scales for all teachers at each grade level.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 112). Bloomington, IN: Marzano Research Laboratory.

Because no overall grades are computed in standards-based systems and because of the emphasis on demonstrating proficiency in each and every learning goal before a student progresses to the next level, reporting is typically focused on learning goals as opposed to measurement topics. This is why each ratio recorded reports the number of learning goals for which score 3.0 proficiency or higher has been attained and the number of learning goals still remaining.
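The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proficiency_ratio`, the goal labels, and the sample scores are hypothetical and do not come from the book.

```python
def proficiency_ratio(summative_scores, proficient=3.0):
    """Return (goals at or above the proficient score, goals still remaining)."""
    attained = sum(1 for score in summative_scores.values() if score >= proficient)
    return attained, len(summative_scores) - attained


# Hypothetical summative scores for one student, keyed by learning goal.
scores = {"Goal 1": 3.0, "Goal 2": 2.5, "Goal 3": 4.0, "Goal 4": 3.5, "Goal 5": 2.0}

attained, remaining = proficiency_ratio(scores)
print(f"{attained} goals at score 3.0 or higher, {remaining} remaining")
```

With these sample scores, three goals count as attained and two remain.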

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 120). Bloomington, IN: Marzano Research Laboratory.

A = 3.00 to 4.00
B = 2.50 to 2.99
C = 2.00 to 2.49
D = 1.00 to 1.99
F = Below 1.00

It is important to remember when considering overall letter grades (commonly referred to as a type of “omnibus grade”) that any attempt to summarize a student’s status across a variety of topics involves arbitrary decisions regarding where to end one grade designation and where to begin another. In the example above, an A begins with an average of 3.0 for summative scores on learning goals. The grade of B ranges from 2.50 to 2.99 and so on. There is logic to this system. Namely, the A begins at 3.0 because a score of 3.0 indicates that a student demonstrated understanding of all of the content in a target learning goal with no major errors or omissions.
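The conversion guide above could be expressed as a small lookup function. This is a minimal sketch, not part of the book: the name `letter_grade` is hypothetical, and treating each listed range as a lower bound (so that an average such as 2.995 falls to B) is an assumption, because the guide only lists values to two decimal places.

```python
def letter_grade(average):
    """Map an average summative score on the 0-4 scale to a letter grade."""
    if not 0.0 <= average <= 4.0:
        raise ValueError("average must be between 0.0 and 4.0")
    # Assumption: each cutoff is the lower bound of its grade band.
    if average >= 3.0:
        return "A"
    if average >= 2.5:
        return "B"
    if average >= 2.0:
        return "C"
    if average >= 1.0:
        return "D"
    return "F"


print(letter_grade(3.0), letter_grade(2.75), letter_grade(0.5))
```

A district that chooses different cutoffs would change only the threshold values, which underlines the point that the boundaries are a matter of decision rather than measurement.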

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 106). Bloomington, IN: Marzano Research Laboratory.

Uneven patterns of formative scores require particular scrutiny by teachers. To illustrate, consider the following sequence of formative scores: 2.0, 3.0, 2.5, and 2.0. This sequence is not easy to interpret because it does not represent a clear upward trend. The student started with a score of 2.0 and ended with a score of 2.0. In between, the student received scores of 3.0 and 2.5. Obviously, there is no clear progression of learning. The teacher has little option but to collect more information from the student when uneven patterns of scores occur. This might take the form of asking the student what she believes she deserves. If the student says she deserves a final score of 3.0, the teacher would invite her to suggest a student-generated assessment to verify the score 3.0 status. Alternatively, the teacher might engage her in a probing discussion to determine her true status. In summary, the operative behavior when a teacher observes an uneven pattern of formative scores is to gather more information about the student, using other forms of assessment.
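As a rough illustration, a gradebook tool could flag such sequences so the teacher knows when to gather more information. The sketch below is an assumption rather than a procedure from the book: it treats a "clear upward trend" as a sequence that never drops and ends higher than it starts, and the function name is hypothetical.

```python
def has_clear_upward_trend(scores):
    """True when scores never drop and the last score is higher than the first."""
    if len(scores) < 2:
        return False  # a single score cannot show a trend
    never_drops = all(later >= earlier for earlier, later in zip(scores, scores[1:]))
    return never_drops and scores[-1] > scores[0]


uneven = [2.0, 3.0, 2.5, 2.0]   # the sequence from the excerpt
steady = [1.5, 2.0, 2.5, 3.0]   # hypothetical sequence for contrast

print(has_clear_upward_trend(uneven))  # False: gather more information
print(has_clear_upward_trend(steady))  # True
```

The check only flags the pattern. Deciding the final score still rests on the additional evidence the teacher collects.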

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (pp. 82–83). Bloomington, IN: Marzano Research Laboratory.
 

One of the major advantages of using the [0–4 point] scale described in this book is that teachers and students can celebrate two types of achievement at any point in time—current status and knowledge gain. Current status refers to a student’s score at the end of a particular interval of time—usually a quarter, trimester, or semester. Consequently, a teacher can acknowledge all of those students who have a score of 4.0 on a given learning goal, all of the students with a score of 3.5, and so on. In addition, the teacher can and should celebrate knowledge gain. Knowledge gain is the difference between a student’s initial formative score and his or her score at the end of a quarter, trimester, or semester. For example, assume that for a specific learning goal, a student’s initial formative score was 1.5. At the end of the quarter, the student’s summative score is 2.5. The “gain” score for that student is 1.0.
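The two kinds of achievement can be computed side by side. The following is a minimal sketch: the student names and the second row of scores are hypothetical, and only the 1.5 to 2.5 example comes from the excerpt.

```python
def knowledge_gain(initial_formative, final_summative):
    """Gain is the final summative score minus the initial formative score."""
    return final_summative - initial_formative


# (initial formative score, summative score) for one learning goal.
records = {
    "Student A": (1.5, 2.5),  # the example from the excerpt
    "Student B": (3.0, 3.5),  # hypothetical
}

for name, (initial, final) in records.items():
    gain = knowledge_gain(initial, final)
    print(f"{name}: current status {final}, knowledge gain {gain}")
```

Student A ends with a lower current status than Student B but a larger gain, which is why both measures are worth acknowledging.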

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 96). Bloomington, IN: Marzano Research Laboratory.
 

In a standards-referenced system, a student’s achievement is reported (or referenced) in relationship to his or her position on the scales for specific learning goals. However, even if the student does not achieve a specific score on the scales for those goals, the student still moves on to new learning goals the next year when he or she has matriculated to a new grade level. In a standards-based system, students do not move on to a new level of content until they have mastered the content at their current level.

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 112). Bloomington, IN: Marzano Research Laboratory.

 

A teacher using any one of the four approaches [to formative assessment] described in chapter 5 can still translate student achievement into a traditional overall grade. Before addressing the issue of grading, though, it is necessary to revisit the issue of averaging. In chapter 2, a strong case was made that formative scores for a particular learning goal should not be averaged to construct a summative score. This, of course, is perfectly accurate, since averaging scores for a particular learning goal does not take into account that learning has occurred from one assessment to another. However, averaging is a viable option when performance across learning goals is being aggregated. For example, assume that a particular student has received the following summative scores for six learning goals addressed during the grading period: 2.5, 3.0, 2.0, 4.0, 3.0, and 3.5. The numeric average of 3.0 would be a good summary score representing typical final status for the student across the six learning goals.
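The arithmetic can be checked directly. In the sketch below, the six summative scores come from the example above. The formative sequence for a single learning goal is a hypothetical illustration of why averaging within a goal understates final status when learning has occurred.

```python
from statistics import mean

# Summative scores for six learning goals, taken from the example above.
summative_scores = [2.5, 3.0, 2.0, 4.0, 3.0, 3.5]
across_goals = mean(summative_scores)
print(across_goals)  # 3.0

# Hypothetical formative scores for ONE learning goal, rising as learning occurs.
formative_scores = [1.5, 2.0, 2.5, 3.0]
within_goal = mean(formative_scores)
print(within_goal, formative_scores[-1])  # 2.25 3.0
```

In the second case the average of 2.25 sits well below the most recent score of 3.0, so it would misreport what the student knows at the end of the period.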

Many districts and schools employ traditional A, B, C, D, and F letter grades. To translate the average score on the six learning goals into a grade, a simple guide is needed:

A = 3.00 to 4.00
B = 2.50 to 2.99
C = 2.00 to 2.49
D = 1.00 to 1.99
F = Below 1.00

Marzano, R. (2010). Formative Assessment & Standards-Based Grading (p. 105). Bloomington, IN: Marzano Research Laboratory.

Extended constructed-response items require students to construct a detailed answer to a question or prompt. Most commonly, responses come in the form of essays. According to Mark Durm (1993), essays were one of the first forms of assessment used in public education…. Typically, essays are used to assess more complex content—score 3.0 and 4.0 content. This makes intuitive sense, as more complex information requires more detailed explanation. However, as shown in the essay prompt accompanied by table 4.10, essay tasks can be designed to address score 2.0 content also.

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 68). Bloomington, IN: Marzano Research Laboratory.
 

Most of what can be assessed through the medium of written response can also be assessed using oral responses. Many times, though, oral responses are used to provide instructional feedback as opposed to formative scores. This is particularly true with short oral responses….

As we saw in chapter 2, when assessments are not scored or recorded, they are referred to as instructional feedback and help both students and teachers understand what is clear and not clear about the content. As this example illustrates, short oral responses are a perfect vehicle for instructional feedback. Oral responses typically take one of two forms when they are used to generate formative scores: formal oral reports or probing interviews.

Marzano, R. (2010). Formative assessment & standards-based grading (p. 70). Bloomington, IN: Marzano Research Laboratory.

Probing discussions are one of the most powerful forms of oral assessment. The teacher meets one-on-one with a particular student and asks him or her to explain or demonstrate something. For example, during a unit on relationships found in nature, a middle school science teacher might sit next to a student and ask her to explain the similarities and differences between mutualism, symbiosis, and commensalism. As the student explains these concepts, the teacher would ask probing questions that help to clarify what she knows and does not know. The scale that had been designed for this content would guide the teacher as to the types of questions he should ask to determine the scale score that most accurately represents the student’s status at that point in time.

Marzano, R. (2010). Formative assessment & standards-based grading (p. 71). Bloomington, IN: Marzano Research Laboratory.

Demonstrations are typically used with skills, strategies, or processes. Every subject area contains content that lends itself to demonstrations, though some subject areas emphasize skills, strategies, and processes more than others. Table 4.11 lists content from a number of subject areas that might readily be assessed through demonstration.

Table 4.11 Subject-Area Content for Demonstrations

Subject Area         Content That Can Be Assessed Through Demonstration

Language arts        • Using persuasive techniques in a composition
                     • Using a literary device in a creative composition

Mathematics          • Computations of measures of central tendency
                     • Analysis of data displays
                     • Measurement of length, weight, and temperature

Science              • Hypothesis formulation and testing
                     • Making relevant observations

Social studies       • Using a map’s legend to gather information

Physical education   • Coordinating hand- and footwork during a game
                     • Demonstrating teamwork during a game

Art                  • Creating a dramatic character
                     • Mixing and using color

Technology           • Creating a data display in Excel
                     • Using spell check and grammar check

Marzano, R. (2010). Formative assessment & standards-based grading (p. 72). Bloomington, IN: Marzano Research Laboratory.

Selected-response items are commonly used in obtrusive assessments. They are referred to as selected response because they require students to select an answer from among a set of options.

Common types of selected-response items are multiple choice, matching, alternative choice, true/false, multiple response, and fill in the blank. We consider each very briefly. For a more detailed discussion of these formats, see Marzano (2006).

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 59). Bloomington, IN: Marzano Research Laboratory.
 

Short constructed-response items require students to construct a correct answer as opposed to recognizing one (as is the case for selected-response items). To this extent, they are more difficult than are selected-response items. While selected-response items are either correct or incorrect, short constructed-response items have shades that range from totally incorrect to totally correct. Of course, quantifying the intervals between totally correct and totally incorrect is the difficult part of scoring these items.

The following are short constructed-response items for mathematics, language arts, science, and social studies.

Mathematics: Explain the steps necessary for finding the volume of a pyramid.

Language arts: What persuasive techniques did you find in the reading passage? Describe whether they were effective or ineffective. Use specific examples in your answer.

Science: Briefly explain the concept of circadian rhythm and why it is important.

Social studies: Briefly explain the major accomplishments of Susan B. Anthony.

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 64). Bloomington, IN: Marzano Research Laboratory.

Table 4.8 Pattern of Responses for a Particular Student

Section   Item   Item Code   Score Value
I         1      C           2.0
I         2      C           2.0
I         3      C           2.0
I         4      C           2.0
I         5      C           2.0
II        6      HP          3.0
II        7      C           3.0
II        8      LP          3.0
III       9      I           4.0
III       10     I           4.0

Clearly, the student depicted in table 4.8 has achieved at least score 2.0 status, since the items that pertain to that score value (section I) were answered correctly. Also, it is clear the student has not achieved score 4.0 status, since none of the score 4.0 items (section III) were answered correctly. The issue, then, is to determine where the student falls between score values 2.0 and 3.0. The student has answered one of the score 3.0 items totally correctly (item 7), one score 3.0 item with a code of low partial credit (item 8), and one score 3.0 item with a code of high partial credit (item 6). Given this pattern, the teacher would have to make a judgment as to the appropriate scale score. If half-point intervals were being used, the judgment would probably be fairly straightforward—the student would receive a score of 2.5.
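The reasoning in this excerpt can be approximated with a simple rule, shown below as a sketch only. The book leaves the final call to teacher judgment, so the rule here is an assumption: a student who answers every item at one level correctly and earns some credit at the next level receives the half-point score in between. The function name is hypothetical, and reading the codes as C (correct), HP (high partial credit), LP (low partial credit), and I (incorrect) is inferred from the surrounding text.

```python
def estimate_scale_score(responses):
    """Estimate a half-point scale score from item codes grouped by score value.

    responses maps 2.0, 3.0, and 4.0 to lists of codes: C, HP, LP, or I.
    Returns None when the score 2.0 items are not all correct, because that
    case is outside this sketch and needs teacher judgment.
    """
    def all_correct(level):
        return all(code == "C" for code in responses[level])

    def some_credit(level):
        return any(code in ("C", "HP", "LP") for code in responses[level])

    if not all_correct(2.0):
        return None
    if not all_correct(3.0):
        return 2.5 if some_credit(3.0) else 2.0
    if not all_correct(4.0):
        return 3.5 if some_credit(4.0) else 3.0
    return 4.0


# Item codes from table 4.8.
table_4_8 = {
    2.0: ["C", "C", "C", "C", "C"],
    3.0: ["HP", "C", "LP"],
    4.0: ["I", "I"],
}
print(estimate_scale_score(table_4_8))  # 2.5
```

For the pattern in table 4.8 the rule returns 2.5, matching the judgment described in the excerpt. Aberrant patterns would still need to be reviewed by hand.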

Marzano, R. J. (2010). Formative assessment & standards-based grading (pp. 66–67). Bloomington, IN: Marzano Research Laboratory.
 

An aberrant pattern can occur for a number of reasons, including the following:

  • The items written for a particular score value were flawed in some way.
  • Students put effort into answering some items but not others.
  • The teacher’s evaluation of student responses was inaccurate.

A teacher can do many things to reconcile the issue of aberrant patterns of response, including the following:

  • Ignoring items that appear to be aberrant
  • Meeting with individual students who display such patterns and asking them to reconcile the issues
  • Reclassifying items at a higher or lower score value based on the responses of the entire class

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 67). Bloomington, IN: Marzano Research Laboratory.

Studies have demonstrated that when teachers design their own assessments and assign points to the items in those assessments, students can obtain very different total scores from teacher to teacher simply because the teachers weight items differently (see Marzano, 2002). For example, Haponstall (2009) described a study in which 557 teachers all graded one paper using the 100-point scale. While the majority of the scores were between 59 and 73, scores ranged from 38 to 91 (pp. 27–28).

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 41). Bloomington, IN: Marzano Research Laboratory.

Clearly, a better method for developing and scoring assessments is needed—one that ensures that the scale (the size of an inch) stays the same from one assessment to the next and that a teacher applies the same logic to scoring each assessment. As the preceding discussion illustrates, such a method would exclude the typical use of the 100-point scale.

A Rigorous Rubric-Based Approach

The concept of a rubric has been around for many years. Although the term is used in a variety of ways in the assessment community, its roots can be traced to the Latin rubrica terra, referring to the use of red earth centuries ago to mark or signify something of importance. In the assessment world today, the term rubric usually applies to a description of knowledge or skill for a specific topic like that shown in table 3.3.

Table 3.3 A Rubric for the Social Studies Topic of World War II at Grade 6

4   The student will create and defend a hypothesis about what might have happened if specific events that led to World War II had not happened or had happened differently.

3   The student will compare the primary causes for World War II with the primary causes for World War I.

2   The student will describe the primary causes for World War II.

1   The student will recognize isolated facts about World War II.

Marzano, R. J. (2010). Formative assessment & standards-based grading (pp. 41–42). Bloomington, IN: Marzano Research Laboratory.

To solve the problem of inconsistent rubrics from teacher to teacher, it is necessary to develop a systematic approach to rubric design. In the books Classroom Assessment and Grading That Work (Marzano, 2006) and Making Standards Useful in the Classroom (Marzano & Haystead, 2008), a case is made that teams of teachers and/or curriculum specialists representing the district or school should design the rubrics for the content at each grade level and provide them to teachers. This is certainly the best approach to rubric design and is recommended highly. Simply put, the most powerful approach is for a district or school to provide teachers with the rubrics to be used over an entire year for a given subject area.

If a district or school does not provide such resources for teachers, then individual teachers can and should design their own rubrics using a systematic approach. Just how to approach the design of rigorous rubrics is addressed in depth in the book Designing and Teaching Learning Goals and Objectives (Marzano, 2009).
 
Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 43). Bloomington, IN: Marzano Research Laboratory.

To make scales more useful to students, they should be written in student-friendly language. This should be done in cooperation with students. The teacher should introduce each scale to students as it is used in class; explain what is meant by the content placed at the score values 4.0, 3.0, and 2.0; and then have the entire class participate in rewriting the content at each score value in a manner that makes it easy for students to understand.

Marzano, R. J. (2010). Formative assessment & standards-based grading (pp. 45–46). Bloomington, IN: Marzano Research Laboratory.

Before delving into the anatomy of formative assessment, we should begin with a working definition of classroom assessment in general. Paraphrasing from the distinctions made in Classroom Assessment and Grading That Work (Marzano, 2006), we will define a classroom assessment as anything a teacher does to gather information about a student’s knowledge or skill regarding a specific topic. This definition is very much in keeping with the general descriptions of assessment provided by Black and Wiliam in their 1998 article titled “Inside the Black Box: Raising Standards Through Classroom Assessment.”

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 22). Bloomington, IN: Marzano Research Laboratory.

The similarities in definitions for the general construct of assessment and the more specific construct of formative assessment highlight the need for clearer distinctions. Examining types of assessment in contrast to uses of assessment helps provide these distinctions.

Types of Classroom Assessments

According to table 2.1, there are three types of assessments a teacher might use in the classroom: obtrusive assessments, unobtrusive assessments, and student-generated assessments. Each can and should be used in a comprehensive system of formative assessment.

Table 2.1 Distinctions Regarding Classroom Assessments

Types of Classroom Assessment: Obtrusive; Unobtrusive; Student generated

Uses of Classroom Assessment: Formative scores; Summative scores; Instructional feedback

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 23). Bloomington, IN: Marzano Research Laboratory.

In fact, it would be accurate to say that, in general, a specific assessment is neither formative nor summative—it all depends on how the information is used. … This noted, it is also true that assessments can and perhaps should be tailored to collect data that will be used for either formative purposes or summative purposes but not both. As noted by Pellegrino, Chudowsky, and Glaser (2001): "Often a single assessment is used for multiple purposes; in general however, the more purposes a single assessment aims to serve, the more each purpose will be compromised" (p. 2).

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 27). Bloomington, IN: Marzano Research Laboratory.

To construct a summative score, the teacher examines the student’s pattern of responses over time. The teacher does not compute an average of the student’s formative scores to construct a summative score. This would be an absolute violation of the principles of formative assessment. The technical reason averaging makes little sense is explained in some depth in Classroom Assessment and Grading That Work (Marzano, 2006). The short version of that explanation is that averaging makes sense only if no learning has occurred from assessment to assessment or if assessments measure very different things. Obviously, in a formative system, all assessments for a particular topic will be on the same topic.

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 28). Bloomington, IN: Marzano Research Laboratory.

One of the defining features of the process of formative assessment as described in this book is that it provides information to students and teachers regarding adaptations they might make to improve performances. On the students’ side, this involves identifying the specific content they must improve on and things they might do to improve. For example, after receiving instructional feedback on her use of the overhand throw, a student realizes that she needs to hold the softball looser when she throws. She decides to try this the next time she is in gym class.

On the teacher’s side, behavior change involves identifying content that must be reviewed or retaught.

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 33). Bloomington, IN: Marzano Research Laboratory.

Feedback can be given formally or informally in group or one-on-one settings. It can take a variety of forms. As the preceding definitions illustrate, its most important and dominant characteristic is that it informs the student, the teacher, and all other interested parties about how to best enhance student learning.

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 3). Bloomington, IN: Marzano Research Laboratory.

Interestingly, though the evidence for the effectiveness of feedback has been quite strong, it has also been highly variable.… This, of course, raises the critically important questions, What are the characteristics of feedback that produce positive effects on student achievement, and what are the characteristics of feedback that produce negative effects? In partial answer to this question, Kluger and DeNisi found that negative feedback has an ES of negative 0.14. This translates into a predicted decrease in student achievement of 6 percentile points. In general, negative feedback is that which does not let students know how they can get better.
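The step from an effect size (ES) to a percentile change can be reproduced with the standard normal distribution. The sketch below rests on assumptions not spelled out in the excerpt: achievement is treated as normally distributed, the student starts at the 50th percentile, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_change(effect_size, start_percentile=50.0):
    """Predicted change in percentile rank for a shift of effect_size SDs."""
    standard_normal = NormalDist()
    start_z = standard_normal.inv_cdf(start_percentile / 100)
    end_percentile = standard_normal.cdf(start_z + effect_size) * 100
    return end_percentile - start_percentile


change = percentile_change(-0.14)
print(round(change))  # -6
```

An ES of negative 0.14 moves a student at the 50th percentile down by roughly 5.6 points, which rounds to the 6 percentile points cited.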

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 5). Bloomington, IN: Marzano Research Laboratory.

Formative assessment has become very popular in the last decade. It is typically contrasted with summative assessment in that summative assessments are employed at the end of an instructional episode while formative assessments are used while instruction is occurring. As Susan Brookhart (2004, p. 45) explained, “Formative assessment means information gathered and reported for use in the development of knowledge and skills, and summative assessment means information gathered and reported for use in judging the outcome of that development.”

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 8). Bloomington, IN: Marzano Research Laboratory.

A study done by Herman and Choi (2008) asked two questions: How accurate are teachers’ judgments of student learning, and how does accuracy of teachers’ judgments relate to student performance? They found that “the study results show that the more accurate teachers are in their knowledge of where students are, the more effective they may be in promoting subsequent subject learning” (p. 18).

Marzano, R. J. (2010). Formative assessment & standards-based grading (p. 14). Bloomington, IN: Marzano Research Laboratory.
