Tennessee Journal of Law and Policy
Volume 13
Issue 2
(Winter 2019)
Article 4
February 2019
Tennessee's National Impact on Teacher Evaluation Law & Policy: An Assessment of Value-Added Model Litigation
Mark A. Paige
University of Massachusetts - Dartmouth
Audrey Amrein-Beardsley
Mary Lou Fulton Teachers College
Kevin Close
Arizona State University
Follow this and additional works at: https://trace.tennessee.edu/tjlp
Part of the Education Law Commons, and the Law and Politics Commons
Recommended Citation
Paige, Mark A.; Amrein-Beardsley, Audrey; and Close, Kevin (2019) "Tennessee's National Impact on
Teacher Evaluation Law & Policy: An Assessment of Value-Added Model Litigation,"
Tennessee Journal of
Law and Policy
: Vol. 13 : Iss. 2 , Article 4.
Available at: https://trace.tennessee.edu/tjlp/vol13/iss2/4
This Article is brought to you for free and open access by Volunteer, Open Access, Library Journals (VOL Journals),
published in partnership with The University of Tennessee (UT) University Libraries. This article has been accepted
for inclusion in Tennessee Journal of Law and Policy by an authorized editor. For more information, please visit
https://trace.tennessee.edu/tjlp.
TENNESSEE JOURNAL
OF LAW AND POLICY
VOLUME 13 WINTER 2019 ISSUE 2
ARTICLE
TENNESSEE'S NATIONAL IMPACT
ON TEACHER EVALUATION LAW &
POLICY
AN ASSESSMENT OF VALUE-ADDED MODEL
LITIGATION
Mark A. Paige
*
Audrey Amrein-Beardsley
**
Kevin Close
Abstract
Over the last decade or so, federal and state education
policymakers embraced the use of value-added models
(VAMs) to evaluate teachers’ performance and make high-
stakes employment decisions (e.g., tenure, merit pay,
*
Mark A. Paige, J.D., Ph.D., is an associate professor of public
policy at the University of Massachusetts-Dartmouth. Before
becoming a professor, he was an education law attorney
representing school districts.
**
Audrey Amrein-Beardsley, Ph.D., is currently a Professor in the Mary Lou Fulton Teachers College at Arizona State University. Her research interests include educational policy, educational measurement, and research methods.
Kevin Close, M.Ed., is a doctoral candidate and graduate assistant researcher at Arizona State University.
termination of employment). VAMs are complicated
statistical models that attempt to estimate a teacher’s
contribution to student test scores, particularly those in
mathematics and reading. Educational researchers, as
well as many teachers and unions, however, have objected
to the use of VAMs, noting that these models fail to adequately account for variables outside of teachers' control that contribute to a student's educational performance. Subsequently, many teachers challenged the
use of VAMs through the courts. This article assesses those
challenges.
I. Introduction 525
II. VAMS: Promise and Controversy 530
A. A Brief History of VAMs in Educational Policy 530
1. The Rise of VAMs in National Education Policy:
Race to the Top 532
B. Statistical and Practical Controversies 535
1. Reliability 535
2. Validity 537
3. Bias 538
4. Transparency 540
5. Fairness 542
6. Consequential Use 543
7. Intended Consequences 544
8. Unintended Consequences 545
III. The Cases 547
A. Federal Substantive Due Process Rights & Equal
Protection Arguments: VAMs May Be Unwise But
Still Constitutional 547
1. Cook v. Bennett 547
2. Trout v. Knox County Board of Education 551
3. Wagner v. Haslam 553
4. Matter of Lederman v. King 554
B. Legislative State Agency Authority Questioned 555
1. Leff v. Clark County School District 555
2. Stapleton v. Skandera 557
3. Louisiana Federation of Teachers v. State 559
4. Robinson v. Stewart 560
5. Filed but not Adjudicated 561
C. Process & “Fundamental Fairness” Cases 563
1. Houston Federation of Teachers 563
2. Washington Teachers’ Union v. D.C. Public
Schools 566
IV. Current Policy Landscape in Wake of ESSA 569
A. ESSA Reauthorization 570
B. State Plans 572
V. Conclusions 573
I. Introduction
In March of 2017, William “Bill” Sanders passed
away in Tennessee.
1
To most policymakers outside of
education (and many within it) he was a relatively
unknown statistician. His work in education policy
started far away from schoolhouses. Indeed, after he
received his degree in statistics at the University of
Tennessee, he began assessing the impact of radiation on
farm animals.
2
But his career trajectory changed markedly. In
1982, after reading a newspaper article about how
Tennessee Governor Lamar Alexander sought a model of
teacher compensation that would pay teachers for
performance, Mr. Sanders concluded he had the answer.
3
He wrote to Alexander explaining that he had developed a statistical model that could determine who the "best" teachers were: a so-called "value-added" model (e.g., the Tennessee Value-Added Assessment System (TVAAS),
1
Kevin Carey, The Little-Known Statistician Who Taught Us
to Measure Teachers, N.Y. TIMES (May 19, 2017),
https://www.nytimes.com/2017/05/19/upshot/the-little-known-
statistician-who-transformed-education.html [https://perma.
cc/2VBF-CZWY].
2
Id.
3
Id.
which is more generally known as the Education Value-Added Assessment System (EVAAS)).
4
This model estimates a
teacher’s contribution to student achievement on
standardized tests,
5
and it formed the basis for his
private company that developed algorithms for the
models.
6
Tennessee ultimately incorporated value-added
models into policies and laws, linking high-stakes
employment decisions and evaluation to student test
scores.
7
Mr. Sanders’s models—sparked by this random
collision of eventshas had profound impact on national
educational policy. In 2009, President Obama’s Race to
the Top (RttT) program conditioned state receipt of
federal education dollars on states’ use of VAMs to
evaluate and make employment decisions for teachers.
States seeking much-needed
federal money during the
4
Id. VAMs have a policy history that precedes Mr. Sanders's adoption of the term in education. They had been used in economics since the 1960s. See, e.g., Douglas Harris, Would Accountability Based on Teacher Value Added Be Smart Policy? An Examination of the Statistical Properties and Policy Alternatives, 4 J. EDUC. FIN. & POL'Y 319, 321 (2009). Yet Sanders is widely credited as the one who popularized the use of VAMs for educational accountability. E.g., Carey, supra note 1.
5
E.g., EDWARD WILEY, A PRACTITIONER'S GUIDE TO VALUE-ADDED ASSESSMENT 5 (2006), https://nepc.colorado.edu/publication/a-practitioners-guide-value-added-assessment-educational-policy-studies-laboratory-resea [https://perma.cc/EH6R-S7QN].
6
SAS® EVAAS® FOR K-12, https://www.sas.com/en_si/
software/evaas.html [https://perma.cc/65TE-VEFG] (crediting
the development of this particular model sold by a private
company to Mr. Sanders).
7
TENN. CODE ANN. §§ 49-1-302(a)(2)(C), 49-5-503(4) (2016);
TENN. STATE BD. OF EDUC., TEACHER AND ADMINISTRATOR
POLICY § 5.201 (2017) (statutory and regulatory framework
delegating authority to state department of education to
develop policy for evaluation and further linking that
evaluation to tenure determinations).
“Great Recession” eagerly complied.
8
As a consequence,
VAMs became codified in state teacher evaluation and
employment laws across the country.
9
Despite their widespread adoption, the use of these statistical models to improve public schools is a source of considerable debate in law and policy. Some scholars applaud their use, arguing that they provide a clear measure of a teacher's worth and address a persistent policy dilemma: how to improve the quality of our public school teachers.
10
Detractors insist that a
teacher’s value is much more than the measure of test
scores and, more importantly, that VAMs are statistically
flawed.
11
Critics note that VAMs fail to account for the
complexity of teaching and cannot accurately control for
the impact of other variables (e.g., students’ individual
8
See generally Rhoda Freelon et al., Overburdened and
Underfunded: California Public Schools Amidst the Great
Recession, 2 MULTIDISCIPLINARY J. EDUC. RES., 152 (2012)
(documenting the impact of the Great Recession on public
schools in California, but also noting the broader impact of the
recession on schools and institutions beyond California).
9
KATHRYN M. DOHERTY & SANDI JACOBS, STATE OF THE STATES
2013: CONNECT THE DOTS: USING EVALUATIONS OF TEACHER
EFFECTIVENESS TO INFORM POLICY AND PRACTICE 10 (2013)
(noting that in 2013 at least 31 states had adopted the use of
standardized test in their teacher evaluation protocols); see
also MARK A. PAIGE, BUILDING A BETTER TEACHER:
UNDERSTANDING VALUE-ADDED MODELS IN THE LAW OF
TEACHER EVALUATION 15, 16 (2016) (describing the links
between teacher evaluation systems and teacher employment
statutes, such as tenure, and warning against such use for
high-stakes decisions).
10
See, e.g., Eric A. Hanushek, Conceptual and Empirical Issues
in the Estimation of Educational Production Functions, 14 J.
HUM. RESOURCES 351, 353 (arguing for the adoption of
production function models to evaluate teachers).
11
E.g., Linda Darling-Hammond, Can Value-Added Add Value
to Teacher Evaluation?, 44 EDUC. RESEARCHER 132, 133
(placing the use of value added models in the larger policy
debate about how to improve teacher quality).
motivation) that influence student achievement.
12
Because
of these issues, commentators cautioned against the use
of VAMs in high-stakes employment decisions (e.g.,
termination), noting such use would invite legal action.
13
Notwithstanding these warnings, many states
embraced VAMs. Florida, for example, amended its
teacher evaluation statutes to ensure that VAMs played
a controlling role in teacher employment status,
including tenure decisions.
14
Teachers and unions almost
immediately challenged the use of VAMs through legal
means. Lawsuits ranged from violations of the Federal
Constitution
15
to assertions that requirements to use
VAMs violated the non-delegability doctrine.
16
Many of
these received widespread attention in the popular
press.
17
12
Id.; see also SEAN P. CORCORAN, CAN TEACHERS BE EVALUATED BY THEIR STUDENTS' TEST SCORES? SHOULD THEY BE? THE USE OF VALUE-ADDED MEASURES OF TEACHER EFFECTIVENESS IN POLICY AND PRACTICE 22 (2010).
13
PAIGE, supra note 9, at 22 n.28; see also Preston C. Green III et al., The Legal and Policy Implications of Value-Added Teacher Assessment Policies, 2012 BYU EDUC. & L.J. 1, 15–16 (2012).
14
E.g., FLA. STAT. ANN. § 1012.22(1)(c)(5) (West 2013)
(connecting teacher salary to an evaluation system that
requires use of VAMs).
15
E.g., Cook v. Bennett, 792 F.3d 1294, 1298 (11th Cir. 2015)
(alleging use of VAMs violated substantive and procedural due
process clauses, as well as the Equal Protection Clause of the
14th Amendment).
16
E.g., State ex rel. Stapleton v. Skandera, 346 P.3d 1191, 1194
(N.M. App. 2015).
17
E.g., Peter Greene, Over a Year Ago a Federal Court Struck
Down VAM: Why Are We Still Using it to Evaluate Teachers?,
FORBES (June 25, 2018, 08:23 PM), https://www.forbes.com/
sites/petergreene/2018/06/25/over-a-year-ago-a-federal-court-
struck-down-vam-why-are-we-still-using-it-to-evaluate-teachers/
[https://perma.cc/AA4M-NRQ5]; Patricia MacGregor-Mendoza,
Court Finds Teacher Evaluation System Flawed, LAS CRUCES
SUN NEWS (May 26, 2017, 07:17 PM), https://www.lcsun-
It has been almost ten years since Race to the Top
brought Mr. Sander’s idea of VAMs from Tennessee to a
national scale, and it seems an appropriate moment to
assess their legal and policy ramifications. Indeed, as we
note, the use of VAMs has triggered a wave of litigation
and policy change that continues today. Many states
continue to use VAMs, while others have reduced their
use under new federal laws.
18
Thus, assessing the legal
and policy landscape forms the basis of this article.
Generally speaking, three lines of legal challenges
have emerged. First, some are grounded in the
substantive Due Process Clause and Equal Protection
Clause of the 14th Amendment, arguing that the laws do
not pass rational basis scrutiny.
19
Second, a line of cases
challenges the authority or jurisdiction of a particular
agency (e.g., state Department of Education) to enact
evaluation regulations or laws that use VAMs. Third,
some cases advance what we refer to as “process”
arguments. These contend that the use of VAMs violates
some agreed-upon or standing procedural terms found in
the Procedural Due Process Clause or collective
bargaining agreements (CBAs). As we note, plaintiffs have found the most success (although not always) with this third line of argument.
That litigants have experienced more success
arguing VAMs offend certain procedural protections
comports with common understanding of procedural due
news.com/story/opinion/2017/05/26/court-finds-teacher-
evaluation-system-flawed/102219102/ [https://perma.cc/ESS8-
SXWX];Valerie Strauss, Judge Calls Evaluation of N.Y.
Teacher “Arbitrary” and “Capricious” in Case Against New U.S.
Secretary of Education, WASH. POST (May 10, 2016),
https://www.washingtonpost.com/news/answer-sheet/wp/
2016/05/10/judge-calls-evaluation-of-n-y-teacher-arbitrary-
and-capricious-in-case-against-new-u-s-secretary-of-
education/ [https://perma.cc/Y645-2T82].
18
See infra Part III.
19
See, e.g., Cook, 792 F.3d at 1298, 1300.
process. At its core, procedural due process ensures
“fundamental fairness” when the government moves to
take away a protected interest, such as employment.
While courts generally have not overruled a legislature's policy choice to use VAMs as violative of substantive due process, they (including a federal appeals court) have questioned the wisdom of the legislature's decision.
20
Where they have overturned the use of VAMs,
they have done so on procedural grounds.
21
This allows
courts to stay within “their lane” and avoid jurisdictional
overreach into the policy area.
The article is organized as follows. Part II overviews VAMs, their link to teacher evaluation and employment, and the controversy surrounding their use, especially as a factor in high-stakes employment decisions. Part III provides the most current assessment of cases where the statistical controversy has led to legal action. Part IV discusses the recent policy and legal developments with respect to the use of VAMs in evaluation that have occurred because of changes in federal education law. In conclusion, we note that VAMs have receded,
somewhat, in terms of their role in evaluation and
employment matters.
II. VAMs: Promise and Controversy
A. A Brief History of VAMs in Educational
Policy
In the simplest of terms, VAMs (e.g., Tennessee’s
TVAAS) are statistical models used to measure the
predicted and the actual “value” a teacher “adds” to (or
detracts from) student achievement from the point at
which students enter a teacher’s classroom to the point
students leave. This is typically done using student
20
See id. at 1301.
21
See id. at 130102.
achievement growth as measured by large-scale
standardized test scores (i.e., the tests mandated by the
No Child Left Behind (NCLB) Act of 2001). The models
attempt to statistically control for outside variables,
including students’ prior test performance, and student-
level background variables (e.g., whether students are eligible for free or reduced-price lunch).
22
The most widely used VAM is the EVAAS,
developed and used in Tennessee.
23
EVAAS
comes in
different versions for different states (e.g., the EVAAS in
Ohio, North Carolina, and South Carolina, the PVAAS in
Pennsylvania, the TVAAS in Tennessee, and the
TxVAAS in Texas) and different ones based on large and
small school districts (e.g., located within Arkansas,
Georgia, Indiana, Texas, and Virginia). For each
consumer, EVAAS modelers choose one of two
sophisticated statistical models.
24
Using these models, student growth scores are
aggregated at the teacher or classroom level to yield
teacher-level value-added estimates. Depending on where
22
See, e.g., Sean Corcoran & Dan Goldhaber, Value Added and Its Uses: Where You Stand Depends on Where You Sit, 8 EDUC. FIN. & POL'Y 418, 421 (2013). Other variables include whether students are English language learners (ELLs), gifted, or receiving special education services, as well as classroom- and school-level variables (e.g., class sizes, school resources, school leadership).
23
The EVAAS is advertised as “the most comprehensive
reporting package of value-added metrics available in the
educational market” in that the EVAAS offers states, districts,
and schools “precise, reliable and unbiased results that go far
beyond what other simplistic [value-added] models found in the
market today can provide.” SAS® EVAAS ® FOR K-12,
https://www.sas.com/en_us/software/evaas.html [https://perma.cc/
76AY-G47W].
24
For a comprehensive statistical summation of the various
models and options available, see WHITE PAPER: SAS®
EVAAS® FOR K12 STATISTICAL MODELS, https://www.sas.
com/content/dam/SAS/en_us/doc/whitepaper1/sas-evaas-k12-
statistical-models-107411.pdf [https://perma.cc/F5EW-WCB6].
teachers' EVAAS estimates fall relative to those of similar teachers (e.g., within the same district) at the same time, teachers' value-added determinations are made.
25
Thereafter, EVAAS modelers make relative comparisons and rank teachers along a continuum.
26
Teachers whose
students grow significantly more than the average and/or
surpass projected levels of growth are identified as
“adding value”; teachers whose students grow
significantly less and/or fall short of projected levels are
identified as “detracting value.”
27
Teachers whose students grow at rates that are not statistically different from average (i.e., falling within one standard deviation of the mean) are classified as Not Detectably Different (NDD).
28
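To make the aggregate-and-classify logic described above concrete, the following short Python sketch illustrates the general idea. It is purely illustrative and is not the proprietary EVAAS/TVAAS method: the growth numbers are invented, and the simple averaging, the one-standard-deviation band, and the labels are assumptions adopted only for demonstration.

# Illustrative sketch only: a deliberately simplified stand-in for the kind of
# growth aggregation and classification described in the text. The real
# EVAAS/TVAAS models are proprietary and far more complex; the thresholds,
# data, and one-standard-deviation rule here are assumptions for illustration.
from statistics import mean, stdev

def classify_teachers(student_growth_by_teacher):
    """student_growth_by_teacher: dict mapping teacher -> list of student
    growth scores (current-year score minus prior-year score)."""
    # Step 1: aggregate student growth to a teacher-level estimate.
    estimates = {t: mean(scores) for t, scores in student_growth_by_teacher.items()}
    # Step 2: compare each teacher to the average of the comparison group.
    grand_mean = mean(estimates.values())
    spread = stdev(estimates.values())
    # Step 3: classify relative to the mean, treating the +/- one-standard-
    # deviation band as "Not Detectably Different" (NDD).
    labels = {}
    for teacher, est in estimates.items():
        if est > grand_mean + spread:
            labels[teacher] = "adding value"
        elif est < grand_mean - spread:
            labels[teacher] = "detracting value"
        else:
            labels[teacher] = "NDD"
    return estimates, labels

if __name__ == "__main__":
    growth = {
        "Teacher A": [12, 8, 15, 10],
        "Teacher B": [2, -1, 4, 0],
        "Teacher C": [6, 5, 7, 6],
    }
    print(classify_teachers(growth))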
1. The Rise of VAMs in National Education
Policy: Race to the Top
In 2007, TVAAS/EVAAS entered the national
education policy discussion when developer Dr. William
L. Sanders shared his research with Congress.
Specifically, he testified before the U.S. House of
Representatives Committee on Education and the
Workforce on how TVAAS could improve teacher
25
For a general overview of the use of VAMs and the concepts
noted herein, see WILEY, supra note 5.
26
Id.
27
Id.; Audrey Amrein-Beardsley & Clarin Collins, The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and Unintended Consequences, 20 EDUC. POL'Y ANALYSIS ARCHIVES, no. 12, Apr. 2012, at 1, 7 n.2.
28
WILEY, supra note 5; Amrein-Beardsley & Collins, supra
note 27, at 7 n.2; see, e.g., WILLIAM L. SANDERS, COMPARISONS
AMONG VARIOUS EDUCATIONAL ASSESSMENT VALUE-ADDED
MODELS 18 (2006).
accountability and promote educational reform.
29
His
testimony spurred the U.S. Department of Education’s
piloting of VAMs.
30
The use of VAMs nationally grew under the Race
to the Top program. By way of background, RttT was a
competitive federal grant program that amounted to an
injection of $4.35 billion to selected states to support
educational reform efforts.
31
Receipt of the grant was
conditioned on states developing teacher evaluation laws
and policy that used VAMs.
32
States that attached
relatively more serious consequences (e.g., employment
status) to teachers’ VAM-based output were viewed more
favorably than those that did not.
33
High-stakes
consequences included, but were not limited to: teachers’
permanent files being flagged, thus preventing teachers
from changing jobs within states; the revocation of
teacher licenses; teacher tenure; salary increases,
decreases, and merit pay; and teacher probation and
termination.
34
Beyond RttT, the federal government used other
mechanisms to embed VAMs in state evaluation and
employment matters as a matter of law and policy. In
2011, the federal government required that states adopt
the accountability practices discussed above
29
CHRISTOPHER B. SWANSON & JANELLE BARLAGE, INFLUENCE:
A STUDY OF THE FACTORS SHAPING EDUCATION POLICY 41
(2016), https://secure.edweek.org/media/influence_study.pdf
[https://perma.cc/346S-HJSX].
30
Id.
31
U.S. DEP'T OF EDUC., RACE TO THE TOP FACT SHEET (2009),
https://www2.ed.gov/programs/racetothetop/factsheet.pdf
[https://perma.cc/35GG-Y3HM].
32
Id.
33
Arne Duncan, Sec’y, Dep’t of Educ., Remarks at The Race to
the Top Program Announcement: The Race to the Top Begins (July 24, 2009), https://www.ed.gov/news/speeches/race-top-begins
[https://perma.cc/3RD5-RP7A].
34
See generally PAIGE, supra note 9 (noting that VAMs became
required factors for employment decisions).
(regardless of whether a state applied for or received RttT funds) to secure waivers from the penalties that they
would incur for non-compliance with the No Child Left
Behind Act of 2001.
35
NCLB, passed with bipartisan
support in 2001, required 100 percent of students to
attain proficiency on state standardized tests in math and reading.
36
The utopian goal has been widely
criticized as impractical.
37
Nevertheless, the federal
government required states to apply for waivers to escape
the punitive measures of non-compliance (e.g.,
intervention of state authorities in the operation of local
schools). More specifically, these waivers buttressed the
core policy drivers of RttT by continuing to incorporate
student test scores as a means to hold teachers
accountable for their “value added,” or lack thereof.
38
The cumulative impact of RttT and federal
waivers on the use of VAMs in teacher evaluations was
substantial. By 2014, 40 states and Washington, D.C.,
35
KEVIN CLOSE ET AL., STATE-LEVEL ASSESSMENTS AND
TEACHER EVALUATION SYSTEMS AFTER THE PASSAGE OF THE
EVERY STUDENT SUCCEEDS ACT: SOME STEPS IN THE RIGHT
DIRECTION 5 (Nat’l Educ. Policy Ctr. ed., 2018),
https://nepc.colorado.edu/sites/default/files/publications/PB%20C
lose-Beardsley-Collins_1.pdf [https://perma.cc/RG4N-B8N2].
36
No Child Left Behind Act of 2001, Pub. L. No. 107-110, §
1001, 115 Stat. 1425 (requiring all students obtain proficiency
in specified test areas) (repealed 2015).
37
See, e.g., Bruce Meredith & Mark A. Paige, Opinion,
Rethinking Federal Role in Education Makes Sense. Trump’s
Plan Does Not, ATLANTA J.-CONST.: GET SCHOOLED (Oct. 3,
2018, 11:15 AM) https://www.myajc.com/blog/get-schooled/
opinion-rethinking-fed-education-role-makes-sense-trump-plan-
does-not/T19cWlKAznnDpcoxmvr1nJ/ [https://perma.cc/S3J4-
B4FW] (characterizing the NCLB goal of proficiency as
unrealistic, especially in light of the lack of support from the
federal government to education and other important public
policy areas that impact education success, like housing and
health care).
38
CLOSE ET AL., supra note 35, at 8.
(80%) were using or still developing some type of VAM for
increased teacher accountability purposes.
39
While state
department of education leaders recognized and
encouraged the use of VAMs, they did not develop
support mechanisms and resources to help teachers
understand and subsequently use their VAM-based data
to improve their effectiveness.
40
Put differently,
information from VAMs was not actionable. This
disconnect has been the source of serious contention and
concern about the VAM-based teacher and educational
reform enterprise.
B. Statistical and Practical Controversies
Significant statistical and practical concerns
surround VAMs, and these are best understood with
reference to the professional guidelines that govern the education and psychological measurement professions, the Standards for Educational and Psychological Testing
41
(hereinafter "Standards"). These issues include, but are not limited to: (1) reliability, (2) validity, (3) bias, (4) transparency, and (5) fairness, with emphasis also on (6) whether VAMs are being used to make consequential decisions based on concrete (i.e., not arbitrary) evidence, and (7) their intended and (8) unintended consequences. These are discussed below.
1. Reliability
Reliability is the degree to which test- or
measurement-based scores “are consistent over repeated
applications of a measurement procedure (e.g., a VAM)
and hence are inferred to be dependable and consistent"
39
Id.
40
Id. at 14.
41
AM. EDUC. RESEARCH ASS'N, AM. PSYCHOLOGICAL ASS'N & NAT'L COUNCIL ON MEASUREMENT IN EDUC., STANDARDS FOR EDUCATIONAL AND PSYCHOLOGICAL TESTING (2014) [hereinafter STANDARDS].
for the individuals (e.g., teachers) to whom the scores
pertain.
42
VAMs are reliable when within-group (same
school or district) VAM estimates of teacher effectiveness
are more or less consistent over time, from one year to
the next, regardless of the type of students and subject
areas teachers teach. Consistency over time is typically
captured using particular statistical tools such as
standard errors, reliability coefficients per se, and
generalizability coefficients, among others.
43
These tools situate VAM estimates, make their (sometimes sizeable) errors explicit, and, importantly, help others understand the uncertainty that accompanies those estimates.
Research has documented serious concerns with
respect to VAM reliability (or intertemporal stability). Indeed, teachers classified as "effective" one year might have a 25–59% chance of being classified as "ineffective"
the next year, or vice versa, with other permutations
possible.
44
If a teacher who is classified as a “strong”
teacher this year is classified as a “weak” teacher next
year, and vice versa, this casts doubt on the reliability of
VAMs for the purpose of identifying and making high-
stakes decisions regarding teachers. Accordingly, across VAMs, reliability remains a serious concern, especially when unreliable measures are used for consequential purposes like decisions to terminate or deny tenure.
42
Id. at 22223.
43
Id. at 33.
44
For a comprehensive overview of these concepts, see José Felipe Martínez et al., Approaches for Combining Multiple Measures of Teacher Performance: Reliability, Validity, and Implications for Evaluation Policy, 38 EDUC. EVALUATION & POL'Y ANALYSIS 738–56 (2016); see also Peter Z. Schochet & Hanley S. Chiang, What Are Error Rates for Classifying Teacher and School Performance Using Value-Added Models?, 38 J. EDUC. & BEHAV. STAT. 142–71 (2013).
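The year-to-year instability described above can be illustrated with a minimal sketch. The function below (with invented data and an assumed zero cutoff, not any state's actual classification rule) simply measures how often teachers keep the same "effective"/"ineffective" label across two consecutive years.

# Minimal sketch (not any state's actual procedure): one common way researchers
# probe the year-to-year reliability concern is to compare a teacher's
# classification in consecutive years. All numbers below are invented.
def classification_stability(year1, year2, cutoff=0.0):
    """year1, year2: dicts mapping teacher -> value-added estimate.
    Returns the share of teachers whose label (estimate above vs. below
    `cutoff`) stays the same across both years."""
    common = set(year1) & set(year2)
    same = sum(
        1 for t in common
        if (year1[t] >= cutoff) == (year2[t] >= cutoff)
    )
    return same / len(common) if common else float("nan")

if __name__ == "__main__":
    y1 = {"A": 1.2, "B": -0.4, "C": 0.1, "D": -1.5}
    y2 = {"A": -0.3, "B": 0.6, "C": 0.4, "D": -0.9}
    # With these invented numbers, only half of the teachers keep the same
    # label from one year to the next -- the kind of churn the 25-59% figure
    # in the text describes.
    print(classification_stability(y1, y2))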
2. Validity
Validity is “the degree to which evidence and
theory support the interpretations of test scores for [the]
proposed uses of tests.”
45
It is measured by “the degree to
which all the accumulated evidence supports the
intended interpretation of [the test-based] scores for
[their] proposed use[s].”
46
Put another way, validity asks:
Does the model assess what it is supposed to assess?
47
Accordingly, one must be able to support validity
arguments with quantitative or qualitative evidence that the derived data allow for accurate inferences.
There are various means to assess validity, but of particular focus for researchers is validity as it concerns "concurrent-related evidence."
48
This helps to assess, for
example, whether teachers who post large and small
45
STANDARDS, supra note 41, at 11.
46
Id. at 14.
47
There are subareas of validity that have been the subject of
considerable research as it relates to VAMs.
These are: (1) content-related evidence of validity; (2)
concurrent-related evidence of validity; (3) predictive-related
evidence of validity; and (4) consequence-related evidence of
validity. See Michael T. Kane, Validating the Interpretations
and Uses of Test Scores, 50 J. EDUC. MEASUREMENT 1, 2, 8
(2013); see generally Samuel Messick, Validity, 3 J. EDUC.
MEASUREMENT 1, 8103 (1989). However, while all these
evidences of validity help to support construct-related evidence
of validity, in VAM research most researchers rely on
gathering concurrent-related evidence of validity.
48
E.g., Edward Sloat, Audrey Amrein-Beardsley & Jessica
Holloway, Different Teacher-Level Effectiveness Estimates,
Different Results: Inter-Model Concordance Across Six
Generalized Value-Added Models (VAMs), 30 EDUC.
ASSESSMENT EVALUATION & ACCOUNTABILITY 367, 372 (2018);
see also Pam Grossman et al., The Test Matters: The
Relationship Between Classroom Observation Scores and
Teacher Value Added on Multiple Types of Assessment, 43
EDUC. RESEARCHER 293, 293-303 (2014).
value-added gains or losses over time are the same
teachers deemed effective or ineffective, respectively,
over the same period using other independent
quantitative and qualitative measures of teacher
effectiveness. Other measures might include supervisors’
observational scores. If all measures line up and
theoretically validate one another, then confidence in
them as independent measures increases.
49
If the indicators point in different directions, something may be wrong with one or both of them (the VAM estimates, the observational scores, or both).
50
Researchers have questioned whether measures
of teacher value-added are substantively related to at
least one other criterion of teacher effectiveness (e.g.,
teacher observational or student survey indicators).
51
Moreover, they question whether the concurrent-related
evidence of validity that does exist is strong or
substantive enough to warrant valid inference-making.
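The concurrent-validity check described above is, at bottom, a question of whether two independent indicators agree. The sketch below (hypothetical data and a bare Pearson correlation, rather than any particular study's method) illustrates the basic comparison between VAM estimates and observation scores.

# Illustrative sketch of a concurrent-validity check: comparing teachers'
# value-added estimates with an independent measure (here, hypothetical
# supervisor observation scores). This is a conceptual stand-in only.
from statistics import correlation  # available in Python 3.10+

def concurrent_evidence(vam_scores, observation_scores):
    """Both arguments: dicts mapping teacher -> score. Returns the Pearson
    correlation over teachers present in both measures; values near zero
    suggest the two indicators are not telling the same story."""
    teachers = sorted(set(vam_scores) & set(observation_scores))
    return correlation(
        [vam_scores[t] for t in teachers],
        [observation_scores[t] for t in teachers],
    )

if __name__ == "__main__":
    vam = {"A": 1.4, "B": -0.2, "C": 0.3, "D": -1.1, "E": 0.8}
    obs = {"A": 3.1, "B": 3.4, "C": 2.8, "D": 3.0, "E": 3.2}
    print(concurrent_evidence(vam, obs))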
3. Bias
Bias pertains to the validity of the inferences that
stakeholders draw from test-based scores.
52
Specific to
49
Kane, supra note 47, at 6–8, 37, 40, 64.
50
Id.
51
E.g., Morgan S. Polikoff & Andrew C. Porter, Instructional Alignment as a Measure of Teaching Quality, 36 EDUC. EVALUATION & POL'Y ANALYSIS 399, 399–401 (2014); Tanner LeBaron Wallace, Benjamin Kelcey & Erik Ruzek, What Can Student Perception Surveys Tell Us About Teaching? Empirically Testing the Underlying Structure of the Tripod Student Perception Survey, 53 AM. EDUC. RES. J. 1834, 1835, 1837–38 (2016).
52
The Standards define bias as the "construct underrepresentation or construct-irrelevant components of test scores that differentially affect the performance of different groups of test takers and consequently the . . . validity of interpretations and uses of their test scores." STANDARDS, supra note 41, at 216. Biased estimates, also known as
VAMs, unpredictable characteristics (variables outside of
the control of a teacher or school) of students can bias
estimates about teachers’ contributions. Student
characteristics include: students’ individual motivation,
capability to learn, and levels of academic achievement.
53
Because schools do not randomly assign students to teachers (or teachers to schools), these variables are not controlled in a way that mitigates bias.
54
Biased results are quite possible, especially when
relatively homogeneous sets of students (e.g., English
Language Learners (ELLs), gifted and special education
students, or free-or-reduced lunch eligible students) are
non-randomly concentrated into schools, purposefully
placed into classrooms, or both.
Statistical models, even the most sophisticated, cannot control for such bias.
55
One influential study
illustrated VAM-based bias when it found that a
systematic error, concerning "[t]he systematic over- or under-prediction of criterion performance," are observed when said criterion performance varies for "people belonging to groups differentiated by characteristics not relevant to the criterion performance" being measured. STANDARDS, supra note 41, at 216, 222.
53
See generally Noelle A. Paufler & Audrey Amrein-Beardsley, The Random Assignment of Students into Elementary Classrooms: Implications for Value-Added Analyses and Interpretations, 51 AM. EDUC. RES. J. 328, 328–62 (2014).
54
See, e.g., Charles T. Clotfelter, Helen F. Ladd, & Jacob L.
Vigdor, Teacher-Student Matching and the Assessment of
Teacher Effectiveness, J. HUM. RESOURCES 778, 779–82 (2006)
(noting the various ways teachers are assigned to schools).
Class assignments in schools are historically a function of a
host of factors, including: pressure from parents for particular
class placement and pressure from teachers for placement of
particular students, especially those who may tend to be
considered “high-achieving.” Id. at 781. Additionally,
placement among schools within a district is similarly subject
to other variables, such as housing patterns. Id.
55
See, e.g., Paufler & Amrein-Beardsley, supra note 53, at
335.
student’s 5th grade teacher was a better predictor of a
student’s 4th grade growth than was the student’s 4th
grade teacher.
56
The absurdity of that finding raises
serious questions about the ability of VAMs to control for
bias. Nonetheless, the primary debate in the literature concerns whether statistically controlling for potential bias, by using complex statistical approaches to account for non-random student assignment, makes bias negligible, or rather "strongly ignorable."
57
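A toy simulation can illustrate the non-random-assignment problem described above. In the sketch below, which uses invented numbers rather than any real model or dataset, two teachers contribute identical "true" effects, yet a naive comparison of average gains attributes the difference in their students' baselines to the teachers themselves.

# Toy simulation (all numbers invented): two equally effective teachers, but
# one is assigned students whose expected growth is lower for reasons outside
# the teacher's control. A naive gain-score comparison then "finds" a
# difference between the teachers that is really a difference between students.
import random

def simulate(seed=0, n_students=100, true_teacher_effect=5.0):
    random.seed(seed)
    results = {}
    for teacher, student_baseline_growth in [("Teacher X", 10.0), ("Teacher Y", 2.0)]:
        # Each teacher contributes the same true effect; only the students differ.
        gains = [
            student_baseline_growth + true_teacher_effect + random.gauss(0, 3)
            for _ in range(n_students)
        ]
        results[teacher] = sum(gains) / n_students
    return results

if __name__ == "__main__":
    # Both teachers are identical by construction, yet Teacher X's average
    # gain looks roughly 8 points higher than Teacher Y's.
    print(simulate())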
4. Transparency
Transparency is defined as the extent to which
something is accessible and understandable.
58
In terms of VAMs, this concerns whether VAM-based estimates make sense to those receiving the information. In education, teachers and principals may
not understand the models being used to evaluate their
performance. Because of this, they are generally unlikely
to use the VAM-generated information for formative
purposes (i.e., as a tool to gather information and change
practice as soon as possible).
59
Practitioners often
56
Jesse Rothstein, Student Sorting and Bias in Value-Added Estimation: Selection on Observables and Unobservables, 4 EDUC. FIN. & POL'Y 537, 546–47 (2009); Jesse Rothstein, Teacher Quality in Educational Production, Q.J. ECON. 175, 210 (2010).
57
Sean Reardon & Stephen Raudenbush, Assumptions of
Value-Added Models for Estimating School Effects, 4 EDUC. FIN. & POL'Y 492, 496–97 (2009).
58
STANDARDS, supra note 41.
59
Jonathan M. Eckert & Joan Dabrowski, Should Value-Added Measures Be Used for Performance Pay?, KAPPAN, May 2010, at 88, 89–90; Rachel Gabriel & Jessica Nina Lester, Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform, 21 EDUC. POL'Y ANALYSIS ARCHIVES 1, 1–30 (2013); Ellen Goldring et al., Make Room Value Added: Principals' Human Capital Decisions and the Emergence of
describe value-added data reports as confusing, not
comprehensive in terms of the key concepts and
objectives taught, ambiguous regarding teachers’ efforts
at both the student and composite levels, and often
received months after students leave teachers’
classrooms.
For example, teachers in Houston, Texas,
expressed that they learned little about what they
did effectively or how they might use their value-added
data to improve their instruction.
60
Teachers in North
Carolina reported that they were "weakly to moderately familiar with their value-added data."
61
Tennessee
teachers maintained that there was very limited support
or explanation helping teachers use their value-added
data to improve upon their practice.
62
Quite apart from the statistical concerns noted
above, the “black-box” nature of VAMs raises additional
questions in the field. Indeed, the purported strength of
VAMs is that they will improve instruction by providing
a wealth of positive diagnostic information. The models
are supposed to give practitioners useful, actionable
information. Yet, if practitioners have problems
understanding the models, the value (if you will) of VAMs
is greatly diminished. Unfortunately, the statisticians who developed the models make "no apologies for the
Teacher Observation Data, 44 EDUC. RESEARCHER 96, 96–97 (2015).
60
Clarin Collins, Houston, We Have a Problem: Teachers Find
No Value in the SAS Education Value-Added Assessment
System, 22 EDUC. POL'Y ANALYSIS ARCHIVES 1, 4, 15, 22 (2014).
61
Kim Kappler Hewitt, Educator Evaluation Policy That
Incorporates EVAAS Value-Added Measures: Undermined
Intentions and Exacerbated Inequities, 23 EDUC. POL'Y
ANALYSIS ARCHIVES 1, 11 (2015).
62
See Eckert & Dabrowski, supra note 59, at 90.
fact that [their] methods [are] too complex for most of the
teachers whose jobs depended on them to understand.”
63
5. Fairness
General questions of fairness have been raised
concerning the use of VAMs, especially in the context of
high-stakes employment decisions. Fairness is the
impartiality of “test score interpretations for intended
use(s) for individuals from all relevant subgroups.”
64
But
issues of fairness arise when a test or test use impacts
some more than others in unfair or prejudiced, yet often
consequential ways.
65
Fairness issues are amplified as VAMs are
applied in the field. Indeed, VAMs are generally only
directly applicable to teachers who instruct in areas that
are subjected to standardized tests (typically, math and
reading).
66
States and districts can only produce VAM-
based estimates for approximately 30–40% of all
teachers.
67
The other 60–70%, which sometimes includes
entire campuses of teachers (e.g., early elementary and
high school teachers) or teachers who do not teach the
core subject areas assessed using large-scale
standardized tests (e.g., mathematics and
English/language arts), cannot be evaluated or held
accountable using teacher-level value-added data.
68
Importantly, when districts use this information to make
63
Carey, supra note 1, at 13; see also Gabriel & Lester, supra
note 59, at 20.
64
STANDARDS, supra note 41, at 219 (emphasis added).
65
This concern is consistent with the general argument of this
paper. To wit, courts have sustained objections to the use of
VAMs where they violate procedural due process, the basic
“fundamental fairness.” See Cook v. Bennett, 792 F.3d 1294,
1301 (11th Cir. 2015).
66
E.g., Green et al., supra note 13 (noting that the models only
apply to 30–40% of teachers).
67
Id.; see also Gabriel & Lester, supra note 59, at 7.
68
Green et al., supra note 13, at 15, 27–28.
consequential, high-stakes employment decisions, the
unfairness can have considerable consequences. Some
teachers in certain grades or subject areas experience the
negative or positive consequences of these VAM-based
data more than their colleagues.
69
6. Consequential Use
Assessing the appropriate use of tests must
consider the social and ethical concerns
70
in addition to
more sterile concerns about statistical methodology.
71
The Standards recommend ongoing evaluation of both
the intended and unintended consequences of any test as
an essential part of any test-based system, including
those based upon VAMs.
72
Typically, ongoing evaluation of social and ethical
consequences rests on the shoulders of the governmental
bodies that mandate such test-based policies.
73
In this
case, local and state education departments would be the
agencies in charge of assessing the social costs and
ethical issues associated with the use of VAMs in high-
stakes contexts. This is because they “provide resources
for a continuing program of research and for
dissemination of research findings concerning both the
69
This has formed the basis of substantive due process claims
against school districts. E.g., Cook, 792 F.3d 1294 (agreeing
that the system of Florida that adopted VAM ratings that apply
to all teachers, including those in non-tested subject areas, was
unwise and unfair but upholding it under rational basis test).
70
E.g., Messick, supra note 47, at 8 (noting that "[t]he only form of validity evidence [typically] bypassed or neglected in these traditional formulations is that which bears on the social consequences of test interpretation and use.").
71
See also Kane, supra note 47.
72
STANDARDS, supra note 41.
73
Id.
positive and the negative effects of the testing
program.”
74
However, this rarely occurs. The burden typically
rests on the research community, which must provide evidence about the positive and negative effects and explain these effects to external constituencies, including policymakers. This group must collectively determine whether VAM use, given the consequences and issues identified above, warrants the financial, time, and human resource investments.
75
Local and state departments of
education typically have not (perhaps for political
reasons) acknowledged or sought to examine the
consequences of their policy actions.
7. Intended Consequences
As noted, the primary intended consequence of
VAM use is to improve teaching and help teachers (and
schools/districts) become better at educating students by
measuring and then holding teachers accountable for
their effects on students. The stronger the consequences, the stronger the motivation, and thus the stronger the intended effects. Secondary intended consequences include
74
Position Statement on High-Stakes Testing in Pre-K–12 Education, AM. EDUC. RES. ASS'N (2000), http://www.aera.
net/About-AERA/AERA-Rules-Policies/Association-Policies/
Position-Statement-on-High-Stakes-Testing [https://perma.cc/
969R-8RMR]; see also STANDARDS, supra note 41.
75
Arguably, some “reformers” assume that their ideas are
inviolable and opposition is simply a reflection of a recalcitrant
system, at best, or teachers’ unions at worst. See e.g., Michelle
Rhee, Opting Out of Standardized Tests? Wrong Answer,
WASH. POST (Apr. 4, 2014) https://www.washingtonpost.com/
opinions/michelle-rhee-opting-out-of-standardized-tests-wrong-
answer/2014/04/04/37a6e6a8-b8f9-11e3-96ae-f2c36d2b1245_
story.html [https://perma.cc/JD5L-6APK] (suggesting that an
organization she founded always keeps students’ interests first
and also implying that teachers’ unions do not, especially in
regards to the use of standardized tests).
replacing the nation's antiquated teacher evaluation systems, which have been criticized by all corners of the education research community.
76
Yet, in practice, research evidence supporting
whether VAM use has led to these intended consequences
is suspect. Indeed, numerous studies have noted that
there is a lack of evidence linking VAMs to improved
teacher quality. First, VAM estimates have not produced
useable information for teachers about how teachers,
schools, and states might improve upon their instruction,
or how all involved might collectively improve student
learning and achievement over time.
77
Likewise, recent
evidence suggests the use of VAMs has not led to
improvements in teacher evaluation systems.
78
In sum,
strong evidence suggests that VAMs have not promoted
the intended benefits of providing actionable information
for teachers to improve instruction or teacher evaluation
systems.
8. Unintended Consequences
Simultaneously, ethical and research standards
require that the use of testing data must also recognize
VAMs’ unintended consequences.
79
Policymakers must
present evidence on whether VAMs cause unintended
effects and if those effects outweigh their intended
impact. This means that the educative goals at issue (e.g.,
increased student learning and achievement) should be
76
See, e.g., DANIEL WEISBERG ET AL., THE WIDGET EFFECT
(2009) (criticizing the evaluation models that treat teachers as
“widgets” and fail to recognize their differences and value).
77
Henry Braun, The Value in Value-Added Depends on the
Ecology, 44 EDUC. RES. 2 (2015); Corcoran, supra note 12.
78
Matthew A. Kraft & Allison Gilmour, Revisiting the Widget
Effect: Teacher Evaluation Reforms and the Distribution of
Teacher Effectiveness, 46 EDUC. RES. 234–49 (2017).
79
See AM. EDUC. RES. ASS'N, supra note 74; STANDARDS, supra
note 41.
examined alongside the positive and negative
implications for both the science and ethics of using
VAMs in practice.
80
Researchers have produced an exhaustive list of
these unintended consequences.
81
First, the use of VAMs
leads to teacher isolation whereby teachers “literally or
figuratively ‘close their classroom door’ and revert to
working alone.”
82
Sadly, teacher isolation is at cross-
purposes with collaboration among colleagues, something that is an essential part of improving schools.
83
Second, the use of high-stakes testing causes
teachers to leave the profession and avoid high-needs
schools that most need the best teachers.
84
Because of the
very nature of VAM-based teacher evaluation which
rewards testing achievement, teachers avoid teaching
high-needs students. This is rational: if they perceive
themselves to be at greater risk of teaching students who
may be more likely to hinder their value-added
85
they
“seek safer [grade level, subject area, classroom, or
school] assignments, where they can avoid the risk of low
VAMS scores.”
86
Of course, the flip side of this is that teachers avoid challenging assignments or leave the profession altogether.
87
Third, and most troubling perhaps, is the
dehumanization that high-stakes testing causes. Indeed,
under such regimes, teachers view and react to students
as “potential score increasers or score compressors,” not
children.
88
80
Messick, supra note 47.
81
See, e.g., Susan Moore Johnson, Will VAMS Reinforce the
Walls of the Egg-Crate School?, 44 EDUC. RES. 117–26 (2015).
82
Id. at 120.
83
Id.
84
Id.
85
Id.
86
Id.
87
Id.
88
Hewitt, supra note 61, at 32.
III. The Cases
This section discusses cases where the central
issue was the role VAMs played in adverse employment
actions. It first traces those cases related to arguments
grounded in the substantive Due Process and Equal
Protection clauses of the U.S. Constitution. It then
highlights the series of cases where plaintiffs challenged
the use of VAMs on jurisdictional grounds (i.e., that a particular government agency exceeded its authority or contravened other statutes in requiring the use of VAMs). The final
subsection assesses the cases where process arguments
have been advanced by the plaintiffs.
A. Federal Substantive Due Process Rights &
Equal Protection Arguments: VAMs May Be
Unwise But Still Constitutional
1. Cook v. Bennett
In 2015, a group of teachers challenged Florida’s
use of student test scores to evaluate their job
performance.
89
As part of that state’s application for Race
to the Top funds, the state legislature enacted a new teacher performance evaluation regime in its teacher evaluation law.
90
Specifically, the legislature
required that at least 50% of a teacher’s performance
evaluation be based on student growth on state
standardized tests in math and English (the Florida
Comprehensive Assessment Test, or FCAT).
91
The
remaining portion of the teacher’s evaluation was
89
Cook v. Bennett, 792 F.3d 1294 (11th Cir. 2015).
90
FLA. STAT. ANN. § 1012.34 (West 2011).
91
Id. A teacher’s final evaluation was based on the student test
growth (the VAM rating) on the FCAT (50%) and a VAM rating
based on the school’s contribution to a student’s growth. Cook,
792 F.3d at 1297.
calculated based on a school-wide VAM rating.
92
Not all
students took the math and English tests. In fact,
students took the English FCAT exam in grades 3
through 10 and the mathematics FCAT exam in grades 3
through 8.
Under the evaluation law, Florida teachers fell into one of three categories.
93
“Type A” teachers
were those that taught the tested subjects (math and
English) in the years that the FCAT was administered
for those subjects. In effect, as the Eleventh Circuit Court
of Appeals noted, the model adopted by the state
education commissioner only worked as designed in
evaluating teachers of English in grades 4 through 10
and math in grades 4 through 8.
94
The rest of Florida’s
public school teachers fell into two groups. “Type B”
teachers taught students in grades 4 through 10, but in
subjects other than English or math.
95
“Type C” teachers
taught students in grades below 4 or above 10 or their
students did not take standardized tests (e.g., art).
96
The thrust of the legal problem, according to the
teachers challenging the evaluation scheme, related to
the evaluation of Type B and C teachers. As a practical
matter, school districts evaluated Type B teachers using
student FCAT scores for math and English,
notwithstanding the fact that those teachers did not
instruct the students in those subjects.
97
Type C teachers’
VAM scores were calculated based on school-wide FCAT
scores derived from student scores in subjects they did
not teach.
98
Under this scenario, for example, a second
92
Id.
93
The district court designated the classification set forth in
this discussion and, for ease of reference, the appeals court
adopted it in its analysis.
94
Cook, 792 F.3d at 1297.
95
Id.
96
Id.
97
Id.
98
Id. at 1298.
grade art teacher’s VAM rating could be calculated based
on a 3rd grade student’s math and English test growth.
The plaintiff-teachers argued that the evaluation
laws violated the Substantive Due Process and Equal
Protection clauses of the Fourteenth Amendment.
99
Because no fundamental right was at issue, the court
applied the rational basis test to determine whether the
government’s actions had a legitimate purpose and
whether the chosen methods were rationally related to
that purpose.
100
Ultimately, the court sided with the
government, finding that there was a legitimate interest
which was to “increas[e] student academic performance
by improving the quality of instructional, administrative,
and supervisory services in the public schools of the
state.”
101
The court also concluded that there was a rational
relationship between this purpose and the use of the
FCAT VAMs.
102
The court concluded—and the plaintiffs
conceded at oral argument—that the government “could
have reasonably believed that (1) a teacher can improve
student performance through his or her presence in a
99
U.S. CONST. amend. XIV provides, in relevant part, that: "No state shall . . . deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws."
100
Cook, 792 F.3d at 1300 (citing Fresenius Med. Care
Holdings, Inc. v. Tucker, 704 F.3d 935, 945 (11th Cir. 2013);
FCC v. Beach Comm’ns, Inc., 508 U.S. 307, 314 n.6 (1993)).
101
Id. at 1301 (citing FLA. STAT. § 1012.34(1)(a) (2013)); see also
Houston Fed’n of Teachers, Local 2415 v. Houston Indep. Sch.
Dist., 251 F. Supp. 3d 1168, 1182 (S.D. Tex. 2017) (concluding
that plaintiff’s substantive due process claims failed because
“[e]ven accepting plaintiffs’ criticisms at face value, the loose
constitutional standard of rationality allows governments to
use blunt tools which may produce only marginal results.”).
The Houston court, however, ruled that the plaintiff’s
allegations of procedural due process violations survived
summary judgment dismissal. Id. at 1183.
102
Cook, 792 F.3d at 1301.
school and (2) the FCAT VAM can measure those school-
wide performance improvements, even if the model was
not designed to do so.”
103
To be sure, both the appellate
and district courts criticized the chosen model.
104
The court similarly applied the rational basis
review to dismiss the equal protection claims.
105
Under
this claim, the teachers argued that the evaluation law
created a separate class of teachers: “those whose
evaluations are based on student growth data for
students assigned to the teacher in the subjects taught
by the teacher, and those whose evaluations are based on
student growth data for students and/or subjects they do
not teach.”
106
However, because this classification did not
implicate a suspect class (e.g., race, gender) rational basis
applied and, under the same line of reasoning of the
substantive due process claim, the equal protection claim
was dismissed.
107
103
Id.
104
Id. at 1301 (noting that "[w]hile the FCAT VAM may not be the best method—or may even be a poor one—for achieving this goal, it is still rational to think that the challenged evaluation procedures would advance the government's stated purpose."). The district court, in finding for the government, concluded, in dicta, that "[t]he unfairness of the evaluation system as implemented is not lost on this Court" and that "this Court would be hard-pressed to find anyone who would find this evaluation system fair to non-FCAT teachers, let alone be willing to submit to a similar evaluation system." Cook v. Stewart, 28 F. Supp. 3d 1207, 1215–16 (N.D. Fla. 2014), aff'd
sub nom. Cook v. Bennett, 792 F.3d 1294 (11th Cir. 2015).
105
Cook, 792 F.3d at 1301.
106
Stewart, 28 F. Supp. 3d at 1213.
107
Cook, 792 F.3d at 1301 (citing City of Cleburne v. Cleburne
Living Ctr., 473 U.S. 432, 440 (1985) (internal citations
omitted)).
2. Trout v. Knox County Board of Education
Plaintiff teachers in Trout v. Knox County Board of Education brought substantive and procedural due process claims based on evaluations that used VAMs.
108
In Trout,
the teachers challenged the use of Tennessee’s VAM
rating (the EVAAS). Specifically, two teachers (one a
math teacher and the other a science teacher) were
denied bonuses based on their VAM rating.
109
Both teachers involved (Trout and Taylor) argued that the use of the VAMs was
arbitrary and capricious and, therefore, could not be
sustained under the rational basis test. Echoing
criticisms of the reliability and validity of VAMs,
110
the
plaintiffs argued that the VAMs were too imprecise to be
used to assess their effectiveness
111
and therefore
violated substantive due process rights.
The federal district court ruled in favor of the
government. It began its analysis by noting that the
plaintiffs failed to state a substantive due process
claim.
112
By way of background, a substantive due
process claim requires that there be some property
interest at stake. Here, under the property-interest analysis of the Sixth Circuit Court of Appeals, the court concluded that the plaintiffs did not have a protected interest in the bonuses.
113
For the sake of argument, however, the court went on to apply the rational basis test and found that the government's use of the VAMs in this case satisfied that
108
Trout v. Knox Cty. Bd. of Educ., 163 F. Supp. 3d 492, 494
(E.D. Tenn. 2016).
109
Id.
110
See supra Part I.
111
Trout, 163 F. Supp. 3d at 500.
112
Id.
113
Id. at 501.
test.
114
The parties did not dispute that using VAMs to identify and support instruction, with the goal of increasing student achievement, is a legitimate government interest.
115
The plaintiffs, as in Cook v. Bennett,
116
nonetheless argued that various statistical infirmities made reliance on VAMs irrational.
117
In rejecting these arguments, the district court noted, among other things, that no legal authority required it to impose any particular standard for the confidence level of a statistical test.
118
To be sure, the Trout court was sympathetic to the plaintiffs' complaints regarding the statistical inadequacy of the VAMs.
119
Yet, at bottom, no legal authority required the court to demand any particular level of statistical confidence in the government's chosen method of measuring teacher effectiveness.
120
114
Id.
115
Id. at 503.
116
Cook, 792 F.3d at 1297.
117
For example, the plaintiffs took issue with the confidence level of the statistical test (68%). Trout, 163 F. Supp. 3d at 503.
118
Id.
119
Id. at 504 (writing that "the Court notes that Plaintiffs' criticisms of the statistical methods of TVAAS are not unfounded").
120
Id. at 504–05. The court wrote that while "[p]laintiffs bemoan the statistical imprecision of TVAAS," no legal authority "support[s] the proposition that the United States Constitution requires legislative decision making regarding the use of statistics to require 'statistically significant' results. Absent controlling authority to the contrary, this Court refuses to extend the rational basis test this far – where no suspect class or fundamental right is at issue, the Constitution requires a rational basis, not a statistically significant basis, for the law in question." Id.
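To make the statistical objection concrete: for a normally distributed estimate, a 68% confidence interval spans roughly one standard error on either side of the point estimate. The brief Python sketch below is offered only as an illustration with invented numbers (they are not figures drawn from the Trout record or from TVAAS itself); it shows how an estimate that appears "above average" within a 68% band can be statistically indistinguishable from average growth at the more conventional 95% level.

# A minimal, hypothetical sketch of the confidence-level point raised in Trout.
# The effect estimate and standard error below are invented for illustration;
# they are not drawn from TVAAS or from the record in the case.
from scipy.stats import norm

def interval(estimate, std_error, confidence):
    """Two-sided confidence interval for a normally distributed estimate."""
    z = norm.ppf(0.5 + confidence / 2)  # roughly 1.0 for 68%, 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical teacher "growth" estimate of 2.0 points with a standard error of 1.5.
est, se = 2.0, 1.5

for level in (0.68, 0.95):
    lo, hi = interval(est, se, level)
    distinguishable = not (lo <= 0 <= hi)  # can "average" (zero) growth be ruled out?
    print(f"{round(level * 100)}% interval: ({lo:.2f}, {hi:.2f}); "
          f"distinguishable from average: {distinguishable}")

# At 68% (about one standard error) the interval excludes zero, so the teacher
# appears "above average"; at 95% the same estimate cannot be distinguished
# from average growth, which is the imprecision the plaintiffs emphasized.

The point of the sketch is not that any particular confidence level is constitutionally required (the Trout court held precisely the opposite), but that the choice of level materially affects how many teachers are labeled above or below average.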
3. Wagner v. Haslam
Another set of teachers in Tennessee challenged the use of VAMs in Wagner v. Haslam.
121
Pursuant to state and district evaluation policies, teachers of non-tested subjects were evaluated based on school-wide data on student performance in tested subjects.
122
As in Cook v. Bennett, the teachers claimed that this practice violated the substantive due process and equal protection guarantees of the U.S. Constitution.
123
The federal court, however, echoing the decisions
of other federal courts assessing similar claims, rejected
the teachers’ arguments. With respect to the substantive
due process claim, the court enumerated several reasons
why the policies at issue passed constitutional muster. It
noted that “the State Board could rationally believe that
a school-wide score provides some measure (albeit a crude
one) of evaluating an individual teacher’s
performance.”
124
The court added that the legislature had continued to amend its teacher evaluation laws to address some of the concerns raised by the plaintiffs.
125
While the Wagner court concluded that the use of VAMs was constitutional, it expressed concerns over fairness similar to those found in Cook and Trout. The Wagner court wrote that although the current evaluation processes may produce "unfair results" for certain teachers, that unfairness did not rise to the level of irrationality.
126
At the same time, the court was explicit about its exercise of judicial restraint, especially with respect to education policy questions. Subject to limited
121
112 F. Supp. 3d 673 (M.D. Tenn. 2015).
122
Id.
123
See Cook, 792 F.3d at 1297.
124
Wagner, 112 F. Supp. 3d at 694 (emphasis added).
125
Id.
126
Id. at 695.
exceptions,
127
the states have “unfettered”
128
discretion to
regulate education, and state legislators can make both
“excellent decisions and terrible decisions,” so long as
there is some “modicum of rationality.”
129
Put another
way, a court may disagree with the policy choice of a
governing body, but it is not the role of the courts to
second-guess policy judgments of elected officials.
130
4. Matter of Lederman v. King
The one extant case in which plaintiffs succeeded in demonstrating that the government's use of VAMs met the high bar of arbitrary and capricious action is Matter of Lederman v. King.
131
In this case, a well-regarded veteran teacher who had previously received positive evaluations was rated "ineffective" under New York's new evaluation system,
132
which required the use of VAMs. The teacher, Sheri Lederman, submitted "overwhelming and ample evidence from experts in the field" that the court concluded satisfied her burden on the record before it.
133
In contrast, the court noted that state defendants
left numerous statistical issues unaddressed, including
the potential VAM biases against teachers with high-
127
Some exceptions, of course, would include the use of race to
segregate schools. See generally Brown v. Bd. of Educ., 347 U.S. 483 (1954).
128
Wagner, 112 F. Supp. 3d at 692.
129
Id. at 693.
130
But see PAIGE, supra note 9 (arguing that, for scholars of educational policy, the appropriate question is determining which institutions – courts, legislatures, or markets – have the capacity to best address a particular policy need in education, like teacher evaluation).
131
Lederman v. King, 54 Misc. 3d 886 (N.Y. Sup. Ct. 2016).
132
Id. at 888.
133
Id. at 897–98.
performing students.
134
Critically, the defendants also failed to explain how Ms. Lederman's score could swing so wildly, from the second-highest rating of "effective" to the lowest rating of "ineffective" in a single year, even though her students' scores were statistically similar from one year to the next.
135
In sum, the court was constrained to the record before it and, on that evidence, found that Ms. Lederman had satisfied her burden.
136
B. Legislative and State Agency Authority Questioned
Litigants have also challenged the use of VAMs in teacher evaluation on jurisdictional grounds. In these cases, organizations (typically unions) have argued that a legislative body or executive agency exceeded its authority in requiring VAMs for purposes of evaluation or high-stakes employment decisions. These cases are discussed below.
1. Leff v. Clark County School District
At issue in Leff v. Clark County School District
was the constitutionality of changes made to state laws
governing teacher evaluation and post-probationary (or
continuing contract) status.
137
By way of background, up
until 2011, a teacher who completed a probationary
period of employment (three years) and was subsequently
rehired by a school district received post-probationary
status.
138
Post-probationary status conferred on a teacher certain procedural protections in the event of termination and required that termination be "for
134
Id.
135
Id.
136
Id. at 898.
137
Leff v. Clark Cnty. Sch. Dist., 210 F. Supp. 3d 1242, 1244–45 (D. Nev. 2016).
138
Id. at 1245.
cause.”
139
In contrast, probationary teachers could be
non-renewed without cause and did not have similar
procedural protections.
In 2011, the Nevada legislature changed its teacher evaluation and post-probationary statutes. In particular, it required that VAMs be used as part of teacher evaluations. The legislature also provided that a post-probationary teacher who received two negative evaluations would revert to probationary status.
140
Put another way, under the amended statutes a teacher could lose those protections (e.g., the guarantee that termination be only for "cause").
Teachers contested the changes based on the
federal Constitution’s Contracts Clause.
141
That clause, in relevant part, reads as follows: "No State shall . . . pass any . . . Law impairing the Obligation of Contracts[.]"
142
In essence, the post-probationary teachers claimed that they had a binding contract with the state once they achieved post-probationary status: in exchange for satisfactory performance, the state had agreed to provide procedural protections and to terminate them only for cause. By passing the 2011 amendment that tied teacher contract status to evaluations incorporating VAMs, the state breached that contract, something not permitted under the U.S. Constitution.
The federal court declined to adopt the teachers’
position and held that the statute prior to 2011 did not
create a contractual obligation between the state and
teachers. In its analysis, the court determined that there
is a strong presumption in law against the idea that a
139
Id.
140
Id.
141
Id. at 1244.
142
U.S. CONST. art. I, § 10.
legislative action creates a private contract.
143
Absent any expression by the legislature that it intended to create a contract, it is generally assumed that ordinary legislative activity simply reflects a policy determination that can be changed.
144
Accordingly, the teachers' claim that the state legislature exceeded its authority through the statutory amendments failed.
2. Stapleton v. Skandera
In Stapleton v. Skandera, teachers challenged the
use of VAMs in teacher evaluation on several
jurisdictional grounds related to statutory and agency
authority.
145
By way of brief background, the New Mexico legislature attempted, but failed, to make several amendments to its existing teacher evaluation laws in 2012. Notwithstanding this, the Secretary of the New Mexico Public Education Department (through the Department) promulgated new regulations governing the evaluation of teachers.
146
The teachers sought judicial relief, asking the court to suspend the use of the regulations.
147
The teachers argued that the Secretary exceeded her authority – that, in effect, she acted in a legislative capacity. They raised particular objection to the incorporation of VAMs in teacher evaluation, arguing that such a move could be accomplished only by legislative action because it represented a shift in public policy within exclusive legislative purview.
148
However, the New
Mexico Court of Appeals sided with the Department on
143
Leff, 210 F. Supp. 3d at 1246–47 (citing Nat'l R.R. Passenger Corp. v. Atchison, Topeka & Santa Fe Ry. Co., 470 U.S. 451, 465–66).
144
Id.
145
Stapleton v. Skandera, 346 P.3d 1191, 1194 (N.M. App.
2015).
146
Id. at 1193 (citing N.M. CODE R. § 6.69.8).
147
Id.
148
Id. at 1194.
this issue. It noted that the enabling statute required
only that the Department enact evaluation regulations
that were “uniform statewide” and “highly objective.”
149
Accordingly, the legislature left the Secretary with "broad authority" to enact regulations reflecting these requirements, and, in the view of the court, including VAMs in the teacher evaluation protocol did not exceed that authority.
150
The teachers in Stapleton raised two additional objections related to agency authority. First, they contended that the new departmental regulations permitted "assistant principals" to observe teachers, which violated the state evaluation law that gave such authority only to "principals."
151
Second, they argued that the provisions in the regulations exempting charter schools from coverage violated the state law requirement that the Department enact a system of "uniform" evaluation.
152
The court of appeals rejected both arguments. With respect to the first claim (that only
principals could observe teachers), the court read the
state statute as allowing others to observe teachers,
including assistant principals. The court wrote, “We
agree with the district court that the regulation does not
necessarily conflict with the statute because the statute
‘mandates the participation of school principals [but]
does not limit the persons who may [also] observe
[teachers].’”
153
Regarding the claim that the regulations
inappropriately exempted charter schools, the state court
of appeals noted that the state Charter School Act
specifically allowed the Department to waive certain
149
Id. at 1195 (citing N.M. STAT. ANN. § 22-10A-19(A) (1978)).
150
Id.
151
Id. at 1196.
152
Id.
153
Id. (alterations in original).
regulations normally applicable to public schools.
154
Because the teachers could not cite any other legal authority suggesting the waiver was not permitted under the Charter School Act, this theory was also rejected.
155
3. Louisiana Federation of Teachers v. State
In Louisiana Federation of Teachers v. State, a teachers' union challenged Louisiana's enactment, amendment, and repeal of multiple state laws related to public education, including those governing teacher evaluation requirements.
156
During the 2012 legislative sessions, the state legislature amended and re-enacted nine statutes, enacted two new statutes, and repealed twenty-eight statutes, all related to education.
157
The plaintiffs alleged that these actions, which all
occurred through one legislative act, violated the state
constitution’s “single object” requirement.
158
That requirement stipulates that the legislature enact bills that have "one object" and that the various pieces of a bill bear a relationship to one another.
159
The teachers argued that the bill contained unrelated subjects, such as changes to teacher evaluation, reduction-in-force issues, and rules governing contracts with superintendents.
160
The Louisiana Supreme Court rejected the plaintiffs' arguments.
161
The court began its assessment by noting
154
Id.
155
Id. at 1196–97.
156
La. Fed’n of Teachers v. State, 171 So. 3d 835, 841 (La.
2014).
157
Id.
158
Id. at 838.
159
Id. at 841.
160
Id. at 842.
161
Id. at 851.
that there is a general presumption that a legislature’s
acts satisfy the “one object” rule.
162
It also observed that the purpose of the rule is to prevent "logrolling," the practice of packaging many measures into one bill because none of those measures, alone, would pass the legislature.
163
Under such a "grave and palpable" scenario, the legislature would violate the single object rule.
164
Yet, in this case, the court concluded that the object of the act at issue "is improving elementary and secondary education through tenure reform and performance standards based on effectiveness."
165
In the court's view, the various components of the bill could be broadly related to this objective.
166
4. Robinson v. Stewart
Another Florida case, Robinson v. Stewart,
167
also
involved a challenge to the authority of the state Board
of Education to implement teacher evaluation
regulations using VAMs.
168
In Robinson, the plaintiffs sought to have the 2011 Student Success Act declared unconstitutional on the grounds that it impermissibly delegated legislative control over public education to the executive branch.
169
The act revised teacher evaluation procedures and required the use of "student learning growth" measures (or VAMs) to evaluate teachers and make significant employment decisions, such as tenure.
170
The act left it to the Department of Education
162
Id. at 845.
163
Id. at 845–46.
164
Id. at 851.
165
Id. at 850.
166
Id.
167
161 So. 3d 589 (Fla. Dist. Ct. App. 2015).
168
Id. at 590–91.
169
Id.
170
Id. at 591.
Commissioner (the executive branch) to develop the
formula to achieve these goals
171
and required the use of
standardized test scores.
172
The Florida District Court of Appeal rejected the plaintiffs' argument that the legislature, in requiring the Commissioner to develop the formula, violated the non-delegation doctrine of the state constitution that ensures a separation of powers.
173
The court noted that the plaintiffs carried a high burden of proof: they had to show that the legislature's action violated the doctrine "beyond a reasonable doubt," the highest standard of proof under the law.
174
The court further interpreted the act as simply
requiring the Commissioner to provide technical
implementation support, as opposed to allowing the
executive to make policy determinations.
175
5. Filed but not Adjudicated
Another case deserves some attention, as it also involved a claim that a state agency exceeded its authority by incorporating VAMs into teacher evaluation. In Texas Teachers Association v. Texas Education Agency, the Texas Education Agency adopted teacher
171
Id.
172
Id. at 592.
173
Id. at 590–91.
174
Id. at 591.
175
Id. at 592. But see id. at 597 (Benton, J., dissenting) (noting
that the legislature “has conferred on the State Board of
Education power to designate some of them – perhaps nearly all of them – professionally 'unsatisfactory,' and therefore,
among other things, subject to being laid off, for reasons that
are so unclear and indefinite that the Legislature has
abandoned its responsibility to set public policy in this
important area, and delegated legislative authority it should
have exercised itself to the State Board of Education, an
executive branch agency.”)
evaluation regulations requiring the use of VAMs.
176
Numerous plaintiffs, including teachers' unions, sought to enjoin the use of VAMs on the grounds that the regulations exceeded the power vested in the agency.
177
The case settled, and the state ultimately agreed to eliminate the required use of VAMs in its teacher evaluation regulations.
178
In New Mexico ex rel. Stewart v. New Mexico Public Education Department, a group of plaintiffs consisting of legislators, unions, and teachers filed a complaint alleging that the state Public Education Department violated other state laws when it promulgated its teacher evaluation regulations.
179
Plaintiffs argued that the School Personnel Act provides for the processes associated with teacher evaluation and termination.
180
Similarly, plaintiffs allege that the Department’s
regulation conflicts with New Mexico’s Public
176
Sean Collins Walsh, Union Sues to Block Texas Teacher
Evaluation Change, AUSTIN AM.-STATESMAN (Aug. 13, 2016),
https://www.statesman.com/news/20160813/union-sues-to-
block-texas-teacher-evaluation-change [https://perma.cc/MQ
C2-FATW].
177
Id.
178
Melissa B. Taboada, Lawsuit Settled: Texas Teacher
Appraisals Won’t Be Tied to STAAR Scores, AUSTIN AM.-
STATESMAN (last updated Sept. 25, 2018),
https://www.statesman.com/news/20170504/lawsuit-settled-
texas-teacher-appraisals-wont-be-tied-to-staar-scores
[https://perma.cc/XP3C-H2WB].
179
Complaint, State ex rel Stewart v. N.M. Pub. Educ. Dep’t,
No. D-101-CV-2015-00409 (N.M. 1st Jud. Dist. Feb. 13, 2015),
https://www.aft.org/sites/default/files/nm-complaint-
teacherevals_1114.pdf [https://perma.cc/7T99-FG89]. The
plaintiffs also claim substantive and procedural due process
violations.
180
See, e.g., N.M. STAT. ANN. § 22-10A-19(D) (2010) (providing
that evaluations should be determined in part by how well
professional development was carried out).
Employee Bargaining Act (the state's enabling collective bargaining statute), which governs "the terms and conditions of employment."
181
More specifically, that
law provides that local school districts must negotiate
terms and conditions of employment with the
representative union.
182
The case is pending with various
motions before the court.
183
C. Process & “Fundamental Fairness” Cases
1. Houston Federation of Teachers
A group of Houston teachers sought declaratory
and injunctive relief in the case of Houston Federation of
Teachers v. Houston Independent School District.
184
At issue for the court were the constitutional protections afforded to teachers where the Houston public school district used VAMs to rate its teachers and make employment decisions about them.
185
The Houston Independent School District (HISD) had contracted with a third-party vendor that had created certain algorithms to classify and rate teachers based on their students' test performance.
186
This third-party vendor, citing trade secrecy, refused to reveal the algorithms when the teachers requested them for review.
187
Therefore,
teachers who faced adverse employment consequences
181
Complaint at 31, Stewart, No. D-101-CV-2015-00409.
182
See generally N.M. STAT. ANN. § 10-7E-17 (New Mexico’s
Public Employment Labor Relations Statute).
183
See Motion for Summary Judgment Filed in New Mexico
Teacher Evaluation Lawsuit, (Feb. 13, 2018), http://www.
krwg.org/post/motion-summary-judgment-filed-new-mexico-
teacher-evaluation-lawsuit [https://perma.cc/R8CU-DYHN].
184
Houston Fed’n of Teachers, Local 2415 v. Houston Indep.
Sch. Dist., 251 F. Supp. 3d 1168, 1174 (S.D. Tex. 2017).
185
Id. at 1171.
186
Id.
187
Id. at 1172.
could not review the underlying formulas that
contributed to these decisions.
188
The teachers claimed that the use of the value-added models violated their substantive and procedural due process rights under the Constitution.
189
Echoing the reasoning of Cook v. Bennett and other cases, the federal district court ruled that the district's use of VAMs did not amount to a substantive due process violation.
190
The court concluded the
following: “Even accepting plaintiffs’ criticisms at face
value, the loose constitutional standard of rationality
allows governments to use blunt tools which may produce
only marginal results. HISD’s motion for summary
judgment on this substantive due process claim is
granted.”
191
Yet the court allowed the plaintiffs' procedural due process claims to survive summary judgment.
192
The court's analysis is instructive because it framed procedural due process as ensuring fundamental fairness.
193
The court
wrote:
“[The] purpose of procedural due process is to
convey to the individual a feeling that the
government has dealt with him fairly, as well as
to minimize the risk of mistaken deprivations of
protected interests.” [] In short, due process is
designed to foster government decision-making
that is both fair and accurate.
194
188
Id. at 1172–73.
189
Id.
190
Id. at 1181–82.
191
Id. at 1182.
192
Id. at 1180.
193
Id.
194
Id. at 1176 (alteration in original) (quoting Carey v. Piphus,
435 U.S. 247, 262 (1978)).
The court then listed the factors required for procedural
due process to be satisfied in the case of a teacher
termination in Texas.
195
Of particular note was that a
teacher facing termination must “be advised of the cause
for his termination in sufficient detail so as to enable him
to show any error that may exist.”
196
Teachers contended, and the court agreed, that they were not being afforded due process protections because the school district violated the requirement to give a teacher "sufficient detail" to show any error in the government's decision.
197
Because
the district’s third party vendor would not release the
underlying formulas, teachers could not possibly assess
the accuracy of the district’s value-added rating.
198
The court listed numerous potential errors that
could be revealed if inspection of the formulas was
permitted.
199
As the court stated: "The [] score might be erroneously calculated for any number of reasons, ranging from data-entry mistakes to glitches in the computer code itself. . . . HISD has acknowledged that mistakes can occur in calculating a teacher's EVAAS score . . . ."
200
The court was troubled by the district’s
stipulation that it could not correct a single teacher’s
score, even if an error was found, because correcting one
score would alter the results of all other teachers.
201
195
Id.
196
Id. The court also noted that a teacher facing termination
must be afforded: “the names and testimony of the witnesses
against him; [] a meaningful opportunity to be heard in his own
defense within a reasonable time; [] and a hearing before a
tribunal that possesses some academic expertise and an
apparent impartiality toward the charges.” Id. (citing Ferguson
v. Thomas, 430 F.2d 852, 856 (5th Cir. 1970)).
197
Id. at 1176–77 (citing Levitt v. Univ. of Tex. at El Paso, 759
F.2d 1224, 1228 (5th Cir. 1985)).
198
Id.
199
Id. at 1177.
200
Id.
201
Id. at 1178.
Indeed, it is worth recalling that value-added scores are comparative in nature, assessing one teacher against others.
202
This means that, if one teacher’s score
is adjusted for an error, it alters all others.
203
The court
characterized the underlying foundation of the VAM
ratings as built upon a “house of cards.”
204
Accordingly, it
denied the school district’s summary judgment claim
with respect to procedural due process.
205
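The practical consequence can be illustrated with a short, hypothetical Python sketch. The toy "comparative score" below (each teacher's average student gain converted to a z-score against all teachers) is a simplified stand-in, not the proprietary EVAAS model at issue in the case, and the teacher labels and gain figures are invented.

# Minimal, hypothetical sketch of why correcting one teacher's data can shift
# every teacher's rating under a comparative (normed) scoring scheme. This is a
# toy stand-in, not the proprietary EVAAS model withheld in the Houston case.
import statistics

def comparative_scores(avg_gains):
    """Convert each teacher's average student gain into a z-score against all teachers."""
    mean = statistics.mean(avg_gains.values())
    sd = statistics.pstdev(avg_gains.values())
    return {teacher: (gain - mean) / sd for teacher, gain in avg_gains.items()}

# Hypothetical average student test-score gains, with a data-entry error for teacher A.
gains_with_error = {"A": -4.0, "B": 2.0, "C": 3.0, "D": 5.0}
gains_corrected = {"A": 4.0, "B": 2.0, "C": 3.0, "D": 5.0}  # only A's value changes

before = comparative_scores(gains_with_error)
after = comparative_scores(gains_corrected)

for teacher in before:
    print(f"Teacher {teacher}: z = {before[teacher]:+.2f} -> {after[teacher]:+.2f}")

# Although only teacher A's data were corrected, the group mean and spread change,
# so teachers B, C, and D also receive different scores -- the "house of cards"
# problem the court described.

Because each score is computed relative to the group's mean and spread, correcting even one teacher's inputs shifts the baseline against which every other teacher is measured, which is why the district stipulated that it could not correct a single score in isolation.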
2. Washington Teachers’ Union v. D.C. Public
Schools
Collective bargaining has been another forum in which teachers have successfully challenged the use of VAMs in teacher evaluations. By way of background, collective bargaining agreements (CBAs) provide for a process, grievance arbitration, to redress violations of the contract. This arbitration process can be important, especially when a contract specifies how teacher evaluations must be conducted. Indeed, districts' decisions to non-renew or terminate a teacher for performance have been called into question where a district failed to follow contractually mandated processes.
206
With some limited
202
Id. at 1172.
203
Id. at 1177.
204
Id. at 1178.
205
Id. at 1180. To be sure, procedural due process claims made
in Wagner v. Haslam, see supra notes 121–129 and
accompanying discussion, did not survive. However, at issue in
that case was whether the teachers' bonuses could be linked to
their VAM scores. Wagner v. Haslam, 112 F. Supp. 3d 673, 688
(M.D. Tenn. 2015). In that context, the court concluded that
bonuses were not a property interest sufficient to trigger due
process protections. Id. at 698.
206
See, e.g., Dennis Yarmouth Teachers v. Dennis Yarmouth Reg'l Sch. Dist., 360 N.E.3d 883, 884–85 (1977) (reversing a school district's decision to non-renew a probationary teacher
exceptions, scholarship has omitted consideration of the
value and importance of collective bargaining
agreements in relation to legal challenges to the use of
VAMs in teacher evaluations.
207
Cases emerging from Washington, D.C., illustrate this theme. There, a teachers' union grieved the public school district's VAM-based performance ratings of hundreds of teachers. As an initial matter, the school district challenged whether the issue could, in fact, be subject to the grievance arbitration procedures in the contract. Indeed, as a general matter, disputes are subject to the grievance process only if both parties agreed to arbitrate the dispute under the CBA.
208
In Washington Teachers' Union, a lower court had concluded that the district's final evaluation decisions made under the evaluation system were not arbitrable, but that the district's compliance with the evaluation procedures in the collective bargaining agreement was, in fact, arbitrable.
209
Put another way, the parties did not, under the CBA, agree to arbitrate disputes over the judgment of a teacher's final performance, but they did agree to arbitrate whether the outlined evaluation procedures were
because the school district violated terms of the collective bargaining agreement that specified evaluation processes).
207
But see PAIGE, supra note 9, at 63–73 (arguing the use of
VAMs is susceptible to the grievance arbitration process and
the failures of VAMs to accurately assess teacher effectiveness
could be remedied through the collective bargaining process.);
see also Mark A. Paige, Applying the Paradox Theory: A Law
and Policy Analysis of Collective Bargaining Rights and
Teacher Evaluation Reform From Selected States, 2013 BYU
EDUC. & L.J. 21, 41–42 (highlighting the benefits of a more
collaborative collective bargaining process understood as
“interest-based” bargaining particularly with respect to
teacher evaluation).
208
Wash. Teachers' Union v. D.C. Pub. Schs., 77 A.3d 441 (D.C. 2013).
209
Id. at 444.
followed.
210
On appeal, the District of Columbia Court of Appeals upheld the decision that the district's final judgments were not arbitrable. However, the school district did not challenge the lower court's determination that the issue of whether the district followed evaluation procedures was subject to arbitration.
211
In at least one other well-publicized case, the Washington Teachers' Union succeeded in frustrating D.C. Public Schools' use of the IMPACT evaluation system.
212
In this case, the union alleged that the school district violated various evaluation procedures when it terminated a seventeen-year veteran teacher, Thomas O'Rourke.
213
As noted above, the controlling courts in the District of Columbia have concluded that "process" arguments under the collective bargaining agreement are arbitrable, although the school district's final judgment with respect to evaluation categorization (e.g., ineffective, satisfactory, etc.) is not.
In the District of Columbia Public Schools matter, the arbitrator found that the district violated evaluation procedures governing the length of observation visits, which, according to the contract, should be "at least 30 minutes."
214
Here, the administrators evaluating the teacher exceeded that length by substantial amounts (e.g., observations lasted 80 minutes), which, in the eyes of the arbitrator, amounted to a procedural violation of the evaluation process.
215
Importantly, the arbitrator noted
210
Id.
211
Id.
212
D.C. Pub. Sch. v. Wash. Teachers Union, Local 6, AAA No.
16-20-1300-0499 AVH (Feigenbaum, Arb.); see also Perry
Stein, Teachers Union Touts Victory in Evaluation Fight, WASH.
POST (Apr. 5, 2016), https://www.washingtonpost.com/news/
education/wp/2016/04/05/teachers-union-touts-victory-in-
evaluation-fight/ [https://perma.cc/P7RU-PSP7].
213
D.C. Pub. Schs., AAA No. 16-20-1300-0499 AVH.
214
Id. at 26–28.
215
Id. at 18.
two other factual findings significant to his decision. He concluded that the administrator evaluating the teacher had a reputation for using the observation system to penalize teachers "he did not like."
216
A school district administrator also testified that an observation that exceeded or did not meet the thirty-minute threshold would amount to a process violation.
217
In sum, under these circumstances, the procedural violations could be seen as simply a pretext for terminating the teacher.
218
In arbitration cases, the remedy for a bargaining
violation can be a contested issue. In Washington, D.C.,
an arbitrator cannot issue a remedy in the form of
recategorizing a teacher’s evaluation from ineffective to
effective.
219
Reinstatement and back pay, however, are typical arbitration remedies,
220
and these were, in fact, awarded in this case.
IV. Current Policy Landscape in the Wake of ESSA
This section discusses the current policy landscape following the reauthorization of the Elementary and Secondary Education Act of 1965 through congressional passage of the Every Student Succeeds Act (ESSA) of 2015. It illustrates that the ESSA reauthorization allowed for more state-level flexibility with regard to VAM use. It then highlights how the new policies have essentially shifted the emphasis from VAMs
216
Id. at 19.
217
Id. at 7.
218
Id. at 19.
219
Wash. Teachers' Union v. D.C. Pub. Sch., 77 A.3d 441, 458 (D.C. 2013).
220
See, e.g., DISCIPLINE AND DISCHARGE IN ARBITRATION ch.
13.I.A. (Norman Brand & Melissa Birens, eds., 3d ed.) (noting
that back-pay and reinstatement are two essential remedies for
making an employee whole).
in high-stakes decision making to, perhaps, other measures.
A. ESSA Reauthorization
In 2015, Congress passed a reauthorization of the
Elementary and Secondary Education Act under a new
name, the Every Student Succeeds Act.
221
In general, ESSA reduced some federal mandates and incentives tied to accountability systems, effectively limiting some of the federal control promoted by RttT and other waiver requirements.
222
Specifically, ESSA made two main changes for state departments of education: (a) it gave state departments leeway to interpret key terms like "including, as a significant factor, data on student growth for all students," and (b) it gave state departments more control to determine state goals and measures for success within a federal framework.
223
Put
simply, ESSA allowed more flexibility.
To break down the policy changes further, the first main change, allowing states to interpret "data on student growth" differently, allowed state departments of education to step back from statistically based measures of student growth such as VAMs. ESSA allowed states to count more qualitative measures as data showing student growth, such as student learning objectives (SLOs), which are objectives for student growth developed at the beginning of the year by teachers (sometimes in conjunction with others).
224
SLOs still rely on evidence, which can include VAM scores, but the evidence can also include course
221
Every Student Succeeds Act of 2015, Pub. L. No. 114-95, 129 Stat. 1802 (2015).
222
Race to the Top Act of 2011, S. Res. 844, 112th Cong. (2011).
223
ESEA Flexibility, U.S. DEP'T OF EDUC. (2012),
https://www.ed.gov/esea/flexibility [https://perma.cc/95A7-FLFA].
224
CLOSE ET AL., supra note 35, at 18.
exams, performance demonstrations, and other sources. In short, ESSA allowed states to incorporate more nuanced and qualitative measures of student growth without removing the requirement that states use evidence of student growth. The distinction is small but significant. It signals a redefinition of "data" to include information beyond large-scale standardized testing (although, importantly, it can still include these test scores).
The second main change, allowing states to set their own goals and measures for success, marks a retreat from the strict adequate yearly progress (AYP) goals established by NCLB. Although states must still meet AYP targets for certain subgroups of students, the consequences and the interventions that must be imposed can be decided by the states themselves. Essentially, ESSA removes the punitive bite of NCLB, the bite that encouraged many states to apply for waivers and adopt VAMs in the first place, and replaces it with flexibility.
States choose their own bite now. The standards remain,
but the consequence, the type of intervention required for
a failure to meet AYP, is decided by state departments of
education.
These two changes, though small, rolled back some of the features that encouraged, or forced, states to use large, standardized, statewide systems that leaned on VAM results to measure teacher effectiveness.
225
The new policy meant states did not need to create large-scale, comparable data about teacher effectiveness. States no longer needed to structure their systems top-down and could allow for more bottom-up control, essentially handing more authority to local educational agencies such as school districts. ESSA marked a shift of power.
The federal government loosened the reins on state
225
Cindy Long, Six Ways ESSA Will Improve Assessments,
NEATODAY (2016), http://neatoday.org/2016/03/10/essa-
assessments/ [https://perma.cc/92AW-UC6A].
departments of education, which, in turn, had the freedom to deviate from establishing one-size-fits-all teacher evaluation systems across their states, handing more decision-making power to local educational authorities, such as districts.
B. State Plans
Though ESSA allowed for many of the changes described above, it did not require or guarantee them. The work of exercising the flexibility fell to the states, not the federal government. Hence, this section examines how state teacher evaluation plans changed as a whole after the passage of ESSA through state legislative and regulatory action. The changes, as expected, trend toward less use of VAMs in high-stakes decision making, though the trend is somewhat muted.
In general, fewer states are currently using growth models or VAMs for teacher evaluation. The percentage dropped from 42% in 2014 to 30% in 2018.
226
However, that percentage drop understates the magnitude of change. The study behind those figures measured whether states currently use or, importantly, merely endorse statewide use of VAMs. Some of these states endorse VAMs but allow local educational authorities to avoid VAMs completely. For example, Maine encourages the use of VAMs but offers two models from which local education authorities can choose, one of which measures student growth with SLOs, not VAMs.
227
In this case, VAMs play a role in the
state’s teacher evaluation process, but, ultimately, the
choice is made locally. This represents a major departure
from the trend of heavy-handed state teacher evaluation
systems before the passage of ESSA.
Additionally, some states have maintained their
VAMs but use them in novel ways. North Carolina still
226
CLOSE ET AL., supra note 35, at 12.
227
Id. at 13.
uses a VAM, called EVAAS, which featured heavily in
many of the lawsuits.
228
However, the state does not use
the results to make high-stakes decisions. Rather, North
Carolina uses and reports the scores to foster
professional development.
229
In other words, the state does not shy away from using VAM data as part of its system, but it does shy away from using VAMs for consequential decisions such as tenure.
Also of note, recent state plans demonstrate an increased focus on formative feedback practices compared to state plans collected in 2012, with 31 of 51 education plans stating that their evaluation systems use formative data.
230
This shift indicates a significant change in the values stated in this new set of state documents.
V. Conclusions
Quite apart from what education scholars and policymakers believe with respect to the merits of value-added models, all would likely agree that their introduction has had significant consequences. Of course, there is widespread disagreement with respect to how these statistical models should be used. Teachers and unions seeking to block the use of VAMs in high-stakes employment decisions have sought judicial relief with mixed success. That said, while courts may uphold the use of VAMs under a rational basis test, they remain skeptical of the wisdom of using VAMs to make significant decisions with respect to teacher employment status.
But that does not mean that VAMs should be
relegated to the dustbin of educational policy history.
They may yet contribute to improving teacher quality. They may serve as important "flags" for
228
See Hewitt, supra note 61, at 32.
229
CLOSE ET AL., supra note 35, at 14.
230
Id.
teachers, alerting them to investigate their practice a bit
further. VAMs may, someday, play an important role in
helping teachers.
Importantly, however, the use of VAMs must be
judicious, especially in light of their severe limitations.
VAMs cannot tell a teacher what causes a particular
result (the type of robust and actionable feedback a
teacher would want) and they are highly sensitive to
demographics and variables outside of a teacher’s control.
Yet, because VAMs were incorporated in high-stakes
decisions with such haste, especially with the impetus of
the Race to the Top, they were brought to scale, warts
and all.
Thankfully, under the Every Student Succeeds Act, states have a rare opportunity in educational policy to take a bit more control over their destiny. They can, and are, treating VAMs as one piece of the puzzle in addressing teacher quality. Many are beginning to adopt laws and policies that minimize or eliminate their use in high-stakes employment decisions. That is a step in the right direction, one that recognizes the relative value of VAMs in the larger quest to improve public education.