Navigating the paradox of EL assessment: what makes sense to guide learning and teaching in K-12 schools?

The double-edged sword of national and statewide mandated accountability tests for English language learners (ELL) does not go unnoticed by policy makers, administrators, principals, and teachers. On one hand, English language learners are defined by federal legislation as students whose difficulties with English may be sufficient to keep them from meeting challenging State academic standards (ESEA, Section 8101 (20)), yet on the other hand, they are expected to meet these same standards as measured by state accountability tests.

An ELL by definition possesses English capabilities that may put grade-level assessments beyond reach, not due to lack of instruction or innate ability, but because the student has not had enough time in a program where English has been systematically and purposefully taught. Yet to remove ELLs from these standardized, grade-level assessments is to relegate them, physically and metaphorically, to the back of the classroom. Exempting ELLs from accountability measures would be far worse than requiring them to take the tests because it would remove them from the policy and decision-making radar.

English learners come to us with all sorts of linguistic and learning differences. Here we explore the various levels of assessment, both mandated and recommended, to give the reader insight into how assessment plays out in the K-12 educational landscape for ELLs and how leaders might best navigate the current paradox.

Diagnostic Assessments

All states and districts use a developmentally appropriate battery of diagnostic assessments in the four domains of English (Reading, Writing, Speaking, and Listening) to determine if a bilingual student should receive additional English or bilingual services. It is important to note that just because students speak another language at home does not mean they are automatically ELLs or cannot access the general education curriculum without additional supports. A potential policy glitch in administering diagnostic ELL tests, then, and one that can have a lasting negative impact on the student, is the overidentification of ELLs. This can be mitigated by using multiple assessment measures and ongoing diagnostic or dipstick checks to ensure the reliability and validity of the initial diagnostic measure.

Diagnostic tests are meant to give a quick read on students’ proficiency levels, with a relatively fast turnaround for student program placement purposes. District-level assessments (vs. state or national tests) allow more flexibility in accommodations and modifications because they are administered at the local level. Alternate forms or even alternate tests can be administered to students based on their individual needs, without huge repercussions.

Summative Assessments

Many state-level mathematics, language arts, writing, social studies, and science assessments are designed by test vendors as Criterion-Referenced Tests (CRTs). A CRT sets a benchmark level against content standards, usually with input from teachers, policy makers, and instructional experts. Through a test-development process called standard setting, a proficiency score is set. A nationally Norm-Referenced Test (NRT), on the other hand, is designed to produce a bell curve distribution of students, with most test takers scoring near the 50th percentile and a tail at each end, and without any standard-setting process. An NRT requires a large database of test takers nationwide to give the test its validity.

Unfortunately, the statistical models behind large-scale CRTs, often the statewide accountability test design of choice, are based on the bell curve models typically associated with NRTs. Thus, although the benchmark to achieve “grade level” is presumably transparent and standard-setting processes do give stakeholders a chance to define the benchmark or cut score, the mathematical underpinnings of the two test types are the same.
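To make the contrast concrete, here is a minimal sketch of how one and the same scale score can be reported in a criterion-referenced way (against a cut score) and in a norm-referenced way (as a percentile rank within a norm group). The scores, the cut score of 500, and the function names are hypothetical and chosen only for illustration; they do not come from any actual state test or vendor.

```python
from bisect import bisect_left, bisect_right

# Hypothetical scale scores for a norm group (illustrative numbers only).
NORM_GROUP = [410, 435, 450, 462, 470, 478, 485, 493, 500, 507,
              515, 522, 530, 538, 545, 555, 565, 580, 600, 630]

# Hypothetical cut score that a standard-setting panel might choose.
CUT_SCORE = 500


def criterion_referenced(score, cut_score=CUT_SCORE):
    """CRT-style report: compare the score to a fixed benchmark."""
    return "proficient" if score >= cut_score else "not yet proficient"


def percentile_rank(score, norm_scores=NORM_GROUP):
    """NRT-style report: percent of the norm group scoring below this
    score, counting ties as half (one common percentile-rank convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


if __name__ == "__main__":
    for s in (470, 500, 545):
        print(s, criterion_referenced(s), percentile_rank(s))
```

In this sketch, moving the cut score changes who counts as proficient but leaves every student's percentile rank untouched. This is one way to see that the cut score is a judgment layered on top of the same underlying score distribution.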

States define grade-level standards by sharing released test forms, test specifications, and practice tests. ELL and special education experts may provide insight and expertise at various stages of test design, but rarely, if ever, are special populations at the center of a content-level test design. These tests are summative in nature, as they tell a community how a population of students did at the end of their annual journey down the standards-based road. There is no time for classroom adjustments to improve outcomes, but cohort and programmatic analysis can prove useful. For example, schools and districts could answer questions such as: Is this new curriculum or instructional method working over time? What is the impact of a longer school day on an elementary program?

English Language Proficiency Assessments

With the authorization of the No Child Left Behind Act of 2001 (NCLB), ELLs were finally on the accountability radar, and Congress asked states to report on the progress and proficiency of their ELLs. English Language Proficiency Tests assess the same four modalities as diagnostic language tests, but they are meant to be summative measures of program quality and student progress. If these tests are based on language standards, they take on a CRT design. Currently, 39 states are a part of the WIDA consortium and, as such, administer the English Language Proficiency Assessments designed under the auspices of a federal enhanced assessment grant during the dawn of NCLB. These tests give policy makers information on how prepared ELLs are to access the general education curriculum without supports.

Formative Assessments

Finally, and most interestingly, research dating back more than twenty years indicates that the area where schools can get the most bang for their assessment buck is formative assessment. When educators in the field measure students’ acquisition of concepts and skills on an ongoing basis and adjust their instruction in real time, it is here that we see the most significant growth for ELLs. Formative assessments are classroom-level assessments meant to be used by teams of teachers to determine instructional success. They inform planning and instruction. By adjusting a series of high-quality curriculum materials aligned to state content and language objectives, teachers can give students what they need, when they need it.

Unlike CRTs and NRTs, formative assessments are not meant to be compared across districts or states. They may be used only in a given classroom, across a grade level in a school, or even across a district or network of schools. However, formative assessments can provide great insight into reteaching and the spiraled review of important concepts and skills. They also provide student exemplars that can be shared and improved upon over time. English literacy development can be demonstrated very visibly by student writing in the content areas. Rubrics for assessing content and for assessing language acquisition can be beneficial as students progress in their language skills.

Because of their high-impact potential, teacher preparation programs should give more time to how to use formative assessment data to guide instruction and how to interpret high-quality assessment items. This work of analyzing test data and actual student work is best when it is teacher-led and teacher-owned. The knowledge needed to fix instructional issues resides with the teacher and the student. And that is a powerful fact. Good teaching goes hand in hand with powerful (formative) assessment.

