12 Listening Comprehension – Technical Manual

12.1 Task Description

Children listen to a sentence and are shown three or four pictures. They are asked to choose the picture that best represents the meaning of the sentence they heard.

12.2 Construct

The Listening Comprehension task measures grammatical comprehension of sentences. Sentences cover three broad areas of grammar: phrasal syntax, temporal relationships, and modified noun phrases.

12.3 Item Development

While many available tests use production tasks to tap into children’s grammatical knowledge in their language(s), considerably fewer tests use receptive tasks to measure these skills. Researchers conducted a review of existing receptive grammar tests, as well as a review of the literature regarding easy and difficult grammatical constructions, and used this to inform item design. Grammatical constructions were considered for inclusion if (1) they had the potential to differentiate low vs. high comprehension and (2) they could be tested using a receptive format (e.g., with appropriate foils). This resulted in a blueprint of selected grammatical constructions, presented below:

12.3.1 English

Phrasal syntax: Parsing the sentence correctly required linking the subject (‘the doer’) and object (‘the one being acted upon’) in the sentence. This included alternations in ditransitive sentences (direct object before indirect object vs. indirect object before direct object); passive constructions; subject/object relative clauses with reversible and nonreversible noun phrases; and interrogatives (direct vs. embedded questions).
Temporal comprehension: Comprehension required linking the order of events and ranged from easy (with events linearly matching the real-world chronological order) to difficult (mismatch). This was done using causal clauses (because/so), temporal clauses, future tense (will), and conditional clauses (if).
Complex noun phrases: Comprehension required identifying modified noun phrases. This was done using prepositional phrases (with the red hat), adjectives (long striped), and quantifiers (none, all).

12.3.2 Spanish

Phrasal syntax: Parsing the sentence correctly required linking the subject (‘the doer’) and object (‘the one being acted upon’) in the sentence. This included alternations in ditransitive sentences (direct object before indirect object vs. indirect object before direct object); passive constructions; subject/object relative clauses with reversible and nonreversible noun phrases; and interrogatives (direct vs. embedded questions).
Temporal comprehension: Comprehension required linking the order of events and ranged from easy (with events linearly matching the real-world chronological order) to difficult (mismatch). This was done using causal clauses (porque/para que), temporal clauses (hasta que), future tense (va a), and conditional clauses (si, mientras).
Complex noun phrases: Comprehension required identifying modified noun phrases. This was done using prepositional phrases (entre el árbol y la casa), adjectives (en el carro rojo), and quantifiers (todo, ninguno).

The sentences were represented by illustrations created specifically for the assessment. The following guidelines were provided for the development of the illustrations:

Easily Recognizable. Emphasis was placed on developing illustrations that could be easily identified.
Removal of Irrelevant Information. Unnecessary elements were omitted so that each image contained only the components essential to the content of the sentence.
Diversity Representation. The illustrations were designed to represent various aspects of diversity, including different racial backgrounds (through variations in skin tones and hair textures) and diverse abilities (by featuring characters who use wheelchairs, prosthetics, or hearing devices). Cultural representation was carefully considered, with a range of clothing styles, skin tones, and physical features reflecting various backgrounds. The Justice, Equity, Diversity, and Inclusion (JEDI) team reviewed all illustrations to enhance diversity in representation.

Dialectal considerations.

12.4 Scoring

Items use a fixed-response format and are scored dichotomously: 1 point for a correct response and 0 points for an incorrect response or a non-response.
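The scoring rule can be illustrated with a minimal sketch. The function names and the convention of coding a non-response as `None` are assumptions made for illustration only; they are not taken from the assessment's actual scoring software.

```python
from typing import List, Optional, Sequence


def score_item(response: Optional[int], correct_option: int) -> int:
    """Return 1 for a correct choice, 0 for an incorrect choice or a non-response.

    A non-response is represented here as None (an illustrative convention).
    """
    return 1 if response is not None and response == correct_option else 0


def score_responses(responses: Sequence[Optional[int]], key: Sequence[int]) -> List[int]:
    """Score each chosen picture against the answer key, item by item."""
    return [score_item(r, k) for r, k in zip(responses, key)]


# Four items: correct, skipped, incorrect, correct.
print(score_responses([2, None, 1, 3], [2, 1, 3, 3]))  # [1, 0, 0, 1]
```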

12.5 Calibration Samples

Table 12.1: Demographic Characteristics of Calibration Samples for the English and Spanish Listening Comprehension Tasks

	English			Spanish
Characteristic	K N = 294	G1 N = 301	G2 N = 290	K N = 239	G1 N = 278	G2 N = 260
Timepoint
Fall 2023	294 (100%)	301 (100%)	290 (100%)	239 (100%)	278 (100%)	260 (100%)
Administration Format
Forms	294 (100%)	301 (100%)	290 (100%)	239 (100%)	278 (100%)	260 (100%)
Race
American/Alaskan Native	7 (2.4%)	3 (1.0%)	3 (1.1%)	2 (0.8%)	3 (1.1%)	4 (1.6%)
Asian	39 (13%)	49 (16%)	13 (4.8%)	6 (2.5%)	3 (1.1%)	0 (0%)
Black/African American	29 (10.0%)	31 (10%)	55 (20%)	1 (0.4%)	0 (0%)	0 (0%)
Not reported	44 (15%)	49 (16%)	20 (7.4%)	134 (57%)	183 (67%)	171 (66%)
Other	77 (26%)	50 (17%)	13 (4.8%)	43 (18%)	11 (4.0%)	20 (7.8%)
White	95 (33%)	119 (40%)	165 (61%)	51 (22%)	75 (27%)	63 (24%)
Unknown	3	0	21	2	3	2
Ethnicity
Hispanic/Latin(o/a)	133 (45%)	127 (42%)	170 (59%)	215 (91%)	255 (92%)	243 (98%)
Intentional nonreport	6 (2.0%)	2 (0.7%)	0 (0%)	1 (0.4%)	0 (0%)	2 (0.8%)
Not Hispanic/Latin(o/a)	155 (53%)	172 (57%)	116 (41%)	20 (8.5%)	22 (7.9%)	2 (0.8%)
Unknown	0	0	4	3	1	13
Gender
Female	145 (49%)	139 (46%)	144 (50%)	128 (54%)	138 (50%)	139 (53%)
Male	149 (51%)	162 (54%)	143 (50%)	111 (46%)	139 (50%)	121 (47%)
Unknown	0	0	3	0	1	0
Home Language
English	213 (74%)	225 (75%)	189 (79%)	27 (11%)	33 (12%)	16 (6.2%)
Spanish	39 (14%)	36 (12%)	40 (17%)	207 (88%)	241 (88%)	241 (93%)
Other	36 (13%)	38 (13%)	10 (4.2%)	2 (0.8%)	1 (0.4%)	1 (0.4%)
Unknown	6	2	51	3	3	2
English Proficiency Label
(Re-)Classified Proficient	25 (10%)	23 (7.8%)	16 (6.7%)	29 (13%)	43 (16%)	33 (14%)
English Learner	58 (23%)	56 (19%)	34 (14%)	184 (83%)	204 (74%)	189 (79%)
English-only	165 (67%)	215 (73%)	188 (79%)	9 (4.1%)	27 (9.9%)	16 (6.7%)
Unknown	46	7	52	17	4	22
Ever IEP/504	20 (8.4%)	27 (10%)	27 (11%)	20 (9.8%)	21 (9.8%)	15 (13%)
Unknown	56	43	51	35	63	143
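The percentages in Table 12.1 appear to be computed over children with a known value for each characteristic, that is, with the "Unknown" count removed from the denominator. The sketch below checks this reading against two kindergarten English cells; the function name is hypothetical, and the denominator rule is inferred from the table rather than stated in the source.

```python
def percent_of_known(count: int, n_total: int, n_unknown: int) -> float:
    """Percentage of a category among cases with a known value (inferred rule)."""
    return 100.0 * count / (n_total - n_unknown)


# Kindergarten English, Race: 7 of 294 children, 3 unknown -> reported as 2.4%
print(round(percent_of_known(7, 294, 3), 1))
# Kindergarten English, Ever IEP/504: 20 of 294 children, 56 unknown -> reported as 8.4%
print(round(percent_of_known(20, 294, 56), 1))
```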

12.6 Psychometric Analysis

12.6.1 Basic Item Statistics

No items were excluded from either the English or the Spanish task for low response counts (n < 90), and none were excluded for lack of variance. We excluded 12 items from the English task and 14 items from the Spanish task because of low point-biserial correlations (r < 0.2). Table 12.2 summarizes the basic item characteristics, and Figure 12.1 shows the relationship between point-biserial correlations and the proportion of correct responses for each item.
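The three exclusion rules can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the manual: the function name `screen_items` is hypothetical, items not administered to a child are coded as NaN, and the point-biserial is computed against the mean score on the remaining items (an item-rest correlation). The source does not say whether the item itself was included in the total score.

```python
import numpy as np


def screen_items(X, min_n=90, min_pbis=0.2):
    """Screen items in a persons-by-items 0/1 matrix (NaN = not administered).

    Returns one dict per item with its response count, proportion correct,
    item-rest correlation, and the first exclusion rule it triggers (or None).
    """
    X = np.asarray(X, dtype=float)
    results = []
    for j in range(X.shape[1]):
        answered = ~np.isnan(X[:, j])
        item = X[answered, j]
        n = int(item.size)
        p_correct = float(item.mean()) if n else float("nan")
        no_variation = n == 0 or item.min() == item.max()

        pbis = float("nan")
        if not no_variation:
            # Rest score: mean score on the other items each child answered.
            rest = np.delete(X[answered], j, axis=1)
            has_rest = ~np.isnan(rest).all(axis=1)
            if has_rest.sum() > 2:
                rest_score = np.nanmean(rest[has_rest], axis=1)
                if rest_score.std() > 0 and item[has_rest].std() > 0:
                    pbis = float(np.corrcoef(item[has_rest], rest_score)[0, 1])

        if n < min_n:
            reason = "n < min_n"
        elif no_variation:
            reason = "no variation"
        elif not pbis >= min_pbis:  # also catches an undefined (NaN) correlation
            reason = "pbis < min_pbis"
        else:
            reason = None
        results.append({"n": n, "p_correct": p_correct, "pbis": pbis, "excluded": reason})
    return results
```

Calling `screen_items(X)` with the defaults applies the thresholds reported above (n < 90 and r < 0.2); items whose `excluded` value is `None` would be retained.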

Table 12.2: Basic Item Statistics Before and After Application of Exclusion Criteria, for the English and Spanish Listening Comprehension Tasks

	English		Spanish
Characteristic	Before Excl.	After Excl.	Before Excl.	After Excl.
	N = 117	N = 105	N = 120	N = 106
No. of Responses	171 (101)	172 (103)	151 (91)	152 (90)
Proportion Correct	0.81 (0.15)	0.80 (0.14)	0.72 (0.18)	0.72 (0.16)
Point-biserial Correlation	0.38 (0.13)	0.41 (0.10)	0.36 (0.12)	0.39 (0.09)
Excluded (n < 90)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Excluded (pbis < .2)	12 (10%)	0 (0%)	14 (12%)	0 (0%)
Excluded (no variation)	0 (0%)	0 (0%)	0 (0%)	0 (0%)

Figure 12.1: Scatterplot Showing Point-biserial (Item-total) Correlations and Proportion of Correct Responses for the English (Panel A) and Spanish (Panel B) Listening Comprehension Tasks

12.6.2 Rasch Analysis

12.6.2.1 Item Location Estimates

Figure 12.2: Scatterplot Showing Item Location and Proportion of Correct Response for the English (Panel A) and Spanish (Panel B) Listening Comprehension Tasks
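Item locations are expressed in logits. As a minimal sketch, assuming the standard dichotomous Rasch model (the estimation software and settings are not stated in this section), the probability of a correct response depends only on the difference between the person location and the item location, which is why easier items (lower locations) show higher proportions correct. The function name is hypothetical.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model.

    theta: person location in logits; b: item location in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person located at 0 logits answering an item located at -2.07 logits
# (the value reported for English items in Table 12.5).
print(round(rasch_probability(0.0, -2.07), 2))

# When person and item locations coincide, the probability is exactly 0.5.
print(rasch_probability(1.3, 1.3))
```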

12.6.2.2 Item Fit Statistics

Table 12.3: Frequencies of Item Misfit Categories Based on Infit/Outfit MSE Values for the English and Spanish Listening Comprehension Tasks

	English					Spanish
	Infit MSE
	A	B	C	D	Total	A	B	C	D	Total
Outfit MSE
A	98	0	0	0	98	105	0	0	0	105
B	6	0	0	0	6	1	0	0	0	1
C	1	0	0	0	1	0	0	0	0	0
D	0	0	0	0	0	0	0	0	0	0
Total	105	0	0	0	105	106	0	0	0	106
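For readers unfamiliar with the fit statistics tabulated above, the sketch below shows the conventional mean-square definitions under the dichotomous Rasch model: outfit is the unweighted mean of squared standardized residuals, and infit weights the squared residuals by their model variance. The function name is hypothetical, and the cut-offs that define categories A to D are not stated in this section, so none are assumed here.

```python
import numpy as np


def item_fit_mean_squares(X, theta, b):
    """Infit and outfit mean squares per item under the dichotomous Rasch model.

    X: persons-by-items 0/1 matrix (NaN = not administered).
    theta: person locations; b: item locations (both in logits).
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)

    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected scores
    W = P * (1.0 - P)                                         # model variances
    observed = ~np.isnan(X)
    resid_sq = (X - P) ** 2                                   # NaN where not observed

    # Outfit: mean of squared standardized residuals over observed responses.
    outfit = np.nansum(resid_sq / W, axis=0) / observed.sum(axis=0)
    # Infit: information-weighted version, less sensitive to off-target responses.
    infit = np.nansum(resid_sq, axis=0) / np.where(observed, W, 0.0).sum(axis=0)
    return infit, outfit
```

Values near 1 indicate that responses are about as variable as the model expects. Swapping the roles of rows and columns (summing over items for each person) gives the corresponding person fit statistics.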

12.6.2.3 Person Location Estimates

Figure 12.3: Scatterplot Showing Person Location Estimates (Obtained using the MLE method) and the Proportion of Correct Responses for English and Spanish Listening Comprehension Tasks

12.6.2.4 Person Fit Statistics

Table 12.4: Frequencies of Person Misfit Categories Based on Infit/Outfit MSE Values for the English and Spanish Listening Comprehension Tasks

	English					Spanish
	Infit MSE
	A	B	C	D	Total	A	B	C	D	Total
Outfit MSE
A	557	0	0	0	557	641	0	0	0	641
B	187	97	0	0	284	60	39	0	0	99
C	29	0	4	0	33	25	0	1	0	26
D	3	0	2	1	6	1	0	1	0	2
Total	776	97	6	1	880	727	39	2	0	768

12.6.2.5 Distribution of Theta Estimates

Figure 12.4: Distribution of Theta Estimates for the English and Spanish Listening Comprehension Tasks

12.6.2.6 Wright Maps

Figure 12.5: Wright Maps Showing the Relationship Between Item and Person Location Estimates for the English Listening Comprehension Task

Figure 12.6: Wright Maps Showing the Relationship Between Item and Person Location Estimates for the Spanish Listening Comprehension Task

12.6.2.7 Model Summary

Table 12.5: Summary of Rasch Model Statistics for the English and Spanish Listening Comprehension Tasks

	English		Spanish
	Item	Person	Item	Person
Characteristic	N = 105	N = 880	N = 106	N = 768
Logit Scale Location	-2.07 (1.22)	-0.06 (-0.81, 1.17)	-1.28 (1.02)	0.06 (-0.66, 0.77)
Outfit	0.93 (0.26)	0.69 (0.39, 0.98)	0.98 (0.15)	0.86 (0.66, 1.06)
Infit	0.99 (0.09)	0.86 (0.71, 1.02)	1.00 (0.07)	0.92 (0.79, 1.05)
Reliability of Separation	0.6992	0.5419	0.7060	0.6316
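The reliability of separation in Table 12.5 is conventionally defined as the share of observed variance in the location estimates that is not attributable to measurement error. The sketch below follows that conventional definition; the exact formula and software used for the manual are not stated, so the function and its inputs are illustrative assumptions.

```python
import numpy as np


def separation_reliability(estimates, standard_errors) -> float:
    """(observed variance - mean error variance) / observed variance."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    observed_var = est.var(ddof=1)        # variance of the location estimates
    error_var = float(np.mean(se ** 2))   # mean squared standard error
    return float((observed_var - error_var) / observed_var)


# Toy example: estimates with variance 1 and standard errors of 0.5 logits.
print(separation_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))  # 0.75
```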

12.6.2.7.1 Final Number of Items

Following the exclusion of items with point-biserial correlations < .20 and items with poor fit statistics, the final versions of the task contain 105 items for English and 106 items for Spanish.

12.7 Criterion Validity Evidence

12.7.1 Sample

Table 12.6: Demographic Characteristics of the Concurrent Criterion Validity Evidence Samples for the English and Spanish Listening Comprehension Tasks

	English			Spanish
Characteristic	K N = 261	G1 N = 231	G2 N = 201	K N = 242	G1 N = 226	G2 N = 261
Timepoint
Winter 2024	261 (100%)	231 (100%)	201 (100%)	242 (100%)	226 (100%)	261 (100%)
Race
American/Alaskan Native	5 (1.9%)	3 (1.3%)	1 (0.5%)	2 (0.8%)	4 (1.8%)	4 (1.5%)
Asian	35 (14%)	36 (16%)	8 (4.4%)	8 (3.3%)	2 (0.9%)	0 (0%)
Black/African American	27 (10%)	30 (13%)	32 (17%)	1 (0.4%)	0 (0%)	0 (0%)
Not reported	30 (12%)	32 (14%)	13 (7.1%)	133 (55%)	154 (69%)	168 (65%)
Other	73 (28%)	45 (19%)	3 (1.6%)	40 (17%)	8 (3.6%)	18 (6.9%)
White	88 (34%)	85 (37%)	126 (69%)	56 (23%)	55 (25%)	69 (27%)
Unknown	3	0	18	2	3	2
Ethnicity
Hispanic/Latin(o/a)	109 (42%)	98 (42%)	121 (61%)	218 (91%)	210 (93%)	244 (98%)
Intentional nonreport	7 (2.7%)	2 (0.9%)	0 (0%)	1 (0.4%)	0 (0%)	2 (0.8%)
Not Hispanic/Latin(o/a)	145 (56%)	131 (57%)	79 (40%)	20 (8.4%)	16 (7.1%)	2 (0.8%)
Unknown	0	0	1	3	0	13
Gender
Female	129 (49%)	110 (48%)	97 (48%)	128 (53%)	110 (49%)	134 (51%)
Male	132 (51%)	121 (52%)	104 (52%)	114 (47%)	116 (51%)	127 (49%)
Home Language
English	190 (74%)	174 (76%)	126 (82%)	29 (12%)	23 (10%)	23 (8.9%)
Spanish	34 (13%)	24 (10%)	22 (14%)	208 (87%)	199 (89%)	235 (91%)
Other	32 (13%)	32 (14%)	5 (3.3%)	2 (0.8%)	1 (0.4%)	1 (0.4%)
Unknown	5	1	48	3	3	2
English Proficiency Label
(Re-)Classified Proficient	12 (5.6%)	17 (7.5%)	11 (7.2%)	31 (14%)	24 (11%)	39 (16%)
English Learner	49 (23%)	40 (18%)	16 (10%)	185 (81%)	177 (80%)	177 (74%)
English-only	155 (72%)	169 (75%)	126 (82%)	11 (4.8%)	19 (8.6%)	23 (9.6%)
Unknown	45	5	48	15	6	22
Ever IEP/504	20 (10%)	23 (13%)	18 (12%)	20 (9.4%)	23 (11%)	16 (12%)
Unknown	63	47	48	30	15	123

English Listening Comprehension was correlated with the Sentence Comprehension subtest of the Clinical Evaluation of Language Fundamentals, 5th Edition (CELF 5) test (Wiig, Semel, and Secord 2013). Spanish Listening Comprehension was correlated with the Sentence Comprehension subtest of the Clinical Evaluation of Language Fundamentals, 4th Edition, Spanish (CELF 4 Spanish) test (Semel et al. 2006).

Table 12.7: Concurrent Criterion Validity Correlations for the English and Spanish Listening Comprehension Tasks

	English				Spanish
	All		EL		All
Grade	n	r [CI]	n	r [CI]	n	r [CI]
K	261	0.51 [0.41, 0.59]	49	0.58 [0.35, 0.74]	242	0.45 [0.35, 0.55]
G1	231	0.37 [0.26, 0.48]	40	0.44 [0.15, 0.66]	226	0.42 [0.31, 0.52]
G2	201	0.42 [0.29, 0.52]	NA	NA	261	0.40 [0.29, 0.50]
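The method used to obtain the confidence intervals in Table 12.7 is not stated. A common choice is the Fisher z transformation, sketched below with a hypothetical function name and an assumed 95% level; for the kindergarten English sample (r = 0.51, n = 261) it gives an interval that agrees with the tabulated [0.41, 0.59] after rounding.

```python
import math


def pearson_ci(r: float, n: int, z_crit: float = 1.959963984540054):
    """Approximate confidence interval for a Pearson correlation (Fisher z).

    z_crit defaults to the two-sided 95% normal critical value.
    """
    z = math.atanh(r)               # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lower, upper = pearson_ci(0.51, 261)
print(round(lower, 2), round(upper, 2))  # 0.41 0.59
```

Because the tabulated correlations are themselves rounded to two decimals, intervals recomputed this way can differ from the table by 0.01 in some cells.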