22  Spelling

22.1 Task Description

The child hears a word and spells it by selecting the correct letters from among the foils shown on the screen and dragging them to the bottom of the screen in the correct order.

22.2 Construct

The Spelling task measures children’s ability to put the alphabetic principle into action by encoding speech sounds into print.

22.3 Item Development

22.3.1 English

To develop the item pool, the research team reviewed multiple curricula to build a list of frequent, decodable words. These included curricula used in the United States, such as McGraw-Hill’s “Wonders”, Benchmark’s “Benchmark Advance”, and HMH’s “Journey”.

For the words in this pool, Clearpond (Marian et al. 2012) was used to retrieve each word’s frequency, orthographic and phonological length, and neighborhood frequency. This information was used to select a sample of high-frequency words of varying orthographic and phonological length whose meaning was easily accessible to the target population. The final list of words also targeted the following characteristics: short, long, and variant vowels; r-controlled vowels; use of soft c and g; silent letters (e.g., /bm/, /sc/); diphthongs; consonant digraphs; two- and three-letter blends; closed, open, and CVC syllables.

Using the letters of each target word as reference, the research team selected between 3 and 5 foil letters to be included among the correct letters for spelling the word. Foil letters were selected based on different criteria:

  • Phonological foils: letters that sound similar to the target letter (e.g., z for s; c for k; v for b)
  • Visual foils: letters visually similar to the target letter (e.g., d for b; m for n)
  • Vocalic foils: alternative vowels to the targeted ones (e.g., o for a; e for i)
  • Morphological foils: alternative spelling of a conventional morpheme (e.g., z for s, t for ed for past tense verbs)
  • Unrelated foils: additional foils were included in the pool that were not easily confused with the letters needed to spell the target word.
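
As an illustration of how a target word and its foils combine into the set of letters shown on screen, the sketch below assembles and shuffles a letter tray. The function name `build_letter_tray`, the example word "bat", its foil choices, and the shuffling step are all hypothetical; the source only states that 3 to 5 foils were included among the correct letters.

```python
import random


def build_letter_tray(target, foils, rng):
    """Return the on-screen letters: the target's letters plus the foils, shuffled."""
    # The text describes between 3 and 5 foil letters per target word.
    if not 3 <= len(foils) <= 5:
        raise ValueError("expected between 3 and 5 foil letters")
    tray = list(target) + list(foils)
    rng.shuffle(tray)  # assumption: on-screen order is randomized
    return tray


# Hypothetical item: foils follow the categories listed above.
foils = [
    "v",  # phonological foil (v for b)
    "d",  # visual foil (d for b)
    "o",  # vocalic foil (o for a)
    "x",  # unrelated foil (illustrative choice)
]
tray = build_letter_tray("bat", foils, random.Random(0))
print(tray)
```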

22.3.2 Spanish

To develop the item pool, the research team reviewed multiple curricula to build a list of frequent, decodable words. These included curricula used in dual language programs in California, such as McGraw-Hill Maravillas, Estrellita, and Houghton Mifflin Lectura. Curricular materials from Mexico, Panama, and Chile were also reviewed.

For the words in this pool, Clearpond (Marian et al. 2012) was used to retrieve each word’s frequency, orthographic and phonological length, and neighborhood frequency. This information was used to select a sample of high-frequency words of varying orthographic and phonological length whose meaning was easily accessible to the target population.

Using the letters of each target word as reference, the research team selected between 3 and 5 foil letters to be included among the correct letters for spelling the word. Foil letters were selected based on different criteria:

  • Phonological foils: letters that sound similar to the target letter (e.g., z for s; c for k; v for b)
  • Visual foils: letters that are visually similar to the target letter (e.g., d for b; m for n)
  • Vocalic foils: alternative vowels to the targeted ones (e.g., o for a; e for i)
  • Stress foils: Spanish marks some stressed vowels with a written accent, so the child had to discern whether the word contained the accented or unaccented letter (e.g., é for e; a for á)
  • Crosslinguistic English phonology foils: phonemes that are represented with a different letter in English than they are in Spanish (e.g., th for d)
  • Unrelated foils: additional foils were included in the pool that were not easily confused with the letters needed to spell the target word.

22.4 Scoring

Responses are scored dichotomously in a fixed-response format: 0 points for incorrect responses or non-responses and 1 point for correct responses.
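
A minimal sketch of this scoring rule follows. It assumes that a response is recorded as the ordered sequence of letters the child dragged to the bottom of the screen, that a non-response is recorded as `None` or an empty sequence, and that a response is correct only when it matches the target exactly; the function name `score_spelling` is hypothetical.

```python
def score_spelling(response, target):
    """Return 1 if the ordered letters spell the target exactly, otherwise 0."""
    # Non-responses (None or no letters placed) score 0.
    if not response:
        return 0
    # Assumption: correctness means an exact match on letters and their order.
    return int("".join(response) == target)


print(score_spelling(["b", "a", "t"], "bat"))  # correct spelling
print(score_spelling(["b", "a", "d"], "bat"))  # foil selected
print(score_spelling(None, "bat"))             # non-response
```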

22.5 Calibration Samples

Table 22.1: Demographic Characteristics of Calibration Samples for the English and Spanish Spelling Tasks

Characteristic                     English             Spanish
                                   G2, N = 2,805       G2, N = 299
Timepoint
    Fall 2024                      2,191 (100%)        0 (NA%)
    Unknown                        614                 299
Administration Format
    CAT                            2,191 (78%)
    Forms                          614 (22%)           299 (100%)
Race
    American/Alaskan Native        54 (2.1%)           4 (1.3%)
    Asian                          179 (6.9%)          3 (1.0%)
    Black/African American         259 (10%)           4 (1.3%)
    Not reported                   310 (12%)           156 (53%)
    Other                          365 (14%)           14 (4.7%)
    White                          1,420 (55%)         116 (39%)
    Unknown                        218                 2
Ethnicity
    Hispanic/Latin(o/a)            1,886 (74%)         264 (88%)
    Intentional nonreport          5 (0.2%)            1 (0.3%)
    Not Hispanic/Latin(o/a)        668 (26%)           34 (11%)
    Unknown                        246
Gender
    Female                         1,255 (49%)         173 (58%)
    Male                           1,291 (51%)         126 (42%)
    Unknown                        259
Home Language
    English                        1,426 (59%)         71 (24%)
    Spanish                        890 (37%)           216 (74%)
    Other                          91 (3.8%)           5 (1.7%)
    Unknown                        398                 7
English Proficiency Label
    (Re-)Classified Proficient     268 (11%)           40 (14%)
    English Learner                701 (30%)           186 (64%)
    English-only                   1,405 (59%)         66 (23%)
    Unknown                        431                 7
Ever IEP/504                       210 (10%)           24 (9.5%)
    Unknown                        784                 46

22.6 Psychometric Analysis

22.6.1 Basic Item Statistics

Based on low response counts (n < 90), we excluded no items from the English task and 9 items from the Spanish task. A further 2 items from the English task and 2 items from the Spanish task were excluded because they showed no variance. Additionally, we excluded 1 item from the English task and 3 items from the Spanish task based on low point-biserial correlations (r < 0.2). Table 22.2 summarizes the basic item characteristics, and Figure 22.1 shows the relationship between point-biserial correlations and the proportion of correct responses for each item.
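
The three exclusion criteria can be sketched as a per-item screen. The thresholds (n < 90, r < 0.2) come from the text; everything else is an assumption made for illustration, including the function names, the order in which the checks run, and the use of an uncorrected item-total Pearson correlation as the point-biserial.

```python
from statistics import mean, pstdev

MIN_N = 90      # minimum number of responses (from the text)
MIN_PBIS = 0.2  # minimum point-biserial correlation (from the text)


def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return None
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def screen_item(scores, totals, min_n=MIN_N, min_pbis=MIN_PBIS):
    """Screen one item.

    scores: 0/1 responses to the item.
    totals: total scores of the same children, in the same order.
    Returns (keep, reason).
    """
    if len(scores) < min_n:
        return False, "low response count"
    if len(set(scores)) < 2:
        return False, "no variance"
    r = pearson(scores, totals)
    if r is None or r < min_pbis:
        return False, "low point-biserial"
    return True, "retained"


# Toy data: 100 children, the higher-scoring half answer the item correctly.
scores = [0] * 50 + [1] * 50
totals = list(range(100))
print(screen_item(scores, totals))
```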

Table 22.2: Basic Item Statistics Before and After Application of Exclusion Criteria, for the English and Spanish Spelling Tasks

                               English                         Spanish
Characteristic                 Before Excl.    After Excl.     Before Excl.    After Excl.
                               N = 90          N = 87          N = 102         N = 88
No. of Responses               440 (371)       452 (372)       120 (84)        132 (83)
Proportion Correct             0.42 (0.22)     0.42 (0.21)     0.42 (0.21)     0.43 (0.19)
Point-biserial Correlation     0.57 (0.12)     0.58 (0.11)     0.53 (0.16)     0.54 (0.13)
Excluded (n < 90)              0 (0%)          0 (0%)          9 (8.8%)        0 (0%)
Excluded (pbis < .2)           1 (1.1%)        0 (0%)          3 (3.0%)        0 (0%)
Excluded (no variation)        2 (2.2%)        0 (0%)          2 (2.0%)        0 (0%)

Figure 22.1: Scatterplot Showing Point-biserial (Item-total) Correlations and Proportion of Correct Responses for the English (Panel A) and Spanish (Panel B) Spelling Tasks

22.6.2 Rasch Analysis

22.6.2.1 Item Location Estimates

Figure 22.2: Scatterplot Showing Item Location and Proportion of Correct Response for the English (Panel A) and Spanish (Panel B) Spelling Tasks

22.6.2.2 Item Fit Statistics

Table 22.3: Frequencies of Item Misfit Categories Based on Infit/Outfit MSE Values for the English and Spanish Spelling Tasks

                 English (Infit MSE)              Spanish (Infit MSE)
Outfit MSE       A     B     C     D     Total    A     B     C     D     Total
A                67    0     0     0     67       70    0     0     0     70
B                10    0     0     0     10       4     0     0     0     4
C                5     0     0     0     5        9     0     0     0     9
D                5     0     0     0     5        3     0     2     0     5
Total            87    0     0     0     87       86    0     2     0     88
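
For readers unfamiliar with the two statistics cross-tabulated in Table 22.3, the sketch below shows the conventional definitions of infit and outfit mean squares for one item under the dichotomous Rasch model. It is not the estimation code used for this report, the function names are hypothetical, and the cutoffs behind categories A to D are not given in this section, so none are assumed here.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response for person location theta and item location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 scores on the item; thetas: matching person locations.
    """
    sq_resid, variances = [], []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        sq_resid.append((x - p) ** 2)   # squared residual
        variances.append(p * (1 - p))   # model variance of the response
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(sq_resid)
    # Infit: information-weighted mean square.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# One unexpected miss by a high-ability child inflates outfit more than infit.
infit, outfit = item_fit([0, 1], [3.0, 0.0], b=0.0)
print(round(infit, 2), round(outfit, 2))
```

Because outfit weights every response equally, it reacts strongly to unexpected responses from children located far from the item, whereas infit down-weights them. This is consistent with Table 22.3, where most flagged items fall outside category A on outfit only.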

22.6.2.3 Person Location Estimates

Figure 22.3: Scatterplot Showing Person Location Estimates (Obtained using the MLE method) and the Proportion of Correct Responses for English and Spanish Spelling Tasks

22.6.2.4 Person Fit Statistics

Table 22.4: Frequencies of Person Misfit Categories Based on Infit/Outfit MSE Values for the English and Spanish Spelling Tasks

                 English (Infit MSE)                    Spanish (Infit MSE)
Outfit MSE       A       B      C     D     Total       A      B      C     D     Total
A                1,846   0      9     2     1,857       438    0      1     0     439
B                319     364    0     0     683         67     102    0     0     169
C                70      0      14    1     85          24     0      6     0     30
D                78      0      28    7     113         15     0      8     0     23
Total            2,313   364    51    10    2,738       544    102    15    0     661

22.6.2.5 Distribution of Theta Estimates

Figure 22.4: Distribution of Theta Estimates for the English and Spanish Spelling Tasks

22.6.2.6 Wright Maps

Figure 22.5: Wright Maps Showing the Relationship Between Item and Person Location Estimates for the English Spelling Task
Figure 22.6: Wright Maps Showing the Relationship Between Item and Person Location Estimates for the Spanish Spelling Task

22.6.2.7 Model Summary

Table 22.5: Summary of Rasch Model Statistics for the English and Spanish Spelling Tasks

                              English                                Spanish
Characteristic                Item           Person                  Item           Person
                              N = 87         N = 2,738               N = 88         N = 661
Logit Scale Location          1.15 (2.08)    -0.09 (-1.54, 1.66)     0.98 (1.77)    0.25 (-1.57, 1.76)
Outfit                        1.08 (0.91)    0.69 (0.50, 0.89)       1.10 (0.59)    0.70 (0.49, 0.97)
Infit                         0.98 (0.13)    0.84 (0.68, 0.99)       1.02 (0.19)    0.85 (0.67, 1.01)
Reliability of Separation     0.8937         0.8560                  0.8871         0.8307

22.6.2.7.1 Final Number of Items

Following the exclusion of items with point-biserial correlations < .20 and items with poor fit statistics, the final versions of the English and Spanish tasks contain 87 and 88 items, respectively.

22.7 Criterion Validity Evidence

22.7.1 Sample

Table 22.6: Demographic Characteristics of the Concurrent Criterion Validity Evidence Samples for the English and Spanish Spelling Tasks

Characteristic                     English             Spanish
                                   G2, N = 212         G2, N = 221
Timepoint
    Spring 2024                    212 (100%)
    Spring 2025                                        221 (100%)
Race
    American/Alaskan Native        2 (0.9%)            4 (1.8%)
    Asian                          15 (7.1%)           9 (4.1%)
    Black/African American         27 (13%)            1 (0.5%)
    Not reported                   34 (16%)            50 (23%)
    Other                          45 (21%)            73 (33%)
    White                          89 (42%)            84 (38%)
Ethnicity
    Hispanic/Latin(o/a)            102 (48%)           202 (92%)
    Intentional nonreport          3 (1.4%)
    Not Hispanic/Latin(o/a)        106 (50%)           18 (8.2%)
    Unknown                        1                   1
Gender
    Female                         93 (44%)            110 (50%)
    Male                           119 (56%)           110 (50%)
    Unknown                                            1
Home Language
    English                        148 (70%)           54 (24%)
    Spanish                        38 (18%)            159 (72%)
    Other                          24 (11%)            8 (3.6%)
    Unknown                        2
English Proficiency Label
    (Re-)Classified Proficient     17 (8.1%)           41 (19%)
    English Learner                42 (20%)            136 (62%)
    English-only                   152 (72%)           42 (19%)
    Unknown                        1                   2
Ever IEP/504                       20 (13%)            16 (8.9%)
    Unknown                        54                  41

English Spelling was correlated with the Spelling subtest from the Woodcock-Johnson IV (WJ IV ACH) test (Schrank, McGrew, and Mather 2014). Spanish Spelling results are forthcoming.

Table 22.7: Concurrent Criterion Validity Correlations for the English and Spanish Spelling Tasks

          English                                                  Spanish
          All                          EL                          All
Grade     n      r [CI]                n     r [CI]                n      r [CI]
G2        212    0.80 [0.75, 0.84]     42    0.79 [0.64, 0.88]     220    0.72 [0.65, 0.78]
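
This section does not state how the confidence intervals in Table 22.7 were computed. One common approach is the Fisher z-transformation, sketched below under the assumption of a 95% level; the function name `fisher_ci` is hypothetical. Applied to the rounded values of r and n in the table, it appears to reproduce the reported intervals to two decimals.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)                             # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)                   # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# English, all students: n = 212, r = 0.80 (values from Table 22.7).
lo, hi = fisher_ci(0.80, 212)
print(f"[{lo:.2f}, {hi:.2f}]")  # prints "[0.75, 0.84]"
```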