The `maat` package performs adaptive testing based on an assessment framework involving multiple tests administered throughout the year, using multiple vertically scaled item pools and multiple phases within each test administration (Choi et al., 2022). It allows for transitioning from one item pool with associated constraints to another, as determined necessary according to a prespecified transition policy, to enhance the quality of measurement. Based on an item pool and test blueprint constraints, each phase or module is constructed as a computerized adaptive test (CAT) assembled dynamically using the shadow-test approach to CAT (van der Linden & Reese, 1998). The current version of `maat` supports an assessment design involving three tests/administrations (e.g., Fall, Winter, and Spring) with two phases within each administration (Phase 1 and Phase 2), so that an examinee takes six modules in total throughout the year. The assessment framework is adaptive at multiple levels:
An assessment under the multiple administrations adaptive testing design has the following structure.
Several assumptions are made to support the multiple administrations adaptive testing design.
A module is a fixed-length adaptive test constructed under the shadow-test framework. This section describes how each module is assembled.
The shadow-test approach to CAT (Choi & van der Linden, 2018; van der Linden & Reese, 1998) was developed to balance the need for sequential and simultaneous optimization in constrained CAT. The shadow-test approach uses the methodology of mixed-integer programming (MIP) to simultaneously select a full set of items that conforms to all content and other constraints and yet is optimal (e.g., most informative).
Given the item pool and test specifications/constraints, the shadow-test approach to CAT assembles a full-length test form, called a shadow test, using MIP for each item position upon updating the interim \(\theta\) estimates, \(\theta_{k}\), \(k = 1, 2, \dots, n\), where \(n\) is the test length. The optimal test assembly engine uses MIP to construct shadow tests optimized for the interim \(\theta\) estimates and conforming to all specifications and requirements, encompassing content constraints, exposure control, overlap control, and other practical constraints (e.g., enemy items, item types, etc.). The cycle is repeated until the intended test length \(n\) has been reached.
The methods by which the shadow-test approach formulates test assembly problems as constrained combinatorial optimization have been documented in van der Linden (2005) and implemented in the `TestDesign` package (Choi et al., 2021). Refer to Choi & van der Linden (2018) for more information about how the shadow-test approach creates an adaptive test as a sequence of optimally constructed full-length tests.
A standard procedure for choosing a shadow test (for a given examinee at a particular item position) among a potentially astronomical number of alternatives is to compare the objective values provided by the alternatives. The common objective function in its simplest form is:
\[\text{maximize}\sum_{i\,=\,1}^{I}I_{i}(\hat{\theta})x_{i}\]
where \(I_{i}(\hat{\theta})\) is the Fisher information for Item \(i\) at an estimated \(\theta\) value. It is also possible to add a random component to the objective function to reduce the overexposure of highly informative items for some or all item positions within a test. For example, the progressive method (Revuelta & Ponsoda, 1998) can be incorporated into the objective function so that at the beginning of the test the objective function combines a random component with item information, and as the test progresses the random component is reduced proportionately.
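The weighting idea can be sketched as a small R function. This is a minimal illustration of the progressive method, not the `TestDesign` implementation; the function name and the specific linear weighting scheme are assumptions for illustration.

```r
# Sketch of a progressive-method objective (Revuelta & Ponsoda, 1998).
# info: item information values at the current theta estimate
# k:    current item position (1..n); n: test length
progressive_objective <- function(info, k, n) {
  # weight on information grows linearly with item position
  w <- (k - 1) / (n - 1)
  # random component scaled to the range of the information values
  random_part <- runif(length(info), min = 0, max = max(info))
  (1 - w) * random_part + w * info
}
```

At the first item position the objective is dominated by the random component; at the last position it reduces to pure item information.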
Upon constructing/updating a shadow test, a single item is selected to be administered. Selecting an item from a shadow test is typically done by selecting the most informative item in the shadow test that has not yet been administered, as
\[\text{arg} \max_{i\,\in\,R}I_{i}(\hat{\theta}),\]
where \(R\) indicates the set of items in the current shadow test that have not been administered to the examinee. When the test is composed of item sets (e.g., reading passages), selecting a passage should precede selecting an item, which can be based on the average item information within each passage. Once a passage is selected, typically multiple items are selected before moving on to another passage.
In the MIP optimizer, passages are selected not directly but as a result of attempting to satisfy constraints. Given an item pool that has \(I\) items, a discrete assembly problem (i.e., not passage-based) uses \(I\) decision variables, one for each item in the pool. In a passage-based assembly with \(S\) available passages in the pool, \(S\) more decision variables are added to the existing \(I\) decision variables. The nested structure between items and passages is provided to the solver through the use of constraints.
Using the same information maximization criterion presented above, a shadow test that satisfies the criterion and the constraints is assembled/re-assembled for the administration of each item. From the shadow test, the passage to be administered to the examinee is determined using the following process.
First, if the examinee is currently not within a passage, the passage that has the largest mean information at the current \(\hat{\theta}\) is selected as the passage to be administered. The mean information is calculated from the shadow test. For example, suppose that Passage 1 consists of Items 1, 2, 3, 4, 5, and only Items 1, 2, 3 were included in the shadow test. In this case, the mean information of Passage 1 is computed from Items 1, 2, 3. After selecting the passage with the highest mean information, the item within the passage that has the largest information at the current \(\hat{\theta}\) is administered to the examinee. This marks the passage as the currently active passage.
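The passage-then-item selection described above can be sketched as a toy R function. The column names (`item_id`, `passage_id`, `info`) are assumptions for illustration, not the package's internal data structures.

```r
# shadow: data frame of items currently in the shadow test, with assumed
# columns item_id, passage_id, and info (information at theta-hat).
select_next <- function(shadow) {
  # mean information per passage, computed only over items in the shadow test
  mean_info <- tapply(shadow$info, shadow$passage_id, mean)
  best_passage <- names(which.max(mean_info))
  # within the selected passage, administer the most informative item
  within_passage <- shadow[shadow$passage_id == best_passage, ]
  within_passage$item_id[which.max(within_passage$info)]
}
```

For example, if Passage 1 contributes items with informations 0.5, 0.6, 0.7 (mean 0.6) and Passage 2 contributes 0.9 and 0.1 (mean 0.5), Passage 1 is selected even though Passage 2 contains the single most informative item.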
For the next shadow test, the assembly engine forces the selection of previously administered items and passages, including the currently active passage that contains the administered item. In this step, a different combination of items may be selected for the currently active passage. For example, suppose that Passage 1 consists of Items 1, 2, 3, 4, 5, and the constraint is to administer 3 items from each passage. If Items 1, 2, 3 were selected previously and Item 1 was administered, it is possible that Items 1, 3, 5 will be selected in the current shadow test. Given this combination, either Item 3 or Item 5 will be administered to the examinee, depending on which item has the largest information.
The maximum-information item-selection criterion causes overexposure of a small proportion of items while underexposing the rest. The shadow-test approach mitigates the problem by adding random item eligibility constraints to the test-assembly model so that items with higher exposure rates have higher probabilities of being temporarily ineligible. The `TestDesign` package implements the conditional item eligibility control method recently improved and extended (van der Linden & Choi, 2019). For each examinee, the `TestDesign` engine determines which items to constrain as temporarily ineligible from the item pool. The engine can also monitor the probabilities of ineligibility for all items conditional on different theta levels, such that the exposure rates for all items in the pool are tightly controlled within and across different theta segments (intervals) and kept below a maximum exposure rate set a priori (e.g., \(r^{\max} = 0.25\)).
More specifically, for each new examinee, prior to the administration of the test, the item eligibility control method conducts Bernoulli experiments (by theta segment) for the items in the pool to determine their eligibility for administration, with probabilities updated as a function of the actual exposure rates of the items. For any items determined to be ineligible, additional constraints are included in the test assembly model as follows:
\[\sum_{i\,\in\,V_j}{x_i} = 0\]
where \(x_i\) is the binary decision variable for the selection of Item \(i\); and \(V_j\) denotes the set of items determined to be ineligible for Examinee \(j\).
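The per-examinee Bernoulli experiments can be sketched in a few lines of R. The function name is an illustrative assumption; `p_eligible` stands for the current eligibility probabilities of the items within the examinee's theta segment.

```r
# Draw the ineligible set V_j for one examinee: each item stays eligible
# with its current eligibility probability.
draw_ineligible <- function(p_eligible) {
  eligible <- runif(length(p_eligible)) < p_eligible
  # indices of items to exclude via the constraint sum_{i in V_j} x_i = 0
  which(!eligible)
}
```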
The conditional item eligibility method monitors and updates the probabilities within a predetermined set of theta segments, e.g., \(\theta_1 \in (-\infty, -1.5)\), \(\theta_2 \in [-1.5, -0.5)\), \(\dots\), \(\theta_G \in [1.5, \infty)\). The conditional item-eligibility probabilities are represented as a recurrence relationship as follows:
\[\text{Pr}\{E_i | \theta\}\leq\frac{r^{\max}}{\text{Pr}\{A_i | \theta\}}\text{Pr}\{E_i | \theta\},\]
where \(\text{Pr}\{E_i | \theta\}\) is the conditional eligibility probability for Item \(i\) given \(\theta \in \theta_g\); and \(\text{Pr}\{A_i | \theta\}\) is the conditional exposure probability (rate) for the item. Theoretically, \(\text{Pr}\{A_i | \theta\}\) can be updated continuously as each examinee finishes the test. Let \(l = 1, 2, \dots\) denote the sequence of examinees taking the test. The conditional item-eligibility probabilities can then be updated continuously as:
\[\text{Pr}^{l+1}\{E_{i}|\theta\} = \min \bigg\{ \frac{r^{\max}} {\text{Pr}^{l}\{A_{i}|\theta\}} \text{Pr}^{l}\{E_{i}|\theta\}, 1 \bigg\}\]
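The update rule above is a direct, vectorizable computation. A minimal transcription in R:

```r
# Eligibility-probability update after examinee l, capped at 1.
# r_max:      target maximum exposure rate
# p_exposure: current Pr{A_i | theta} for each item
# p_eligible: current Pr{E_i | theta} for each item
update_eligibility <- function(p_eligible, p_exposure, r_max) {
  pmin(r_max / p_exposure * p_eligible, 1)
}
```

When an item's exposure rate exceeds \(r^{\max}\), its eligibility probability shrinks; when it is below \(r^{\max}\), the probability grows back toward 1.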
However, in the context of a large number of concurrent test instances, updating the exposure counts in real time after each instance can be difficult and perhaps not necessary. One complication with the conditional item eligibility control method is that, as the test progresses, examinees may move in and out of segments and can be subject to different sets of eligible items, as they typically visit more than one theta segment over the course of a test administration. van der Linden & Choi (2019) elaborate on the issue and provide a workaround. Unconditional exposure control is much more straightforward to implement and can be preferred in many testing situations. The `TestDesign` package implements the conditional item eligibility control method based on configurable \(\theta\) segments. Defining one big segment of \(\theta\) simplifies the method to the unconditional case.
Overlap control might be needed to prevent or reduce the amount of intra-individual overlap in test content across administrations. The item eligibility control method can be used to make all items previously seen by the examinee ineligible for the current administration by imposing constraints similarly as
\[\sum_{i\,\in\,s_{j}}{x_{i}} = 0,\]
where \(s_j\) denotes the set of items Examinee \(j\) has seen prior to the current administration. Imposing these hard constraints can unduly limit the item pool and potentially affect the quality of measurement. To avoid infeasibility and degradation of measurement, we can instead impose soft constraints in the form of a modification to the maximum information objective function as
\[\text{maximize} \sum_{i\,=\,1}^{I}I_{i}(\hat{\theta})\, x_{i} - M \sum_{i\,\in\,s_{j}}{x_{i}},\]
where \(M\) is a penalty for selecting an item from \(s_j\), the subset of items previously administered to Examinee \(j\). This modification to the objective function can effectively deter the selection of previously administered items unless absolutely necessary for feasibility of the model.
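A toy R illustration of the penalized objective shows the deterrence effect: a previously seen item can win only if its information exceeds every unseen item's by more than \(M\). The function and values below are illustrative assumptions.

```r
# Per-item contribution to the penalized objective.
# seen is a 0/1 indicator for membership in s_j.
objective <- function(info, seen, M) info - M * seen

info <- c(1.2, 0.9, 2.0)  # item information at theta-hat
seen <- c(0, 0, 1)        # Item 3 was previously administered (in s_j)
best <- which.max(objective(info, seen, M = 100))  # Item 3 is deterred
```

Here `best` is 1: although Item 3 is the most informative, the penalty pushes its objective value far below the unseen items.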
Although the same item eligibility constraints for inter-individual exposure control can be used to control intra-individual item overlap, the mechanism for identifying ineligible items for the intra-individual overlap control is quite different. It requires tracking the examinee records across test administrations, which may be days, weeks, or months apart. As the number of administrations increases, the ineligible item set (\(s_j\)) can grow quickly and progressively degrade the quality of measurement. To prevent the ineligible item set from growing quickly, \(s_j\) may need to be defined based only on the immediately preceding test administration. Another possibility is to let the penalty \(M\) be subject to exponential decay over test administrations:
\[M\cdot e^{-\lambda t},\]
where \(\lambda\) is a disintegration constant; and \(t\) is a time interval in some unit.
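The decayed penalty is a one-line computation. Interpreting \(t\) as the number of administrations since the item was last seen is an assumption here; any time unit works as long as \(\lambda\) is scaled accordingly.

```r
# Penalty M decayed by exp(-lambda * t) over elapsed time t.
decayed_penalty <- function(M, lambda, t) {
  M * exp(-lambda * t)
}
```

For example, with \(\lambda = \log 2\) the penalty halves with each unit of elapsed time, so items seen long ago are penalized only lightly.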
The `maat` package uses hard constraints to perform overlap control. Three options are available:
- `all`: If an examinee sees an item, the item is excluded from shadow tests in all subsequent modules. For example, if an examinee sees an item in Phase 1 of Administration 1, then the item is excluded in Phase 2 of Administration 1 and in all subsequent phases and administrations. In passage-based assembly, if an examinee sees at least one item from a passage, the entire passage is excluded from shadow tests in all following modules.
- `within_test`: If an examinee sees an item, the item is excluded from shadow tests in following phases within the current administration. The item is made available again in subsequent administrations. For example, if an examinee sees an item in Phase 1 of Administration 1, then the item is excluded in Phase 2 of Administration 1 but is made available in Phase 1 of Administration 2. The same is done for passage-based assembly.
- `none`: An examinee can see any item twice (or more) in any phases and administrations. For example, if an examinee sees an item in Phase 1 of Administration 1, the examinee can see the same item in Phase 2 of Administration 1.

The stopping rule describes the criteria used to terminate a CAT. The stopping rule is based on the overall number of required points and the total number of items denoted in the constraint file.
The `maat` package supports the expected a posteriori (EAP), maximum likelihood estimation (MLE), and maximum likelihood estimation with fence (MLEF) methods available in the `TestDesign` package for \(\theta\) estimation. The estimation method must be specified in `createShadowTestConfig()`.
The MLE and MLEF methods in `TestDesign` have extra fallback routines for performing CAT:
In a `maat()` simulation, two types of ability estimates are obtained after completing each module.
The `combine_policy` option in `maat()` determines which type of ability estimate to use for routing, after computing both phase-wise and administration-wise ability estimates.

In each module (except for the very first), the initial estimate that is in place before administering the first item is the final routing estimate from the previous module. The initial estimates can be manually specified for each examinee and for each module by supplying a list to the `initial_theta_list` argument in `maat()`. The list must be accessible using `initial_theta_list[[module_index]][[examinee_id]]`. In the example assessment structure in this document, `module_index` ranges from 1 to 6. The value of `examinee_id` is a string that is used in the `examinee_list` object.
Transitioning between phases and between tests is governed by the rules described in this section. These so-called transition rules are generally based on theta estimates (and confidence intervals) and the cut scores defining the performance levels for each grade. There are also restrictions that override the general rules. Two routing rules are implemented in the `maat` package: the Confidence Interval approach and the Difficulty Percentile approach.
The cut scores for achievement levels must be defined to be able to perform routing between grades. For example, if there are four achievement levels (e.g., Beginning, Developing, Proficient, and Advanced), then three cut scores are necessary for each grade.
Routing is performed between modules and also between tests. For example, routing is performed between Fall Phase 1 and Fall Phase 2, and also between Fall Phase 2 and Winter Phase 1. Because an examinee takes 6 modules in total, routing is performed 5 times for each examinee throughout the entire assessment.
The routing structure is now described. Let \(G\) denote the grade of record of an examinee.
All examinees begin at grade \(G\), the grade of record.
After completing a module, a routing theta is determined. The routing theta is used to:
Different types of \(\theta\) estimates are used for the routing theta.
In each administration, after completing Phase 1, \(\theta_1\) is used as the routing theta for the following module.
In each administration, after completing Phase 2, the ability estimate from the current administration is used as the routing theta. This is either \(\theta_2\) or \(\theta_{1+2}\), depending on the combine policy:
- If the combine policy is `always`, then \(\theta_{1+2}\) is used as the routing theta;
- if `never`, then \(\theta_2\) is used as the routing theta;
- if `conditional`, then \(\theta_{1+2}\) is used as the routing theta when the examinee was in the same grade item pool in Phase 1 and Phase 2, and \(\theta_2\) is used otherwise.

Using the routing theta and the cut scores, the achievement level is determined. The achievement level is:
Using the achievement level, routing is performed:
There are four restrictions imposed by default on the routing rule.
As a result of these restrictions, an examinee can be routed to \(G - 1\) at a minimum and \(G + 2\) at a maximum. For example, a \(G = 4\) examinee can potentially be routed to Grades 3-6. A \(G = 4\) examinee can never be routed to Grade 7 or above, or below Grade 3, in any module.
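The net effect of the restrictions is a clamp on the proposed grade. A minimal sketch, assuming the `route_limit_below = 1` and `route_limit_above = 2` bounds used in this document (the function name and signature are illustrative):

```r
# Clamp a proposed routing grade to [G - below, G + above],
# where G is the examinee's grade of record.
clamp_grade <- function(proposed, G, below = 1, above = 2) {
  min(max(proposed, G - below), G + above)
}
```

For a \(G = 4\) examinee, a proposed Grade 7 is clamped to 6 and a proposed Grade 2 is clamped to 3.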
The following diagrams visually summarize the permissible routing paths between modules and tests. The paths highlighted in red are due to the restrictions described above.
The examinee is routed based on the performance in each phase. The performance is quantified not as a point estimate of \(\theta\), but as a confidence interval. The confidence interval approach (Eggen & Straetmans, 2000) can be used with MLE scoring (Yang et al., 2006) and can be easily extended to multiple cut scores (Thompson, 2007).
In the confidence interval approach, the lower and upper bounds of the routing theta are computed as
\[\hat{\theta}_{L} = \hat{\theta} - z_{\alpha} \cdot SE(\hat{\theta}),\] and
\[\hat{\theta}_{U} = \hat{\theta} + z_{\alpha} \cdot SE(\hat{\theta}),\]
where \(z_{\alpha}\) is the normal deviate corresponding to a \(1 - \alpha\) confidence interval, \(\hat{\theta}\) is the routing theta, and \(\hat{\theta}_{L}\) and \(\hat{\theta}_{U}\) are the lower and upper boundary theta values.
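The two bounds can be computed with a small helper. Whether \(z_{\alpha}\) is taken as a one- or two-sided deviate depends on convention; the two-sided `qnorm(1 - alpha / 2)` used below is an assumption of this sketch.

```r
# Lower and upper confidence bounds around the routing theta.
ci_bounds <- function(theta_hat, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # two-sided normal deviate (assumed convention)
  c(lower = theta_hat - z * se, upper = theta_hat + z * se)
}
```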
Once the boundary values are calculated, \(\hat{\theta}_{L}\) and \(\hat{\theta}_{U}\) are used to identify the achievement level of the examinee:
In difficulty percentile routing, prespecified cut scores are ignored. Instead, cut scores are determined based on the item difficulty parameters of the current item pool for the module.
Once the cut scores are calculated, the routing theta \(\hat{\theta}\) is used to identify the achievement level of the examinee as:
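Deriving the cut scores from pool difficulty can be sketched as quantiles of the difficulty parameters. The default percentile levels below mirror the `transition_percentile_lower`/`transition_percentile_upper` values used later in this document; the function itself is an illustrative assumption, not the `maat` internals.

```r
# Cut scores as percentiles of the current pool's item difficulty parameters b.
difficulty_cuts <- function(b, lower = 0.05, upper = 0.95) {
  quantile(b, probs = c(lower, upper))
}
```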
This section explains how to use the `maat` package.
The first step is to define an assessment structure using the `createAssessmentStructure()` function. In what follows, we specify 3 tests with 2 phases in each test. Route limits are specified as 1 below and 2 above to match the assessment structure diagram shown above. That is, for examinees in Grade \(G\), routing is limited to item pools between \(G-1\) and \(G+2\).
```r
assessment_structure <- createAssessmentStructure(
  n_test  = 3,
  n_phase = 2,
  route_limit_below = 1,
  route_limit_above = 2
)
```

The next step is to create an examinee list using `simExaminees()`. An example is given below:
```r
cor_v <- matrix(.8, 3, 3)
diag(cor_v) <- 1
set.seed(1)

examinee_list <- simExaminees(
  N      = 10,
  mean_v = c(0, 0.5, 1.0),
  sd_v   = c(1, 1, 1),
  cor_v  = cor_v,
  assessment_structure = assessment_structure,
  initial_grade = "G4",
  initial_phase = "P1",
  initial_test  = "T1"
)
```

For each examinee we simulate three true theta values, one for each test administration. In the example above, the true theta values are drawn from a multivariate normal distribution specified by a variance-covariance matrix with all covariances between thetas set to \(0.8\) and all variances set to \(1.0\).
Each argument of `simExaminees()` is defined as follows:
- `N` is the number of examinees to simulate.
- `mean_v` is the mean theta to use in generating \(\theta\) values. This must be a vector of three values, each element corresponding to an administration.
- `sd_v` is the standard deviation to use in generating \(\theta\) values. This must be a vector of three values, each corresponding to an administration.
- `cor_v` is the correlation structure to use in generating \(\theta\) values. This must be a \(3 \times 3\) matrix, each dimension corresponding to an administration.
- `assessment_structure` is the assessment structure object created previously using `createAssessmentStructure()`.
- `initial_grade` is the grade of record to use for all examinees. This must be in the format `G?`, where `?` is a number.
- `initial_phase` is the phase in which all examinees are placed at the beginning of the assessment. This must be in the format `P?`, where `?` is a number.
- `initial_test` is the administration in which all examinees are placed at the beginning of the assessment. This must be in the format `T?`, where `?` is a number.

The next step is to load the module specification sheet using `loadModules()`. The `maat` package allows for using different item pools and constraints across different stages of testing. This requires a module specification sheet that specifies which item pools and constraints are used for each grade, test, and phase. An example module specification sheet is displayed below:
| Grade | Test | Phase | Module | Constraints | ItemPool | ItemAttrib | PassageAttrib |
|---|---|---|---|---|---|---|---|
| G3 |  | P1 | ModuleMATH3P1N | extdata/constraints_MATH3_P1.csv | extdata/pool_MATH_normal_N500_Grade03.csv | extdata/item_attrib_MATH_normal_N500_Grade03.csv | NA |
| G3 |  | P2 | ModuleMATH3P2N | extdata/constraints_MATH3_P2.csv | extdata/pool_MATH_normal_N500_Grade03.csv | extdata/item_attrib_MATH_normal_N500_Grade03.csv | NA |
| G4 | T1 | P1 | ModuleMATH4T1P1N | extdata/constraints_MATH4_T1P1.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G4 | T1 | P2 | ModuleMATH4T1P2N | extdata/constraints_MATH4_T1P2.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G4 | T2 | P1 | ModuleMATH4T2P1N | extdata/constraints_MATH4_T2P1.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G4 | T2 | P2 | ModuleMATH4T2P2N | extdata/constraints_MATH4_T2P2.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G4 | T3 | P1 | ModuleMATH4T3P1N | extdata/constraints_MATH4_T3P1.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G4 | T3 | P2 | ModuleMATH4T3P2N | extdata/constraints_MATH4_T3P2.csv | extdata/pool_MATH_normal_N500_Grade04.csv | extdata/item_attrib_MATH_normal_N500_Grade04.csv | NA |
| G5 |  | P1 | ModuleMATH5P1N | extdata/constraints_MATH5_P1.csv | extdata/pool_MATH_normal_N500_Grade05.csv | extdata/item_attrib_MATH_normal_N500_Grade05.csv | NA |
| G5 |  | P2 | ModuleMATH5P2N | extdata/constraints_MATH5_P2.csv | extdata/pool_MATH_normal_N500_Grade05.csv | extdata/item_attrib_MATH_normal_N500_Grade05.csv | NA |
| G6 |  | P1 | ModuleMATH6P1N | extdata/constraints_MATH6_P1.csv | extdata/pool_MATH_normal_N500_Grade06.csv | extdata/item_attrib_MATH_normal_N500_Grade06.csv | NA |
| G6 |  | P2 | ModuleMATH6P2N | extdata/constraints_MATH6_P2.csv | extdata/pool_MATH_normal_N500_Grade06.csv | extdata/item_attrib_MATH_normal_N500_Grade06.csv | NA |
| G7 |  | P1 | ModuleMATH7P1N | extdata/constraints_MATH7_P1.csv | extdata/pool_MATH_normal_N500_Grade07.csv | extdata/item_attrib_MATH_normal_N500_Grade07.csv | NA |
| G7 |  | P2 | ModuleMATH7P2N | extdata/constraints_MATH7_P2.csv | extdata/pool_MATH_normal_N500_Grade07.csv | extdata/item_attrib_MATH_normal_N500_Grade07.csv | NA |
| G8 |  | P1 | ModuleMATH8P1N | extdata/constraints_MATH8_P1.csv | extdata/pool_MATH_normal_N500_Grade08.csv | extdata/item_attrib_MATH_normal_N500_Grade08.csv | NA |
| G8 |  | P2 | ModuleMATH8P2N | extdata/constraints_MATH8_P2.csv | extdata/pool_MATH_normal_N500_Grade08.csv | extdata/item_attrib_MATH_normal_N500_Grade08.csv | NA |
The sheet must have eight columns.
- `Grade` must be in the format `G?`, where `?` is a number.
- `Test` must be in the format `T?`, where `?` is a number.
- `Phase` must be in the format `P?`, where `?` is a number.
- `Module` is the name used to identify the module.
- `Constraints` is the file path of a constraints file readable by `loadConstraints()` in the `TestDesign` package.
- `ItemPool` is the file path of an item pool file readable by `loadItemPool()` in the `TestDesign` package.
- `ItemAttrib` is the file path of an item attributes file readable by `loadItemAttrib()` in the `TestDesign` package.
- `PassageAttrib` is the file path of a passage attributes file readable by `loadStAttrib()` in the `TestDesign` package.

Load the module specification sheet using `loadModules()`.
```r
fn <- system.file("extdata", "module_definition_MATH_normal_N500_flexible.csv", package = "maat")
module_list <- loadModules(
  fn = fn,
  base_path = system.file(package = "maat"),
  assessment_structure = assessment_structure,
  examinee_list = examinee_list
)
```

```
## Required modules: 24
## Using base path: /private/var/folders/dn/2ccsybvs25ncgrj3b4qd77sh0000gq/T/Rtmpu5tagi/Rinst74e67d2f11c/maat
## Loading 24 modules
## Grade G3 Test T1 Phase P1 : Module ModuleMATH3P1N
## Grade G3 Test T1 Phase P2 : Module ModuleMATH3P2N
## Grade G3 Test T2 Phase P1 : Module ModuleMATH3P1N
## Grade G3 Test T2 Phase P2 : Module ModuleMATH3P2N
## Grade G3 Test T3 Phase P1 : Module ModuleMATH3P1N
## Grade G3 Test T3 Phase P2 : Module ModuleMATH3P2N
## Grade G4 Test T1 Phase P1 : Module ModuleMATH4T1P1N
## Grade G4 Test T1 Phase P2 : Module ModuleMATH4T1P2N
## Grade G4 Test T2 Phase P1 : Module ModuleMATH4T2P1N
## Grade G4 Test T2 Phase P2 : Module ModuleMATH4T2P2N
## Grade G4 Test T3 Phase P1 : Module ModuleMATH4T3P1N
## Grade G4 Test T3 Phase P2 : Module ModuleMATH4T3P2N
## Grade G5 Test T1 Phase P1 : Module ModuleMATH5P1N
## Grade G5 Test T1 Phase P2 : Module ModuleMATH5P2N
## Grade G5 Test T2 Phase P1 : Module ModuleMATH5P1N
## Grade G5 Test T2 Phase P2 : Module ModuleMATH5P2N
## Grade G5 Test T3 Phase P1 : Module ModuleMATH5P1N
## Grade G5 Test T3 Phase P2 : Module ModuleMATH5P2N
## Grade G6 Test T1 Phase P1 : Module ModuleMATH6P1N
## Grade G6 Test T1 Phase P2 : Module ModuleMATH6P2N
## Grade G6 Test T2 Phase P1 : Module ModuleMATH6P1N
## Grade G6 Test T2 Phase P2 : Module ModuleMATH6P2N
## Grade G6 Test T3 Phase P1 : Module ModuleMATH6P1N
## Grade G6 Test T3 Phase P2 : Module ModuleMATH6P2N
```

- `fn`: the file path of the module specification sheet.
- `examinee_list`: the examinee list object created above using `simExaminees()`. This is used to determine the required modules.
- `base_path`: the value of this argument is pasted to the beginning of the file paths in the sheet.
In the above example, if `base_path` is `inst`, then the function will attempt to read `inst/extdata/constraints_MATH8_P2.csv`.

Cut scores must be stored in a `list` object. For each grade, at least two cut scores must exist. When the number of cut scores for a single grade is more than two, only the first and the last entries are used. An example is given below:
```r
cut_scores <- list(
  G3 = c(-1.47, -0.55, 0.48),
  G4 = c(-1.07, -0.15, 0.88),
  G5 = c(-0.67,  0.25, 1.28),
  G6 = c(-0.27,  0.65, 1.68),
  G7 = c( 0.13,  1.05, 2.08),
  G8 = c( 0.53,  1.45, 2.48)
)
```

The next step is to create a config object using `createShadowTestConfig()` in the `TestDesign` package to set various shadow-test configuration options. For example, the final theta estimation method in `final_theta$method` can be set to `EAP`, `MLE`, or `MLEF`.
The exclude policy in `exclude_policy$method` must be set to `SOFT` to use the Big-M method discussed in the Overlap Control section above. The value in `exclude_policy$M` is the penalty value.
```r
library(TestDesign)
config <- createShadowTestConfig(
  interim_theta  = list(method = "MLE"),
  final_theta    = list(method = "MLE"),
  exclude_policy = list(method = "SOFT", M = 100)
)
```

Alternatively, a list of config objects can be used to apply a separate config to each module. The list must have a length of 6, as per the assessment structure outlined above.
```r
config_list <- list()
for (i in 1:6) {
  config_list[[i]] <- createShadowTestConfig(
    exclude_policy = list(method = "SOFT", M = 100)
  )
}
```

The final step is to run the main simulation using `maat()`.
Two examples follow, using the `CI` and `pool_difficulty_percentile` transition policies:

```r
set.seed(1)
maat_output_CI <- maat(
  examinee_list        = examinee_list,
  assessment_structure = assessment_structure,
  module_list          = module_list,
  config               = config,
  cut_scores           = cut_scores,
  overlap_control_policy = "within_test",
  transition_policy      = "CI",
  combine_policy         = "conditional",
  transition_CI_alpha    = 0.05
)

set.seed(1)
maat_output_difficulty <- maat(
  examinee_list        = examinee_list,
  assessment_structure = assessment_structure,
  module_list          = module_list,
  config               = config_list,
  cut_scores           = cut_scores,
  overlap_control_policy = "within_test",
  transition_policy      = "pool_difficulty_percentile",
  combine_policy         = "conditional",
  transition_CI_alpha    = 0.05,
  transition_percentile_lower = 0.05,
  transition_percentile_upper = 0.95
)
```

- `examinee_list` is the examinee list object created above.
- `module_list` is the module list object created above.
- `config` is the shared config object created above. This can also be a list of config objects to use a separate configuration for each module.
- `cut_scores` is the cut scores list object created above.
- `overlap_control_policy` specifies the type of overlap control:
  - `all` performs overlap control across administrations. This forbids an item from being given more than once within and across test administrations.
  - `within_test` performs overlap control within each test administration. This forbids an item from being given more than once within each administration but allows an item to be given more than once across administrations.
  - `none` does not perform overlap control.
This allows an item to be given more than once within each administration (between phases) and across administrations.

- `transition_policy` specifies the type of item pool transition policy:
  - `CI` uses confidence intervals on theta estimates to perform routing between modules or tests.
  - `pool_difficulty_percentile` uses item difficulty percentiles of all items in the current item pool to perform routing.
  - `pool_difficulty_percentile_exclude_administered` uses item difficulty percentiles of all items in the current item pool, excluding items administered to the examinee, to perform routing.
  - `on_grade` makes all examinees remain in the item pool for the grade level of record.
- `combine_policy` specifies which type of theta is used to perform routing. This is only utilized at the end of each administration.
  - `conditional` uses the combined theta estimate (obtained from combining Phase 1 and Phase 2 responses) as the routing theta if the examinee was in the same grade item pool in Phase 1 and Phase 2. If the examinee was in different item pools in Phase 1 and Phase 2, then the Phase 2 estimate is used as the routing theta.
  - `always` uses the combined theta estimate as the routing theta.
  - `never` uses the Phase 2 estimate as the routing theta.
- `transition_CI_alpha` is the alpha level to use in conjunction with the CI-based transition policy.
- `transition_percentile_lower` is the lower percentile value to use in conjunction with the difficulty-percentile-based transition policy.
- `transition_percentile_upper` is the upper percentile value to use in conjunction with the difficulty-percentile-based transition policy.

The `plot(type = "route")` function can be used to plot the number of examinees routed to each module.
The function accepts an `output_maat` object produced by `maat()`.

`CI` transition policy:

```r
plot(maat_output_CI, type = "route")
```

`pool_difficulty_percentile` transition policy:

```r
plot(maat_output_difficulty, type = "route")
```

The `plot(type = "correlation")` function can be used to plot true \(\theta\)s and estimated \(\theta\)s across test administrations. The arguments of `plot()` for this use case are:

- `x` is the output object returned by `maat()`;
- `theta_range` is the \(\theta\) range to be used for plotting;
- `main` is a vector of plot titles.

`CI` transition policy:

```r
plot(
  x = maat_output_CI,
  type = "correlation",
  theta_range = c(-4, 4),
  main = c("Fall", "Winter", "Spring")
)
```

`pool_difficulty_percentile` transition policy:

```r
plot(
  x = maat_output_difficulty,
  type = "correlation",
  theta_range = c(-4, 4),
  main = c("Fall", "Winter", "Spring")
)
```

The `plot(type = "audit")` function can also be used to plot interim \(\theta\) estimates over modules for a single examinee. The arguments of `plot()` for this use case are:

- `x` is the output object returned by `maat()`;
- `examinee_id` is the examinee ID to plot.

```r
plot(
  x = maat_output_CI,
  type = "audit",
  examinee_id = 1
)
```