Test Operations
Test Form Construction
Test Composition
Standard 4.12 Test developers should document the extent to which the content domain of a test represents the domain defined in the test specifications.
Standard 4.7 The procedures used to develop, review, and try out items and to select items from the item pool should be documented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Each test form is created to match specifications regarding the number and type of items (multiple-choice, constructed-response) within content subareas. Each test form includes a percentage of items designated as non-scorable, so that data can be collected on their psychometric characteristics before they contribute to examinee scores. These items are made scorable on future test forms only after an item analysis of responses is conducted and the items are found to meet acceptable psychometric criteria. Information regarding test composition is available to examinees and others in the Test Objectives. Below is a sample of test composition information for a Mathematics test.
Subarea | Subarea Title | Range of Objectives | Approximate Percentage of Questions on Test |
---|---|---|---|
I | Mathematical Processes and Number Concepts | 001–004 | 22% |
II | Patterns, Algebraic Relationships, and Functions | 005–009 | 28% |
III | Measurement and Geometry | 010–013 | 22% |
IV | Data Analysis, Statistics, Probability, and Discrete Mathematics | 014–018 | 28% |
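To illustrate how such a composition table guides form assembly, the following sketch allocates scorable multiple-choice items to subareas in proportion to the percentages above. The 80-item form length is a hypothetical value chosen for illustration, not an MTTC specification.

```python
# Illustrative sketch only: the 80-item form length is hypothetical,
# not an actual MTTC specification.
composition = {
    "I. Mathematical Processes and Number Concepts": 0.22,
    "II. Patterns, Algebraic Relationships, and Functions": 0.28,
    "III. Measurement and Geometry": 0.22,
    "IV. Data Analysis, Statistics, Probability, and Discrete Mathematics": 0.28,
}

total_scorable_items = 80  # hypothetical form length

# Allocate items proportionally, then distribute rounding remainders
# to the subareas with the largest fractional parts.
raw = {name: pct * total_scorable_items for name, pct in composition.items()}
counts = {name: int(value) for name, value in raw.items()}
remainder = total_scorable_items - sum(counts.values())
for name in sorted(raw, key=lambda n: raw[n] - counts[n], reverse=True)[:remainder]:
    counts[name] += 1

for name, n in counts.items():
    print(f"{name}: {n} items")
```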
Relationships Among Test Forms
One or more forms of each test are administered during a given testing period. New forms are typically constructed after a sufficient number of examinees have responded to the items on a previous test form, so that the previous form can be used for equating. Typically, the scorable multiple-choice items on the new form are embedded in the previous form, either as scorable or non-scorable items. The creation of new forms continues until a pool of forms has been created for rotation. In order to establish continuity and consistency across different forms of a test, the following relationships among test forms are maintained as subsequent test forms are constructed:
- Content relationships. Each form of the test is constructed to be comparable to previous test forms with respect to content coverage. This is accomplished by selecting items for the test according to the proportions provided in the test objectives for the field.
- Statistical relationships. For fields for which performance data are available (most fields), new test forms are constructed to be comparable to the previous test form in estimated overall test difficulty. The overall test form difficulty is determined by averaging item p-values (percent of examinees answering the item correctly) for the scorable items on the form, obtained from operational administrations when possible, or from field test administrations before operational p-values are available (a brief illustration follows this list). In addition, the test results for each form are statistically equated to those of the previous form to enable comparability of passing decisions across administrations. See Test Equating for further information.
- Relationships among constructed-response items. Sets of constructed-response items are designed to be comparable to one another. For example, if a test has two constructed-response items, the items in the bank for Constructed-Response Item Type #1 are designed to be comparable in regard to the difficulty level and the performance characteristics measured. The same is true for Constructed-Response Item Type #2. Comparability of the constructed-response items across test forms is established through several activities, including preparation of item specifications for creating multiple items of a type, field testing of items, establishing marker responses that exemplify the score points, and the training and calibration of scorers. See Establishing Comparability of Constructed-response Items for further information.
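As the brief illustration referenced under Statistical relationships above, the following sketch (with invented item p-values, not program data) computes the mean item p-value for the scorable items on two hypothetical forms and compares the resulting difficulty estimates.

```python
# Hypothetical item p-values (proportion of examinees answering correctly);
# invented for illustration, not actual MTTC data.
previous_form_p_values = [0.72, 0.65, 0.81, 0.58, 0.69, 0.74, 0.63, 0.77]
new_form_p_values      = [0.70, 0.68, 0.79, 0.61, 0.66, 0.75, 0.64, 0.73]

def mean_p(p_values):
    """Average item p-value, used as an estimate of overall form difficulty."""
    return sum(p_values) / len(p_values)

prev_difficulty = mean_p(previous_form_p_values)
new_difficulty = mean_p(new_form_p_values)
print(f"Previous form mean p-value: {prev_difficulty:.3f}")
print(f"New form mean p-value:      {new_difficulty:.3f}")
print(f"Difference:                 {new_difficulty - prev_difficulty:+.3f}")
```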
Test Administration
Standard 6.1 Test administrators should follow carefully the standardized procedures for administration and scoring specified by the test developer and any instructions from the test user.
Standard 6.3 Changes or disruptions to standardized test administration procedures or scoring should be documented and reported to the test user.
Standard 6.4 The testing environment should furnish reasonable comfort with minimal distractions to avoid construct-irrelevant variance. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
As of July 2018, the MTTC tests are administered under standardized, controlled procedures at Pearson-authorized computer-based test centers. Prior to July 2018, all MTTC tests were also available at paper-based test sites.
Test administrations are designed to provide a secure, controlled testing environment with minimal distractions so as to minimize the possibility of irrelevant characteristics affecting examinees' scores. Test takers are monitored continuously throughout the test administration. Test sites adhere to guidelines relating to test security, accessibility, lighting, workspace, comfort, and quiet surroundings. Test administrators follow documented, standardized procedures for test administration. Procedures are in place for the documentation, review, and resolution of any deviations from standard administration procedures.
Test Security
Standard 6.6 Reasonable efforts should be made to ensure the integrity of test scores by eliminating opportunities for test takers to attain scores by fraudulent or deceptive means. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Examinee security. In order to prevent opportunities for test-taking fraud, examinees are required to present original (no photocopies) and valid (unexpired) government-issued identification bearing their photograph and signature. Additional biometrics, including a digital signature and a palm vein scan, may be collected at the test center. Before testing, examinees agree to policies regarding prohibited materials and activities, which are strictly enforced at test administrations.
Test form security. In order to minimize the possibility of breaches in test security and to help ensure the integrity of examinee test scores, a number of guidelines, as indicated below, are followed related to the construction and administration of MTTC test forms. These guidelines are followed as allowed by sufficient examinee numbers; exceptions may be made for test fields taken by few examinees.
- Multiple-choice items are re-ordered within a subarea from one examinee to another when administered by computer so that examinees taking the same test form within a testing period receive the items in a different order. Additionally, an examinee assigned the same test form in different testing periods receives the multiple-choice items in a different order.
- Multiple test forms are administered within a testing period for most fields. Additionally, for most but not all fields, multiple sets of constructed-response items are assigned to examinees within a testing period, independently of the multiple-choice test form assigned. Therefore, different examinees typically receive different test items within a testing period. Whether a single constructed-response item or multiple constructed-response items are available in a given testing window depends mainly on the number of test takers for the test field.
- Test forms and constructed-response items for most fields are typically changed from one testing period to another so that most examinees who retest in an adjacent testing period receive different sets of test items.
Information to Test Takers
Standard 8.2 Test takers should be provided in advance with as much information about the test, the testing process, the intended test use, test scoring criteria, testing policy, availability of accommodations, and confidentiality protection as is consistent with obtaining valid responses and making appropriate interpretations of test scores.
Standard 4.16 The instructions presented to test takers should contain sufficient detail so that test takers can respond to a task in the manner that the test developer intended. When appropriate, sample materials, practice or sample questions, criteria for scoring, and a representative item identified with each item format or major area in the test's classification or domain should be provided to the test takers prior to the administration of the test, or should be included in the testing material as part of the standard administration instructions.
Standard 6.5 Test takers should be provided appropriate instructions, practice, and other support necessary to reduce construct-irrelevant variance. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
The MTTC website provides program and test information to test takers, educator preparation programs, and the public. It provides information and guidance useful to examinees before, during, and after testing, including the following:
- Tests needed to meet certification requirements
- Characteristics of the tests
- Test composition and length
- Testing policies and rules
- Day of the test information
- Alternative testing arrangements
- Study guides
- Practice tests
- Test preparation video
- Computer-based testing tutorial
- A guided tour of a computer testing center
- Test scoring and determination of passing status
- Score report explanation
- Score reporting policies
- Faculty guide and other assessment information for educator preparation program providers
A "Contact Us" page includes email, phone, live chat, and mail information, as well as a secure document uploader for examinees who may have questions or need to submit documentation.
Testing Accommodations
Standard 3.9 Test developers and/or test users are responsible for developing and providing test accommodations, when appropriate and feasible, to remove construct-irrelevant barriers that otherwise would interfere with examinees' ability to demonstrate their standing on the target constructs.
Standard 6.2 When formal procedures have been established for requesting and receiving accommodations, test takers should be informed of these procedures in advance of testing. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Alternative testing arrangements for the MTTC tests are available upon request to examinees who provide appropriate documentation of a disability. The alternative arrangements are designed to provide accommodations for the test and/or administration conditions to remove construct-irrelevant barriers and enable the accurate assessment of the knowledge and skills that are being measured. Construct-irrelevant barriers include those obstacles to accessibility (e.g., text size) that impede an examinee from demonstrating his or her ability on the constructs the test is intended to measure. Examinees are accommodated on a case-by-case basis according to the alternative arrangement(s) needed and are not restricted to a pre-determined list.
Accommodations are requested, reviewed, and provided according to standardized procedures, as described on the MTTC website. Examinees who are granted accommodations are notified in writing of the alternative arrangements. Test administrators are notified of the accommodations and provided with instructions regarding any changes to testing procedures.
Scoring
Standard 6.8 Those responsible for test scoring should establish scoring protocols. Test scoring that involves human judgment should include rubrics, procedures, and criteria for scoring. When scoring of complex responses is done by computer, the accuracy of the algorithm and processes should be documented.
Standard 6.9 Those responsible for test scoring should establish and document quality control processes and criteria. Adequate training should be provided. The quality of scoring should be monitored and documented. Any systematic source of scoring errors should be documented and corrected. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Multiple-Choice Item Scoring
Responses to the multiple-choice items for the MTTC tests are recorded and scored at each authorized computer-based testing center and sent electronically to Evaluation Systems, where all responses are rescored for verification. A single point is awarded for each correct response and no points are awarded for an incorrect response. The raw score for the multiple-choice section is the total number of multiple-choice items answered correctly. The raw score is transformed to a scale ranging from 100 to 300.
Constructed-Response Item Scoring
Standard 4.20 The process for selecting, training, qualifying, and monitoring scorers should be specified by the test developer. The training materials, such as the scoring rubrics and examples of test takers' responses that illustrate the levels on the rubric score scale, and the procedures for training scorers should result in a degree of accuracy and agreement among scorers that allows the scores to be interpreted as originally intended by the test developer. Specifications should also describe processes for assessing scorer consistency and potential drift over time in raters' scoring. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
The World Language tests of the MTTC program each include two or more constructed-response items (i.e., performance assignments). The constructed-response items are scored under secure conditions, typically in scoring sessions conducted after designated administration periods. Scorers are unaware of the identity of the individuals whose responses they score. Constructed-response items are scored using a focused holistic scoring process. For higher-incidence fields, examinee responses are scored independently by two scorers according to a scoring scale, with additional scoring by a third scorer or a Chief Reader as needed. For lower-incidence fields, scorers first independently score a response, then reach consensus as a group on the assigned score.
Focused holistic scoring. In focused holistic scoring, scorers judge the overall effectiveness of each response using a set of performance characteristics that have been defined as important aspects of a quality response (e.g., full development, coherent flow of language). The score is holistic in that each score is based on the overall effectiveness of these characteristics working together, focusing on the response as a whole. Scorers use an approved, standardized scoring scale (based on the performance characteristics) and approved marker responses exemplifying the points on the scoring scale to assign scores to examinee responses. The performance characteristics and scoring scale are available to examinees and others in the study guides on the MTTC program website.
Examinee responses are scored on a scale with a low of "1" and a high of "4" (with a separate code for blank or unscorable responses, such as responses that are not written or spoken in the required language, or that are completely off topic). When examinee responses are independently scored by two scorers, their two scores are summed for a total possible score range of 2 to 8 for each constructed-response item. Scores for a response that differ by more than 1 point are considered discrepant and are resolved by further readings, typically either by a third scorer or by the Chief Reader.
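A minimal sketch of the two-scorer combination and discrepancy rule just described follows; the score bounds, the summing of the two scores, and the more-than-one-point discrepancy check come from the text, while the function name and example values are illustrative.

```python
# Illustrative sketch of the two-scorer rule described above.
# Scores of 1-4 per scorer; summed range 2-8; a gap of more than 1 point
# is treated as discrepant and routed for resolution.

def combine_scores(score_1: int, score_2: int):
    """Return (total, discrepant) for a pair of independent scores."""
    for s in (score_1, score_2):
        if s not in (1, 2, 3, 4):
            raise ValueError("Scores must be on the 1-4 scale "
                             "(blank/unscorable responses are handled separately).")
    discrepant = abs(score_1 - score_2) > 1
    return score_1 + score_2, discrepant

total, discrepant = combine_scores(3, 4)
print(total, discrepant)   # 7, False -> scores within 1 point
total, discrepant = combine_scores(2, 4)
print(total, discrepant)   # 6, True  -> resolved by a third scorer or the Chief Reader
```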
Scorer selection criteria. Scorers for the constructed-response items are selected based on established qualification criteria for the MTTC program. Currently the only tests containing constructed-response items are the World Language tests. Typically scorers for the World Language tests have the following qualifications:
- Content expertise and/or a teaching certificate in the target language AND teaching experience at the secondary or college level; OR
- College degree in the target language AND teaching experience at the secondary or college level; OR
- Fluency in the target language and a college degree.
Individuals are eligible to continue serving as scorers if they have participated successfully as a scorer at a previous scoring session and continue to participate in professional development, including scoring activities.
Scorer training, calibration, and monitoring. Before being allowed to score, scorers must successfully complete training and calibration activities. Scorers receive a scoring manual and are oriented to the task, and they practice scoring training responses to which scores have already been assigned. Scorers must demonstrate accuracy on scoring the responses before proceeding to score operational responses. Scorer performance is monitored throughout scoring sessions through the use of scorer performance reports, which are provided to scorers and supervisory personnel. At points in the scoring process, scorers are recalibrated to the scoring scale, typically through discussions of specific responses. Analyses of scores given by scorers to constructed-response items are generated and reviewed, including comparisons to previous administrations, in order to monitor the accuracy and consistency of scoring over time.
Test Equating
Standard 5.13 When claims of form-to-form equivalence are based on equating procedures, detailed technical information should be provided on the method by which equating functions were established and on the accuracy of the equating functions.
Standard 5.12 A clear rationale and supporting evidence should be provided for any claim that scale scores earned on alternate forms of a test may be used interchangeably. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Purpose of equating. The central purpose of statistically equating the MTTC tests is to compensate statistically for possible variability from one test form to another that may affect examinees' scores (e.g., differences in the overall difficulty of a new test form compared to a previous test form). Each form of a test is constructed to be as comparable as possible to a previous test form in estimated overall test difficulty. See Relationships Among Test Forms for further information about the construction of test forms. Statistical equating is conducted as an additional step to enable comparability of passing decisions across administrations.
Statistical equating methods adjust an examinee's scaled score for the relative difficulty of the particular test form that was taken. Thus, differences in scores across test forms can be attributed to differences in knowledge or skills, and not differences in the tests.
Equating design. A single-group equating design is utilized for the computer-administered MTTC tests. In a single-group design, one group of examinees takes two alternative forms of the test and these forms are then statistically equated. Typically, a new form is created by selecting the set of scorable multiple-choice items from within the sets of scorable and non-scorable items on the previous form. Because the new scorable set of items is embedded within the previous test form, statistical equating can compare examinee performance on the previous form with what their performance would have been on the new test form (i.e., the new set of scorable items), and an equated passing score can be determined. This pre-equating methodology allows the passing score to be determined before administration of a new test form, eliminating the need to gather performance data on the new form from a sufficient number of examinees before their scores can be released.
Equating method. A linear pre-equating method is used within a classical test theory framework. In linear equating, two scores are equivalent if they are the same number of standard deviation units above or below the mean for some group of examinees (Angoff, 1984). A linear equation is used to equate the cutscores on the two forms by setting their standardized scores, or z-scores, equal (Kolen & Brennan, 2004).
Only multiple-choice items are included in the statistical equating of the MTTC tests. All of the linking items appear on a previous form as either scorable or non-scorable items and as scorable items on the new form. Response data solely from the group of test takers who took the previous form are used to compute both the means and standard deviations of the scorable items on the previous form and the scorable items on the new form. The z-scores for the two sets of scorable items are set to be equal. The raw score on the new scorable set of items that corresponds to a particular raw score (the cutscore) on the previous set of scorable items is calculated to establish the raw cutscore on the new form. See Formula for Equating for further information.
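The z-score equality described above implies a simple linear conversion of the cutscore. The sketch below uses invented summary statistics (not MTTC values) to show the computation: the previous-form raw cutscore is converted to a z-score using the previous form's scorable-item mean and standard deviation, and that z-score is mapped onto the score scale of the new form's scorable item set.

```python
# Single-group linear pre-equating sketch (classical test theory).
# All statistics below are invented for illustration, not MTTC values.
# Both sets of statistics come from the same group of examinees: those who
# took the previous form, on which the new form's scorable items were embedded.

prev_mean, prev_sd = 62.4, 9.8   # previous form, scorable items
new_mean, new_sd   = 60.1, 9.2   # new form's scorable item set (embedded in previous form)
prev_cutscore      = 57          # raw cutscore on the previous form

# Setting z-scores equal: (x_new - new_mean)/new_sd = (x_prev - prev_mean)/prev_sd
z = (prev_cutscore - prev_mean) / prev_sd
new_cutscore = new_mean + z * new_sd

print(f"z-score of previous cutscore:      {z:.3f}")
print(f"Equated raw cutscore on new form:  {new_cutscore:.2f}")
# In practice, the equated raw cutscore would be handled per program rounding rules.
```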
Establishing comparability of constructed-response items. The constructed-response items for the MTTC World Language Tests are "calibrated" prior to administration. In addition, the following methods are typically used for MTTC tests to establish the comparability of constructed-response items from test form to test form:
- Scoring scales. For each type of constructed-response item, an approved, standardized scoring scale (with an associated set of performance characteristics) is used to assign scores to examinee responses. The scoring scale provides a written, standardized description of the "typical" response at each score point. The same scoring scale is used to score responses to the constructed-response items of a particular type across test administrations and across different test forms. The use of a standardized scoring scale helps ensure the comparability of scores assigned to different individual constructed-response items within each item type.
- Marker responses. Based on the score-point descriptions in the scoring scale, a set of responses is established for each constructed-response item to serve as exemplars of each point on the scale. These marker responses are used in the training and calibration of scorers to help ensure that the standardized meaning of the approved scoring scale is applied accurately and consistently to examinee responses.
Scaled Scores
Standard 5.2 The procedures for constructing scales used for reporting scores and the rationale for these procedures should be described clearly.
Standard 4.23 When a test score is derived from the differential weighting of items or subscores, the test developer should document the rationale and process used to develop, review, and assign item weights. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
The scores that are reported on the MTTC tests are "scaled" scores. Examinee scores are converted mathematically to a scale with a lower limit of 100, a passing score of 220, and an upper limit of 300. In this process, raw examinee scores as well as raw passing scores for the tests are converted to scaled scores. The use of scaled scores supports the communication of MTTC program results in the following ways:
- Examinees, educator preparation programs, the MDE, and other stakeholders are able to interpret scores from different tests in a similar manner, regardless of test taken.
- The meaning of the scaled passing scores is consistent over time, making it possible to compare performance from one administration (or one year) to the next.
Computation of scaled scores. Most MTTC tests consist of a set of multiple-choice items. The World Language tests consist of two sections: a multiple-choice item section and a constructed-response item section. For tests with two sections (i.e., the World Language tests), scaled scores are computed separately for each section and then combined to determine the total test scaled score, according to the weights specified for each section (e.g., 80% for the multiple-choice section, 20% for the constructed-response section).
With this method, an examinee who answers all questions correctly on the multiple-choice section or receives all possible points on the constructed-response section receives a scaled score of 300 for that section. An examinee who answers correctly the number of multiple-choice items equal to the just-acceptable multiple-choice score, or who receives the just-acceptable constructed-response score, receives a scaled score of 220 for that section. See Formula for Determining Section Scores for further information.
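The exact MTTC conversion is given in the Formula for Determining Section Scores referenced above. As a rough illustration only, the sketch below realizes the two anchor points stated in this section (just-acceptable raw score maps to 220, maximum raw score maps to 300) with a linear transformation bounded below at 100; the item counts used are hypothetical.

```python
# Illustrative sketch only: the actual MTTC conversion is defined in the
# program's Formula for Determining Section Scores. This sketch simply
# realizes the two anchor points stated in the text (just-acceptable raw
# score -> 220, maximum raw score -> 300) with a linear map, bounded at 100.

def section_scaled_score(raw_score, just_acceptable_raw, max_raw):
    slope = (300 - 220) / (max_raw - just_acceptable_raw)
    scaled = 220 + slope * (raw_score - just_acceptable_raw)
    return max(100.0, min(300.0, scaled))

# Hypothetical 80-item multiple-choice section with a just-acceptable score of 56.
print(section_scaled_score(80, 56, 80))  # 300.0 (all items correct)
print(section_scaled_score(56, 56, 80))  # 220.0 (just-acceptable performance)
print(section_scaled_score(48, 56, 80))  # below 220
```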
Combining scaled scores of test sections. Some of the World Language tests (Spanish, French, German, and Latin) each contain a multiple-choice section and a constructed-response section consisting of two constructed-response items. For tests with two sections, the examinee's scaled section scores are combined based on the section weights that are approved for the test and communicated in the test objectives posted on the MTTC program website. For the Spanish, French, German, and Latin tests, a weight of 80% is assigned to the multiple-choice section and 20% to the constructed-response section; an examinee's scaled scores for the two sections are weighted accordingly and combined to determine a total test scaled score.
The remaining World Language tests (Chinese [Mandarin], Arabic [Modern Standard], Russian, and Japanese) each contain a multiple-choice section and eight constructed-response items comprising four sections. As with the Spanish, French, German, and Latin tests, the scaled scores of the multiple-choice section and each constructed-response item section are calculated and then combined, using weights (i.e., the percent of the total test score that is based on the test component), to produce a total test scaled score. The multiple-choice section accounts for 35% of the total test score, the writing constructed-response item section accounts for 20% of the total test score, and each of the remaining three constructed-response item sections accounts for 15% of the total test score.
See Formula for Combining Scaled Scores of Test Sections for further information and examples.
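As a worked illustration of the weighted combinations described above, the sketch below applies the stated section weights to invented section scaled scores to produce total test scaled scores for both weighting patterns.

```python
# Weighted combination of section scaled scores into a total test scaled score.
# The weights are those stated in the text; the section scaled scores themselves
# are invented for illustration.

def total_scaled_score(section_scores, weights):
    """Weighted sum of section scaled scores; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(score * weight for score, weight in zip(section_scores, weights))

# Spanish/French/German/Latin pattern: 80% multiple-choice, 20% constructed-response.
total = total_scaled_score([235.0, 210.0], [0.80, 0.20])
print(round(total))   # 230 -> passing (>= 220)

# Chinese/Arabic/Russian/Japanese pattern: 35% multiple-choice, 20% writing,
# and 15% for each of the three remaining constructed-response sections.
total = total_scaled_score([240.0, 215.0, 225.0, 200.0, 230.0],
                           [0.35, 0.20, 0.15, 0.15, 0.15])
print(round(total))   # 225 -> passing (>= 220)
```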
A candidate passes a test if the rounded total test scaled score is equal to or greater than 220.
Score Reporting
Standard 6.10 When test score information is released, those responsible for testing programs should provide interpretations appropriate to the audience. The interpretations should describe in simple language what the test covers, what scores represent, the precision/reliability of the scores, and how scores are intended to be used.
Standard 6.16 Transmission of individually identified test scores to authorized individuals or institutions should be done in a manner that protects the confidential nature of the scores and pertinent ancillary information. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Examinee test results are released to the examinee, the Michigan Department of Education (MDE), the educator preparation program with which the examinee indicated affiliation at the time of registration, and up to two other educator preparation programs for which explicit permission has been given by the examinee. Prior to each score reporting period, Michigan institutions of higher education are given the opportunity to verify that examinees who report an affiliation with their schools are eligible for inclusion on the post-administration performance summaries for their institutions. The results are provided in accordance with predetermined data formats, security procedures, and schedules. Policies regarding the use of candidate information, including full or abbreviated social security numbers, and candidate privacy measures are communicated on the MTTC program website and are acknowledged by candidates during registration. Interpretive information is provided to each audience with each transmission. A Score Report Explanation is provided to examinees and is also made available on the MTTC program website. It includes the following information:
- An overview
- How to interpret the Total Score
- How to interpret Subarea Performance Information
- Performance on Subareas with Multiple-Choice Items
- Performance on Constructed-Response Item Sections
Test Quality Reviews
Test quality reviews are conducted on a regular basis for the MTTC tests to monitor the psychometric properties of the tests and their items. These include statistical analyses of test items and test forms conducted on a periodic basis and Differential Item Functioning (DIF) analyses conducted annually, beginning with the 2014–2015 program year.
Item Analysis
Standard 4.10 When a test developer evaluates the psychometric properties of items, the model used for that purpose (e.g., classical test theory, item response theory, or another model) should be documented. The sample used for estimating item properties should be described and should be of adequate size and diversity for the procedure. The process by which items are screened and the data used for screening, such as item difficulty, item discrimination, or differential item functioning (DIF) for major examinee groups, should also be documented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
On a periodic basis, an item analysis of responses to multiple-choice items is conducted as a quality assurance measure to assess the performance of the items. The item analysis identifies items for review based on the following item statistics (a sketch implementing these flagging rules follows the list):
- The percent of the examinees who answered the item correctly is less than 30 (i.e., fewer than 30 percent of examinees selected the response keyed as the correct response) (N ≥ 5)
- Nonmodal correct response (i.e., the response chosen by the greatest number of examinees is not the response keyed as the correct response) (N ≥ 5)
- Item-to-test point-biserial correlation coefficient is less than 0.10 (if the percent of examinees who selected the correct response is less than 50) (N ≥ 25); or
- The percent of examinees who answered the item correctly for the most recent period decreased at least 20 points from the percent of examinees who answered the item correctly for all administrations of the item (N ≥ 25 for the most recent period, N ≥ 50 for all administrations)
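As a compact restatement of the flagging rules above, the following sketch applies them to a single hypothetical item record; the thresholds and minimum-N conditions come from the list, while the field names and example values are invented.

```python
# Flagging rules from the item analysis criteria above; the item record
# fields and example values are hypothetical.

def item_flags(item):
    flags = []
    # Percent correct below 30 (N >= 5)
    if item["n_recent"] >= 5 and item["pct_correct"] < 30:
        flags.append("low percent correct (< 30)")
    # Nonmodal correct response (N >= 5)
    if item["n_recent"] >= 5 and item["modal_response"] != item["keyed_response"]:
        flags.append("nonmodal correct response")
    # Point-biserial below 0.10 when percent correct is below 50 (N >= 25)
    if item["n_recent"] >= 25 and item["pct_correct"] < 50 and item["point_biserial"] < 0.10:
        flags.append("low point-biserial (< 0.10)")
    # Drop of at least 20 points from the all-administration percent correct
    if (item["n_recent"] >= 25 and item["n_all"] >= 50
            and item["pct_correct_all"] - item["pct_correct"] >= 20):
        flags.append("percent correct decreased >= 20 points")
    return flags

example_item = {
    "n_recent": 180, "n_all": 950,
    "pct_correct": 27, "pct_correct_all": 55,
    "modal_response": "B", "keyed_response": "C",
    "point_biserial": 0.05,
}
print(item_flags(example_item))  # this hypothetical item trips all four flags
```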
Should the item analysis indicate an item warrants review, the review may include
- confirmation that the wording of the item on the test form is the same as the wording of the item as approved by the CAC,
- a check of content and correct answer with documentary sources, and/or
- review by a content expert.
Based on the results of the review, items may be deleted, revised, or retained.
Test Quality Assurance Review
On an annual basis, a test quality assurance review of test form and test item statistics is conducted by Evaluation Systems psychometric and program staff for the purpose of monitoring the psychometric properties of the tests. Statistical analyses are generated for each test field regarding the following:
- Pass rates for the most recent 3 years
- Test reliability (KR20) (a sketch of this computation follows the list)
- Percent of items with p-values ≥ 95
- Percent of items with p-values less than 30
- Percent of items with item-to-test point-biserial correlation less than 0.10
- Percent of items with no response ≥ 5 percent
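Most of the statistics above are simple counts or percentages; the KR20 reliability coefficient is the one formula-based quantity, so a brief sketch of its computation follows. The 0/1 response matrix is invented, purely for illustration.

```python
# KR-20 reliability sketch. Rows = examinees, columns = items (1 = correct).
# The matrix below is invented for illustration, not MTTC data.

scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
]

def kr20(matrix):
    n_items = len(matrix[0])
    n_examinees = len(matrix)
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees
    # Sum of item variances p*q for dichotomous (0/1) items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in matrix) / n_examinees
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

print(f"KR-20 = {kr20(scores):.3f}")  # 0.625 for this invented matrix
```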
The statistical analyses are examined to detect possible test quality issues, such as issues related to test difficulty, reliability, or speededness. Any issues raised by the test quality assurance review are followed up on (e.g., by removing items from the item bank) or forwarded for further review (e.g., by content specialists).
Additionally, two Test Statistics Reports—the Test Statistics Report by Test Form and the Test Statistics Report for Performance Assignments—are produced and published annually for the MTTC tests. These reports are designed to provide information about the statistical properties of the MTTC tests, including the reliability of the tests.
Differential Item Functioning (DIF) Analysis
Standard 4.10 When a test developer evaluates the psychometric properties of items, the model used for that purpose (e.g., classical test theory, item response theory, or another model) should be documented. The sample used for estimating item properties should be described and should be of adequate size and diversity for the procedure. The process by which items are screened and the data used for screening, such as item difficulty, item discrimination, or differential item functioning (DIF) for major examinee groups, should also be documented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)
Beginning with the 2014–2015 program year, an annual analysis of Differential Item Functioning (DIF) of multiple-choice items was implemented for the MTTC program as a component of bias prevention. DIF occurs when individuals of the same ability level but with different characteristics (e.g., ethnicity/gender) have different likelihoods of answering an item correctly. DIF analyses are conducted on multiple-choice items to assess whether they perform differently depending on examinees' ethnic (e.g., White/Black and White/Hispanic) or gender (male/female) group membership. Items are assessed for DIF in regard to ethnicity and gender of test takers for items with sufficient numbers of examinees in both the focal (protected) group (e.g., Hispanic) and the reference group (e.g., White). DIF analyses are generated for items on MTTC test forms that are taken by at least 100 examinees in both the focal and reference groups. Typically, and moving forward beginning with the 2015–2016 administration year, each item is assessed for DIF across all administrations of the item on each test form on which it appeared during the previous two test administration years. For the analysis completed during the 2016–2017 administration year, items were assessed across the period from September 1, 2015 through August 31, 2017. An item is indicated for further review if it is identified more than 50% of the time as displaying DIF, regardless of the direction of the effect (i.e., favoring the reference or the focal group).
The analysis employs the Mantel-Haenszel DIF procedure, which is designed to detect uniform DIF. Uniform DIF occurs when the probability of members from one group answering an item correctly is consistently (i.e., uniformly across all ability levels) higher than the probability of members from another group answering the same item correctly. In this procedure, examinees are sorted into focal and reference groups and matched on ability, as determined by total test score. Items are identified as differentially functioning if they meet the criterion designated by Longford, Holland, and Thayer (1993): the magnitude of DIF, represented by |Δ|, is at least 1.5 and significantly greater than 1 (at the .05 significance level). Items meeting this criterion, indicating that one group performed significantly better than the group to which it was compared, are designated for further review.
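As an illustration of the procedure just described, the following sketch estimates the Mantel-Haenszel common odds ratio across total-score strata and converts it to the delta metric commonly used for Mantel-Haenszel DIF (MH D-DIF = −2.35 ln α̂). The stratum-level 2×2 tables are invented, and the significance test against 1 required by the operational criterion is omitted here.

```python
import math

# Mantel-Haenszel DIF sketch. The 2x2 tables per total-score stratum are
# invented; the operational analysis also tests whether |delta| is
# significantly greater than 1 (omitted here).

# Each stratum: (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
strata = [
    (40, 10, 30, 15),
    (55, 20, 40, 25),
    (35, 25, 20, 30),
    (20, 30, 10, 35),
]

def mh_delta(strata):
    """MH common odds ratio converted to the delta metric (MH D-DIF)."""
    num = den = 0.0
    for a, b, c, d in strata:          # a, b: reference group; c, d: focal group
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den
    return -2.35 * math.log(alpha_mh)

delta = mh_delta(strata)
# Negative delta: the item disadvantages the focal group; positive: the reference group.
print(f"MH D-DIF (delta) = {delta:.2f}")
print("magnitude criterion met:", abs(delta) >= 1.5)  # magnitude threshold from the text
```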
Any items flagged for DIF are designated for further review to determine whether they contain potential bias. Items identified as differentially functioning based on gender are reviewed by the MDE, and items identified as differentially functioning based on ethnicity are reviewed by the MDE and the bias review committee (BRC). The BRC reviews the identified items in light of the item statistics and using the standard review criteria that are applied during test development activities. Based on committee recommendations and final dispositions by the MDE, reviewed items are either retained in the banks or deleted.