Regarding CIM/CAM implementation
I believe that the Oregon Department of Education should immediately stop the implementation of the CIM test until that test has been shown to comply with the OAR 329.485 requirement that the reform assessments abide by appropriate standards.
329.485 Statewide assessment system; types of assessments; subjects; additional services or alternative educational options. (1)(a) The Department of Education shall implement statewide a valid and reliable assessment system for all students that meets technical adequacy standards...
Further, the school districts should change back to whatever course structure, etc., were present before the CIM altered the course structure (e.g., it is my understanding that in some schools, the order of history courses has been changed, that various course content is being altered, that students are being told to take courses specifically to allow them to pass the CIM requirements, etc.).
Reasoning:
Several weeks ago, a few minutes after I phoned him about this issue, ODE employee Sueng Choi, Ph.D. (an assessment specialist working on developing the CIM test), checked with his supervisor, Dr. Steve Slater, and was told by Dr. Slater that ODE must abide by the 1985 "Standards for educational and psychological testing" (published by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education).
At any rate, ODE clearly must follow the 1985 standards. However, listed below are some of these 1985 standards along with examples of how they have been violated by the actual development and current implementation of the CIM testing.
Standard 1.1 Evidence of validity should be presented for the major types of inferences for which the use of a test is recommended. A rationale should be provided to support the particular mix of evidence presented for the intended uses. (Primary)
Comment:
Whether one or more kinds of validity evidence are appropriate is a function of the particular question being asked and of the context and extent of previous evidence.
Standard 1.2 If validity for some common interpretation has not been investigated, that fact should be made clear, and potential users should be cautioned about making such interpretations. Statements about validity should refer to the validity of particular interpretations or of particular types of decisions. (Primary)
Comment:
It is incorrect to use the unqualified phrase "the validity of the test." No test is valid for all purposes or in all situations. If a test is likely to be used incorrectly for certain kinds of decisions, specific warnings against such use should be given. On the other hand, no two situations are ever identical, so some generalization by the user is always necessary. Test developers should present their validation evidence in a way that can aid such generalization. (p. 13)
So a reasonable question is this -- what good validity evidence is there for CIM testing? If external validity studies (including concurrent validity, predictive validity, etc.) have not been performed (and it is my understanding that they have not been), where has that fact been made clear (in accordance with Standard 1.2)? I believe that in violation of Standard 1.2, there has been little or no qualification of what kinds of abilities, skills, etc., the CIM is expected to measure. To the contrary, the terms "Certificate of Initial Mastery" and "Certificate of Advanced Mastery" strongly imply these tests do, indeed, "certify" "initial mastery" and "advanced mastery."
Standard 1.6 When content-related evidence serves as a significant demonstration of validity for a particular test use, a clear definition of the universe represented, its relevance to the proposed test use, and the procedures followed in generating test content to represent that universe should be described. When the content sampling is intended to reflect criticality rather than representativeness, the rationale for the relative emphasis given to critical factors in the universe should also be described carefully. (Primary) (p. 14)
According to this standard, it is important to review the rationale for the various components of the CIM and the CAM, and especially important to see "described carefully" the rationale behind the importance of the open-ended math questions used on the CIM, along with whatever empirical data ODE may have that proves that open-ended math questions in testing and in education are of great value in learning math. (If the ODE has good data that supports open-ended math questions, great. But if it doesn’t, then certainly grave questions must be raised about the role of open-ended math questions in the CIM and the CAM, in the Oregon Statewide Assessment, etc.)
Standard 1.7 When subject-matter experts have been asked to judge whether items are an appropriate sample of a universe or are correctly scored, or when criteria are composed of rater judgments, the relevant training, experience, and qualifications of the experts should be described. Any procedure used to obtain a consensus among judges about the appropriate specifications of the universe and the representativeness of the samples for the intended objectives should also be described. (Conditional) (p. 15)
This standard is very important to apply to the CIM and to the Oregon Assessment test, because these tests apparently depend almost entirely on content validity. Therefore, it is essential that these tests be constructed by experts who don't have their own theoretical axes to grind, who are in touch with good testing and educational practices, etc. It is important that the committees not be hijacked by biased directors, etc.
In accordance with the above standard, I asked ODE to provide the background of the experts, the methods they used to select the items of these tests, the procedures used to obtain consensus, and, of the items that made it to the final level of the CIM, how many questions were contributed by each of the experts. (I am interested in following up with a personal contact to each of the experts to determine whether the numbers are accurate, whether the stated method of consensus was indeed followed, etc. I am very concerned that only a few of the panel members may have generated a majority of the questions.) However, it is my understanding (although I have not yet seen it stated in print) that no document has been prepared which contains the kinds of details demanded by the above standard. (At this point, I would have serious doubts about whatever technical manuals the ODE may construct in the near future, as there would be no way of knowing whether the manuals actually describe how the test construction process was conducted. Indeed, such an approach may seem almost like trying to clean up after the damage has already been done. Clearly the manuals should have been prepared at the same time as -- or very shortly after -- the tests were constructed, and the whole process of test construction should be open to appropriate scrutiny.)
In all of this, a major concern should be that of consensual drift, which occurs when a group of people agree among themselves but agree less and less with the larger body of people in their field -- in other words, they drift away from the general consensus of experts.
I am sure that average Oregonians do not know that the only form of validity that CIM and CAM proponents can offer at this point for their tests is "content validity." Content validity derives from the fact that a group of unknown "experts" developed the questions which comprise the CIM and CAM tests. However, whether these tests correlate at all with "real world measures" has not been determined, and certainly all of us know that committees of experts can be wrong -- such committees were responsible for open classrooms, whole language, the numerous debacles of various planned societies, etc. (Why should our committee of "experts" be right and other committees be wrong?) As contrasted with content validity, concurrent validity is the relationship between test scores and other current measures, such as nationally-normed, widely accepted standardized tests. Predictive validity is the relationship between test scores and important future behaviors -- e.g., what is the relationship between doing well on the CIM and a youngster’s later writing skills?
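To make the idea of concurrent validity concrete, here is a minimal sketch of how such a check is commonly summarized: as a correlation (a "validity coefficient") between scores on the new test and scores the same students earned on an established measure. The score lists below are invented purely for illustration; they are not CIM data, and the function and variable names are hypothetical. A predictive validity check would have the same shape, with the second list replaced by a later outcome measure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# INVENTED illustrative scores for six hypothetical students (not real data).
new_test_scores = [210, 225, 231, 240, 244, 252]
normed_test_scores = [35, 48, 44, 61, 58, 70]

validity_coefficient = pearson_r(new_test_scores, normed_test_scores)
print(f"concurrent validity coefficient: {validity_coefficient:.2f}")
```

A coefficient near zero would mean the new test tells us little about how students perform on the established measure; a real study would of course need a large, representative sample and appropriate significance testing.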
Relying entirely on content validity to establish the validity of a test -- especially a complex, high-stakes test -- is inappropriate from a methodological perspective, as this passage from page 75 of the 1997 technical manual of the WAIS-III/WMS-III makes clear (WAIS-III/WMS-III Technical Manual; 1997):
Content validity is made up of two components: content coverage and content relevance. It is not based on statistics or empirical testing; rather, it is the degree to which the test items adequately represent and relate to the trait or function being measured. Content coverage means that the items or subtests related to the abilities the test is designed to measure. These two properties, however, do not in and of themselves ensure "true" validity (Messick, 1975, 1980, 1995). Rather, they help prevent construct validity from being jeopardized by construct underrepresentation and construct-irrelevance (see Messick, 1995).
However, at this point, the CIM test has no external validity established for it.
Paul Kline (A handbook of test construction, 1986) noted: "Obviously, content validity is only useful for tests where, as in mathematics, the subject matter is clear" (p. 6). However, certainly domains such as "Initial Mastery" and "Advanced Mastery" are not clear. To the contrary, these general areas are particularly difficult to define. In 1993 Kline wrote (The handbook of psychological testing):
Content validity is applicable only to a small range of tests where the domain of items is particularly clear cut... However, even when a test has clear content validity it is advisable to demonstrate that it is valid by some other means. With tests of attainment and ability this is not usually difficult: predictive validity against the criterion of public exams or teachers’ ratings is usually possible. One further point needs brief discussion. If with these tests predictive validity is a viable procedure it is pertinent to ask why content validity needs to be established. The answer to this is that predictive validity is only required because it could be that a content-valid test was rendered invalid by poor instructions or poor modes of responding. In fact content validity is the validity to aim for, where it is relevant, and it should be backed up with evidence of predictive or concurrent validity (pp. 91-92)
Hence, even with content-valid tests (and the CIM has not yet begun to approach the situation in which it can be called "content-valid") and in dealing with clear subject matter, evidence of predictive or concurrent validity should be sought. (It would be very, very easy for ODE to do this, and it is remarkable that such data appear not to have been gathered.) If Oregon is changing its educational future on the basis of these high-stakes CIM outcomes -- if courses are being changed, if students are being advised what to take on the basis of CIM results, etc. -- then very clearly good evidence of "predictive or concurrent validity" should be available.
Standard 8.11 Test users should not imply that empirical evidence exists for a relationship among particular test results, prescribed educational plans, and desired student outcomes unless such evidence is available. (Primary)
Comment:
...When evidence supporting the utility of testing procedures for instructional purposes is lacking, test users can stress the tentative nature of the recommendations they provide and encourage teachers and others to weigh the usefulness of that information in light of additional available data. (p. 54)
In accordance with this standard, unless there is good empirical evidence for its claims, the ODE should immediately stop stating that the CIM and CAM will ensure that our students meet certain standards of minimum and advanced mastery of performance. (The average person reading the titles of the "tests" -- "Certificate of Initial Mastery" and "Certificate of Advanced Mastery" -- would logically think that these tests had been proven to demonstrate this.) Also, the ODE should immediately begin noting "the tentative nature" of the test results (at the very least, calling it a pilot test). It should also make clear to parents that there are pluses and minuses to shifting courses and changing course content so that more students pass the CIM/CAM requirements. These implications include the possibility that, because of the push for youngsters to pass the CIM/CAM tests, students may do worse on the SAT, AP, PSAT tests, etc., may be less able to acquire the course content required by many colleges and universities, etc. Also, it is not clear that any prestigious colleges or universities outside Oregon would accept the CIM/CAM data as a substantial part of a student's entry application, especially if these institutions of higher learning were to discover that the CIM and CAM tests are greatly changing Oregon’s higher education course content, that Oregon intends to reduce its focus on Carnegie units (the basic measures of course content), etc.
Standard 8.5 When a test is developed by a state or local district to be used for student promotion, graduation, or classification decisions, user's guides, or technical reports should be developed and disseminated. (Conditional)
Comment:
An agency that develops a certification or classification test has the same obligation to supply a manual and technical reports as does a commercial test publisher. A test that is widely used throughout a jurisdiction, even though not published or sold, requires a technical manual so that it can be properly used and evaluated. In smaller testing programs, disseminations may be limited to summary statements, provided that detailed analyses are made available on request. (p. 53)
In keeping with the above standard (and especially given the implications of this "high-stakes" testing), technical manuals for the CIM should be available, but it is my understanding that they have not yet been constructed.
Conclusion and Closing Thoughts:
In summary, the above points clearly demonstrate that the Oregon Department of Education did not follow the basic guidelines in establishing the CIM test. Further, failure to follow the basic OAR statute 329.485 to ensure that this test and others (including the CAM, etc.) meet "technical adequacy standards" will lead to invalid results. Therefore, the implementation of the CIM in our schools should be immediately halted until we can be assured that the test-makers have followed the appropriate guidelines to make certain that we indeed have "a valid and reliable assessment system for all students that meets technical adequacy standards." Also, ODE should immediately reverse whatever changes were made specifically to have youngsters pass the CIM (such as course changes, etc.), until we know that the CIM is a worthwhile measure and that the time youngsters will spend learning to pass it will not be wasted or even counter-productive.
Before we allow the CIM/CAM "tail" to wag the educational "dog," we had better make darned sure that these tests have more validity than simply the "content validity" conferred on them by a particular group of "experts" who may be theoretically biased or who may be governed by a "wacko" leader. What proof do we have that these "experts" are right and that other "experts" are wrong? (As Lincoln noted, simply calling a tail a leg doesn’t make it one. And similarly simply calling the CIM/CAM valid doesn’t make it so.) Given the "new" approach of the CIM and the possible results of CIM outcomes (youngsters not graduating, course content and structure being changed to help ensure youngsters pass this "test," valuable teacher and student time taken up comporting with CIM requirements, etc.), we clearly need better proof of the validity of CIM before we should even think of implementing it in our schools. Otherwise, our children will be treated like "guinea pigs," this in violation of OAR 329.485, of general standards of test construction, and in violation of our expectation that the public schools provide good and tested educational approaches -- or, at the very least, approaches which will not limit their educational attainment and opportunities.
The basic rules of statistics and of test construction cannot be altered by good intentions. Further, even excellent tests misclassify youngsters some of the time, and even almost perfect tests have painful consequences for some of their takers. But there is absolutely no need to use a test with an entirely unknown relationship to other proven measures, especially in such a high-stakes situation. Also, the basic 1985 "Standards for educational and psychological testing" as well as common sense argue against such a course of action.
In addition, as a matter of simple practicality, the numbers on this test will come out, and they are very unlikely to prove the test to be worthwhile. At long last, this test will be compared with other measures, including standardized, nationally-normed tests, and when the results come out, Oregonians will have a right to be furious. Both the State of Oregon and any local school district implementing these unproven approaches without testing them first will be financially liable. (This is one of the reasons that major test constructors -- such as the Educational Testing Service, the Psychological Corporation, etc. -- go to such lengths to make sure their tests reach high standards of excellence.)
If you or anyone else has any questions regarding this, I have a copy of the "Standards for educational and psychological testing" and would be happy to go over the relevant paragraphs. Also, I urge you to contact test-construction specialists unconnected with the Educational Reforms in this and other states; I feel certain that they too would have grave doubts about the way the CIM was developed, and about its implementation in our schools without sound proof. (I have contacted various experts myself -- including several at the University of Oregon, one at the Educational Testing Service in Princeton, and another in Nevada. All agreed with my concerns about the lack of empirical support, etc.)
By the way, the same concerns should be voiced about the Oregon Statewide Assessments and the Oregon Educational Benchmarks -- these were also created by "expert" committees and they also -- as far as I can learn -- have not been validated by nationally-normed tests, etc. All are pronounced as "new," etc. But if they are "new," those applying them to our children should have specifically sought good evidence of external validity before our children’s futures are changed by the results. Again, the State has had ample time to gather such data but, I have been told, has failed to do so. (I have talked to elementary teachers who have said that the content on the Statewide Assessments is clearly different from the nationally-normed standardized tests administered to their students. These teachers feel that they have to have essentially two different curricula for their students. This clearly indicates that the Statewide Assessments are not consistent with widely agreed-upon, proven approaches and they should clearly be more closely examined before they decide the future of Oregon’s schools and Oregon’s school children.)
As our children are our most important responsibility, and as the above issues should raise concerns from all of our citizens -- including parents -- it is time to put implementation of the CIM on hold. (I would also advise a close examination of the Statewide Assessment and the State Benchmarks, this in keeping with OAR 329.485.) We have already spent much, much money on the unproved CIM approach -- let us not squander the full futures of our youngsters on approaches with no indication of validity other than that conferred on them by a group of "experts," a group we know nothing about and a group that is probably outside the mainstream of effective and proven curricula developers.
If you have questions or comments regarding this, please do not hesitate to contact me. Please distribute this as you think appropriate.
Yours,
Caleb Burns, Ph.D.
Psychologist
Portland Psychology Clinic
2154 NE Broadway, Suite 110
Portland, OR 97232
Phone and Fax: (503) 288-4558
August 31, 1998 (and as of 5-7-98, no reply from Mr. Ron Houser of the DOE -- I faxed him a letter several months ago requesting the technical manual on the CIM, also asking for validity information, etc.)
If you would like to drop me a note to discuss this or related topics, please write to calebb@teleport.com.