DEPEKY

Developmental Education in Postsecondary in Kentucky
FYE Resources and News

ASSESSMENT OF THE FIRST-YEAR SEMINAR:
RESEARCH-BASED GUIDELINES FOR COURSE & PROGRAM EVALUATION
Joe Cuseo
Marymount College
Abstract
This manuscript attempts to provide comprehensive coverage of key issues associated specifically with evaluation of the freshman seminar and grounds them within the larger context of assessment theory and research in higher education. Four major questions about the process of freshman-seminar assessment are addressed: (a) Why is the assessment being conducted, i.e., What is its purpose or objective? (b) What outcomes will serve as criteria for assessing the impact of the freshman seminar? (c) Who should conduct the assessment project and analyze the results? and (d) How will the assessment be conducted, i.e., What research design or methodology will be used to assess whether the course has achieved its intended outcomes?
Eight different research designs and methods are described and evaluated: (a) experimental design, (b) quasi-experimental design, (c) time-series design, (d) multiple regressions analysis (a.k.a., multivariate analysis), (e) course-evaluation surveys (i.e., student ratings), (f) pre-test/post-test design, (g) analysis of students' behavioral records (e.g., logs of student use of campus programs, trace audits, transcript analyses), and (h) qualitative research methods (e.g., focus groups, content analysis and category analysis of students' written comments). Strategies are offered for writing the final assessment report in terms of (a) relating the report to institutional mission and goals, (b) tailoring content and tone of the report to specific audiences, (c) reporting results for different student subgroups, and (d) identifying how results will be acted on and put to use.
The future of freshman-seminar assessment is discussed in terms of what additional outcomes and alternative methods represent fertile areas for subsequent research on the freshman seminar, such as assessment of (a) specific course topics and teaching strategies that have the most impact on student outcomes; (b) how involvement in the freshman seminar influences student satisfaction with the total college experience; (c) impact of the freshman seminar on students' choice of major and their time to graduation; (d) faculty and staff perceptions of how student participation in the freshman seminar affects their behavior in the classroom and on campus; (e) impact of the freshman seminar on the transfer rate of community college students; (f) how availability of the freshman seminar at an institution can affect college marketing and student recruitment; (g) whether student performance in the freshman seminar is an effective predictor of overall student success during the first year of college; and (h) viability of the freshman seminar as a vehicle for conducting comprehensive student assessment at college entry.
The monograph concludes with a discussion of how the wide range of student outcomes targeted for freshman-seminar assessment, and the sophisticated methodologies that have been used to assess these outcomes, may serve as both a model and a stimulus for effective campus-wide assessment of other educational programs or institutional interventions designed to promote student retention and achievement.
Introduction
National survey data reveal that freshman seminars are being adopted with increasing frequency on college campuses. However, despite the increase in the number of adopted freshman seminars, the overall percentage of freshman seminars in place at American colleges and universities has remained the same (approximately 67%) since 1988, and the percentage of institutions reporting strong support for their existing freshman seminars has been declining (Barefoot & Fidler, 1996). The survey authors offer the following interpretation for these findings.
This may indicate that as freshman seminars are born, others die an untimely death for a variety of reasons which can be summarized as lack of firm institutional support . . . . Freshman seminars are generally held to higher expectations with respect to outcomes than any other course in the curriculum. The absence of such outcomes (or lack of research to demonstrate outcomes) may spell the demise of the course . . . . "Successful" seminars—those that enjoy strong broad-based institutional support and long life--are courses [that] are evaluated on a regular basis, and results of this evaluation are made available to the entire campus community (Barefoot & Fidler, 1996, pp. 5-6, 61).
In addition to increasing the likelihood of course survival, an assessment plan may also increase the likelihood of receiving institutional approval to initiate a freshman seminar program. As Kenny (1996) advises, "Assessment should be built into the proposal. Not only will this disarm opposition, but it will keep program designers focused and will enhance program credibility " (p. 71).
The freshman seminar is an educational intervention whose potential for promoting positive student outcomes qualifies it for rigorous assessment at any institution which has adopted the course, and particularly at those colleges where institutional quality is defined not simply in terms of student selectivity or national reputation, but in terms of the quality of the educational programs they provide and their demonstrated impact on student development and success.
National interest in and demand for assessment of freshman year experience programs is highlighted by John Gardner, pioneer of the freshman year experience movement and founder of the National Resource Center for the Freshman Year Experience, in a statement he made several years ago:
What I did not foresee . . . was the tremendous number of requests we would receive from educators asking for basic information on how to get started on the research they wanted to do on their own campus programs. In addition, I could not have anticipated the flood of requests we would receive from graduate students, both at the masters and the doctoral level, who would be interested in doing empirical research measuring the effectiveness of programming for freshmen (1992, p. 3).
To date, only one publication has appeared in the scholarly literature which attempts to provide systematic guidelines for assessment of freshman year programs: Primer for Research on The Freshman Year Experience by Dorothy Fidler (1992). The present monograph builds upon and expands the recommendations offered in this earlier publication, attempts to provide comprehensive coverage of key issues associated specifically with freshman seminar evaluation, and grounds these issues within the larger context of assessment theory and research in higher education.
Key Questions and Issues Involved in Assessment of the Freshman Seminar
Assessment of the effects of the freshman seminar on student outcomes, like assessment of any other educational program, involves consideration of four major questions or issues: (a) Why is the assessment being conducted? (b) What outcomes will be assessed? (c) Who will conduct the assessment? and (d) How will the assessment be conducted? Each of these four questions will be addressed, in turn, in successive sections of this monograph.
* Why is the assessment being conducted? What is its goal or objective? This is the first question that must be addressed in the assessment of any educational program or intervention. Its pivotal role in the assessment process is underscored by Trudy Banta, a nationally-recognized assessment scholar.
Assessment is most effective when it is based on clear and focused goals and objectives. It is from these goals that educators fashion the coherent frameworks around which they can carry out inquiry. When such frameworks are not constructed, assessment outcomes fall short of providing the direction necessary to improve programs (Banta et al., 1996, p. 22).
Two major goals of freshman seminar assessment are (a) to obtain evaluative information on the program's overall effectiveness or impact for use in bottom-line decisions about whether the program should be adopted, funded, continued, or expanded, and (b) to obtain evaluative information on the program for the purpose of improving or fine-tuning its quality.
In assessment terminology, the first purpose is referred to as "summative evaluation," i.e., assessment designed to "sum up" a program's overall value; and the second purpose is referred to as "formative evaluation," i.e., assessment that helps to form, shape, or further develop a program's effectiveness (Scriven, 1967).
* What outcomes will be selected as criteria for assessing the program's impact? The following outcomes have been the most commonly used measures of the freshman seminar's impact.
1. Student outcomes, such as:
- Student satisfaction with the course or student perceptions of course effectiveness— as measured by student ratings and narrative comments made on course-evaluation surveys or questionnaires administered upon course completion.
- Student use of campus services and participation in campus activities—as measured by survey questions asking for students' reported use of campus services and their frequency of participation in campus activities; or, by means of logs kept by student development or student service professionals that track students' use of specific campus services and participation in particular campus activities .
- Students' out-of-class interaction with faculty—as measured by the frequency and quality of such interactions reported on student surveys.
- Students' social integration into the peer community—as measured by the quantity and quality of on-campus friendships reported on student surveys.
- Student retention—as measured by (a) number of future courses/units completed, (b) student persistence to completion of the first semester of college (fall-to-spring retention), (c) student persistence to completion of the first year of college (fall-to-fall retention), and/or (d) persistence to degree/program completion.
- Students' academic achievement—as measured by (a) course content knowledge acquired in the freshman seminar, (b) cumulative grade-point average achieved at the end of the freshman year or at college graduation, (c) number of first-year students in good academic standing (versus academic probation or dismissal), (d) number of first-year courses attempted and completed (vs. dropped or failed), and (e) number of introductory courses completed with a grade of C or better.
2. Faculty/Staff outcomes, such as:
- Satisfaction with the seminar's instructor-training program, as measured by participants' survey ratings or comments.
- Changes in attitude and behavior toward students, or changes in instructional methods, which faculty report are the result of their participation in the seminar's instructor-training program.
3. Institutional outcomes, such as:
- Improved enrollment management—as measured by increased annual rates of student retention and increased institutional revenue resulting there from.
- Improved institutional effectiveness or efficiency— as evidenced by increased student and faculty utilization of campus resources, and reduced time taken by students to complete educational programs or degree requirements.
There are other important outcomes to which the freshman seminar may contribute which have yet to be assessed and reported in the research literature. These are discussed in a subsequent section of this monograph, titled "The Future of Freshman Seminar Assessment" (pp. ).
* Who should conduct assessment of the freshman seminar and analyze the results? Probably the simplest and most direct answer to this question is to have someone conduct the evaluation who has not been associated with the freshman seminar and who has no vested interest in its outcome, i.e., an "external" evaluator or "third party" assessor who would not be biased toward or against the program. This should serve to enhance the credibility of the assessment report. For example, we would have more confidence in research on the health effects of cigarette smoking that was designed, conducted, and analyzed by an external, university-affiliated research team than by researchers associated with, or employed by, the American tobacco industry.
At the University of South Carolina, evaluation of the freshman seminar is conducted by a "highly respected researcher who has no official relationship to or responsibility for the program being studied" (Gardner, 1986, p. 271). This practice guards against evaluator bias--the tendency of the individual who designed or conducted a study, when involved in evaluating the resulting data, to unwittingly skew the findings in the direction of its intended outcome. Evaluator bias tends to occur because of (a) the "believing is seeing" trap (Weick, 1979), whereby the researcher sees what he expects to see and perceives ambiguous information in the direction of the expected or desired results (Arnoult, 1976), and (b) the "Rosenthal effect," in which unrecognized behaviors on the part of the researcher may tacitly encourage or reinforce the study's participants to respond in ways that support the intended results (Rosenthal, 1966, 1974).
It is noteworthy that ensuring the assessor be someone untied to the course or program being assessed is common practice at European colleges and universities (Adelman, 1986). In America, it is more likely for those involved in running a program or teaching a course also to be involved in its assessment. As one assessment scholar notes, "If American colleges and universities were commercial institutions, they would be in violation of the Sherman Antitrust Act for `bundling' these services" (Harris, 1986, p. 14). One strategy for "unbundling" these two functions is to have the assessor be someone on campus who has had a history of involvement in institutional research, but who has no direct tie to the freshman seminar program. This strategy may be particularly viable today because recent national interest in the related issues of assessment and accountability has precipitated an increase in the number of institutions employing personnel with full-time or part-time responsibility for conducting on-campus assessment projects (El-Khawas, 1993).
Another option is to request the assistance of faculty. The department of education or social and behavioral sciences may be a good source of faculty who have the graduate training and professional expertise needed to conduct program assessment. Dorothy Fidler (1992) suggests that faculty from the departments of computer science, mathematics and statistics may have valuable expertise they can lend to the study's design or data analysis. As Howard Altman urges, "Make use of ‘local talent’ among the faculty . . . . There is often a great deal of faculty expertise which gets hired by other campuses by outside sources, but which is ignored on the home campus, where it is often available for free" (1988, p. 126).
Whether qualified faculty should be asked to lend their evaluation services "for free" is a sensitive and debatable issue. Given that the requested faculty member is likely to have full-time professorial responsibilities, requesting that she simply add on another major task to her existing workload might be both socially insensitive and logistically infeasible. Alternatives to this gratuitous approach could be offered to faculty in the form of extra compensation (e.g., stipends, meritorious pay), or release time from other institutional responsibilities (e.g., reduction in teaching load, publication requirements, or committee work). Rewarding faculty who are asked to conduct freshman seminar assessment should serve to increase their level of commitment and effort, and should send a clear message to the college community that the institution is fully committed to conducting a high-quality program evaluation—as evidenced by its willingness to provide evaluators with the time and/or fiscal support needed to conduct the assessment in a thorough and rigorous fashion. If such incentives cannot be provided, then at the very least, those faculty who commit their time as program evaluators should be rewarded with a formal thank-you letter. A copy of this letter should also be sent to the faculty member's department chair or academic dean, so it can be included in his retention-and-promotion file.
Another potential source of assistance in conducting assessment of the freshman seminar are students. For instance, graduate students from relevant academic departments or from student development programs could assist in program assessment. Upper-division undergraduates might also provide assistance, perhaps as research assistants on faculty-student research teams or graduate-undergraduate student research teams.
* How will the assessment be conducted, i.e., What research design or methodology will be used to assess the intended outcomes of the freshman seminar?
The foregoing sections of this manuscript have focused on (a) the why question of assessment (i.e., its purpose or objective), (b) the what question (i.e., what types of outcome data will be collected) and (c) the who question (i.e., who will conduct the assessment). This section attempts to answer the how question (i.e., the means or methods used to collect outcome data).
The method used to be used for assessing whether or not the intended outcomes of the freshman seminar have been realized is an important decision that should be made before data are collected and analyzed (Halpern, 1987). Thus, decisions about the research design or method for evaluating the freshman seminar should be made early in the course planning and assessment process. Listed below are descriptions of a variety of potential research designs for assessing the freshman seminar, accompanied by a discussion of their relative advantages and disadvantages.
1. Experimental Design
This research method involves comparing student outcomes for freshmen who are randomly assigned to either one of the following two groups: (a) an "experimental" group of students who participate in the freshman seminar, or (b) a "control" group of students who do not participate in the course.
Historically, this method has been considered to be the scientifically ideal or "true" experimental design for evaluating educational programs because it ensures randomized assignment of students to both the experimental and control groups, thus controlling for the "volunteer effect" or "self-selection bias," i.e., the possibility that students who voluntarily decide to participate in an educational program, and then select themselves into that program, may be more intrinsically motivated and committed to college success than students who elect not to become involved in the program. In the case of the freshman seminar, any positive outcomes resulting from voluntary student participation in the course may be due to the highly motivated nature of the particular first-year students who choose to enroll themselves in the seminar, rather than to the actual effect or impact of the course itself (Fidler, 1992).
As Pascarella and Terenzini point out in their landmark work, How College Affects Students, It has been axiomatic in the educational research community that the most valid approach for estimating the causal link between two variables and thus the net effect of one on the other is through the random assignment of individual subjects to experimental and control groups. Unfortunately, the necessary conditions for a true or randomized experiment are extremely difficult to obtain in actual field settings where self-selection rather than random assignment is the rule . . . . Perhaps the basic problem in assessing the unique influence of college on students is the issue of student self-selection or recruitment (1991, pp. 657-658).
Empirical support for the self-selection bias operating when students voluntarily enroll in the freshman seminar is provided by Schwitzer, Robbins, & McGovern (1993) who found that freshmen who enrolled voluntarily in a freshman orientation course had a better sense of goal directedness and were experiencing fewer adjustment problems than freshmen who chose not to take the course.
One procedure that has been used to circumvent this methodological problem is to solicit a larger number of students who are interested in taking the course than the actual number of course sections that can accommodate them. Half of those students who express interest in taking the course are then randomly selected to fill the available course sections (to serve as the experimental group), while the remaining half of students who had expressed interest in taking the seminar are denied access to the course (so they may serve as the control group).
To further ensure that students in both the control and experimental groups are representative of the total freshman population on campus, a "stratified" random sampling procedure may be used. In this procedure, before students are assigned to either the experimental or control group, they are subdivided into strata (subgroups or subpopulations) which represent the approximate percentage of their total population on campus (e.g., 60% female, 40% male; 25% residents, 75% commuters; 15% minority students, 85% majority students.) Students are then randomly selected from each of the designated subpopulations and assigned to both the experimental and control groups. The major disadvantage of the experimental design is an ethical one: Its random selection of students to become course participants or non-participants (members of the control group) results in the arbitrary denial of course access to one-half of the students who want to become involved in the program and who are likely to benefit from it (Pascarella, 1986). This is akin to a common medical ethics issue involving drug research: Do you arbitrarily deny certain patients a promising drug that could significantly enhance the quality of their lives, or possibly save their lives, so they can conveniently be used as the "placebo" control group in an experiment designed to test the drug's effectiveness? Analogously, do you arbitrarily deny certain students access to a promising educational program (freshman seminar) that could significantly enhance the quality of their college experience, or enable them to survive to degree completion, so that they can serve as a control group in an experiment designed to assess the program's effectiveness? This ethical disadvantage of the experimental design may be tempered somewhat by the argument that it is a justifiable procedure when used to conduct a "pilot study" on only one cohort of students, with the intention of gathering just enough data to serve as supporting evidence for subsequent expansion of the course, thereby ensuring the seminar's availability to all future cohorts of freshmen.
Viewed in this light, the experimental design may be seen as an ethically acceptable and methodologically rigorous research tool for marshalling initial empirical evidence to support the freshman seminar which then may be used to justify long-term, full-scale "institutionalization" of the course.
2. Quasi-Experimental Design
This research method involves comparing outcomes for freshmen who volunteer to participate in the freshman seminar (experimental group) relative to a "matched" control group, i.e., selected freshmen who have elected not to participate in the seminar but whose personal characteristics are similar to, or "match" the experimental group on important student variables that may influence educational outcomes. For example, in previously conducted freshman-seminar assessments, students in experimental and control groups have been matched with respect to such characteristics as (a) high school grade-point average, (b) standardized college-admission test scores, (c) basic-skills placement test scores, (d) predicted GPA derived from weighted scores on college-preparation courses, (f) high school grades and SAT scores, (e) educational goals or objectives, (f) residential or commuter status, and (g) demographic characteristics such as age, gender, race or ethnicity. Matching course participants with non-participants in this fashion serves to control for, or rule out the possibility that differences in student outcomes associated with course participation could be due to the fact that course participants had personal characteristics which differed significantly from non-participants.
A major ethical advantage of this research design is that it allows all students who express interest in taking the freshman seminar to have access to the course, thus circumventing the ethical problem associated with an experimental design in which some students are arbitrarily denied course access so they can serve as a control group. However, one methodological disadvantage of the quasi-experimental design is that students are not randomly assigned to experimental and control groups as they are in a true experimental design (hence its name, "quasi-experimental"). Consequently, this design fails to control for the volunteer effect or self-selection bias, leaving open the possibility that any positive outcomes resulting from course participation may be due to the highly motivated nature of students who elect to enroll in the freshman seminar, rather than to the effects of the course itself.
One possible strategy for addressing this limitation of the quasi-experimental design is to survey students in both the experimental group and matched-control group to assess whether they report differences in their level of college motivation. For instance, at the University of South Carolina, a short survey has been designed to assess students' motivation to stay in college and complete their degree. This survey has been administered to both freshman seminar participants and non-participants. Comparisons of survey responses provided by course participants and non-participants have revealed no differences between the two groups in their motivation to stay in college and complete their degree. Thus, the higher freshman-to-sophomore retention rate evidenced by students who participate in the university's freshman seminar is not likely to be an artifact of students selecting themselves into the course because of their higher level of motivation to persist and succeed in college (Fidler, 1991).
Similarly, the University of South Carolina has administered the Cooperative Institutional Research Program (CIRP) survey to assess the "joining" behavior of freshmen prior to college entry. These survey results indicate that there are no differences between students who eventually decide to enroll in the freshman seminar and those who do not. Hence, the higher rate of participation in co-curricular activities evinced by course participants is unlikely to be due to self-selection by students with greater motivation for extracurricular involvement (Gardner, 1994).
3. Time-Series Design In this research design, outcomes assessed after implementation of the freshman seminar are compared with the same outcomes achieved prior to the seminar's adoption. For example, freshman-to-sophomore retention rates at the college after adoption of the freshman seminar are compared with freshman-to-sophomore retention rates for the years preceding course adoption.
The advantage of this design is that it provides a type of "historical" control group--against which the effects of seminar participation may be compared--without having to withhold the course from a portion of entering freshmen so they can serve as a "contemporary" control group. Hence, this design circumvents the ethical drawback of an experimental design in which some students are deliberately recruited for the freshman seminar and then arbitrarily deprived course access so they can be used as a control group.
However, two caveats must be issued with respect to the time-series research design: (a) The personal characteristics of entering freshmen during years before and after implementation of the freshman seminar should be similar or matched so that any changes in student outcomes subsequent to course implementation cannot simply be due to historical changes in the entry characteristics of the freshman class (e.g., more academically qualified freshmen entering the institution during and after implementation of the freshman seminar). (b) Two or more years of outcome data should be gathered before and after institutional initiation of the freshman seminar in order to compare pre- and post-seminar outcomes--not just the year immediately before and after program implementation--because any year-to-year fluctuations in student outcomes (e.g., retention) may simply be due to random chance deviation (Pascarella, 1986). Gathering data for two or more years before and after program implementation also results in a larger sample size which can enhance the power or sensitivity of statistical tests (Cohen, 1988), such as t-tests and chi square analyses, which may be used to detect pre- to post-implementation differences in student outcomes.
Ramapo College (New Jersey) has employed this time-series design to demonstrate that the average freshman-to-sophomore retention rate for cohorts of entering students who enrolled in the freshman seminar during a five-year period immediately after the course became a requirement was significantly higher than the average retention rate for freshmen entering the college during the three-year period immediately before course adoption (Ramapo College in Barefoot, 1993).
4. Multiple Regression Analysis (a.k.a., Multivariate Analysis)
This statistical procedure, or some variant thereof, has been the favored research design of contemporary scholars interested in assessing how college experiences affect student outcomes (e.g., Astin, 1993; Pascarella & Terenzini, 1991). In short, multiple regression analysis involves computing correlations between student-outcome variables (e.g., student retention or academic performance) and two other types of variables: (a) student input variables (e.g., entering students' SAT scores) and (b) college experience variables (e.g., student participation in the freshman seminar).
For a detailed explanation of multivariate analysis, consult the appendices in Astin (1991) or Pascarella and Terenzini (1991). The following synopsis of multivariate analysis has been adapted from Astin (1993) and applied to assessment of the freshman seminar. The first step in multiple regression analysis is to calculate correlations between all influential student-input characteristics and a single student outcome in order to obtain a "prediction score" or "estimated outcome" score for that particular outcome (e.g., the predicted or estimated first-year GPA for college freshmen based on their entering high-school GPA, SAT, and placement-test scores). This estimated outcome score, based on characteristics which students bring with them to the institution, serves as a type of "statistical" control group or baseline against which to compare the effects of later college experiences on student outcomes. For instance, if students who participate in a freshman seminar go on to display a higher retention rate than would be expected or predicted from their college-entry characteristics, then this discrepancy (called the "residual score") suggests that participating in the seminar (a college experience variable) is having an effect on retention (a student outcome variable).
The amount of the seminar's effect can be assessed by computing the correlation between the residual score it produces and the student outcome in question. This partial correlation (called the "beta" coefficient) represents the degree to which the educational experience and the student outcome are statistically related—after all other potentially biasing student characteristics have been controlled for. In other words, it represents what the freshman seminar experience adds to the predicted student outcome--above and beyond what can be predicted by student input characteristics.
Thus, it might be said that multiple regression analysis attempts to control for confounding student variables statistically, i.e., by computing and comparing correlations between student variables and outcomes, whereas the aforementioned experimental and quasi-experimental research designs attempt to gain this control procedurally, i.e., by the procedures used to select and assign students to experimental and control groups. Multiple regression analysis can also be adapted to assess whether the effect of the freshman seminar (or any other college-experience variable) on a student outcome is either "direct" or "indirect." A college-experience variable can be considered to have a direct effect on a student outcome if its beta coefficient remains statistically significant even after the correlations of all other college-experience variables with that student outcome have been included in the regression analysis. This suggests that a particular college-experience variable is making a unique or independent contribution to the student outcome that cannot be accounted for by other college-experience variables. A college-experience variable may be deemed to have an indirect effect on a student outcome if its beta coefficient, which was significant after student input (entry) characteristics were controlled for, is later reduced to nonsignificance when other college-experience variables are added to the regression equation. This suggests that the effect of the particular college-experience variable is accounted for, or mediated by, other college-experience variables.
There are three key advantages associated with the use of multiple regression analysis as a research method for assessing outcomes of the freshman seminar:
(a) It circumvents the disadvantage of a "true" experimental design in which freshmen are denied access to the course so they can be used as a control group.
(b) It allows investigators to assess whether the addition of individual college-experience variables results in any incremental change in the predicted score for a student outcome (Banta, 1988). For example, multiple regression analysis could be used to answer the following question: Would student participation in the freshman seminar, plus their participation in a pre-semester orientation program, result in a higher than predicted rate of student retention than participation in the seminar alone?
(c) It allows investigators to compute the percentage of outcome variance that is attributable to a particular student-input or college-experience variable (by squaring its beta coefficient), thus providing an estimate of the variable's relative influence on the outcome under investigation (Pascarella & Terenzini, 1991). For example, via multiple regression analysis it would be possible to compute the approximate percentage of total variance in college students' first-year GPA that is attributable to participation in the freshman seminar, relative to the percentage of variance in first-year GPA that is attributable to students' entering SAT, high-school GPA, or placement-test scores.
Two limitations of multiple regression analysis have been cited in the assessment literature: (a) The procedure does not allow assessment of how joint or common variance between college-experience variables and student-input variables may interact to influence outcomes (Hanson, 1988). (b) It assumes that any variance in outcome that may be attributed to the joint influence of student-input and college-experience variables is attributed solely to the student-input variable, thus the influence of student-input variables on outcomes may be overestimated while the influence of college-experience variables are underestimated (Pascarella & Terenzini, 1991).
However, proponents of multiple regression analysis consider these to be relatively minor limitations which have little adverse effect on the overall validity and interpretability of the results generated by this statistical procedure (Astin, 1991; A.W. Astin, personal communication, October 21, 1992).
There is one final consideration should be kept in mind when using any statistical test to evaluate the significance of the freshman seminar's impact. If desirable student outcomes attributed to the freshman seminar are not found to be statistically significant, the seminar's effect on certain institutional outcomes may still prove to be practically significant. For instance, a freshman seminar which results in a very modest 5-10% increase in student retention may generate a gain in the college's total enrollment number that does not reach a level of statistical significance; however, the revenue gained from this modest increase in additional tuition-paying customers may contribute significantly to the institutional budget, particularly at colleges whose operational budgets are heavily tuition dependent.
This institutional outcome has been assessed at Seton Hall University. The cost/benefit ratio of its freshman studies program was evaluated by means of two statistical techniques that are commonly used in business to evaluate the economic benefits of alternative courses of action: (a) "break-even analysis" (Larimore, 1974), and (b) "elasticity coefficient" (Hoffman, 1986). Two faculty from the university's department of economics used these procedures to assess whether the total revenue generated by its freshman studies program equaled or exceeded the total costs incurred by the program. They found that the break-even point for an entering class of approximately 1,000 students who participated in Seton Hall's freshman studies program was 21 students, which represented an increased retention rate of only about two percent. This means that if implementation of the program leads to the retention of 21 additional students who would otherwise have withdrawn from the college, the program will have paid for itself. The architects of this campus-specific study concluded that Seton Hall's freshman studies program was "cost efficient [and] will more than pay for itself in economic terms alone without taking into account the quality benefits that accrue to the university and the retained students" (Ketkar & Bennet, 1989, p. 43).
These findings are consistent with early cost-effectiveness research on the freshman seminar (University 101) conducted at the University of South Carolina, whose Office of Finance reported that for every $1.00 used to support its freshman seminar, the program generated $5.36 in return (Gardner, 1981). What these campus-specific research reports strongly suggest is that comprehensive evaluation of the freshman seminar's overall impact should involve not only its statistical effect with respect to student outcomes, but also its fiscal effect as an institutional outcome.
5. Course-Evaluation Surveys: Student Ratings
National research reveals that student ratings are the most widely used source of information for assessing teaching effectiveness in college (Seldin, 1993), and student ratings of the course or course instructor are the most commonly used strategy for assessing the freshman seminar (Barefoot & Fidler, 1996). Since student ratings represent the most frequently used strategy for evaluating teaching effectiveness in general, and for evaluating the freshman seminar in particular, an extensive discussion of this assessment strategy is provided in this section of the monograph.
Course evaluations usually take the form of student surveys or questionnaires which are designed to assess (a) students' level of course satisfaction or their perceptions of course effectiveness, and (b) self-reported student outcomes associated with the course (e.g., students' reported use of various campus services or the frequency of their interactions with key college personnel).
One major strength of student evaluations is that their reliability and validity have probably received more empirical support than any other method of course assessment; there have been over 1300 articles and books published which contain research on the topic of student ratings (Cashin, 1988). Despite perennial criticisms of student evaluations by some faculty and the publication of some isolated studies which purportedly refute the validity of student evaluations, when the results of all studies are viewed collectively and synthesized, they provide strong support for the following conclusions about student evaluations.
(a) Students' judgments correlate positively (i.e., are in agreement with) the judgments of more experienced observers (e.g., alumni, teaching assistants, faculty peers, administrators, and trained external observers (Aleamoni & Hexner, 1980; Feldman, 1988, 1989; Marsh, 1984).
(b) Students make judgments on what is taught and how it is taught. Student judgments are not unduly influenced by their own personal characteristics (e.g., student's gender or academic ability), or by characteristics extraneous to the course, such as time of day or time of year when the course is taught (Abrami, Perry, & Leventhal, 1982; Aleamoni & Hexner, 1980; Feldman, 1977; 1979; Seldin, 1993).
(c) Students' overall ratings of course quality and teaching effectiveness correlate positively with how much they actually learn in the course--as measured by their performance on standardized final exams. In other words, students rate most highly those courses in which they learn the most and those instructors from whom they learn the most (Abrami, d'Apollonia, & Rosenfield, 1997; Centra, 1977; Cohen, 1981, 1986; McCallum, 1984).
(d) Student evaluations do not depend heavily on the student's age (Centra, 1993) or level of college experience. For example, lower-division students do not provide ratings that differ systematically from upper-division students (McKeachie, 1979).
(e) Students distinguish or discriminate among specific dimensions and components of course instruction. For example, students give independent ratings to such course dimensions as course organization, instructor-student rapport, and the quality of course assignments (Marsh, 1984). As Aleamoni (1987) illustrates, "If a teacher tells great jokes, he or she will receive high ratings in humor . . . but these ratings do not influence students' assessments of other teaching skills" (p. 27).
Bill McKeachie, who has been engaged in national research on student ratings for three decades, recently provided a succinct summary of why we should take student evaluations seriously.
Decades of research have related student ratings to measures of student learning, student motivation for further learning, instructors' own judgments of which of two classes they had taught more effectively, alumni judgments, peer and administrator evaluations, and ratings by trained observers. All of these criteria attest to the validity of student ratings well beyond that of other sources of evidence about teaching (McKeachie & Kaplan, 1996, p. 7).
Moreover, a large body of research has consistently reputed commonly held myths about student ratings. For instance, the following findings fail to support traditional criticisms of student evaluations.
(a) Students who receive higher course grades do not give higher course ratings (Theall, Franklin, & Ludlow, 1990; Howard & Maxwell, 1980, 1982).
(b) Students do not give lower ratings to difficult or challenging courses which require a heavy work load (Marsh & Dunkin, 1992; Sixbury & Cashin, 1995).
(c) Student evaluations are not unduly influenced by the instructor's personality and popularity; for example, entertaining teachers do not necessarily receive higher overall student ratings (Costin, Greenough, & Menges, 1971; McKeachie, et al., 1978; Marsh & Ware, 1982).
(d) Student ratings do not change over time or with students' post-course experiences; in contrast, there is substantial agreement between student evaluations given at the time of course completion and retrospective evaluations given by the same students one-to-five years later (Feldman, 1989; Overall & Marsh, 1980). This refutes the oft-cited argument that students are immature and only with maturity, or the passage of time, will they come to appreciate courses or instructors that were initially rated poorly. In a comprehensive review of the research literature on students' course ratings, Cashin (1995) reaches the following conclusion.
There are probably more studies of student ratings than of all of the other data used to evaluate college teaching combined. Although one can find individual studies that support almost any conclusion, for a number of variables there are enough studies to discern trends. In general, student ratings tend to be statistically reliable, valid, and relatively free from bias or the need for control; probably more so than any other data used for evaluation (p. 6). This echoes the conclusion reached by Scriven (1988): "Student ratings are not only a valid, but often the only valid way to get much of the information needed for most evaluations" (quoted in d'Apollonia & Abrami, 1997, p. 19).
These research-based conclusions are not surprising when viewed in light of the following reasons why students are in a uniquely advantageous position to evaluate the quality of a course. (a) Students witness all elements of the course, from the syllabus to the final exam, including all intervening in-class learning experiences and out-of-class assignments. Thus, they have a comprehensive perspective of the totality of instructional elements that comprise the course. (b) Students experience multiple courses and multiple instructors which enable them to provide course assessment from a comparative perspective. (c) Multiple students experience and evaluate a given course, thus they comprise a large sample of observers, particularly if course evaluations are amalgamated across semesters or across different course sections. It is almost an assessment axiom that such a large representative sample is a necessary precondition for drawing accurate inferences and reaching valid conclusions from empirical observation (Hays, 1973; Robinson & Foster, 1979).
Another major advantage of course-evaluation surveys or questionnaires is that they are capable of generating an extensive amount of data on a large sample of respondents in a relatively short period of time. If a student-rating survey or questionnaire is well constructed and carefully administered, it can be an effective and efficient vehicle for assessing the attitudes, perspectives, and self-reported outcomes of the institution's most valued constituent: the learner.
Nevertheless, the degree of reliability and validity of a particular student-rating survey can be influenced by the content (items) which comprise the survey and the process by which it is administered. The following recommendations are offered as strategies for maximizing the validity, interpretability, and utility of surveys designed to solicit student evaluations of the freshman seminar. Most of the following recommendations are also relevant for improving the effectiveness of surveys designed to solicit faculty/staff evaluations of any instructor-training program that happens to be offered in conjunction with the freshman seminar.
Recommendations Regarding the Course Evaluation Instrument/Form
* Cluster individual items into logical categories that represent important course components or instructional dimensions.
For instance, include items pertaining to each of the three core components of the course experience: (a) course planning and design (e.g., questions pertaining to overall course organization and clarity of course objectives); (b) classroom instruction (e.g., items pertaining to in-class teaching, such as clarity and organization of lectures or instructional presentations); and (c) evaluation of student performance (e.g., items pertaining to the fairness of tests, assignments, grading practices, and the quality of feedback provided by the instructor). Also, a healthy balance of questions pertaining to both course content (topics and subtopics) and instructional processes (in-class and out-of-class learning activities) should be included on the evaluation form. A major advantage of this clustering or categorizing strategy is that the categories can function as signposts or retrireview cues for the designers of the survey, ensuring that the items selected for inclusion in the instrument reflect a well-balanced sample of the major course dimensions that affect the quality of the students' learning experience.
Another advantage of grouping items under section headings is that it can function as a cue or signal to students completing the instrument that there are distinct dimensions to the course. This may help them to discriminate among these important components of course effectiveness, increasing the likelihood that they will assess them independently. Lastly, partitioning the instrument into separate sections that reflect separate course dimensions should help to reduce the risk of a general "halo effect," i.e., the tendency for a student to complete the evaluation instrument by going right down the same column and filling in all "1s" or "5s" on all items, depending on whether they generally liked or disliked the course.
* Provide a rating scale that allows five-to-seven choice points or response options. There is research evidence which suggests that fewer than five choices reduces the instrument's ability to discriminate between satisfied and dissatisfied respondents, and more than seven rating-scale options adds nothing to the instrument's discriminability (Cashin, 1990).
* If possible, do not include the neutral "don't know" or "not sure" as a response option. This alternative could generate misleading results because it may be used as an "escape route" by students who do have strong opinions but are reluctant to offer them (Arreola, 1983).
* Include items that ask students to report their behavior.
Astin (1991) suggests a taxonomy for classifying types of data that may be collected in the assessment process that includes two broad categories: (a) psychological data reflecting students' internal states, and (b) behavioral data reflecting students' activities. Traditionally, student course evaluations have focused almost exclusively on the gathering of psychological data (student perceptions or opinions). However, given that one of the major goals of most freshman seminars is to increase students' use of campus services (Barefoot & Fidler, 1996), items which generate behavioral data pertaining to use of campus services, or frequency of participation in co-curricular activities, should also be included on the evaluation instrument.
It should be noted that commercially developed instruments are available to assess students' reported behaviors and their degree of involvement with campus services and student activities, such as (a) "The College Life Task Assessment Instrument" (Brower, 1990, 1994), (b) the "College Student Experiences Questionnaire" (CSEQ)(Pace, 1984), and (c) the "Critical Incident Techniques & Behavioral Events Analysis" (Knapp & Sharon, 1975).
This raises the larger issue of whether the college should rely exclusively on locally developed ("home grown") assessment instruments for evaluating the freshman seminar, or whether they should purchase externally constructed ("store bought") instruments from testing services or centers. There are some advantages associated with external, commercially developed instruments, namely: (a) Their reliability and validity are usually well established. (b) They are efficient, saving time that the institution would have to devote to instrument construction and scoring. (c) Norms are typically provided by the testing service that allow the institution to gain a comparative perspective by assessing its own performance relative to national averages.
However, a major disadvantage of externally developed instruments is that the questions asked and results generated may be less relevant to the institution's unique, campus-specific goals and objectives than those provided by internally constructed instruments. Some testing services attempt to minimize this disadvantage by allowing the college the option of adding a certain number of their own questions to the standardized instrument. The availability of this option should be one major factor to consider when deciding whether or not to purchase an external instrument.
Another factor to consider would be the purchase and scoring costs associated with the use of an externally developed instrument relative to the anticipated cost (in time and money) to the college if it developed and scored its own instrument. Supporting the latter approach are the results of one college-based review of assessment methods in which it was concluded that the cost of externally developed surveys make them viable only for "one-time projects or single, annual projects, and only if campus-based expertise is unavailable" (Malaney & Weitzer, 1993, p. 126).
Peter Ewell, a nationally recognized assessment scholar, warns about another subtle disadvantage associated with what he calls "off the shelf" (externally developed) instruments.
Off-the-shelf instruments are easy. Although they cost money, it's a lot less than the effort it takes to develop your own. But buying something off-the-shelf means not really engaging the issues that we should—for example, What are we really assessing? and What assumptions are we making about the nature of learning" (Mentkowski et al., 1991, p. 7).
To improve the reliability and validity of campus-specific instruments that are internally designed, a pilot study of the instrument should be conducted on a small sample of students to assess whether the instrument's instructions are clear, the wording of each of its items is unambiguous, and the total time needed to complete the instrument is manageable. As Fidler (1992) notes, "Seasoned researchers recommend pilot studies before every research project in order to identify and quickly alter procedures, thereby alleviating problems before they can invalidate an entire study" (p. 16).
* Beneath each rating item or question, print the phrase “Reason(s) for this rating:” and leave a small space for any written remarks students would like to make with respect to that particular item.
Written comments often serve to clarify or elucidate numerical ratings, and instructors frequently report that written comments are most useful for course-improvement purposes, especially if such comments are specific (Seldin, 1992). As Jacobi (1991) points out, "The typical survey consists of a majority of closed-ended items, with limited opportunities for open-ended responses. This format does not encourage students to explore their attitudes, feelings, or experiences in depth and therefore may provide incomplete information about why students think, feel, or behave in a particular manner" (p. 196).
Allowing students to write comments with respect to each individual item, rather than restricting them to the usual "general comments" section at the very end of the evaluation form, should serve to increase the interpretability of numerical rating, as well as increase the specificity of students' written remarks—which, in turn, should increase their usefulness for course or program improvement.
* Include at least two global items on the evaluation instrument pertaining to overall course effectiveness or course impact; these items can be used for summative evaluation purposes.
The following statements illustrate global items that are useful for summative evaluation:
(a) I would rate the overall quality of this course as: (poor <- - -> excellent).
(b) I would rate the general usefulness of this course as: (very low <- - -> very high).
(c) I would recommend this course to other first-year students: (strongly agree <- - -> strongly disagree).
Responses to these global items can provide an effective and convenient snapshot of students' overall evaluation of the course which can be readily used in program assessment reports. Research has repeatedly shown that these global ratings are more predictive of student learning than student ratings given to individual survey items pertaining to specific aspects or dimensions of course instruction (Braskamp & Ory, 1994; Centra, 1993; Cohen, 1986). As Cashin (1990) puts it, global items function "like a final course grade" (p. 2).
Abrami (1989) argues further that, "it does make conceptual and empirical sense to make summative decisions about teaching using a unidimensional [global] rating. This choice then frees us to recognize that the particular characteristics of effective teaching vary across instructors" (p. 227). Thus, ratings on such unidimensional or global items may be used to make summative (overall) assessments of the course or instructor. However, it is imperative not to add up the ratings for all individual items on the questionnaire and then average them in order to obtain an overall course evaluation. This procedure not only is inefficient, it is also an ineffective index of overall course satisfaction because it gratuitously assumes that each individual item carries equal weight in shaping the students' overall evaluation of the course.
Inclusion of global items on the evaluation instrument also allows for the examination of relationships between students' overall course ratings and their ratings on individual items pertaining to specific course dimensions. Such comparisons could be helpful in answering the following question: Among those students who give the course very high global ratings versus those students who give the course very low overall ratings, on which particular items in the evaluation instrument do these two groups of students display the largest discrepancy in ratings? These particular items could reveal those specific aspects or dimensions of the course which carry the most weight in determining students' overall perceptions and their overall level of satisfaction with the freshman seminar. These specific course dimensions may represent key target areas for course improvement which could be addressed in instructional-development workshops and new-instructor orientation programs.
* Include an open-ended question asking for written comments about the course's strengths and weaknesses, and how the latter may be improved.
Such questions can often provide useful information about students' general reaction to the course as well as specific suggestions for course improvement. For example, at the end of the semester, course instructors could ask students to provide a written response to a question which asks them to "describe a major change (if any) in their approach to the college experience that resulted from their participation in the freshman seminar." Or, students could be asked, "What would you have liked to learn about being a successful student that was not addressed in the freshman seminar?" The written responses given to these questions by individual students in separate class sections could be aggregated and their content analyzed to identify recurrent themes or response categories.
* Provide some space at the end of the evaluation form so that individual instructors can add their own questions (Seldin, 1993).
This practice enables instructors to assess specific instructional practices that are unique to their own course section. Also, this option should serve to give instructors some sense of personal control or ownership of the evaluation instrument which, in turn, may increase their motivation to use the results in a constructive fashion.
* Give students the opportunity to suggest questions which they think should be included on the evaluation form.
This opportunity could be cued by a prompt at the end of the evaluation form, such as, "Suggested Questions for Future Evaluations." This practice has three major advantages: (a) It may identify student perspectives and concerns that the evaluation form failed to address, and (b) it shows respect for student input and (c) it gives students some sense of control or ownership of the evaluation process. * Avoid vague or ambiguous descriptors which may be difficult to interpret, or whose meaning may be interpreted by different students in different ways (for example, "Pacing of class presentations was ‘appropriate’.")
* When soliciting information on the incidence or frequency of an experienced event (e.g., "How often have you seen your advisor this semester?"), avoid response options that require high levels of inference on the part of the reader (e.g., "rarely"-"occasionally"-frequently"). Instead, provide options in the form of numbers or frequency counts which require less inference or interpretation by the reader (e.g., 0, 1-2, 3-4, 5 or more times). This practice should serve to reduce the likelihood that individual students will interpret the meaning of response options in different ways.
* When asking students to rate their degree of involvement or satisfaction with a campus support service or student activity, be sure to include a zero or "not used" option. This response alternative allows a valid choice for those students who may have never experienced the service or activity in question (Astin, 1991).
* Use the singular, first-person pronoun ("I" or "me") rather than the third-person plural ("students"). For instance, "The instructor gave me effective feedback on how I could improve my academic performance" should be used rather than, "The instructor gave students effective feedback on how they could improve their performance." The rationale underlying this recommendation is that an individual student can make a valid personal judgment with respect to her own course experiences but she is not in a position to judge and report how other students, or students in general, perceive the course.
* Avoid compound sentences that ask the student to rate two different aspects of the course simultaneously (e.g., "Assignments were challenging and fair.") This practice forces respondents to give the same rating to both aspects, even if they are more satisfied with one aspect than the other. For example, a student may feel the assignments were very "fair," but not very "challenging."
* Include one or two negatively worded items that require students to reverse the rating scale (e.g., "I did not receive effective feedback on how I could improve my performance."). Such items serve two purposes: (a) They encourage students to read and rate each item carefully, serving to reduce the frequency of "positive-response-set" mistakes (Arreola & Aleamoni, 1990) in which the respondent goes straight down a rating column and fills in a uniformly high rating for all items. (b) They could be used to identify evaluations forms which have not been completed carefully and may need to be deleted from the data analysis. For example, students who have responded thoughtlessly by filling in all positive or all negative ratings may be identified by their failure to reverse their response bias on the negative-worded item(s).
Recommendations for Administration of Course Evaluations
* The amount of time allotted for students to complete their evaluations should be standardized across different course sections.
Some consensus should be reached among course instructors regarding the minimal amount of time needed for students to complete evaluations in a thoughtful fashion. Further temporal standardization could be achieved if seminar instructors would agree on what time during the class period (e.g., the first or last 15 minutes of class) would be best for administering course evaluations. One argument against the common practice of having students complete their evaluations at the very end of the class period is that it could result in less carefully completed evaluations because students may be tempted to finish quickly and leave early Another temporal factor for consideration is the time during the academic term or semester when course evaluations are administered. One option is to administer the evaluations immediately after the final exam of the course. This provides two advantages: (a) It allows students to truly assess the whole course because the final exam represents its last key component. (b) Students are not likely to be absent on the day of the final exam, so a larger and more representative sample of students would be present to complete the course evaluation than if it were administered on a routine day of class.
However, a major disadvantage of administering evaluations immediately after students complete the final exam is that students are more likely to be preoccupied, anxious, or fatigued by the just-completed exam. This may result in evaluations that are filled out more hurriedly, with fewer written comments, and less overall validity. This disadvantage is a major one and probably outweighs the advantages associated with having students evaluate the course after completing the final exam.
Perhaps the best approach is for seminar instructors to agree to administer the evaluation instrument as close to the end of the course as possible (e.g., during the last week of the term), but not immediately after the final exam. Also, this approach would better accommodate those instructors who elect not to administer a final examination in the freshman seminar.
One last consideration with respect to the issue of when course evaluations are administered is the "burn-out" or fatigue factor that may come into play when students are repeatedly required to fill out course evaluations in all their classes at the end of the term. To minimize the adverse impact of fatigue or boredom that may accompany completion of multiple course evaluations, it might be advisable to try to administer the course-evaluation instrument to freshman seminar students at a time that does not coincide with administration of the college's standard course evaluation forms.
* Instructions read to students immediately before distribution of the evaluation forms should be standardized for all course instructors and all course sections. Some research has shown that student ratings can be affected by the wording of instructions that are read to students just prior to administration of the evaluation instrument (Pasen, et al., 1978). For instance, students tend to provide more favorable or lenient ratings if the instructions indicate that the evaluation results will be used for decisions about the instructor's "retention and promotion," as opposed to students being told that the results will be used for the purpose of "course improvement" or "instructional improvement" (Braskamp & Ory, 1994; Feldman, 1979). Thus, instructions read to students in different sections of the freshman seminar should be consistent (e.g., the same set of typewritten instructions read in each class).
* Instructions read prior to distribution of evaluation forms should effectively prepare or prime students for the important role they play in evaluating the freshman seminar and provide students with a positive "mental set".
To increase students' enthusiasm for course evaluation and improve the validity of the results obtained, consider including the following information in the instructions read to students prior to course evaluation.
(a) Remind students that evaluating the course is an opportunity for them to provide meaningful input that could improve the quality of the freshman seminar for many future generations of first-year students.
(b) Explain to students why the evaluations are being conducted (e.g., to help instructors improve their teaching and to improve the quality of the course). If items relating to specific course characteristics are to be used for instructional improvement purposes and global items for overall course-evaluation or instructor-evaluation purposes, then this distinction should be mentioned in the instructions.
(c) Assure students that their evaluations will be read carefully and taken seriously by the program director as well as the course instructor.
(d) Acknowledge to students that, although they may be repeatedly completing evaluations for all courses they are taking, they should still try to take the time and effort to complete the freshman-seminar evaluation form thoughtfully, because it is a non-traditional course in terms of both its content and method of instruction.
(e) Remind students that they should avoid the temptation to give uniformly high or uniformly low ratings on every item, depending on whether they generally liked or disliked the course or the course instructor. Instead, remind them to respond to each item independently and honestly.
(f) Encourage students to provide written comments in order to clarify or justify their numerical ratings, and emphasize that specific comments are especially welcome because they provide instructors with valuable feedback on course strengths and useful ideas for overcoming course weaknesses.
(g) Inform students what will be done with their evaluations once they have completed them, assuring them that their evaluations will not be seen by the instructor before grades have been turned in (Ory, 1990), and that their hand-written comments will be converted into typewritten form before they are returned to the instructor. (The latter assurance can alleviate student fears that their handwriting will be recognized by the instructor; without this assurance, students may be inhibited about writing any negative comments on the evaluation form. The need to prepare students for their role as evaluators of course instruction, and the role of a freshman orientation course for providing this preparatory experience, are suggested by the former president of the Carnegie Foundation for the Advancement of Teaching, Ernest Boyer:
We urge that student evaluation be used . . . but for this to work, procedures must be well designed and students must be well prepared. It's a mistake to ask students to fill out a form at the end of a class without serious consideration of the process. Specifically, we urge that a session on faculty assessment be part of freshman orientation. All incoming students should discuss the importance of the process, and the procedures used (1991, p. 40).
* The behavior of instructors during the time when students complete their evaluations should also be standardized.
The importance of this practice is supported by research indicating that student ratings tend to be higher when the instructor remains in the room while students complete the course-evaluation form (Centra, 1993; Feldman, 1989; Marsh & Dunkin, 1992). The simplest and most direct way to eliminate this potential bias is for the instructor to be out of the room while students complete their evaluations (Seldin, 1993). This would require someone other than the instructor to administer the evaluations, such as a student government representative or a staff member. Some faculty might resist this procedure, particularly if there is no precedent for it at the college. If such resistance is extreme and widespread, even after a clear rationale has been provided, then the following alternatives might be considered: (a) The instructor asks a student to distribute the forms and then the instructor leaves the room while students complete their evaluations. (b) The instructor stays in the room but does not circulate among the class while students complete their course evaluations; instead he or she remains seated at a distance from the students (e.g., at a desk in front of the class) until all evaluations have been completed.
Whatever the procedure used, the bottom line is that variations in how instructors behave while students complete course evaluations should be minimized. Instructor behavior during course evaluation is a variable which needs to be held constant so it does not unduly influence or "contaminate" the validity of student evaluations of the freshman seminar.
Recommendations for Analyzing, Summarizing & Reporting the Results of Course Evaluations
Report both the central tendency and variability of students' course ratings.
Two key descriptive statistics can effectively summarize student ratings: (a) Mean (average) rating per item, which summarizes the central tendency of student ratings, and (b) standard deviation (SD) per item, which summarizes the variation or spread of student ratings for each item. Theall and Franklin (1991) succinctly capture the meaning and significance of including standard deviation in the analysis and summary of students’ course ratings.
The standard deviation for individual items is an index of agreement or disagreement among student raters. Perfect agreement yields a standard deviation of 0. Deviations of less than 1.0 indicate relatively good agreement in a 5-point scale. Deviations of 1.2 and higher indicate that the mean may not be a good measure of student agreement. This situation may occur when opinion in a class is strongly divided between very high and very low ratings or, possibly, is evenly dispersed across the entire response scale, resulting in a mean that does not represent a “typical” student opinion in any meaningful sense. Because students vary in their needs, a teacher [or course] may be “among the best” for some and at the same time “among the worst” for others. A mean of 3.0 or 3.5 [on a 5-point scale] cannot be construed to represent “average” performance in the sense of middle-range performance when the mean is simply an artifact of strong disagreement among students. The standard deviation is therefore an important source of information about student opinion (p. 90).
In addition to computing the means and SDs for student ratings received by individual instructors in their own course sections, these statistics can also be computed for all class sections combined, thereby allowing individual instructors to compare the mean and SD score for ratings in their own section with the composite mean and standard deviation calculated for all sections. Computing section-specific and across-section (composite) means and SDs for each item on the evaluation instrument also allow for the application of statistical tests to detect significant differences between the instructor's section-specific ratings and the average rating of all course sections combined. The results of these significance tests could provide valuable information that can be used diagnose instructional diagnosis and improvement. For instance, if an instructor's rating on an item is significantly below the collective mean for that item, it may suggest to the instructor that this is one aspect of his instruction that needs closer attention and further development. In contrast, if an instructor's mean rating on a given item is significantly above the overall mean for all course sections on that item, then this discrepancy suggests an instructional strength with respect to that particular course characteristic. What the instructor is doing to garner such a comparatively high rating might be identified and shared with other faculty who are teaching the course.
Identification and sharing of strategies for instructional improvement is an essential component of student ratings assessment, and it is a form of feedback that is frequently ignored or overlooked (Stevens, 1987; Cohen, 1990). As Stevens contends, "The instructor must learn how to design and implement alternative instructional procedures in response to feedback, which means that a coherent system of instructional resources must be easily available to the instructor. Without such a system, the instructor may be unable to gain the knowledge or support that is necessary to effect change" (1987, p. 37)
One non-threatening way to provide course instructors with specific strategies for instructional improvement is to create opportunities for instructors to share concrete teaching practices that have worked for them. Strategies could be solicited specifically for each item on the evaluation form and a compendium of item-specific strategies could then be sent to all instructors--ideally, at the same time they receive the results of their course evaluations. In this fashion, instructors are not only provided with a descriptive summary of student-evaluation results, but also with a prescriptive summary of specific strategies about what they can do to improve their instructional performance with respect to each item on the evaluation instrument. Moreover, involving course instructors in the development and construction of these strategies serves to (a) actively engage them in the quest to improve course instruction, (b) increases their sense of ownership or control of the course, and (c) treats them like responsible agents (rather than passive pawns) in the assessment process. As Paul Dressel recommends, "Evaluation done with or for those involved in a program is psychologically more acceptable than evaluation done to them" (1976, p. 5, underlining added).
The importance of providing specific teaching-improvement feedback to course instructors is underscored by research indicating that (a) instructors prefer feedback that is specific and focused on concrete teaching behaviors (Murray, 1987; Brinko, 1993), and (b) specific feedback is more effective for helping recipients understand their evaluation results and for helping them to improve their instructional performance (Goldschmid, 1978; Brinko, 1993). As Wilson (1986) concluded following his extensive research on the effects of teaching consultation for improving instructors' course evaluations, "Items on which the greatest number of faculty showed statistically important change were those for which the suggestions were most concrete, specific and behavioral" (p. 209). To maximize the opportunity for instructors to make instructional improvements while the course is still in progress, it is recommended that course evaluations be administered at midterm to obtain early student feedback. Cohen (1980) conducted a meta-analysis of 17 studies on the effectiveness of student-rating feedback for improving course instruction. He found that receiving feedback from student ratings during the first half of the semester was positively correlated with instructional improvement—as measured by the difference in student ratings received at midterms (before feedback was received), and ratings received at the end of the semester (after midterm feedback had been received). These findings are consistent with those reported by Murray and Smith (1989), who found that graduate teaching assistants in three different disciplines who received instructional feedback at midterms displayed higher pre-to post-test gains in student ratings than a control group of teaching assistants who did not receive midterm feedback. Freshman seminar instructors could take advantage of this early-feedback procedure by administering student evaluations at midterms and then compare these results with those obtained at the end of the course—after some instructional change was made in response to students' midterm feedback. Thus, pre-midterm to post-midterm gain in students ratings could be attributed to the particular instructional change that was implemented during the second half of the course. This is the type of "classroom research" which has been strongly endorsed as a legitimate form of faculty scholarship (Boyer, 1991) and which integrates educational research with instructional practice (Cross & Angelo, 1988).
* To gain a reference point for interpreting student perceptions of the freshman seminar, compare student evaluations of the seminar with their evaluations of other first-semester courses.
To ensure a fair basis of comparison and a valid reference point, compare student evaluations of the freshman seminar with other courses of similar class size (e.g., a freshman course in English composition) because there is some evidence that class size can influence student ratings, with smaller classes tending to receive slightly higher average ratings than larger classes (Cashin, 1988; Feldman, 1984).
Also, depending on whether the seminar is a required or elective course, it should be compared with other first-semester courses that have the same required or elective status, because research suggests that required courses tend to receive lower student ratings than elective courses (Braskamp & Ory, 1994; Marsh & Dunkin, 1992). One institution which employed this course-comparison procedure discovered that 75% or more of their first-year students placed a higher value on the freshman seminar than they did for any other course in the college's core curriculum (Marietta College, cited in Barefoot, 1993). As Arreola and Aleamoni advise,
The use of comparative (normative) data in reporting results can . . . result in a more accurate and meaningful interpretation of ratings. For example, comparative data gathered on freshman- level courses in a department allow instructors to determine how they and their courses are perceived in relation to the rest of the courses in the department. When such comparative data are not available, instructors will be interpreting and using results in a void, with very little substantiation for their conclusions and actions (1990, p. 51).
Surveys or questionnaires could also be used to obtain a different type of comparative perspective on the freshman seminar—the retrospective perceptions of alumni. Since the course emphasizes lifelong-learning and life-adjustment skills, it might be revealing to assess how alumni, looking back on the freshman seminar from the perspective of a working professional or graduate student, would respond to the following questions posed to them via phone interviews or alumni surveys: (a) Do you view the seminar differently now than you did when you were a first-year student? (b) What aspect of the seminar is most memorable or has had the most long-lasting impact on you? (c) Do you still use any ideas or skills acquired during the freshman seminar in your professional or personal life?
It is noteworthy that the only two reported studies of alumni perspectives on the freshman seminar have both revealed that former students' retrospective evaluations of the course are very positive (Hartman, et al., 1991; University of Prince Edward, cited in Barefoot, 1993).
6. Pre-Test/Post-Test Design
To assess the amount of change in students' attitudes, reported behaviors, or academic-skill performance between the onset and completion of the freshman seminar, student responses given at the end of the course can be compared to those given before the course begins. This pre/post design involves administering an evaluation instrument, or selected items therefrom, to students on the first day of class so these responses can be used as a baseline (pre-test) against which their post-course (post-test) responses can be compared. For example, assessment of the freshman seminar at the University of Colorado (Colorado Springs) involves a pre- to post-test design that includes both student self-assessment of their communication skills and performance measures of their actual communication skills. Intriguingly, the results obtained with this research design revealed that minority students who took the freshman seminar reported greater gains in their communication skills via pre- to post-test self-assessments; however, in terms of actual performance measures, majority students evidenced greater pre- to post-test gains in communication skills (Tregarthen, Staley, & Staley, 1994).
To ensure that pre- to post-course changes can be attributed to the freshman seminar experience in particular, rather than to personal maturation or first-semester college experience in general, seminar students' pre- and post-course responses could be compared with the responses provided by other freshmen--at the beginning and end of the same semester--who have not participated in the seminar. For example, the University of South Carolina has used this strategy to assess the impact of the sexuality component of its freshman seminar. Sexual awareness surveys were administered to students prior to their participation in the freshman seminar and these surveys were re-administered upon course completion. The same surveys were administered to first-semester freshmen at the beginning of the term, and re-administered to them at the end of the term. Students who participated in the freshman seminar reported higher rates of abstinence and greater use of condoms at the end of the course than they did at the start of the course, whereas freshmen who did not take the seminar reported no decline in either their abstinence rates or condom use from beginning to end of their first semester in college (Turner, et al., 1994). One limitation of student surveys or questionnaires as assessment methods is that the resulting data consist of subjective reports, such as personal feelings and perceptions, rather than objective data reflecting behavior or performance. Having students self-report their behavior on surveys and questionnaires more directly assesses their actions and behaviors than questions asking only for student attitudes or opinions; however, the fact still remains that these are self-reported behaviors which may or may not reflect their actual occurrence. For instance, on surveys, students may make inaccurate reports of their behavior because of memory distortion, such as forgetting exactly how many times they used a student service; or, despite assurances of confidentiality, students may still report behaviors that they think will please the evaluator or confirm her hypothesis—a research phenomenon known as "respondent bias" (Fidler, 1992).
Furthermore, paper-and-pencil assessments of self-reported student behaviors conducted in classroom settings represent indirect measures of student behavior that are gathered in an artificial context. Nationally prominent scholars have argued repeatedly that colleges and universities need to supplement classroom-centered, paper-and-pencil assessment with procedures that assess student behavior in real-life settings (Kuh, et al., 1991; Pascarella & Terenzini, 1991).
Behavioral records of student activities outside the classroom can provide more direct, first-hand data on the incidence or frequency of important student behaviors in college; and since these data reflect behavior that takes place in a more realistic environment than the classroom, they may be said to have a greater degree of "ecological validity" (Harrison, 1979).
Listed below are examples of behavioral records that may be relevant to the assessment of the impact of the freshman seminar.
* Logs kept by student-service providers to track (a) the frequency with which students utilize support programs (e.g., incidence and frequency of student use of the career center), (b) the number of contacts between students and their advisors, or (c) the number of critical incidents that have occurred on campus (e.g., number of disciplinary reports filed).
* Trace audits, a.k.a., "credit card measures" of student involvement (e.g., using student- identification cards to assess frequency of library use).
* Transcript analysis of students' course-enrollment patterns to assess the extent and nature of their progress toward completion of academic programs and degrees (e.g., number of students who change majors; time taken by students to complete programs or degree requirements).
* "Student development" or "co-curricular" transcripts of students' participation in on-campus clubs, student organizations, campus activities, and college-sponsored community services.
Using these behavioral records, comparisons can be made between the behavior of students who have participated in the freshman seminar and students who have not experienced the course. Admittedly, colleges and universities typically have not kept careful records of the above-mentioned student behaviors, but if such record keeping is built into the freshman-seminar assessment plan, then it may serve as a "spark plug" for igniting institutional commitment to the systematic collection and analysis of students' behavioral records.
8. Qualitative Research Methods
Quantitative data, such as survey ratings and behavioral measures, provide evaluative information that can be readily summarized and manipulated numerically. Such data can be scored efficiently by machine or computer and are amenable to statistical analysis (e.g., correlation coefficients or chi-square analysis). In contrast, qualitative data take the form of human actions and words (e.g., students' written or verbal comments) and they are analyzed by means of "human instruments" (Kuh, et al., 1991, p. 273). Also, in contrast to the hypothesis testing and scientific methodology that characterizes quantitative research, qualitative research is "exploratory [and] inductive, . . . one does not manipulate variables or administer a treatment. What one does is observe, intuit, [and] sense what is occurring in natural settings" (Merriam, 1988, p. 17). Increasing emphasis is being placed on qualitative research in higher education (Fidler, 1992). Some of its more radical proponents argue that it should displace or replace the dominant quantitative paradigm which, they argue, has exerted an almost hegemonic hold on the research methodology used in education and the social sciences (Duffy & Jonassen, 1992). On the other hand, those in the quantitative camp argue that qualitative research often lacks reliability or objectivity, yielding data that are dangerously subject to biased interpretation (Reigeluth, 1992). As is usually the case with such thesis-antithesis dichotomies, an effective synthesis lies somewhere between these two polar positions. While acknowledging that quantitative and qualitative research emerge from contrasting philosophical traditions and rest on very different epistemological assumptions (Smith & Heshusius, 1986), the position taken here is that the data generated by these two styles of inquiry can provide complementary sources of evidence, with the disadvantages of one method being offset or counterbalanced by the advantages of the other. For instance, students' written comments on surveys can be used to help interpret the average scores computed for numerical ratings, while the average rating scores can be used to counterbalance the tendency to draw overgeneralized conclusions from several written comments that happen to be particularly poignant and powerful, but which are not representative of students as a whole.
Indeed, two of the most prolific and highly regarded quantitative-oriented researchers in higher education, Ernest Pascarella and Patrick Terenzini, have argued for complementing quantitative methods with qualitative research procedures: "[An] important direction of future research on college impact should be a greater dependence on naturalistic and qualitative methodologies. When employed judiciously, such approaches are capable of providing greater sensitivity to many of the subtle and fine-grained complexities of college impact than more traditional quantitative approaches" (1991, p. 634).
Among program-evaluation scholars, it is almost axiomatic that use of "multiple measures" in the assessment process represents a more reliable and valid procedure than exclusive reliance on a single research method or data source (Wergin, 1988). Including multiple measures in the assessment plan for the freshman seminar increases the likelihood that subtle differences in the effects of the course will be detected. Use of multiple methods also can be used to demonstrate a consistent pattern of results across different methods--a cross-validation procedure known in the assessment literature as "triangulation" (Fetterman, 1991) or "convergent validity" (Campbell & Fiske, 1959). Such cross-validation serves to minimize the likelihood that the results obtained are merely an artifact of any one single method used to obtain them, and it magnifies the persuasive power of the results obtained so that they may be used more effectively to convert course skeptics.
As Dorothy Fidler notes in her primer for research on the freshman year experience, "All research designs have strengths and weaknesses; therefore, no single piece of research can fully answer a research question. Researchers can select between qualitative or quantitative designs, and ideally a body of literature contains both types of designs in order to balance the strengths and weaknesses of each" (1992, p. 11). Consequently, a comprehensive and well-balanced assessment of the freshman seminar should include not only quantitative methods, but also qualitative methods, such as those described below.
* Analysis of Students' Written Comments
Written comments made on student surveys can provide a good source of qualitative data. These comments may be difficult to summarize and manipulate statistically, but they have the potential for providing poignant, in-depth information on the course's strengths and weaknesses, as well as providing an index of students' subjective feelings about the course. As Davis and Murrell note, "These descriptions provide much texture and offer rich, often powerful images of the college experience" (1993, p. 50).
Historically, surveys and questionnaires have not been considered to be qualitative research methods because they generate quantitative data (numerical ratings). However, written comments made by respondents to clarify their ratings do represent legitimate qualitative data, the content of which can be analyzed and classified systematically. Students' written comments can be gathered while the course is still in progress by administering an open-ended questionnaire at midterm, or in a less formal but more ongoing manner, by inviting students to deposit written comments about the course in a suggestion box hung near the classroom door or the instructor's office door.
Another potential source of students' written comments for assessment are personal documents, which qualitative researchers describe broadly as "any first-person narrative that describes an individual's actions, experiences, and beliefs" (Bogdan & Biklen, 1992, p. 132). For example, student journals used in the freshman seminar qualify as personal documents that could be reviewed to gain insight into student feelings about the course and their first-semester. One freshman seminar instructor has crafted a final assignment which requires her students to write a letter to incoming freshmen, advising these prospective students about what to do, and what not to do, in order to be successful during the first semester of college life (Linda Rawlings, personal communication, December 19, 1997). Her analysis of the written comments made by students in their letters has provided her with useful ideas for developing course topics and course assignments, as well as providing the college's retention committee with information that may be used to improve student satisfaction with the institution.
One particular qualitative-research method that can be used to enhance the representativeness and meaningfulness of students' written comments is "category analysis," a procedure in which the reader engages in inductive data analysis, identifying common themes that emerge among the comments as they are read and then organizes these comments into categories (Lincoln & Guba, 1985). A succinct procedural summary of the major steps involved in categorical analysis is provided by Bogdan & Biklen:
You search through your data for regularities and patterns as well as for topics your data cover, and then you write down words and phrases to represent these topics and patterns. These words and phrases are coding categories. They are a means of sorting the descriptive data you have collected so that the material bearing on a given topic can be physically separated from other data. Some coding categories will come to you while you are collecting data. These should be jotted down for future use. Developing a list of coding categories after the data have been collected . . . is a crucial step in data analysis (1992, p. 166).
A tally of the number of written comments per category may also be kept and reported along with the identified categories. These category-specific frequency counts can then be used as quantitative data to help summarize and interpret the representativeness of written comments (qualitative data). Such a fusion of quantitative and qualitative methods in the evaluation of narrative comments is referred to as "content analysis" (Holsti, 1969) and this procedure has already been applied to assessment of the freshman seminar (Marymount College in Barefoot, 1993).
The sheer number of positive or negative written responses students make beneath a specific item on a rating survey may itself serve as a measure of the importance or intensity of student feelings about the issue addressed by that item. As the National Orientation Directors Association (NODA) recommends for surveys of orientation programs, "Request individual written comments and provide space on the evaluation for these remarks. Participants with strong opinions about certain activities will state them if adequate space is provided. Summarize written comments in detail; consider indicating the number of times the same negative or positive comments were made" (Mullendore & Abraham, 1992, pp. 39-40).
To maximize the validity of category analysis, have two or more independent readers categorize the comments so that inter-reader (inter-rater) agreement can be assessed. When data appear in the form of written comments which are not amenable to objective and reliable machine-scored assessment, then use of multiple assessors allows for the more subjective assessments of humans to be cross-checked so their reliability or consistency can be confirmed. Also, the reading and categorizing of comments should be conducted by someone who has no vested interest in the outcome of the program so as to reduce the possibility of evaluator bias.
* Focus Groups
Succinctly defined, a focus group is a small (6-12 person) group that meets with a trained moderator in a relaxed environment to discuss a selected topic or issue, with the goal of eliciting participants' perceptions, attitudes, and ideas (Bers, 1989). In contrast to surveys or questionnaires which solicit individual students' written comments, focus-group interviews solicit students' verbal responses in a discussion-group setting. Verbal responses to questions often turn out to be more elaborate and extensive than written comments, and they may reveal underlying beliefs or assumptions that are not amenable to behavioral observation (Reinharz, 1993).
Focus-group interviews may be used in conjunction with surveys. For example, they may be used as a follow-up to surveys for purposes of gaining greater insight into the meaning of the survey's quantitative results. Interview questions may be posed to a focus group which ask them to offer their interpretation or explanation of ratings given to particular course dimensions and outcomes. Or, the order can be reversed, with focus groups conducted first to collect ideas that may later be used to develop specific items for inclusion on surveys or questionnaires. (Such combined use of qualitative and quantitative methods again reinforces the recommendation that they should be viewed as complementary rather than contradictory research methods.)
Another, more tacit advantage of focus groups is that they serve to validate students' personal experiences by sending them the message that someone at the institution is genuinely interested in their feelings, opinions, and concerns. This advantage of focus groups is well illustrated by William Perry's classic interviews with Harvard students, in which he asked graduating seniors to identify the best features of their educational experience at the university (Perry, 1970). To his surprise, the feature most frequently cited by the students was the interviews he conducted with them (Wrenn, 1988). Indeed, qualitative researchers argue that engaging in dialogue with research participants, particularly those who have been marginalized in some way, serves to empower them and encourages them to gain control of their experience (Roman & Apple, 1990). To increase the representativeness and overall validity of comments obtained from students via focus-group interviews, the following practices are recommended.
* When forming focus groups, be sure that all key student subpopulations are represented (e.g., males and females, ethnic/racial minority and majority students; commuter and residential students).
This representation can be achieved in either of two ways. (a) Heterogeneous group formation whereby members of different subpopulations are represented in each focus group. The advantage of this procedure is that a cross-section of members from different subgroups are present at the same time which can serve to enrich the diversity of the focus-group dialogue.
(b) Homogeneous group formation in which members of the same subpopluations comprise separate focus groups (e.g., separate focus groups comprised entirely of minority students, commuter students, residential students, re-entry students, etc.). The primary advantage of this grouping procedure is that it allows students to share their perceptions and concerns with others who may have common experiences and with whom they may feel more comfortable expressing their views.
* Select interviewers to conduct the focus groups who (a) reflect a demographic cross-section of the campus community (e.g., males and females, majority- and minority-group members) and (b) have no vested interest or conflict of interest with respect to the outcome of the interviews. The latter caveat is particularly important because an interviewer with a favorable bias toward the freshman seminar may unwittingly signal this bias to the interviewees, perhaps leading them to provide socially appropriate comments that tend to support the interviewer's bias. (Experimental psychologists use the term "demand characteristics" to refer to such tendencies of the individual who is conducting a study to unwittingly influence research subjects to behave in a way that confirms the research's hypothesis [Orne, 1962]).
* Tape-record or videotape the focus-group sessions so that students' verbal comments can later be reviewed and transcribed into written protocols.
Written comments may be more easily analyzed and categorized than verbal responses. To reduce the usual extensive amount of time needed to convert tape-recorded interviews into written form, a machine known as a "transcriber" may be used. A transcriber allows playback of the tape to be controlled by foot pedals which facilitate starting, stopping, and rewinding the tape recorder (Bogdan & Biklen, 1992).
Guidelines previously discussed with respect to assessing students' written comments on surveys via category analysis, and improving the validity of written comments' assessment via inter-rater reliability, can also be applied to the assessment of written protocols obtained from focus groups.
* The same questions should be posed to all focus groups.
One essential attribute of qualitative research is its flexibility which allows the researcher to respond to the flow of data as they are collected, and to change directions of the research as it proceeds (Delamont, 1992). This also is a cardinal feature of the focus-group interview, and one which serves to define it as a qualitative research method (Morgan, 1988).
However, some initial standardization of questions across focus groups can serve to increase the reliability of the data obtained and their amenability to comparisons across different groups. The interviewer can still retain the freedom to create her own specific, follow-up questions to whatever student responses emerge from the initial, standardized general questions. In fact, this procedure is consistent with the qualitative research method of open-ended interviews whereby "prior to beginning the set of interviews, the researcher develops a protocol of general questions that needs to be covered; however, the researcher is free to move in any direction that appears interesting and rich in data" (Tierney, 1991, p. 9).
* Interviewers should provide the same instructions to all focus groups, particularly with respect to (a) ensuring the participants' confidentiality, (b) encouraging equal participation by all group members, and (c) describing the moderator's role as an unobtrusive, non-judgmental facilitator rather than an evaluative authority figure (Tierney, 1991).

The Assessment Report:
Summarizing, Interpreting, and Disseminating the Results
An important final step in the assessment process is the construction and distribution of an assessment report. As Fidler states strongly, "Publishing the results is a serious responsibility to those whose findings may suggest changes in educational policy and/or practice. . . . Therefore, publishing the results of a study is the final step for an ethical researcher" (1992, p. 17).
Multi-institutional survey data collected by the National Resource Center for The Freshman Year Experience & Students in Transition reveal that only 19% of colleges and universities which offer the freshman seminar report that the results of course assessment are published (Barefoot & Fidler, 1996). This finding is particularly disappointing when considered in light of previously-cited national survey research indicating that (a) the percentage of institutions reporting strong support for their freshman seminar is declining, and (b) the total percentage of institutions offering freshman seminars has remained constant despite an increasing number of institutions reporting that they have recently adopted a freshman seminar. Together, these two findings suggest that some adopted freshman seminars are losing their support and are being discontinued (Barefoot & Fidler, 1996).
A well-written report of the results of assessment may spell the difference between continued support or elimination of an educational program. As Trudy Banta notes, "Assessment information is often particularly useful in selling a decision once it has been taken. Because it is concrete, such information can be extremely effective in communicating needs and priorities to those responsible for making resource-allocation decisions" (1988, p. 24). This recommendation is particularly pertinent to the freshman seminar because of its perennial struggle to gain institutional credibility and credit-earning status (Gardner, 1989).
Furthermore, publication and dissemination of assessment results can have a positive impact on the morale of those involved with the program by enabling them to "see" tangible results for their efforts and to be recognized publicly for their contributions. This can serve to revitalize their interest in the program and reinforce their continuing commitment to the freshman seminar which, in turn, should increase to the course's prospects for long-term survival.
It is for these reasons that a well-written and widely distributed freshman-seminar assessment report is strongly recommended. The following suggestions are offered as strategies for enhancing the report's quality and impact. * Relate the assessment results to the college mission statement and to specific institutional goals.
As Peter Ewell (1988) suggests, "A critical task for institutional researchers is the transformation of data into useful information [and] the usefulness of information will be determined by its reasonable and demonstrable linkage to particular institutional goals" (p. 24).
The viability of this recommendation for freshman-seminar assessment is promising because the course pursues student-centered objectives and holistic-development goals which are often strikingly consistent with the majority of college mission statements. Mission statements tend to embrace institutional goals that are much broader than subject-specific knowledge acquisition and thinking, including educational outcomes that are more often psychosocial, experiential, and student-centered (Kuh, Shedd, & Whitt, 1987; Lenning, 1988). Consequently, the student outcomes associated with freshman-seminar assessment are likely to be compatible with the realization of most institutional goals. Capitalizing on this fortuitous compatibility should serve to increase the persuasive scope and power of the freshman- seminar assessment report.
* Report results for different subpopulations of students.
The outcomes of an educational program can vary for different student subpopulations or subgroups on campus. For example, the freshman seminar may have different effects on commuter students versus residents, males versus females, minority versus majority students, and traditional-age versus re-entry students. To allow for such comparative analyses across student subpopulations, a student demographics section should be included on all assessment instruments so that the respondents' subgroup can be identified. It is advisable to include a short statement accompanying this request for demographic information which assures the respondent that this information will be used only for subgroup analysis and that the individual's anonymity or confidentiality will be preserved.
Interesting interactions can emerge from subgroup analyses that otherwise might otherwise be missed if results are reported only in the form of aggregated data that have been gathered on all students and collapsed into one "average" profile. Important differences among student populations may be masked or canceled out in this averaging process, concealing the unique effects of the seminar on particular student subgroups. The significance of this recommendation is illustrated by the results of one campus-specific study which demonstrated that black and white students at the same institution report (a) different degrees of satisfaction with their college experience, (b) perceive their college along different dimensions, and (c) find different aspects of the college environment to be more salient (Pfeifer & Schneider, 1974). Also, the nature and degree of college adjustment and satisfaction experienced by female re-entry students has been found to differ substantially from those of re-entry males (Hughes, 1983). With respect to the effects of the freshman seminar in particular, Fidler & Godwin (1994) have discovered that African-American freshmen who participate in University 101 at the University of South Carolina display significantly higher retention rates than do white students who take the course.
* Include discussion of how the assessment results can be put to use, i.e., what specific action
steps could be taken in response to the assessment results.
Analyzing and summarizing the results are two important elements of an assessment report, but the report is not complete until it includes at least some discussion of practical implications and intended action strategies. The distinction between these components of an assessment report is clearly articulated by Alexander Astin: "Analysis refers primarily to the statistical or analytical procedures that are applied to the raw assessment data and to the manner in which the results of these analyses are displayed visually; utilization has to do with how the results of assessment analyses are actually used by educators and policy makers to improve the talent development process" (1991, p. 94).
A common complaint about assessment initiatives in higher education is that they frequently fail to "close the loop" (Johnson, 1993, p. 7), i.e., the results often sit in some office without any meaningful follow-up action. To ensure that freshman-seminar assessment efforts do not culminate in the same state of "permanent storage," a well-defined plan for follow-up action should be incorporated into the assessment report. This plan should include answers to the following implementation questions: (a) What needs to be done? (b) Who would do it? (c) When could action be initiated and completed? and (d) What anticipated obstacles or roadblocks need to be overcome in order to initiate, execute, and complete the action plan?
The ultimate power of an assessment report rests not in the sheer compilation of data, or even the knowledge gained from its analysis, but in the conversion of acquired data and knowledge into informed practice. As the influential empirical philosopher, Francis Bacon, once stated: "Knowledge is power; but mere knowledge is not power; it is only possibility. Action is power; and its highest manifestation is when it is directed by knowledge" (quoted in Nemko, 1988, p. 6).
* Tailor the content and tone of the assessment report to the specific interests and needs of the audience who will receive it.
As Gary Hanson advises, "Consider the eventual audience who will receive the data and what form they need it to be in so the audience can make sense of them. Data that researchers or faculty find [to be] of the greatest interest may not necessarily be the kind of data that program administrators need" (1982, p. 56).
If budget-conscious administrators are the target audience for the assessment report, then the freshman seminar's impact on enrollment management and institutional finances should be explicitly emphasized or highlighted. If the audience is faculty, then assessment results pointing to the freshman seminar's impact on improving students' academic skills and preparedness should be showcased.
Some assessment scholars recommend that separate summaries of program-evaluation reports should be prepared for administrators, for faculty, and for broad, institution-wide distribution (Marcus, Leone, & Goldberg, 1983). At first glance, this recommendation may appear to be onerous, but it could be accomplished expeditiously by simply merging and/or deleting certain sections of the total report to customize it for different audiences. Time-conscious and information-overloaded world of academe, faculty and administrators are more inclined to read and react to shorter reports that relate specifically to their immediate concerns.
* Deliver the assessment report at a time when its intended audience will be most likely to attend and respond to it.
Some forethought should be given to the optimal timing for delivery of the assessment report. There may be natural rhythms or temporal junctures during the academic year when different campus constituents will be more receptive and responsive to assessment reports. For instance, reception of a report prior to Christmas or summer vacation may not generate the same timely response than it would if it were received at the beginning of an academic term, when members of the college community are more likely to be energized, and when they will be working on campus for an extended period of time following reception of the report. As Krone and Hanson confess about an assessment project they conducted,
The results were disseminated in the last weeks of the semester, and much of the impact was lost . . . . One of the most important things we learned from conducting this study was that other factors played a significant role in how much attention is given to the final results. When other competing activities—or bad timing—interfere, it is very difficult to get people to pay attention to even very significant findings (1982, pp. 107 & 109).

The Future of Freshman Seminar Assessment:
Additional Outcomes and Alternative Methods
To date, the effects of student participation in the freshman seminar on the following outcomes have not been systematically evaluated and, thus, are recommended as target areas for future research and assessment.
1. Assessment of What Particular Course Content (Topics) and Instructional Processes (Teaching Strategies) Have the Most Significant Impact on Student Outcomes
This should become a key issue for future research on the freshman seminar because so much evidence has already been amassed in support of the general conclusion that the freshman seminar is effective for promoting student retention and academic success. Now it may be time to move on to develop methodologies for "teasing out" what specific course components or instructional methods contribute most heavily to the seminar's overall effectiveness. For example, the differential impact of the following characteristics of freshman seminars might be fertile areas for future research.
(a) Whether an instructor-training program experienced by freshman seminar instructors prior to teaching the course serves to increase the seminar's positive impact on student outcomes and, if so, whether variations in the training program's duration, content, or process of delivery are variables that have differential effects.
(b) Whether the seminar's impact on student outcomes varies depending on who teaches the course (e.g., faculty or staff; teaching teams or a single instructor).
(c) Whether different types of student assignments required in the course result in different outcomes (e.g., whether an assignment requiring students to meet with their advisor and develop a long-range educational plan has any impact on subsequent advisor contact, student retention, and/or student advancement).
(d) Whether offering the freshman seminar in a course-linking format, whereby seminar students enroll simultaneously in another course to form a more extended "learning community," results in different outcomes than offering the freshman seminar as a stand-alone (independent) course.
Admittedly, teasing out these subtle variations and nested components of the freshman seminar pose a challenging research task. To this end, the following strategies are offered as possible methodological tools for disaggregating or disentangling the effects of embedded elements and separate components that comprise freshman seminars. This strategy might be particularly viable for colleges which are about to initiate a freshman seminar program. Instead of offering one content-standardized seminar in all course sections, different types of seminars with different types of course content could be offered (e.g., academic-skills seminars and holistic-development seminars). These different seminar types could then be compared in terms of their effects on important student outcomes such as retention and academic achievement, relative to a matched control group of freshmen who do not experience any type of freshman seminar. This recommendation is consistent with Barefoot and Fidler's call for "case study research of specific seminars in each category" (Barefoot & Fidler, 1992, p. 64).
* For institutions already offering one basic type of freshman seminar, it might be useful to attempt to identify subtle differences in outcomes that happen to emerge across different sections of the course.
Any course which is offered in multiple sections has the potential for providing investigators with a very useful research design for gathering information about how differences in instruction can influence student outcomes. Between-section differences in student characteristics can be controlled for, either by random assignment or by checking the background characteristics of students after they have selected their course sections in order to ensure that they do not vary systematically across sections. If between-section differences are minimized further by use of (a) common course objectives, (b) common syllabus, (c) common textbook, (d) common class sizes, and (e) a common evaluation form designed to assess the same student outcomes, then a "multisection design" is created which maximizes the likelihood that differences in outcomes that emerge between course sections can be attributed to differences in the nature of instruction delivered in different sections of the course (Cohen, 1981; Abrami, d'Apollonia, & Cohen, 1990).
For instance, suppose one desired outcome of the freshman seminar is to increase student utilization of campus support services. If course evaluations obtained in one particular section of the course indicate that students in this section report substantially greater use of support services than the course average for all seminar sections, then this finding may suggest that a specific approach or method used by the instructor in that individual course section is highly effective for achieving this particular outcome. Such fine-grained analyses conducted across course sections could unearth other section-specific differences in outcomes. Such differences may emerge naturally due to subtle variations in approaches to course content and instructional delivery that occur from section to section of the freshman seminar when it is offered in a multiple-section format. Detection of these subtle differences may be facilitated further if course-evaluation instruments are deliberately designed to include items that ask students about the impact of specific course topics and specific instructional processes on specific outcomes.
Multiple regression analysis might also be adapted and used as a strategy for teasing out the effects of different aspects of the course on different student outcomes. Particular types of course experiences might be entered as variables in the multiple regression equation (e.g., number of writing assignments or number of student-faculty conferences outside of class) to assess whether these course-experience variables have any differential impact on student outcomes.
* To better assess the impact of specific course characteristics, end-of-course evaluations should be supplemented with more frequent and focused assessments administered immediately after students experience individual units of instruction, particular pedagogical procedures, or particular course assignments.
One way to obtain feedback on specific course experiences is the "minute paper," a strategy developed by an engineering professor at the University of California-Berkeley (Davis, Wood, & Wilson, 1983) and popularized by Cross and Angelo (1988). Minute papers require students to provide a short (one-minute) answer in response to a teacher-posed question at the end of a class session or learning activity (e.g., "What was the most memorable or useful thing you learned today?"). The nature of the question can vary, depending on the nature and objective of the class session, but the one constant among all types of minute papers is that it provides the instructor with frequent and immediate feedback on how students respond to specific course experiences.
Such focused forms of assessment could be adopted in the freshman seminar to provide instructors with direct student feedback on individual course components which otherwise may be masked or "averaged out" when students only report their perceptions of the entire course experience after its completion. One college has already adopted this course-component assessment strategy in the freshman seminar, going as far as having students assess the course on a class-by-class basis throughout an entire semester (Zerger, 1993).
Frequent assessments of specific course components, immediately after students experience these components during the term, may also provide assessment data of greater validity because they are more narrowly focused and more readily recalled by students. In contrast, the standard, end-of-course evaluation tends to ask more global questions about general course characteristics that require long-term memory for experiences that may date back to the beginning of the term.
Assessment conducted by course instructors for purposes of grading students may be another valuable source of information for assessing the effectiveness of specific instructional techniques and teaching strategies. As Warren (1987) points out, "The information faculty members routinely gather about student learning is too valid to be limited to the assignment of course grades. Even their intuitive sources, if examined closely, might be given a sufficiently systematic structure to provide documented judgments" (p. 6).
Pat Cross has developed a number of such systematic structures or procedures for college instructors which she groups under the rubric of "classroom research." As she describes it, "The specific question for classroom research is: What are my students learning in my classroom as a result of my efforts . . . ? Classrooms have daily lessons for teachers as well as for students. We only have to observe systematically and sensitively" (1990, pp. 73-74). Though Cross defines classroom research methods in terms of personal feedback for the instructor's benefit, they can also be used to assess the impact of specific course experiences. Pat Cross and Tom Angelo have compiled two teaching handbooks containing a host of very specific strategies for classroom research and assessment (Cross & Angelo, 1988; Angelo & Cross, 1993) which can be readily adapted or adopted by freshman seminar instructors.
Another strategy for obtaining specific, frequent feedback from students is to ask small groups of freshmen to provide verbal feedback on specific course components or experiences as they are experienced throughout the semester. One interesting variation on this theme is the "student advisory committee" (Haug, 1992). This type of student-centered assessment strategy is designed to provide specific and continuous feedback on course effectiveness by vesting the responsibility with a group of students, typically referred to as "student managers" or a "student advisory committee," whose role is to periodically solicit evaluative comments from their classmates and to meet regularly with their instructor throughout the term. These meetings between the instructor and student-management team members are designed to discuss students' general satisfaction with the course and the effectiveness of specific instructional practices, while the course is still in progress.
A variation of this student-feedback procedure is the "small group instructional diagnosis" (SGID) pioneered by Clark and Bekey (1979). In short, SGIDs can be defined as structured interviews with small groups of students that are conducted during the term by an outside facilitator, in the absence of the instructor, for the purpose of course improvement. These small groups (4-6 students) select a recorder and try to reach consensus on two key questions: (a) What helps you learn in this course? and (b) What improvements would you like and how would you suggest they be made? Following about 10 minutes of discussion, the student groups report 2-3 ideas for each question to the entire class. The outside facilitator, typically a faculty peer or faculty development specialist, records the students' ideas on the board and attempts to clarify and organize them into a coherent series of recommendations for the instructor (White, 1991).
All of these versions of frequent, classroom-centered assessment are consistent with the aforementioned call for a "scholarship of teaching" (Boyer, 1991). The call for a new definition of scholarship that includes research on the teaching-and-learning process, when viewed in conjunction with the historic demands placed on the freshman seminar to document its worth and justify its resources, suggests a natural marriage in which both the advancement of the freshman seminar and the professional advancement of its instructors may be simultaneously realized if instructors capitalize on the regular opportunities that are available for data collection in the classroom. Namely, classroom-research efforts may be used as formative assessment to improve specific course components and as summative assessment to document the course's overall value; at the same time, this course-related research can serve as a serving as legitimate scholarship opportunity for course instructors which may enhance their prospects for promotion and tenure.
2. Assessment of How Participation in the Freshman Seminar Influences Student Satisfaction with the College Experience
Student satisfaction with the freshman seminar has been examined extensively; however, the seminar's impact on students' overall satisfaction with the college experience has received much less attention in the research literature. One major purpose of the freshman seminar is to connect students to the institution, i.e., to its key educational agents, support services, and co-curricular opportunities. So it may be reasonable to hypothesize that these connections, if initially established via the freshman seminar, may then continue throughout the entire first-year experience and perhaps throughout the undergraduate years, thus serving to increase students' overall satisfaction with the total college experience. Relevant to this issue of the seminar's impact on students' institutional satisfaction is the historical development of the freshman seminar at the University of South Carolina. This course, which now serves as a national model, originated in a request from the college president who was seeking a vehicle for reducing the reoccurrence of "student riots" that had been triggered previously by institutional dissatisfaction and alienation (Gardner, 1981). A cogent argument for the importance of assessing the institutional satisfaction of first-year students is made by Barefoot and Fidler,
Systematic assessments of the quality of freshman life should be part of the total assessment procedure. First-year students are often compliant and reluctant to complain about even the most egregious injustices. Institutions must take the initiative in determining the existing quality of life for first-year students both in and out of the classroom (1992, p. 63).
Also, student satisfaction with the college experience may be one of the most important student outcomes to assess simply because research indicates that it is the outcome which is least influenced by students' college-entry characteristics (Astin, 1991). Alexander Astin points out the important implication of this research finding: "The practical significance of the weak association between student input [college-entry] characteristics and student satisfaction with college is that causal inferences based on correlations between environmental characteristics and student satisfaction entail fewer risks of error than do causal inferences involving most other kinds of outcome measures" (1991, p. 117).
Another major argument for the value of assessing the freshman seminar's impact on students' overall satisfaction with the institution is the well-documented association between student satisfaction with the institution and student retention at that institution (Noel, 1985). It is noteworthy that student retention pioneer, Lee Noel, and his associates, have developed a student satisfaction survey (the "Noel-Levitz Student Satisfaction Inventory") with national norms for different institutional types (e.g., community colleges, liberal arts colleges, and research universities). This instrument is designed to assess the difference between students' institutional satisfaction and their institutional expectations, which the instrument's authors refer to as the institutional "performance gap" (Noel & Levitz, 1996).
Given that a common goal of freshman seminars is to clarify the differences between high school and college, especially in terms of institutional expectations and student responsibilities, it might be interesting to assess whether the institutional performance gap between student expectations and student satisfaction would be smaller for students who experience the freshman seminar than for students who do not participate in this course during their first semester of college life.
Colleges could design their own instruments for assessing the relationship between student participation in the freshman seminar and their degree of satisfaction with the institution by developing a graduating-student survey that includes items pertaining to students' overall level of satisfaction with the college and items pertaining to their specific student experiences while at the college (such as participation in the freshman seminar). The relationship between these two variables could then be assessed by computing correlations between the college-experience variable (student participation in the freshman seminar), and the outcome variable (student satisfaction with the college experience).
3. Assessment of the Impact of Freshman Seminar Participation on Students' Choice of College Major and Time to Graduation
Approximately 50% of all freshmen are undecided about their academic major at college entry, and about half of the remaining 50% who are "decided" eventually change their major (Titley & Titley, 1980). It is estimated that, on average, freshmen will change their plans about an academic major three times before college graduation (Gordon, 1984; Willingham, 1985). Some of this indecisiveness and plan-changing is healthy, perhaps reflecting initial exploration and eventual crystallization of educational goals that naturally accompany personal maturation and increased experience with the college curriculum. However, students' indecisiveness and vacillation may also reflect confusion, or premature decision-making, due to lack of knowledge about themselves or the relationship between college majors and future careers.
Frequent or late changes in academic major can eventuate in longer time to degree completion and graduation because of additional courses that must be taken to fulfill degree requirements for the newly selected major. Such delays in degree completion due to student confusion and vacillation regarding selection of an academic major may be one factor contributing to the extended length of time it now takes college students to complete their graduation requirements. Less than half of all college students in America complete their baccalaureate degree in four years (U.S Bureau of the Census, 1994), and the number of college students who take five or more years to graduate has doubled since the early 1980s (Kramer, 1993).
It would be intriguing to assess how participation in a freshman seminar, particularly one that devotes class time and requires out-of-class assignments relating to the topics of selecting a college major and the relationship between majors and careers, might affect the number of changes in academic major made by students during their college experience and their average time to graduation. There already exists a substantial amount of research documenting the positive effect of freshman-seminar participation on student persistence to degree completion (Barefoot, 1993; Cuseo, 1991), but there appears to be only one reported study that has investigated the seminar's impact on how long it takes students to reach degree completion. Indeed, this singular study revealed that seminar participants complete their baccalaureate degree in a time period that is significantly shorter than non-participants (Central Missouri State University, cited in Barefoot, 1993).
The possibility that such reduced time to graduation could be attributed to the freshman seminar's effect on promoting earlier and more accurate crystallization of students' college major and career plans is suggested by findings reported at another institution--where longitudinal research has been conducted on seminar participants' self-reported academic and career plans prior to the course, immediately after the course, and after the third semester of college. This study revealed that students who participated in the freshman seminar report much more focused career and academic goals at the end of the course and did so, again, after completion of their third semester in college (Irvine Valley College, cited in Barefoot, 1993).
Another long-term outcome of the freshman seminar that may be worthy of future investigation is assessment of whether participation in the freshman seminar influences the type of college majors which students eventually select. For instance, would participation in the freshman seminar, particularly one which includes discussion of the meaning, value, and career relevance of the liberal arts, serve to increase the number of baccalaureate-aspiring students who eventually major in liberal arts-related fields? This potential outcome of the freshman seminar has yet to be investigated, with the exception of one four-year college which reports that its freshman-seminar participants are more likely to select a broader range of majors than students who do not participate in the course (University of Maine, cited in Barefoot, 1993).
The seminar's effect on students' eventual choice of majors and their time to graduation represent delayed assessments which take place after a period of time has elapsed between course completion and measurement of course outcomes. Such time-delayed assessments of the freshman seminar may serve as powerful testimony to the course's long-term impact. Short-term positive outcomes assessed immediately after course completion might be challenged by methodological purists who could contend that such short-term positive outcomes may represent nothing more than a "novelty effect" or "Hawthorne effect"--the tendency of research subjects to temporarily change their behavior immediately after they receive new or special treatment (Harrison, 1979). However, delayed or long-term positive outcomes of the freshman seminar cannot be dismissed on such grounds.
4. Assessing Faculty and Staff Perceptions of How Students' Participation in the Freshman Seminar Affects Their Campus Behavior and Academic Performance
It is not uncommon to hear anecdotal reports from attendees at Freshman Year Experience conferences who claim that faculty on their campus have observed that freshmen are "better prepared" for college life and know how to "behave like college students" after they have participated in the freshman seminar. It has also been the author's experience to hear student life professionals reporting that students have a greater appreciation of, and interest in co-curricular activities as a result of their participation in the freshman seminar.
Unfortunately, however, faculty and staff perceptions of how participation in the freshman seminar affects student behavior on campus have received little systematic assessment and documentation. Given that qualitative research is now experiencing a surge of national interest and respectability, perhaps application of qualitative research methods, such as focus-group interviews with faculty and staff to assess their perceptions of the value of the freshman seminar and its impact on student behavior, may represent fertile research territory for future assessments of the freshman seminar.
5. Assessment of the Impact of Freshman Seminar Participation on the Transfer Rate of Community College Students
It might be interesting for community colleges to assess whether participation in the freshman seminar affects the educational aspirations of (a) students who are undecided about eventually transferring to a 4-year institution and (b) vocationally-oriented students whose initial goal is a vocational/technical certificate or associate degree. It seems reasonable to hypothesize that the educational aspirations of both types of students may be elevated by their participation in a freshman seminar which includes coverage of such topics as (a) building academic skills and self-confidence, (b) learning how to learn, (c) motivation and goal setting, (d) the value of liberal education, and (e) the relative advantages of the baccalaureate versus an associate degree or vocational certification. Furthermore, if vocationally-oriented and transfer-oriented (baccalaureate-seeking) students are grouped together in the same sections of a community-college freshman seminar, interactions between students who are on different educational tracks may serve to further increase the seminar's potential for elevating the educational aspirations of those students initially seeking 2-year terminal degrees or vocational certificates. Assessing the influence of the freshman seminar on the educational aspirations of community college students becomes even more significant when viewed in light of research which indicates that the transfer rate of community college students who are in vocational-technical programs now equals or exceeds that of students who are in general education (transfer-track) programs (Prager, 1988). This finding may call into question the validity of drawing strong distinctions between community college students in terms of being on "transfer" or "nontransfer" tracks (Harbin, 1996), and it suggests that the educational aspirations of first-year community college students are malleable and amenable to alteration by proactive interventions, such as the freshman seminar.
Such proactive intervention would be consistent with Carey Harbin's "total transfer management" philosophy which "has as its goal the transfer of all students to a baccalaureate-degree-granting institution and is founded on the principle that all students are potential transfer candidates" (1996, p. 33). Berman et al. (1990) argue further that one criterion for assessing the quality of community colleges is "transfer effectiveness," defined as the number of students who actually transfer, compared to the number of students that were expected to transfer. This proposed criterion for assessing institutional quality would recognize those 2-year institutions which raise the educational aspirations of their students.
Thus, assessment of the freshman seminar's role as a proactive institutional strategy for increasing the transfer-effectiveness rates of community colleges may now be a valuable and timely research endeavor, particularly for those institutions interested in promoting the educational access and advancement of underrepresented students--who are much more likely to begin (and end) their college experience at community colleges than at 4-year institutions (Almanac, 1994).
6. Assessment of Whether Availability of the Freshman Seminar at an Institution can Enhance College Marketing and Student Recruitment
What impact would the availability of a freshman seminar have on attracting students to a college which offers such a course? Research and scholarship point to the conclusion that high school graduates are confused about what to expect in college (Boyer, 1987) and beginning college freshmen are reporting lack of confidence in their ability to succeed in college without the provision of support or assistance (Astin, 1994). The availability of a first-semester seminar designed to support students' high school-to-college transition, reduce their anticipatory anxiety, and promote their college success, might serve as an effective recruitment tool if consciously marketed by postsecondary institutions.
This conjecture could be assessed by means of a short survey circulated among new students, perhaps at orientation, registration, or in the freshman seminar during the first week of class, which asks first-semester students to rate or rank a list of factors that influenced their final decision to attend the institution. Each factor on the list could be rated or ranked in terms of its degree of influence on college choice, with one of the listed factors being "availability of the freshman seminar." Similarly, a short survey could be completed by high school counselors, or by high school seniors and their parents who attend a recruitment program conducted by the college (e.g., "Counselors' Day" or "Family Day"). This survey could be administered at the end of the program, requesting attendees to identify what campus characteristics or college programs they found to be most impressive or influential.
Also, if it is true that students who experience the freshman seminar are more likely to get more involved in the college experience, and go on to report greater satisfaction with the institution, then these more satisfied customers might be expected to enhance new-student recruitment by "word of mouth." This could be assessed by expanding a question that is already included on many college-admission forms, "Did anyone recommended our college to you?" If an applicant checks the box indicating that a student at the college had recommended the school, then a follow-up question could be included on the application form which asks the applicant if any particular college characteristic or program was mentioned by the student who recommended the college. To facilitate the applicant's response to this query, a checklist of college programs (one of which being the freshman seminar) could be included on the application form.
7. Assessment of Whether Student Performance in the Freshman Seminar is an Effective Predictor of Student Success During the First Year of College
At the University of South Carolina, preliminary evidence has already been reported which suggests that a failing grade in the freshman seminar is a "red flag" that can identify students who will later experience academic problems or attrition (Fidler & Shanley, 1993). At another institution, multiple-regression analysis has been used to demonstrate that the grade earned by students in its freshman seminar correlates significantly with student retention, above and beyond mere course participation and completion (Starke, 1993). Such findings suggest that students' academic performance in the freshman seminar may be predictive of their academic success, in general, during their first year of college.
If campus-specific research reveals that there is such a predictive relationship, then the institution could target intervention procedures at those freshmen who perform poorly in the freshman seminar. In this fashion, the freshman seminar could function as a prognostic and diagnostic tool for early identification and interception of potential student problems during the freshman year.
The proactive and diagnostic potential of the freshman seminar could be enhanced further if course instructors issue midterm grades or midterm progress reports which are also sent to students' academic advisors or to other academic-support professionals. Freshmen receiving grades below a certain cutoff or threshold could be contacted for consultation and possible intervention. To determine this cutoff point, research could be conducted on grade distributions in the freshman seminar to identify the grade below which a relationship between poor performance in the freshman seminar and first-year academic problems or student attrition begins to emerge.
Use of midterm grades as an "early alert" or "early warning" system is nothing new to higher education. However, a perennial problem with successful implementation of this procedure is lack of compliance; faculty may have neither the time for, nor the interest in calculating and reporting midterm grades for all their students. However, if the freshman seminar grade is a good "proxy" for first-year academic performance in general, then the midterm grade in this single course could serve as an effective and efficient early-warning signal. Moreover, given that freshman seminar instructors often self-select into the program because of their interest in, and concern for promoting the success of first-year students, their compliance rate and reliability with respect to computing and submitting midterm grades for seminar students should be very high.
Thus, assessment of the freshman seminar's potential for predicting the success of first-year students appears to be a worthwhile research endeavor. Such research could serve to demonstrate that the freshman seminar can function as the key curricular cog in an effective early-warning system designed to provide targeted, proactive support for freshman during their critical first semester of college life.
8. Assessment of Students' Retrospective Perceptions of the Freshman Seminar at Later Points in the College Experience
While there have been two reported institutional studies of alumni perception of the freshman seminar after graduation from college, there appears to be little or not research available on the retrospective perceptions of course graduates at later times during their college experience. This could be a worthwhile assessment endeavor because some of the goals of the freshman seminar involve preparation for college experiences that student encounter after course completion (e.g., selection of a college major, preparation for transfer from a 2-year to 4-year institution, career exploration and choice). It seems likely that some of the information and skills acquired in the freshman seminar may be applied, and better appreciated by students at later times in their college experience.
One possible vehicle for collecting the retrospective assessments of course graduates is by means of a freshman seminar "class reunion." One college has used this strategy to reunite freshman seminar classes during the sophomore and junior years for the purposes of re-establishing social networks among former classmates and to provide support for meeting the adjustments being experienced by these students at later junctures in the college experience ("CUNY-Baruch College Capitalizes on Freshman Seminar Reunions," 1995).
9. Assessing the Effectiveness of the Freshman Seminar as a Vehicle for Conducting Comprehensive Student Assessment at College Entry
Not only may the freshman seminar be assessed in terms of its impact on students, it may also be evaluated in terms of its impact on the institution, i.e., how it helps the college fulfill other organizational functions or needs. One such institutional benefit of the freshman seminar is its potential for serving as a mechanism through which the institution may gather comprehensive data on its freshman class at college entry (Cuseo, 1991). This is an essential first step in any effective "student tracking" system designed to assess the institutional experiences of students from entry to exit (Palmer, 1989).
For example, diagnostic assessment of beginning college students' support-service needs is now possible with the development of instruments that can reliably identify students who are "at risk" for attrition, such as (a) the Noel/Levitz College Student Inventory (Striatal, 1988), (b) the Behavioral and Attitudinal Predictors of Academic Success Scale (Walkie & Radiant, 1996), (c) the Student Adaptation to College Questionnaire (Baker & Siryk, 1989), and (d) the Anticipated Student Adaptation to College Questionnaire (Baker & Schultz, 1992). The prospects for college success of at-risk students identified by these assessment instruments could be greatly enhanced if these students experience proactively-delivered support services or early interventions that are targeted to meet their individual needs.
Institutions interested in using these diagnostic instruments for student retention-promoting purposes must find the time and place to do so. The freshman seminar could serve this function, providing a relevant curricular structure and a comfortable classroom context within with to conduct comprehensive assessment of beginning students' needs during their critical first semester in college.
Furthermore, if data gathered on students at college entry are later compared with data gathered on the same cohort of students at college graduation, then pre- to post-college comparisons can be made, thus completing the cycle needed for longitudinal (entry-to-exit) assessment. Such a longitudinal research design is required to conduct meaningful value-added or talent-development assessment (Astin, 1991). Though often used interchangeably with outcomes assessment--which involves only assessing student characteristics at graduation, value-added assessment is more ambitious. It involves assessment of the same cohort of students at both entry and exit, with the intent of determining whether student differences that happen to emerge between the start and completion of college can be attributed directly to the effect of the college experience itself (i.e., how much "value" the college experience has "added" to the student's development that would not have taken place on its own).
Alexander Astin, probably the foremost proponent of value-added assessment or "talent development" assessment, as he prefers to call it, offers the following argument.
Learning and growth take place over time and assessment cannot hope to document that growth unless it also tries to reflect how students are changing over time. This has very important implications for assessment: It means you can't learn very much from one-time administrations of achievement tests . . . . We have to make sure we're following the same students so that we have some idea who changed, how they changed, and why (quoted in Mentkowski, et al., 1991, pp. 1 & 6).
Whereas outcomes assessment is merely descriptive, i.e., simply describing what students are like at graduation, value-added assessment has the potential for providing information that is causal--suggesting that the college experience, or some element thereof, has produced or caused positive changes in student development that would not have otherwise occurred.
However, to gain this advantage for identifying causal relationships between college experiences and student outcomes, data collection must take place at two junctures, requiring administration of parallel assessments at both college entry and college graduation. Such an entry-to-exit assessment procedure may be resisted by members of the college community due to (a) logistical reasons, i.e., Where can we find the time, the place, and the students needed to administer two large-scale assessments? and (b) methodological reasons, i.e., If students are asked to volunteer their time for entry and exit assessments, would such a volunteer sample be representative of the "true" population of students at the college, or would this self-selected sample of volunteers represent a biased sample of more motivated and committed students?
A required freshman seminar may be a viable vehicle for addressing both of these concerns. The course could conveniently provide a sizable sample of entering freshmen as well as the time and the place for entry assessment to be conducted. If freshmen are required to take the course, then volunteers would not have to be solicited and the confounding effects of sampling bias would be circumvented. Also, since the seminar has a student-centered focus and often places an emphasis on student self-assessment and self-awareness (e.g., attitude surveys and self-assessment inventories administered during the seminar for the purpose of heightening students' awareness of personal interests, values, and learning styles), the course can provide a comfortable context for entry testing which, itself, is a form of self-assessment. Thus, tests administered in the context of the freshman seminar to gather student-entry data are more likely to be perceived as relevant because they relate to the course objectives and they are more likely to be seen as natural extensions of other self-assessment procedures which are already included in the course.
The upshot of these arguments is that the freshman seminar has the potential for serving as a convenient curricular conduit through which colleges can obtain the captive audience needed to collect student-entry data in a careful and comprehensive fashion. Analogously, a senior seminar could provide a viable and relevant context within which the exit component of value-added assessment could take place. As Astin points out, "Voluntary participation in follow-up testing may lead to a large amount of attrition because of noncooperation. Required participation, however, raises both logistical and ethical issues. For students, participation in cognitive post-testing might be incorporated in to the requirements of a course" (1991, pp. 168-169).
One course that could readily incorporate Astin's recommendation for exit assessment is the senior "capstone" seminar which, at some colleges and universities, already includes self-assessments as an integral part of the course (e.g., student portfolios and graduating student surveys) (Cuseo, 1998). If parallel instruments or assessment forms were to be administered in both the freshman and senior seminars, then a pre-to-post research design could be created that is conducive to value-added assessment of changes in student attitudes, values, or skills that take place between college entry and college completion. Some institutions already have experimented with this practice. For example, Canisius College (New York) administers the Cooperative Institutional Research Project (CIRP) to entering freshmen and re-administers portions of the same CIRP items to the same cohort of students during their senior year, in order to assess student change over time (Miller, Neuner, & Glynn, 1988). Recently, Alexander Astin and his associates at the University of California-Los Angeles, have developed a senior version of the CIRP, known as the "College Student Survey" (CSS) (Schilling & Schilling, 1998). By asking students similar questions about their attitudes and values, goals and expectations, and activities, the senior-focused CSS may be used in conjunction with the freshman-focused CIRP to compare the characteristics of students at college entry and college exit (graduation). Eckerd College (Florida) tests all its entering freshmen with the Graduate Record Exam (GRE) and reassesses them with the same instrument as seniors (Paskow, 1988), whereas Northeast Missouri State uses the same test/retest procedure with the American College Testing (ACT) test (Astin, 1991). Evergreen State College and other schools in a consortium of colleges in the state of Washington, administer a cognitive development instrument to beginning freshmen and exiting seniors. A "gain score" is then calculated by subtracting the freshman score from the senior score (MacGregor, 1991).
To be sure that the changes identified in students between college entry and exit are actually due to the college experience above and beyond changes due to personal maturation and life experiences that may be unrelated to the college experience itself, Pascarella & Terenzini (1991) recommend that comparisons be made between the average scores of graduating seniors who are the same age as students with less college experience (e.g., 22-year old freshmen vs. 22-year old seniors).
10. Assessment of Whether Student Participation in the Freshman Seminar affects Student Outcomes at Graduation
While there are some campus-specific studies which indicate that student involvement in the freshman seminar increase their persistence to graduation (Barefoot, 1993), there appears to be no reported research on whether course participation affects student outcomes at graduates. At first glance, it may seem far-fetched to expect that participation in a first-semester course will have a significant effect on student outcomes measured over four years later. However, given the seminar's already-demonstrated potential to increase students' utilization of campus resources and support services, as well as student contact with key educational agents outside the classroom, it may be reasonable to hypothesize that the freshman seminar can serve to increase the quantity and quality of student involvement in the college experience which, in turn, may result in more dramatic gains in student outcomes displayed at the time of college completion.
Perhaps future assessments of the freshman seminar should be based on the "utilization-focused evaluation" model suggested by Patton (1978). According to this model, there are three major goals of outcome assessment: (a) immediate-outcome goals relating to initial program participation (e.g., academic success during the first semester), (b) intermediate outcome goals relating to program impact immediately following participation (e.g., retention to completion of the freshman year), and (c) ultimate-outcome goals relating to broader, long-term impact (e.g., enhanced student outcomes at college graduation).
In this assessment model, student development is perceived as cumulative or hierarchical, so any educational program which increases the accomplishment of immediate and intermediate outcome goals, has the potential for achieving ultimate goals. Heretofore, freshman-seminar assessments have focused almost exclusively on immediate and intermediate outcome goals and have repeatedly demonstrated the course's positive impact on these goals. A useful direction for future assessments of the freshman seminar might be to explore the course's potential for building on these shorter-term accomplishments to assess the seminar's ability to achieve broader, long-term outcomes demonstrated at the time of college completion.
Once the confounding effects of personal maturation have been controlled for, by comparing the scores of graduating seniors who are the same age as students with less experience, then the differential impact of a specific college experience (such as participation in the freshman seminar) may be estimated by observing whether students who have had this particular experience display greater gains in development between college entry and college completion than do students who did not have this experience.
As Ernest Pascarella notes,
Longitudinal, correlational investigations that statistically control for differences in student pre-enrollment characteristics serve two valuable purposes. First, they help eliminate spurious associations between college experiences and student persistence that are not causal. Second, they identify those associations between college experiences and persistence that are most likely to be causal. This allows administrators to focus their energies on those aspects of student life in which programmatic interventions are likely to have the greatest impact (1986, p. 103).

Conclusion:
The Scope of Freshman Seminar Assessment and Its Potential for Promoting the Quality of Campus-Wide Assessment
Though some important issues and outcomes remain to be addressed by freshman-seminar assessment efforts, the breadth of outcomes that have already been addressed under the rubric of freshman seminar assessment is quite impressive. A retrospective look at the various forms of assessment associated with the freshman seminar that have been cited in this monograph suggests that the course is capable of generating a wide spectrum of assessment activities, ranging from the micro to macro level, as illustrated in the continuum depicted on the following page.
MICRO
*
* - Assessment of and for individual students (e.g., student
* self-assessment of learning styles or vocational interests
* conducted as part of the freshman seminar)
*
* - Assessment of the impact of specific course components or instructional activities (e.g., classroom research on the impact of instructional units or modules)
*
* - Assessment of total course impact on student outcomes (e.g., effect of the freshman seminar on student persistence to graduation)
*
* - Assessment of the freshman seminar as a total program (e.g., its impact on promoting partnerships between faculty and student development professionals who participate in freshman-seminar instructor training or who team-teach the course)
*
* - Assessment of the freshman seminar's impact on the institution (e.g., its effect on student satisfaction with the college, or its impact on enrollment management and institutional revenue)
*
MACRO
Nationally prominent scholars have repeatedly called for colleges and universities to expand their assessment efforts beyond the narrow evaluation of only academic or cognitive outcomes. For instance, Alexander Astin issued this satirical statement about the narrow focus of assessment in postsecondary education, What is interesting about our current assessment practices in higher education is that an observer from another planet might conclude that there is only one or at best a handful of student outcomes that are important. Many institutions rely heavily or even exclusively on a single measure of student progress—the grade-point average (GPA). Clearly traditional assessment practices in American higher education do not adequately reflect the multidimensionality of student outcomes (1991, p. 41).
In contrast, the potential of the freshman seminar for promoting "multidimensional" student assessment and the assessment of multiple outcomes has already been demonstrated by numerous campus-specific studies conducted at all types of higher educational institutions, ranging from community colleges to research universities (Barefoot, 1993; Cuseo & Barefoot, 1996). The wide range of relevant student outcomes that have been targeted (and documented) by freshman-seminar assessment projects are exemplary and worthy of emulation by other campus assessment projects.
The assessment methodologies and procedures that freshman-seminar researchers have applied to assess the course's impact on multiple outcomes is equally impressive. The novelty of the freshman seminar's course goals and its non-traditional course content have often activated its home institution's "organizational immune system," triggering frequent and often hostile attacks on the seminar's academic legitimacy in an attempt to reject this unorthodox course like a foreign substance. Consequently, the freshman seminar has been one of higher education's most repeatedly challenged and most thoroughly assessed interventions. Necessity being the "mother of invention," rigorous and innovative methodologies had to be developed to document the impact of freshman seminars and ensure their survival. These innovative methodologies may now have the potential to serve as models for encouraging assessment of other campus programs with the same degree of creativity and quality as they have the freshman seminar. Furthermore, all the effort and energy that freshman seminar researchers have devoted simply to the process of assessment can itself stimulate institution-wide pursuit of important educational goals. As Pat Hutchings observes,
Recognizing that the product of assessment will have certain inevitable imperfections, however, one begins to attend to process—to new kinds and levels of faculty discourse, greater attention to student learning, more explicit focus on teaching strategies, a stronger sense of institutional identity. In this way, assessment may have less to do with measurement than with on-going institutional self-evaluation and improvement (1987, p. 147).
To increase the likelihood that improvements in the quality of undergraduate education for first-year students achieved by freshman-year initiatives, such as freshman-seminar implementation and assessment, will reach the level of campus-wide awareness and improvement, John Gardner offers the following exhortation to campus professionals involved in successful freshman-year programs: "There is a real obligation for people like [those] who are committed to changing the freshman year, to educate the whole institution to everything you've learned so that they are moving in the same direction you are" (1991, p. 8).
Indeed, the comprehensive range of student outcomes targeted for assessment by freshman seminar research, along with the breadth of sophisticated methodologies designed to assess these outcomes, may serve as a lightning rod for attracting and harnessing campus-wide interest in effective assessment of other educational programs or institutional interventions designed to promote student retention and student success.
References
Abrami, P. C. (1989). How should we use student ratings to evaluate teaching? Research in Higher Education, 30(2), 221-227.
Abrami, P. C., d'Apollonia, S., & Cohen, P. A. (1990). The validity of student ratings of instruction: What we know and what we don't. Journal of Educational Psychology, 82(2), 219-231.
Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In J. Smart (Ed.), Higher education: Handbook of theory and research, Vol. II. New York: Agathon Press.
Abrami, P. C., Perry, R. P., & Leventhal, L. (1982). The relationship between student personality characteristics, teacher ratings, and student achievement. Journal of Educational Psychology, 74(1), 111-125.
Adelman, C. (1986). To imagine an adverb: Concluding notes to adversaries and enthusiasts. In C. Adelman (Ed.), Assessment in American higher education: Issues and contexts (pp. 73- 75). Washington, D.C.: Office of Educational Research and Improvement, U.S. Department of Education.
Aleamoni, L. M. (1987). Techniques for evaluating and improving instruction. New Directions for Teaching and Learning, No. 31. San Francisco: Jossey-Bass.
Aleamoni, L. M. & Hexner, P. Z. (1980). A review of the research on student evaluation and a report on the effect of different sets of instructions on student course and instructor evaluation. Instructional Science, 9(1), 67-84.
Almanac. (1994, September 1). The Chronicle of Higher Education, 41(1). Altman, H. B. (1988). Assessment of learning outcomes: Building faculty cooperation. Journal of Staff, Program, & Organization Development, 6(3), 125-127.
Angelo, T., & Cross, P. (1993). Classroom assessment techniques: A handbook for college teachers. San Francisco: Jossey-Bass.
Arnoult, M. D. (1976). Fundamentals of scientific method in psychology (2nd. ed). Dubuque, Iowa: Wm. C. Brown Co.
Arreola, R. A. (1983). Establishing successful faculty evaluation and development programs. In A. Smith (Ed.), Evaluating faculty and staff. New Directions for Community Colleges, No. 41. San Francisco: Jossey-Bass.
Arreola, R. A. & Aleamoni, L. M. (1990). Practical decisions in developing and operating a faculty evaluation system. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 37-56). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Astin, A. W. (1991). Assessment for excellence: The philosophy and practice of assessment and evaluation in higher education. New York: Macmillan. Astin, A. W. (1993). What matters in college: Four critical years revisited. San Francisco: Jossey-Bass.
Astin, A. W. (1994). The American freshman: National norms for fall 1993. Los Angeles: Higher Education Research Institute, University of California, Los Angeles.
Baker, R. W., & Schultz, K. L. (1992). Measuring expectations about college adjustment. NACADA Journal, 12(2), 23-32.
Baker, R. W., & Siryk, B. (1986). Exploratory intervention with a scale measuring adjustment to college. Journal of Counseling Psychology, 33, 31-38.
Banta, T. W. (1988). Promise and Perils. In T. W. Banta (Ed.), Implementing outcomes assessment: Promise and perils (pp. 95-98). New Directions for Institutional Research, No. 59. San Francisco: Jossey-Bass.
Banta, T. W., Lund, J. P., Black, K. E., & Oblander, F. W. (1996). Assessment in practice: Putting principles to work on college campuses. San Francisco: Jossey-Bass.
Barefoot, B. O. (Ed.) (1993). Exploring the evidence: Reporting outcomes of freshman seminars. (Monograph No. 11). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Barefoot, B. O., & Fidler, P. P. (1992). Helping students climb the ladder: 1991 national survey of freshman seminar programs. (Monograph No. 10). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Barefoot, B. O., & Fidler, P. P. (1996). The 1994 survey of freshman seminar programs: Continuing innovations in the collegiate curriculum. (Monograph No. 20). National Resource Center for The Freshman Year Experience & Students in Transition, University of South Carolina.
Berman, P., Curry, J., Nelson, B., & Weiler, D. (1990). Enhancing transfer effectiveness: A model for the 1990s. First year report to the National Effective Transfer Consortium. Berkeley, CA: BW Associates.
Bers, T. H. (1989). The popularity and problems of focus-group research. College & University, 64(3), 260-268.
Bogdan, R. C., & Biklen, S. K. (1992). Qualitative research for education (2nd ed.). Boston: Allyn & Bacon.
Boyer, E. L. (1987). College: The undergraduate experience in America. New York: Harper and Row.
Boyer, E. L. (1991). Scholarship reconsidered: Priorities of the professoriate. Princeton, NJ: Carnegie Foundation for the Advancement of Teaching.
Braskamp, L. A., & Ory, J. C. (1994). Assessing faculty work: Enhancing individual and institutional performance. San Francisco: Jossey-Bass.
Brinko, K. T. (1993). The practice of giving feedback to improve teaching: What is effective? Journal of Higher Education, 64(5), 574-593.
Brower, A. M. (1990). Student perceptions of life task demands as a mediator in the freshman year experience. Journal of the Freshman Year Experience, 2(2), 7-30.
Brower, A. M. (1994). Measuring student performances and performance appraisals with the College Life Task Assessment instrument. Journal of the Freshman Year Experience, 6(2), 7- 36.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait- multimethod matrix. Psychological Bulletin, 56, 81-105.
Cashin, W. E. (1988). Student ratings of teaching: A summary of the research. IDEA Paper No. 20. Manhattan, Kansas: Kansas State University, Center for Faculty Evaluation and Development. (ERIC Document Reproduction No. ED 302 567).
Cashin, W. E. (1990). Students do rate different academic fields differently. In M. Theall, & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 113-121). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey Bass.
Cashin, W. E. (1995). Student ratings of teaching: The research revisited. IDEA Paper No. 32. Manhattan, Kansas: Kansas State University, Center for Faculty Evaluation and Development.
Centra, J. A. (1977). Student ratings of instruction and their relationship to student learning. American Educational Research Journal, 14(1), 17-24.
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Clark, D. J., & Bekey, J. (1979). Use of small groups in instructional evaluation. Professional & Organizational Development Quarterly, 1, 87-95.
Cohen, J. C. (1988). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, P. A. (1980). Effectiveness of student-rating feedback for improving college instruction: A meta-analysis of findings. Research in Higher Education, 13(4), 321-341.
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A Meta-analysis of multisection validity studies. Review of Educational Research, 51(3), 281-309.
Cohen, P. A. (1986). An updated and expanded meta-analysis of multisection student rating validity studies. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, 1986.
Cohen, P. A. (1990). Bringing research into practice. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice. New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Costin, F., Greenough, W, & Menges, R. (1971). Student rated college teaching: Reliability, validity, and usefulness. Review of Educational Research, 41(5), 511-535.
Cross, K. P. (1990). Making teaching more effective. Journal of The Freshman Year Experience, 2(2), 59-74.
Cross, K. P., & Angelo, T. A. (1988). Classroom assessment techniques: A handbook for faculty. National Center for Research to Improve Postsecondary Teaching and Learning. Ann Arbor: University of Michigan.
Cuseo, J. (1991). The freshman orientation seminar: A research-based rationale for its value, delivery, and content. (Monograph No. 4). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Cuseo, J. (1998). Objectives and benefits of senior year programs. In J. N. Gardner, G. Van der Veer, & Associates, The senior year experience: Facilitating integration, reflection, closure, and transition (pp. 21-36). San Francisco: Jossey-Bass.
Cuseo, J. B. & Barefoot, B. O. (1996). A natural marriage: The extended orientation seminar and the community college. In J. N. Hankin (Ed.), The community college: Opportunity and access for America's first-year students (pp. 59-68). The National Resource Center for The Freshman Year Experience & Students in Transition, University of South Carolina.
d'Apollonia, S., & Abrami, P. C. (1997). In response . . . . Change, 29(5), pp. 18-19
Davis, T. M. & Murrell, P. H. (1993). Turning teaching into learning: The role of student responsibility in the collegiate experience. ASHE-ERIC Higher Education Report No. 8. Washington, D.C.: The George Washington University, School of Education and Human Development.
Davis, B. G., Wood, L., & Wilson, R. C. (1983). ABCs of teaching with excellence. Berkeley, CA: University of California.
Delamont, S. (1992). Fieldwork in educational settings: Methods, pitfalls, and perspectives. Bristol, PA: The Falmer Press.
Dressel, P. L. (1976). Handbook of academic evaluation. San Francisco: Jossey-Bass.
Duffy, T. M., & Jonassen. D. H. (1992). Constructivism: New implications for instructional technology. In T. M. Duffy & D. H. Jonassen (Eds.), Constructivism and the technology of instruction: A conversation (pp. 1-16). Hillsdale, NJ: Laurence Erblaum & Associates.
El-Khawas, E. (Ed.)(1993). Campus trends. Washington, D.C.: American Council on Education.
Ewell, P. T. (1988). Implementing assessment: Some organizational issues. In T. W. Banta (Ed.), Implementing outcomes assessment: Promise and perils (pp. 15-28). New Directions for Institutional Research, No. 50. San Francisco: Jossey-Bass.
Feldman, K. A. (1977). Consistency and variability among college students in rating their teachers and courses: A review and analysis. Research in Higher Education, 6(3), 233-274.
Feldman, K. A. (1979). The significance of circumstances for college students' ratings of their teachers and courses. Research in Higher Education, 10(2), 149-172.
Feldman, K. A. (1984). Class size and college students' evaluations of teachers and courses: A closer look. Research in Higher Education, 21(1), 45-116.
Feldman, K. A. (1988). Effective college teaching from the students' and faculty's view: Matched or mismatched priorities? Research in Higher Education, 28(4), 291-344.
Feldman, K. A. (1989). Instructional effectiveness of college teachers as judged by teachers themselves, current and former students, colleagues, administrators, and external (neutral) observers. Research in Higher Education, 30(2), 137-194.
Fetterman, D. M. (1991). Auditing as institutional research: A qualitative focus. In D. M. Fetterman (Ed.), Using qualitative methods in institutional research (pp.23-34). New Directions for Institutional Research, No. 72. San Francisco: Jossey-Bass.
Fidler, D. S. (1992). Primer for research on the freshman year experience. National Resource Center for The Freshman Year Experience, University of South Carolina.
Fidler, P. P. (1991). Relationship of freshman orientation seminars to sophomore return rates. Journal of The Freshman Year Experience, 3(1), 7-38.
Fidler, P. P., & Godwin, M. A. (1994). Retaining African-American students through the freshman seminar. Journal of Developmental Education, 17(3), 34-40.
Fidler, P. P., & Shanley, M. G. (1993, February). Evaluation results of University 101. Presentation made at the annual conference of The Freshman Year Experience, Columbia, South Carolina.
Gardner, J. N. (1981). Developing faculty as facilitators and mentors. In V. A. Harren, M. N. Daniels, & J. N. Buck (Eds.), Facilitating students’ career development (pp. 67-80. New Directions for Student Services, No. 14. San Francisco: Jossey-Bass.
Gardner, J. N. (1986). The freshman year experience. College and University, 61(4), 261-274.
Gardner, J. N. (1989), Starting a freshman seminar program. In M. L. Upcraft, J. N. Gardner, & Associates, The freshman year experience (pp. 238-249). San Francisco: Jossey-Bass.
Gardner, J. N. (1991). Introduction. In Perspectives on the freshman year: Selected major addresses from freshman year experience conferences. (Monograph No. 2). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Gardner, J. N. (1992). Foreword. In D. S. Fidler, Primer for research on the freshman year experience (pp. 3-4). National Resource Center for the Freshman Year Experience, University of South Carolina.
Gardner, J. N. (1994, July). Comment made at the Seventh International Conference on the First- Year Experience, Dublin, Ireland.
Goldschmid, M. L. (1978). The evaluation and improvement of teaching in higher education. Higher Education, 7, 221-245.
Gordon, V. N. (1984). The undecided college student: An academic and career advising challenge. Springfield, Illinois: Thomas.
Halpern. D. F. (1987). Recommendations and caveats. In D. F. Halpern (Ed.), Student outcomes assessment: What institutions stand to gain (pp. 109-111). New Directions for Higher Education, No. 59. San Francisco: Jossey-Bass.
Hanson, G. R. (1982). Critical issues in the assessment of student development. In G. R. Hanson (Ed.), Measuring student development (pp. 47-64). New Directions for Student Services, No. 20. San Francisco: Jossey-Bass.
Hanson, G. R. (1988). Critical issues in the assessment of value added in education. In T. W. Banta (Ed.), Implementing outcomes assessment: Promise and perils (pp. 53-68). New Directions for Institutional Research, No. 50. San Francisco: Jossey-Bass.
Harbin, C. E. (1996). A new plan: Total transfer management. In J. N. Hankin (Ed.), The community college: Opportunity and access for America's first-year students (pp. 29-36). The National Resource Center for The Freshman Year Experience & Students in Transition, University of South Carolina.
Harris, J. (1986). Assessing outcomes in higher education. In C. Adelman (Ed.), Assessment in American higher education (pp.13-32). Washington, D.C.: Office of Educational Research and Improvement, U.S. Department of Education.
Harrison, N. S. (1979). Understanding behavioral research. Belmont, CA: Wadsworth.
Hartman, N. A., & former University 101 students (1991, February). Celebrating the freshman year: A retrospection. Presentation made at the annual conference of The Freshman Year Experience, Columbia, South Carolina.
Haug, P. (1992). Guidelines for student advisory committees. The Teaching Professor, 6(10), p. 7. Hays, N. L. (1973). Statistics for the social sciences (2nd ed.). New York: Holt, Rinehart, & Winston.
Hoffman, E. P. (1986). A review of two studies of elasticities in academe. Economics of Education Review, 5(2), 219-224.
Holsti, O. R. (1969). Content analysis for the social sciences and humanities. Reading, Mass.: Addison-Wesley.
Howard, G. S., & Maxwell, S. E. (1980). Correlation between student satisfaction and grades: A case of mistaken causation. Journal of Educational Psychology, 72(6), 810-820.
Howard, G. S., & Maxwell, S. E. (1982). Do grades contaminate student evaluations of instruction? Research in Higher Education, 16, 175-188.
Hughes, R. (1983). The non-traditional student in higher education: A synthesis of the literature. NASPA Journal, 20(3), 51-64.
Hutchings, P. (1987). Six stories: Implementing successful assessment. Journal of Staff, Program, & Organization Development, 5(4), 139-148. \
Jacobi, M. (1991). Focus group research: A tool for the student affairs' professional. NASPA Journal, 28(3), 195-201.
Johnson, R. (1993). Assessment and institutional quality. TQM in Higher Education, 2(9), p. 7.
Kenny, D. A. (1996). The politics of creating and maintaining a college success course. In J. N. Hankin (Ed.), The community college: Opportunity and access for America's first-year students (pp. 69-76). The National Resource Center for The Freshman Year Experience & Students in Transition, University of South Carolina.
Ketkar, K., & Bennet, S. D. (1989). Strategies for evaluating freshman orientation programs. Journal of The Freshman Year Experience, 1(1), 33-44.
Knapp, J., & Sharon, A. (1975). A compendium of assessment techniques. Princeton, N.J.: Educational Testing Service.
Kramer, M. (1993). Lengthening of time to degree. Change, 25(3), pp. 5-7.
Krone, K. J., & Hanson, G. R. (1982). Assessment in student development programming: A case study. In G. R. Hanson (Ed.), Measuring student development (pp. 93-110). New Directions for Student Services, No. 20. San Francisco: Jossey-Bass.
Kuh, G., Schuh, J., Whitt, E., & Associates (1991). Involving colleges. San Francisco: Jossey- Bass.
Kuh, G., Shedd, J., & Whitt, E. (1987). Student affairs and liberal education: Unrecognized (and unappreciated) common law partners. Journal of College Student Personnel, 28(3), 252-260. Larimore, L. K. (1974). Break-even analysis for higher education. Management Accounting, 56(3), 25-28.
Lenning, O. T. (1988). Use of noncognitive measures in assessment. In T. W. Banta (Ed.), Implementing outcomes assessment: Promise and perils (pp. 41-52). New Directions for Institutional Research, No. 50. San Francisco: Jossey-Bass.
Lincoln, Y. S., & Guba, E. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
MacGregor, J. (1991). What differences do learning communities make? Washington Center News, 6(1), pp. 4-9.
Malaney, G. D., & Weitzer, W. H. (1993). Research on students: A framework of methods based on cost and expertise. NASPA Journal, 30(2), 126-137.
Marcus, L. R., Leone, A. O., & Goldberg, E. D. (1983). The path to excellence: Quality assurance in higher education. ASHE-ERIC/Higher Education Research Report No. 1. Washington, D.C.: Association for the Study of Higher Education.
Marsh, H. W. (1984). Students' evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5), 707-754.
Marsh, H. W., & Dunkin, M. (1992). Students' evaluations of university teaching: A multidimensional perspective. In J. C. Smart (Ed.), Higher education: Handbook of theory and research (Vol. 8, pp. 143-233). New York: Agathon.
Marsh, H. W., & Ware, J. E., Jr. (1982). Effects of expressiveness, content coverage and incentive on multidimensional student rating scales: New interpretations of the Dr. Fox effect. Journal of Educational Psychology, 74(1), 126-134.
McCallum, L. W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Research in Higher Education, 21(2), 150-158.
McKeachie, W. J. (1979). Student ratings of faculty: A reprise. Academe, 65(6), 384-397.
McKeachie, W. J., & Kaplan, M. (1996). Persistent problems in evaluating college teaching. AAHE Bulletin, 48(6), pp. 5-8.
McKeachie, W. J., Lin, Y-G, Moffett, M., & Daugherty, M. (1978). Effective teaching: Facilitative versus directive style. Teaching of Psychology, 5, 193-194.
Mentkowski, M., Astin, A. W., Ewell, P. T., & Moran, E. T. (1991). Catching theory up with practice: Conceptual frameworks for assessment. The AAHE Assessment Forum. Washington, D.C.: American Association for Higher Education.
Merriam, S. B. (1988). Case study research in education: A qualitative approach. San Francisco: Jossey-Bass.
Miller, T. E., Neuner, J. L., & Glynn, J. (1988). Reducing student attrition: A college at work in research and practice. NASPA Journal, 25(4), 236-243.
Morgan, D. L. (1988). Focus groups as qualitative research. Newbury Park, CA : Sage.
Mullendore, R., & Abraham, J. (1992). Orientation director's manual. Statesboro, GA: National Orientation Director's Association.
Murray, H. G. (1987, April). Impact of student instructional ratings on quality of teaching in higher education. Paper presented at the 71st annual meeting of the American Educational Research Association, Washington, D.C.
Murray, H. G., & Smith, T. A. (1989, March). Effects of midterm behavioral feedback on end-of- term ratings of instructional effectiveness. Paper presented at the annual conference of the American Educational Research Association, San Francisco.
Nemko, M. (1988). How to get an ivy league education at a state university. New York: Avon Books.
Noel, L. (1985). Increasing student retention: New challenges and potential. In L. Noel, R. Levitz, & Associates, Increasing student retention (pp. 1-27). San Francisco: Jossey-Bass.
Noel, L. & Levitz, R. (1996). A comprehensive student success program: Part 1. Recruitment & Retention in Higher Education, 10(7), pp. 4-7.
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776- 783.
Ory, J. C. (1990). Student ratings of instruction: Ethics and practice. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 63-74). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Overall, J. U., & Marsh, H. W. (1980). Students' evaluations of instruction: A longitudinal study of their stability. Journal of Educational Psychology, 72(3), 321-325.
Pace, R. (1984). Measuring the quality of college student experiences. Los Angeles: UCLA Center for the Study of Evaluation. (ERIC Reproduction No. ED 255 099)
Palmer, J. (1989). Trends and issues; Student tracking systems at community colleges. In T. H. Bers (Ed.), Using student tracking systems effectively (pp. 95-104). New Directions for Community Colleges, No. 66. San Francisco: Jossey-Bass.
Pascarella, E. T. (1986). A program for research and policy on student persistence at the institutional level. Journal of College Student Personnel, 27(2), 100-107.
Pascarella, E. T., & Terenzini, P. T. (1991). How college affects students: Findings and insights from twenty years of research. San Francisco: Jossey-Bass.
Pasen, R. M., Frey, P. W., Menges, R. J., & Rath, G. (1978). Different administrative directions and student ratings of instruction: Cognitive vs. affective effects. Research in Higher Education, 9(2), 1-167.
Paskow, J. (Ed.)(1988). Assessment programs and projects: A directory. Washington, D.C.: American Association for Higher Education.
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme. New York: Holt, Rinehart and Winston.
Pfeifer, C. M., & Schneider, B. (1974). University climate perceptions of black and white students. Journal of Applied Psychology, 59(5), 660-662.
Prager, C. (1988). Editor's notes. In C. Prager (Ed.), Enhancing articulation and transfer (pp. 1- 6). New directions for Community Colleges, No. 61. San Francisco: Jossey-Bass.
Reigeluth, C. M. (1992). Reflections on the implications of constructivism for educational technology in T. M. Duffy & D. H. Jonassen (Eds.), Constructivism and the technology of instruction: A conversation (pp. 147-156). Hillsdale, NJ: Laurence Erblaum & Associates.
Reinharz, S. (1993). On becoming a social scientist. New Brunswick, NJ: Transaction Publishers.
Rice, R. (1992). Reactions of participants to either one-week pre-college orientation or to freshman seminar courses. Journal of The Freshman Year Experience, 4(2), 85-100.
Robinson, P. W., & Foster, D. F. (1979). Experimental psychology: A small-N approach. New York: Harper & Row.
Roman, L., & Apple, M. (1990). Is naturalism a move away from positivism? Materialist and feminist approaches to subjectivity in ethnographic research. In E. Eisner & A. Peshkin (Eds.), Qualitative inquiry in education: The continuing debate (pp. 38-74). New York: Teachers College Press.
Rosenthal, R. (1966). Experimenter bias in behavioral research. New York: Appleton-Century- Crofts. Rosenthal, R. (1974). On the social psychology of the self-fulfilling prophecy: Further evidence of pygmalion effects and their mediating mechanisms. New York: M.S.S. Information Corporation, Modular Publications.
Schilling, K. L., & Schilling, K. M. (1998). Looking back, moving ahead: Assessment in the senior year. In J. N. Gardner, & G. Van der Veer, & Associates. The senior year experience: Facilitating integration, reflection, closure, and transition (pp. 245-255). San Francisco: Jossey-Bass.
Schwitzer, A. M., Robins, S. B., & McGovern, T. V. (1993). Influences of goal instability and social support on college adjustment. Journal of College Student Development, 34, 21-34.
Scriven, M. (1967). The methodology of evaluation. In Perspectives of Curriculum Evaluation, AERA Monograph Series on Curriculum Evaluation, No. 1. Chicago: Rand McNally & Co.
Seldin, P. (1992). Evaluating teaching: New lessons learned. Keynote address presented at "Evaluating Teaching: More Than a Grade" conference held at the University of Wisconsin- Madison, sponsored by the University of Wisconsin System, Undergraduate Teaching Improvement Council.
Seldin, P. (1993). How colleges evaluate professors, 1983 vs.1993. AAHE Bulletin, 46(2), pp. 6- 8, 12.
Sixbury, G. R., & Cashin, W. E. (1995). IDEA technical report no. 9: Description of database for the IDEA Diagnostic Form. Manhattan, KS: Kansas State University, Center for Faculty Evaluation and Development.
Smith, J. K., & Heshusius, L. (1986). Closing down the conversation: The end of the quantitative- qualitative debate among educational inquirers. Educational Researcher, 15(1), 4-12.
Starke, M. C. (1993, February). Retention, bonding, and academic achievement: Effectiveness of the college seminar in promoting college success. Paper presented at the annual conference of The Freshman Year Experience, Columbia, South Carolina.
Stevens, J. J. (1987). Using student ratings to improve instruction. In K. M. Aleamoni (Ed.), Techniques for evaluating and improving instruction (pp. 33-38). New Directions for Teaching and Learning, No. 31. San Francisco: Jossey-Bass.
Striatal, M. L. (1988). College student inventory. Coralville, Iowa: Noel/Levitz Centers.
Taylor, S. J., & Bogdan, R. C. (1984). Introduction to qualitative research and methods: The search for meaning. New York: Wiley.
Theall, M., Franklin, J., & Ludlow, L. H. (1990). Attributions and retributions: Student ratings and the perceived causes of performance. Paper presented at the annual meeting of the American Educational Research Association, Boston.
Tierney, W. G. (1991). Utilizing ethnographic interviews to enhance academic decision making. In D. M. Fetterman (Ed.), Using qualitative methods in institutional research (pp.7- 22). New Directions for Institutional Research, No. 72. San Francisco: Jossey-Bass.
Titley, R., & Titley, B. (1980). Initial choice of college major: Are only the "undecided" undecided? Journal of College Student Personnel, 21(4), 293-298.
Tregarthen, T., Staley, R. S., & Staley, C. (1994, July). A new freshman seminar course at a small commuter campus. Paper presented at the Seventh International Conference on The First Year Experience, Dublin, Ireland.
Turner, J. C., Garrison, C. Z., Korpita, E., Waller, J., Addy, C., Hill, W. R., & Mohn, L. A. (1994). Promoting responsible sexual behavior through a college freshman seminar. Aids Education and Prevention, 6(3), 266-277.
U.S. Bureau of the Census (1994). Statistical abstract of the United States: 1994 (114th ed.). Washington, DC: U.S. Government Printing Office.
Walkie, C., & Radiant, B. (1996). Predictors of academic success and failure of first-year college students. Journal of The Freshman Year Experience, 8(2), 17-32.
Warren, J. (1987). Assessment at the source. Liberal Education, 73(3), 2-6.
Weick, K. E. (1979). The social psychology of organizing (2nd ed.). Reading, Mass.: Addison- Wesley.
Wergin, J. F. (1988). Basic issues and principles in classroom assessment. In J. H. McMillan (Ed.), Assessing students' learning (pp. 5-17). New Directions for Teaching and Learning, No. 34. San Francisco: Jossey-Bass.
White, K. (1991). Mid-course adjustments: Using small group instructional diagnosis to improve teaching and learning. Washington Center News, 6(1), pp. 20-22.
Willingham, W. W. (1985). Success in college: The role of personal qualities and academic ability. New York: College Entrance Examination Board.
Wilson, R. C. (1986). Improving faculty teaching: Effective use of student evaluations and consultants. Journal of Higher Education, 57(2), 196-211.
Wrenn, R. (1988). Student-faculty interaction programs. In J. Rhem (Ed.), Making changes: 27 strategies for recruitment and retention (pp. 75-77). Madison, WI: Magna.
Zerger, S. (1993, February). Description and explanation of freshman to sophomore attrition rates. Paper presented at the annual conference of The Freshman Year Experience, Columbia, South Carolina.

Home _ News_ Organizations _ Conferences _ Related Publications _
About DEHEKY _ Kentucky Online Resources _ Other Online Resources _ Other Lists _
Research Sites _ Assessment Testing _ KY Colleges & Universities _ Faculty Directories _
Subscribe to DEHEKY list

© 1999 thomas.kesterson@kctcs.net