Introduction

Evaluation in Language Education

Part I-INTRODUCTION

a) Conceptual Framework of Language Education

Various Approaches to Language Teaching

Some Basic Linguistic Factors and Language Testing

b) Basic Ideas in Measurement and Evaluation

Measurement and Evaluation

Basic Factors in the Process of Measurement

Types of Evaluation

Testing or Measurement in the Context of Language Teaching/Learning

Various Types of Tests

Various Approaches to Language Teaching: Language testing involves both linguistics and psychology, as it is concerned with language and learning. The process may be called an experiment because the objectives or learning tasks are defined in order to study the learner's behaviour under instruction, and it is evaluative because it uses statistical techniques to assess that behaviour. Thus language testing may be said to consist of three main factors, viz., language, learning and evaluation.
For purposes of learning, language is indispensable and hence it is extremely difficult to consider language as a separate entity from learning. However, it can be said that there is a very close interrelation between language and learning and water tight compartmentalisation between the two is not possible. Language Learning means not only the learning of a second language or a foreign language but also the learning of mother-tongue or the first language.
There have been in the past different schools of thought in the context of language teaching. The first regards language and learning as strictly compartmentalised entities. Such scholars do not take cognisance of the learner's needs. Their viewpoint is that language is a thing in itself which exists in written texts, and that there is nothing like Spoken Language. This viewpoint supports the grammar-translation method of language teaching. Since language is held to exist only in written texts, and mostly in literary ones, emphasis in language teaching is laid upon the teaching of literature. In other words, the idea is that of mind training and of transfer of training. Accordingly, testing in such situations is mainly concerned with the framing of translation tasks. As already mentioned, this school of thought gives very little importance to the learner's needs, and the content of the teaching material is derived from written literary texts, which are considered to be the Real Language. The only concession made by proponents of this viewpoint is that some attempt is made to arrange the instructional material so that the learner begins with the easiest material and proceeds towards more difficult material. Such sequencing does not have any scientific basis and is mostly subjective.

The second school of thought is that language is a kind of machine which acts as a stimulus-response mechanism. According to the proponents of this viewpoint, learning, in its extreme form, is said to consist of conditioned responses to various situations in the environment. Language is argued not to possess any direct connection with the environment but is said to be a part of human behaviour. Therefore, scholars supporting this view consider that there is no need to link language in any meaningful way with the world to which it refers. Harris and some others argue that linguistic analysis proceeds theoretically from sound to sentence, and it is concluded that learning itself proceeds in the same way. According to them, the teaching of language necessitates the analysis of language into its components, viz., the structures. This analysis is called immediate constituent analysis. Under this system, instruction starts from the smallest units and proceeds from sound to sentence, making use of different teaching aids and machinery. This is where the language laboratory comes into the picture, and the method of instruction may be said to be the Audio-Lingual Method. The assumption is that learning takes place by generalisation and by analogy. In this method drills and exercises are considered important and, therefore, testing invariably consists of items parallel to the drills and exercises. Tests of this type will at best indicate whether the learner has mastered the patterns drilled, and not whether he is able to use the language in real-life situations. The latter aspect can be taken care of if and only if open-ended items such as translation, guided and free composition and free-response items are included in the test.
The third school of thought that is worth mentioning in this context is that language is creative and simultaneously it is also rule based. Scholars like Chomsky who hold this opinion argue that language and learning are interdependent. The language described by them is an abstraction which is far away from learning in its formalization as opposed to the other schools of thought mentioned above. As yet the influences or the applications of this approach to language teaching are not crystal clear. The three major assumptions that are made in this context are:
1. The aim of language teaching is to develop grammatical and communicative competence among language learners. According to Chomsky, underlying the surface realisations there exists a deep structure, and in the context of language learning the learner has to master the deep structure and the operation of the various transformations employed.
2. A second or foreign language learner has to adopt the same kind of strategy as one does in learning the mother tongue or first language. What strategy the first language learner adopts is not yet clearly known, as mentioned by Lyons and Wales. But since the transformational-generative approach is directed towards the way a native speaker learns his own language, its formalization will be in L1 terms, and language teaching making use of it must also approach second and foreign language learners in the same terms.

3. The third assumption is related to the learning of mother-tongue as it accepts the demands of the situations or the environment in which the language is used. This point of view emphasizes tasks and problems rather than patterns and repetitions in language teaching and at the same time strategies are emphasized as opposed to simple memory. Therefore, the testing strategy related to the third school of thought will be much more situational and will be less concerned with pure language features such as the segments, stress, etc. Tests related to this view point set problems which may not be too far away from the translation problems but are directly related with the corpus of instruction. Therefore, comprehension and composition which need resolutions are included in such tests. The test items in such tests will also aim at assessing the creative ability of the learners.
Present-day language teaching is mainly based upon the belief that the spoken language must be taught before proceeding to the written language. While the teaching method employed is mainly Audio-Lingual, the instructional material is normally prepared so as to consist of conversations or dialogues used in various real-life situations. The method is thus a conglomeration of various existing methods and may therefore be called an eclectic method.
The strategies and techniques for the construction of good language tests have been discussed in the present volume in addition to the discussion on evaluation of methods, materials and media and provision of a set of instruments meant for such evaluation.
The need for measurement and evaluation has been discussed, and a set of instruments for the evaluation of Language Instructional Material has been provided.
Later, emphasis has been laid upon the measurement of the performance of language learners and the discussion of the various factors involved in developing instruments for testing achievement. The techniques of testing the major language skills, viz., listening, speaking, reading and writing, have been discussed in detail, with examples from Indian languages wherever necessary, in the hope that they will aid the test maker in developing good language tests. In addition, some light has been thrown on the testing of Culture and Literature. Needless to say, not much literature in the area of language evaluation and language testing is available, and the present volume aims at providing basic guidelines for test makers in Indian languages. It is hoped that this volume will prove useful for all those who are engaged in language teaching and language testing, with reference to Indian languages in particular.

Some Basic Linguistic Factors and Language Testing: As the theory of linguistics has been applied to the task of language teaching, the application of linguistics to language testing cannot be ruled out. Language teaching and testing are so closely interrelated that the basis for teaching and for testing is the same. It is therefore necessary to consider those basic linguistic factors which play an important role in language testing. Broadly speaking, the following five factors may be considered important in the context of language testing:
1. A language is a set of habits.
2. Language is primarily speech and secondarily writing.
3. A language is what its native speakers say and not what someone thinks they ought to say (language is descriptive and not prescriptive).
4. It is the language that should be taught and not facts about the language.
5. Language learning is not simply the mastery of the 'code' but also the 'use of code'.
Let us consider each of the above factors and see to what extent they play a role in the context of language testing.

1. A Language is a set of Habits : A second or foreign language learner can be said to know a language, or to have attained proficiency in the target language, if and only if he can respond quickly and automatically in natural language situations. It is therefore necessary for the test maker to decide to what extent speed should be a factor in the test. A speed test may be defined in its strict sense as one which contains such easy items that all the testees could answer them correctly given enough time. Though this is the precise and strict definition of a speed test, in practice very few tests are pure speed tests. On the other hand, there is another category of tests, viz., power tests. These can be timed in such a way that only a certain group of examinees is able to complete them before time is called, and language tests can thus be set to measure the extent to which the learners have mastered the set of habits. Incidentally, the difference between the power test and the speed test is that in the power test the difficulty of the items is steeply graded, some items being too difficult for most or any of the subjects to answer correctly though they will probably have time to reach them.¹ Regarding general language tests, it may be said that most of them are not highly speeded but aim at an automatic response from the examinees. Objective types of items possess an advantage over free-response types in the sense that responses to such items can be made more quickly.
Needless to say, in a second or foreign language learning situation, learners generally have a tendency to transfer their mother tongue habits to the target language. It is in this context that contrastive analysis has been used, the results of which form the basis for the selection and gradation of language instructional material for teaching a second or foreign language.
Particularly in the context of second or foreign language testing where the examinees possess the same native language background, it is always useful to use the results of contrastive analysis and to include those language patterns which are likely to pose problems for the learner due to the interference of his mother tongue habits. In constructing test items involving problem areas due to interference from the mother tongue, the examiner must make use of the mother tongue habits of the learners in constructing the distractors for the test items. For example, a test item of the following type in Telugu is most likely to pose structural problems for Hindi speakers. In Telugu, the first person plural has two different words, viz.,

1. memu (exclusive of the listener) and
2. manamu (inclusive of the listener)
miru ikkada undandi
--- gudiki veḷḷi vastam
(You stay here. We will go to the temple and come back.)
The blank in the above item should be filled in by using 'memu' and not 'manamu'. But in Hindi, irrespective of whether the first person plural is inclusive or exclusive, the blank can be filled in by using ham (we).
From the practical point of view, it may not be possible to depend heavily on contrastive analysis in testing a language as a second or foreign language. If a language test has to be designed for a wider audience with varied language backgrounds, the procedures of contrastive analysis become even less applicable. It is therefore necessary that the test maker find some means of sampling the total repertoire of target language patterns and construct a test consisting of items representing different aspects of the target language and its use. The general argument that some items in the test will be easier for students of particular language backgrounds than for others is not a valid objection and need not be considered a defect, because the test is focused upon various features of the target language and reflects the actual language learning situation.
2. Language is Primarily Speech and Secondarily Writing : In present-day second and foreign language teaching, more emphasis is given to developing proficiency in the spoken language, and it is only after ensuring that the learners have attained an optimum level of achievement in the spoken skills that they are led to the written skills. However, by the end of the instructional programme the learners are expected to achieve proficiency not only in the spoken skills but also in the written skills. It is for this reason that the first levels of language instruction and language testing are mainly based upon the spoken language, and the measurement of listening comprehension and production is given importance.
Adequate and appropriate methods of testing listening comprehension, auditory perception and speaking have been developed, and each has been discussed in detail in this volume. But it should be noted that the efficient testing of the spoken skills requires special facilities such as the language laboratory and the tape recorder, and such facilities are not generally available everywhere. For this reason the efficient testing of spoken skills may not be possible everywhere.

3. A Language is what its native speakers say, not what someone thinks they ought to say : The belief that a language is what its native speakers say, and not what someone thinks they ought to say, implies that language instructional materials should be descriptive. In other words, such instructional material should be based on the actual speech of native speakers who use different varieties of the language successfully in different social and regional contexts. It is such actual usage that has to be taken as the basis for language teaching and language testing, and not an ideal, artificial variety of language which perhaps does not have any existence outside the classroom. The test maker also has to keep in mind the features of the spoken language as opposed to the highly structured, conservative grammaticality of the written language.
In the early years of language testing, most tests required the examinees to make decisions about the grammatical accuracy of language. An English example of such tests is as follows :
Directions :- Insert 'shall' or 'will' in the following.
Don't worry, I --- have plenty of time.
Along with the progress of developments in the area of language testing a clear cut distinction has been made between the problems involving the colloquial language and those calling for the sensitivity to written style. The instructions for the former type of question are simply to ask the testee to select the alternative that sounds exactly like what a native speaker of the target language would say.
4. It is language that should be taught and not facts about the language : In the name and guise of language teaching, most language instructional programmes concentrate on teaching about a language and not the language itself. The basic theory of language testing is that what is tested in the examinations is exactly what is taught. Therefore, the so-called language examinations or tests concentrate on assessing the learner's knowledge about the language and not his knowledge of the language. It may be said that the language teaching existing in many educational organizations is not skill oriented. Pattanayak has rightly pointed out that "all the current examinations test the student's knowledge about a language rather than his performance in it."²
In the context of language instruction, grammar should be used as a means to an end and not as an end in itself. Teaching the student to make statements about the language is a wasteful activity that deprives him of precious time during which he could develop the language skills, or learn to imitate the forms of the target language correctly and practise them until they become automatic. In the initial stages of language instruction, some grammatical explanation may occasionally be useful in enabling the learner to understand the new patterns of the target language. Thereafter the learners should be asked only to manipulate the patterns and not to explain them.
5. Language learning is not simply the mastery of the 'code' but also the 'use of code' : Learning a second or foreign language involves the mastery not only of the structure of the language, its vocabulary etc., but also of the various kinds of language use in different social and cultural contexts. In other words, it involves the mastery of the code in addition to the use of the code. If a learner of a second or foreign language is said to have attained proficiency in the target language, it implies that he has attained adequate control of the sentence patterns, the vocabulary etc., in addition to the mastery of various kinds of language use in different contexts. Thus learning a language involves acquiring knowledge of the code together with the ability to use that knowledge in producing appropriate utterances and in understanding what is said by native speakers.
In order to communicate effectively with the native speakers of the target language, the learner should be able to control the language structures and patterns, apart from making use of appropriate styles and registers depending upon the context of situation. This is what has been labelled communicative competence. Thus one should keep clearly in mind that mastery of a second or foreign language means attaining proficiency in both grammatical competence and communicative competence.
From this point of view, it is necessary to test the examinee's ability in the areas of both grammatical and communicative competence if one is to test the language proficiency of a learner. Tests should therefore be focused on both these aspects. The aspects of testing language proficiency discussed in this volume will highlight these factors in detail. These arguments hold good in the context of language testing also. Therefore, language tests must be constructed in such a way that specimens of the language or other direct responses to language are evoked. The examinees should not simply be asked to identify parts of speech or sentence types but should have a chance to exhibit their ability or competence to use the 'code'.

Basic Ideas in Measurement and Evaluation
Measurement and Evaluation :
The terms Measurement and Evaluation, though they possess distinctly different meanings, are quite often confused and frequently used interchangeably. The term Measurement refers to quantitative descriptions of behaviour, things or events, i.e., behaviour described in terms of numbers.
The term Evaluation is a broader concept than Measurement. While it involves quantitative description, it also involves qualitative description, i.e., description expressed in words. In other words, Evaluation involves the interpretation of what is measured. In addition to the description of behaviour in terms of numbers and words, Evaluation also includes value judgements about the thing described. In the context of evaluating the achievements of a student in an instructional programme, apart from the performance evaluation, the evaluator considers the effectiveness of instruction in terms of methods, materials and media. Therefore the primary or basic criteria to be used for Evaluation are the course objectives or instructional objectives, i.e., the criteria by which worth is determined. Thus defining the instructional objectives (major and minor) clearly, and preferably in behavioural terms, is the first step in the process of Evaluation.
The following are the procedural steps typically followed in the evaluation of a student's achievement :
1. Identification of course objectives (the expected or desired learning outcome).
2. Defining the objectives in behavioural terms (in terms of the learner's terminal behaviour).
3. Constructing appropriate tools or instruments for measuring the behaviour.
4. Applying or administering the tools/instruments and analyzing the results to determine the degree of the learner's achievement in the instructional programme.
The above four steps are basically the same in the evaluation of instruction, curriculum or the programme as a whole.
Both Measurement and Evaluation require a broad variety of tools or instruments such as tests, rating scales, inventories, checklists, questionnaires etc.

Basic Factors in the process of Measurement :
The process of measurement is generally thought of in terms of quantitative descriptions of the measured phenomena, or in other words in terms of numerical test scores.
Measurement in Education broadly consists of arranging or ordering individual learners in accordance with their responses to specific tests connected with the learning that has taken place. The important elements in this process are :
1. The Test Situation (to which the individual learners respond)
2. The Individual Learner's Responses to such situations.
3. Ordering of Individual Learners based on the assessment or judgement of their responses, expressed in terms of scores/grades.
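The third element, ordering learners by their scores, can be illustrated with a minimal Python sketch. The learner names, the scores and the function name `order_learners` are hypothetical and are not taken from the text; the only idea drawn from it is that learners are arranged in order of the scores awarded to their responses.

```python
from typing import Dict, List, Tuple


def order_learners(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (learner, score) pairs in ascending order of score.

    Ties are broken alphabetically by learner name so that the
    ordering is reproducible from one run to the next.
    """
    return sorted(scores.items(), key=lambda pair: (pair[1], pair[0]))


# Hypothetical scores awarded by an examiner to three learners.
sample_scores = {"learner_a": 62.0, "learner_b": 48.5, "learner_c": 75.0}
ranking = order_learners(sample_scores)

for name, score in ranking:
    print(f"{name}: {score}")
```

The sketch only performs the ordering. The judgement that produces each score, which the text treats as the crucial and partly subjective step, lies outside it.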
1. The Test Situation : The first and foremost element in the process of Measurement is that there will be a number of well-defined test situations to which the individual learners respond. They present stimuli (or tasks) which require appropriate responses on the part of the individuals. Such situations may take a broad variety of forms such as :
(a) Essay questions
(b) Paragraph questions
(c) Short-answer questions
(d) True/False questions
(e) Multiple choice questions etc.
Apart from these usual types of test items, certain situations may involve role-play, construction of material objects, manipulation of apparatus etc.
These situations may be administered through various types of media, viz., the printed page, tape recorders, films and film strips, oral instructions, physical gestures or a combination of two or more of these. The mode of response elicited by the test situations may be verbal, non-verbal, cognitive, affective or manipulative.

There are two restrictions on any test situation to be observed in the process of measurement :
(i) Each situation must be exactly repeatable from one occasion to another, to the extent possible. It should be identical for all students in the group of learners being tested.
Although this restriction can be observed in most test situations, it is relatively difficult in the case of oral tests. In oral tests, each situation is normally unique to the particular occasion, as the questioning may proceed in unexpected and unpredictable directions depending upon the examiner's interests and the sequence of responses given by the examinees.
(ii) The test situations will be identical for all individuals in the group of learners, as far as possible.
This restriction does not mean that the test situations will be perceptually identical, but only that they should meet a set of objective specifications.
2. Learner's Responses to Test Situation : It is the learner's responses to test situations that are most crucial in the process of measurement, although the first important step in this process is the test situation. The scores or numbers which form the quantitative measurement are based upon what the learner does in response to the test situations. These responses may be verbal, non-verbal, written, manipulation of physical objects etc., depending upon the type of test.
There are two factors concerned with learner's responses viz.
(a) Direct measurement and
(b) Indirect measurement
Direct measurement deals with the responses as products. It is concerned with the product in and for itself, its qualities and its desirable characteristics.
Indirect measurement deals with a symbolic response, one that stands for some kind of process that has gone on behind the scenes. The interest of the examiner is not in the symbol itself but in that which is symbolized; in other words, the probable series of mental operations which the learner has performed and which he has indicated by means of a symbol.
3. Ordering of Responses : The basic and fundamental operation in the process of educational measurement involves the judgement by the examiner(s) of the quality or appropriateness of the response with respect to a given situation. The responses of the students are arranged by awarding scores which permit ordering in an ascending order. In other words, the examiner compares the responses of the examinees and arrives at a decision that one examinee exhibits more ability than the other(s). Such judgements on the part of examiners are in a way subjective, as their decisions generally depend upon what the examiners look for in the student responses, their backgrounds, habits of perception and frames of reference. The minimization of such subjectivity is the most essential characteristic of Good Measurement. Dyer³ suggests the following three ways through which the subjectivity in examiners' judgements can be minimized :
(a) The examiner makes his judgements following the actual production of the responses by students. (The essay question typifies this case.)
(b) The examiner anticipates what the responses to the test situation will be and makes his judgement. (The multiple choice question typifies this case.)
(c) The examiner attempts in advance to develop models of likely responses that he considers to be good, bad and indifferent, and then, when the actual responses become available, he orders them by matching each one to the appropriate model. (The short answer question typifies this case.)
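The third procedure, matching each actual response to a model prepared in advance, can be sketched roughly in Python. The text does not say how the matching is done; the word-overlap (Jaccard) measure used here is purely an assumption made for illustration, and the model answers, labels and function names are hypothetical.

```python
from typing import Dict


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the sets of words in two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_to_model(response: str, models: Dict[str, str]) -> str:
    """Return the label of the model answer most similar to the response.

    Assumption: similarity is crude word overlap. A human examiner
    would apply far richer judgement than this.
    """
    return max(models, key=lambda label: token_overlap(response, models[label]))


# Hypothetical model answers prepared before any responses are seen.
models = {
    "good": "memu is used because the listener is excluded",
    "indifferent": "memu is used",
    "bad": "manamu is used because the listener is included",
}

label = match_to_model("memu is used because the listener is excluded here", models)
print(label)
```

Fixing the models before the responses arrive is what limits subjectivity in this procedure: every response is compared against the same yardsticks, whatever measure of similarity the examiner actually applies.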

Types of Evaluation:
Evaluation in the context of language may be divided into two main varieties :

(a) On-going evaluation or continuous evaluation and
(b) Terminal evaluation
The first kind of evaluation, viz., on-going evaluation, is meant to provide regular feedback at every stage of the programme during its process, viz., planning, preparation, production and application. This enables those running the programme to judge its success or otherwise.
Again two kinds of evaluation could be thought of from another point of view and they are
(a) Brief Evaluation and
(b) Extensive Evaluation
Extensive Evaluation : Extensive Evaluation involves the analysis of a programme in all its main and sub-aspects. The evaluator has to rate and weigh each of them individually and consolidate the total rating, on the basis of which he makes his value judgement. This is more objective and valid.
The kinds of evaluation have been further classified into two categories, viz., Formative and Summative Evaluation.
(a) Formative Evaluation : Formative Evaluation is that process of evaluation which is done from time to time in the course of an instructional programme to assess the quality of the instructional programme, the techniques and methods, materials or media.
(b) Summative Evaluation : Summative Evaluation is that kind of evaluation which takes into consideration the periodic evaluations that have been done and, in addition, makes a total evaluation of the programme, process or product. The conclusions are arrived at keeping in view the outcomes of the periodic evaluations in addition to the final evaluation.

Testing or Measurement in the Context of Language Teaching/Learning:
Programme evaluation involves the evaluation of the teaching methods, media of instruction and language instructional material in language education, in addition to the learners' performance. Language tests are the measuring tools used to assess the learners' achievement and therefore they are applied to the learners and not to the materials, the methods or the teachers. They are designed to measure the learner's knowledge of the language being learnt, or his competence, both grammatical and communicative, in the target language at a particular time during the course of language instruction. Such knowledge may be compared with that of other learners or with a standard norm that may be fixed. Measurement is what the results of the tests show, which in itself does not have much meaning. The inferences or conclusions that can be drawn from the measurement are more crucial, and this is what is called Evaluation.
As already mentioned, measurement and evaluation are distinct from each other, but they are logically related. Measuring the knowledge of a learner is a means of evaluation not only in respect of the learner but also of the teacher, the teaching material and the medium of teaching. In other words, the test results are neutral, and it is the inferences to be drawn from them which fall under evaluation.
Scientifically, an experiment is the process of testing a hypothesis. In the context of language teaching, the term experiment may be defined as a means of studying the relationship between the teaching material, the teaching methods and the learners' achievement. In this context, hypotheses are framed such as: if method 'A' is used with a set of selected instructional materials, a particular group of learners will achieve a certain amount of mastery of the target language in a fixed period of time. In other words, this is nothing but the investigation of the effect on the learners of selected teaching material in a specific learning situation. In order to assess which material is more effective and which method is more advantageous, it is essential to evaluate the effectiveness of various methods and materials.
Aims and Objectives of Language Testing : The first and foremost aim of language testing is that of research. Whether in the context of mother tongue teaching or second/foreign language teaching, tests are essential to assess the quantum of learning that has taken place from time to time. Such testing is necessary for the evaluation of language teaching methods, materials or media, and it becomes essential in comparing the experimental and control conditions to test the research hypotheses.
Language tests are generally understood to be tools for measuring the learner's competence or knowledge of the language. Broadly speaking, language tests may be said to be of four major types:
1. Achievement
2. Proficiency
3. Aptitude
4. Diagnostic

This classification is based upon the use of the tests.

Various Types of Tests:

1. Achievement Tests : Achievement tests are aimed at the assessment of what has been learnt by the learner from the language instructional programme. In other words, they are aimed at finding out the quantum of language skills acquired by a learner during the course of instruction, that is, how much has been learnt of what has been taught (i.e., of the syllabus). The results of such tests are used not only for declaring the success or failure of a learner in the examination, but at times also in taking decisions about the learner's future. The test results are also used in making alterations or changes in the syllabus, the teaching method or the presentation of material. Thus the uses of achievement tests are manifold, the main use being the testing of the learner's achievement. It should be noted here that the only capability of the achievement test is to indicate how much of the syllabus has been learnt; it cannot make predictions about the learner's future performance unless the syllabus is designed for this purpose.
2. Proficiency Tests : Proficiency tests are used for assessing what has been learnt, whether from a known or an unknown syllabus. In other words, such tests are used to find out the knowledge that a learner already possesses. The difference between achievement and proficiency testing is that the former is tied to a particular syllabus, whereas the latter is not. TOEFL (Test of English as a Foreign Language), the English Proficiency Test Battery, the Cambridge Proficiency Examinations and the Michigan Test of English Proficiency may be cited as examples of Proficiency Tests.
3. Aptitude Tests : Aptitude tests are used in assessing one's capacity for language learning and language use. While a proficiency test assesses the adequacy of control in the target language, an aptitude test is said to assess the amount of linguistic skill required for the learning of languages. The distinction between Proficiency and Aptitude Tests is very subtle and unclear. An Aptitude Test may be differentiated from an Achievement Test in the sense that the former does not necessarily require the examinee to possess any knowledge of the language to be taught, whereas the latter does. In the context of language testing, a Proficiency Test may be said to take place at a certain point after the instruction has started and to have a relation to future non-language performance. An Aptitude Test is concerned with the inherent aptitude for language learning.
4. Diagnostic Tests : A Diagnostic Test differs from the other types of tests in the sense that it relates to the use of the information obtained and to the absence of a specific skill in the learner. Achievement, Proficiency and Aptitude Tests are concerned with both the use and the skills of language, whereas a Diagnostic Test is used by the teacher for the information it provides about the presence or absence of a part or the whole of one of the skills. In other words, a Diagnostic Test helps in discovering the learner's deficiencies in specified areas of language learning. Diagnostic Tests generally yield a profile, which is of greater interest than a single total score.
Some scholars have made distinctions between tests and examinations. It may not be out of place to see how examinations and tests differ from each other, if at all they do. Davies notes, "The notion of test conjures up vague ideas of psychology and of intelligence, whereas an examination suggests the end of term multi-subject ordeal"4. It may be noted here that the examination has the characteristics of tests and, in addition, has the feature of being able to influence the curriculum. Normally the terms test and examination are used interchangeably, just as the terms measurement and evaluation are. However, a broad distinction between 'Examination' and 'Test' could be made in the sense that the former is normally meant for assessing overall achievement and is not followed up with remedial efforts.
In simple terms it may be said that 'Examination' refers to the total area of language measurement and the term 'Test' refers to a specialised part within it; therefore, a test may be conceived of as a kind of examination. Before proceeding further with the discussion of language tests and their construction, it is necessary to consider some of the criteria that good language tests are supposed to satisfy.

A good test is said to satisfy the following three characteristics :
(a) The test should be simple.
(b) The syllabus should be teachable (the test should reflect the syllabus that it covers adequately and without distortion of relative weights).
(c) The effects should be beneficial.
(a) By simple, what is meant is that the test should be easy to administer, that the examiner is clear as to what is being tested, and that the test should have all the necessary characteristics of an objective test.
(b) By teachable, what is meant is that the syllabus that forms the basis for the test should be fairly detailed so that the instruction can be effective. It also implies that the instruction should be possible for an average language teacher.
(c) By beneficial, what is meant is that the test should provide the basis for the improvement of the teaching material, the teaching techniques or the syllabus, while it helps in placing the learners on a comparative scale.
In addition to the above features, good tests are required to be reliable and valid.
Reliability : Examiners expect their tests to measure as accurately as possible what they intend to measure. In language teaching, reliability can be achieved only by making the tests as objective as possible. By objectivity, it is meant that impartial speakers of the target language would agree upon the correct and incorrect responses of the language learners. This necessitates the construction of test items in such a way that there is one and only one acceptable correct response to each item.
As all the examinees cannot be said to possess the same intelligence, knowledge of the target language, etc., subject unreliability cannot be done away with. Therefore, the concern of the test makers should be to ensure the reliability of the instrument, leaving aside subject unreliability. Whether the language test is meant to assess the learner's communicative competence or his grammatical competence, we should be clear in our minds that testing the total knowledge of the language learner is impossible. Therefore, only a representative sample of the total knowledge that a learner possesses can be tested, for which a selection has to be made out of the syllabus of items that are considered to be fair and representative. Such selection, or the process of sampling, depends on the construct that the examiners have. This aspect will be discussed in the next section on validity.
Language tests may be unreliable for various reasons other than wrong or defective sampling, and these are concerned with the language data and the test instructions. For instance, a syntactic test item may contain vocabulary that the examinee does not know. Similarly, the instructions may be defective in the sense that they may not be clear, may be ambiguous or may contain technical words that the examinee is not aware of. In order to avoid these factors, which make the test unreliable, the instructions must be precise, clear and unambiguous. The range of vocabulary used in the test must be within the pupil's knowledge. Confirming this requires the testing of tests. This is done by trying them out on a sample of the same category as the learners belong to and thereafter eliminating the items that are found to be defective in the try-out. This is precisely what is called item analysis.
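The try-out described above can be sketched in a few lines of Python. The text does not say how a defective item is identified; the sketch assumes one common convention, in which each item is given a facility value (the proportion of examinees answering correctly) and a discrimination index (the difference between the top and bottom thirds of the sample). The data, the function name item_analysis and the cut-off of zero are hypothetical.

```python
def item_analysis(responses):
    """Return (item number, facility, discrimination) for each item.

    responses: one list per examinee, holding 1 (correct) or 0 (incorrect)
    for every item, all lists being of the same length.
    """
    n_items = len(responses[0])
    # Rank examinees by total score, best first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, len(ranked) // 3)
    top, bottom = ranked[:group_size], ranked[-group_size:]

    report = []
    for i in range(n_items):
        facility = sum(r[i] for r in responses) / len(responses)
        discrimination = (sum(r[i] for r in top)
                          - sum(r[i] for r in bottom)) / group_size
        report.append((i + 1, facility, discrimination))
    return report

# Hypothetical try-out: six examinees, four items.
tryout = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

report = item_analysis(tryout)
# Assumed rule: an item that does not separate strong from weak examinees
# (discrimination of zero or less) is a candidate for elimination.
flagged = [item for item, facility, discrimination in report
           if discrimination <= 0]
print(flagged)
```

In this invented try-out, item 1 is answered correctly by everyone and item 4 is answered correctly only by a weak examinee, so both are flagged for review.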

Validity : According to Pilliner5, the validity of any examination or test procedure may be broadly defined as the extent to which it does what it is intended to do. Four kinds of validity are talked about in the context of tests, viz.,
1. Content Validity
2. Predictive Validity
3. Concurrent Validity
4. Construct Validity
1. Content Validity : If the test contains questions that require the subject to perform all the activities that have been taught in the course of instruction, and there is a strong resemblance between the forms and exercises taught and the test items and procedures, it is said to possess "Content Validity". Establishing the content validity of a test does not require the comparison of results from other tests; it is a matter of expert judgement. It is therefore, to some extent, subjective and can become unreliable.
2. Predictive Validity : If the results of a test can be used to predict the success of the examinees in the performance of some other related task, it is said to possess "Predictive Validity".
3. Concurrent Validity : If the results of one test are confirmed by a parallel test of the same kind aimed at measuring the same thing, the test is said to possess what is called "Concurrent Validity".
4. Construct Validity : Needless to say, a language test, like any test or experiment, has a basis. In the context of language testing, it is the theory of language that forms the basis. When a selective sampling of language instruction is made for inclusion in the tests, the selected items are based upon various aspects, viz., the linguistic and the communicative. If the test represents the items with appropriate weightages on the different aspects of language instruction and the distribution of the items is found to be appropriate, then such a test is said to possess what is called "Construct Validity".
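For predictive and concurrent validity, the comparison of two sets of results is often summarised by a correlation coefficient. The text does not prescribe any particular statistic; the sketch below assumes the Pearson coefficient, and the scores and the function name pearson_r are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical data: scores on an entrance test and the same five
# learners' marks at the end of the course.
entrance_test = [40, 55, 60, 70, 85]
later_marks = [45, 50, 65, 68, 80]

r = pearson_r(entrance_test, later_marks)
print(f"Correlation between test and later marks: {r:.2f}")
```

A coefficient close to 1 would suggest that the test orders the learners in much the same way as the later measure does. For concurrent validity the second list would hold the results of a parallel test taken at about the same time.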

Cronbach6, in his "Essentials of Psychological Testing", has provided the following table, which illustrates these four kinds of validity :

Predictive
Question asked: Do test scores predict a certain important future performance?
Procedure: Give test and use it to predict the outcome. Some time later obtain a measure of the outcome. Compare prediction with outcome.
Principal use: Selection and classification.
Examples: Admission test for medical students compared with later marks.

Concurrent
Question asked: Do test scores permit an estimate of a certain present performance?
Procedure: Give test, obtain direct measure of the other performance. Compare the two.
Principal use: Tests intended as a substitute for a less convenient procedure.
Examples: Group mental test compared to individual test.

Content
Question asked: Does this test give a fair measure of performance on some important set of tasks?
Procedure: Compare the items logically to the content supposed to be measured.
Principal use: Achievement tests.
Examples: A test of shorthand ability is examined to [...]

Construct
Question asked: How can scores on test be explained psychologically?
Procedure: Set up hypotheses. Test them experimentally by any suitable procedure.
Principal use: Tests used for description or in scientific research.
Examples: A test of art aptitude is studied to determine how largely scores depend on art training, experience in Western culture, etc.

CRITERION-REFERENCED VERSUS NORM-REFERENCED TESTS

Tests, whether in language or in other disciplines, may be broadly classified into two categories, viz., Criterion-Referenced and Norm-Referenced Tests. In the context of the validity of language tests, we have already noted that a test can be considered valid if and only if it successfully measures what it is supposed to measure and nothing else. Normally, in second language testing, the usual aim of the tests can be described as "fluency", "command of the basic grammatical structures" and/or "ability to function in the target language in real life situations". While these attributes of second or foreign language testing sound concrete, more narrowly specified and operational definitions are still necessary. In other words, the pre-requisite for making a test is that the objectives are clearly defined in what may be called behavioural terms. Such objectives should, to the extent possible, be specified in terms of the learner's terminal behaviour. These objectives, stated in behavioural terms, form the basis or the criteria for the construction of a test. Thus the bases of the test are the criteria set forth in this way. Such tests are usually called Criterion-Referenced Tests.
Norm-Referenced Tests :
Norm-Referenced Tests are those which aim at placing the examinees, based on their performance in the tests, against a pre-defined degree of achievement or norm. The main consideration in such tests is the predetermined norm against which the examinees are placed at different points on the scale, with reference to the standard scores or percentiles set up. The key concept of such tests is the comparison of an individual with others of his reference group. Therefore, such tests are called Norm-Referenced Tests. Needless to say, a test can be both Criterion-Referenced and Norm-Referenced. From the point of view of the types of test items and the skills that are to be tested, it may be called a Criterion-Referenced Test, and from the point of view of placing the examinees on a scale with reference to a set norm that might determine passing or failure, it may be called a Norm-Referenced Test.
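The two ways of reading the same score can be contrasted in a short sketch. The reference group, the cut score and both function names are hypothetical, and the percentile rank is computed under one simple convention (the percentage of the group scoring strictly lower).

```python
def percentile_rank(score, group_scores):
    """Norm-referenced reading: percentage of the group scoring lower."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

def meets_criterion(score, cut_score):
    """Criterion-referenced reading: has the stated objective been met?"""
    return score >= cut_score

# Hypothetical reference group and cut score, for illustration only.
reference_group = [35, 42, 48, 50, 55, 61, 64, 70, 78, 90]
cut_score = 65
learner_score = 61

rank = percentile_rank(learner_score, reference_group)
passed = meets_criterion(learner_score, cut_score)
print(f"Percentile rank: {rank}, criterion met: {passed}")
```

With these invented figures the learner stands above half of the reference group and yet has not reached the criterion, which shows that the two interpretations answer different questions about the same performance.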