Before a test can be normed/standardized, it is important to conduct a pilot with a number of people from the target group (Mann & Haug, 2014, p. 133). For example, before using the previously mentioned NFA test (an interview adapted from the SLPI), which targets people learning NGT in a higher education institution, the test developers piloted their first version of the test with all the signed language teachers in the team. The goal of such a pilot is to see whether the test instructions, test items, procedure, rating procedure, etc. work as planned. For example, when raters on a productive test differ greatly in how they judge the signed language competences of an L2 learner, the rating scale/procedure needs to be revised. Or when students who have already graduated from a four-year BA programme perform worse on the same receptive test than first-year students, one needs to investigate the reason and revise the test.
After revising a test, test developers can start with a main study. During the main study the psychometric properties of the test, namely its validity and reliability, also need to be established (see section 1.2).
To continue with the NFA example, below you will find a schema of the procedures (Figure 4) that were followed to train the signed language teachers in working with the NFA. Because the NFA was adapted from the SLPI, the adaptation process is included here too.
Figure 4 Adaptation and training schema for NFA in ISL&D (from Boers-Visker et al., 2014)
Once the documents were translated from English into Dutch, the team was trained in two one-week sessions over the course of one year. In between the training sessions with the SLPI expert from the USA, the teachers practiced interviewing and scoring with each other monthly; teachers who were L2 learners of NGT served as candidates. In teams of two signed language teachers, all interviews were scored and then discussed with the whole group to establish and validate the scoring norms. For each interview the inter-rater reliability scores were calculated and discussed (via email) with the SLPI expert from NTID/RIT, USA. During the training sessions the documents were also fine-tuned to the grammatical features of NGT, terminology was unified across all documents, and the results were discussed again. This whole process took approximately two years. When the team was confident that all teachers had satisfactorily mastered the interviewing process, and that the scores were reliable, the NFA was offered to learners in a pilot study. This was done with parents learning NGT in an extra-curricular course. Currently these data are being analysed.
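The source does not state which inter-rater reliability statistic was used; for two raters assigning categorical proficiency levels, one common choice is Cohen's kappa. The sketch below computes it from scratch, using invented level judgements purely for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items both raters labelled the same.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical level judgements for ten recorded interviews.
a = ["B1", "B1", "A2", "B2", "B1", "A2", "A2", "B1", "B2", "B1"]
b = ["B1", "A2", "A2", "B2", "B1", "A2", "B1", "B1", "B2", "B1"]
print(round(cohens_kappa(a, b), 2))  # → 0.68
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance, which would signal that the scoring norms need further discussion.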
The goal of this assessment framework was to provide guidance and concrete steps for signed language teachers and test developers who are developing signed language assessments that can be aligned with the CEFR. Where possible we offered hands-on materials by providing concrete examples of signed language tests used within a CEFR-aligned curriculum. We encourage all colleagues to share their materials and experience with others so that we can move forward with CEFR-aligned curricula across Europe.
The first step in the testing cycle is to identify why a test needs to be developed. A test can be used to evaluate the knowledge, understanding, ability and/or skills of the learners (efsli, 2013, p. 12), or it may be needed to evaluate changes in the curriculum (Mann & Haug, 2014, p. 131). For example, students enrolled in a signed language interpreters’ programme are required by programme regulations to take a course and/or programme final exam in signed language before completing a course or graduating (Leeson, Wurm, & Vermeerbergen, 2011).
With a view to the feasibility of test development (see 1.2), it is necessary to identify possible constraints: how much time is available to develop the test, how many test takers will be evaluated, what is the available budget, and which technology is available? For instance, for Sign Language of the Netherlands (NGT) vocabulary assessment, test software called Provisto (see Figure 2) was developed at the Institute for Sign, Language & Deaf Studies at the University of Applied Sciences Utrecht (UUAS) in the Netherlands. The constraints were: the available technology and its usability, the time available to film NGT vocabulary within the deadlines, and the time available for pilot tests to determine validity and reliability (see 1.2).
Score reporting refers to how the test taker will learn about his or her test results (Mann & Haug, 2014). Students graduating from a BA interpreting programme will learn whether they passed or failed their final exam in signed language either through a fully Web-based signed language testing system (e.g., Haug et al., 2014) or through written notification sent by regular mail from their university.
Another important issue is the interpretation of test results, which is used to inform decisions related to placement in a signed language class or to the qualification for a job.
The content and the testing method determine the appropriate rating procedure (Haug, 2011). The appropriate rating procedure will differ depending on whether signed language production or reception is assessed. For example, for a receptive test that uses a multiple-choice format (see Example 1 for Signed Language Reception above), the rating can be automatic when the test is implemented as a Web-based signed language test, as was done, for example, for an L1 receptive test for DGS (Haug et al., 2014).
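Automatic rating of a multiple-choice receptive test amounts to comparing each response against an answer key. A minimal sketch, assuming a simple key mapping item IDs to correct options (the item IDs and answers are invented placeholders, not data from an actual test):

```python
# Hypothetical answer key for a receptive multiple-choice test:
# each item ID maps to the correct option letter.
ANSWER_KEY = {"item01": "B", "item02": "D", "item03": "A", "item04": "C"}

def score_responses(responses):
    """Return the number of correct answers; unanswered items count as wrong."""
    return sum(
        responses.get(item) == correct
        for item, correct in ANSWER_KEY.items()
    )

responses = {"item01": "B", "item02": "A", "item03": "A"}  # item04 unanswered
print(score_responses(responses))  # → 2
```

In a Web-based system this comparison runs server-side immediately after submission, which is why no human rater is needed for the receptive format.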
A rating procedure for signed language production is more difficult to achieve, for instance in an interview such as the NGT Functional Assessment (NFA), an adaptation of the Sign Language Proficiency Interview (SLPI), which was developed for ASL. In the interview, a tester engages in a conversation with a test taker. The conversation is video-recorded and analysed later by three raters according to pre-defined criteria on a scoring sheet, such as the Individual Rater Worksheet B (see here for other ASL SLPI rating sheets). The quality of any rating procedure will be evaluated during the pilot and main study (see also 2.8).
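The source does not specify how the three raters' judgements are reconciled into a final result; one simple, hypothetical decision rule is to accept a level only when a majority of raters agree, and otherwise to flag the interview for group discussion:

```python
from collections import Counter

def combine_ratings(ratings):
    """Hypothetical decision rule: assign a level only if a majority of the
    raters agree; otherwise flag the interview for discussion."""
    level, count = Counter(ratings).most_common(1)[0]
    return level if count >= 2 else "discuss"

print(combine_ratings(["B1", "B1", "A2"]))  # → B1
print(combine_ratings(["A2", "B1", "B2"]))  # → discuss
```

Interviews routed to "discuss" would then be resolved in a rater meeting, which is also an opportunity to refine the scoring criteria themselves.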
A schema (Figure 3) may clarify the rating procedure for the NFA:
Figure 3: Schematic presentation of NGT Functional Assessment procedures (van den Broek, Boers-Visker, van den Bogaerde, 2014)
Above we provided an example of an interaction test for NGT level A2 (see Example 3, in 2.4). Here we provide a template for an assessment sheet that was developed for this test. This assessment sheet was developed for and used in a workshop for teachers and other professionals (26 November 2013), organized by the ATERK team, which works in the Netherlands to align NGT teaching and assessment with the CEFR for signed languages.
Download Assessment Interaction test, level A2
In order to develop a test, it is necessary to provide procedures (or a “blueprint”) which describe what needs to be done, and in what order (Mann & Haug, 2014, p. 133). The different steps follow from the purpose, the design, and the content of the test as well as from the test method, and they form the framework for the test proper. Other issues are the test administrators’ familiarity with the test, the location, etc., as well as establishing procedures for test archives and establishing the validity and reliability of the test.
A concrete example is to create a Word document which describes every step and sub-step that needs to be taken to develop the test. For example, the following information can be included in the test specifications: (1) the people who will be involved in the process, (2) the development of test items, (3) the materials to be used depending on the target group, (4) a description of the test procedure, (5) environmental factors such as the test administrator, test site, and time of day, (6) psychometric properties, (7) the process of test development, (8) the pilot and main study (from Haug, 2012), and (9) the milestones that need to be achieved.
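The nine headings above could also be captured in a machine-readable form, so that the specification doubles as a checklist during development. A hypothetical sketch, in which every concrete value is an invented placeholder:

```python
# Hypothetical test specification mirroring the nine headings above;
# all concrete values are invented placeholders.
test_specification = {
    "people_involved": ["test developers", "signed language teachers", "raters"],
    "test_items": {"format": "multiple choice", "planned_count": 40},
    "materials": {"target_group": "adult L2 learners", "stimuli": "video clips"},
    "test_procedure": "individual administration, computer-based",
    "environment": {"administrator": "trained teacher",
                    "site": "classroom", "time_of_day": "morning"},
    "psychometric_properties": ["validity", "reliability"],
    "development_process": ["draft items", "expert review", "revision"],
    "studies": ["pilot study", "main study"],
    "milestones": ["items filmed", "pilot completed", "norms established"],
}

# Used as a checklist: which headings have not been filled in yet?
missing = [key for key, value in test_specification.items() if not value]
print(missing)  # → []
```

Whether the blueprint lives in a Word document or in structured data is a matter of taste; the structured form simply makes it easier to verify that no heading was forgotten.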
Following the definition of the purpose of language assessment, the content needs to be defined. Within the CEFR the content of assessment is “predetermined” within the domains of:
Below follow some examples of test items for the three domains, taken from CEFR-aligned curricula for different signed languages.
The purpose of the test determines the type and form of the test and is related to the validity of the test (see 1.2). The purpose should be clearly defined, and the testing method should be appropriate to the purpose. Is it an achievement test (see Test specification below, or see 1.1) or a proficiency test (efsli, 2013, pp. 12-13, or see 1.1), and does it concern formative or summative assessment?
PRO-Sign self-assessment example