THE A-Z OF TESTING

 

 G

 

GUIDANCE

 

Some tests are used to inform a person’s own decisions – what sort of career to go for, for instance – rather than to make decisions about them – as in recruitment and selection. This is known as guidance.

 

Tests can be powerful bases for rich discussions about many different life changes and, as counselling and coaching become popular means of developing staff, particularly senior ones, tests are being used more widely than they used to be in this area. In particular, “interest inventories” (assessments which clarify a person’s real interests, as opposed to their ability in an area) are useful here.

 

H

 

HISTOGRAM

 

Simply a way of presenting a frequency distribution (see above) graphically, as a bar chart.
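As a rough illustration, the sketch below counts how often each score occurs (the frequency distribution) and then draws one text bar per score. The scores and the helper names `frequency_distribution` and `text_histogram` are invented for this example.

```python
from collections import Counter


def frequency_distribution(scores):
    """Count how many times each score occurs, sorted by score."""
    return dict(sorted(Counter(scores).items()))


def text_histogram(scores, symbol="#"):
    """Return one text bar per score; bar length equals the frequency."""
    freq = frequency_distribution(scores)
    return [f"{score:>3} | {symbol * count}" for score, count in freq.items()]


# Hypothetical test scores for ten candidates (illustrative data only).
example_scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 6]

if __name__ == "__main__":
    for line in text_histogram(example_scores):
        print(line)
```

The longest bar marks the most common score, which is the sort of thing a histogram lets you see at a glance.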

 

I

 

INFORMED CONSENT

 

Ensuring that the test taker understands and agrees to the terms and conditions under which the testing takes place: how the results will be used, why the testing is taking place and so on. This can be done by the administrator or via an introductory screen on a computer-based testing system.

 

Recent developments have stressed the need for a two-way commitment at the beginning of the testing session:

 

  • the tester gets the consent to the process by explaining what’s involved

 

  • the test taker commits to answering honestly. Making the test taker commit to this formally reduces the incidence of cheating or distortion

 

INSTRUMENT

 

Well, you’ll see this word used quite a lot in manuals and advertising copy about tests and assessments, but it’s difficult to say it has a technical definition. Let’s say it’s another term for a formal method of making a decision about something, and it’s usually a “thing” – a book, a test, a form, a piece of software – rather than a process.

 

 

INTEGRITY

 

Integrity tests are sometimes called “honesty tests”.

 

Assessing “integrity” is a huge growth area in the USA (as many as 5 million people being assessed for integrity each year). In Europe, employers tend to try to assess this via interviews, but this can be a minefield: unstructured interviews are extremely poor predictors of anything, and blunt questions about honesty are fairly easily answered by candidates to the satisfaction of the interviewer.

 

The area of testing integrity has been controversial. Does it impinge too much on private life? Isn’t it fairly obvious how you have to answer the questions in an integrity test? And, given that any measurement contains error, you’re going to suggest some people are dishonest when they aren’t – which is a fairly serious accusation.

 

Professor Adrian Furnham summarises the approach taken by integrity tests as follows:

 

  • Direct, explicit admissions of dishonest behaviour (lying, cheating, stealing, whistle-blowing)

 

  • Opinions/attitudes about the acceptability of dishonest behaviour (prevalence in society, justification of causes)

 

  • Traits, value systems and biographical factors thought to be associated with dishonesty

 

  • Reactions to hypothetical situations that do or do not feature dishonest behaviour.

 

The impetus for the growth in integrity testing has been research into the huge cost of white collar crime, high profile cases like Enron, and the research of psychologists into management malpractice and its causes.

 

Whatever the criticisms, integrity testing seems to predict later job success surprisingly well. The reason for this is that integrity tests often actually measure one of the Big Five factors – conscientiousness – a particularly robust predictor of work performance.

 

INTELLIGENCE

 

Intelligence is important in work performance so here is a very short introduction to some issues.

 

Mike and Pam Smith in their book TESTING PEOPLE AT WORK give a particularly useful definition of intelligence:

 

“the relative speed and accuracy with which the brain processes complex information.”

 

As is mentioned elsewhere in our A-Z, people will have different abilities with different sorts of information: words, numbers, diagrams etc. We also define two different sorts of intelligence: crystallised (what you’ve learnt and your access to that knowledge) and fluid (your ability to make sense of a situation).

 

The concept of general intelligence (which you’ll often find referred to as ‘g’) is that if you’re good at one type of mental problem you’ll TEND to be good at all of them. As we’ve said, individuals will have different strengths and weaknesses (good with verbal information, less good with abstract reasoning) but this tendency (good at one, tendency to be good at all) has been shown repeatedly. In other words there is such a thing as “general intelligence” or “smarts” or being “quick on the uptake” and it can be measured. Many of you will know your own Intelligence Quotient or IQ, which measures general intelligence.

 

For some time IQ fell into disrepute as some people doubted that general intelligence existed or saw IQ tests as being culturally biased. At this stage intelligence was often defined as “the ability to perform well on intelligence tests”, which is not that useful to a hard-pressed manager! Interest in multiple intelligences (see Emotional intelligence, above) was a reaction to this.

 

But research has increasingly established that general intelligence exists, that it’s correlated with job success and that it’s related to the physical functions of the brain.

 

All this said, classic tests which produce an IQ tend to be used more in educational and clinical testing. Business testers tend to use tests of specific ability (verbal, numerical and so on) because it’s easier to see how an individual’s strengths and weaknesses in different abilities relate to the content of a particular job, whereas knowing someone has an IQ of x provides a label, but a less obvious basis for actions and decisions.

 

In addition there’s still a lot of misunderstanding of intelligence and organisations are worried about being challenged over the use of classic IQ tests, either because of bias or because IQ items are not obviously related to specific jobs.

 

Fundamentally though, anyone recruiting, managing or developing people should be aware of the importance of intelligence and how intelligence affects performance. Just to give two examples:

 

  • there’s quite a bit of evidence that a team or meeting leader should NOT be the academically smartest person in the room. That will stifle the diverse team work that leads to excellence.
  • in many jobs (though not specialist ones) academic intelligence looks like a hurdle. You need a certain level but, past that, greater intelligence doesn’t help and may get in the way.

 

INTERESTS

 

Attitudes towards things you do, whether work, hobbies, duties or day-to-day activities. Interest Inventories are types of assessment which help people understand their real interests and are useful in career guidance.

 

INTER-RATER RELIABILITY

 

If two different people assess the same thing, do they come up with the same score?

 

Let’s use sports as an example. For certain events you need very little human judgement: if someone jumps higher than everyone else they win the high jump. The only judgement involved is when the referees measure how high the bar is. They might misread the instrument used for measuring the height.

 

In other sports, ‘judgement’ is much more important: ice dancing, synchronised swimming and acrobatics are examples. Here, judges decide on scores and you sometimes see different judges giving the same performance very different scores. There’s controversy when one performer is given much higher scores than another but other experts (and sometimes media without much expertise) claim the scores are unfair.

 

So, when developing a test, researchers measure whether different people in different situations, measuring the same person, will record either the same score, or scores whose difference simply reflects the inaccuracies present in every act of measurement (see CLASSIC TEST THEORY).
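To make this concrete, here is a minimal sketch of two ways agreement between a pair of raters is often summarised: simple percentage agreement, and Cohen's kappa, which discounts the agreement you'd expect by chance alone. Kappa is our choice for the illustration rather than the only option, and the ratings and function names are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases where both raters gave exactly the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1 = weak, 2 = adequate, 3 = strong) for 8 candidates.
rater_1 = [3, 2, 2, 1, 3, 2, 1, 3]
rater_2 = [3, 2, 1, 1, 3, 2, 2, 3]

if __name__ == "__main__":
    print(percent_agreement(rater_1, rater_2))
    print(cohens_kappa(rater_1, rater_2))
```

With these made-up ratings the two raters agree on six of the eight candidates, but kappa comes out lower than the raw agreement because some of those matches would have happened by chance anyway.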

 

Inter-Rater Reliability is also important in techniques where there’s a subjective element involved: for instance where you’re running an assessment centre and someone has to rate someone’s performance on a group exercise. Usually a standard rating scheme is created and the “raters” observing the exercise are trained carefully so they score the same things in the same way.

 

To really see the importance of Inter-Rater Reliability watch programmes like The X Factor. Initially you have 3 judges with very private agendas (and no objective scoring system) having spats over an individual singer. There’s no inter-rater reliability here, let alone when the public votes on who stays in the programme and who gets thrown off it.

 

IN-TRAY EXERCISE

 

As the name implies, an exercise which tries to mirror real office work by providing an in-basket of memos, letters and other documents which a candidate has to prioritise and deal with. An In-Tray exercise is often a component of an ASSESSMENT CENTRE (see our entry in the A-Z).

 

Originally In-Tray Exercises were carried out physically: a candidate was given a tray of real documents. This made it difficult to mark their performance and to tailor the documents to a specific organisation.

 

As work has moved on-screen so have these sorts of exercises, which are now easier to customise and score objectively. On-screen exercises now mirror the real flow of office life with e-mails, messages, requests for meetings and urgent decisions appearing throughout the process.

 

IPSATIVE

 

Ipsative tests measure something within a person rather than comparing that person with other people (a test comparing the person with other people is called a NORM-REFERENCED TEST). An ipsative test compares a person’s score on one scale or factor with their own score on another factor.

 

For instance, an ipsative test might measure which of the following areas a person was most interested in, which was their next strongest interest down to their weakest interest:

 

SPORT

ACADEMIC WORK

CHARITY WORK

IDLENESS

BUSINESS

 

The results would show the relative strength of that person’s interest in these areas. A norm-referenced test, by contrast, might show that a person was more interested in sport than 70% of the adult population in the UK but less interested in business than 90% of the population in the UK.
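The difference between the two views can be sketched in a few lines. The raw scores and the norm-group data below are entirely made up, and the function names are ours; the point is only that the ipsative view ranks a person's scores against each other, while the norm-referenced view compares one score with other people's.

```python
def ipsative_ranking(scores):
    """Rank one person's interests against each other, strongest first."""
    return sorted(scores, key=scores.get, reverse=True)


def percent_scoring_below(score, norm_group):
    """Norm-referenced view: percentage of the norm group scoring lower."""
    below = sum(1 for other in norm_group if other < score)
    return 100 * below / len(norm_group)


# Hypothetical raw interest scores for one person (higher = more interested).
person = {"SPORT": 18, "ACADEMIC WORK": 12, "CHARITY WORK": 9,
          "IDLENESS": 4, "BUSINESS": 7}

# Hypothetical norm-group scores for two of the scales (invented data).
norm_sport = [5, 8, 10, 11, 13, 15, 16, 19, 20, 22]
norm_business = [8, 9, 10, 12, 14, 15, 17, 18, 6, 20]

if __name__ == "__main__":
    print(ipsative_ranking(person))
    print(percent_scoring_below(person["SPORT"], norm_sport))
    print(percent_scoring_below(person["BUSINESS"], norm_business))
```

Note that the ipsative ranking says nothing about how this person compares with anyone else: business comes fourth out of five for them whether or not most other people are keener on it.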

 

Norm-referenced and ipsative measures have different strengths and weaknesses and can be used for different purposes. There is a huge literature on the technical merits of each approach. In recent years more companies have seen the benefits of ipsativity and often use norm-referenced and ipsative measures next to each other to get more rounded views of an individual. Ipsative measures:

 

  • are more difficult to cheat on
  • mirror, in many ways, an individual’s actual experience of doing real work. It may be that you’re better than 50% of other UK accountants at numerical reasoning, but if you prefer to be creative rather than follow rules, that might suggest how you’re going to use that ability

 

There are a number of technical implications of this approach to testing which we’re more than happy to talk through with users and prospective users.

 

IQ

 

See INTELLIGENCE

 

ITC

 

The International Test Commission. An international body which, among other things, lays down guidelines for testing which seek to bring together international views in an increasingly multinational activity.

 

ITEM

 

A question or a statement in a test that a candidate has to (respectively) answer or react to.

 

ITEM BANK

 

See ITEM RESPONSE THEORY

 

ITEM RESPONSE THEORY

 

A real growth area in test theory and applications at the moment as this theory underlies a lot of work on ability item banks and adaptive testing systems.

 

Item Response Theory is an area of academic work and theory which deals with the individual characteristics of an ITEM (see above) and how likely it is that a person will make a correct response to an item – in other words, how difficult it is.
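As a hedged illustration, one of the simplest Item Response Theory models is the Rasch (one-parameter logistic) model, in which the chance of a correct answer depends only on the gap between the person's ability and the item's difficulty. This entry doesn't commit to any particular model, so treat the choice, the function name and the numbers below as assumptions made for the example.

```python
import math


def probability_correct(ability, difficulty):
    """Rasch (one-parameter logistic) chance of a correct response.

    Both values sit on the same scale; when ability equals difficulty
    the person has an even chance of answering correctly.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical person of ability 1.0 facing an easy and a hard item.
    print(probability_correct(1.0, 0.0))  # easier item, higher chance
    print(probability_correct(1.0, 2.0))  # harder item, lower chance
```

The same person is more likely to get the easier item right, which is what "difficulty" means in this framework.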

 

Once you know this about items you can form a huge database of them (an ITEM BANK) and use algorithms to create tests which are made up of different items but which are equivalent in difficulty. This is central to ADAPTIVE testing and CBT.

 

To give an example: let’s say you want to assess a pool of candidates who live a long way away on a verbal reasoning test. You’re going to assess them over the internet but you want to reduce the possibility of them swapping answers. Using a test generated from an item bank, each candidate would be presented with a test of equivalent difficulty (so you could compare their scores), but the individual items would be different and have different answers.
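The idea can be sketched very simply, assuming each item in the bank has already been assigned to a difficulty band. Real systems work from calibrated item statistics and more sophisticated assembly rules; the bank contents, the band labels and the function names here are all hypothetical.

```python
import random

# Hypothetical item bank: item id -> difficulty band (1 = easy ... 3 = hard).
ITEM_BANK = {
    "V01": 1, "V02": 1, "V03": 1, "V04": 1,
    "V05": 2, "V06": 2, "V07": 2, "V08": 2,
    "V09": 3, "V10": 3, "V11": 3, "V12": 3,
}


def build_test(bank, items_per_band, rng):
    """Draw the same number of items from every difficulty band."""
    bands = {}
    for item_id, band in bank.items():
        bands.setdefault(band, []).append(item_id)
    test = []
    for band in sorted(bands):
        test.extend(rng.sample(bands[band], items_per_band))
    return test


def difficulty_profile(bank, test):
    """Sorted list of difficulty bands, used to compare two tests."""
    return sorted(bank[item_id] for item_id in test)


rng = random.Random(2024)  # seeded only so the example is repeatable
test_for_candidate_a = build_test(ITEM_BANK, 2, rng)
test_for_candidate_b = build_test(ITEM_BANK, 2, rng)
```

Each candidate's test has the same difficulty profile (two easy, two medium, two hard items), so scores can be compared, while the random draw means the actual items will usually differ from one candidate to the next. With a bank this small some overlap is likely; a larger bank reduces it.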