Computer-based testing: tests delivered via a stand-alone computer or the internet. This has been a growth area in testing over the last fifteen or so years, and it can be argued that the greatest benefits of computer delivery are still to come.


CBT offers better value for money, speed of response and technical innovations such as adaptive testing. There is still some concern that CBT allows testing to be delivered at a distance, reducing crucial human interaction. But computer-based testing is, arguably, the predominant means of delivery, so the role of test developers, publishers and users is to improve day-to-day practice.




When you measure anything – from someone’s height to the number of people you can squeeze into a Mini – you make an error. If you repeat the measurement you’ll get slightly different answers. As an example, try to work out how much tax you should be paying three or four times!


This phenomenon results from the accuracy of the measuring instrument, how you go about the measurement, and chance factors at the time the measurement is taken. Classical Test Theory applies this phenomenon to psychometric testing. You measure someone’s ability and get “Observed Scores”. Underlying these are “True Scores”.


A lot of the statistics used in psychometrics are designed to show you what the “true” score is likely to be and therefore how much you can depend on the scores you get. This is one of the strengths of psychometrics.
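The observed-score/true-score relationship can be sketched in a few lines of Python. The numbers below are illustrative, not from any real test: repeated observed scores scatter randomly around a fixed true score, and averaging many of them homes in on the true score.

```python
import random

random.seed(1)

# Classical Test Theory in miniature: Observed = True + Error.
# Illustrative values only, not from any real test.
true_score = 50
error_sd = 3  # assumed spread of random measurement error

# Simulate measuring the same person many times
observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]

# Individual measurements vary, but their mean approaches the true score
mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))
```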





What goes on inside the human mind: perception, memory, thinking, reasoning. Tests are specifically designed to make explicit what is often hidden inside someone and to indicate how these processes might affect behaviour. That’s why tests are often more complex than they at first seem.




This is the subject of much academic debate, down to how you should spell the word “competence” and what its different spellings mean. At the risk of adding to an already crowded field, we can define a competence as:


 “A skill, ability, pattern of behaviour or underlying characteristic which you can define and measure and which contributes to the successful completion of a task or job.”


More usefully:


  • You can define a job by the competences that are needed to do it well, drawing these competences from a job description or successful people in the role


  • You can then define the successful applicant for that role by these competences...


  • ...decide the best way to measure each one (by test, interview, experience, etc.) and...


  • match the two.


There are many off-the-shelf competence frameworks available; some companies develop their own.




VALIDITY tells us how far a test is measuring what it says it’s measuring. Concurrent validity is a way of finding this out. The score on a test (or some other sort of selection exercise such as an ASSESSMENT CENTRE or a scored interview) is correlated with an external criterion of performance. This could be a supervisor rating or scored appraisal system, a specific form or, in certain cases, actual performance data – for instance current sales figures.


So if people who tend to score high on a sales test also tend to achieve high sales figures (or good bonuses, or good ratings in their appraisal) it will be seen as valid.


Concurrent validity scores are reported in good test manuals and help you decide how well a test measures what it claims to measure.
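As a rough sketch of what that correlation looks like in practice, here is a minimal Pearson correlation in Python; the test scores and sales figures are invented purely for illustration:

```python
# Concurrent validity sketch: correlate test scores with a performance
# criterion (here, made-up sales figures). A strong positive correlation
# suggests the test measures something relevant to the job.
test_scores = [12, 15, 19, 22, 25, 28, 31, 35]
sales = [40, 48, 50, 61, 60, 72, 75, 83]  # illustrative figures only

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(test_scores, sales)
print(round(r, 2))  # close to 1: high scorers tend to sell more
```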




These relate to Classical Test Theory (see above).


Let’s say you get a percentile score of 5 on a test (we explain percentiles later on). We know there’s an amount of error in that score. Confidence intervals show how confident you can be about where the true score actually lies. It might say: “You can be 50% certain that the score lies between 4 and 6”. Confidence intervals are often shown as bands on results sheets.


They’re useful because they prevent you over-interpreting scores. One’s immediate reaction on seeing that A scored 40% and B scored 42% is that B did better than A. Confidence bands show how far you can rely on scores and point you to those differences which probably mean something significant.
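A confidence band can be sketched from a test’s standard deviation and reliability via the standard error of measurement (SEM). The figures below are assumed for illustration; a real test manual would supply its own:

```python
import math

# Confidence band sketch: SEM = sd * sqrt(1 - reliability).
# All values below are illustrative, not from a real test manual.
sd = 15           # assumed standard deviation of test scores
reliability = 0.9  # assumed reliability coefficient
sem = sd * math.sqrt(1 - reliability)

score = 100
z = 1.96          # multiplier for a 95% confidence band
lower, upper = score - z * sem, score + z * sem
print(round(lower, 1), round(upper, 1))
```

If two candidates’ bands overlap, the difference between their scores may not mean anything; only when the bands are clearly separate is the difference worth acting on.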




Most tests seek to measure an intangible construct (Good Faith, Common Sense or Extraversion). Construct validity shows whether the test is actually doing that, usually by correlating scores on the test with scores on other measures claiming to measure the same thing.





Actually it’s the “norms” (see below) that are criterion-referenced, not the test, but we’ll let that pass.


Examples of criterion-referenced tests are the driving test, school and university exams and professional qualifications. In effect, the test differentiates between people who can and cannot reach an external criterion: meeting the standard the government lays down for people who are allowed to drive, or knowing the sort of Maths the government and its advisers say a 14-year-old should know, for instance.


Criterion-referenced tests are related to real behaviour, which makes them more acceptable to some people. A driving test seems related to the actual practice of driving, whereas some people find it hard to see how, say, questions about your behaviour at parties in a norm-referenced personality test relate to work as a telesales operative. But, in the end, criterion-referenced tests are only as good as the criterion they relate to. What does a degree in English tell you about a person? Does the driving test really make you safe to go out on the road?




A term used to describe two sorts of validity: predictive and concurrent. At the risk of making you impatient, we go into those terms elsewhere.




The ability to regurgitate existing learning: what you know and how well you access it. It’s part of “g”, the general factor of intelligence. See also INTELLIGENCE and FLUID INTELLIGENCE.






One type of standard score on a test. Deciles divide scores into ten groups of equal frequency (in other words, 10% of the scores are included in each group) so that you can compare them.
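The split into equal-frequency groups can be sketched in a few lines of Python; the scores here are invented for illustration:

```python
# Decile sketch: split sorted scores into ten equal-frequency groups.
# The scores are invented for illustration.
scores = sorted(range(1, 101))  # 100 scores: 1, 2, ..., 100

n = len(scores)
deciles = [scores[i * n // 10:(i + 1) * n // 10] for i in range(10)]

# Each decile contains 10% of the scores
sizes = [len(d) for d in deciles]
print(sizes)
print(deciles[0])   # the bottom 10% of scores
print(deciles[-1])  # the top 10% of scores
```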




In work psychology terms, any process designed to help a person overcome their weaknesses, give them new knowledge or skills, prepare them for a new job or improve their personal interactions with others. This can take the form of training, coaching, private study and acquiring new qualifications.


Tests can be used to decide whether a person needs development; if so, what kind and in what form; whether the development is working; and what difference it’s made to their performance. Tests are used widely in recruitment and selection, but in the last few years they’ve become popular as a basis for development activities.




A generic name for a type of test of behaviour. It’s based on work done in the 1920s and measures people against two axes (Assertiveness/Passivity and Openness/Control). This is a very common type of assessment, though specific examples take slightly different approaches.




Another word that causes problems.


Tests discriminate. They show how people differ in ways that are relevant to a job. For instance, Person A may be good at working out numerical problems and be highly rule-driven, whereas Person B may be extremely creative and better with words. A would seem more suited if you’re recruiting for a financial job, B for a job in an advertising agency or promotions department.


But discrimination has come to mean acting unfairly to people because they belong to a group to which certain arbitrary, false qualities are ascribed.


There has been confusion between these two uses, which has resulted in some criticism of tests. In fact, one of testing’s strengths is that it replaces unacknowledged prejudice (in interviews, for instance) with objective measurement, which gives justification and an audit trail for crucial decisions made about people.




This is an increasingly important area of HR practice, driven partly by legislation on equal opportunities. You could define diversity as:


“the extent to which a workforce reflects the make-up of society”


It tends to be used to refer to specific areas such as gender, ethnicity and disability. Does a workforce reflect the ethnic diversity of the UK, or is deliberate or unacknowledged prejudice preventing one ethnic group being represented in proportion to its presence in the UK population?


But diversity doesn’t just relate to issues such as race, age and gender. Diversity of interests, personalities, values, skills and abilities is crucial to a successful organisation. Such diversity creates its own management challenges in helping very different people to value difference and work together. But a uniform organisation cannot react to change or do a variety of tasks equally well.


Tests and assessments are designed to look at the relevant differences between people and, if used well, help put together diverse organisations.