Our BPS-accredited occupational Test User Training courses are professional qualifications designed for those who use psychometric assessments or cognitive ability tests as part of their role.
These qualifications are based on a syllabus of assessments developed by the British Psychological Society (BPS) and enable you to use, administer, interpret and provide feedback on psychometric assessments.
Course Contents
Learn how to choose appropriate tests
Understand technical properties of cognitive ability tests and personality questionnaires, including reliability and validity statistics, norm groups and design rationale
Learn how to assess personality, interpret results and deliver feedback sessions
Administer tests in both paper and pencil and online formats
Calculate individual and group test scores
Learn how to use tests legally, ethically and fairly and how to correctly interpret results
Apply learning through case studies and practice sessions
Bias results when test performance is affected by unintended factors and those factors are not evenly distributed between groups. This produces group differences in test performance that are not related to the constructs the test is intended to measure. For example, a test of numerical reasoning that uses a lot of text may be biased against people who have English as an additional language. In that case, group differences do not result from different levels of numerical reasoning ability, but from the language used in the questions making them more difficult for some test takers.
Test developers may address bias through some or all of the following:
· Providing a clear rationale for what the test is, and is not, intended to measure
· Reviewing content to ensure it is accessible and free from complex language
· Ensuring scoring is automated and objective (i.e. free from user bias)
· Providing evidence of any group difference in test scores
· Examining the effect of group membership on individual questions – sometimes referred to as ‘differential item functioning’ or ‘DIF’
· Ensuring norm groups used for comparisons are representative of the populations they reflect
· Providing guidance on using the reports and interpreting constructs measured
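As an illustration of what "evidence of any group difference in test scores" might look like, the sketch below computes a standardised mean difference between two groups. The choice of Cohen's d is an assumption made for this example; the text above does not name a specific statistic, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups of test scores.

    Assumption: Cohen's d with a pooled standard deviation is used here
    purely as an example of a group-difference statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for two groups of test takers.
d = cohens_d([4, 5, 6], [2, 3, 4])
```

A value near zero suggests little difference between the groups. A larger value does not by itself show bias, since it says nothing about whether the difference relates to the construct being measured.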
Reliability is an indicator of the consistency of a psychometric measure (Field, 2013). It is usually expressed as a reliability coefficient (r), a number ranging between 0 and 1, with r = 0 indicating no reliability and r = 1 indicating perfect reliability. A quick heads up: don’t expect to see a test with perfect reliability.
Reliability may refer to a test’s internal consistency, the equivalence of different versions of the test (parallel form reliability) or stability over time (test-retest reliability). Each measures a different aspect of consistency, so figures can be expected to vary across the different types of reliability.
The EFPA Test Review Criteria states that reliability estimates should be based on a minimum sample size of 100 and ideally 200 or more. Internal consistency and parallel form values should be 0.7 or greater to indicate adequate reliability, and test-retest values should be 0.6 or greater.
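To make the figures above concrete, here is a minimal sketch that estimates internal consistency and checks a coefficient against the thresholds quoted from the EFPA Test Review Criteria. Cronbach's alpha is assumed as the internal consistency statistic (the text does not name one), and the function names and toy data are hypothetical.

```python
from statistics import variance

# Minimum values quoted in the text above, keyed by type of reliability.
MIN_COEFFICIENT = {
    "internal_consistency": 0.7,
    "parallel_form": 0.7,
    "test_retest": 0.6,
}
MIN_SAMPLE_SIZE = 100  # the text gives 100 as the minimum, ideally 200 or more


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per test taker."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two test takers")
    # Sum of the variances of each item (column) across test takers.
    item_variance_sum = sum(variance(column) for column in zip(*rows))
    # Variance of each test taker's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def meets_guideline(kind, coefficient, sample_size):
    """True if both the coefficient and the sample size reach the quoted minimums."""
    return sample_size >= MIN_SAMPLE_SIZE and coefficient >= MIN_COEFFICIENT[kind]


# Toy data: five test takers, three items (far too few for a real estimate).
alpha = cronbach_alpha([[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 4], [3, 3, 2]])
```

Note that the same coefficient can pass for one type of reliability and fail for another: 0.65 based on 150 people would meet the quoted test-retest minimum but not the internal consistency one.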
Most test scores are interpreted by comparing them to a relevant reference or norm group. This puts the score into context, showing how the test taker performed or reported relative to others. Norm groups should be sufficiently large (the EFPA Test Review Criteria states a minimum of 200) and collected within the last 20 years. Norm groups may be quite general (e.g. ‘UK graduates’) or more occupationally specific (e.g. ‘applicants to ABC law firm’).
A key consideration is the representativeness of the norm group and how it matches a user’s target group of test takers. It is therefore important to consider the distribution of factors such as age, gender and race in norm groups to ensure they are representative of the populations they claim to reflect. This is particularly important with norms claiming to represent the ‘general population’ or other wide-ranging groups. Occupationally specific norms are unlikely to be fully representative of the wider population, but evidence of their composition should still be available.
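Comparing a score with a norm group can be sketched as follows. This is a simplified illustration, assuming raw norm-group scores are available as a list; in practice publishers usually supply norm tables, and the function names and data here are hypothetical.

```python
from statistics import mean, stdev


def z_score(raw_score, norm_scores):
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below the raw score.

    Tied scores count as half, which is one common convention.
    """
    below = sum(score < raw_score for score in norm_scores)
    tied = sum(score == raw_score for score in norm_scores)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Toy norm group; the text above states a minimum of 200 for real use.
norm_group = [10, 12, 14, 16, 18]
rank = percentile_rank(14, norm_group)
z = z_score(18, norm_group)
```

The same raw score can give quite different results against different norm groups, which is why the match between the norm group and the target group of test takers matters.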
Validity shows the extent to which a test measures what it claims to, and so the meaning that users can attach to test scores. There are many different types of validity, though in organisational settings the main ones are content, construct and criterion validity. Reference may also be made to other types of validity such as face validity, which concerns the extent to which a test looks job-relevant to respondents.
Content validity relates to the actual questions in the test or the task that test takers need to perform. The more closely the content matches the type of information or problems that a test taker will face in the workplace, the higher its content validity. For tests such as personality or motivation, content validity relates more to the relevance of the behaviours assessed by the test than to the actual questions asked.
Construct validity shows how the constructs measured by the test relate to other measures. This is often done by comparing one test against another. Where tests measure multiple scales, as is the case with assessments of personality and motivation, it is also common to look at how the measure's scales relate to each other.
Criterion validity looks at the extent to which scores on the test are statistically related to external criteria, such as job performance. Criterion validity may be described as 'concurrent' when test scores and criterion measures are taken at the same time, or 'predictive' when test scores are taken at one point in time and criterion measures are taken some time later.
Construct and criterion validity are often indicated by correlation coefficients which range from 0, indicating no association between the test and criterion measures, to 1, indicating a perfect association between the test and criterion measures. It is difficult to specify precisely what an acceptable level of validity is, as this will depend on many factors including what other measures the test is compared against or what criteria are used to evaluate its effectiveness. However, for criterion validity, tests showing associations with outcome measures of less than 0.2 are unlikely to provide useful information and ideally criterion validity coefficients should be 0.35 or higher. The samples used for criterion validity studies should also include at least 100 people.
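The guideline figures above can be sketched as a small check. The example assumes a Pearson correlation between test scores and a criterion measure such as performance ratings; the function names, wording of the labels and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def describe_criterion_validity(r, sample_size):
    """Label a coefficient using the figures quoted in the text above."""
    if sample_size < 100:
        return "sample below the suggested minimum of 100"
    if r < 0.2:
        return "unlikely to provide useful information"
    if r < 0.35:
        return "usable, but below the ideal of 0.35"
    return "meets the ideal of 0.35 or higher"


# Toy data: test scores and later performance ratings for five people.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

These labels are only a starting point. As the text notes, what counts as acceptable depends on the criteria used and the purpose of the test.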
Overall, whilst a publisher should provide validity evidence for their test, validity comes from using the right test for the right purpose. Therefore, users need to use available validity evidence to evaluate the relevance of the test for their specific purpose.