“Do you conduct inter-rater reliability testing?” is one of the questions in the 29 Quality Assurance Mistakes to Avoid e-book and self-assessment. The e-book contains diagnostic questions that uncover common errors in contact center Quality Assurance programs. Benchmarking is not always helpful if you want to break away from the pack and deliver an exceptional customer experience. In fact, benchmarking tends to hold you to the average and can keep you from realizing your true potential. This article pinpoints one of the biggest benchmarking blunders of all time in contact centers. The e-book was developed to help you move past average and improve your QA Calibration practices.
Why is QA Calibration, instead of Inter-Rater Reliability testing, a problem?
Have you heard the expression about the man with only one clock? He always knows what time it is, but the man with more than one clock is never quite sure. This is exactly how call center managers should feel when more than one person grades calls for internal quality assurance purposes. The industry’s usual answer to the drive toward consistency is one of the common not-even-close-to-best practices that have permeated contact centers: QA Calibration.
As contact centers became aware of the challenges that arise when there is more than one quality monitoring person (rater), team leader, or peer reviewer, they migrated toward the formal process of QA Calibration. You would think this makes sense, wouldn’t you? You want consistency across the raters, so you build a calibration process into QA. Depending on the number of raters, monthly or weekly sessions are held to compare the scores given to four or five calls. After listening to and scoring the calls, the raters hold discussions to fine-tune their perception of each call against the scoring rubric. How much time does your center spend on QA Calibration?
Let’s be clear: you must have a reliable method of calibration. Without one, you risk lowering the morale of your agents through the very efforts meant to improve it; you may be exposed to lawsuits for wrongful termination or discriminatory promotion and reward practices; and managing quality becomes an exercise in spinning your wheels. Ultimately, there will be little or no return on all of the money, time, and management attention spent on quality monitoring.
Getting back to the man with more than one clock, let’s apply that old adage to your QA Calibration session. You have a room full of performance reviewers, each with a clock. The QA Calibration exercise is to determine what time it is by using the times on each clock. With the different data points (times) in hand, the discussion focuses on how best to combine the information to determine what time each clock should display. Similarly, each person has a score for a call, and the group uses those scores to determine the correct score for the call.
To set a clock properly, wouldn’t it be more accurate if one clock showed the official time from the National Institute of Standards and Technology (time.gov), and the others were matched to it?
Let’s be even clearer: calibration is a term used all too loosely in most discussions. Properly understood, it is an umbrella term that must encompass two main components: validity and reliability. Traditional QA Calibration practices in contact centers do not test for validity; they merely test for reliability. Because of this, you create a risk to your company standards: your raters may become more reliable in their scoring while moving farther away (as a group) from the valid, optimal quality target.
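The difference between reliability and validity can be made concrete with a small sketch. The scores below are invented for illustration, and the two measures used (spread across raters as a stand-in for reliability, average gap from a reference score as a stand-in for validity) are simplifying assumptions, not a method taken from the e-book.

```python
from statistics import mean, pstdev

def reliability_and_validity(rater_scores, standard_score):
    """Return (spread, bias) for a single call.

    spread: population standard deviation across raters
            (lower means the raters agree with each other).
    bias:   mean rater score minus the reference score
            (closer to zero means the group is on target).
    """
    spread = pstdev(rater_scores)
    bias = mean(rater_scores) - standard_score
    return spread, bias

# Hypothetical 0-100 scores for the same call from four raters,
# before and after a round of discussion-only calibration sessions.
before = [70, 85, 92, 78]
after = [74, 75, 76, 75]
standard = 88  # hypothetical reference score for this call

spread_before, bias_before = reliability_and_validity(before, standard)
spread_after, bias_after = reliability_and_validity(after, standard)

print(f"before: spread={spread_before:.2f}, bias={bias_before:+.2f}")
print(f"after:  spread={spread_after:.2f}, bias={bias_after:+.2f}")
```

In this made-up case the raters end up agreeing with one another far more closely, yet the group average has drifted further from the reference score. That is the pattern the paragraph above warns about.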
Science has provided an answer for us. It’s the method taught by Customer Relationship Metrics and by thousands of educational institutions around the world. It is designed to ensure that the calibration methods you implement are both valid and reliable.
While other methods of reliability testing may be applied to the practice of call monitoring, only one really fits the bill: a particular reliability test known as Inter-Rater Reliability (IRR). It helps you attain a high degree of fairness to your agents on evaluated calls. Agents deserve to know, with a high level of confidence, that observed calls will be scored consistently, no matter which member of the QA team scores them. They also need to know that their calls are scored accurately. Coaches and other managers need to know that quality monitoring scores give reliable insight into the performance of the call center and into the performance of agents on any individual call.
QA Calibration sessions that determine Inter-Rater Reliability are more than mere conversations and round table discussions. They deliver a rigorous approach to quantitative measurement. The first question to answer when using Inter-Rater Reliability is: “How consistent are we?” But, as we have seen, consistency alone is not enough. The second question to answer is: “Are we where we should be?”
In other words, one goal of Inter-Rater Reliability is to ensure that each member of the Quality Assurance staff grades calls consistently with his or her peers. However, a high degree of consistency among the members of the Quality Assurance staff does not necessarily ensure that calls are being scored correctly in view of organizational goals and objectives. Ensuring correctness requires that a member of the management team take part in each Inter-Rater Reliability study, acting as The Standard and defining the correct scoring for each call.
The IRR study analyzes the grading of each member of the Quality Assurance team, statistically compared to the grading completed by The Standard. The purpose of The Standard is to define the “correct way” of scoring a call. This quantitative analysis studies how the variables move in relation to one another over time, and it is supplemented with a view of defects (scoring differences between members of the QA team and The Standard).
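As a rough illustration of comparing each rater to The Standard, the sketch below uses Cohen’s kappa, one widely used agreement statistic for categorical ratings. The article does not say which statistic an IRR study uses, so the choice of kappa, the pass/fail item format, and all of the data here are assumptions made for the example.

```python
from collections import Counter

def cohens_kappa(rater, standard):
    """Cohen's kappa between one rater and The Standard.

    Both arguments are equal-length lists of categorical item scores.
    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(rater) != len(standard) or not rater:
        raise ValueError("score lists must be non-empty and the same length")
    n = len(rater)
    observed = sum(r == s for r, s in zip(rater, standard)) / n
    rater_counts = Counter(rater)
    standard_counts = Counter(standard)
    # Agreement expected by chance, from each party's category frequencies.
    expected = sum(rater_counts[c] * standard_counts[c]
                   for c in rater_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical category throughout
    return (observed - expected) / (1 - expected)

def defects(rater, standard):
    """Indexes of the form items where the rater differs from The Standard."""
    return [i for i, (r, s) in enumerate(zip(rater, standard)) if r != s]

# Hypothetical pass/fail scores on a ten-item QA form for one call.
standard = ["pass", "pass", "fail", "pass", "fail",
            "pass", "fail", "pass", "pass", "fail"]
rater_a = list(standard)  # scores every item as The Standard did
rater_b = ["pass", "pass", "pass", "pass", "fail",
           "pass", "fail", "fail", "pass", "fail"]

kappa_a = cohens_kappa(rater_a, standard)
kappa_b = cohens_kappa(rater_b, standard)
defects_b = defects(rater_b, standard)

print(f"rater A kappa: {kappa_a:.2f}")
print(f"rater B kappa: {kappa_b:.2f}, defect items: {defects_b}")
```

The defect list points to the specific form items where a rater and The Standard disagree, which gives coaching a concrete starting point beyond the summary statistic.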
If you want consistent and fair performance assessment, relying on the traditional calibration model alone is inherently risky. IRR studies are repeated regularly to provide a quantitative check on the process, like setting the clock by the official world time. Continuing to set your clocks by looking at other clocks that have errors will keep you in the category of never really knowing what time it is. Traditional QA Calibration alone ensures that you never really know how your agents are performing.