Designing realistic SLAs from your historical ticket data

Most teams pick SLA numbers from a vendor template or a manager's gut feeling. Then they miss them, the breach count goes up, and after a quarter of red dashboards everyone learns to ignore the alerts. The problem is rarely the team's effort — it's that the targets were never anchored in reality. The fix is to start by reading what your queue actually does, not what you wish it did.

Pull resolution times from existing tickets

Before configuring a single SLA in Setup > SLAs, open the ticket search. Filter by status Solved or Closed, narrow to the last 6–12 months, and add columns for Priority, Date of opening, and Resolution date. Export to CSV.

You want one computed column with the time from opening to resolution, in minutes or hours, which you can then group by priority. In a spreadsheet, that's one formula. What you'll see is rarely a clean curve: most queues have a long tail of tickets that sat for weeks because someone forgot them. Those outliers will mislead you if you average them in.
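If you'd rather script this step than build it in a spreadsheet, the Python sketch below does the same calculation. It assumes the CSV headers are exactly Priority, Date of opening, and Resolution date, and that timestamps look like 2024-03-01 09:15. Both are assumptions, so check them against your own export, where the date format may follow your locale.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format; adjust to match your export.
DATE_FORMAT = "%Y-%m-%d %H:%M"

def resolution_hours_by_priority(csv_file, delimiter=","):
    """Return {priority: [hours from opening to resolution, ...]}.

    Hours are wall-clock time; nights, weekends and holidays are not removed.
    """
    hours = defaultdict(list)
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        opened = row["Date of opening"].strip()
        resolved = row["Resolution date"].strip()
        if not opened or not resolved:
            continue  # skip rows with a missing timestamp
        delta = (datetime.strptime(resolved, DATE_FORMAT)
                 - datetime.strptime(opened, DATE_FORMAT))
        hours[row["Priority"]].append(delta.total_seconds() / 3600)
    return dict(hours)

# Tiny inline sample standing in for the exported CSV.
sample = io.StringIO(
    "Priority,Date of opening,Resolution date\n"
    "P3,2024-03-01 09:00,2024-03-01 17:00\n"
    "P3,2024-03-04 10:00,2024-03-25 10:00\n"
    "P1,2024-03-02 08:00,2024-03-02 10:30\n"
)
print(resolution_hours_by_priority(sample))
```

For a real export, pass open("tickets.csv", newline="", encoding="utf-8") instead of the inline sample (the file name is a placeholder), and set delimiter=";" if your export separates fields with semicolons.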

Median and 90th percentile, not average

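Averages on ticket data are dominated by the worst cases. Take ten P3 tickets: nine close in 8 hours, and one sits unattended for three weeks. The average comes out at more than two days, even though the typical ticket takes 8 hours. An SLA set at two days would give that typical ticket six times the time it actually needs, so most of the queue could sit untouched for more than a day and still count as compliant.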
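The arithmetic is easy to check with Python's standard statistics module. The ten durations below are illustrative, not real data.

```python
from statistics import mean, median

# Nine P3 tickets closed in 8 hours, one left for three weeks (504 hours).
p3_hours = [8] * 9 + [504]

print(f"mean:   {mean(p3_hours):.1f} h")    # pulled up by the single outlier
print(f"median: {median(p3_hours):.1f} h")  # the typical ticket
```

The mean lands at 57.6 hours while the median stays at 8 hours: one forgotten ticket moves the average by more than a factor of seven.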

Look at two numbers per priority:

Median (P50) — half your tickets resolved faster than this. This is your team's natural speed.
90th percentile (P90) — only 10% of tickets took longer. This is roughly the bar you can hit consistently without changing how you work.
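Computing both numbers per priority takes a few lines. This sketch uses statistics.quantiles with method="inclusive", which interpolates linearly between data points. Spreadsheet percentile functions and other tools may use slightly different rules, so expect small differences on small samples. The input dictionary is hypothetical sample data in hours.

```python
from statistics import quantiles

def p50_p90(hours):
    """Return (P50, P90) for a list of resolution times in hours."""
    if len(hours) < 2:
        raise ValueError("need at least two tickets per priority")
    # n=100 yields 99 cut points; index 49 is P50 and index 89 is P90.
    cuts = quantiles(hours, n=100, method="inclusive")
    return cuts[49], cuts[89]

# Hypothetical resolution times (hours) grouped by priority.
by_priority = {
    "P2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "P3": [2, 3, 4, 4, 5, 6, 8, 12, 18, 30],
}

for priority, hours in by_priority.items():
    p50, p90 = p50_p90(hours)
    print(f"{priority}: P50 = {p50:.1f} h, P90 = {p90:.1f} h")
```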

If P50 for a priority is 4 hours and P90 is 18 hours, an SLA target of 24 hours says nothing useful about your service: at least 90% of tickets already meet it without anyone changing a thing. A target of 12 hours forces the queue to clean up its outliers without breaking the work that already runs well.
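One way to see why a loose target tells you little is to measure what share of past tickets would already have met it. A minimal sketch; the hours list is illustrative data for a single priority, not taken from a real queue.

```python
def attainment(hours, target_hours):
    """Fraction of tickets resolved within target_hours."""
    if not hours:
        raise ValueError("no tickets to evaluate")
    met = sum(1 for h in hours if h <= target_hours)
    return met / len(hours)

# Illustrative resolution times in hours for one priority.
hours = [1, 2, 3, 4, 4, 4, 6, 10, 18, 40]

for target in (24, 12):
    print(f"target {target} h: {attainment(hours, target):.0%} of past tickets met it")
```

With this sample the 24-hour target is met by 90% of past tickets and the 12-hour target by 80%, so only the tighter one surfaces tickets worth looking at.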

Set targets slightly tighter than current P90

If you set the SLA equal to current P90, you'll hit it 90% of the time and learn nothing. The point of an SLA is to be a forcing function — slightly uncomfortable, not impossible. A reasonable starting position: SLA target = P90 minus 20–30%. Tight enough that you have to look at outliers, loose enough that you're not paging the team at 2 AM over normal variance.
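The starting rule is simple arithmetic; here it is as a small helper, with the 20% and 30% bounds taken from the rule above as default parameters.

```python
def sla_target_range(p90_hours, min_cut=0.20, max_cut=0.30):
    """Return (tightest, loosest) starting targets: P90 minus max_cut and minus min_cut."""
    if not 0 <= min_cut <= max_cut < 1:
        raise ValueError("cuts must satisfy 0 <= min_cut <= max_cut < 1")
    return p90_hours * (1 - max_cut), p90_hours * (1 - min_cut)

tight, loose = sla_target_range(18)
print(f"start between {tight:.1f} h and {loose:.1f} h")
```

For a P90 of 18 hours this prints a starting range of 12.6 to 14.4 hours.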

Different priorities should differ by more than the labels. A P1 target of 4 hours and a P2 target of 6 hours are not really two tiers. If the numbers don't drive different behavior, you have one tier with two names.
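To make that check mechanical, one option is to require a minimum ratio between adjacent tiers. The factor of 2 below is an arbitrary assumption for illustration, not a standard; pick whatever gap actually changes how your team schedules work.

```python
# Hypothetical rule of thumb: each lower tier should allow at least
# MIN_RATIO times the time of the tier above it. The factor is an assumption.
MIN_RATIO = 2.0

def weak_tiers(targets, min_ratio=MIN_RATIO):
    """Return (higher, lower) priority pairs whose targets are too close.

    targets: list of (priority, hours) ordered from most to least urgent.
    """
    flagged = []
    for (hi_name, hi_hours), (lo_name, lo_hours) in zip(targets, targets[1:]):
        if lo_hours < hi_hours * min_ratio:
            flagged.append((hi_name, lo_name))
    return flagged

print(weak_tiers([("P1", 4), ("P2", 6), ("P3", 24)]))
```

Here only the P1/P2 pair is flagged: 6 hours is less than twice 4 hours, while 24 hours is well over twice 6.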

When the historical data lies

The numbers reflect what your team did under past conditions. Three things invalidate them:

Headcount changed. Two technicians left last quarter — past resolution times overstate current capacity.
Workflow changed. You added an approval step, or rolled out self-service that captured the easy 30%. The remaining tickets are slower.
Scope changed. A new entity, a new department, or a new product line shows up in the queue. The old curve doesn't include them.

If any of these apply, cut your dataset down to the last three months and recompute. Stale data is worse than no data: it looks just as authoritative as current data, so it gives you confidence it hasn't earned.
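A sketch of that narrowing step, assuming each ticket is available as an (opening datetime, resolution hours) pair. The 90-day window is an approximation of three months, and the sample tickets are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import quantiles

def recent_p90(tickets, now, days=90):
    """P90 of resolution hours for tickets opened within the last `days` days.

    tickets: iterable of (opened_at: datetime, hours: float) pairs.
    """
    cutoff = now - timedelta(days=days)
    recent = [hours for opened_at, hours in tickets if opened_at >= cutoff]
    if len(recent) < 2:
        raise ValueError("too few recent tickets to compute a percentile")
    # n=10 yields 9 cut points; index 8 is the 90th percentile.
    return quantiles(recent, n=10, method="inclusive")[8]

now = datetime(2024, 7, 1)
tickets = [
    (datetime(2024, 1, 15), 30.0),  # older than 90 days: excluded
    (datetime(2024, 5, 2), 6.0),
    (datetime(2024, 5, 20), 10.0),
    (datetime(2024, 6, 11), 14.0),
]
print(recent_p90(tickets, now))
```

A three-month window leaves far fewer tickets per priority, so treat the recomputed percentiles as rougher estimates until more data accumulates.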

Calibrate every quarter

An SLA isn't a one-time configuration. After three months of running new targets, rerun the same analysis. If you're hitting 99%, the targets are too loose and breaches won't catch anything. If you're under 80%, they're unrealistic, and the team is learning to disregard the indicator. Aim for somewhere around 92–95%: tight enough to flag genuine problems, loose enough that the alerts mean something when they fire.
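The quarterly review can be reduced to a hit rate and a verdict. This sketch encodes the 99%, 80%, and 92–95% thresholds from the paragraph above; the "between bands" label for rates outside those ranges is a hypothetical addition.

```python
def review_attainment(met, total):
    """Return (hit rate, verdict) for one priority over one review period."""
    if total <= 0:
        raise ValueError("no tickets in the period")
    rate = met / total
    if rate >= 0.99:
        verdict = "too loose"       # breaches almost never fire
    elif rate < 0.80:
        verdict = "unrealistic"     # team learns to ignore the indicator
    elif 0.92 <= rate <= 0.95:
        verdict = "in goal band"
    else:
        verdict = "between bands"   # hypothetical label: watch next quarter
    return rate, verdict

print(review_attainment(470, 500))
```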
