Monthly Archives: July 2016

A Ticket Too Far… Part II

Part I:  A Ticket Too Far… Breaking the Broken

OK. So I screwed up. If you read the above post, I make a lot of claims about ticket systems and process. Many of those claims were based on the idea that we did not have an approved and enforced policy in place. Turns out I was wrong, sort of.

I did a little digging and reviewed the policy. Customers are asked to submit a ticket if their issue is not immediately preventing their work, otherwise they can either call the help desk number, an IT staff member directly or visit them in person. There is a lot I could say about this policy and what provisions I agree with and which I do not, but policy is policy and I misrepresented the strength of it – it is very much vetted and approved.

I see the goals of a policy on customer facing support as follows:

  • Prevent sysadmins from forgetting customer requests
  • Allow sysadmins to control their interrupt-based workflow, prioritize and not have their workflow control them
  • Track customer requests and incidents so pain-points can be discovered and resolved proactively
  • Build a database of break/fix-based documentation
  • Create an acknowledgement and feedback mechanism for customer issues (i.e., “you have been assigned ticket #232”), backed up by mechanisms that forces sysadmin action (i.e., “this ticket has not been touched in three days, either close it or reply”). This feedback loop ensures that issues are acknowledge and resolved either one way or another.

The details may be wrong but the bigger point of my last post remains the same; the combination of our policy and ticket system’s technical limitations does not accomplish those goals or lead to ideal outcomes for either customers or IT staff.

But does it really? Perception and reality are not always the same so I started tracking how often I was interrupted by a customer or a team member over a period of about four weeks. It is important to mention this was not a particularly rigorous study, I just kept an Excel spreadsheet and anytime I diverted my attention for more than a few minutes from my current task I made a quick note of it. If anything, I was consistent in my inconsistency. I also kept track of what kind of interrupt it was, what group it came from and whether or not it had ticket attached to it.

 

a graph showing the number of interrupts per day

A couple of interesting discoveries here:

  • I am not interrupted nearly as much as I think I am. If you throw out the obvious outlier of Day 12, the median is two interrupts per day. Not as bad as I would have thought… but it is not that great either considering with meetings and other obligations, I probably only have one period per day of uninterrupted time to focus on complex projects that is longer than two hours. Getting an interrupt during that time period is a pretty serious set back.
  • 48% of the interrupts were related to break/fix issues. The other 52% were what I call “communication interrupts”. More on these later.
  • Of the 31 break/fix interrupts I recorded, only one actually had a ticket already associated with it. This is mind-boggling terrible as far as accepted time management best practices go.
  • Only 59% of the interrupts required immediate action versus action that could be queued and prioritized later. This means 59% of these interruptions really did not need to be interruptions at all, they needed to be tickets or agenda items in a meeting.

Even with my back-of-the-napkin math these are pretty damning conclusions. Our ticket system capture rate, at least at Tier-3 where I spend most of my time, is laughably non-existent. Going back to my previous post about creating tickets for customers, there would be no reason for me to do so considering I would be creating 97% of the tickets myself. Interestingly enough, it is not like these requests just vanish into thin air. They get tracked and recorded somehow, albeit most likely by individual staff in a manner that is not portable or visible. The work required to track issues is still being done, just the organization is not getting a lot of value out of it since it is just recorded in some crusty senior sysadmin’s logbook.

As an aside, It would be really interesting to perform the same experiment at the Tier-1 and Tier-2 support levels and see what the ratio is. Maybe it is higher and it is more the kind of issues and/or customers that Tier-3 deals with that lead to low ticket system capture. Or worse maybe it is the same and ticket system capture is just really bad everywhere.

Almost two-thirds of these interrupts did not have to be interrupts. They did not require immediate action. This is also pretty terrible because interrupts cost a lot. The accepted wisdom is that it takes between 10 and 20 minutes to get back in “the zone” (source) after being interrupted. For myself that is about 40 minutes lost on average per day just in dealing with context switching from one task to another. There is also a greater risk of mistakes being made trying regain focus on your task. It is harder to calculate this cost but it is assuredly there.

Finally, a bit over half of these interrupts were “communication interrupts”. These are hard to pin down but mostly they were a team member or a customer wanting to communicate something to me, “Hey, I fixed item A on Server 1, but item B and item C are still broken” or a request for information, “Hey, how do items Z and Y work on Server 2 and Server 3?”. These clearly have a interrupt cost but on the other hand, they also have a cost in not immediately jumping to the top of my to-do pile. If someone needs information from me it is likely because they are in a wait-cycle – they need to know something to continue their current task. It feels like it boils down to a “whose time is more valuable?” argument. Is it better for a Tier-2 team member to burn 45 minutes instead of 10 on a task because she had to dig up some information instead of asking a Tier-3 sysadmin? Or is better for the Tier-3 sysadmin to burn 20 minutes to save his Tier-2 team member 35? I do not really know if there is answer here but it is an interesting thread to pull on…

The interrupts that served to communicate information to me seem a little more clear cut. None of them required immediate action and most of them were essentially status updates. Implementing a daily standup meeting or something similar would be a prefect format for these kinds of interactions.

 

Well. It was an interesting little experiment. I am not sure if I am smarter or better off for it but curiosity is an itch that sometimes just needs to be scratched.

Until next time, stay frosty.