Jira Service Management & Incident Management | ACP-420

When people think of Jira Service Management, they tend to focus on supporting customers and taking in IT requests. That, however, is not the only use case for Jira Service Management. Another major use is tracking incidents.

What is an incident

Incidents are the source of a disruption, not the symptom. For example, if a server goes offline, the reports from people who can't access it aren’t an incident (they’re a problem). The incident would be the source of the issue - something was unplugged, or a bug cropped up in the code.

While organizations can classify incidents differently (typically by how many customers or users they impact), for the purposes of ACP-420 we’ll consider “Incidents” and “Major Incidents”. Major incidents tend to impact broad groups of people or bring entire systems offline, while “incidents” impact smaller groups or disrupt, but don't stop, systems.
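To make the split concrete, here is a minimal sketch of how a team might encode it. The function name, the parameters and the 50% threshold are all assumptions for illustration; neither ACP-420 nor Jira Service Management defines a specific cutoff, so each organization would pick its own.

```python
def classify_incident(users_impacted: int, total_users: int,
                      system_offline: bool, broad_share: float = 0.5) -> str:
    """Label a disruption as an "Incident" or a "Major Incident".

    broad_share is a hypothetical threshold: the share of users that
    counts as a "broad group". Tune it to your own organization.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    share = users_impacted / total_users
    # An entire system being offline, or a broad group being impacted,
    # makes it a major incident. Everything else is a regular incident.
    if system_offline or share >= broad_share:
        return "Major Incident"
    return "Incident"


print(classify_incident(40, 1000, system_offline=False))   # Incident
print(classify_incident(700, 1000, system_offline=False))  # Major Incident
print(classify_incident(5, 1000, system_offline=True))     # Major Incident
```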

Why track them?

While we tend to think of incidents as something that only impacts software teams, incidents are a fact of life, and every type of business, team and group will experience them. Finance teams may experience an incident when an account has more or less money than expected. Marketing teams may experience an incident when a campaign fails to publish properly. HR teams may experience an incident when a wave of resignations occurs.

Each incident represents a failure point - and an opportunity to learn and become stronger. This makes them both something that must be responded to quickly (to mitigate damage) and a very valuable learning opportunity that can help the team grow.

Tracking them in Jira Service Management offers a number of advantages beyond just keeping track of them.

  1. It’s in Jira - This may be obvious, but keeping your incidents in Jira Service Management means they’re in the same system as your other work items. This makes it easy for a team to investigate root causes, other reports of issues and related topics, as all your information is in a single spot.

  2. Source of Truth - tracking incidents in Jira Service Management ensures everything related to that incident is in one spot. This means your responders (the people who determine what happened and fix it) just need to open a single work item to get everything they need.

Who’s involved?

There are typically a few folks involved in incidents:

  1. Incident Commander - this is the individual who is solely responsible for managing the incident. Typically they’ll be someone who is familiar with the technology or process, and is able to work with others to resolve issues.

  2. Responders - Responders are individuals who will be pulled in to help resolve incidents. Depending on the incident this group can differ as different skills or backgrounds are necessary.

  3. Customers - Customers are also, indirectly, involved with incidents. Typically they will be the ones reporting them, but they will also be impacted by them. This means the team needs to keep them updated (even if only at a high level) about what is going on.

Communication

Communication is incredibly important during an incident. Responders will need information on what’s going on so they can solve things. Customers need to be updated on when things will be back to normal. Stakeholders will need reassurance that things are moving forward.

Jira Service Management offers a number of tools to help with this, but when an incident occurs we should also consider other ways of sharing information.

Options within Jira Service Management - note that these require careful thought and configuration to ensure groups are getting the information they need, when they need it.

  1. Notifications - incident tickets can be set up to notify specific groups when updates are posted or statuses change. This automates some communication as specific groups will get those updates with no extra effort.

  2. Automations - Automations can be used to send additional information - via email, instant message or more.

  3. Announcements - Admins (and sometimes Agents) can set up announcement banners on Help Desks and Portals. These help inform customers about known incidents and set expectations on when things will be resolved.

Options outside of Jira Service Management

  1. Email - Email is still a very important communication channel during incidents. I use this for longer updates and to direct folks to other resources.

  2. Instant messaging - Slack, Teams and other platforms offer ways to communicate quickly and widely. I like this for quick updates.

Operations

Up until recently, a tool called Opsgenie helped teams respond to incidents. Atlassian recently rolled this tool into Jira and is calling it Operations. This is an optional feature that needs to be enabled for a Jira Service Management project, but it allows teams to manage their response to incidents by doing things like:

  1. Setting up on-call rotations - Typically team members will be “on-call” for a period of time. This means if an incident does happen, they get alerted via email, text, Slack, carrier pigeon and the like to respond. Exactly how these rotations work varies by team, but they are critical to a successful response.

  2. Escalation paths - If the person on-call doesn’t respond to the alert within a time period (e.g. 5 minutes) someone else can be designated to respond. Typically this is the next person on call or the entire team, but can be (almost) anyone in Jira. This helps ensure that someone jumps in and begins work.

  3. Alerts - Alerts are messages (either from a person or an automated system) that something might be going on. Alerts are used to flag potential incidents, but need to be acknowledged and managed to ensure they’re actually acted on.

  4. Stakeholders & Responders - Incident tickets can have Responders (people who are expected to act on the incident) and Stakeholders (individuals who need to be informed of what’s going on). These are optional, but make it substantially easier to communicate what is going on during the incident - and to identify who will fix it.
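The way on-call rotations and escalation paths fit together can be sketched in a few lines. This is an illustrative model only, not the Operations feature's API: the Rotation class, the who_to_alert function, the weekly shift length and the member names are all hypothetical, and the 5 minute window simply mirrors the example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Rotation:
    """A simple on-call rotation (hypothetical model, not a Jira API)."""
    members: list[str]
    start: date          # first day of the first shift
    shift_days: int = 7  # assumed weekly shifts

    def on_call(self, today: date) -> str:
        # Count completed shifts since the start and wrap around the list.
        shifts_elapsed = (today - self.start).days // self.shift_days
        return self.members[shifts_elapsed % len(self.members)]


def who_to_alert(rotation: Rotation, today: date,
                 minutes_unacknowledged: int,
                 escalate_after: int = 5) -> list[str]:
    """Return who should be alerted for a still-unacknowledged alert."""
    primary = rotation.on_call(today)
    if minutes_unacknowledged < escalate_after:
        return [primary]
    # Escalation path: once the window passes, add the next person on call.
    next_up = rotation.on_call(today + timedelta(days=rotation.shift_days))
    return [primary, next_up]


rotation = Rotation(members=["ana", "ben", "chloe"], start=date(2024, 1, 1))
print(who_to_alert(rotation, date(2024, 1, 10), minutes_unacknowledged=2))
# ['ben']
print(who_to_alert(rotation, date(2024, 1, 10), minutes_unacknowledged=6))
# ['ben', 'chloe']
```

Escalating to the next person on call is only one choice here. As noted above, the fallback could just as easily be the entire team or someone else designated in Jira.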

Post Incident Reviews

After an incident is resolved teams need to take time to review what happened, understand the fault and ensure it doesn’t happen again. This isn’t about assigning blame, but rather focused time to improve the team, process and system.

Typically this is handled in Confluence, however, there is now a “post incident review” feature that can be flipped on for incidents. This helps the team focus their information on the incident, making it easier to track (since it’s directly on the ticket) and easier for the team to gather (since they don’t have to go anywhere else).

This feature adds a button to Incidents called “add post incident review”. This creates a related Task to help track the review. While this could be done on the incident ticket itself, I appreciate that it is split off. This allows the team to focus either on solving the problem (i.e. the incident) or on reviewing it (the PIR). Splitting off the information also lets you link additional resources that may only really relate to one or the other. Personally I also appreciate how it consolidates information in Jira - meaning my team doesn’t have to go open other tools to figure out what happened.

Conclusion

There is a lot in Jira Service Management that helps support incident management, from setting up on-call rotations to informing stakeholders to tracking related work items. Atlassian Learning also dedicated 90 minutes to this (most other sections got 1 hour, max) so this is definitely an area to dig into.

This session was split across two weeks - check out the recordings below!
