Designing for Accretion

With hybrid work here to stay, IT service desks face a growing number of employee tickets that drain resources. 95% of end users experience unexpected application or network downtime that costs them valuable working time, making it imperative to identify and remediate user experience problems before they impact productivity.

Today, IT helpdesks have little to no visibility into issues that affect end users' application performance. ADEM was designed to solve this problem by isolating, detecting, diagnosing, and remediating end-user issues. At the same time, we have noticed an explosion of IT tickets.

Scenario

A user is on a high-priority Zoom call with executives when the screen suddenly freezes. They have no idea why this is happening, try some troubleshooting on their own, and eventually reach out to the IT department. IT responds after 1.5 hours, pulls up the user page, and sees that the user's WiFi is causing issues in the service delivery path. Upon further analysis, they find that the signal from the user's WiFi router is weak. They capture the details, relay them to the user, and ask them to move closer to the router. The process takes a couple of hours, during which Zoom performance on the user's device suffers and the meeting is disrupted.

There was a need for a system that proactively notifies users about application issues that require attention and empowers users to remediate these issues on their own quickly.


My Role

I worked with 3 Product Managers, 6 Engineers, 3 User Researchers, and 1 Content Writer. The application was launched internally in May 2022 and officially in June 2022.


Picking up existing pieces

Without preexisting insights on IT services, I partnered with the in-house research team to learn about the lifecycle of an IT ticket and understand the role of network admins better.

Thousands of hours are wasted by IT Teams trying to resolve simple issues that users can handle themselves


Sifting through the most common issues

We realized that device, WiFi, and Internet-related issues were the most common.

 
 
 

Discovery phase

Focusing on simplicity revealed an opportunity to design the ideal experience for resolving standard device and network-related issues.



Working backward from ‘The Perfect Experience’

How do we define success?

  • The time needed to understand what's going on

  • The number of steps required to resolve the issue

  • The time required to resolve the issue



How might we empower users to solve their own issues and not rely on IT for simple issues?



Design principles

Lightweight - A system that constantly monitors device performance without overwhelming the device.

Faster - The loop from raising an IT ticket to its resolution takes too long. How can we minimize the resolution time?

Actionable - Are we guiding the user into taking appropriate actions to resolve the issue?

Effortless - How many steps are we asking the user to take and how much effort do they need?



How did we get there?

A sister app to Global Protect

Initially, the plan was to integrate this solution with Global Protect, but we realized that the two product offerings serve very different purposes and that Global Protect might be retired soon.



Building Context with Users

Sending users irrelevant notifications may annoy them and prompt them to turn off notifications altogether. Therefore, sending context-aware notifications was key to the success of this application. Sensing when performance issues emerge, such as a frozen app, an unresponsive tab, or slower downloads, and sending a notification at that moment would help the user take action and 'learn' from the experience, leading them to take preventative measures in the future.

Based on my intuition, backed by user research, we decided that omitting the 'Options' button from the notification in the first version of the app was essential, since we were training users to at least click on the notification to open the UI.

 
 
 
 
 

Another design decision was whether to use a banner-style or alert-style notification on macOS. While banners are ephemeral, alerts stay on screen until the user takes action.

 
 
 

Giving the admin user the ability to control the thresholds for system and network performance prevents too many notifications from being sent to the user. Notifications for this app are turned on by default - a decision made by the admin users.
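To make the idea of admin-controlled thresholds concrete, here is a minimal sketch of how a performance sample might be checked before a notification is sent. The metric names, the default limits, and the names `Thresholds`, `breached_metrics`, and `should_notify` are all hypothetical and chosen for illustration; they are not taken from the shipped product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Admin-configurable limits (hypothetical metrics and values)."""
    max_cpu_percent: float = 90.0
    max_memory_percent: float = 85.0
    min_wifi_signal_percent: float = 30.0


def breached_metrics(sample: dict, limits: Thresholds) -> list:
    """Return the names of metrics in `sample` that cross an admin threshold."""
    breached = []
    if sample.get("cpu_percent", 0.0) > limits.max_cpu_percent:
        breached.append("cpu")
    if sample.get("memory_percent", 0.0) > limits.max_memory_percent:
        breached.append("memory")
    # Signal strength is a problem when it drops below the limit, not above it.
    if sample.get("wifi_signal_percent", 100.0) < limits.min_wifi_signal_percent:
        breached.append("wifi")
    return breached


def should_notify(sample: dict, limits: Thresholds) -> bool:
    """Notify only when at least one threshold is crossed."""
    return bool(breached_metrics(sample, limits))
```

Under these assumptions, an admin who raises `max_cpu_percent` makes CPU notifications rarer for that user without changing anything on the user's side.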

We A/B tested the textual content in the notifications to ensure they met the following criteria:

Helpful - Every notification must give the user valuable information and prompt the actions that matter.

Respectful - User attention is valuable, so we need to be extra considerate when sending notifications and make sure each one provides value. While we want users to take action based on these notifications, it's essential to be respectful and pause them if the user isn't responding. This feedback is captured in telemetry and conveyed to the admin user so they can tweak the frequency and thresholds of notifications for each user. While this was a hard design decision, we expected it to pay off in the long run.

Easy to understand - The content in the notifications has to be free of technical jargon and guide the user to learn more about the issue.

Actionable - With notifications being the primary launch point for this app, we had to ensure that each one conveyed urgency and prompted the user to take action.
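The 'Respectful' criterion above, pausing notifications when the user isn't responding and reporting that back to the admin, could be sketched roughly as follows. The rule of pausing after a fixed number of ignored notifications, the default of three, and the `NotificationPolicy` name are assumptions made for illustration, not a description of the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NotificationPolicy:
    """Pauses notifications after repeated non-response (hypothetical rule)."""
    max_ignored: int = 3       # assumed limit, adjustable by an admin
    ignored_in_a_row: int = 0
    paused: bool = False

    def record_response(self, user_clicked: bool) -> None:
        """Update the streak after a notification is shown."""
        if user_clicked:
            # Any engagement resets the streak and lifts the pause.
            self.ignored_in_a_row = 0
            self.paused = False
        else:
            self.ignored_in_a_row += 1
            if self.ignored_in_a_row >= self.max_ignored:
                self.paused = True

    def can_send(self) -> bool:
        """True while notifications are not paused."""
        return not self.paused

    def telemetry(self) -> dict:
        """Summary an admin could use to tune frequency and thresholds."""
        return {"ignored_in_a_row": self.ignored_in_a_row, "paused": self.paused}
```

In this sketch the pause lifts as soon as the user engages again; a real system might instead resume after a cool-down period or an explicit admin change.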

 
 
 
 
 


Help users gain more insight into the problem

Status of the issue

The sidebar gives the user an immediate summary of device/network performance.

Type of the issue

The subtext provides an accurate indication of the health of the device/network. Because the tab pertaining to the notification is selected by default, the context of the problem is retained as the user drills into more detail.

 
 
 

Time of the issue

The title can be broken down into three key pieces of information - Status, details of the device/network, and time of the incident.

More details

Giving the user a textual overview of what's happening and what could happen in the immediate future builds more context on the issue.

Helping users view trends of similar issues

 
 
 

Using the principle of progressive disclosure, the user can see a historical device/network performance trend over the past 3 hours. The user can turn specific metrics on or off to simplify the chart even further.

Additionally, the points in time where a notification was generated are highlighted on the chart, giving the user a way to view usage patterns and build mental models of cause-and-effect relationships. The user can interact with the time slider to go back to a point in time and view usage statistics and recommendations, if any.

The threshold line provides a reference point for comparing performance at each point in time. It also gives users more focus and a clearer reason to take action.

Simplifying the chart to show only the most pertinent information provides more focus and saves time otherwise spent going through extraneous details.

We chose a three-hour time range because it was a sweet spot between users with prolonged device usage and users who use their device in short bursts.

Lastly, the data needs to be summarized, so an interactive tooltip shows details for the point in time under the cursor and adds more context to what's going on.
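As a rough sketch of the data behind such a chart, the snippet below keeps only samples from the last three hours, flags those above the threshold line, and builds a plain-language tooltip for a single point. The `(timestamp, value)` sample shape, the function names, and the tooltip wording are assumptions for illustration only.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)  # the three-hour range described above


def chart_points(samples, now, threshold):
    """Keep samples from the last three hours and flag threshold breaches.

    `samples` is a list of (timestamp, value) tuples; this shape is assumed.
    """
    start = now - WINDOW
    return [
        {"time": ts, "value": value, "above_threshold": value > threshold}
        for ts, value in samples
        if start <= ts <= now
    ]


def tooltip_text(point, metric="CPU"):
    """Plain-language summary for the point under the cursor."""
    status = "above" if point["above_threshold"] else "within"
    return f"{metric} at {point['time']:%H:%M}: {point['value']:.0f}% ({status} threshold)"
```

Points flagged as `above_threshold` are the natural candidates for the highlighted notification markers on the chart.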

 
 
 

Remediation steps

Showing only the critical pieces of information that immediately help with the issue lets users take action without feeling overwhelmed or confused.

 
 

More details on troubleshooting the issue

An additional source of information is provided as a link to a webpage showing more troubleshooting tips for this specific issue.

Source of constant improvement

Collecting feedback directly from the user lets the app designers learn what works now and use it as a guide for future product versions.

The key pillars of textual content 

 

How do we design for everyone?

After a few rounds of testing with internal folks, we realized that assuming the user would understand basic technical terms didn't work. We had to simplify the language used in the UI to include all users.


In hindsight, we as a team empathized poorly with users who didn't resemble the people we work with.



Recognizing exclusion

Designing for inclusivity opens up the product to more people and reflects how people really are. Hence, the decision to support both Windows and macOS was necessary.

 
 

Solve for one, extend to many

Ensuring success with simpler issues would lay a solid foundation for both users and the product to resolve more complex issues later, thereby 'training' users to be self-reliant.


Testing the first version of the app

To our surprise, not a single participant had trouble understanding the issues at a high level, apart from some language and iconography. The notification system, built in conjunction with the OS, put users at ease without the overhead of learning a new app. Participants understood the remediation steps well, confirming our intuition around designing for simplicity and speed.


Navigating the interface was very simple. We had to go through only four clicks to fix the issue.



After the first round of testing, our data revealed that:

  • 50% of users globally don't explicitly understand thresholds. More context would be necessary on the term, its meaning, or both.

  • Half of all issues are at least situational or temporary. In a few cases, issues disappear without the user having to take action, due to built-in OS functionality.

Future enhancements

Telemetry to understand how users interact with the app

Provide more insights like ‘the top memory and CPU-consuming applications’

Expand to resolving other issues like battery and application outages. Give users more actionable steps like closing specific applications.

Enrich end users’ notifications and recommendations and direct users to close specific tabs that they are not using or apps that are frozen.

Offer paths to expanded versions of the chart, along with added functionality and a deeper dive into the data, while preserving values, context, and state.