We at GlobalNOC are thinking of our community and hoping that you and your loved ones are staying healthy and safe in these unprecedented times.
Like everyone else, here at GlobalNOC we’ve had to adapt to a very different world these last few weeks. Teaching and learning, university and government business, and important healthcare needs everywhere have all suddenly become remote. And much of this is carried by the networks we support. So, more than ever, a high-performing and reliable network is absolutely critical. Not only do we need to quickly resolve any outages, we also have to upgrade connectivity to the services that are suddenly in high demand. We are absolutely committed to excellent support for our communities through this. Yet the safety and health of our staff is also vitally important. We decided early on that running GlobalNOC on-site during this time was not sustainable. It was just too risky to have our engineers and developers work in the office or fill our service desk with 10-20 people on a 24×7 basis.
So the challenge for us was this: how do we transition an entire NOC to function remotely, AND continue to deliver the highest support for our partners’ networks while they face increased and unprecedented demands, AND do it for an unknown length of time? For our engineering and management staff the move was fairly straightforward. These folks are used to working from home on occasion. On-call and off-hours work is common. Other than a run on our supply closet of peripherals like keyboards and adapters, this was pretty simple. But it’s an entirely different story for our 24×7 Service Desk.
Moving a 24×7 Service Desk to Remote
Through our annual disaster recovery drills we’ve developed the ability to quickly move our Service Desk to any location. This served as an excellent starting point for us. But this was based on a scenario in which we would move to one location for a few hours or few days. The COVID-19 situation demanded that we move to many locations for weeks or months.
How would we orchestrate and manage a large team of service desk technicians and supervisors all working from their individual homes?
Gear up – In normal circumstances our service desk works from shared workstations with standard builds for our needs. Our business continuity plan includes a number of laptops to use off-site. But working remotely for at least several weeks meant finding enough high-performing laptops for every technician. Within a few days, we pulled together machines from our existing spare hardware and purchased the rest. We also had to add licenses for our software-based phone system to allow calls to be handled from multiple locations at once.
Test & Move – We ran an initial complete 12-hour remote shift as a trial. We also tested normal operations as well as emergency failovers of our phone system. We noted what worked (most things) and what didn’t (some processes). This gave us the confidence to make the full switch, which we made official on March 24.
How have we been handling communications?
One word: Slack. We’ve been pretty heavy Slack users for a while now, sending about 8,000 messages on March 10, the last “normal” business day before we transitioned operations. That evening, we sent word to our staff that they should begin working remotely if possible. On March 11, we sent over 12,000 messages, and have sent between 11,000 and 14,000 every weekday since. The move to remote operations accounted for about a 50% jump in Slack usage!
We adapted our communication in other ways, too. For example, in the service desk, shift turnover had been a simple face-to-face sharing of information. Communication between technicians on who was handling what issue was similarly open and face-to-face. This has all become much more formal, with explicit assignment and coordination of incidents via Slack. Yes, it may be surprising that historically GlobalNOC has handled communications within such an informal system, but we’ve found that our technicians are committed and know best how to share the vital information of the day without adding rigid structure.
What changes have we seen?
Two of the primary metrics we use to gauge our basic operations are Time to Ticket and Time to Restore. Time to Ticket measures the time between when an incident began and when we started tracking it in our trouble ticket system. Time to Restore measures the overall time to restore an outage, encompassing our work but also the work of fiber providers, equipment vendors, etc. In both cases, we’ve seen no deterioration in service. In fact, our median Time to Ticket has actually dropped slightly from 14 minutes before we moved remote to 12 minutes after. Our median Time to Restore has dropped from 50 minutes before to 39 minutes after.
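For readers who track similar metrics, here is a minimal sketch of how medians like these can be computed from per-incident timestamps. The records, field names, and values below are hypothetical illustrations, not actual GlobalNOC data or tooling:

```python
from datetime import datetime
from statistics import median

# Hypothetical incident records: when the incident began, when a trouble
# ticket was opened, and when service was restored (illustrative values).
incidents = [
    {"began": "2020-03-20 08:00", "ticketed": "2020-03-20 08:11", "restored": "2020-03-20 08:45"},
    {"began": "2020-03-21 14:30", "ticketed": "2020-03-21 14:42", "restored": "2020-03-21 15:05"},
    {"began": "2020-03-22 02:15", "ticketed": "2020-03-22 02:29", "restored": "2020-03-22 03:10"},
]

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two 'YYYY-MM-DD HH:MM' timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60

# Time to Ticket: incident start -> ticket opened.
time_to_ticket = median(minutes_between(i["began"], i["ticketed"]) for i in incidents)
# Time to Restore: incident start -> service restored.
time_to_restore = median(minutes_between(i["began"], i["restored"]) for i in incidents)

print(f"Median Time to Ticket: {time_to_ticket:.0f} min")
print(f"Median Time to Restore: {time_to_restore:.0f} min")
```

Using the median rather than the mean keeps one long-running fiber cut from skewing the picture of day-to-day responsiveness.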
Project work has continued at a similar pace as well, though collaboration has become more formalized in Zoom-based meetings versus the previous face-to-face, whenever-needed sessions. This is no doubt adding some friction and difficulty to the work, but it has been handled well so far.
A few things that we’ve learned
In adapting to this suddenly much different reality, we have also learned a few things.
Our partners are very understanding – Our partners have been extraordinarily understanding and caring through this. We’ve worked very hard to ensure we continued to be responsive and adaptive to the needs of our supported networks. More than ever, the research, education, and public interest communities are depending on these networks. But our partners have been just as concerned about making sure that our staff is safe and holding up ok amid such challenges. Anytime we’ve had to alter what we do (which we’ve tried to keep to a minimum), they’ve been understanding without exception, and we at GlobalNOC are beyond grateful for that.
Trust and support your people – In transitioning to a fully remote operation, we chose to focus first on supporting our staff’s personal and professional needs, rather than worry about tracking their efforts to ensure they weren’t becoming distracted or using this major change as an opportunity to slack off. We started from a place of trust that people want to do a good job and serve our network partners well. We believe this paid off. From both objective and subjective viewpoints, our teams are working just as hard as they were before moving remote, if not harder, and we’re continuing to provide great support to partners. We believe that doing this in a way that also supports our staff as humans is the right thing to do and will help us succeed through this current challenge and in the long run.
Daily team check-ins are a big help – All of our teams have started daily video check-ins with each other. Some were already doing this before the switch to remote. We’ve found it to be a big help in maintaining team cohesion, staying organized, and, frankly, maintaining some sense of normalcy.
Meetings: everyone together > everyone apart…BUT everyone apart > a few people remote – It turns out that even challenging conditions can yield effective meetings. There is no doubt that for a large group meeting, having everyone in a single space is best. People are kinder, discussion flows, energy is boosted, community is built, and there’s an excuse to eat doughnuts. So we feared that shifting these meetings to be entirely remote might descend into chaos and disengagement. But this isn’t what we experienced. Instead, because everyone was playing by the same limitations, like video and audio delays, discussion and interaction worked remarkably well. We made heavy use of our video system’s “raise hand” feature whenever someone wanted a turn to talk, and asked someone to moderate this. So while we definitely prefer to meet in person, our remote meetings have worked better than we’d feared, mainly because we’re all playing by the same conventions. We are also, for sure, extra patient with each other. We’re hoping to carry some of these new practices forward to improve remote engagement even after we return to the office.
Now is the time to prepare for the next crisis – There are also some things we’ve learned we might do differently the next time a pandemic-related incident happens. First, some of our networks do require in-person visits for changes and repairs. We will need to add a store of gloves, masks, and cleaning supplies to our on-site preparedness kit. Not having these items on hand, we’ve faced the additional strain of finding them at the same time as everyone else while also trying to minimize on-site visits. But beyond this, we can’t assume that the next major disruption will look just like the current one. So we’re also preparing generally. This means setting clear methods for identifying potential disruptions while they are still emerging, deciding how we react as a single organization, and communicating well with our people and the communities we serve.
The path from here
We think we’re hitting our stride pretty well now, operating in a more or less sustainable way in our new normal. We’re keeping an eye on productivity and performance, but also on the mental and physical health of our employees. We’re following Indiana University’s lead in being accommodating to employees’ needs beyond work: to support their kids’ e-learning, maintain relationships with family and friends during a time of isolation, deal with the logistics of day-to-day life within stay-at-home orders, and take care of themselves.
Of course, at some point this will change again. As conditions ease, our goal is to take the things we’ve learned so far and apply them to help us going forward. We hope this will help us prepare for future crises and maintain the good things we’ve learned even in the “normal” times. We are committed to emerging from the COVID-19 challenge as an even stronger GlobalNOC.
Wishing wellness in all forms to our partners, and with deep appreciation to all in the important communities we serve.