Tabletop exercises: Role-playing your way to better data protection

Joel Hans | Last updated on August 30, 2024 | 13 minute read

“If the Death Star had a disaster recovery plan and test, then they probably would have realized they had an exposed thermal exhaust port. You know, it blew up.”

Even across galaxies far, far away, there’s no one we trust more with disaster recovery planning (DRP) than Megan Dean, Rewind’s Director of Security. She’s been central to every security and compliance effort we’ve undertaken in the last few years, including gaining and maintaining compliance with strict data protection frameworks like SOC 2 Type 2.

Her work involves far more than implementing technology and enforcing policy. She’s here to encourage a culture that respects what we don’t yet know and pushes us to figure out how to remedy those gaps. That’s how she propels Rewind toward becoming a more compliant and secure organization, and those gains trickle down into our products.

The tabletop exercise (TTX) is an essential tool in her portfolio. This hefty (and even fun!) dose of fantasy, almost like a game of Dungeons & Dragons but played around a conference table in business-casual get-ups, pushes our culture toward making essential course corrections with gusto and without blame, while acknowledging that the work is never finished.

You might benefit from them, too. Unlike the Empire, you can’t afford to build another product after an unsecured one blows up—not in time, patience, or even Galactic credits. You must protect the environment you have right now.

What is the tabletop exercise?

TL;DR: The TTX is a role-playing activity where players respond to fictional scenarios of outages or data loss incidents in your infrastructure. The goal is to identify gaps and improve your data protection or disaster recovery (DR) plans.

Some teams run TTXs in as little as 15 minutes, but Megan recommends scheduling an hour. That’s the sweet spot for laying the groundwork and encouraging folks to ask questions, collaborate, and support one another on their path to a proper resolution. That conversation often happens in person, but a Zoom call works too if much of your team is remote, since it reflects how they already collaborate every day.

The TTX gives your peers a chance to put any existing disaster recovery and data protection plans into practice. Megan says, “It’s an opportunity to get everybody in a room to talk about the what ifs, and you can help build your processes off those discussions.”

As the facilitator, you’re looking for places where those plans break down. You might find breakdowns in communication or confusion about who can make big decisions, like pulling the plug on your entire cloud infrastructure. Your peers might even work through a technical resolution only to realize they never thought to inform your customers.

No mistake, unasked question, or lapse in collaboration gets recorded so you can dole out punishments later. Much like writing a postmortem for a data loss incident, TTXs are a blameless activity designed to find fault in your plans, not people.

When should you start rolling out tabletop exercises?

If you ask Megan, there’s no wrong time to start rolling out these exercises within your organization.

Some of you might work at startups or SMBs that still lack formal procedures around disaster recovery or data protection. Compliance standards like SOC 2 might be challenges you know you won’t face until a few years and promotions from now. You can still get plenty of value from the lessons and collaborative improvements TTXs are well known for.

Megan says every TTX is a “really good opportunity to talk about these very high-stress situations in a much less chaotic environment than when something has already happened.” When you manage them well—which we will dig into soon enough—they help you “lessen the pain by going through a disaster recovery situation and having those difficult discussions now.”

Of course, TTXs have more direct benefits if your organization is rapidly maturing and needs compliance markers like SOC 2 or ISO 27001 to show customers how comprehensively you care for their data and experience. To meet these high standards, you must prove your uptime and demonstrate how you’re testing the resilience of your product, especially if your website copy and sales pitches are brimming with claims of “five nines” (or higher) of availability.

To paraphrase the SOC 2 requirements, you must develop and annually test your disaster recovery plans, including your processes for restoring data, and then review the results for efficacy and completeness. You must then close any gaps you find in your contingency plans. In other words, TTXs are a direct mechanism for achieving and maintaining compliance with your data protection policies.

Your path to introducing the tabletop exercise

Whether your organization is just starting out with disaster recovery planning or already has a rich ecosystem of plans you need to validate, your first steps don’t change. The same goes for prepping for your first or twentieth TTX.

Find a well-scoped target

Start by identifying your critical assets and considering the potential risks to each. Megan describes this process as the simpler cousin of comprehensive threat modeling. There’s no need to worry (yet) about sophisticated diagrams or exhaustive detail; just focus on where you see the most vulnerability.

Megan argues it’s okay, and maybe even better, to focus on just one product or environment for a given TTX. Trying to role-play the deletion of all customer data will create less meaningful outcomes than a deep dive into the availability of a specific cloud provider you rely on.

The narrower your focus, the more comprehensively your team can dig into that particular process, and the more precise your takeaways will be.

Narrow your objectives

At the core of any TTX, Megan says, “You want to validate that the plan that you have and the procedures within it, including the decision-making and oversight, are correct.”

That might sound straightforward, but you need a strong answer as to why you’ve pulled everyone into the same room or onto the same Zoom call. A clear purpose tempers the fear most folks inevitably feel when pulled into a conversation with anyone in security or compliance, and it improves buy-in from leaders and stakeholders.

If you’re just starting out, your objective might be: 

“We need to make sure that when we wake someone up at 3 a.m. with a critical alert, they know what to do after the adrenaline kicks in and before the sleep wears off.”

Further down the road, with SOC 2 compliance in sight, that objective might become:

“We need to test our data protection plans to maintain the certifications our salespeople rely on to close the deals that keep us in business.”

Megan argues that your objectives and your pitch to internal stakeholders can differ; think of the pitch as brand messaging for your TTX. Even if SOC 2 compliance is your goal, she prefers getting buy-in by framing TTXs as an exercise that helps the entire company, not just compliance folks, by flexing the muscles required to control critical situations.

Use a scenario ‘ripped from the headlines’

The scenario you create should be relevant to your target, push you toward one or more clear objectives, and, if you ask Megan, be based on a real-world situation. That makes the TTX feel more realistic (someone already suffered through it, after all) and gives you opportunities to compare and contrast your results against what actually happened.

The only requirement is that you base your scenario on your organization’s production infrastructure, service providers, and the impact on your services or customers. If you don’t use AWS, there’s no value in running a TTX around responding to an AWS outage. Beyond that, you have enormous freedom to be creative.

For example, Megan has run TTXs based on an incident in an AWS data center in Germany. She says, “One of their data centers literally set on fire, and everything went down in that area, and then they had to move those services somewhere else.” She starts the TTX by telling participants that they’ve just heard about the incident on the news, not from AWS itself, and must decide what to do next.

Based on the objectives she’s already plotted out, Megan prepares and asks leading questions that push participants to test whether existing plans succeed or fail:

  • How do you figure out whether you’re impacted without hearing the news directly from AWS? 
  • Where and how do you fail services over?
  • How do you recover any lost data, if necessary?
  • How do you inform customers about the downstream impact on their experience?

If you get stuck, look for inspiration in public postmortems or even X/Twitter accounts dedicated to painful scenarios, which may well be based on real-world situations some poor engineers found themselves dealing with.

Research relevant teams and stakeholders

Let’s stick with the “AWS on fire” example. If you’re running the same TTX scenario as Megan, you know you’re working in the territory of the cloud operations team, which informs who you reach out to first and which DR plans you want to validate and improve.

Instead of sending out a bunch of invites to everyone you think is relevant, you can tap into cloud operations team leadership or technical executives to fill out the roster for your TTX. They know better than anyone who should be in these conversations—either their reports are a fount of institutional knowledge not yet written down, or they’re new to the organization and must get caught up on internal processes.

Narrowing your participants helps you achieve specific objectives. You’ll have better luck with large TTXs once you have more facilitation experience and your organization has deeper DRP maturity. Megan says, “As your program starts to mature, you start bleeding out into, ‘How would we communicate this issue to customers?’ And then you would [want] somebody there from the customer support team or someone who works in that area. They might just say, ‘Oh, we’d let customer support know and they’d let the customers know,’ but they might not actually have a process to handle that.”

Get folks excited about tangible benefits

While the objective of a TTX might be critically important to you, you’re not likely to get site reliability engineers or software developers particularly excited about your SOC 2 compliance requirements. And sure, you don’t need your participants to be thrilled about this role-playing exercise to get helpful information from them, but it certainly helps.

Megan argues for a more practical approach. She says, “Nobody likes to get woken up at three in the morning for an alarm. People will love the idea of, ‘Let’s make sure that doesn’t happen.’… People are good at speaking about the process they would want to improve.”

Bring some documentation to the table(top)

If you have existing DR plans or runbooks, circulate them before your TTX. You’d be surprised how often folks are completely unaware of existing runbooks and best practices. Giving them time to study in advance encourages more insightful collaboration and a clearer understanding of where they are and aren’t meeting your objectives.

Feel free to loop participants in on your scenario and the leading questions you plan on asking throughout. This briefing eases the inevitable anxiety that comes with participating in a TTX and offers a gentle reminder that a TTX isn’t an exam but an exercise designed to uncover the “unknown unknowns” of recovering from disaster.

Have fun with it!

Tabletop exercises are entirely theoretical. They push participants out of their comfort zones with role-playing and require them to think critically and suspend their disbelief over a fictional scenario.

To ease them into what is often an uncomfortable situation, Megan advocates going above and beyond to reinforce the believability of her TTX scenarios. By creating fake news reports and graphics that illustrate the severity of the problem, for example, you can demonstrate that you’ve invested in creating an enjoyable, blame-free, and educational environment.

She says, “I find that if I’m asking for people’s time, I want them to be as present as possible.”

How to orchestrate a tabletop exercise

Even the most well-planned TTX can run off the rails on the big (fictional) day. While you’ll invariably get better at facilitating exercises with practice, keep a few goalposts in mind as your peers role-play their way toward a solution.

Don’t let managers take over

In Megan’s TTX experience, the balance can quickly shift toward an over-eager or know-it-all manager wanting to “solve” the problem—even if they have the best intentions.

Equal participation reveals more knowledge gaps and potential points of failure, and managers will ultimately see more value in hearing from their reports. That’s how they understand where their team already excels and where they, as managers, need to encourage improvement.

Encourage equal participation

The most important variable you must modulate during an ongoing TTX is who participates and how often. If anyone gets railroaded into silence, your team misses out on potentially valuable information.

Megan argues you must “create an environment that encourages people to speak up about what they know and then also what they don’t know because everyone’s just here to learn and get the most out of the training.” The TTX is, after all, designed as a safe place for folks to ask questions about what they don’t know.

The TTX is no place for inter-team conflict or placing blame on an individual who “should have already” patched a flaw or designed a workaround, but you can mitigate that instinct by pitting your participants against the scenario. It’s your team versus the fire, flood, asteroid, threat actor, and so on. That’s the best way to avoid a “blamestorm.”

If you feel the conversation slipping away from collaboration and toward blame or silence, try these tactics:

  • Remind your participants that you chose this scenario because you knew they wouldn’t have all the answers.
  • Ask a follow-up question that pushes the conversation back toward a previous point, giving that person another chance to speak up.
  • Encourage folks to think out loud—there are no bad ideas.
  • Tap into one of your leading questions, designed to nudge specific teams or individuals into admitting that they don’t have a strong answer, and then immediately transition the group into creating a resolution together.

Don’t put too much pressure on yourself!

You got all these people into a single room and “stole” up to an hour of their time. Your objectives churn and thunder over your head the whole time. You feel a very specific dread related to the fallout of ending your TTX without giving folks concrete takeaways or essential new knowledge about how your organization handles its worst possible days.

Megan advocates for being gentle with yourself and the process. “Just talking about [disaster planning] is huge. You don’t have to put on a song and dance.”

Remember that your objective isn’t to “win” disaster recovery within the span of a TTX, but rather to make incremental improvements that build knowledge and address pain points. Even a single takeaway moves the needle.

What’s next?

At some point, well into your future of role-playing mastery, you can host your version of the TTX of Megan’s dreams: a completely out-of-the-blue scenario where no one except management knows it’s fiction, so you can see how folks respond in what feels like an actual emergency.

On your path to getting there, whether as a direct result of a TTX or just in your ongoing mission for better data protection, add Rewind to the mix to back up and restore SaaS data in seconds. We’ve recently released integrations for Azure DevOps and Miro, and nearly a dozen new services where mission-critical data lives, like Slack, HubSpot, and Figma, are coming soon.

Rewind won’t close all your disaster recovery gaps, since there are no one-and-done solutions here, only incremental improvement. Still, it’s like slapping a grate on the Death Star’s exposed thermal exhaust port: your clearest path to protecting the environment you have right now.

Because no tabletop exercise—or real-world scenario—ever concludes with, “Well, let’s just build another.”


Joel Hans
Joel Hans writes copy and marketing content that energizes startups with the technical and strategic storytelling they need to win developer trust. Learn more about how he helps clients like ngrok, CNCF, Rewind, and others at commitcopy.com.