Image of a woman with an IT name tag, talking to a woman with a FIN tag, a man with a Marketing tag, and finally a fellow with an HR tag. They are all standing in front of a whiteboard with "Sensitive Information Types" on it.
Whiteboard before Keyboard: Always involve cross-functional teams in the discovery process for Sensitive Information Types

To roll out Copilot, you have to be ready. It’s not that Copilot is inherently risky; it’s that every project before it has left a long tail of “we can finish this later” or “once this is done.” That long tail is where risk hides, and it’s where Microsoft Purview helps mitigate that risk.

Purview Starts at the Whiteboard, Not the Keyboard

AI-Prep: DIY Purview

AI has entered the chat whether you’re ready or not. Copilot can unlock enormous productivity, or expose enormous risk, depending on how well you understand and control your data. Notice I said expose risk, not cause it. The cause is a lack of data control, not the use of Copilot.

This cluster of posts is for the small, overstretched admin teams who have to do this responsibly without waiting for a top-down initiative or a miracle budget. This is the reality many of us face. We’ll do this DIY-style, counting on spare time, four-hour Friday focus sessions, and the occasional sigh of existential admin exhaustion.

Long story short, we can use Purview to control which users can access sensitive information. And since Copilot acts in the security context of the user, it can only surface what that user can already reach, so controlling user access puts us one step closer to controlling our data.

When we talk about control, we are talking about risk mitigation.

I remind all of my customers that when we put a control in place, we don’t do it just because we can; we do it because it mitigates a risk. If we have a common-sense feeling that we should put a control in place, but there is no corresponding risk to mitigate, then we need to look at our risk register because something is likely missing. It’s a lot of back and forth, but it helps to remind our internal customers (and sometimes our bosses) that security measures aren’t random. They are calculated responses to risk for the purpose of business enablement or regulatory compliance.

The simplified process goes something like this:

Know your data, Know your risks, Control for those risks

The word “simplified” above does a lot of heavy lifting and is not intended to minimize the number of admin hours required to get the job done. This will take some time and effort. If it were easy, AI could just do it 😉

Matching this basic three-step process with a selection of tools from Purview can be represented by this AI-generated image:

Image of Purview processes and tools compared. Know Your Data is compared to Sensitive Information Types, trainable classifiers, Content Explorer, and Data compliance. Know Your Risks is compared to Activity Explorer, Data Explorer, and Insider Risk Management. Control for These Risks is compared to sensitivity labels, DLP policies, auto-labeling, encryption, and Conditional Access. Note that this image is clearly in an AI-generated style, and while it is supportive, it is not perfect.

Assemble a cross-functional team

Set the stage with your internal customers and collaborators. They are likely to be the owners of the data as well as the people who understand it best. Together with them, your first step is to define what sensitive data means to your organization.

Start the conversation with a series of questions such as these:

  • What is sensitive data to us?
  • What laws and regulations are we subject to?
  • What are the primary data sources for these regulations? Do we have internal expertise?
  • Do we have any upcoming certification aspirations, like ISO, that might benefit from or influence this work?
  • What do we do that might make a good candidate for a Sensitive Information Type (SIT)?
  • Could sensitive data be a project name or list of project names?
  • Are two specific words in the same document sensitive data? (Merger, acquisition, etc.)
  • Could a manually added footer indicate sensitive data? (Draft content prior to release date may be sensitive data.)
  • Do we have case numbers involving minors?
  • Are there special codes for accounts belonging to celebrities?
  • Do we have a data set of sensitive data (think: research & related health records) that needs to be protected en masse?
  • Do we have Power BI datasets that are routinely exported?
  • When we say “HR data,” what do we actually mean? Do we work with an external company and need to exchange files with a defined group of people?
  • Do we have training data sets? Is there a need to recognize these separately?
  • Are there samples of data we can use machine learning on to develop a SIT?
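A couple of these questions translate almost directly into detection logic, so it can help to show your whiteboard group what "a list of project names" or "two words in the same document" might look like in practice. The sketch below is plain Python, not Purview's engine, and everything in it is an assumption for illustration: the project names, the word pair, and the 300-character window are all hypothetical placeholders.

```python
from __future__ import annotations

import re

# Hypothetical project code names gathered at the whiteboard.
PROJECT_NAMES = ["Project Nebula", "Project Aurora"]

# Hypothetical word pair that is only interesting when both words appear close together.
PAIR = ("merger", "acquisition")
WINDOW = 300  # assumed proximity window, in characters


def find_project_names(text: str) -> list[str]:
    """Return the listed project names that appear in the text, ignoring case."""
    lowered = text.lower()
    return [name for name in PROJECT_NAMES if name.lower() in lowered]


def pair_in_proximity(text: str, pair: tuple[str, str] = PAIR, window: int = WINDOW) -> bool:
    """Return True if both words start within `window` characters of each other."""
    first = [m.start() for m in re.finditer(rf"\b{re.escape(pair[0])}", text, re.IGNORECASE)]
    second = [m.start() for m in re.finditer(rf"\b{re.escape(pair[1])}", text, re.IGNORECASE)]
    return any(abs(a - b) <= window for a in first for b in second)


doc = "Draft: the merger team will review the acquisition of Project Nebula assets."
print(find_project_names(doc))   # project names found in the draft
print(pair_in_proximity(doc))    # whether both words sit close together
```

The point is not the code itself but the conversation it starts: someone in the room has to own the list of names, and someone has to decide how close "close together" is.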

If you’re already thinking about building Sensitivity Labels at this point, pause.

Please hold off. You don’t know your data yet. Keep going with the whiteboarding process with your various participants. This WILL save you rework later. Your output of this meeting (or meetings) should be a SIT Identification Guide, for example:

Category | Example | Why Sensitive | Owner Team | Notes & Concerns
Payroll Data | HR export files | Contains salary, IDs | HR | Encrypt at rest and in transit, policies ref. external email
Customer IDs | CRM export | Identifies individuals | Sales | Mask where practical, restrict actions where not
Internal Project Codes | “Project Nebula” | Confidential development project | R&D | Keyword-based SIT, list of other project names or codes
Case Numbers | Child Welfare System | Ties to minors | Legal | Needs tiered access for few, very limited external activity
Research Data Sets | Clinical study results | Personally identifiable health data, R&D data | Research Dept | Bulk protection required, disallow exfiltration
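If you would rather keep the guide somewhere more structured than a notebook page, a minimal sketch like the one below could hold the same rows and flag gaps. This is only one possible shape, assuming Python; the `GuideEntry` class and both helper functions are hypothetical names made up for this example, and the rows are copied from the sample guide above.

```python
from dataclasses import dataclass


@dataclass
class GuideEntry:
    category: str
    example: str
    why_sensitive: str
    owner_team: str
    notes: str


GUIDE = [
    GuideEntry("Payroll Data", "HR export files", "Contains salary, IDs", "HR",
               "Encrypt at rest and in transit, policies ref. external email"),
    GuideEntry("Customer IDs", "CRM export", "Identifies individuals", "Sales",
               "Mask where practical, restrict actions where not"),
    GuideEntry("Internal Project Codes", "Project Nebula", "Confidential development project", "R&D",
               "Keyword-based SIT, list of other project names or codes"),
    GuideEntry("Case Numbers", "Child Welfare System", "Ties to minors", "Legal",
               "Needs tiered access for few, very limited external activity"),
    GuideEntry("Research Data Sets", "Clinical study results",
               "Personally identifiable health data, R&D data", "Research Dept",
               "Bulk protection required, disallow exfiltration"),
]


def entries_missing_fields(guide):
    """Return the categories where at least one column is still blank."""
    return [e.category for e in guide if not all(str(v).strip() for v in vars(e).values())]


def by_owner(guide):
    """Group category names by owner team, so each team can see what it owns."""
    owners = {}
    for entry in guide:
        owners.setdefault(entry.owner_team, []).append(entry.category)
    return owners


print(entries_missing_fields(GUIDE))
print(by_owner(GUIDE))
```

A check for blank columns is a cheap way to spot rows nobody has claimed yet, which is usually where the next meeting should start.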

You don’t need to record a long list of concerns, just the top concerns. Don’t get hung up on a long, perfect list. This list will forever be in a state of flux because data types, needs, and laws change. [Now is the time I would normally go off on a tangent about Information Governance, but that’s not why we’re here just yet. For admins of smaller orgs, you may well get stuck keeping track of the who-owns-what-sensitive-data list. This list should be a living document, updated as needed. If you are the recipient of someone else’s not-always-updated list, you’ll need to manage up: request monthly updates until they just happen naturally.]

While you’re implementing Purview, match the size of your Sensitive Information Types list with the size of the activity group (i.e. the admin or two tackling this) to start moving forward in a methodical way. If your list includes 1337 things, you will drown. Keeping this list manageable increases your likelihood of success. Plus, taking the time to record decisions, rationales, responsible parties, and dependencies will save you time in the long run and allow for clearer action plans under change conditions. And at this point, you will have made a process and you won’t have to recreate it from scratch as you mature your Purview implementation.

Once you establish this list, add samples. Personally, I tend to use OneNote for this entire process, but doing so inside Teams could be just as effective. When I dive in, I need samples of what my participants are talking about (if it is not obvious to me). Once I feel comfortable that I understand all the types listed, the search is on for candidate content to validate Sensitive Information Types, or SITs. We’ll match this SIT guide to the existing SITs provided by default in Purview and test whether Microsoft’s templates really fit the data.

Spoiler alert: they probably don’t. Next up: custom Sensitive Information Types and how we develop them from our newly created SIT Identification Guide and data samples.
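To make the idea of validating against samples a little more concrete, here is a minimal sketch of checking a candidate pattern against labeled sample text. It is plain Python rather than anything Purview runs, and the case-number format ("CW-2024-00123"), the sample sentences, and the `evaluate` helper are all invented for illustration.

```python
import re

# Hypothetical case-number format (e.g. "CW-2024-00123"); replace with your real one.
CASE_NUMBER = re.compile(r"\bCW-\d{4}-\d{5}\b")

# Each sample is (text, should_it_be_detected), as judged by the data owner.
SAMPLES = [
    ("Intake notes for case CW-2024-00123, guardian contacted.", True),
    ("Invoice 2024-00123 for office chairs.", False),
    ("Follow-up on cw-2023-00987 scheduled.", True),        # lowercase variant
    ("Part number CW-2024-00123 is out of stock.", False),  # same shape, different meaning
]


def evaluate(pattern, samples):
    """Return (missed, false_alarms): samples the pattern got wrong in each direction."""
    missed, false_alarms = [], []
    for text, expected in samples:
        found = bool(pattern.search(text))
        if expected and not found:
            missed.append(text)
        elif found and not expected:
            false_alarms.append(text)
    return missed, false_alarms


missed, false_alarms = evaluate(CASE_NUMBER, SAMPLES)
print("Missed:", missed)
print("False alarms:", false_alarms)
```

With these made-up samples the pattern misses the lowercase case number and raises a false alarm on the part number, which is exactly the kind of gap real samples from your data owners tend to reveal.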


Author

  • Jenn Sveigdalen

    Jenn has 30 years in IT, working in both the US and Norway. She brings a healthy amount of practical systems experience, development knowledge and a double dose of the academic with an MIS and MSc in Cyber Security.
