Saving sensitive data in .csv or .tsv file format - Microsoft SC-400 Certification
The following steps explain how to save sensitive data in a supported file format such as .csv or .tsv:
- First, you will need to identify the sensitive information that you wish to use and export that data to an app, such as Microsoft Excel. The file then needs to be saved in comma-separated values (.csv), tab-separated values (.tsv), or pipe-separated (|) format. Microsoft's best practice recommendation is to save the file in .tsv format in case the data values include commas. The file can include the following:
• Up to 32 columns per data source
• Up to 100 million rows of sensitive data
• Up to 5 columns marked as searchable
- You must structure the sensitive data in your file so that the first row contains the names of the fields that will be used for EDM-based classification; for example, firstname and lastname. Ensure that the column header names do not contain spaces or underscores.
- Fields that contain commas will be parsed as two separate fields in a .csv file; for example, London,UK in an address column. You can avoid this by using a .tsv file or by enclosing values that contain commas in double quotes. If a value that contains a comma also contains a space, you will need to build a custom sensitive information type that matches the resulting format.
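The limits and the comma behavior described above can be checked locally before any upload. The following is a minimal sketch in Python, not part of the Microsoft tooling: the function names check_header and read_rows are hypothetical, the sample rows are invented for illustration, and Python's csv module is only assumed to approximate how the EDM service splits fields.

```python
import csv
import io

MAX_COLUMNS = 32      # "Up to 32 columns per data source"
MAX_SEARCHABLE = 5    # "Up to 5 columns marked as searchable"


def check_header(header, searchable):
    """Return a list of problems found in the header row (hypothetical helper)."""
    problems = []
    if len(header) > MAX_COLUMNS:
        problems.append(f"{len(header)} columns exceeds the limit of {MAX_COLUMNS}")
    for name in header:
        if " " in name or "_" in name:
            problems.append(f"column name {name!r} contains a space or underscore")
    if len(searchable) > MAX_SEARCHABLE:
        problems.append(f"more than {MAX_SEARCHABLE} searchable columns requested")
    for name in searchable:
        if name not in header:
            problems.append(f"searchable column {name!r} is not in the header")
    return problems


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# An unquoted comma inside a value produces an extra field in .csv format.
unquoted = read_rows("firstname,lastname,city\nAda,Lovelace,London,UK\n", ",")
# Double quotes keep the value together.
quoted = read_rows('firstname,lastname,city\nAda,Lovelace,"London,UK"\n', ",")
# Tab-separated data is not affected by commas in values.
tabbed = read_rows("firstname\tlastname\tcity\nAda\tLovelace\tLondon,UK\n", "\t")

print(len(unquoted[1]), len(quoted[1]), len(tabbed[1]))
print(check_header(["first name", "last_name"], ["lastname"]))
```

The first print shows the field counts for the three variants of the same row, and the second lists the header problems that would need fixing before the file is used.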
Now that we understand how to save data and sensitive information in the supported file formats, this will enable us to define the schema for the database, which we will cover in the next section.
Defining the schema for your database of sensitive information
You can build a schema and an EDM sensitive information type pattern with either PowerShell or the Exact Data Match Schema and Sensitive Information Type wizard. Note that the wizard is only available for the Worldwide and GCC clouds. You can find additional information on the wizard at the following Microsoft Docs link: https://docs.microsoft.com/en-us/microsoft-365/compliance/sit-edm-wizard?view=o365-worldwide. Let’s look at the following steps:
1. First, you need to use an XML file to define the schema for the database of sensitive information. Name the file edm.xml and configure it so that, for each column in the database, there is a line that uses the following syntax:
<Field name="" searchable=""/>
Use the column names as the Field name values.
For each field that should be searchable, set searchable="true". Note that at least one field must be searchable.
2. For a full example of an EDM XML file that you can use for your lab, please refer to the following Microsoft Docs link: https://docs.microsoft.com/en-us/microsoft-365/compliance/create-custom-sensitive-information-types-with-exact-data-match-based-classification?view=o365-worldwide&viewFallbackFrom=o365-worldwide%3Fazure-portal%3Dtrue.
3. You will need to connect to the Security & Compliance Center using PowerShell. Follow the steps at the following Microsoft Docs link to complete these tasks: https://docs.microsoft.com/en-us/powershell/exchange/connect-to-scc-powershell?view=exchange-ps.
4. To upload the database schema, run the following two cmdlets in order:
$edmSchemaXml=Get-Content .\edm.xml -Encoding Byte -ReadCount 0
New-DlpEdmSchema -FileData $edmSchemaXml -Confirm:$true
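Once you have run these cmdlets, you will be asked to confirm that you want to proceed. Note that the -Encoding Byte parameter shown here is valid in Windows PowerShell 5.1; PowerShell 6 and later replace it with -AsByteStream.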
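The per-column lines described in step 1 can be generated from the header row rather than typed by hand. The sketch below is a hedged illustration in Python: field_lines is a hypothetical helper, the column names are invented, and it emits only the Field lines, so the surrounding schema elements still need to be taken from the full Microsoft Docs example linked in step 2.

```python
from xml.sax.saxutils import quoteattr


def field_lines(columns, searchable):
    """Build one <Field .../> line per column (hypothetical helper).

    Enforces the limits stated in the text: at least one and at most
    five searchable fields.
    """
    searchable = set(searchable)
    if not 1 <= len(searchable) <= 5:
        raise ValueError("between one and five fields must be searchable")
    missing = searchable - set(columns)
    if missing:
        raise ValueError(f"searchable fields not in columns: {sorted(missing)}")
    lines = []
    for name in columns:
        flag = "true" if name in searchable else "false"
        # quoteattr adds the surrounding quotes and escapes special characters.
        lines.append(f"<Field name={quoteattr(name)} searchable={quoteattr(flag)}/>")
    return lines


example = field_lines(["firstname", "lastname", "ssn"], ["ssn"])
for line in example:
    print(line)
```

Generating the lines from the same header that the data file uses keeps the schema field names and the column names from drifting apart.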
Setting up a rule package
We will now explain the steps you need to follow when setting up a rule package:
- First, you will need to create a rule package in XML format. When creating the rule package, ensure that it references the .csv or .tsv file, as well as the edm.xml file. In this example, the following fields need to be adapted to create the EDM sensitive information type:
• Datastore: This field is specific to the EDM lookup datastore. You must provide the data source name of the configured EDM schema.
• idMatch: This field points to the primary element for EDM.
• RulePack id and ExactMatch id: Use the New-GUID command to generate a GUID. You can find more information on how to do this at the following Microsoft Docs link: https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/new-guid?view=powershell-7.1&viewFallbackFrom=powershell-6.
• Match: In this field, you will point to additional proof that can be found in the proximity of idMatch.
• Resource: This segment identifies the name and description of a sensitive type in multiple locales.
TIP
You can find some example XML code at the following Microsoft Docs site that will help you with this exercise: https://docs.microsoft.com/en-us/microsoft-365/compliance/create-custom-sensitive-information-types-with-exact-data-match-based-classification?view=o365-worldwide#save-sensitive-data-in-csv-or-tsv-format.
- Once the rule package has been created, you will need to use PowerShell to upload it. Use the following cmdlets to do so:
$rulepack=Get-Content .\rulepack.xml -Encoding Byte -ReadCount 0
New-DlpSensitiveInformationTypeRulePackage -FileData $rulepack
Once the rule package has been imported with the EDM sensitive info type and the sensitive data table, you will be able to test this by utilizing the Test task in the EDM wizard from within the compliance center.
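The RulePack id and ExactMatch id fields mentioned in the steps above each take a GUID, and New-GUID is the method the text describes. For readers who prepare the rule package XML outside PowerShell, the following Python sketch is one possible alternative; new_rule_package_ids is a hypothetical helper, and the assumption is that any standard random GUID in the usual hyphenated 36-character form is acceptable.

```python
import uuid


def new_rule_package_ids():
    """Return a fresh random GUID for each id field (hypothetical helper)."""
    return {
        "RulePack": str(uuid.uuid4()),
        "ExactMatch": str(uuid.uuid4()),
    }


ids = new_rule_package_ids()
for label, value in ids.items():
    print(f"{label} id: {value}")
```

Each run prints two different values, which can then be pasted into the corresponding id attributes of the rule package.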
Once you have created the EDM sensitive information type, you can modify and remove the schema later if you so wish.
Modifying the schema for EDM-based classification
The following steps explain how to modify the schema that an EDM-based custom sensitive information type uses:
- Edit the edm.xml file (this is the file we discussed in the Defining the schema for your database of sensitive information section of this chapter).
- Next, you will need to connect to the Security & Compliance Center using PowerShell. You can find instructions for this at the link that was shared in the Defining the schema for your database of sensitive information section, earlier in this chapter.
- You can update the database schema by running the following cmdlets, one at a time:
$edmSchemaXml=Get-Content .\edm.xml -Encoding Byte -ReadCount 0
Set-DlpEdmSchema -FileData $edmSchemaXml -Confirm:$true
You will be asked to confirm your action within the PowerShell window, where you can hit the Enter key to accept the changes.
Removing the schema for EDM-based classification
If you need to remove the schema you are using for EDM-based classification, do the following:
- First, you will need to connect to the Security & Compliance Center by utilizing PowerShell. You can find instructions on how to do this in the Defining the schema for your database of sensitive information section, earlier in this chapter.
- Now, run the following cmdlet, replacing the example datastore name (patientrecords) with the name of the datastore you want to remove:
Remove-DlpEdmSchema -Identity patientrecords
As with the previous examples in PowerShell, you will be prompted to confirm your action from within this window.
You should now understand what an EDM-based custom sensitive information type is, the three parts that define it, and how to create, modify, and delete it using the Microsoft 365 compliance center and PowerShell. This completes the sensitive information type section of the chapter. Next, we will take a deeper dive into document fingerprinting.