Google Cloud Platform’s Cloud Data Loss Prevention (DLP) API helps organizations manage sensitive data such as names, email addresses, telephone numbers, identification numbers, and credit card numbers. It can detect, redact, mask, and tokenize such data, which helps organizations comply with regulations such as GDPR and reduces the risk of data exposure and data breaches.
In the previous blog, Cloud Data Loss Prevention (DLP): Part-1, we saw the different types of built-in infoTypes that DLP offers. In this blog, we will cover how to use infoTypes with an example.
How to use Cloud Data Loss Prevention (DLP)
If you wanted to look for a phone number in a block of text, you would specify the PHONE_NUMBER infoType detector in the inspection configuration.
The following output screenshot and code sample demonstrate a simple scan request to the Cloud DLP API. Notice that the PHONE_NUMBER detector is specified in inspectConfig, which instructs Cloud DLP to scan the given string for a phone number.
NOTE: Always make sure that the string you want a built-in infoType to detect is in the correct format. Only then will the infoType be able to find the data for you.
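As a minimal sketch of the inspection configuration described above, the snippet below only assembles the configuration dictionary and makes no API call. The helper name build_inspect_config is hypothetical and used here for illustration. EMAIL_ADDRESS is added as a second built-in infoType to show how more than one detector could be requested.

```python
def build_inspect_config(info_types, min_likelihood="LIKELY", include_quote=True):
    """Build the inspect_config dictionary for a Cloud DLP inspect request.

    Hypothetical helper for illustration only: it assembles a plain dict
    and does not contact the Cloud DLP API.
    """
    return {
        # Each detector is passed as a dict with the infoType name.
        "info_types": [{"name": name} for name in info_types],
        # Findings below this likelihood are not reported.
        "min_likelihood": min_likelihood,
        # Ask for the matched text to be included with each finding.
        "include_quote": include_quote,
    }


# Request two built-in detectors instead of one.
config = build_inspect_config(["PHONE_NUMBER", "EMAIL_ADDRESS"])
print(config["info_types"])
```

The resulting dictionary has the same shape as the inspect_config passed to inspect_content in the full sample in the CODE section.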
CODE
def extract_metadata(project, item,
                     info_types=["PHONE_NUMBER"],  # more info types can be added
                     min_likelihood="LIKELY"):
    """Inspects a string and extracts the info types found in it.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        item: The string to inspect (will be treated as text).
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: The minimum likelihood a finding must have to be
            reported.

    Returns:
        The API response if there are findings, otherwise None; the
        findings are also printed to the terminal.
    """
    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types],
        "min_likelihood": min_likelihood,
        "include_quote": True,
    }

    # Call the API
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": {"value": item},
        }
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                if finding.quote:
                    print("Quote: {}".format(finding.quote))
            except AttributeError:
                pass
            print("Info type: {}".format(finding.info_type.name))
            print("Likelihood: {}".format(finding.likelihood))
        # Return only after every finding has been printed.
        return response
    else:
        print("No findings.")


if __name__ == '__main__':
    project_id = 'XXXX'  # Edit this with your project ID
    content = 'My phone number is 91-9876543210'
    print("----EXTRACTION OF PHONE NUMBER BY INBUILT INFOTYPE ----")
    extract_metadata(project_id, content)
OUTPUT
When you send the preceding request to the specified endpoint, Cloud DLP returns the following:
Conclusion
In this series on Cloud Data Loss Prevention (DLP), I tried to explain the different types of built-in infoTypes that are available and how to use them in your code. I hope it’s helpful to you.
HAPPY LEARNING 🙂
References
- https://cloud.google.com/dlp/docs/infotypes-reference
- https://cloud.google.com/dlp/docs/concepts-infotypes