What is Data Catalog?
Data Catalog is a fully managed, scalable metadata management service in Google Cloud’s Data Analytics product family.
Data Catalog Search scope
In Data Catalog, the search scope depends on the user: search results may differ for users with different permissions.
For example, if a user has BigQuery metadata read access to an object, then that object will appear in their Data Catalog search results. To search for a table, you need the bigquery.tables.get permission for that table. To search for a dataset, you need the bigquery.datasets.get permission for that dataset.
Data Catalog aggregates date-sharded tables into a single logical entry. This entry has the same schema as the table shard with the most recent date, and contains aggregate information about the total number of shards. The entry derives its access level from the dataset it belongs to.
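The aggregation described above can be pictured with a small, self-contained Java sketch. It is only an illustration of the idea, not Data Catalog's actual implementation: it assumes the BigQuery PREFIX_YYYYMMDD naming convention for shards, and the class and method names (ShardGrouping, LogicalEntry, group) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShardGrouping {
    // Assumed naming convention for date-sharded tables: PREFIX_YYYYMMDD.
    private static final Pattern SHARD = Pattern.compile("^(.+)_(\\d{8})$");

    /** One logical entry: the shared prefix, the shard count, and the newest shard date. */
    public static final class LogicalEntry {
        public final String prefix;
        public int shardCount;
        public String latestDate = "";

        LogicalEntry(String prefix) {
            this.prefix = prefix;
        }
    }

    /** Groups shard names by prefix; names that are not date-sharded are ignored. */
    public static Map<String, LogicalEntry> group(List<String> tableNames) {
        Map<String, LogicalEntry> entries = new TreeMap<>();
        for (String name : tableNames) {
            Matcher m = SHARD.matcher(name);
            if (!m.matches()) {
                continue; // an ordinary table, not a shard
            }
            LogicalEntry entry = entries.computeIfAbsent(m.group(1), LogicalEntry::new);
            entry.shardCount++;
            // YYYYMMDD strings sort chronologically under plain string comparison.
            if (m.group(2).compareTo(entry.latestDate) > 0) {
                entry.latestDate = m.group(2);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> tables =
            Arrays.asList("events_20240101", "events_20240103", "events_20240102", "users");
        for (LogicalEntry e : group(tables).values()) {
            System.out.println(e.prefix + ": " + e.shardCount + " shards, latest " + e.latestDate);
        }
    }
}
```

In this sketch the three events_* shards collapse into one logical entry whose schema would be taken from the newest shard, while the unsharded users table is left out of the grouping.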
How to Search for data assets
This method needs only two inputs: a projectId and a query.
In our case, the query is:
String query = "tag:RandomTag";
How to Search for data assets by tags in Java
- First, we set the search scope by providing the projectId. The scope can include one or many project IDs, depending on what the search requires.
- Next, we initialize the DataCatalogClient that we will use to send requests to Data Catalog. This client only needs to be created once and can be reused for multiple requests.
- After that, we create the SearchCatalogRequest object by passing the query and the scope to its builder.
- We then pass the SearchCatalogRequest to the searchCatalog method of DataCatalogClient and store the result in a SearchCatalogPagedResponse variable.
- Now we iterate over the response to fetch the column names of the tables that have been tagged with RandomTag.
- A search result only points to a table. To read the table's metadata, we first have to look up its entry, and for that we use a LookupEntryRequest.
- To build the LookupEntryRequest, we provide the resource name of the table we want to inspect. The search result returns it from the getLinkedResource() method.
- Next, we obtain the Entry object by calling the lookupEntry method.
- Once we have the Entry object, we have the table's metadata in Data Catalog, so all that remains is to find the columns that have tags attached to them.
- For this, DataCatalogClient has a built-in method, listTags. We pass it entry.getName(), which is the resource name of the entry whose tags we want to list (not the column names).
- It returns a ListTagsPagedResponse containing the tags attached to the entry, including the tags attached to its columns.
- Finally, we iterate over the tags and print the tagged columns.
- The name of the tagged column is read in this line:
- String location = tag.getColumn();
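The steps above can be put together roughly as follows. This is a minimal sketch rather than the author's exact code. It assumes the google-cloud-datacatalog Java client library (package com.google.cloud.datacatalog.v1) is on the classpath and that application default credentials are configured. The class name SearchTaggedColumns and the project ID "my-project-id" are placeholders.

```java
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DataCatalogClient.ListTagsPagedResponse;
import com.google.cloud.datacatalog.v1.DataCatalogClient.SearchCatalogPagedResponse;
import com.google.cloud.datacatalog.v1.Entry;
import com.google.cloud.datacatalog.v1.LookupEntryRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest.Scope;
import com.google.cloud.datacatalog.v1.SearchCatalogResult;
import com.google.cloud.datacatalog.v1.Tag;
import java.io.IOException;

public class SearchTaggedColumns {

  public static void main(String[] args) throws IOException {
    String projectId = "my-project-id"; // placeholder: replace with your project
    String query = "tag:RandomTag";
    printTaggedColumns(projectId, query);
  }

  static void printTaggedColumns(String projectId, String query) throws IOException {
    // The scope limits the search to the given project; more IDs can be added.
    Scope scope = Scope.newBuilder().addIncludeProjectIds(projectId).build();

    // Create the client once; try-with-resources closes it when we are done.
    try (DataCatalogClient client = DataCatalogClient.create()) {
      SearchCatalogRequest request =
          SearchCatalogRequest.newBuilder().setQuery(query).setScope(scope).build();
      SearchCatalogPagedResponse response = client.searchCatalog(request);

      for (SearchCatalogResult result : response.iterateAll()) {
        String linkedResource = result.getLinkedResource();
        if (linkedResource.isEmpty()) {
          continue; // the result is not linked to an underlying resource
        }

        // Resolve the linked resource to its Data Catalog entry.
        LookupEntryRequest lookup =
            LookupEntryRequest.newBuilder().setLinkedResource(linkedResource).build();
        Entry entry = client.lookupEntry(lookup);

        // List every tag attached to the entry; entry.getName() is the parent.
        ListTagsPagedResponse tags = client.listTags(entry.getName());
        for (Tag tag : tags.iterateAll()) {
          String column = tag.getColumn();
          // getColumn() is empty for tags attached to the entry as a whole.
          if (!column.isEmpty()) {
            System.out.println(entry.getName() + " -> " + column);
          }
        }
      }
    }
  }
}
```

Note that listTags returns every tag on the entry, not only those created from RandomTag. If only one template matters, the loop could additionally compare tag.getTemplate() against that template's resource name.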