Skip to content

Profiling

These functions are used to profile SQL queries, and objects are used to store the profile in JSON format.

The profiler uses SQLGlot to parse GoogleSQL code.

Objects

cte_profile (JSON)

The CTE (Common Table Expression) profile (cte_profile) is a JSON representation of the structure of a single cte-structured SQL query. It contains the following fields:

Field Name Field Type Field Description
index STRING CTE order of sequence
name STRING CTE alias
sql STRING CTE SQL definition
dependencies ARRAY<STRING> CTE dependencies by alias or table_id
example: cte_profile

The cte_profile for the following SQL query:

WITH
get_data_a AS (
  SELECT * 
  FROM `project_id.dataset_name.table_a`
),

get_data_b AS (
  SELECT * 
  FROM `project_id.dataset_name.table_b`
),

union_all_data AS (
  SELECT * FROM get_data_a UNION ALL 
  SELECT * FROM get_data_b
)


SELECT * 
FROM union_all_data

Will be the following JSON string:

[
{"index": 0, "name": "get_data_a", "sql": "SELECT * FROM `project_id.dataset_name.table_a`", "dependencies": ["project_id.dataset_name.table_a"]},
{"index": 1, "name": "get_data_b", "sql": "SELECT * FROM `project_id.dataset_name.table_b`", "dependencies": ["project_id.dataset_name.table_b"]},
{"index": 2, "name": "union_all_data", "sql": "SELECT * FROM get_data_a UNION ALL SELECT * FROM get_data_b", "dependencies": ["get_data_a", "get_data_b"]}
]

This JSON string response can be converted into a cte_profile JSON object using the PARSE_JSON function on the response. Note that any SQL outside of the CTEs will not be included in the cte_profile.

Functions

profile_query_ctes

Attribute Value
Name profile_query_ctes
ID bqtools.[region].profile_query_ctes
Description Returns the JSON profile of a CTE-based query
Function Type REMOTE
Arguments sql_query STRING
Returns cte_profile_json STRING
Dependencies bqtools.cloudfunctions.net/profile-query-ctes

execution: profile_query_ctes

SET cte_profile = (SELECT PARSE_JSON(bqtools.eu.profile_query_ctes(sql)));
SET cte_profile = (SELECT PARSE_JSON(bqtools.us.profile_query_ctes(sql)));

decode_cte_profile_external_dependencies

Attribute Value
Name decode_cte_profile_external_dependencies
ID bqtools.[region].decode_cte_profile_external_dependencies
Description Extracts the external dependency resource_ids from a cte_profile object relating to a single resource_id.
Type FUNCTION (SQL)
Arguments resource_id STRING, cte_profile JSON
Returns ARRAY<dependencies STRUCT<resource_id STRING, resource_dependency STRING>
Dependencies bqtools.[region].decode_cte_profile_external_dependencies

execution: decode_cte_profile_external_dependencies

SELECT  bqtools.eu.decode_cte_profile_external_dependencies(resource_id, cte_profile);
SELECT  bqtools.us.decode_cte_profile_external_dependencies(resource_id, cte_profile);
example: decode_cte_profile_external_dependencies

In this example, the cte_profile objects are stored in the profile_value JSON column in the BQMANAGER.PROFILE table. This query will result in a unnested set of dependency pairs which can then be used for lineage profiling and mapping.

SELECT 
dependency
FROM `[project_id].BQMANAGER.PROFILE`
LEFT JOIN 
UNNEST(bqtools.eu.decode_cte_profile_external_dependencies(resource_id, profile_value).dependencies) AS dependency
SELECT 
dependency
FROM `[project_id].BQMANAGER.PROFILE`
LEFT JOIN 
UNNEST(bqtools.us.decode_cte_profile_external_dependencies(resource_id, profile_value).dependencies) AS dependency