Profiling
These functions are used to profile SQL queries, and objects are used to store the profile in JSON
format.
The profiler uses SQLGlot to parse GoogleSQL code.
Objects
cte_profile (JSON)
The CTE (Common Table Expression) profile (cte_profile
) is a JSON
representation of the structure of a single cte-structured SQL query. It contains the following fields:
Field Name | Field Type | Field Description |
---|---|---|
index |
STRING |
CTE order of sequence |
name |
STRING |
CTE alias |
sql |
STRING |
CTE SQL definition |
dependencies |
ARRAY<STRING> |
CTE dependencies by alias or table_id |
example: cte_profile
The cte_profile
for the following SQL query:
WITH
get_data_a AS (
SELECT *
FROM `project_id.dataset_name.table_a`
),
get_data_b AS (
SELECT *
FROM `project_id.dataset_name.table_b`
),
union_all_data AS (
SELECT * FROM get_data_a UNION ALL
SELECT * FROM get_data_b
)
SELECT *
FROM union_all_data
Will be the following JSON string:
[
{"index": 0, "name": "get_data_a", "sql": "SELECT * FROM `project_id.dataset_name.table_a`", "dependencies": ["project_id.dataset_name.table_a"]},
{"index": 1, "name": "get_data_b", "sql": "SELECT * FROM `project_id.dataset_name.table_b`", "dependencies": ["project_id.dataset_name.table_b"]},
{"index": 2, "name": "union_all_data", "sql": "SELECT * FROM get_data_a UNION ALL SELECT * FROM get_data_b", "dependencies": ["get_data_a", "get_data_b"]}
]
This JSON string response can be converted into a cte_profile
JSON
object using the PARSE_JSON
function on the response. Note that any SQL outside of the CTEs will not be included in the cte_profile
.
Functions
profile_query_ctes
Attribute | Value |
---|---|
Name | profile_query_ctes |
ID | bqtools.[region].profile_query_ctes |
Description | Returns the JSON profile of a CTE-based query |
Function Type | REMOTE |
Arguments | sql_query STRING |
Returns | cte_profile_json STRING |
Dependencies | bqtools.cloudfunctions.net/profile-query-ctes |
execution: profile_query_ctes
SET cte_profile = (SELECT PARSE_JSON(bqtools.eu.profile_query_ctes(sql)));
SET cte_profile = (SELECT PARSE_JSON(bqtools.us.profile_query_ctes(sql)));
decode_cte_profile_external_dependencies
Attribute | Value |
---|---|
Name | decode_cte_profile_external_dependencies |
ID | bqtools.[region].decode_cte_profile_external_dependencies |
Description | Extracts the external dependency resource_ids from a cte_profile object relating to a single resource_id . |
Type | FUNCTION (SQL) |
Arguments | resource_id STRING, cte_profile JSON |
Returns | ARRAY<dependencies STRUCT<resource_id STRING, resource_dependency STRING> |
Dependencies | bqtools.[region].decode_cte_profile_external_dependencies |
execution: decode_cte_profile_external_dependencies
SELECT bqtools.eu.decode_cte_profile_external_dependencies(resource_id, cte_profile);
SELECT bqtools.us.decode_cte_profile_external_dependencies(resource_id, cte_profile);
example: decode_cte_profile_external_dependencies
In this example, the cte_profile
objects are stored in the profile_value JSON
column in the BQMANAGER.PROFILE
table. This query will result in a unnested set of dependency pairs which can then be used for lineage profiling and mapping.
SELECT
dependency
FROM `[project_id].BQMANAGER.PROFILE`
LEFT JOIN
UNNEST(bqtools.eu.decode_cte_profile_external_dependencies(resource_id, profile_value).dependencies) AS dependency
SELECT
dependency
FROM `[project_id].BQMANAGER.PROFILE`
LEFT JOIN
UNNEST(bqtools.us.decode_cte_profile_external_dependencies(resource_id, profile_value).dependencies) AS dependency