Skip to content

Architecture

Overview

The standard BigQuery Manager (BQMANAGER) deployment is build entirely from native BigQuery resources, functions and components, with no external dependencies or APIs.

flowchart BT

subgraph NATIVE_BIGQUERY_FOUNDATIONS
INFORMATION_SCHEMA.*
INFORMATION_SCHEMA.JOBS
CLOUD_BILLING_EXPORT
BIGQUERY_PRICING_EXPORT
PUBSUB_BIGQUERY_SUBSCRIPTIONS
SCHEDULED_REFRESH_QUERIES
end

subgraph BQMANAGER_FUNCTIONS
REFRESH_INFORMATION_SCHEMAS
REFRESH_INFORMATION_SCHEMA_JOBS
end

subgraph BQMANAGER

subgraph CONFIGURATION
PROJECT_IDS
end

subgraph FOUNDATIONS
CROSS_PROJECT_INFORMATION_SCHEMAS
HISTORICAL_INFORMATION_SCHEMA_JOBS
ACTIVITY_LOGGING_TABLES
end

subgraph BQMANAGER_ANALYSIS
RESOURCE_ANALYSIS
USAGE_ANALYSIS
COST_ANALYSIS
end

end


NATIVE_BIGQUERY_FOUNDATIONS --> BQMANAGER_FUNCTIONS
BQMANAGER_FUNCTIONS --> BQMANAGER

Native BigQuery Foundations

BQMANAGER is built on the following native BigQuery foundations, extending them to enable an accessible, extensible, integrated cross-project view of resource, usage and cost metadata.

BigQuery Foundation Usage Description
INFORMATION_SCHEMA.* Resource metadata views
INFORMATION_SCHEMA.JOBS BigQuery activity logging
CLOUD_BILLING_EXPORT Google Cloud Billing analysis
PUBSUB_BIGQUERY_SUBSCRIPTIONS Scheduled query monitoring
BIGQUERY_PRICING_EXPORT Activity-based cost analysis & scenario modelling

BQMANAGER Components

Configuration

BQMANAGER Resource Type Description
BQMANAGER.PROJECT_IDS SQL UDF An ARRAY<STRING> of in-scope project_ids passed as arguments to the BQMANAGER.REFRESH_* functions.

Functions

These procedures are required to refresh the BQMANAGER foundation tables, which use a variety of implementation strategies depending on the source structure and limitations. However, conceptually they are either FULL REFRESH tables for the majority or an INCREMENTAL table for the INFORMATION_SCHEMA.JOBS table.

BQMMANAGER Function Description
BQMANAGER.REFRESH_INFORMATION_SCHEMAS Recreates the full INFORMATION_SCHEMA_* tables for in-scope projects
BQMANAGER.REFRESH_INFORMATION_SCHEMA_JOBS Overwrites the previous n date partitions of the INFORMATION_SCHEMA.JOBS logs across all in-scope projects.

These are deployed into the local BQMANAGER dataset and call a set of procedures in the bqmanager.[region] datasets depending on the deployed region. These granular refresh functions are only permitted to licensed BQMANAGER users.

Note that by default the INFORMATION_SCHEMA.JOBS logs in BigQuery are automatically deleted after 180 days, making analysis beyond this point impossible, and the expiration period cannot be changed. BigQuery Manager enables permanent preservation of these logs for historic, granular analysis and year-on-year comparisons.

Foundation Tables

In order to enable cross-project analysis, these BigQuery foundations are used to build a set of foundational BigQuery Manager source tables in your local BQMANAGER dataset. Schemas for these foundational tables are precisely aligned to the BigQuery schemas for each data source.

BQMMANAGER Foundation Table Type Description
INFORMATION_SCHEMA_COLUMNS FULL REFRESH Cross-project INFORMATION_SCHEMA.COLUMNS
INFORMATION_SCHEMA_JOBS INCREMENTAL Cross-project, historic INFORMATION_SCHEMA.JOBS
INFORMATION_SCHEMA_PARTITIONS FULL REFRESH Cross-project INFORMATION_SCHEMA.PARTITIONS
INFORMATION_SCHEMA_TABLES FULL REFRESH Cross-project INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA_TABLES_METADATA FULL REFRESH Cross-project [dataset].__TABLES__
INFORMATION_SCHEMA_TABLE_OPTIONS FULL REFRESH Cross-project INFORMATION_SCHEMA.TABLE_OPTIONS
INFORMATION_SCHEMA_TABLE_STORAGE FULL REFRESH Cross-project INFORMATION_SCHEMA.TABLE_STORAGE
INFORMATION_SCHEMA_VIEWS FULL REFRESH Cross-project INFORMATION_SCHEMA.VIEWS
SCHEDULED_QUERY_LOGS LOGGING Cross-project raw Scheduled Query logs

These foundation tables are then used to build a specific and growing set of analytic tables, and can also be used as the foundation for custom analysis depending on your specific objectives.

Analytic Resources

This set of core analytic resources is growing and will continue to grow over time as additional generalised use-cases are identified via client implementations.

BQMMANAGER Analytic Resource Type Parameters Description
BQMAANAGER.TABLES VIEW None Integrated TABLES foundation view for all TABLE resources (BASE TABLES, EXTERNAL TABLES, PARTITIONED TABLES, SHARDED TABLES)
BQMANAGER.GET_SCHEDULED_QUERY_LOGS TABLE FUNCTION None Scheduled query logs with decoded payload and metadata
BQMANAGER.GET_TABLE_USAGE TABLE FUNCTION start_date DATE, end_date DATE Date-bounded referenced and destination logs aggregated by table_id
BQMANAGER.GET_TABLE_STATUS TABLE FUNCTION start_date DATE, end_date DATE Aggregated table metadata, date-bounded usage and usage status

Analysis Templates

A set of Looker Studio templates are also available based on the core analysis tables. These are replicated to client-specific data sources using our BigQuery implementation of the Looker Studio Linking API.

Deployment

We can deploy the standard BQMANAGER capabilities to any us or eu located dataset, which can then be used on any number of projects in that region. We have a standard consulting enagement through which we deploy all of the resources, and then an ongoing licence to enable refresh of the BQMANAGER foundation tables. Note that there are often complex permissions challenges to navigate and resolve as part of this process. We can also deploy BQMANAGER to different geographic locations upon request.

Please contact bqmanager@transformationflow.io for further information. We also offer bespoke consulting services based on this library and specific desired analytical outcomes.