Data Metric Function in Snowflake
Data Metric Functions (DMF) allow you to automate data quality validation and monitoring. DMFs are defined as following:
Create a Data Metric Function to count null values
CREATE DATA METRIC FUNCTION IF NOT EXISTS count_of_nulls (ARG_T TABLE (ARG_C STRING)) RETURNS NUMBER AS 'SELECT COUNT IF(ARG_C IS NULL) from ARG_T'; -- Attach metric to the EMAIL COL column ALTER TABLE some_table ADD DATA METRIC FUNCTION count_of_nulls ON (EMAIL_COL); -- Tell Snowflake to evaluate the function every 1 minute ALTER TABLE some_table SET DATA_METRIC_SCHEDULE = '1 MINUTE'; -- View the metric to verify EMAIL COL doesn't include nulls SELECT to_number (VALUE) as count, TABLE_NAME, COLUMN_NAMES, FROM db.schema.data_metric_results WHERE COLUMN_NAME = 'EMAIL_COL' ORDER BY SCHEDULED_TIME DESC;
Data Metric Functions allow you to:
- Create custom quality metric rules as a reusable functions
- Apply rules to one or more columns
- Specific how frequently you want the quality metrics to be calculated
- Execute metric functions ad-hoc (for testing) or incorporate into your pipelines
- Metrics are executed by the serverless backend, no need to keep your warehouse running
- View and manage all of your quality metrics in a single place
Creating a Data Metric Function is as simple as creating a UDF. The function takes a new TABLE data type with one or more column arguments and returns a single result value.
CREATE DATA METRIC FUNCTION IF NOT EXISTS count_of_nulls ( ARG_T TABLE (ARG_C NUMBER) ) RETURNS NUMBER AS ‘SELECT COUNT_IF(ARG_C IS NULL) from ARG_T’ ;
All Data Metric Functions are added to your warehouse so other users can reuse them in a consistent manner without having to recreate the same rules over and over.
You can see which metrics are available by running:
SHOW DATA METRIC FUNCTIONS IN MY_DB.MY_SCHEMA;
Now you need to attach the Data Metric Function to the columns you want to evaluate, like this:
ALTER TABLE some_table ADD DATA METRIC FUNCTION count_of_nulls ON (EMAIL_COL);
Next, you set the interval at which metrics should be calculated. This is configured at the table level and can be as frequent as 1 minute.
ALTER TABLE some_table SET DATA_METRIC_SCHEDULE = '1 MINUTE';
You can then query the metrics results by querying the
DATA_METRIC_RESULTS table from schema where tables to be validated reside.
SELECT to_number(VALUE) as count, TABLE_NAME, COLUMN_NAMES, FROM my_db.my_schema.data_metric_results WHERE COLUMN_NAME = 'EMAIL_COL' ORDER BY SCHEDULED_TIME DESC;
You can test DMFs by executing them ad-hoc or include them directly in your data pipelines running on Snowflake for in-line validation. This is how you execute a DMF:
SELECT MY_DB.DATA_METRICS.count_of_nulls ( SELECT email_col FROM CUSTOMERS ) AS VALUE;