Skyline Query to identify problematic queries in Snowflake
Selecting the correct Warehouse size in Snowflake can be tricky. If your queries are spilling over to the Remote Storage then you either need to re-write the query or increase the Warehouse size. But how do you identify these problematic queries that are running on undersized Warehouse? Identifying these queries is a Maximum Vector problem that be solved using Skyline queries. A Skyline query finds points in a vector space that dominate other points. Here is how a Skyline Query can be used to identify Snowflake queries that are problematic.
with query_history as ( select query_text , bytes_spilled_to_remote_storage , decode(warehouse_size, 'X-Small', 1, 'Small', 2, 'Medium', 3, 'Large', 4, 'X-Large', 5) as warehouse_size from snowflake.account_usage.query_history where warehouse_size is not null and bytes_spilled_to_remote_storage > 1000 order by start_time desc limit 10000 ) select query_text , bytes_spilled_to_remote_storage , warehouse_size from query_history as o where not exists ( -- Skyline query to identify worst performing queries select 1 from query_history as i where i.warehouse_size <= o.warehouse_size and i.bytes_spilled_to_remote_storage >= o.bytes_spilled_to_remote_storage and (i.warehouse_size < o.warehouse_size or i.bytes_spilled_to_remote_storage > o.bytes_spilled_to_remote_storage) );
|select * from …||4||1|
|select * from …||5||4|