Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analyst's out there in the world who use it.
Previously on the R.A.G....
Article No Longer Available
Is WLM killing your query?
Do tables seem to not play ball?
There is a LOT to consider when trying to solve the above.
Here's the process for you to work through:
✅ Check the Explain Plan.
Do this for a single query at a time, not your whole script. But start by getting Redshift to tell you how it's going to execute your query. Make sure to look for actions with high costs, sequential scans or nested loops. If you can avoid them, or break your query into smaller tasks this will help you a lot.
Article No Longer Available
✅ Understand the Distribution and Sorting of the tables you are dealing with
Whether the tables you are dealing with are built by you or someone else, their configuration could be working against you.
Run the below query and make note of:
- distyle_and_key: How is the table distributed across nodes?
- row_skew_ratio: This is the effectiveness of the dist key, the closer to 1 the better.
- first_sortkey: How is it sorted on the node?
- no_sort_keys: How many sort keys?
- sortkey_skew_ratio: This is the effectiveness of the sort key, the closer to 1 the better.
- percent_unsorted: How long has this table been since vacuum
- stats_needed: If yes, the table needs 'analyst statistics' before the leader node knows how to handle it properly.
select i.schema as schema_location,
i.table as table_name,
i.encoded as are_columns_encoded,
i.diststyle as distyle_and_key,
i.sortkey1 as first_sortkey,
i.sortkey1_enc as sortkey_compression,
i.sortkey_num as no_sort_keys,
i.skew_sortkey1 as sortkey_skew_ratio,
i.size as size_in_blocks_mb,
i.tbl_rows as total_rows,
i.skew_rows as row_skew_ratio,
i.pct_used as percent_space_used,
i.unsorted as percent_unsorted,
i.stats_off as stats_needed
from svv_table_info i
where i.table = 'table_name'
limit 50
Article No Longer Available
Article No Longer Available
✅ Run your query - read the error logs
If you can run your query and its not being killed by the WLM or crashing, then check the Redshift error logs on how to make it run faster.
The table contains an EVENT and a SOLUTION table, this may provide some key information on how to make your query run faster.
select l.event,
l.solution,
q.querytxt
from stl_alert_event_log l
join stl_query q
on q.query = l.query
where l.userid in (select usesysid from pg_user where usename ilike '%name%') --change to your name
order by l.event_time desc
limit 10
✅ Are you fighting for resources?
In some scenarios your query may be slow because of lack of resources, or you had to wait until a slot opened up. Run the below query to see how your queries are being handled.
select c.name as process_queue,
w.slot_count,
datediff(seconds,w.queue_start_time,w.queue_end_time) as q_wait_time_seconds,
w.exec_start_time,
w.exec_end_time,
datediff(seconds,w.exec_start_time,w.exec_end_time) as exec_time_seconds,
w.final_state,
w.est_peak_mem,
q.querytxt
from stl_wlm_query w
join stv_wlm_service_class_config c
on c.service_class = w.service_class
join stl_query q
on q.query = w.query
where w.userid in (select usesysid from pg_user where usename ilike '%user%') --change to your name
order by w.xid desc;
✅ Understand best practice
Article No Longer Available
header image drawn by me
Who am I?
Ronsoak 🏳️🌈🇳🇿@ronsoakWho Am I❓
|🇳🇿 |🇬🇧 |🏳️🌈
📊 Senior Data Analyst for @Xero
⚠️ My words do not represent my company.
✍️ Writer on @ThePracticalDev (dev.to/ronsoak)
✍️ Writer on @Medium (medium.com/@ronsoak)
🎨 I draw (instagram.com/ronsoak_art/)
🛰️ I ❤️ space!03:11 AM - 13 Oct 2019
Top comments (5)
The best guide to optimize your redshift queries.
Thank you!!!
Thanks :)
I came for the advice, I stayed for gifs. A+ article.
Thank you!!!!!
Thank you very much. This is a really great guide on Redshift.