Query Optimization
Analyze and improve query performance with profiling tools, execution plans, and optimization techniques.
Performance Analysis Tools
Query Profile
Visual execution plan analysis in Databend Cloud
- Access: Monitor → SQL History → Query Profile tab
- Shows: Execution nodes, timing, resource usage
- Use for: Identifying bottlenecks, understanding query execution
Query Hash
Unique query fingerprinting for performance tracking
-- Get query fingerprint
SELECT query_hash('SELECT * FROM table WHERE id = ?');
Query Optimization Fundamentals
Execution Plan Analysis
-- View query execution plan
EXPLAIN SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date >= '2023-01-01';
Look for:
- Table scans vs index usage
- Join algorithms (hash, merge, nested loop)
- Filter pushdown effectiveness
- Resource consumption estimates
Index Strategy
-- Create indexes for common query patterns
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_customer ON orders(customer_id);
Index Guidelines:
- Index WHERE clause columns
- Index JOIN columns on both sides
- Consider composite indexes for multi-column filters
- Monitor index usage statistics
Performance Optimization Techniques
Query Rewriting
-- ❌ Inefficient: Function on column prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
-- ✅ Optimized: Range condition can use index
SELECT * FROM orders
WHERE order_date >= '2023-01-01'
AND order_date < '2024-01-01';
Filter Pushdown
-- ❌ Filter after join
SELECT * FROM (
SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
) WHERE order_date >= '2023-01-01';
-- ✅ Filter before join
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date >= '2023-01-01';
Aggregation Optimization
-- Use appropriate GROUP BY extensions
SELECT
region,
product_category,
COUNT(*) as sales_count,
SUM(amount) as total_sales
FROM sales
GROUP BY CUBE(region, product_category);
Common Performance Issues
Issue 1: Large Result Sets
-- ❌ Problem: No limit on large table
SELECT * FROM events ORDER BY timestamp DESC;
-- ✅ Solution: Always limit exploratory queries
SELECT * FROM events ORDER BY timestamp DESC LIMIT 1000;
Issue 2: Inefficient Joins
-- ❌ Problem: Cartesian product
SELECT * FROM table1, table2 WHERE condition;
-- ✅ Solution: Explicit join with proper conditions
SELECT * FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.foreign_id
WHERE condition;
Issue 3: Unnecessary Complexity
-- ❌ Problem: Nested subqueries
SELECT * FROM (
SELECT * FROM (
SELECT col1, col2 FROM table WHERE condition1
) WHERE condition2
) WHERE condition3;
-- ✅ Solution: Combine conditions
SELECT col1, col2 FROM table
WHERE condition1 AND condition2 AND condition3;
Monitoring and Metrics
Key Performance Indicators
- Query execution time
- Rows scanned vs rows returned
- Memory usage
- CPU utilization
- I/O operations
Performance Monitoring Query
-- Find slow queries from query history
SELECT
query_text,
query_duration_ms,
scan_bytes,
result_bytes,
memory_usage
FROM system.query_log
WHERE query_duration_ms > 10000 -- Queries over 10 seconds
ORDER BY query_duration_ms DESC
LIMIT 10;
Optimization Checklist
Query Design
- Use appropriate WHERE conditions
- Minimize data scanned with column selection
- Apply filters before joins
- Use proper join types
- Limit result sets appropriately
Indexing
- Index frequently filtered columns
- Index join columns
- Remove unused indexes
- Monitor index effectiveness
Schema Design
- Choose appropriate data types
- Normalize appropriately (avoid over-normalization)
- Consider partitioning for large tables
- Use clustering keys for sort optimization
Advanced Optimization
Materialized Views
-- Pre-compute expensive aggregations
CREATE MATERIALIZED VIEW daily_sales AS
SELECT
DATE(order_time) as order_date,
product_id,
COUNT(*) as order_count,
SUM(amount) as total_sales
FROM orders
GROUP BY DATE(order_time), product_id;
Query Hints
-- Force specific join algorithm when needed
SELECT /*+ USE_HASH_JOIN */ *
FROM large_table l
JOIN small_table s ON l.id = s.foreign_id;
Best Practices Summary
- Measure first - Use Query Profile to identify bottlenecks
- Index strategically - Cover your query patterns
- Filter early - Apply WHERE conditions as soon as possible
- Limit appropriately - Don't fetch more data than needed
- Monitor continuously - Track query performance over time