Database developers have it tough. Whether they use SQL Server, Oracle, DB2, MySQL, PostgreSQL, or SQLite, the challenges are similar. It’s too easy to write queries that perform badly, that waste system resources, or that don’t take advantage of database features designed to make life easier.
Here are seven common traps to avoid when writing database applications.
7 SQL mistakes to avoid
- Blindly reusing queries
- Nesting views
- Running large, multi-table operations in a single transaction
- Clustering on GUIDs or other “volatile” columns
- Counting rows to check if data exists
- Using triggers
- Doing negative searches
Blindly reusing queries
An SQL query is typically tailored to retrieve the data needed for a specific job. If you repurpose a query that fits most of your use case, it may seem to work, but it could also supply too much data. That takes a toll on performance and resources, some of which won’t manifest until you hit scale. Always examine queries you intend to repurpose and trim them to fit the new use case.
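To make the point concrete, here is a minimal sketch in Python using SQLite (which the article already mentions); the orders table and its columns are hypothetical:

```python
import sqlite3

# Hypothetical schema: an "orders" table with more columns
# than any one report actually needs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer TEXT, status TEXT,
    total REAL, notes TEXT)""")
conn.executemany(
    "INSERT INTO orders (customer, status, total, notes) VALUES (?, ?, ?, ?)",
    [("alice", "open", 10.0, "x" * 1000), ("bob", "closed", 20.0, "y" * 1000)],
)

# Reused query: drags along the wide "notes" column the new job never reads.
reused = conn.execute("SELECT * FROM orders WHERE status = 'open'").fetchall()

# Trimmed query: fetches only what this use case needs.
trimmed = conn.execute(
    "SELECT id, total FROM orders WHERE status = 'open'").fetchall()
```

Both queries return the same matching rows, but the trimmed version moves far less data per row.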
Nesting views
Views provide a standard way of looking at data and keep users from having to deal with complex queries. The problem arises when we use views to query other views.
Nesting views, as this practice is called, has multiple drawbacks. For one, nested views query far more data than you typically need. They also obscure how much work is done to retrieve a given set of data. And they make it difficult (sometimes impossible) for the database’s query optimizer to optimize the resulting queries.
If you use a view, don’t query other views with it. Any nested views should be “flattened” and rewritten to retrieve only what’s needed.
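Here is a small illustration of flattening a nested view, using SQLite from Python; the sales table and view names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO sales (region, amount) VALUES ('east', 100), ('west', 50), ('east', 25);

-- A view built on another view: each layer hides work from the next.
CREATE VIEW v_all_sales AS SELECT id, region, amount FROM sales;
CREATE VIEW v_east_sales AS SELECT * FROM v_all_sales WHERE region = 'east';
""")

# Nested-view query: the optimizer has to see through two layers.
nested = conn.execute("SELECT SUM(amount) FROM v_east_sales").fetchone()[0]

# Flattened equivalent: one query against the base table,
# retrieving only what's needed.
flat = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'").fetchone()[0]
```

The two queries return the same answer, but the flattened form makes the actual work visible and easier for the optimizer to plan.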
Running large, multi-table operations in a single transaction
Let’s say you need to delete data from 10 tables as part of some operation. You might be tempted to run all the deletes across all the tables in a single transaction—but don’t do it. Instead, handle each table’s operations separately.
If you need the deletes across tables to happen atomically, you can break it up into many smaller transactions. For instance, if you have 10,000 rows that need deleting across 20 tables, you can delete the first thousand across all 20 tables in one transaction, then the next thousand in another transaction, and so on. (This is another good use case for a task queue mechanism in your business logic, where operations like these can be managed, paused, and resumed if needed.)
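A sketch of the batching pattern, using SQLite from Python; the events table, its expired flag, and the batch size of 1,000 are all illustrative choices:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, expired INTEGER)")
conn.executemany("INSERT INTO events (expired) VALUES (?)",
                 [(i % 2,) for i in range(10_000)])
conn.commit()

BATCH = 1000
deleted = 0
while True:
    with conn:  # each batch runs in its own transaction, committed on exit
        cur = conn.execute(
            """DELETE FROM events WHERE id IN
               (SELECT id FROM events WHERE expired = 1 LIMIT ?)""", (BATCH,))
    if cur.rowcount == 0:
        break
    deleted += cur.rowcount

remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Each pass holds locks only for its own batch, so other work can proceed between batches, and a failure rolls back at most one batch rather than the whole cleanup.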
Clustering on GUIDs or other ‘volatile’ columns
GUIDs, or globally unique identifiers, are 16-byte random values used to give objects a distinct identity. Many databases support them as a native column type, but they should not be used to cluster the rows they live in. Because they’re random, they fragment the table’s clustering badly, and table operations can quickly become orders of magnitude slower. In short, don’t cluster on any column with a lot of randomness. Sequential values, such as dates or identity columns, work best.
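The difference between the two key styles can be sketched in a few lines of Python; this only illustrates the sort-order behavior, not the database’s page-splitting internals:

```python
import uuid

# Random GUIDs: each new key lands at an arbitrary point in sort order,
# so a table clustered on this column fragments as rows are inserted.
random_keys = [str(uuid.uuid4()) for _ in range(100)]

# Monotonic keys (an identity column, or an insert timestamp): every new
# key sorts after the previous one, so new rows append at the end of the
# clustered index instead of splitting pages in the middle.
sequential_keys = list(range(1, 101))
```

The sequential keys arrive already in sorted order, which is exactly the property a clustered index wants; the GUIDs, while guaranteed unique, arrive in effectively shuffled order.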
Counting rows to check if data exists
Using an operation like SELECT COUNT(ID) FROM table1 to determine whether some data exists in a table is often inefficient. Some databases can intelligently optimize SELECT COUNT() operations, but not all have that capability. The better approach, if your SQL dialect offers it, is to use something like IF EXISTS (SELECT 1 FROM table1 LIMIT 1) BEGIN ... END.
If it’s the row count you want, another approach is to obtain row-count statistics from the system tables. Some database vendors also have specific queries; for example, in MySQL you can use SHOW TABLE STATUS to get stats about all tables, including row counts. With Microsoft T-SQL, there’s the stored procedure sp_spaceused.
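As a quick sketch of the existence-check approach, here it is in SQLite via Python; SQLite has no IF ... BEGIN block, so a scalar EXISTS expression stands in for it, and the table1 name follows the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO table1 (val) VALUES (?)",
                 [(f"row{i}",) for i in range(1000)])

# Counting tallies every row...
count = conn.execute("SELECT COUNT(id) FROM table1").fetchone()[0]

# ...while an EXISTS probe can stop at the first row it finds.
has_rows = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM table1 LIMIT 1)").fetchone()[0]
```

Both answer the yes/no question, but the EXISTS probe does not need to touch more than one row.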
Using triggers
As convenient as triggers are, they come with a big limitation: they must happen in the same transaction as the original operation. If you create a trigger that modifies one table when another is modified, both tables will be locked at least until the trigger finishes. If you must use a trigger, make sure it won’t lock more resources than you can tolerate. A stored procedure might be the better solution, because it can break trigger-like operations across multiple transactions.
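For illustration, here is a minimal SQLite trigger driven from Python; the accounts and audit tables are hypothetical, and the point is that the audit insert happens inside the same transaction as the update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE audit (id INTEGER PRIMARY KEY, account_id INTEGER, note TEXT);

-- The trigger body runs inside the same transaction as the UPDATE,
-- so both tables are written (and locked) together.
CREATE TRIGGER trg_audit AFTER UPDATE ON accounts
BEGIN
    INSERT INTO audit (account_id, note) VALUES (NEW.id, 'balance changed');
END;

INSERT INTO accounts (balance) VALUES (100.0);
""")

with conn:  # one transaction covers the update AND the trigger's insert
    conn.execute("UPDATE accounts SET balance = 50.0 WHERE id = 1")

audit_rows = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

The audit row appears only because the trigger fired within the update’s transaction, which is exactly the coupling that makes triggers risky for heavier work.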
Doing negative searches
Negative searches, such as SELECT * FROM Users WHERE Users.Status <> 2, are problematic. An index on the Users.Status column is useful, but negative searches like this typically fall back to a table scan. The better solution is to write queries so that they use covering indexes efficiently. For example: SELECT * FROM Users WHERE Users.ID NOT IN (SELECT Users.ID FROM Users WHERE Users.Status = 2). This allows us to use the indexes on both the ID and Status columns to pare out what we don’t want, without performing table scans.
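A quick check, in Python with SQLite, that the rewritten query returns the same rows as the negative search; the sample Users data is invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, Status INTEGER)")
conn.execute("CREATE INDEX idx_status ON Users (Status)")
conn.executemany("INSERT INTO Users (Status) VALUES (?)",
                 [(s,) for s in (1, 2, 2, 3, 1)])

# Negative search: typically degenerates to a full table scan.
negative = conn.execute(
    "SELECT ID FROM Users WHERE Status <> 2 ORDER BY ID").fetchall()

# Rewritten form from the article: pare out the unwanted rows via the index.
rewritten = conn.execute(
    """SELECT ID FROM Users WHERE ID NOT IN
       (SELECT ID FROM Users WHERE Status = 2) ORDER BY ID""").fetchall()
```

The two queries are equivalent in results; the rewritten form gives the optimizer a positive predicate (Status = 2) that an index can satisfy directly.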
Copyright © 2023 IDG Communications, Inc.
This story originally appeared on InfoWorld.