More Structured Query Language Basics

Posted in software by Christopher R. Wirz on Wed Mar 14 2012



NULL Values and Three-Valued Logic

  • NULL: Represents missing values in a database with three possible interpretations:

    • Value unknown: Value exists but is not known, or it's unknown whether the value exists
    • Value unavailable: Value exists but is purposely withheld
    • Value not applicable: The attribute doesn't apply to this tuple
  • Three-Valued Logic: SQL uses TRUE, FALSE, and UNKNOWN (instead of just TRUE/FALSE)

    • Comparisons involving NULL produce UNKNOWN results
    • Only tuples that evaluate to TRUE (not FALSE or UNKNOWN) are selected in WHERE clauses
    • Use IS NULL or IS NOT NULL for NULL comparisons (not = or <>)

Nested Queries

  • Nested Query: A complete SELECT-FROM-WHERE query within another SQL query
  • Outer Query: The query that contains the nested query
  • Comparison Operators:

    • IN: Tests if a value is in a set of values
    • = ANY or = SOME: Equivalent to IN
    • > ALL: True if value is greater than all values in the set
    • Other operators: <, <=, >=, <> can be used with ANY, SOME, or ALL
  • Correlated Nested Queries: When a condition in the nested query references an attribute from the outer query

    • Evaluated once for each tuple in the outer query

EXISTS and UNIQUE Functions

  • EXISTS: Returns TRUE if the nested query returns at least one tuple
  • 
    -- Find employees who have at least one dependent
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE EXISTS (
        SELECT *
        FROM DEPENDENT
        WHERE EMPLOYEE.Ssn = DEPENDENT.Essn
    );
    
  • NOT EXISTS: Returns TRUE if the nested query returns no tuples
  • 
    -- Find employees who have no dependents
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS (
        SELECT *
        FROM DEPENDENT
        WHERE EMPLOYEE.Ssn = DEPENDENT.Essn
    );
    
  • UNIQUE: Returns TRUE if there are no duplicate tuples in the result. Note that UNIQUE is less commonly implemented in all SQL database systems compared to EXISTS and NOT EXISTS. Some database systems may require alternative approaches to achieve the same functionality.
  • 
    -- Find projects where each assigned employee works the same number of hours
    SELECT Pnumber, Pname
    FROM PROJECT
    WHERE UNIQUE (
        SELECT Hours
        FROM WORKS_ON
        WHERE WORKS_ON.Pno = PROJECT.Pnumber
    );
    

Joined Tables

  • Joined Table: Specifies joins in the FROM clause rather than in the WHERE clause
  • JOIN Types:
    • INNER JOIN (or JOIN): Returns only matching rows. If only JOIN is specified in an SQL statement without any additional qualifier, an INNER JOIN is executed. The JOIN keyword by itself is shorthand for INNER JOIN in standard SQL. An inner join returns only the rows where there is a match between the tables being joined based on the specified join condition.
    • 
      -- these two statements are equivalent
      
      SELECT *
      FROM Employee JOIN Department
      ON Employee.DeptId = Department.DeptId;
      
      SELECT *
      FROM Employee INNER JOIN Department
      ON Employee.DeptId = Department.DeptId;
      
    • LEFT OUTER JOIN: All rows from left table, matched with right table
    • RIGHT OUTER JOIN: All rows from right table, matched with left table
    • FULL OUTER JOIN: All rows from both tables
    • NATURAL JOIN: Implicit equi-join (a type of join that combines tables based on a matching column between them, specifically where the values in the specified columns are equal) on columns with same name
    • CROSS JOIN: Cartesian product of tables. A CROSS JOIN performs a Cartesian product between two tables, combining each row from the first table with every row from the second table. This results in a table that has the number of rows equal to the product of the number of rows in the two input tables.
    • 
      -- this query would give all possible combinations of EmployeeId, Name, DepartmentID, and DepartmentName
      
      SELECT e.EmployeeId, e.Name, d.DepartmentId, d.DepartmentName
      FROM Employee e
      CROSS JOIN Department d;
      

Aggregate Functions

  • COUNT: Returns number of tuples or values
  • SUM: Returns sum of numeric values
  • MAX: Returns maximum value
  • MIN: Returns minimum value
  • AVG: Returns average (mean) of values
  • NULL Handling: NULLs are removed before calculation (except COUNT(*))

Grouping and HAVING

  • GROUP BY: Partitions tuples into groups based on specified attribute values
  • HAVING: Filters groups (applied after grouping)
  • Key Points:
    • WHERE filters individual tuples (before grouping)
    • HAVING filters groups (after grouping)
    • 
      -- this query 
      --  Joins the Employee and Department tables
      --  Groups the results by department name
      --  Calculates two aggregate values for each group
      --      Count of employees in each department
      --      Average salary in each department
      --  Uses HAVING to filter the groups, showing only departments that have:
      --      An average salary greater than $50,000, AND
      --      At least 3 employees
      
      SELECT Department.Dname AS DepartmentName, 
             COUNT(Employee.Ssn) AS EmployeeCount, 
             AVG(Employee.Salary) AS AverageSalary
      FROM Employee
      JOIN Department ON Employee.Dno = Department.Dnumber
      GROUP BY Department.Dname
      HAVING AVG(Employee.Salary) > 50000 AND COUNT(Employee.Ssn) >= 3;
      
    • SELECT clause with GROUP BY should include only grouping attributes and aggregate functions

WITH and CASE Clauses

  • WITH: Defines a temporary table for use in a single query (similar to a temporary view)
  • CASE: Allows different values based on specific conditions
    • Can be used in SELECT, INSERT, and UPDATE statements

Recursive Queries

  • Introduced in SQL:99 for handling hierarchical/recursive relationships
  • Uses WITH RECURSIVE clause
  • Example applications: organizational hierarchies, bill of materials
  • 
    -- this query
    --  Defines a recursive Common Table Expression (CTE) named EmployeeHierarchy
    --  Starts with a base case that selects the top manager (employee ID 100)
    --  Combines this with a recursive part that finds all direct reports to anyone already in the result
    --  Tracks the level in the hierarchy with a counter
    --  The outer query formats the results to show the reporting structure visually through indentation
    
    WITH RECURSIVE EmployeeHierarchy AS (
        -- Base case: start with the top manager (employee ID 100)
        SELECT EmployeeID, Name, Title, ManagerID, 1 AS Level
        FROM Employees
        WHERE EmployeeID = 100
        
        UNION ALL
        
        -- Recursive case: find all employees who report to someone in our current result set
        SELECT e.EmployeeID, e.Name, e.Title, e.ManagerID, eh.Level + 1
        FROM Employees e
        JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    )
    SELECT 
        EmployeeID,
        Name,
        Title,
        Level,
        CONCAT(REPEAT('    ', Level-1), Name) AS HierarchyDisplay
    FROM EmployeeHierarchy
    ORDER BY Level, Name;
    
    -- The query will return all employees in the hierarchy, with each employee's level and a formatted display showing the organizational structure.
    

Views

  • View: A virtual table derived from other tables (base tables or other views)
  • Benefits:

    • Simplifies query writing
    • Provides security (restricting access to certain columns/rows)
    • Offers data independence
  • View Implementation:

    • Query modification: Transforms view query into a query on base tables
    • View materialization: Physically creates a temporary table
  • View Updates:

    • Single-table views may be updatable if they contain the primary key
    • Views with joins or aggregate functions are generally not updatable
    • Use WITH CHECK OPTION to enforce view update constraints

Schema Changes

  • DROP Commands:

    • DROP SCHEMA: Removes an entire schema
    • DROP TABLE: Removes a table
    • Options: CASCADE (removes dependent objects) or RESTRICT (fails if dependencies exist)
  • ALTER Commands:

    • ALTER TABLE ADD COLUMN: Adds a new column
    • ALTER TABLE DROP COLUMN: Removes a column
    • ALTER TABLE ALTER COLUMN: Modifies column properties
    • ALTER TABLE ADD CONSTRAINT: Adds a constraint
    • ALTER TABLE DROP CONSTRAINT: Removes a constraint

Constraints and Triggers

  • Assertions: General constraints using CREATE ASSERTION

    • Can specify complex integrity rules across multiple tables
  • Triggers: Automatic actions performed when events occur

    • Components: Event, Condition, Action (ECA rules)
    • Events: INSERT, UPDATE, DELETE operations
    • Can execute before or after the triggering event
    • Used for maintaining consistency, monitoring updates, and automating actions

Query Execution Order

Conceptual order of query evaluation:

  1. FROM clause (identify tables and perform joins)
  2. WHERE clause (filter tuples)
  3. GROUP BY (create groups)
  4. HAVING (filter groups)
  5. SELECT (select columns)
  6. ORDER BY (sort results)

SQL Relations vs. ER Model Relations

Relations in SQL tables and those in the formal ER model differ in several important ways:

  1. Mathematical vs. Implementation: ER relations are based on mathematical set theory where relations are strictly defined as sets of tuples with no duplicates. SQL tables are physical implementations that can contain duplicate rows.

  2. Ordering: ER relations have no inherent ordering of tuples or attributes, while SQL tables can have physical ordering and indexed access paths.

  3. Null Values: Formal ER relations don't typically accommodate null values, while SQL tables explicitly support nulls.

  4. Constraints: ER model constraints are represented conceptually, while in SQL they're implemented through specific declaration mechanisms.

Terminology Differences

ER Model SQL
Entity Row or Record
Attribute Column
Relationship Foreign Key connection
Entity Set Table
Domain Data Type
Tuple Row

SQL and Duplicate Tuples

SQL allows duplicate tuples (rows) in tables and query results for several reasons:

  1. Performance: Enforcing uniqueness requires additional processing overhead.
  2. Practical Applications: Many real-world applications need duplicates (e.g., transaction records).
  3. Query Operations: Operations like joins and aggregations often produce meaningful duplicates.
  4. Multiset Semantics: SQL is based on multiset (bag) rather than set semantics.

To eliminate duplicates, SQL provides the DISTINCT keyword.

SQL Data Types

Common SQL data types include:

  • Numeric: INTEGER, SMALLINT, BIGINT, DECIMAL, NUMERIC, FLOAT, REAL, DOUBLE PRECISION
  • Character: CHAR, VARCHAR, TEXT
  • Boolean: BOOLEAN
  • Temporal: DATE, TIME, TIMESTAMP, INTERVAL
  • Binary: BLOB, BINARY, VARBINARY
  • Specialized: XML, JSON, UUID, ARRAY, ENUM
  • Spatial: GEOMETRY, GEOGRAPHY (in some implementations)

SQL Integrity Constraints

Entity Integrity

SQL implements entity integrity through:

  • PRIMARY KEY constraints
  • UNIQUE constraints
  • NOT NULL constraints

Referential Integrity

Implemented through:

  • FOREIGN KEY constraints with references to primary or unique keys

Referential Triggered Actions

SQL supports referential triggered actions through:

  • ON DELETE: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION
  • ON UPDATE: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION

Example:


FOREIGN KEY (dept_id) REFERENCES Department(id)
    ON DELETE CASCADE
    ON UPDATE SET NULL

SQL Query Syntax

A simple SQL retrieval query consists of four main clauses:

  1. SELECT (required)

    • Attribute list or expressions
    • Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
    • DISTINCT keyword to eliminate duplicates
    • Arithmetic expressions
    • String functions
  2. FROM (required)

    • Table names
    • Table aliases
    • Join operations (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN)
    • Subqueries as derived tables
  3. WHERE (optional)

    • Comparison conditions (=, <>, <, >, <=, >=)
    • Logical operators (AND, OR, NOT)
    • Pattern matching (LIKE, NOT LIKE)
    • Membership tests (IN, NOT IN)
    • Existence tests (EXISTS, NOT EXISTS)
    • NULL tests (IS NULL, IS NOT NULL)
    • Subqueries
  4. ORDER BY (optional)

    • Column names or positions for sorting
    • ASC or DESC for sort direction
    • Multiple sort keys

Example:


SELECT e.first_name, e.last_name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id
WHERE e.hire_date > '2020-01-01' AND d.location = 'New York'
ORDER BY e.last_name, e.first_name;