Stefan Jenkins

Code, Math, & Science

Grouping by Regex in Oracle PL/SQL

29 Mar 2020

With Oracle PL/SQL you can use built in regular expression functions to match query results using SELECT statements. More on that documentation can be found here in the oracle reference documentation.

This entry will use the REGEX_SUBSTR() function to find strings that match the given pattern. To group by substring results, a nested select statement is used to refer to the named column. As a reference we will use Oracle’s built in HR schema Employee table.

As an example we can determine the maximum salary for each department using only the JOB_ID field:

1. Explore the data

  SELECT FIRST_NAME, LAST_NAME, JOB_ID, SALARY FROM Employees;
  


FIRST_NAME LAST_NAME JOB_ID SALARY
Steven King AD_PRES 24000
Neena Kochhar AD_VP 17000
Lex De Haan AD_VP 17000
Alexander Hunold IT_PROG 9000
Bruce Ernst IT_PROG 6000
David Austin IT_PROG 4800
Valli Pataballa IT_PROG 4800
... ... ... ...


2. Use a nested select statement to extract the department from JOB_ID and calculate the maximum salary by department using a GROUP BY clause. The pattern ‘^\w{2}’ means to select the first two alphabet characters at the start of the string.

  SELECT DEPARTMENT, MAX(SALARY) FROM (

      SELECT
          REGEXP_SUBSTR(JOB_ID,'^\w{2}') "DEPARTMENT",
          SALARY
      FROM Employees)

  GROUP BY DEPARTMENT;
  


DEPARTMENT MAX(SALARY)
FI 12008
ST 8200
HR 6500
AD 24000
SA 14000
MK 13000
PR 10000
AC 12008
IT 9000
PU 11000
SH 4200

A useful tool for checking regular expressions is regex101.