Stefan Jenkins

Code, Math, & Science

Utility of List Aggregation Function

28 Jun 2020

The list aggregation function in SQL is a very handy function for consolidating data from multiple rows in data based on a grouping variable. Be sure to try out this function using Oracle’s live SQL editor SQL Worksheet tool.

With Oracle PL/SQL you there is a built-in function for list aggregation with details found in the reference documentation.

This entry will show an example generating a CSV list of each of the salaries based on Job IDs. For this example we will use Oracle’s built in HR schema Employee table.

Syntax

LISTAGG(COLUMN_NAME, ‘.,delimiter’) WITHIN GROUP(ORDER BY GROUPING_VAR1, GROUPUING_VAR2,..)

Followed eventually by a grouping clause:

GROUP BY GROUPING_VAR1, GROUPUING_VAR2,..

Steps

1. Explore the data

  SELECT FIRST_NAME, LAST_NAME, JOB_ID, SALARY FROM Employees;
  


FIRST_NAME LAST_NAME JOB_ID SALARY
Steven King AD_PRES 24000
Neena Kochhar AD_VP 17000
Lex De Haan AD_VP 17000
Alexander Hunold IT_PROG 9000
Bruce Ernst IT_PROG 6000
David Austin IT_PROG 4800
Valli Pataballa IT_PROG 4800
... ... ... ...


2. Run a query to aggregate the salaries grouped by the job ID. The delimiter used can vary but commas or semi-colons are very readable. When joining text spaces can also be used. .

  SELECT
     JOB_ID,
     LISTAGG(SALARY, ',') WITHIN GROUP (ORDER BY JOB_ID) "Salaries"
  FROM HR.EMPLOYEES
  GROUP BY JOB_ID;
  


JOB_IDSalaries
AC_ACCOUNT8300
AC_MGR12008
AD_ASST4400
AD_PRES24000
AD_VP17000,17000
FI_ACCOUNT6900,7700,7800,8200,9000
FI_MGR12008
HR_REP6500
IT_PROG4200,4800,4800,6000,9000
MK_MAN13000
MK_REP6000
PR_REP10000
PU_CLERK2500,2600,2800,2900,3100
PU_MAN11000
SA_MAN10500,11000,12000,13500,14000
SA_REP10000,10000,10000,10500,11000,11500,6100,6200,6200,6400,6800,7000,7000,7000,7200,7300,7400,7500,7500,8000,8000,8400,8600,8800,9000,9000,9500,9500,9500,9600
SH_CLERK2500,2500,2600,2600,2800,2800,2900,3000,3000,3100,3100,3200,3200,3400,3600,3800,3900,4000,4100,4200
ST_CLERK2100,2200,2200,2400,2400,2500,2500,2500,2600,2700,2700,2800,2900,3100,3200,3200,3300,3300,3500,3600
ST_MAN5800,6500,7900,8000,8200

3. Sometimes it is also useful to only select only two or three of the items being consolidated. To do this use the PARTITION function combined with a WHERE filter. The below query limits the selection to only 3 salaries per Job ID.

SELECT * FROM (
  SELECT

    ROW_NUMBER() OVER(PARTITION BY JOB_ID ORDER BY JOB_ID) "NUM",
    E.*,
  FROM HR.EMPLOYEES E
  )
WHERE NUM <= 3 ;


NUMEMPLOYEE_IDFIRST_NAMELAST_NAMEEMAILPHONE_NUMBERHIRE_DATEJOB_IDSALARYCOMMISSION_PCTMANAGER_IDDEPARTMENT_ID
1206WilliamGietzWGIETZ515.123.818107-JUN-02AC_ACCOUNT8300 - 205110
1205ShelleyHigginsSHIGGINS515.123.808007-JUN-02AC_MGR12008 - 101110
1200JenniferWhalenJWHALEN515.123.444417-SEP-03AD_ASST4400 - 10110
1100StevenKingSKING515.123.456717-JUN-03AD_PRES24000 - - 90
1101NeenaKochharNKOCHHAR515.123.456821-SEP-05AD_VP17000 - 10090
2102LexDe HaanLDEHAAN515.123.456913-JAN-01AD_VP17000 - 10090
1109DanielFavietDFAVIET515.124.416916-AUG-02FI_ACCOUNT9000 - 108100
2110JohnChenJCHEN515.124.426928-SEP-05FI_ACCOUNT8200 - 108100
3111IsmaelSciarraISCIARRA515.124.436930-SEP-05FI_ACCOUNT7700 - 108100
1108NancyGreenbergNGREENBE515.124.456917-AUG-02FI_MGR12008 - 101100
... ... ... ... ... ... ... ... ... ... ... ...

4. Lastly combine steps 2 and 3 to consolidate the query and limit the results.

SELECT
   JOB_ID,
   LISTAGG(SALARY, ',') WITHIN GROUP (ORDER BY JOB_ID) "Salaries"
FROM (
SELECT * FROM (
    SELECT
      E.*,
      ROW_NUMBER() OVER(PARTITION BY JOB_ID ORDER BY JOB_ID) "NUM"
    FROM HR.EMPLOYEES E
    )
  WHERE NUM <= 3
)
GROUP BY JOB_ID;


JOB_IDSalaries
AC_ACCOUNT8300
AC_MGR12008
AD_ASST4400
AD_PRES24000
AD_VP17000,17000
FI_ACCOUNT6900,7800,9000
FI_MGR12008
HR_REP6500
IT_PROG4800,6000,9000
MK_MAN13000
MK_REP6000
PR_REP10000
PU_CLERK2500,2600,3100
PU_MAN11000
SA_MAN10500,11000,14000
SA_REP10000,6200,7000
SH_CLERK2600,2600,3200
ST_CLERK2500,2600,3200
ST_MAN7900,8000,8200