
Aggregator Not Preceded by ‘Check’ Sort

Rule name: Aggregator not preceded by a ‘Check’ Sort.
Parallel Job: Yes
Server Job: -
Job Sequence: -
Description: Identifies Parallel Aggregator Stages not preceded by a ‘Check’ Sort Stage.

Introduction

The Aggregator stage summarises data rows from a single input link into groups, computing totals or other aggregate functions for each group. To correctly configure an Aggregator stage, you need to ensure two things:

  • The input link is partitioned on the Aggregator’s specified grouping keys, so that records with identical grouping key values are present in the same partition, and

  • The input link rows are sorted on the Aggregator’s specified grouping keys.
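To show why the second requirement matters, the Python sketch below imitates a sort-based aggregator that closes a group whenever the key value changes between adjacent rows. It illustrates the principle only and does not show how the DataStage Aggregator is implemented. The function `aggregate_sorted` and the sample columns `cust` and `amt` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key, value):
    """Sum `value` per group, starting a new group each time `key` changes.

    Only adjacent rows are compared, so the result is correct only when
    `rows` arrive sorted (or at least grouped) on `key`.
    """
    return [(k, sum(r[value] for r in grp))
            for k, grp in groupby(rows, key=itemgetter(key))]


rows = [
    {"cust": "A", "amt": 10},
    {"cust": "B", "amt": 5},
    {"cust": "A", "amt": 7},
]

# Unsorted input: customer A is split into two partial groups.
print(aggregate_sorted(rows, "cust", "amt"))   # [('A', 10), ('B', 5), ('A', 7)]

# Input sorted on the grouping key: one total per customer.
print(aggregate_sorted(sorted(rows, key=itemgetter("cust")), "cust", "amt"))
# [('A', 17), ('B', 5)]
```

With unsorted input, two partial totals are emitted for customer A. Sorting on the grouping key first produces a single, correct total per customer.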

The second requirement can be met in several ways. You can sort the data in the job using a Sort stage, or you can read the data from a pre-sorted source (such as a database connector using an ORDER BY clause). In either case, this optional rule enforces a design pattern in which you verify that the Aggregator’s incoming data is sorted appropriately by using a ‘Check’ sort.

The existence of this rule does not imply that it should be used in all (or indeed any!) instances. It’s provided as an example should this be a rule your organization wishes to apply.

Description

This rule identifies Parallel Aggregator Stages that are not preceded by a ‘Check’ Sort Stage. A ‘Check’ sort is a Sort Stage in which every sort key has a Sort Key Mode property of 'Do not Sort (Previously Sorted)'.
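The rule's logic can be sketched in Python against a deliberately simplified, hypothetical job model. The `Stage` class, its fields, and the helper functions are assumptions made for illustration and do not correspond to any DataStage API or export format. The sketch checks only the Sort Key Mode values. It does not confirm that the Sort stage's keys match the Aggregator's grouping keys.

```python
from dataclasses import dataclass, field

# Sort Key Mode value that marks a key as "check only" (text as used in this rule).
PREVIOUSLY_SORTED = "Do not Sort (Previously Sorted)"


@dataclass
class Stage:
    """Hypothetical, simplified stage model used only for this sketch."""
    name: str
    stage_type: str                                           # e.g. "Aggregator", "Sort"
    sort_key_modes: list[str] = field(default_factory=list)   # Sort stages only
    inputs: list["Stage"] = field(default_factory=list)       # directly upstream stages


def is_check_sort(stage: Stage) -> bool:
    """True for a Sort stage with at least one key, all marked previously sorted."""
    return (stage.stage_type == "Sort"
            and len(stage.sort_key_modes) > 0
            and all(m == PREVIOUSLY_SORTED for m in stage.sort_key_modes))


def aggregators_without_check_sort(stages: list[Stage]) -> list[str]:
    """Names of Aggregator stages not directly preceded by a check sort."""
    return [s.name for s in stages
            if s.stage_type == "Aggregator"
            and not any(is_check_sort(up) for up in s.inputs)]


# Example job: one compliant Aggregator and two violations.
src = Stage("DB_Source", "Connector")
check = Stage("Check_Sort", "Sort", [PREVIOUSLY_SORTED, PREVIOUSLY_SORTED], [src])
real_sort = Stage("Sort_Cust", "Sort", ["Sort"], [src])   # key that actually sorts
agg_ok = Stage("Agg_Good", "Aggregator", inputs=[check])
agg_no_sort = Stage("Agg_NoSort", "Aggregator", inputs=[src])
agg_real_sort = Stage("Agg_RealSort", "Aggregator", inputs=[real_sort])

job = [src, check, real_sort, agg_ok, agg_no_sort, agg_real_sort]
print(aggregators_without_check_sort(job))   # ['Agg_NoSort', 'Agg_RealSort']
```

The sketch flags an Aggregator fed by a Sort stage that actually sorts, because that stage is not a ‘Check’ sort under the definition above.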

The DataStage Parallel execution framework typically inserts sorts before any stage that requires matched key values or ordered groupings (Join, Merge, Remove Duplicates, and Aggregator). Sorts are only inserted automatically when the Job does not explicitly define an input sort. Although inserted sorts ensure correct results, they can have a significant and often unnecessary performance impact, for example when the incoming data is already sorted. There are two ways to prevent the Parallel framework from inserting an unnecessary sort:

  • Insert an upstream Sort stage on each input link, setting the Sort Key Mode property of every sort key column to “Do not Sort (Previously Sorted)”, or

  • Set the environment variable APT_SORT_INSERTION_CHECK_ONLY. Inserted sorts then verify the sort order instead of performing a sort, and they abort the job if the data is not in the required order.
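The check-only behaviour described in the second option can be illustrated with a small Python generator. It passes rows through unchanged and fails as soon as a row is out of order. This is a hypothetical sketch of the behaviour, not the framework's implementation. It assumes ascending order, ignores nulls and collation settings, and checks a single stream. In a parallel job, the order that matters is the order within each partition.

```python
from operator import itemgetter


class SortOrderError(Exception):
    """Raised when a row arrives out of the required key order."""


_NO_PREVIOUS = object()  # sentinel: no row seen yet


def check_sorted(rows, keys):
    """Yield rows unchanged, raising SortOrderError if they are not in
    ascending order on `keys`. Rows are verified, never reordered."""
    get_key = itemgetter(*keys)
    prev = _NO_PREVIOUS
    for i, row in enumerate(rows):
        k = get_key(row)
        if prev is not _NO_PREVIOUS and k < prev:
            raise SortOrderError(f"row {i}: key {k!r} follows {prev!r}")
        prev = k
        yield row


ok = [{"cust": "A", "id": 1}, {"cust": "A", "id": 2}, {"cust": "B", "id": 1}]
bad = [{"cust": "B", "id": 1}, {"cust": "A", "id": 1}]

print(len(list(check_sorted(ok, ["cust", "id"]))))  # 3
try:
    list(check_sorted(bad, ["cust"]))
except SortOrderError as err:
    print("job aborted:", err)  # job aborted: row 1: key 'A' follows 'B'
```

A generator suits this check because it never buffers the data. Each row is compared only with its predecessor, which matches the idea of verifying order without paying for a sort.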

This rule verifies you have adopted the former approach.

Actions

Ensure that each Parallel Aggregator Stage is preceded by a Sort stage that specifies all of the Aggregator’s grouping keys as sort keys, and that every one of those sort keys has a Sort Key Mode property of 'Do not Sort (Previously Sorted)'.
