Accessing Columns and Derivations using Compliance Rules
Introduction
There will be compliance rules which need to access extended design attributes such as:
Columns
Constraints
Transformer Stage Variables
Transformer Loop Conditions
Transformer Loop Variables
As of Compliance version 2.1, all these design attributes are now available as properties within the graph. A new Gremlin Pipe has also been introduced that enables deep inspection of derivations.
Accessing extended design attributes
Columns, Constraints and Transformer Variables are all accessible as properties within the Graph Model. The following diagram shows where these properties are available within Graph models representing Jobs and Shared Containers:
Property Name | Type | Description | Availability |
|---|---|---|---|
filterConstraint |
| Represents the derivation(s) used to apply output constraints on stages like Transformers and Lookups | All output Pins which have a constraint defined |
columns | Array of | List of columns on links | All links, list will be empty when no column metadata is defined |
stageVars | Array of | List of all stage variables defined on Transformer Stages | Transformer Stages, list will be empty when no stage variables are defined |
loopCondition |
| The condition used to define loops within transformer Stages | Transformer Stage, is null if no Loop Condition is defined |
loopVars | Array of | List of all loop variables defined on Transformer Stages | Transformer Stage, is null if no Loop Condition is defined |
The properties available for each type are as follows:
Constraint
Property | Type | Description |
|---|---|---|
name | String | Its possible for link constraints defined within DataStage to have a name, however it is almost always null |
derivation | String | The constraint condition as entered by the user |
parsedDerivation | String | The same constraint condition after the DataStage designer has parsed it and applied Transforms |
Column
Property | Type | Description |
|---|---|---|
name | String | Name of column |
key | Boolean | Is the column a key? A true value means the user has selected ☑ box in the column definition screens of DataStage |
sqlType | String | The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format. NOTE: due to the way DataStage represents column data internally, this field can be null. |
typeCode | String | Internal Type of the column. For parallel, this is the native OSH types |
length | Integer | Maximum Length/Precision of the column, a zero value represents undefined/unbounded |
scale | Integer | Fractional Digits of the column, a |
nullable | Boolean | Is Nullable? |
description | String | Description text provided by the user in the columns definition screens of DataStage |
derivation | String | Derivation for the column. This is usually null for source stages but can also be inaccurate for some Stored Procedure stages |
parsedDerivation | String | The derivation after the DataStage designer has parsed it and applied Transforms |
StageVariable
Property | Type | Description |
|---|---|---|
name | String | Name of the stage variable |
initialValue | String | Initial value of the stage variable |
sqlType | String | The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format. |
precision | Integer | Maximum Length/Precision of the column, a zero value represents undefined/unbounded |
scale | Integer | Fractional Digits of the column, a |
derivation | String | Derivation for the column. |
parsedDerivation | String | The derivation after the DataStage designer has parsed it and applied Transforms |
Example
Implementing a compliance rule which verifies all column names start with “col” would like the following:
item.graph // start with the graph model
.E // select edges
.link.as("link") // retain only link edges and apply a label for later access
.columns // access the columns properties, this is an array
.scatter // use scatter to "unroll" the array and process each column in the array
.sideEffect{ column, pipes -> // process each column, declaring the optional second parameter to get access to previously labeled pipes
if (!it.name.startsWith("col")) { // find any columns which don't start with "col"
// access the edge that we labeled "link" (line 3) using "pipes.link"
compliance.failure("${pipes.link.linkName}.${column.name} does not start with 'col'")
}
}
Inspection of Derivations
Derivations used by Constraints, Columns and StageVariables are available as string values which match the values DataStage developers see while designing jobs and shared containers. However, beyond simple pattern matching, strings representing derivations are difficult to inspect for compliance purposes. The derivations pipe can be used to parse derivations and construct an Abstract Syntax Tree (AST) which can be inspected as a sub tree of the job graph model.
For example, consider the following derivation:
"Hello " : myLink.FullName[" ", 1, 1] : " there are " : (24 * 7) : " hours in a week"
When processed by the derivations pipe, the following AST would be produced:
When the derivations pipe processes a graph Vertex, an AST will be generated for derivations based on the following table
Vertex Type | Properties with derivations which will be parsed | Example |
|---|---|---|
Stage |
|
CODE
|
Pin |
|
GROOVY
|
The derivations pipe also accepts an optional closure argument which is called for each property with derivations to be parsed. An AST tree will only be constructed when the closure returns true, this allows properties to be filtered before an AST is constructed.
For example, the following could be used to process only Column derivations:
item.graph // job/shared container graph model
.V.pin // vertices of type pin
.derivations{ it instanceof Column } // `it` will be of type Constraint or Column, only accept Columns
Abstract Syntax Tree Vertex Types
Expression
There is always one “expression” type vertex which is the root of the AST tree and will have one immediate child if the expression is non-empty.
Property | Type | Value |
|---|---|---|
type | String | “expression” |
derivation | String | Derivation string that the AST was constructed from |
… | … | Inherited from the source of the derivation. For example, as expression created from a
|
Example query for selecting this node:
item.graph
.V.pin // or V.stage
.derivations
.has('type', 'expression')
Constant
Represents a constant String or Number within the derivation and has no children.
Property | Type | Value |
|---|---|---|
type | String | “constant” |
value | String | String representing the constant. For numeric values, these will be the number while string value include the quotes as defined within the derivation. Note: Numeric values will always be positive as the negative sign is parsed the negate operator (unary type) |
Example query for selecting this node:
item.graph
.V.pin // or V.stage
.derivations
.has('type', 'constant')
Identifier
Represents all “Identifiers” (such as Parameters, Columns, StageVariables, etc) within the derivation. Identifiers have no children.
Property | Type | Value |
|---|---|---|
type | String | “identifier” |
category | String | Describes the type of identifier. Possible values are:
|
name | String | Name of the identifier. Column names a fully qualified with “Link.Column” and System Variables do not include the “@” prefix. |
Example query for selecting all references to Job/Shared Container references:
item.graph
.V.stage // start with stages
.copySplit(_(), _().out.pin) // slit the pipe to collect stages AND output pins in parallel
.fairMerge // merge parallel pipes, fair merge is the equivalent to Continuous Funnel in DataStage PX
.derivations // construct AST for all derivations (Constaints, Columns and StageVariables)
.has('type', 'identifer') // select identifier types from AST
.has('category', 'parameter') // select parameter identifiers
Call
Represents all function/routine calls. There is one child per argument used for the call, each connecting Edge is labeled as argument and has an index property which indicates which argument child represents.
Property | Type | Value |
|---|---|---|
type | String | “call” |
category | String | Distinguishes between the different type of calls. Possible values:
|
Example query for selecting all constant conversion strings passed to “Oconv” calls:
item.graph
.V.stage // start with stages
.copySplit(_(), _().out.pin) // slit the pipe to collect stages AND output pins in parallel
.fairMerge // merge parallel pipes, fair merge is the equivalent to Continuous Funnel in DataStage PX
.derivations // construct AST for all derivations (Constaints, Columns and StageVariables)
.has('type', 'call') // select call types from AST
.has('name', 'Oconv') // select Oconv functions
.outE('argument') // follow the argument edges
.has('index', 1) // only consider index 1 (second argument)
.inV // follow edge to the next Vertex in the AST
.has('type', 'constant') // select only constant arguments
Unary
Unary Vertex types represents operations with only one operand. Derivations support unary + and - operations but + is considered a no-op and is not represented in the AST. An example unary operation would be -Link.Column which negates the value represented by Link.Column. Unary Vertex types have a single child representing the expression the operand will be applied to.
Property | Type | Value |
|---|---|---|
type | String | “unary” |
operator | String | Only |
Example query to select all negated constant expressions:
item.graph
.V.stage // start with stages
.copySplit(_(), _().out.pin) // slit the pipe to collect stages AND output pins in parallel
.fairMerge // merge parallel pipes, fair merge is the equivalent to Continuous Funnel in DataStage PX
.derivations // construct AST for all derivations (Constaints, Columns and StageVariables)
.has('type', 'unary') // select unary types from AST
.has('operator', '-') // select negate operators - not strictly required as operator is always "-"
.outE('expression').inV // select AST node which is negated via the "expression" edge
.has('type', 'constant') // select only constant expressions
Infix
Infix Vertex types represent operations with two operands separated by an operator. Addition (5 + 3) and Concatenation ("Hello" : " world!") are two examples of infix operations. Infix operations a grouped into multiple subtypes called a category, all of which have to children connected by edges: a left edge and a right edge.
Arithmetic
Property | Type | Value |
|---|---|---|
type | String | “infix” |
category | String | “arithmetic” |
operator | String | Arithmetic operators:
|
String
Property | Type | Value |
|---|---|---|
type | String | “infix” |
category | String | “string” |
operator | String | String operators:
|
Relational
Property | Type | Value |
|---|---|---|
type | String | “infix” |
category | String | “relational” |
operator | String | Relational operators:
DataStage derivation support multiple symbols for the same operator but they are normalized to the operators show above. For example, using |
Logical
Property | Type | Value |
|---|---|---|
type | String | “infix” |
category | String | “logical” |
operator | String | Logical operators:
|
Example query to select all constraints where a column is tested to be < another value. eg. Link.Column < 5:
item.graph
.V.pin // start with pins
.derivations{ it instanceof Constraint } // construct AST for Constraints
.has('type', 'infix') // select infix types from AST
.has('operator', '<') // select less than operator
.outE('left').inV // select AST node on left of the < operator
.has('type', 'identifier') // select only identifiers
.has('category', 'column') // select column identifiers
IfThenElse
The IfThenElse Vertex type represents am If <condition> Then <then expression> Else <else expression> statement within the parsed derivation. It has no properties other than the type but the children represent the condition, then expression and else condition.
Property | Type | Value |
|---|---|---|
type | String | “IfThenElse” |
Example to select all else expressions used in Column derivation:
item.graph
.V.pin // start with pins
.derivations{ it instanceof Column } // construct AST for Column derivation
.has('type', 'ifThenElse') // select If-Then-Else nodes
.outE('else').inV // select AST node connected by the 'else' edge
Substring
Substring Vertex types represent the Substring operator [] which accepts 1 or 2 arguments: Link.Column[3] or Link.Column[2, 4]. This vertex type has no property other than type and will have 2 or 3 children depending on the number of arguments specified:
string- string expression which the substring is being performed againstlength- length of the substringstart- start of the substring operation, will only exist when the two arguments are used with the[]operator
Field
Field Vertex types represent the Substring operator [] which accepts 3 arguments: Link.Column[“:”, 1, 2]. This syntax is specific to extracting substrings based on a delimiter, the occurrence to extract and the number to include. This vertex type has no property other than type but has 4 children:
String- the string expression which the substring is being performed againstdelimiter- delimiter that splits the stringoccur- which occurrence to extractcount- how many occurrences to extract.
Property | Type | Value |
|---|---|---|
type | String | “field” |
Example query to select all delimiters used to extract fields in Column derivations:
item.graph
.V.pin // start with pins
.derivations{ it instanceof Column } // construct AST for Column derivation
.has('type', 'field') // select field operators
.outE('delimiter').inV // select AST node connected by the 'delimiter' edge










