Method | Path | Description
---|---|---
GET | /api/processes/ | Retrieves a list of process runs based on search parameters
GET | /api/processes/:process_id | Retrieves a process run
POST | /api/processes/:process_id/retry | Retries a failed process run by restarting failed tasks; tasks will be given a single additional attempt
POST | /api/processes/:process_id/kill | Terminates an active process
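The process-run endpoints above can be driven with a few lines of stdlib Python. A minimal sketch — the base URL is a hypothetical placeholder; only the paths come from the table:

```python
# Build requests for the process-run endpoints; paths follow the table above.
from urllib import request
from urllib.parse import urlencode

BASE = "https://sundial.example.com"  # hypothetical host

def list_processes(**search_params):
    """GET /api/processes/ with optional search parameters."""
    query = f"?{urlencode(search_params)}" if search_params else ""
    return request.Request(f"{BASE}/api/processes/{query}", method="GET")

def retry_process(process_id):
    """POST /api/processes/:process_id/retry restarts failed tasks,
    giving each a single additional attempt."""
    return request.Request(f"{BASE}/api/processes/{process_id}/retry", method="POST")

def kill_process(process_id):
    """POST /api/processes/:process_id/kill terminates an active process."""
    return request.Request(f"{BASE}/api/processes/{process_id}/kill", method="POST")
```

`request.urlopen(retry_process(pid))` would then perform the call against a real deployment.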
Method | Path | Description
---|---|---
GET | /api/process_definitions/ | Get all currently registered process definitions
GET | /api/process_definitions/:process_definition_name | Get a registered process definition
PUT | /api/process_definitions/:process_definition_name | Updates or creates a process definition
DELETE | /api/process_definitions/:process_definition_name | Deletes a registered process definition
POST | /api/process_definitions/:process_definition_name/trigger | Triggers a new instance of the process
POST | /api/process_definitions/:process_definition_name/pause | Pauses the process schedule
POST | /api/process_definitions/:process_definition_name/resume | Resumes the process schedule
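Registering and triggering a definition maps directly onto the PUT and POST endpoints above. A sketch using only the standard library — the host is a placeholder, and the definition body is whatever `process_definition` object (described later in this document) you want to register:

```python
import json
from urllib import request

BASE = "https://sundial.example.com"  # hypothetical host

def put_definition(definition: dict) -> request.Request:
    """PUT /api/process_definitions/:name creates or updates a definition."""
    name = definition["process_definition_name"]
    return request.Request(
        f"{BASE}/api/process_definitions/{name}",
        data=json.dumps(definition).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def trigger(name: str) -> request.Request:
    """POST /api/process_definitions/:name/trigger starts a new instance now."""
    return request.Request(f"{BASE}/api/process_definitions/{name}/trigger", method="POST")

def pause(name: str) -> request.Request:
    """POST /api/process_definitions/:name/pause suspends the schedule."""
    return request.Request(f"{BASE}/api/process_definitions/{name}/pause", method="POST")
```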
Method | Path | Description
---|---|---
GET | /api/tasks/ | Retrieves the most recent tasks meeting the given criteria
POST | /api/tasks/:task_id/log_entries | Appends log entries for a task; intended for use within the task executable
POST | /api/tasks/:task_id/metadata | Appends metadata entries for a task; intended for use within the task executable
POST | /api/tasks/:task_id/succeed | Marks the task as having succeeded
POST | /api/tasks/:task_id/fail | Marks the task as having failed
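A task executable typically calls these endpoints to report progress and outcome. The body fields below follow the `log_entry` model described later in this document; the host is a placeholder, and sending the entries as a JSON array is an assumption:

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

BASE = "https://sundial.example.com"  # hypothetical host

def append_log_entries(task_id: str, source: str, messages: list) -> request.Request:
    """POST /api/tasks/:task_id/log_entries from inside the task executable.
    Fields follow the log_entry model: a fresh UUID per entry prevents
    duplication if the request is retried."""
    entries = [
        {
            "log_entry_id": str(uuid.uuid4()),
            "when": datetime.now(timezone.utc).isoformat(),
            "source": source,
            "message": m,
        }
        for m in messages
    ]
    return request.Request(
        f"{BASE}/api/tasks/{task_id}/log_entries",
        data=json.dumps(entries).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def succeed(task_id: str) -> request.Request:
    """POST /api/tasks/:task_id/succeed marks the task as having succeeded."""
    return request.Request(f"{BASE}/api/tasks/{task_id}/succeed", method="POST")
```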
The default EMR EC2 instance role

Name | Value | Description
---|---|---
default_emr_job_flow_role | EMR_EC2_DefaultRole |

The default EMR service role

Name | Value | Description
---|---|---
default_emr_service_role | EMR_DefaultRole |
List of applications to install in the EMR cluster

Name | Value | Description
---|---|---
Hadoop | Hadoop |
Hive | Hive |
Mahout | Mahout |
Pig | Pig |
Spark | Spark |
Version (aka label in AWS EMR) of the EMR stack to create.

Name | Value | Description
---|---|---
emr-5.14.0 | emr-5.14.0 |
emr-5.13.0 | emr-5.13.0 |
emr-5.12.1 | emr-5.12.1 |
emr-5.12.0 | emr-5.12.0 |
emr-5.11.1 | emr-5.11.1 |
emr-5.11.0 | emr-5.11.0 |
emr-5.10.0 | emr-5.10.0 |
emr-5.9.0 | emr-5.9.0 |
emr-4.9.2 | emr-4.9.2 |
emr-4.9.1 | emr-4.9.1 |
Name | Value | Description
---|---|---
always | always | Always notify when a process completes
on_failure | on_failure | Notify when a process fails
on_state_change | on_state_change | Notify when a process goes from succeeding to failing and vice versa
on_state_change_and_failures | on_state_change_and_failures | Notify when the process goes from failing to succeeding, and on each failure
never | never | Never notify
The on-demand market type

Name | Value | Description
---|---|---
on_demand | on_demand |
Name | Value | Description
---|---|---
wait | wait | The process should wait until the currently running instance finishes
terminate | terminate | The currently running process should be killed
Name | Value | Description
---|---|---
running | running | The process has tasks currently executing
succeeded | succeeded | All of the process's tasks succeeded on its last run
failed | failed | At least one of the process's tasks failed on its last run
Name | Value | Description
---|---|---
submitted | submitted | The task has been submitted
runnable | runnable |
starting | starting |
pending | pending | The task is waiting on compute resources
running | running | The task is currently executing or awaiting backoff
failed | failed | The task has irrevocably failed
succeeded | succeeded | The task has succeeded without serious errors
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
image | string | Yes | - | Name of the Docker image, including the registry URL if needed
tag | string | Yes | latest | Tag of the Docker image
command | [string] | Yes | - | Command to pass to the Docker container
memory | integer | Yes | - |
vCpus | integer | Yes | - |
job_role_arn | string | No | - | ARN of an IAM role; see http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
environment_variables | [environment_variable] | Yes | [] | Environment variables to be passed to the container
job_queue | string | No | - | Overrides the default job queue, e.g. for a priority queue or a GPU instance queue
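A filled-in batch_image_command might look like the following. All values are illustrative; the field names come from the table above, and the memory/vCpus units are assumed to follow AWS Batch conventions (MiB and whole vCPUs):

```python
# Illustrative batch_image_command payload; all values are hypothetical.
batch_task = {
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-etl",  # registry URL included
    "tag": "latest",
    "command": ["python", "run_etl.py", "--date", "2018-06-01"],
    "memory": 2048,   # MiB, per AWS Batch conventions (assumption)
    "vCpus": 2,
    "environment_variables": [
        {"variable_name": "ENVIRONMENT", "value": "production"},
    ],
    # "job_queue": "gpu-queue",  # optional override, e.g. for a GPU instance queue
}
```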
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
buffer_seconds | integer | No | - | The minimum amount of time (in seconds) that must pass between executions of the process
Interfaces: None

See http://quartz-scheduler.org/api/2.2.0/org/quartz/CronExpression.html

Field | Type | Required? | Default | Description
---|---|---|---|---
day_of_week | string | Yes | - |
month | string | Yes | - |
day_of_month | string | Yes | - |
hours | string | Yes | - |
minutes | string | Yes | - |
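Since the fields follow Quartz CronExpression semantics (see the link above), a schedule firing at 03:30 every day could be written as below. Note that Quartz expects `?` in exactly one of day_of_month/day_of_week; the timezone the scheduler applies is not specified here:

```python
# cron_schedule that fires at 03:30 every day, using Quartz field syntax.
nightly_schedule = {
    "day_of_week": "?",   # unspecified; day_of_month drives the match
    "month": "*",
    "day_of_month": "*",
    "hours": "3",
    "minutes": "30",
}
```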
Interfaces: None

Custom, preconfigured EMR EC2 instance role

Field | Type | Required? | Default | Description
---|---|---|---|---
role_name | string | Yes | - |

Interfaces: None

Custom, preconfigured EMR service role

Field | Type | Required? | Default | Description
---|---|---|---|---
role_name | string | Yes | - |
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
image | string | Yes | - |
tag | string | Yes | latest |
command | [string] | Yes | - |
memory | integer | No | - |
cpu | integer | No | - |
taskRoleArn | string | No | - |
log_paths | [string] | Yes | [] |
environment_variables | [environment_variable] | Yes | [] |
Interfaces: None

An email address to send notifications to

Field | Type | Required? | Default | Description
---|---|---|---|---
name | string | Yes | - |
email | string | Yes | - |
notify_when | notification_options | Yes | on_state_change_and_failures |
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
emr_cluster | emr_cluster | Yes | - | The EMR cluster on which to launch the job (aka step)
job_name | string | Yes | - | Name to be assigned to the job
region | string | Yes | us-east-1 | The AWS region
class | string | Yes | - | The Spark job's main class. Example: org.apache.spark.examples.SparkPi
s3_jar_path | string | Yes | - | The S3 path to the JAR of the job to be run. Example: s3://my-emr-source-bucket/my-emr-jar.jar
spark_conf | [string] | Yes | - | Options that will be sent to Spark as --conf key=value. Example: ["spark.driver.extraJavaOptions=-Denvironment=integration"]
args | [string] | Yes | - | Command-line arguments to be passed to the job's main class. Example: ["--executor-memory", "1g"]
s3_log_details | s3_log_details | No | - | AWS EMR periodically (currently every 5 minutes) zips up all the logs and uploads them to S3, which does not work well with Sundial's live logs. Use S3 logs to visualise the job's logs in real time on Sundial's live-logs panel
load_data | [s3_cp] | No | - | Triggers an s3-dist-cp job for every entry in this array; can be used to load data from S3 into HDFS. Requires Hadoop as an application on the cluster
save_results | [s3_cp] | No | - | Triggers an s3-dist-cp job for every entry in this array; can be used to save data from HDFS to S3. Requires Hadoop as an application on the cluster
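Putting these fields together, an emr_command for a Spark job on an existing cluster might look like the following. The cluster id and S3 paths come from the examples above; the exact wire encoding of the `emr_cluster` union field is an assumption:

```python
# Illustrative emr_command payload; field names follow the table above.
spark_job = {
    "emr_cluster": {"cluster_id": "j-2CRR69WTM7N31"},  # existing_emr_cluster; union encoding assumed
    "job_name": "daily-aggregation",
    "region": "us-east-1",
    "class": "org.apache.spark.examples.SparkPi",
    "s3_jar_path": "s3://my-emr-source-bucket/my-emr-jar.jar",
    "spark_conf": ["spark.driver.extraJavaOptions=-Denvironment=integration"],
    "args": ["--executor-memory", "1g"],
    # Each load_data entry triggers an s3-dist-cp step (requires Hadoop on the cluster):
    "load_data": [
        {"source": "s3://my-input-bucket/day=2018-06-01", "destination": "hdfs:///input"},
    ],
}
```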
Interfaces: None

Additional configuration for master/core/task EMR instance groups.

Field | Type | Required? | Default | Description
---|---|---|---|---
emr_instance_type | string | Yes | - |
instance_count | integer | Yes | - |
aws_market | aws_market | Yes | - |
ebs_volume_size | integer | No | - | Optional EBS volume size in GB to attach to the EMR instances; for simplicity, a single gp2 volume of the given size will be created
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
variable_name | string | Yes | - |
value | string | Yes | - |

Interfaces: None

Existing EMR cluster configuration object

Field | Type | Required? | Default | Description
---|---|---|---|---
cluster_id | string | Yes | - | The EMR cluster id. Example: j-2CRR69WTM7N31

Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
status | string | Yes | - |
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
log_entry_id | uuid | Yes | - | Uniquely identifies the log message to prevent duplication
when | date-time-iso8601 | Yes | - |
source | string | Yes | - |
message | string | Yes | - |

Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
metadata_entry_id | uuid | Yes | - | Uniquely identifies the metadata entry to prevent duplication
when | date-time-iso8601 | Yes | - |
key | string | Yes | - |
value | string | Yes | - |
Interfaces: None

New EMR cluster configuration object

Field | Type | Required? | Default | Description
---|---|---|---|---
name | string | Yes | - |
release_label | emr_release_label | Yes | emr-5.10.0 |
applications | [emr_application] | Yes | - |
s3_log_uri | string | Yes | - |
master_instance | emr_instance_group_details | Yes | - |
core_instance | emr_instance_group_details | No | - |
task_instance | emr_instance_group_details | No | - |
ec2_subnet | string | No | - | To be set in case the EMR cannot be launched in the
emr_service_role | emr_service_role | Yes | - | Role to be assigned to the service; the default one should cover most cases
emr_job_flow_role | emr_job_flow_role | Yes | - | Role to be assigned to the EC2 instances composing the EMR cluster; the default one should cover most cases
visible_to_all_users | boolean | Yes | false | Whether or not to show the new cluster in the list of clusters; for more details see the EMR docs
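An illustrative new_emr_cluster payload follows. The field names come from the tables above; how the aws_market and role unions are encoded on the wire is an assumption, and all concrete values (instance types, buckets, bid price) are hypothetical:

```python
# Illustrative new_emr_cluster payload; union encodings are assumptions.
cluster = {
    "name": "nightly-spark-cluster",
    "release_label": "emr-5.14.0",
    "applications": ["Hadoop", "Spark"],
    "s3_log_uri": "s3://my-log-bucket/emr/",
    "master_instance": {
        "emr_instance_type": "m4.xlarge",
        "instance_count": 1,
        "aws_market": "on_demand",          # on_demand market; union encoding assumed
    },
    "core_instance": {
        "emr_instance_type": "m4.xlarge",
        "instance_count": 4,
        "aws_market": {"bid_price": 0.10},  # spot market; union encoding assumed
        "ebs_volume_size": 100,             # one gp2 volume of 100 GB
    },
    "emr_service_role": "default_emr_service_role",    # the default covers most cases
    "emr_job_flow_role": "default_emr_job_flow_role",  # the default covers most cases
    "visible_to_all_users": True,
}
```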
Interfaces: None

PagerDuty integration

Field | Type | Required? | Default | Description
---|---|---|---|---
service_key | string | Yes | - |
num_consecutive_failures | integer | Yes | 1 |
api_url | string | Yes | https://events.pagerduty.com |
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
process_id | uuid | Yes | - |
process_definition_name | string | Yes | - |
start_time | date-time-iso8601 | Yes | - |
status | process_status | Yes | - |
task | [task] | Yes | - |
Interfaces: None

A grouping of related tasks that are run as a single unit on the same schedule

Field | Type | Required? | Default | Description
---|---|---|---|---
process_definition_name | string | Yes | - |
paused | boolean | No | - | If true, ignore the schedule and only start the process if triggered manually
process_description | string | No | - |
schedule | process_schedule | No | - | The schedule that the process runs on; if not specified, the process will only run when triggered manually
task_definitions | [task_definition] | Yes | - |
overlap_action | process_overlap_action | Yes | wait |
notifications | [notification] | No | - |
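A complete process_definition ties these models together. The sketch below is illustrative: field names follow the tables in this document, all concrete values are hypothetical, and the encoding of the executable union is an assumption:

```python
# Illustrative process_definition; all values hypothetical.
process_definition = {
    "process_definition_name": "nightly-etl",
    "process_description": "Nightly ETL pipeline",
    "paused": False,
    "schedule": {  # cron_schedule; omit to run only on manual trigger
        "day_of_week": "?", "month": "*", "day_of_month": "*",
        "hours": "3", "minutes": "30",
    },
    "overlap_action": "wait",  # or "terminate"
    "task_definitions": [
        {
            "task_definition_name": "load-data",
            "dependencies": [],
            "executable": {  # batch_image_command; union encoding assumed
                "image": "my-registry/etl",
                "tag": "latest",
                "command": ["python", "load.py"],
                "memory": 1024,
                "vCpus": 1,
                "environment_variables": [],
            },
            "max_attempts": 3,
            "backoff_base_seconds": 60,
            "backoff_exponent": 1.0,
            "require_explicit_success": False,
        },
    ],
    "notifications": [
        {"name": "data-team", "email": "data-team@example.com",
         "notify_when": "on_state_change_and_failures"},
    ],
}
```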
Interfaces: None

Wrapper of the S3DistCp Spark job; for more info, see the AWS s3-dist-cp page

Field | Type | Required? | Default | Description
---|---|---|---|---
source | string | Yes | - | Source folder or file to copy; can be either an S3 or HDFS location. Example: s3://my-bucket-root or hdfs://tmp/my-temp-folder
destination | string | Yes | - | Destination folder or file to copy to; can be either an S3 or HDFS location. Example: s3://my-bucket-root or hdfs://tmp/my-temp-folder
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
log_group_name | string | Yes | - |
log_stream_name | string | Yes | - |

Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
script | string | Yes | - |
environment_variables | [environment_variable] | No | - |

Interfaces: None

The maximum spot bid price

Field | Type | Required? | Default | Description
---|---|---|---|---
bid_price | decimal | Yes | - |
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
task_id | uuid | Yes | - |
process_id | uuid | Yes | - |
process_definition_name | string | Yes | - |
task_definition_name | string | Yes | - |
start_time | date-time-iso8601 | Yes | - |
end_time | date-time-iso8601 | No | - |
previous_attempt_count | integer | Yes | - |
log_entries | [log_entry] | Yes | - |
metadata_entries | [metadata_entry] | Yes | - |
execution_state | [metadata_entry] | No | - | Internal bookkeeping metadata used for task scheduling (e.g. ECS task ID and cluster name)
status | task_status | Yes | - |
Interfaces: None

An individual task that runs as part of a process

Field | Type | Required? | Default | Description
---|---|---|---|---
task_definition_name | string | Yes | - | The canonical name for this task, used by other tasks to identify it
dependencies | [task_dependency] | Yes | - | The tasks that must have completed prior to this one beginning
executable | task_executable | Yes | - |
max_attempts | integer | Yes | - |
max_runtime_seconds | integer | No | - | The execution time (for a single attempt) after which the system will kill the task
backoff_base_seconds | integer | Yes | - |
backoff_exponent | double | Yes | 1 |
require_explicit_success | boolean | Yes | false | If true, the task must explicitly update its status with Sundial in order to succeed
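Dependencies between task definitions use the task_dependency model below. A sketch of two tasks where one waits for the other to succeed — the script names are hypothetical, and the shell_script_command union encoding is an assumption:

```python
# "transform" starts only after "extract" has succeeded; values illustrative.
extract = {
    "task_definition_name": "extract",
    "dependencies": [],
    "executable": {"script": "run_extract.sh"},  # shell_script_command; union encoding assumed
    "max_attempts": 3,
    "backoff_base_seconds": 30,
    "backoff_exponent": 2.0,
    "require_explicit_success": False,
}
transform = {
    "task_definition_name": "transform",
    "dependencies": [
        # task_dependency: references extract by its canonical name
        {"task_definition_name": "extract", "success_required": True},
    ],
    "executable": {"script": "run_transform.sh"},
    "max_attempts": 1,
    "backoff_base_seconds": 0,
    "backoff_exponent": 1.0,
    "require_explicit_success": False,
}
```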
Interfaces: None

Field | Type | Required? | Default | Description
---|---|---|---|---
task_definition_name | string | Yes | - |
success_required | boolean | Yes | true |
Interfaces: None

The EC2 market type for the instances

Type | Discriminator Value | Example Json | Description
---|---|---|---
on_demand | on_demand | Minimal / Full |
spot | spot | Minimal / Full |
Interfaces: None

Whether to create a new EMR cluster or reuse a pre-existing one

Type | Discriminator Value | Example Json | Description
---|---|---|---
new_emr_cluster | new_emr_cluster | Minimal / Full |
existing_emr_cluster | existing_emr_cluster | Minimal / Full |
Interfaces: None

The EMR role to use on the EC2 instances

Type | Discriminator Value | Example Json | Description
---|---|---|---
default_emr_job_flow_role | default_emr_job_flow_role | Minimal / Full |
custom_emr_job_flow_role | custom_emr_job_flow_role | Minimal / Full |

Interfaces: None

The EMR service role to use

Type | Discriminator Value | Example Json | Description
---|---|---|---
default_emr_service_role | default_emr_service_role | Minimal / Full |
custom_emr_service_role | custom_emr_service_role | Minimal / Full |
Interfaces: None

Type | Discriminator Value | Example Json | Description
---|---|---|---
email | email | Minimal / Full |
pagerduty | pagerduty | Minimal / Full |
Interfaces: None

A specification for when a process should be run

Type | Discriminator Value | Example Json | Description
---|---|---|---
cron_schedule | cron_schedule | Minimal / Full |
continuous_schedule | continuous_schedule | Minimal / Full |
Interfaces: None

Type | Discriminator Value | Example Json | Description
---|---|---|---
docker_image_command | docker_image_command | Minimal / Full | Docker image to run on ECS with the Sundial companion container. Deprecated: running jobs on AWS ECS is deprecated; use AWS Batch via batch_image_command instead
shell_script_command | shell_script_command | Minimal / Full | Shell command to run on the Sundial service instance (experimental)
batch_image_command | batch_image_command | Minimal / Full | Docker image to run on AWS Batch
emr_command | emr_command | Minimal / Full | Command to submit Spark jobs on AWS EMR