
Batch jobs

Use batch jobs when you have large manifests (JSONL) in cloud storage (S3 or GCS) and want Refinery to create tasks at scale.

For smaller workloads, submitting tasks inline (`POST /v1/tasks` or `POST /v1/tasks/batch`) may be simpler.

When to use jobs vs inline API

| Use case | API |
| --- | --- |
| Few hundred tasks from app servers | `POST /v1/tasks`, `POST /v1/tasks/batch` |
| Large files in S3, recurring ETL | `POST /v1/jobs` + manifest |

Manifest format (JSONL)

One JSON object per line:

```json
{"data_url": "https://cdn.example.com/1.jpg", "metadata": {"sku": "A1"}}
{"data_url": "https://cdn.example.com/2.jpg", "metadata": {"sku": "A2"}}
```

Each line must include a valid data_url. Optional metadata is stored for your traceability.
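Manifests can be sanity-checked client-side before upload. A minimal sketch (not part of the API — Refinery performs its own validation when it processes the manifest, and the http(s)-only check on `data_url` is an assumption):

```python
import json

def validate_manifest_lines(lines):
    """Check JSONL manifest lines before upload.

    Returns (valid_count, errors), where errors is a list of
    (line_number, message) tuples. 'data_url' is required per line;
    'metadata' is optional.
    """
    valid, errors = 0, []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append((i, f"invalid JSON: {e}"))
            continue
        url = obj.get("data_url")
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            errors.append((i, "missing or invalid data_url"))
            continue
        valid += 1
    return valid, errors
```

Running this over the example manifest above reports both lines valid; a line without `data_url` or with malformed JSON is flagged with its line number so you can fix it before the job runs.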

Create a job — POST /v1/jobs

:::info Security
S3/GCS credentials are encrypted at rest using AES-256-GCM before storage. They are never returned in API responses (`GET /v1/jobs/{id}` omits credentials). For production, set a unique `ASG_ENCRYPTION_KEY` (generate one with `openssl rand -hex 32`).
:::

Body (simplified):

| Field | Description |
| --- | --- |
| `manifest_url` | Cloud URI to the JSONL manifest: `s3://...` or `gs://...` |
| `label_spec` | Shared question / options for all tasks in the job |
| `consensus_threshold` | 2–10, default 3 |
| `callback_url` | Optional webhook target for job-level notifications |
| `task_type` | e.g. `image_classification` |
| `credentials` | For S3: `access_key_id`, `secret_access_key`, `region`; for GCS: `service_account_json` |
| `delivery` | Optional `{ "type": "s3", "bucket": "my-results-bucket" }` |

S3 Manifest

```bash
curl -sS -X POST https://api.asgrefinery.io/v1/jobs \
  -H "Authorization: Bearer $API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "manifest_url": "https://my-bucket.s3.amazonaws.com/manifest.jsonl",
    "label_spec": {
      "question": "Animal?",
      "options": ["cat", "dog"]
    },
    "consensus_threshold": 3,
    "task_type": "image_classification",
    "credentials": {
      "access_key_id": "AKIA...",
      "secret_access_key": "...",
      "region": "us-east-1"
    },
    "delivery": { "type": "s3", "bucket": "my-results-bucket" }
  }'
```

GCS Manifest

```bash
curl -sS -X POST https://api.asgrefinery.io/v1/jobs \
  -H "Authorization: Bearer $API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "manifest_url": "gs://my-bucket/batch/manifest.jsonl",
    "label_spec": {
      "question": "Animal?",
      "options": ["cat", "dog"]
    },
    "credentials": {
      "service_account_json": "{\"type\":\"service_account\",...}"
    }
  }'
```
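The required credential fields differ by backend, so a client can verify the payload before submitting. A hypothetical helper based on the body table above (`required_credentials` and `missing_credentials` are illustrative, not part of any SDK):

```python
def required_credentials(manifest_url):
    """Return the credential keys expected for the manifest's backend.

    gs:// manifests need a GCS service account; s3:// (and S3-style
    https URLs) need AWS-style keys. Field names follow the
    POST /v1/jobs body table.
    """
    if manifest_url.startswith("gs://"):
        return {"service_account_json"}
    return {"access_key_id", "secret_access_key", "region"}

def missing_credentials(manifest_url, credentials):
    """List required credential fields absent from the supplied dict."""
    return sorted(required_credentials(manifest_url) - credentials.keys())
```

Checking the payload locally turns a confusing server-side job failure into an immediate, named error.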

Accepted (202):

```json
{
  "job_id": "job_...",
  "status": "accepted",
  "message": "Job accepted. Manifest will be processed asynchronously."
}
```

Job lifecycle

Typical states: `accepted` → `processing` → `completed` (or `failed` on fatal errors).

Partial per-task failures may increment tasks_failed while others complete.

Monitor — GET /v1/jobs/{id}

```bash
curl -sS -H "Authorization: Bearer $API_KEY" \
  https://api.asgrefinery.io/v1/jobs/job_xxx
```

Example (200):

```json
{
  "job_id": "job_xxx",
  "status": "processing",
  "total_tasks": 1000,
  "tasks_done": 240,
  "tasks_failed": 3,
  "progress_percent": 24.3,
  "created_at": "2026-04-09T10:00:00Z",
  "completed_at": null
}
```
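In this example, `progress_percent` matches `(tasks_done + tasks_failed) / total_tasks` ((240 + 3) / 1000 = 24.3%), i.e. failed tasks also count toward progress. A sketch that recomputes it client-side under that assumption:

```python
def progress_percent(status):
    """Recompute job progress from task counts.

    Assumes progress counts both done and failed tasks, which matches
    the example response (240 done + 3 failed of 1000 -> 24.3%).
    """
    total = status.get("total_tasks") or 0
    if total == 0:
        return 0.0
    settled = status.get("tasks_done", 0) + status.get("tasks_failed", 0)
    return round(100.0 * settled / total, 1)
```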

Export results — GET /v1/jobs/{id}/export

Returns JSONL (application/x-ndjson) of settled task results when the job is completed.

409 if not completed yet:

```json
{
  "error": "job is not completed yet"
}
```

S3 delivery

When `delivery.type` is `s3`, results are written to your bucket using the supplied credentials. Rotate keys regularly; credentials are stored (encrypted, as noted above) on the job row for processing.

:::tip
Use `s3://` URIs for AWS S3, MinIO, and S3-compatible storage. Use `gs://` URIs for Google Cloud Storage.
:::

Error handling

  • Manifest line errors may increment `tasks_failed` without failing the whole job.
  • Not found: job IDs belonging to other customers return `404`.
  • Retry exports and status polls with exponential backoff on `5xx` responses.
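The last point can be sketched as a small retry wrapper. This is illustrative, not an SDK function; `ServerError` is a stand-in for however your HTTP client surfaces 5xx responses:

```python
import random
import time

class ServerError(Exception):
    """Stand-in for an HTTP 5xx response from the API."""

def with_backoff(fn, retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call fn(), retrying on ServerError with exponential backoff + jitter.

    Other exceptions propagate immediately: a 409 ("job is not
    completed yet") should be handled by polling job status, not by
    blind retries.
    """
    for attempt in range(retries):
        try:
            return fn()
        except ServerError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            delay = min(cap, base * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # full + half jitter
```

Wrap your status poll or export call, e.g. `with_backoff(lambda: fetch_export(job_id))`, where `fetch_export` is your own HTTP call that raises `ServerError` on 5xx.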