Complete the requirements before you begin. You can also read about Unstructured's partitioning and structured data extraction first.
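The scripts on this page call `curl`, `jq`, and the `file` command, and they read your API URL and API key from the `UNSTRUCTURED_API_URL` and `UNSTRUCTURED_API_KEY` environment variables. Assuming those are the only local prerequisites the scripts need (the full requirements, such as having an Unstructured account, go beyond what a shell can check), the following sketch confirms they are in place. The function name `check_prereqs` is hypothetical.

```shell
#!/usr/bin/env bash
# check_prereqs (hypothetical helper): report any missing command or environment
# variable that the scripts on this page rely on. Returns 1 if anything is missing.
check_prereqs() {
    local missing=0 cmd var
    for cmd in curl jq file; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    for var in UNSTRUCTURED_API_URL UNSTRUCTURED_API_KEY; do
        # ${!var} expands to the value of the variable whose name is stored in var.
        if [ -z "${!var:-}" ]; then
            echo "Environment variable not set: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_prereqs && echo "Ready."` in the same shell session you use for the scripts below.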
Step 1: Create the on-demand job

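Replace EXTRACTION_PROMPT with your extraction prompt and INPUT_DIR with the full path to the local directory that contains the files to process. The script uploads every regular file in that directory and reads your API URL and API key from the UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY environment variables. The response includes the job ID.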
Each on-demand job is limited to 10 files, and each file is limited to 50 MB in size.

If you need to launch a series of on-demand jobs in rapid succession, you must wait at least one second between launch requests. Otherwise, you will receive a rate limit error.

A maximum of 5 on-demand jobs can be running in your Unstructured account. If you launch a new on-demand job but 5 existing on-demand jobs are still running, the new on-demand job will remain in a scheduled state until one of the 5 existing on-demand jobs is done running.
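To catch the file-count and file-size limits before you launch a job, you can check the input directory locally. The following sketch counts the regular files in a directory (the same files the Step 1 script uploads) and flags any file larger than 50 MB. It assumes 50 MB means 50 × 1024 × 1024 bytes; the exact byte threshold the service applies is an assumption, as is the helper name `check_job_limits`.

```shell
#!/usr/bin/env bash
# check_job_limits (hypothetical helper): verify that a directory holds at most
# 10 regular files and that none is larger than 50 MB.
# Assumption: 50 MB is treated as 50 * 1024 * 1024 bytes.
check_job_limits() {
    local dir=$1
    local max_files=10
    local max_bytes=$((50 * 1024 * 1024))
    local count=0 status=0 filepath size

    for filepath in "$dir"/*; do
        [ -f "$filepath" ] || continue   # Same filter as the Step 1 script.
        count=$((count + 1))
        # wc -c behaves the same on Linux and macOS, unlike stat's size flags.
        size=$(wc -c < "$filepath")
        size=$((size))                   # Strip the padding some wc versions add.
        if [ "$size" -gt "$max_bytes" ]; then
            echo "Too large (${size} bytes): $filepath" >&2
            status=1
        fi
    done

    if [ "$count" -gt "$max_files" ]; then
        echo "Too many files: $count (limit $max_files)" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_job_limits "$INPUT_DIR" || exit 1` near the top of the Step 1 script stops it before any upload starts.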
Save and run this script:
#!/usr/bin/env bash

EXTRACTION_PROMPT="Represent dates such as May-12-24 as 2024-05-12 and June-12-25 as 2025-06-12."
INPUT_DIR="/full/path/to/your/directory"

form_args=()

for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

json_schema='{"type":"object","properties":{"invoice_number":{"type":"number"},"invoice_date":{"type":"string"},"payment_due":{"type":"string"},"bill_to":{"type":"string"}},"additionalProperties":false,"required":["invoice_number","invoice_date","payment_due","bill_to"]}'

request_data=$(jq -n --arg prompt "$EXTRACTION_PROMPT" --arg schema "$json_schema" '{
"job_nodes": [
    {"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}},
    {"name":"Extractor","type":"structured_data_extractor","subtype":"llm","settings":{"schema_to_extract":{"json_schema":$schema,"extraction_guidance":$prompt},"provider":"openai","model":"gpt-5-mini","output_mode":"extracted_data_only"}}
]
}')

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form-string "request_data=$request_data" \
  "${form_args[@]}")

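JOB_ID=$(echo "$response" | jq -r '.id // empty')

# Stop if the response does not contain a job ID (for example, an error response).
if [ -z "$JOB_ID" ]; then
    echo "Failed to create the job. Response: $response" >&2
    exit 1
fi

echo "Job ID: $JOB_ID"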
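This script requires jq, which it uses to build the request body and to read the job ID from the response, and the file command, which it uses to detect each file's MIME type.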
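If your directory holds more than 10 files, you need several on-demand jobs, and their launch requests must be at least one second apart. The following sketch splits the files into batches of at most 10 and calls a launch command once per batch, sleeping between calls. `launch_in_batches` and `launch_job` are hypothetical names: `launch_job` stands for your own function that builds `form_args` from its arguments and sends the Step 1 request.

```shell
#!/usr/bin/env bash
# launch_in_batches (hypothetical helper): split the regular files in DIR into
# groups of at most 10 and pass each group to LAUNCH_COMMAND as arguments,
# waiting between launches to respect the one-second spacing between requests.
# Usage: launch_in_batches DIR LAUNCH_COMMAND
launch_in_batches() {
    local dir=$1 launch_cmd=$2
    local batch_size=10
    local delay_seconds=1
    local files=() filepath i

    for filepath in "$dir"/*; do
        [ -f "$filepath" ] && files+=("$filepath")
    done

    for (( i = 0; i < ${#files[@]}; i += batch_size )); do
        if (( i > 0 )); then
            sleep "$delay_seconds"   # Wait before every launch after the first.
        fi
        # Array slice: up to batch_size elements starting at index i.
        "$launch_cmd" "${files[@]:i:batch_size}"
    done
}
```

For example, `launch_in_batches "$INPUT_DIR" launch_job`. Because at most 5 on-demand jobs run at once, later batches may stay in a scheduled state until earlier jobs finish.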
Step 2: Poll for job status

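Set JOB_ID to the job ID from the previous step. This script checks the job status every 10 seconds and stops when the job completes, fails, or is stopped. When the job completes, the script prints the output node file IDs, which you need for the next step.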
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"

while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."
        echo "Output node file IDs: $(echo "$job" | jq -c '[.output_node_files[].file_id]')"
        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done
This script requires jq to parse the JSON response.
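The loop above keeps polling for as long as the job runs, and it also keeps going if a request fails and no status comes back. A variant with an upper bound on the number of checks might look like the following sketch. `wait_for_status` and `get_job_status` are hypothetical names, and the limit you pass (for example, 60 checks at 10-second intervals) is an assumption to tune for your files.

```shell
#!/usr/bin/env bash
# wait_for_status (hypothetical helper): run a status command until it prints a
# terminal status or MAX_CHECKS attempts have been made.
# Usage: wait_for_status MAX_CHECKS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 for COMPLETED, 1 for FAILED or STOPPED, 2 if it gives up.
wait_for_status() {
    local max_checks=$1 interval=$2
    shift 2
    local check status
    for (( check = 1; check <= max_checks; check++ )); do
        status=$("$@") || status=""
        echo "Check $check: job status: $status" >&2
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED|STOPPED) return 1 ;;
        esac
        # Do not sleep after the final check.
        if (( check < max_checks )); then
            sleep "$interval"
        fi
    done
    echo "Stopped waiting after $max_checks checks." >&2
    return 2
}

# Status command built from the Step 2 request; prints the job's status field.
get_job_status() {
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/jobs/$1" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" | jq -r '.status'
}
```

For example, `wait_for_status 60 10 get_job_status "$JOB_ID"` waits up to about 10 minutes, and its return code tells you whether to download the output, report a failure, or check on the job later.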
Step 3: Download the job output

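Set JOB_ID and OUTPUT_FILE_IDS to the job ID and output node file IDs from the previous steps, and set OUTPUT_DIR to the full path of the local directory where you want to save the output. The script saves each output file as <file-id>.json in that directory.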
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"
OUTPUT_FILE_IDS=("<output-file-id>" "<output-file-id>") # From Step 2
OUTPUT_DIR="/full/path/to/your/output/directory"

mkdir -p "$OUTPUT_DIR"

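for file_id in "${OUTPUT_FILE_IDS[@]}"; do
    # --fail makes curl exit with an error on an HTTP error status
    # instead of writing the server's error response as output.
    if curl --request GET --silent --fail --location \
      "$UNSTRUCTURED_API_URL/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"; then
        echo "Saved: $OUTPUT_DIR/$file_id.json"
    else
        echo "Failed to download file ID: $file_id" >&2
    fi
done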
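Each output file is saved with a .json extension. Assuming a successful download always contains a single JSON document, a quick local check with jq can flag files that are empty or not valid JSON, such as a truncated download. An HTTP error response can itself be JSON, so this check does not replace checking whether each download succeeded. `check_json_outputs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_json_outputs (hypothetical helper): report every .json file in a
# directory that is empty or cannot be parsed as JSON. Returns 1 if any fail.
check_json_outputs() {
    local dir=$1 status=0 filepath
    for filepath in "$dir"/*.json; do
        [ -f "$filepath" ] || continue
        # An empty file passes "jq empty", so test for a non-empty file first.
        # "jq empty" parses the input, prints nothing, and fails on invalid JSON.
        if [ ! -s "$filepath" ] || ! jq empty "$filepath" >/dev/null 2>&1; then
            echo "Not valid JSON: $filepath" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, run `check_json_outputs "$OUTPUT_DIR"` after the download loop finishes.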

Complete end-to-end script

Replace EXTRACTION_PROMPT, INPUT_DIR, and OUTPUT_DIR with your values, then save and run this script.
This script requires jq to parse JSON responses.
#!/usr/bin/env bash

EXTRACTION_PROMPT="Represent dates such as May-12-24 as 2024-05-12 and June-12-25 as 2025-06-12."
INPUT_DIR="/full/path/to/your/input/directory"
OUTPUT_DIR="/full/path/to/your/output/directory"

# Step 1: Create the on-demand job.
form_args=()

for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

json_schema='{"type":"object","properties":{"invoice_number":{"type":"number"},"invoice_date":{"type":"string"},"payment_due":{"type":"string"},"bill_to":{"type":"string"}},"additionalProperties":false,"required":["invoice_number","invoice_date","payment_due","bill_to"]}'

request_data=$(jq -n --arg prompt "$EXTRACTION_PROMPT" --arg schema "$json_schema" '{
"job_nodes": [
    {"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}},
    {"name":"Extractor","type":"structured_data_extractor","subtype":"llm","settings":{"schema_to_extract":{"json_schema":$schema,"extraction_guidance":$prompt},"provider":"openai","model":"gpt-5-mini","output_mode":"extracted_data_only"}}
]
}')

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form-string "request_data=$request_data" \
  "${form_args[@]}")

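JOB_ID=$(echo "$response" | jq -r '.id // empty')

# Stop here if no job ID came back; otherwise the polling loop below never ends.
if [ -z "$JOB_ID" ]; then
    echo "Failed to create the job. Response: $response" >&2
    exit 1
fi

echo "Job ID: $JOB_ID"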

# Step 2: Poll until the job completes.
output_file_ids=()

while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."

        while IFS= read -r id; do
            output_file_ids+=("$id")
        done < <(echo "$job" | jq -r '.output_node_files[].file_id')

        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done

# Step 3: Download the job output.
mkdir -p "$OUTPUT_DIR"

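for file_id in "${output_file_ids[@]}"; do
    # --fail makes curl exit with an error on an HTTP error status
    # instead of writing the server's error response as output.
    if curl --request GET --silent --fail --location \
      "$UNSTRUCTURED_API_URL/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"; then
        echo "Saved: $OUTPUT_DIR/$file_id.json"
    else
        echo "Failed to download file ID: $file_id" >&2
    fi
done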
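The scripts send an example invoice schema in `json_schema`. If you replace it with a schema for your own documents, a local check can catch two common mistakes before you launch a job: JSON syntax errors, and names listed in `required` that have no matching entry in `properties`. This sketch is not a full JSON Schema validator, and it does not know about any further constraints the service may place on the schema. `check_schema` is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_schema (hypothetical helper): confirm that a schema string is a JSON object
# and that every name in "required" is defined under "properties".
# This is a quick sanity check, not a full JSON Schema validator.
check_schema() {
    local schema=$1 missing
    if ! jq -e 'type == "object"' <<<"$schema" >/dev/null 2>&1; then
        echo "Schema is not a valid JSON object." >&2
        return 1
    fi
    # List each required name that has no matching entry in properties.
    missing=$(jq -r '(.required // [])[] as $name
        | select((.properties // {}) | has($name) | not)
        | $name' <<<"$schema")
    if [ -n "$missing" ]; then
        echo "Required but not defined in properties: $missing" >&2
        return 1
    fi
}
```

For example, add `check_schema "$json_schema" || exit 1` right after the line that sets `json_schema`.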

What’s next?