
Monitoring ETL Pipelines

ETL (Extract, Transform, Load) pipelines move data between systems on a schedule. These jobs often run for hours and can fail at any stage—connection timeouts, data validation errors, or disk space issues.

Set your API key as an environment variable. For cron jobs, add it to your crontab:

```sh
# Edit crontab
crontab -e

# Add at the top of your crontab
PAKYAS_API_KEY=pk_live_xxxxx

# Then your ETL job
0 0 * * * pakyas monitor etl-pipeline -- python /jobs/data_pipeline.py
```

Or source from a file:

```sh
0 0 * * * . ~/.pakyas_env && pakyas monitor etl-pipeline -- python /jobs/data_pipeline.py
```
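One way to create that file (the path matches the example above; the key value is a placeholder). Since it holds a secret, restrict its permissions:

```shell
# Create the env file that cron will source (placeholder key value)
cat > "$HOME/.pakyas_env" <<'EOF'
export PAKYAS_API_KEY=pk_live_xxxxx
EOF

# The file holds a secret, so make it readable only by its owner
chmod 600 "$HOME/.pakyas_env"
```

`export` is needed here (unlike in the crontab-variable approach) because the file is sourced by the shell that launches the job.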

See Environment Variables for all options.

ETL pipelines are a natural fit for monitoring:

  • Pipelines run on a schedule
  • Jobs are long-running (minutes to hours)
  • Silent failures cause stale or missing data

Wrap the pipeline command with `pakyas monitor`:

```sh
pakyas monitor etl-pipeline -- python pipeline.py
```

Pakyas wraps your pipeline, tracks duration, and alerts you if it fails or runs longer than expected.
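Because a non-zero exit is what signals failure to the wrapper, the pipeline script should fail loudly rather than swallow errors. A minimal sketch (the stage functions and data shapes here are hypothetical):

```python
import sys


def extract():
    # Hypothetical: pull rows from a source system.
    return [{"id": 1, "amount": "42.5"}, {"id": 2, "amount": "17.0"}]


def transform(rows):
    # Validate and convert; a bad value raises, failing the run.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def load(rows):
    # Hypothetical: write to the destination; here we just count.
    return len(rows)


def main() -> int:
    try:
        loaded = load(transform(extract()))
        print(f"loaded {loaded} rows")
        return 0
    except Exception as exc:
        # Non-zero exit tells the monitoring wrapper the run failed.
        print(f"pipeline failed: {exc}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    sys.exit(main())
```

Catching the exception only to return a non-zero code keeps the error on stderr while still giving the wrapper an unambiguous failure signal.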

```sh
# crontab example - runs every night at midnight
0 0 * * * pakyas monitor etl-pipeline -- python /jobs/data_pipeline.py
```

You'll be alerted when:

  • The pipeline exits non-zero
  • The pipeline runs longer than expected
  • The pipeline never starts (missed schedule)
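If the stages can run independently, one option is to give each stage its own monitor so an alert points at the failing step. A sketch (the stage script paths and monitor names are hypothetical):

```sh
# Each stage reports separately; a failed stage stops the chain
0 0 * * * pakyas monitor etl-extract -- python /jobs/extract.py && pakyas monitor etl-transform -- python /jobs/transform.py && pakyas monitor etl-load -- python /jobs/load.py
```

Chaining with `&&` preserves stage ordering and keeps a failed extract from feeding stale data into transform and load.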