Cron Job Monitoring Glossary

Clear definitions for the language of job monitoring — from dead man's switch and heartbeat monitoring to every job state Pakyas tracks.

Pakyas is execution-signal based: each job proves it ran by sending a signal, and Pakyas compares those signals against the job's schedule. The terms below explain that model and the precise states a job can be in. Job-state terms link to their canonical definitions on the Features page.

Cron job monitoring

Cron job monitoring is the practice of verifying that scheduled jobs — cron jobs, background workers, backups, and other recurring tasks — actually run, run on time, and complete successfully. Instead of assuming a job worked because no one complained, monitoring watches for the job's execution signal and alerts a team when that signal is late, missing, or reports an error. Pakyas does this by giving each job a unique ping URL: the job reports when it starts and finishes, and Pakyas compares those signals against the job's expected schedule.
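Here is a minimal Python sketch of the start/finish pattern, assuming the job is a Python callable. The base URL, the `/start` and `/fail` suffixes, and the helper names `http_ping` and `run_monitored` are hypothetical. They follow common ping-URL conventions and are not taken from Pakyas documentation. The `ping` parameter lets you swap out or fake the transport.

```python
import urllib.request
from typing import Callable

# Hypothetical ping URL; copy the real one from the check's settings.
PING_URL = "https://ping.example.com/your-check-id"


def http_ping(url: str) -> None:
    """Send one execution signal as an HTTP GET with a short timeout."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except Exception:
        # Any ping failure is swallowed so reporting never crashes the job.
        pass


def run_monitored(job: Callable[[], None], base_url: str = PING_URL,
                  ping: Callable[[str], None] = http_ping) -> None:
    """Run `job`, signalling start and then success or failure."""
    ping(f"{base_url}/start")      # assumed start endpoint
    try:
        job()
    except Exception:
        ping(f"{base_url}/fail")   # assumed failure endpoint
        raise                      # keep the job's own error visible
    ping(base_url)                 # assumed: the bare URL signals success
```

For a hypothetical `nightly_backup` function, the wrapper is a one-liner: `run_monitored(nightly_backup)`. The wrapper re-raises after the failure ping, so the job's own exit status still reaches whatever scheduler launched it.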

Dead man's switch

A dead man's switch is a mechanism that triggers an action when an expected signal stops arriving. The term originally described a device that activates if its operator becomes incapacitated. In infrastructure monitoring, the job plays the operator's role: it sends a regular signal, and if that signal goes silent, the monitor raises an alert. This inverts ordinary alerting. Rather than waiting for something to report a problem, you are alerted by the absence of a healthy signal, which catches jobs that die quietly or never start at all.
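At its core, a dead man's switch is a single deadline comparison. This sketch assumes a job is expected every `period`, with a `grace` allowance on top. Both names are illustrative and are not Pakyas settings.

```python
from datetime import datetime, timedelta


def switch_tripped(last_signal: datetime, period: timedelta,
                   grace: timedelta, now: datetime) -> bool:
    """Return True when the next expected signal is overdue.

    The switch never needs the job to report a problem; it only
    notices that nothing has arrived before the deadline.
    """
    deadline = last_signal + period + grace
    return now > deadline
```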

Heartbeat monitoring

Heartbeat monitoring tracks a periodic signal (a heartbeat) that a job or service sends to prove it is alive and on schedule. It differs from polling: with polling, the monitor reaches out to check a target; with heartbeat monitoring, the target pushes its own signal outward. This push model works for jobs that run behind firewalls, on ephemeral infrastructure, or on private networks a poller could never reach, and it confirms the job ran rather than merely that a port is open.
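On the job side, a push-based heartbeat can be as simple as a loop that calls out on a timer. In this sketch, `send` stands in for whatever makes the outbound request, such as an HTTP GET to the job's ping URL. `heartbeat_loop` is a hypothetical helper name.

```python
import threading
from typing import Callable


def heartbeat_loop(send: Callable[[], None], interval: float,
                   stop: threading.Event) -> int:
    """Push a heartbeat every `interval` seconds until `stop` is set.

    The worker initiates every signal, so the host never has to accept
    an inbound connection. Returns the number of heartbeats sent.
    """
    sent = 0
    while not stop.is_set():
        send()
        sent += 1
        # Event.wait doubles as a sleep that returns early when stopped.
        stop.wait(interval)
    return sent
```

A long-lived worker can run this in a daemon thread, for example `threading.Thread(target=heartbeat_loop, args=(send, 60, stop), daemon=True)`, which leaves its main loop free. Every request is outbound, so no inbound port needs to be open.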

Execution signal

An execution signal is the canonical Pakyas term for the evidence that a job actually ran. A job sends signals at meaningful moments — when it starts, when it succeeds, or when it reports an error — and Pakyas evaluates those signals against the expected schedule and run duration. Because monitoring is grounded in real execution signals rather than inferred health, Pakyas can tell the difference between a job that finished On Schedule, one that ran Late, and one that is Missing entirely.

See "Execution signal" on the Features page →
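The sketch below shows one way to model signals and pair a start with its terminal signal to get an outcome and a duration. The names `SignalKind`, `Signal`, and `summarize_run`, and the choice of three signal kinds, are illustrative assumptions. They are not Pakyas's internal data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class SignalKind(Enum):
    START = "start"
    SUCCESS = "success"
    FAIL = "fail"


@dataclass(frozen=True)
class Signal:
    kind: SignalKind
    at: datetime


def summarize_run(start: Signal, end: Signal) -> tuple[str, timedelta]:
    """Pair a start signal with its terminal signal.

    Returns the run's outcome and its duration: the two facts that get
    compared against the expected schedule and run length.
    """
    if start.kind is not SignalKind.START:
        raise ValueError("first signal must be START")
    if end.kind is SignalKind.START:
        raise ValueError("run has not finished yet")
    outcome = "ok" if end.kind is SignalKind.SUCCESS else "error"
    return outcome, end.at - start.at
```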

On Schedule

On Schedule means a valid execution signal arrived within the job's expected window. The job ran when it was supposed to and reported success — the healthy baseline state for a monitored job.

See "On Schedule" on the Features page →

Late

Late means the job ran, but its execution signal arrived outside the expected window. The work happened, just not on time. Late is distinct from Missing: a Late job has reported, whereas a Missing job has gone silent.

See "Late" on the Features page →

Missing

Missing means no execution signal was received within the expected window — the job's heartbeat went silent. This is the dead man's switch firing: the job may have failed to start, crashed before reporting, or lost connectivity. Missing is the most serious schedule state because the job's true status is unknown.

See "Missing" on the Features page →
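The distinction between On Schedule, Late, and Missing can be sketched as a single classification over one expected run. The sketch makes three assumptions, none of which come from Pakyas itself:

- the window opens at the expected time and closes `window` later;
- early signals count as On Schedule;
- a hypothetical `PENDING` value covers the interval before the window closes.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ScheduleState(Enum):
    PENDING = "pending"          # hypothetical: window still open, no signal yet
    ON_SCHEDULE = "On Schedule"
    LATE = "Late"
    MISSING = "Missing"


def schedule_state(expected: datetime, window: timedelta,
                   signal_at: Optional[datetime], now: datetime) -> ScheduleState:
    """Classify one expected run; its window closes at expected + window."""
    closes = expected + window
    if signal_at is not None:
        # The job reported: the only question is whether it was in time.
        return ScheduleState.ON_SCHEDULE if signal_at <= closes else ScheduleState.LATE
    # No signal yet: silence only counts once the window has closed.
    return ScheduleState.MISSING if now > closes else ScheduleState.PENDING
```

Under this model, a run can move from Missing to Late if its signal eventually arrives. This matches the distinction in the definitions: a Late job has reported, and a Missing job has not.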

Running

Running means a job has reported that it started but has not yet reported completion. Pakyas tracks the open run so it can measure duration and detect a run that exceeds its expected length.

See "Running" on the Features page →

Overrunning

Overrunning means a job started and is still running past its expected duration. The job has not failed or gone Missing, but it is taking longer than its normal run time — often an early warning of a stuck process, a growing dataset, or a downstream slowdown.

See "Overrunning" on the Features page →
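Running versus Overrunning comes down to comparing elapsed time against an expected duration. In this sketch, the caller supplies `expected_duration` directly. In practice it might be derived from past run times, but that is an assumption, not documented Pakyas behavior. `run_state` is an illustrative name.

```python
from datetime import datetime, timedelta
from typing import Optional


def run_state(started: datetime, finished: Optional[datetime],
              expected_duration: timedelta, now: datetime) -> str:
    """Classify an open run as Running or Overrunning."""
    if finished is not None:
        return "finished"  # the run is closed; schedule states apply instead
    elapsed = now - started
    # Still open: only the elapsed time decides between the two states.
    return "Overrunning" if elapsed > expected_duration else "Running"
```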

Error

Error (sometimes shown as Failed) means the job ran but reported that it did not succeed — it sent an explicit failure signal rather than a success signal. This is not silence; it is a job telling the truth about its own outcome, and it is distinct from Missing, where no signal arrived at all.

See "Error" on the Features page →
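When a job is launched as a shell command, a wrapper usually uses the process exit code to choose between a success signal and a failure signal. This sketch assumes the common convention that exit code 0 means success. `signal_for_exit` and `run_command` are hypothetical names, and the sketch leaves out the ping itself.

```python
import subprocess
import sys


def signal_for_exit(returncode: int) -> str:
    # Assumed convention: exit code 0 is success, anything else is an Error.
    return "success" if returncode == 0 else "fail"


def run_command(argv: list[str]) -> tuple[int, str]:
    """Run a command and decide which execution signal it should send."""
    result = subprocess.run(argv, check=False)
    return result.returncode, signal_for_exit(result.returncode)


if __name__ == "__main__":
    code, signal = run_command([sys.executable, "-c", "raise SystemExit(3)"])
    print(code, signal)  # a non-zero exit maps to the "fail" signal
```

The job still ran and still reported. It simply reported failure, which is what separates Error from the silence of Missing.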

Paused

Paused means a job has been intentionally taken out of monitoring so it will not raise alerts, for example during maintenance or while it is being decommissioned. A Paused job is not evaluated against its schedule and is never treated as Missing.

See "Paused" on the Features page →

Waiting for first ping

Waiting for first ping describes a newly created job that has never reported an execution signal. Pakyas has no baseline yet, so it cannot judge the job On Schedule or Missing — it is simply waiting for the first signal to establish the job's rhythm.

See "Waiting for first ping" on the Features page →
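Paused and Waiting for first ping are naturally checked before any schedule evaluation. This simplified sketch assumes a fixed `period` plus `grace` and collapses every other outcome into On Schedule or Missing. The `Check` fields and the `top_level_state` name are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Check:
    paused: bool
    last_signal: Optional[datetime]  # None until the first ping arrives
    period: timedelta
    grace: timedelta


def top_level_state(check: Check, now: datetime) -> str:
    # Paused wins outright: the schedule is not evaluated at all.
    if check.paused:
        return "Paused"
    # No baseline yet, so neither On Schedule nor Missing can be judged.
    if check.last_signal is None:
        return "Waiting for first ping"
    deadline = check.last_signal + check.period + check.grace
    return "Missing" if now > deadline else "On Schedule"
```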

Criticality

Criticality describes how much it matters when a given job misbehaves, and it is what makes a job state actionable. A nightly backup going Missing is critical and should page on-call immediately; a best-effort cleanup task running Late may only warrant a quiet note. Assigning criticality lets teams route the right alerts to the right people with the right urgency instead of treating every state change equally.
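Criticality-based routing can be sketched as a lookup keyed on (criticality, state). The three levels, the channel names, and the table contents are hypothetical. They were chosen to mirror the backup and cleanup examples in the definition.

```python
from enum import Enum


class Criticality(Enum):
    CRITICAL = 3
    NORMAL = 2
    LOW = 1


# Hypothetical routing table: which channel a state change goes to.
ROUTES = {
    (Criticality.CRITICAL, "Missing"): "page-on-call",
    (Criticality.CRITICAL, "Error"): "page-on-call",
    (Criticality.CRITICAL, "Late"): "team-chat",
    (Criticality.NORMAL, "Missing"): "team-chat",
    (Criticality.NORMAL, "Error"): "team-chat",
}


def route(criticality: Criticality, state: str) -> str:
    # Anything not listed falls through to a quiet, low-urgency log.
    return ROUTES.get((criticality, state), "quiet-log")
```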

Flap dampening / throttling

Flap dampening and throttling are alert-hygiene techniques that prevent notification storms. Flapping describes a job that rapidly oscillates between healthy and unhealthy states; dampening waits for a state to stabilize before alerting. Throttling limits how often repeat notifications for the same issue are sent. Together they keep alerts trustworthy by suppressing noise without hiding genuine problems.
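Both techniques fit in a small per-job gate. Dampening requires a state to repeat `stable_count` times before it counts, and throttling spaces out reminders for an unhealthy state that has not changed. The class name, its parameters, and the `"healthy"` label are illustrative assumptions, not Pakyas configuration.

```python
from typing import Optional


class AlertGate:
    """Flap dampening plus throttling for one job (illustrative sketch)."""

    def __init__(self, stable_count: int = 3, throttle_seconds: float = 3600.0):
        self.stable_count = stable_count          # observations a state must hold
        self.throttle_seconds = throttle_seconds  # minimum gap between reminders
        self.candidate: Optional[str] = None
        self.streak = 0
        self.confirmed = "healthy"                # last state we alerted on
        self.last_sent: Optional[float] = None

    def observe(self, state: str, now: float) -> Optional[str]:
        """Feed one observed state; return a notification or None."""
        # Dampening: count consecutive identical observations.
        if state == self.candidate:
            self.streak += 1
        else:
            self.candidate, self.streak = state, 1
        if self.streak < self.stable_count:
            return None  # not stable yet; could be a flap
        if state != self.confirmed:
            # A genuinely new, stable state always notifies immediately.
            self.confirmed, self.last_sent = state, now
            return f"state changed to {state}"
        # Same stable unhealthy state: throttle repeat reminders.
        if state != "healthy" and now - self.last_sent >= self.throttle_seconds:
            self.last_sent = now
            return f"still {state}"
        return None
```

With `stable_count=2`, an alternating Missing, healthy, Missing sequence produces no alert. A state that holds for two observations notifies at once, and only the repeat reminders are throttled.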

MTTR for jobs

MTTR (mean time to recovery) for jobs measures how long it takes, on average, for a job to return to On Schedule after it became Missing, ran Late, or reported an Error. It is a recovery metric, not a failure-rate metric: tracking MTTR over time shows whether a team is getting faster at detecting and resolving job problems and helps prioritize the jobs whose outages take longest to fix.
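MTTR is the mean of per-incident recovery times, (1/n) Σ (recovered_i - failed_i), over n resolved incidents. Here is a minimal Python sketch, assuming incidents are already recorded as (went bad, recovered) timestamp pairs. The function name is illustrative.

```python
from datetime import datetime, timedelta


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time from entering a bad state to returning On Schedule.

    Pass only resolved incidents: an open incident has no recovery
    time yet, so it cannot contribute a duration.
    """
    if not incidents:
        raise ValueError("no resolved incidents")
    total = sum((recovered - went_bad for went_bad, recovered in incidents),
                timedelta())
    return total / len(incidents)
```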

Ready to put these states to work?

Start monitoring your cron jobs with execution-signal precision — free for up to 10 checks.
