Overview: Anonymized job-level records from the Kestrel HPC system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, utilization, energy estimates, and efficiency metrics. Sensitive fields (user, account, job name, submit line, working directory, submit script, and job type) are replaced with 7-character cryptographic hashes.
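As an illustration of how a 7-character hash can stand in for a sensitive field, the following is a minimal sketch. The source does not state the hash algorithm or whether a salt is used, so the salted SHA-256 scheme, the `anonymize` function, and the example salt are all assumptions made for illustration only.

```python
import hashlib


def anonymize(value: str, salt: bytes, length: int = 7) -> str:
    """Return a short hex token for `value`.

    Hypothetical scheme: salted SHA-256 truncated to `length` hex characters.
    The dataset's actual hashing procedure is not documented here.
    """
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return digest[:length]


salt = b"example-salt"  # hypothetical; a real salt would be kept secret
token_a = anonymize("alice", salt)
token_b = anonymize("alice", salt)  # same input, same token
```

Because the mapping is deterministic, the same user or account always yields the same token, so per-user or per-account grouping still works on the anonymized records. Seven hex characters allow 16**7 = 268,435,456 distinct tokens.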
System & Timeframe: Kestrel is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB RAM; bigmem nodes have 2,000 GB. GPU nodes (gpu-h100 partition) use NVIDIA H100 GPUs. Data covers jobs submitted August 2023 through December 2025. Funding provided by the U.S. Department of Energy, EERE.
Files:
- esif.hpc.kestrel.job-anon.zip — Anonymized job records (Hive-partitioned Parquet)
- datacard.md — Full dataset documentation
~11 million rows, 50 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.
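Because the archive holds a Hive-partitioned dataset, partition values are encoded in directory names as `key=value` segments. The sketch below lists the Parquet members of the zip and collects whatever partition keys appear in their paths. The partition keys are not named in this description, so the code discovers them rather than assuming any; the helper names are illustrative.

```python
import zipfile
from collections import defaultdict


def list_parquet_members(zip_path):
    """Return the names of Parquet files inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.endswith(".parquet")]


def hive_partitions(member_names):
    """Collect key=value directory segments from archive member names."""
    values = defaultdict(set)
    for name in member_names:
        for segment in name.split("/")[:-1]:  # directory parts only
            key, sep, value = segment.partition("=")
            if sep and key:
                values[key].add(value)
    return {key: sorted(vals) for key, vals in values.items()}


# Usage (path assumed to point at the downloaded archive):
# names = list_parquet_members("esif.hpc.kestrel.job-anon.zip")
# print(hive_partitions(names))
```

Once extracted, one option for loading the data is `pyarrow.dataset.dataset(path, format="parquet", partitioning="hive")`, which exposes the partition keys as columns.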
Data Collection: Jobs collected via sacct with timezone-aware export (SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z"), loaded into PostgreSQL. Calculated columns updated via database triggers and batch functions. All timestamps use timestamptz and correctly handle DST transitions.
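The export format above can be parsed directly with Python's standard library, since `%z` accepts offsets such as `-0600`. This is a small sketch using a made-up timestamp; the helper name `parse_slurm_time` is illustrative.

```python
from datetime import datetime, timezone

# Same format string as the sacct export described above.
SLURM_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_slurm_time(text: str) -> datetime:
    """Parse a timezone-aware timestamp in the sacct export format."""
    return datetime.strptime(text, SLURM_TIME_FORMAT)


submit = parse_slurm_time("2024-07-01T09:15:00-0600")  # made-up example value
submit_utc = submit.astimezone(timezone.utc)  # normalize to UTC for comparisons
print(submit_utc.isoformat())
```

Normalizing to UTC before sorting or differencing avoids ambiguity when records span a DST transition.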
Preprocessing:
- Anonymization of name, user, account, submit_line, work_dir, submit_script, and job_type via 7-char hex hashes
- Derived columns: queue_wait, cpu_eff, max/min/avg_mem_eff, energy estimates
- Simplified job state mapping (e.g., "CANCELLED by 132357" → "CANCELLED")
- Boolean flags: python_job, reframe_job
- Temporal decomposition: year, month, day, day_of_week, hour, minute from submit_time
- Shared node tracking: shared_job_count, nodes_shared, jobs_shared
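Two of the preprocessing steps above, state simplification and temporal decomposition, can be sketched as follows. This is an illustrative reimplementation, not the pipeline's own code: the function names are hypothetical, the state rule only covers the pattern shown in the example, and the `day_of_week` numbering (Monday = 0, Python's convention) is an assumption that may differ from the dataset's.

```python
from datetime import datetime


def simplify_state(state: str) -> str:
    """Keep the leading state keyword, dropping suffixes such as ' by <uid>'."""
    parts = state.split()
    return parts[0] if parts else state


def decompose(submit_time: datetime) -> dict:
    """Split a submit timestamp into the calendar fields listed above."""
    return {
        "year": submit_time.year,
        "month": submit_time.month,
        "day": submit_time.day,
        "day_of_week": submit_time.weekday(),  # assumption: Monday = 0
        "hour": submit_time.hour,
        "minute": submit_time.minute,
    }


submit = datetime.strptime("2024-07-01T09:15:00-0600", "%Y-%m-%dT%H:%M:%S%z")
parts = decompose(submit)  # fields are taken in the timestamp's own offset
```

The decomposition uses the local wall-clock values carried by the timestamp, so converting to UTC first would shift `hour` and possibly `day`.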
Key Variables:
Scheduling: job_id, partition, state_simple, submit_time, start_time, end_time, queue_wait
Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested
Efficiency: cpu_eff, max/min/avg_mem_eff
Energy: cpu_energy_tdp_estimated_max/used_watt_hours, consumed_energy_raw_joules, consumed_energy_raw_watt_hours
Sharing: shared_job_count, nodes_shared, jobs_shared
Partitions: short, standard, debug, gpu-h100
Job States: CANCELLED, COMPLETED, FAILED, PENDING, RUNNING
QoS Levels: normal, high
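The energy columns appear in both joules and watt-hours. The conversion between the two units is fixed (1 Wh = 3,600 J), and dividing energy by elapsed time gives average power. The sketch below shows both; it assumes the two `consumed_energy_raw_*` columns carry the same measurement in different units, which the description implies but does not state, and the function names are illustrative.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 W sustained for 3600 s


def joules_to_watt_hours(joules: float) -> float:
    """Convert an energy value from joules to watt-hours."""
    return joules / JOULES_PER_WATT_HOUR


def average_power_watts(joules: float, elapsed_seconds: float) -> float:
    """Average power over a job, given its energy and elapsed wall time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return joules / elapsed_seconds


energy_wh = joules_to_watt_hours(7_200_000)       # made-up job energy
power_w = average_power_watts(7_200_000, 3600.0)  # same job, one hour long
```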
Important Notes:
- Timestamps include timezone offsets, so DST transitions are represented unambiguously. When an interval is added to a timestamp and the result crosses a DST boundary, the offset of the result must be adjusted
- shared_job_count reflects physical node co-residency, not use of the shared partition
- Job step records and raw Slurm JSONB fields are excluded
- Do not attempt to re-identify individuals from hashed fields
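The note on DST boundaries can be illustrated with two offset-aware timestamps around a fall-back transition. The values below are made up, and the offsets (-0600 and -0700) are chosen for illustration; the description does not state which time zone the export uses.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S%z"

# Same wall-clock time on either side of a fall-back transition.
before = datetime.strptime("2024-11-03T01:30:00-0600", FMT)
after = datetime.strptime("2024-11-03T01:30:00-0700", FMT)

# Subtracting aware datetimes accounts for the differing offsets.
elapsed = after - before

# Dropping the offsets hides the hour that actually passed.
wall_clock_gap = after.replace(tzinfo=None) - before.replace(tzinfo=None)

# Adding an interval keeps the original fixed offset on the result...
shifted = before + timedelta(hours=1)
# ...so it must be re-expressed in the post-transition offset to match local time.
adjusted = shifted.astimezone(after.tzinfo)

print(elapsed, wall_clock_gap, shifted.isoformat(), adjusted.isoformat())
```

Here `shifted` and `after` denote the same instant, but only `adjusted` carries the offset in effect after the transition.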
| Name | Size | Type | Resource Description | History |
|---|---|---|---|---|
| esif.hpc.kestrel.job-anon.zip | 697.3 MB | Archive | Kestrel Jobs Dataset (Zipped Parquet/Hive Dataset). Range: 08/2023 - 12/2025. MD5sum: 8f1d3be1cbe6345ef45e658a783c2aa0. | |
| datacard.md | 15 KB | Document | Genesis-formatted datacard that describes this dataset. | |
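
The MD5 checksum listed in the table can be used to confirm that the archive downloaded intact. The following is a minimal sketch using only the standard library; the function names and the local file path in the usage comment are assumptions.

```python
import hashlib

# Checksum as listed for esif.hpc.kestrel.job-anon.zip in the table above.
EXPECTED_MD5 = "8f1d3be1cbe6345ef45e658a783c2aa0"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (path assumed to point at the downloaded archive):
# print(archive_is_intact("esif.hpc.kestrel.job-anon.zip"))
```

Reading in chunks keeps memory use small even though the archive is several hundred megabytes.
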
Clark, Struan, Matt Selensky, and Kevin Menear. 2025. "NLR HPC Kestrel Jobs Data." NLR Data Catalog. Golden, CO: National Laboratory of the Rockies. Last updated: April 22, 2026. DOI: 10.7799/3023270.
