NLR HPC Kestrel Jobs Data

Overview: Anonymized job-level records from the Kestrel HPC system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, utilization, energy estimates, and efficiency metrics. Sensitive fields (user, account, job name, submit line, working directory, submit script, and job type) are replaced with 7-character cryptographic hashes.

System & Timeframe: Kestrel is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB RAM; bigmem nodes have 2,000 GB. GPU nodes (gpu-h100 partition) use NVIDIA H100 GPUs. Data covers jobs submitted August 2023 through December 2025. Funding provided by the U.S. Department of Energy, EERE.

Files:

  • esif.hpc.kestrel.job-anon.zip — Anonymized job records (Hive-partitioned Parquet)
  • datacard.md — Full dataset documentation

~11 million rows, 50 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.
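
Because the archive is Hive-partitioned, directory names inside it carry `key=value` pairs. The sketch below uses only the Python standard library to list the Parquet files in the zip and report which partition keys appear. The partition key names are not stated above, so the code discovers them rather than assuming them; the function names and any local file path are hypothetical.

```python
import zipfile
from collections import Counter


def hive_partitions(member_path):
    """Return {key: value} for each 'key=value' directory in an archive path."""
    parts = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in member_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


def summarize_archive(zip_source):
    """Count Parquet files per partition-key combination inside the archive.

    zip_source may be a path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".parquet"):
                key = tuple(sorted(hive_partitions(name).items()))
                counts[key] += 1
    return counts


# Example (hypothetical local path):
# for combo, n_files in sorted(summarize_archive("esif.hpc.kestrel.job-anon.zip").items()):
#     print(dict(combo), n_files)
```

After extraction, one possible way to load the data is PyArrow's `pyarrow.dataset.dataset(path, format="parquet", partitioning="hive")`, which exposes the partition keys as columns.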

Data Collection: Job records were collected with sacct using a timezone-aware export format (SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z") and loaded into PostgreSQL. Calculated columns are updated by database triggers and batch functions. All timestamps use the timestamptz type and correctly handle DST transitions.
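
The export format above maps directly onto Python's `strptime` directives, so a timestamp string in that format can be parsed into a timezone-aware `datetime`. The sketch below is illustrative only: the two sample timestamps are invented (they are not taken from the dataset) and were chosen to fall on either side of the November 2024 end of U.S. daylight saving time.

```python
from datetime import datetime, timedelta

# Same format string as the SLURM_TIME_FORMAT used for the sacct export.
SLURM_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_slurm_time(text):
    """Parse a timestamp such as '2024-11-03T01:30:00-0600' into an aware datetime."""
    return datetime.strptime(text, SLURM_TIME_FORMAT)


# Invented example values: the wall-clock time of `start` looks earlier than
# `submit`, but the offsets show that it is actually 45 minutes later.
submit = parse_slurm_time("2024-11-03T01:30:00-0600")
start = parse_slurm_time("2024-11-03T01:15:00-0700")

elapsed = start - submit
print(elapsed == timedelta(minutes=45))
```

Without the `%z` offset the two values would be ambiguous, which is the reason a timezone-aware export matters for anything derived from time differences.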

Preprocessing:

  • Anonymization of name, user, account, submit_line, work_dir, submit_script, and job_type via 7-char hex hashes
  • Derived columns: queue_wait, cpu_eff, max/min/avg_mem_eff, energy estimates
  • Simplified job state mapping (e.g., "CANCELLED by 132357" → "CANCELLED")
  • Boolean flags: python_job, reframe_job
  • Temporal decomposition: year, month, day, day_of_week, hour, minute from submit_time
  • Shared node tracking: shared_job_count, nodes_shared, jobs_shared
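
Several of the preprocessing steps listed above can be sketched in a few lines of Python. This is not the pipeline that produced the dataset (that runs in PostgreSQL), and several details are assumptions made for illustration: the hash algorithm and salt, the unit of queue_wait (seconds here), and the day_of_week convention (Python's Monday=0 here). All function names are hypothetical.

```python
import hashlib
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S%z"


def anon_hash(value, salt=""):
    """ASSUMPTION: salted SHA-256 truncated to 7 hex characters.

    The actual algorithm is not documented above. Note that 7 hex characters
    allow 16**7 = 268,435,456 distinct values, so collisions are possible.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:7]


def simplify_state(raw_state):
    """Keep the leading keyword, e.g. 'CANCELLED by 132357' -> 'CANCELLED'."""
    tokens = raw_state.split()
    return tokens[0] if tokens else ""


def queue_wait_seconds(submit_time, start_time):
    """ASSUMPTION: queue wait is start_time minus submit_time, in seconds."""
    return (start_time - submit_time).total_seconds()


def decompose(submit_time):
    """Split submit_time into calendar fields (day_of_week: Monday=0, an assumption)."""
    return {
        "year": submit_time.year,
        "month": submit_time.month,
        "day": submit_time.day,
        "day_of_week": submit_time.weekday(),
        "hour": submit_time.hour,
        "minute": submit_time.minute,
    }


# Invented example record.
submit = datetime.strptime("2024-03-04T08:15:00-0700", FMT)
start = datetime.strptime("2024-03-04T09:45:00-0700", FMT)
print(simplify_state("CANCELLED by 132357"))
print(queue_wait_seconds(submit, start))
print(decompose(submit))
```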

Key Variables: 

Scheduling: job_id, partition, state_simple, submit_time, start_time, end_time, queue_wait

Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested

Efficiency: cpu_eff, max/min/avg_mem_eff

Energy: cpu_energy_tdp_estimated_max/used_watt_hours, consumed_energy_raw_joules, consumed_energy_raw_watt_hours

Sharing: shared_job_count, nodes_shared, jobs_shared

Partitions: short, standard, debug, gpu-h100

Job States: CANCELLED, COMPLETED, FAILED, PENDING, RUNNING

QoS Levels: normal, high
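
The energy variables above include the same raw measurement in joules and in watt hours. The two units are related by 1 Wh = 3,600 J, which the sketch below applies; it also derives a mean power figure. The helper names are hypothetical, and treating the wallclock value as seconds is an assumption, since the unit is not stated above.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 W sustained for 3,600 s


def joules_to_watt_hours(joules):
    """Convert an energy value from joules to watt hours."""
    return joules / JOULES_PER_WATT_HOUR


def mean_power_watts(joules, wallclock_seconds):
    """Average power over a job: energy divided by elapsed time.

    ASSUMPTION: wallclock_seconds is the job's elapsed run time in seconds.
    """
    if wallclock_seconds <= 0:
        raise ValueError("wallclock_seconds must be positive")
    return joules / wallclock_seconds


# Invented example: 3.6 MJ consumed over one hour.
print(joules_to_watt_hours(3_600_000))
print(mean_power_watts(3_600_000, 3600))
```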

Important Notes:

  • Timestamps include timezone offsets, and DST transitions are handled correctly. Adding an interval to a timestamp across a DST boundary still requires adjusting the offset of the result
  • shared_job_count reflects physical node co-residency, not use of the shared partition
  • Job step records and raw Slurm JSONB fields are excluded
  • Do not attempt to re-identify individuals from hashed fields
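
The note on DST boundaries above can be illustrated with a small example. Adding an interval to a timestamp that carries a fixed offset yields the correct instant, but the result keeps the original offset; to express it in local wall-clock time after a DST change, the offset has to be adjusted. The sample timestamp is invented, and the UTC-7/UTC-6 offsets assume U.S. Mountain Time (the citation places the laboratory in Golden, CO). A time zone database lookup such as Python's `zoneinfo` could supply the offset instead of hard-coding it.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%S%z"

# ASSUMPTION: Mountain Daylight Time, UTC-6, in effect after 2024-03-10 02:00 local.
MDT = timezone(timedelta(hours=-6))

# Invented example: a job starting the day before DST begins, with a 24 h interval added.
start = datetime.strptime("2024-03-09T12:00:00-0700", FMT)
end = start + timedelta(hours=24)

# The instant is correct, but it still carries the pre-DST offset.
print(end.strftime(FMT))        # 2024-03-10T12:00:00-0700

# Offset adjustment: same instant, expressed with the post-DST offset.
end_local = end.astimezone(MDT)
print(end_local.strftime(FMT))  # 2024-03-10T13:00:00-0600
```

Both values compare equal as instants; only the wall-clock representation differs by one hour.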

Data Resources:

  • esif.hpc.kestrel.job-anon.zip (697.3 MB, archive): Kestrel jobs dataset, zipped Parquet/Hive dataset. Range: 08/2023 - 12/2025. MD5sum: 8f1d3be1cbe6345ef45e658a783c2aa0
  • datacard.md (15 KB, document): Genesis-formatted datacard that describes this dataset
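
The MD5 sum published for the archive can be used to check that a download is complete and uncorrupted. The sketch below computes the digest in chunks with the standard library so the 697.3 MB file does not need to fit in memory; the function names and the local file path are hypothetical.

```python
import hashlib

# MD5 sum published for esif.hpc.kestrel.job-anon.zip.
EXPECTED_MD5 = "8f1d3be1cbe6345ef45e658a783c2aa0"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5sum(path) == expected.lower()


# Example (hypothetical local path):
# print(verify_download("esif.hpc.kestrel.job-anon.zip"))
```

MD5 is adequate here as an integrity check against transfer errors; it is not a safeguard against deliberate tampering.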

Author Information:

  • Struan Clark, National Laboratory of the Rockies, ORCID iD: 0000-0003-0078-6560
  • Matt Selensky, National Laboratory of the Rockies, ORCID iD: 0000-0001-7743-2459
  • Kevin Menear, National Laboratory of the Rockies, ORCID iD: 0009-0004-8836-2387

Cite This Dataset:

Clark, Struan, Matt Selensky, and Kevin Menear. 2025. "NLR HPC Kestrel Jobs Data." NLR Data Catalog. Golden, CO: National Laboratory of the Rockies. Last updated: April 22, 2026. DOI: 10.7799/3023270.
About This Dataset
302
10.7799/3023270
Public
04/22/2026
DOE Project
Facilities: Energy Systems Integration Facility (ESIF); High Performance Computing Center (HPC)
Funding Organization: Department of Energy (DOE)
Sponsoring Organization: USDOE Office of Energy Efficiency and Renewable Energy (EERE)
Research Areas: Computational Science; Energy Analysis; Energy Systems Integration
Digital Object Identifier: 10.7799/3023270