NLR HPC Kestrel Jobs Data | NLR Data Catalog

Overview: Anonymized job-level records from the Kestrel HPC system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, utilization, energy estimates, and efficiency metrics. Sensitive fields (user, account, job name, submit line, working directory, submit script, and job type) are replaced with 7-character cryptographic hashes.

System & Timeframe: Kestrel is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB RAM; bigmem nodes have 2,000 GB RAM. GPU nodes (gpu-h100 partition) use NVIDIA H100 GPUs. The data covers jobs submitted from August 2023 through December 2025. Funding was provided by the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE).

Files:

esif.hpc.kestrel.job-anon.zip — Anonymized job records (Hive-partitioned Parquet)
datacard.md — Full dataset documentation

~11 million rows, 50 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.
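As a minimal sketch of reading the dataset with PyArrow, assuming the archive has been extracted to a local directory and that pyarrow and pandas are installed, a column-pruned read of completed jobs might look like the following. The function name and the `root` argument are hypothetical; the column names come from the Key Variables list, and the archive's internal layout is not described here.

```python
from pathlib import Path

# Columns named in the Key Variables list; adjust to the analysis at hand.
DEFAULT_COLUMNS = ("job_id", "partition", "state_simple", "queue_wait")


def load_completed_jobs(root, columns=DEFAULT_COLUMNS):
    """Return COMPLETED jobs as a pandas DataFrame.

    `root` is the directory the zip archive was extracted to (hypothetical).
    """
    # Imported here so this module can be loaded without pyarrow installed.
    import pyarrow.dataset as ds

    # Hive partitioning exposes directory keys (key=value) as columns.
    dataset = ds.dataset(Path(root), format="parquet", partitioning="hive")
    table = dataset.to_table(
        columns=list(columns),
        filter=ds.field("state_simple") == "COMPLETED",
    )
    return table.to_pandas()
```

Reading only the columns needed keeps memory use modest given the roughly 11 million rows.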

Data Collection: Job records were collected with sacct using a timezone-aware time format (SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z") and loaded into PostgreSQL. Calculated columns are updated by database triggers and batch functions. All timestamps are stored as timestamptz and correctly handle DST transitions.
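The time format above maps directly onto Python's strptime directives. The sketch below parses a timestamp written in that format and normalizes it to UTC; the sample value is illustrative and not taken from the dataset, and the Parquet files may already store timestamps as native timestamp types rather than strings.

```python
from datetime import datetime, timedelta, timezone

# Format string quoted in the Data Collection notes (SLURM_TIME_FORMAT).
SLURM_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_slurm_time(text):
    """Parse a sacct-style timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, SLURM_TIME_FORMAT)


# Illustrative value, not a record from the dataset.
ts = parse_slurm_time("2024-07-01T09:15:00-0600")

# Converting to UTC gives an offset-independent value for comparisons.
ts_utc = ts.astimezone(timezone.utc)
```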

Preprocessing:

Anonymization of name, user, account, submit_line, work_dir, submit_script, and job_type via 7-character hexadecimal hashes
Derived columns: queue_wait, cpu_eff, max/min/avg_mem_eff, energy estimates
Simplified job state mapping (e.g., "CANCELLED by 132357" → "CANCELLED")
Boolean flags: python_job, reframe_job
Temporal decomposition: year, month, day, day_of_week, hour, minute from submit_time
Shared node tracking: shared_job_count, nodes_shared, jobs_shared
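Two of the preprocessing steps above, state simplification and temporal decomposition, can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, and the day_of_week encoding (Monday = 0) follows Python's convention, which may differ from the dataset's.

```python
from datetime import datetime


def simplify_state(state):
    """Keep only the leading keyword, e.g. 'CANCELLED by 132357' -> 'CANCELLED'."""
    parts = state.split()
    return parts[0] if parts else state


def decompose_submit_time(submit_time):
    """Split a submit timestamp into the calendar fields listed above."""
    return {
        "year": submit_time.year,
        "month": submit_time.month,
        "day": submit_time.day,
        # Assumption: Monday=0 ... Sunday=6 (Python's weekday convention).
        "day_of_week": submit_time.weekday(),
        "hour": submit_time.hour,
        "minute": submit_time.minute,
    }
```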

Key Variables:

Scheduling: job_id, partition, state_simple, submit_time, start_time, end_time, queue_wait

Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested

Efficiency: cpu_eff, max/min/avg_mem_eff

Energy: cpu_energy_tdp_estimated_max/used_watt_hours, consumed_energy_raw_joules, consumed_energy_raw_watt_hours

Sharing: shared_job_count, nodes_shared, jobs_shared

Partitions: short, standard, debug, gpu-h100

Job States: CANCELLED, COMPLETED, FAILED, PENDING, RUNNING

QoS Levels: normal, high
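Some of the variables above are related by simple arithmetic. The sketch below shows the joule to watt-hour conversion that links the two consumed_energy_raw fields (1 Wh = 3,600 J) and an assumed definition of queue_wait as the gap between submit_time and start_time. The function names are hypothetical, and the unit in which the dataset stores queue_wait is not stated here.

```python
from datetime import datetime, timedelta, timezone

JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh = 1 W sustained for 3600 s


def joules_to_watt_hours(joules):
    """Convert an energy value from joules to watt-hours."""
    return joules / JOULES_PER_WATT_HOUR


def queue_wait(submit_time, start_time):
    """Assumed definition: time a job spent waiting between submit and start."""
    return start_time - submit_time


# Illustrative values, not records from the dataset.
submitted = datetime(2024, 7, 1, 9, 0, tzinfo=timezone.utc)
started = datetime(2024, 7, 1, 9, 45, tzinfo=timezone.utc)
wait = queue_wait(submitted, started)
```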

Important Notes:

Timestamps include timezone offsets, so DST transitions are represented correctly. When an interval is added to a timestamp across a DST boundary, however, the result must be adjusted to the offset in effect after the transition
shared_job_count reflects physical node co-residency, not use of the shared partition
Job step records and raw Slurm JSONB fields are excluded
Do not attempt to re-identify individuals from hashed fields
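The DST note above can be illustrated with fixed-offset datetimes. This sketch assumes Mountain Time offsets (UTC-7 standard, UTC-6 daylight), consistent with the Golden, CO location in the citation, and uses an invented job that spans the 2024-03-10 spring-forward transition.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Mountain Time offsets (MST = UTC-7, MDT = UTC-6).
MST = timezone(timedelta(hours=-7), "MST")
MDT = timezone(timedelta(hours=-6), "MDT")

# Illustrative job spanning the transition; not a record from the dataset.
start = datetime(2024, 3, 10, 1, 30, tzinfo=MST)
end = datetime(2024, 3, 10, 3, 30, tzinfo=MDT)

# Subtracting aware datetimes uses absolute time: one hour elapsed,
# even though the wall clock advanced by two hours.
elapsed = end - start
wall_clock_diff = end.replace(tzinfo=None) - start.replace(tzinfo=None)

# Adding an interval keeps the original fixed offset (02:30-07:00),
# a wall-clock time that was skipped locally ...
shifted = start + timedelta(hours=1)
# ... so re-express it in the offset in force after the transition.
adjusted = shifted.astimezone(MDT)
```

Computing durations from the offset-aware values, rather than from local wall-clock fields such as hour and minute, avoids the one-hour discrepancy.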

2 Data Resources

Name	Size	Type	Resource Description	Version	Date
esif.hpc.kestrel.job-anon.zip	697.3 MB	Archive	Kestrel Jobs Dataset (Zipped Parquet/Hive Dataset). Range: 08/2023 - 12/2025. MD5sum: 8f1d3be1cbe6345ef45e658a783c2aa0.	1	03-14-2026 22:31:23
datacard.md	15 KB	Document	Genesis formatted datacard that describes this dataset.	1	03-14-2026 22:31:23
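The MD5sum listed for the zip archive can be used to check a download before extracting it. A minimal sketch using only the standard library follows; the function names are hypothetical, and the local file path is whatever the archive was saved as.

```python
import hashlib

# MD5 checksum published in the resource listing for the zip archive.
EXPECTED_MD5 = "8f1d3be1cbe6345ef45e658a783c2aa0"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's digest equals the published checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_published_md5("esif.hpc.kestrel.job-anon.zip")` should return True for an intact download saved under that name.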

Keywords

computational science

high performance computing

processed data

slurm

kestrel

Submitted

Jun 02, 2025

Struan Clark

303-275-4821

Center 2C00

ORCID iD 0000-0003-0078-6560

Author Information

Struan Clark, National Laboratory of the Rockies, ORCID iD: 0000-0003-0078-6560

Matt Selensky, National Laboratory of the Rockies, ORCID iD: 0000-0001-7743-2459

Kevin Menear, National Laboratory of the Rockies, ORCID iD: 0009-0004-8836-2387

Cite This Dataset

Clark, Struan, Matt Selensky, and Kevin Menear. 2025. "NLR HPC Kestrel Jobs Data." NLR Data Catalog. Golden, CO: National Laboratory of the Rockies. Last updated: April 22, 2026. DOI: 10.7799/3023270.

About This Dataset

id 302

DOI 10.7799/3023270

Status Public

Last Updated 04/22/2026

DOE Project

DE-AC36-08GO28308

Facilities

Energy Systems Integration Facility (ESIF)

High Performance Computing Center (HPC)

Funding Organization

Department of Energy (DOE)

Sponsoring Organization

USDOE Office of Energy Efficiency and Renewable Energy (EERE)

Research Areas

Computational Science

Energy Analysis

Energy Systems Integration
