NLR HPC Eagle Jobs Data and Additional Energy Metrics

Overview: Anonymized job-level records from the Eagle high-performance computing (HPC) system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, resource utilization, CPU/GPU energy consumption, and efficiency metrics. Sensitive fields (user, account, job name) are replaced with cryptographic hashes.

System & Timeframe: Eagle was a 2,000-node, 8-petaflop system operated at NLR from 2019–2024, and the data cover its full operational lifetime. Slurm data were processed nightly, and timestamps are in Mountain Time. Funding was provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy (EERE).

Files:

  • esif.hpc.eagle.job-anon.zip — Core anonymized job records (Hive-partitioned Parquet)
  • esif.hpc.eagle.job-anon-energy-metrics.zip — Same records with additional iLO and Ganglia energy metrics
  • datacard.md — Full dataset documentation

The dataset contains approximately 13.8 million rows and 62 variables. It can be read with PyArrow, pandas, DuckDB, Apache Spark, or any other Parquet-compatible tool.

Data Collection: Job records were collected with sacct and passed through the pipeline Eagle Jobs API → Redpanda → StreamSets → HPCMON API → PostgreSQL. Node-level power comes from iLO (HP Integrated Lights-Out) and GPU power from Ganglia monitoring; these measurements are joined to jobs via node lists and time ranges.
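This page does not document exactly how iLO and Ganglia samples are attributed to jobs. As a rough illustration of joining power readings to a job "via node lists and time ranges", the sketch below assumes power samples arrive as `(node, time_seconds, watts)` tuples and integrates each node's readings over the job window with the trapezoidal rule. `job_energy_wh` and the tuple layout are hypothetical; this is not the HPCMON pipeline's actual method.

```python
from collections import defaultdict


def job_energy_wh(samples, job_nodes, start, end):
    """Estimate a job's energy in watt-hours from per-node power samples.

    samples:    iterable of (node_name, time_seconds, watts) tuples (assumed layout)
    job_nodes:  the job's node list
    start, end: job start and end times on the same seconds scale as samples
    """
    nodes = set(job_nodes)
    readings = defaultdict(list)
    for node, t, watts in samples:
        # Keep only samples from this job's nodes that fall inside its time range.
        if node in nodes and start <= t <= end:
            readings[node].append((t, watts))

    joules = 0.0
    for series in readings.values():
        series.sort()
        # Trapezoidal rule: mean power of each sampling interval times its length.
        for (t0, w0), (t1, w1) in zip(series, series[1:]):
            joules += 0.5 * (w0 + w1) * (t1 - t0)
    return joules / 3600.0  # 1 Wh = 3600 J
```

The sketch only integrates between the first and last in-window sample on each node; it does not extrapolate to the exact job start and end, so sparse sampling will undercount short jobs.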

Preprocessing:

  • Anonymization of name, user, and account fields via cryptographic hashing
  • Derived columns: queue_wait, cpu_eff, max_mem_eff
  • Simplified job state mapping (e.g., "CANCELLED BY 12345" → "CANCELLED")
  • QoS accounting rules (buy-in, standby, or Slurm QoS value)
  • CPU energy estimated from thermal design power (TDP): 200 W per Intel Xeon Gold 6154 processor (18 cores)
  • Timezone-aware columns (_tz) sourced from the LEX accounting database to correctly handle DST transitions
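The simplified state mapping in the list above can be approximated by keeping only the leading state keyword of the raw sacct state string. The helper below is a hypothetical sketch that reproduces the documented example ("CANCELLED BY 12345" → "CANCELLED"); the dataset's actual mapping rules may handle additional cases.

```python
def simplify_state(raw_state: str) -> str:
    """Map a raw Slurm state such as "CANCELLED BY 12345" to "CANCELLED".

    Hypothetical helper: keeps the first whitespace-separated token,
    upper-cased, which matches the documented example.
    """
    tokens = raw_state.split()
    if not tokens:
        raise ValueError("empty job state")
    return tokens[0].upper()
```

Applied to the raw states, this yields values such as CANCELLED, COMPLETED, or TIMEOUT, matching the names listed under Job States.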

Key Variables: 

Scheduling: job_id, partition, state_simple, submit_time_tz, start_time_tz, end_time_tz, queue_wait

Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested

Efficiency: cpu_eff, max_mem_eff
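The page names the derived columns queue_wait, cpu_eff, and max_mem_eff but does not give their formulas. The sketch below assumes the conventional seff-style definitions: wait is start minus submit, CPU efficiency is CPU time over allocated core-seconds, and memory efficiency is peak resident memory over memory requested. The inputs `total_cpu_seconds` and `max_rss_bytes` are hypothetical (they are not among the key variables listed here), and the units (seconds, bytes) are assumptions.

```python
from datetime import datetime
from typing import Optional


def queue_wait_seconds(submit_time_tz: datetime, start_time_tz: datetime) -> float:
    """Time from submission to start; pass the timezone-aware (_tz) values."""
    return (start_time_tz - submit_time_tz).total_seconds()


def cpu_eff(total_cpu_seconds: float, processors_used: int,
            wallclock_used_seconds: float) -> Optional[float]:
    """Fraction of allocated core-seconds actually spent on the CPU (assumed definition)."""
    allocated = processors_used * wallclock_used_seconds
    return total_cpu_seconds / allocated if allocated > 0 else None


def max_mem_eff(max_rss_bytes: float, memory_req_bytes: float) -> Optional[float]:
    """Peak resident memory as a fraction of memory requested (assumed definition)."""
    return max_rss_bytes / memory_req_bytes if memory_req_bytes > 0 else None
```

Under these definitions, a 36-core job that ran for one hour but accumulated only 18 core-hours of CPU time would have cpu_eff = 0.5.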

Energy: cpu_energy_tdp_estimated_max/used_watt_hours, node_energy_total_watt_hours (iLO), gpu0/1_energy_total_watt_hours (Ganglia)
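The preprocessing notes state only that CPU energy is estimated from a 200 W TDP for the 18-core Intel Xeon Gold 6154. One plausible reading, sketched below as an assumption rather than the dataset's confirmed formula, pro-rates TDP evenly per core and multiplies by cores and run time. The pairing of requested values with the `_max` column and used values with the `_used` column is likewise assumed; `tdp_energy_wh` is a hypothetical helper.

```python
TDP_WATTS = 200.0   # Intel Xeon Gold 6154 TDP, as stated in the dataset notes
CORES_PER_CPU = 18  # cores per Xeon Gold 6154


def tdp_energy_wh(processors: int, wallclock_hours: float) -> float:
    """Estimated CPU energy (Wh) if every core draws an equal share of TDP.

    Assumed usage: requested processors and wallclock for a "max" estimate,
    used processors and wallclock for a "used" estimate.
    """
    watts_per_core = TDP_WATTS / CORES_PER_CPU
    return processors * watts_per_core * wallclock_hours
```

Under this assumption, a job using 36 cores for 2 hours would be estimated at 36 × (200 / 18) × 2 = 800 Wh.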

Partitions: bigmem, bigmem-8600, bigscratch, csc, dav, ddn, debug, gpu, haswell, long, mono, short, standard

Job States: CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, TIMEOUT

QoS Levels: Unknown, normal, buy-in, debug, penalty, high, standby

Important Notes:

  • Non-_tz timestamp columns may be off by one hour across DST boundaries; use _tz columns for time difference calculations
  • Energy fields are null for jobs without monitoring coverage
  • Job step records and raw Slurm JSONB fields are excluded from this extract
  • Do not attempt to re-identify individuals from hashed fields
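The DST note above matters most for durations that straddle the November fall-back, when local clock times repeat. The sketch below uses fixed Mountain Time offsets as stand-ins for the offsets the _tz values are assumed to carry, and shows how subtracting naive wall-clock times undercounts a wait by one hour. `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for Mountain Time; assumed stand-ins for the offsets
# carried by the timezone-aware _tz columns.
MDT = timezone(timedelta(hours=-6), "MDT")
MST = timezone(timedelta(hours=-7), "MST")


def elapsed_seconds(t0: datetime, t1: datetime) -> float:
    """Seconds from t0 to t1, refusing naive (offset-free) timestamps."""
    if t0.utcoffset() is None or t1.utcoffset() is None:
        raise ValueError("use the timezone-aware _tz columns")
    return (t1 - t0).total_seconds()


# DST ended at 02:00 MDT on 2021-11-07, so 01:30 occurred twice that night.
submit = datetime(2021, 11, 7, 0, 30, tzinfo=MDT)
start = datetime(2021, 11, 7, 1, 30, tzinfo=MST)  # the second 01:30

naive_gap = start.replace(tzinfo=None) - submit.replace(tzinfo=None)
print(naive_gap)                       # 1:00:00 (one hour short)
print(elapsed_seconds(submit, start))  # 7200.0
```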
3 Data Resources
  • esif.hpc.eagle.job-anon.zip (870.1 MB, archive): Eagle Jobs Dataset (zipped Parquet/Hive dataset). Range: 11/2018 to 06/2024. MD5sum: a06527de28cbf207d1e743822978b9b4.
  • esif.hpc.eagle.job-anon-energy-metrics.zip (1.3 GB, archive): Eagle Jobs + Additional Energy Metrics Dataset (zipped Parquet/Hive dataset). Range: 11/2018 to 06/2024. MD5sum: cc60eac4d10b38a1bbfe3ef7dede5590.
  • datacard.md (18 KB, document): Genesis-formatted datacard that describes this dataset.
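The published MD5 sums let a downloader confirm that an archive arrived intact. The sketch below hashes a file in chunks with Python's standard hashlib, so the 1.3 GB archive never has to fit in memory, and compares the result against the checksums listed above. `md5sum`, `verify`, and the `EXPECTED_MD5` table are illustrative names, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums as published in the resource list above.
EXPECTED_MD5 = {
    "esif.hpc.eagle.job-anon.zip": "a06527de28cbf207d1e743822978b9b4",
    "esif.hpc.eagle.job-anon-energy-metrics.zip": "cc60eac4d10b38a1bbfe3ef7dede5590",
}


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's MD5 matches the published value for its file name."""
    return md5sum(path) == EXPECTED_MD5[Path(path).name]
```

After downloading, `verify("esif.hpc.eagle.job-anon.zip")` should return True; the command-line `md5sum` utility gives the same digest.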
Author Information
Struan Clark, National Laboratory of the Rockies, ORCID iD: 0000-0003-0078-6560
Matt Selensky, National Laboratory of the Rockies, ORCID iD: 0000-0001-7743-2459
Kevin Menear, National Laboratory of the Rockies, ORCID iD: 0009-0004-8836-2387
Cite This Dataset
Clark, Struan, Matt Selensky, and Kevin Menear. 2025. "NLR HPC Eagle Jobs Data and Additional Energy Metrics." NLR Data Catalog. Golden, CO: National Laboratory of the Rockies. Last updated: April 22, 2026. DOI: 10.7799/3023273.
About This Dataset
295
10.7799/3023273
Public
04/22/2026
DOE Project
Facilities: Energy Systems Integration Facility (ESIF); High Performance Computing Center (HPC)
Funding Organization: Department of Energy (DOE)
Sponsoring Organization: USDOE Office of Energy Efficiency and Renewable Energy (EERE)
Research Areas: Computational Science; Energy Analysis; Energy Systems Integration
License
View License
Digital Object Identifier: 10.7799/3023273