Methodology
v1.0. Released under CC0 1.0 Universal (public domain). Every number is documented. Every calculation is reproducible. Every assumption is cited.
1. Overview
Commit Carbon estimates the carbon emissions associated with AI-assisted software commits. The estimate combines AI tool usage data (from ai-attestation), token-generation energy estimates (from the emissions factor database), grid carbon intensity (regional and, where available, temporal), and a conservative bias with uncertainty ranges.
2. Calculation Formula
commit_emissions_gco2e =
    estimated_tokens_per_commit / 1000
    * energy_per_1k_tokens_watt_seconds / 3600
    * grid_intensity_gco2e_per_kwh

Where estimated_tokens_per_commit defaults to 2500, scaled by commit size (lines changed * 40, with a minimum of 2500). Agent tools apply a 3x multiplier. Energy per 1k tokens is tool-specific, from the factor database. Grid intensity is region-specific, from IEA annual averages (or real-time data if configured).
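A minimal sketch of the formula in Python, implemented exactly as written above; the function name and the sample factor values are illustrative, not part of the methodology:

```python
def commit_emissions_gco2e(tokens, energy_per_1k_tokens_ws, grid_intensity_gco2e_per_kwh):
    """Per-commit emission estimate, following the formula above as written."""
    return (tokens / 1000
            * energy_per_1k_tokens_ws / 3600
            * grid_intensity_gco2e_per_kwh)

# Illustrative inputs: default 2500 tokens, 3.6 Ws per 1k tokens, 400 gCO2e/kWh
print(commit_emissions_gco2e(2500, 3.6, 400))  # 1.0
```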
3. Token Estimation
The default of 2500 tokens per commit accounts for: multiple code completions (some accepted, some rejected, all consuming inference energy), chat interactions for debugging and planning, context-window tokens consumed in prompt construction, and a conservative bias erring toward a higher count. For commits with more lines changed, tokens scale proportionally. Agent tools (Claude Code, Devin, OpenHands, Cline, GPT Engineer, Bolt) apply a 3x multiplier, reflecting sustained multi-turn inference sessions. This is the most uncertain input in the calculation. We document it openly and invite refinement.
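The scaling rule above can be sketched as follows; the tool-name strings are hypothetical identifiers chosen for illustration:

```python
# Hypothetical identifiers for the agent tools listed above
AGENT_TOOLS = {"claude-code", "devin", "openhands", "cline", "gpt-engineer", "bolt"}

def estimate_tokens_per_commit(lines_changed, tool):
    # Default 2500 tokens, scaled by commit size: lines changed * 40, 2500 minimum
    tokens = max(2500, lines_changed * 40)
    if tool in AGENT_TOOLS:
        tokens *= 3  # agent tools: sustained multi-turn inference sessions
    return tokens

print(estimate_tokens_per_commit(30, "copilot"))   # 2500 (below the floor)
print(estimate_tokens_per_commit(200, "copilot"))  # 8000
print(estimate_tokens_per_commit(200, "devin"))    # 24000
```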
4. Ranged Estimates
Every emission estimate produces three values: Low (optimistic, using low energy factors and clean grid assumptions), Central (most likely, using central factors and representative grid data), and High (pessimistic, using high energy factors and dirty grid assumptions). The ratio between high and low is typically 4x, reflecting genuine uncertainty. This ratio is documented and justified.
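Given the pairing described above (low energy factors with clean-grid values, high with dirty), a ranged estimate can be sketched as below; the sample factor tuples are illustrative values chosen so the high/low ratio lands at the typical 4x:

```python
def ranged_estimate_gco2e(tokens, energy_ws_per_1k, grid_gco2e_per_kwh):
    # Each argument after tokens is a (low, central, high) tuple; low energy is
    # paired with the clean grid, high energy with the dirty grid
    return tuple(tokens / 1000 * e / 3600 * g
                 for e, g in zip(energy_ws_per_1k, grid_gco2e_per_kwh))

low, central, high = ranged_estimate_gco2e(2500, (2.7, 3.6, 5.4), (300, 400, 600))
print(high / low)  # 4.0 with these illustrative factors
```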
5. Uncertainty Treatment
Sources of uncertainty: (1) Energy per token: AI tool vendors do not publish per-request energy data. We estimate from inference-energy research on models of similar capability. (2) Grid intensity: hourly data is available only for some regions; annual averages are used as a fallback. (3) Tokens per commit: the actual count is private to AI tool vendors. We use conservative estimates. (4) Regional attribution: the grid that powers AI tool inference is usually the one serving the vendor's data center, not the developer's location. We apply conservative bias at each step.
6. Scope Alignment
Emissions fall under: GHG Protocol Scope 3 Category 1 (Purchased Goods and Services), CSRD Topic E1 (Climate Change, emissions from supply chain purchased services), and SEC proposed rule category "other indirect emissions from operations."
7. Data Sources
Primary sources: IEA Emissions Factors 2024 (annual grid intensity by country), EPA eGRID 2022 (US subnational), Electricity Maps API (real-time global, optional), WattTime API (real-time marginal emissions, optional), Luccioni et al. 2023 "Power Hungry Processing" (inference energy baselines), Patterson et al. 2021 "Carbon Emissions and Large Neural Network Training."
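One way the optional real-time sources might layer over the annual averages is sketched below. Everything here is a hypothetical shape, not the actual adapter API of Electricity Maps or WattTime, and the two table entries are placeholder numbers, not IEA figures:

```python
def resolve_grid_intensity(region, realtime_sources=()):
    # realtime_sources: optional callables wrapping real-time providers; each
    # returns gCO2e/kWh or None if unavailable (hypothetical interface)
    for fetch in realtime_sources:
        value = fetch(region)
        if value is not None:
            return value
    # Fallback: annual averages (placeholder values standing in for the IEA table)
    annual_averages = {"US": 369.0, "FR": 56.0}
    return annual_averages[region]

print(resolve_grid_intensity("FR"))                    # annual-average fallback
print(resolve_grid_intensity("FR", [lambda r: 41.0]))  # real-time source wins
```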
8. Known Limitations
(1) Does not measure training emissions (one-time, amortized). (2) Does not measure water consumption (planned for v2.0). (3) Does not measure the embodied emissions of data center hardware. (4) Grid data availability varies by region. (5) Vendor telemetry is not available; token counts must be estimated. (6) Annual grid averages smooth over real-time variations. Companies using this methodology should document these limitations in their sustainability reports.
Open for Peer Review
This methodology is open to review by climate scientists, sustainability professionals, and AI researchers.
Submit Feedback on GitHub