Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeCounting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning
Machine learning (ML) requires using energy to carry out computations during the model training process. The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on quantity used and the energy source. Existing research on the environmental impacts of ML has been limited to analyses covering a small number of models and does not adequately represent the diversity of ML models and tasks. In the current study, we present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision. We analyze them in terms of the energy sources used, the amount of CO2 emissions produced, how these emissions evolve across time and how they relate to model performance. We conclude with a discussion regarding the carbon footprint of our field and propose the creation of a centralized repository for reporting and tracking these emissions.
Quantifying the Carbon Emissions of Machine Learning
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training
Prominent works in the field of Natural Language Processing have long attempted to create new innovative models by improving upon previous model training approaches, altering model architecture, and developing more in-depth datasets to better their performance. However, with the quickly advancing field of NLP comes increased greenhouse gas emissions, posing concerns over the environmental damage caused by training LLMs. Gaining a comprehensive understanding of the various costs, particularly those pertaining to environmental aspects, that are associated with artificial intelligence serves as the foundational basis for ensuring safe AI models. Currently, investigations into the CO2 emissions of AI models remain an emerging area of research, and as such, in this paper, we evaluate the CO2 emissions of well-known large language models, which have an especially high carbon footprint due to their significant amount of model parameters. We argue for the training of LLMs in a way that is responsible and sustainable by suggesting measures for reducing carbon emissions. Furthermore, we discuss how the choice of hardware affects CO2 emissions by contrasting the CO2 emissions during model training for two widely used GPUs. Based on our results, we present the benefits and drawbacks of our proposed solutions and make the argument for the possibility of training more environmentally safe AI models without sacrificing their robustness and performance.
Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study
The rise of machine learning (ML) systems has exacerbated their carbon footprint due to increased capabilities and model sizes. However, there is scarce knowledge on how the carbon footprint of ML models is actually measured, reported, and evaluated. In light of this, the paper aims to analyze the measurement of the carbon footprint of 1,417 ML models and associated datasets on Hugging Face, which is the most popular repository for pretrained ML models. The goal is to provide insights and recommendations on how to report and optimize the carbon efficiency of ML models. The study includes the first repository mining study on the Hugging Face Hub API on carbon emissions. This study seeks to answer two research questions: (1) how do ML model creators measure and report carbon emissions on Hugging Face Hub?, and (2) what aspects impact the carbon emissions of training ML models? The study yielded several key findings. These include a stalled proportion of carbon emissions-reporting models, a slight decrease in reported carbon footprint on Hugging Face over the past 2 years, and a continued dominance of NLP as the main application domain. Furthermore, the study uncovers correlations between carbon emissions and various attributes such as model size, dataset size, and ML application domains. These results highlight the need for software measurements to improve energy reporting practices and promote carbon-efficient model development within the Hugging Face community. In response to this issue, two classifications are proposed: one for categorizing models based on their carbon emission reporting practices and another for their carbon efficiency. The aim of these classification proposals is to foster transparency and sustainable model development within the ML community.
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
Progress in machine learning (ML) comes with a cost to the environment, given that training ML models requires significant computational resources, energy and materials. In the present article, we aim to quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle. We estimate that BLOOM's final training emitted approximately 24.7 tonnes of~\carboneq~if we consider only the dynamic power consumption, and 50.5 tonnes if we account for all processes ranging from equipment manufacturing to energy-based operational consumption. We also study the energy requirements and carbon emissions of its deployment for inference via an API endpoint receiving user queries in real-time. We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of ML models and future research directions that can contribute towards improving carbon emissions reporting.
Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision
The energy consumption of deep learning models is increasing at a breathtaking rate, which raises concerns due to potential negative effects on carbon neutrality in the context of global warming and climate change. With the progress of efficient deep learning techniques, e.g., model compression, researchers can obtain efficient models with fewer parameters and smaller latency. However, most of the existing efficient deep learning methods do not explicitly consider energy consumption as a key performance indicator. Furthermore, existing methods mostly focus on the inference costs of the resulting efficient models, but neglect the notable energy consumption throughout the entire life cycle of the algorithm. In this paper, we present the first large-scale energy consumption benchmark for efficient computer vision models, where a new metric is proposed to explicitly evaluate the full-cycle energy consumption under different model usage intensity. The benchmark can provide insights for low carbon emission when selecting efficient deep learning algorithms in different model usage scenarios.
The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink
Machine Learning (ML) workloads have rapidly grown in importance, but raised concerns about their carbon footprint. Four best practices can reduce ML training energy by up to 100x and CO2 emissions up to 1000x. By following best practices, overall ML energy use (across research, development, and production) held steady at <15% of Google's total energy use for the past three years. If the whole ML field were to adopt best practices, total carbon emissions from training would reduce. Hence, we recommend that ML papers include emissions explicitly to foster competition on more than just model quality. Estimates of emissions in papers that omitted them have been off 100x-100,000x, so publishing emissions has the added benefit of ensuring accurate accounting. Given the importance of climate change, we must get the numbers right to make certain that we work on its biggest challenges.
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
The carbon footprint associated with large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes, including operational and embodied carbon emissions. An essential aspect is accurately estimating the carbon impact of emerging LLMs even before their training, which heavily relies on GPU usage. Existing studies have reported the carbon footprint of LLM training, but only one tool, mlco2, can predict the carbon footprint of new neural networks prior to physical training. However, mlco2 has several serious limitations. It cannot extend its estimation to dense or mixture-of-experts (MoE) LLMs, disregards critical architectural parameters, focuses solely on GPUs, and cannot model embodied carbon footprints. Addressing these gaps, we introduce \carb, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs. Compared to mlco2, \carb~significantly enhances the accuracy of carbon footprint estimations for various LLMs. The source code is released at https://github.com/SotaroKaneda/MLCarbon.
High carbon stock mapping at large scale with optical satellite imagery and spaceborne LIDAR
The increasing demand for commodities is leading to changes in land use worldwide. In the tropics, deforestation, which causes high carbon emissions and threatens biodiversity, is often linked to agricultural expansion. While the need for deforestation-free global supply chains is widely recognized, making progress in practice remains a challenge. Here, we propose an automated approach that aims to support conservation and sustainable land use planning decisions by mapping tropical landscapes at large scale and high spatial resolution following the High Carbon Stock (HCS) approach. A deep learning approach is developed that estimates canopy height for each 10 m Sentinel-2 pixel by learning from sparse GEDI LIDAR reference data, achieving an overall RMSE of 6.3 m. We show that these wall-to-wall maps of canopy top height are predictive for classifying HCS forests and degraded areas with an overall accuracy of 86 % and produce a first high carbon stock map for Indonesia, Malaysia, and the Philippines.
Chasing Low-Carbon Electricity for Practical and Sustainable DNN Training
Deep learning has experienced significant growth in recent years, resulting in increased energy consumption and carbon emission from the use of GPUs for training deep neural networks (DNNs). Answering the call for sustainability, conventional solutions have attempted to move training jobs to locations or time frames with lower carbon intensity. However, moving jobs to other locations may not always be feasible due to large dataset sizes or data regulations. Moreover, postponing training can negatively impact application service quality because the DNNs backing the service are not updated in a timely fashion. In this work, we present a practical solution that reduces the carbon footprint of DNN training without migrating or postponing jobs. Specifically, our solution observes real-time carbon intensity shifts during training and controls the energy consumption of GPUs, thereby reducing carbon footprint while maintaining training performance. Furthermore, in order to proactively adapt to shifting carbon intensity, we propose a lightweight machine learning algorithm that predicts the carbon intensity of the upcoming time frame. Our solution, Chase, reduces the total carbon footprint of training ResNet-50 on ImageNet by 13.6% while only increasing training time by 2.5%.
Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters
In recent years, Large Language Models (LLM) such as ChatGPT, CoPilot, and Gemini have been widely adopted in different areas. As the use of LLMs continues to grow, many efforts have focused on reducing the massive training overheads of these models. But it is the environmental impact of handling user requests to LLMs that is increasingly becoming a concern. Recent studies estimate that the costs of operating LLMs in their inference phase can exceed training costs by 25x per year. As LLMs are queried incessantly, the cumulative carbon footprint for the operational phase has been shown to far exceed the footprint during the training phase. Further, estimates indicate that 500 ml of fresh water is expended for every 20-50 requests to LLMs during inference. To address these important sustainability issues with LLMs, we propose a novel framework called SLIT to co-optimize LLM quality of service (time-to-first token), carbon emissions, water usage, and energy costs. The framework utilizes a machine learning (ML) based metaheuristic to enhance the sustainability of LLM hosting across geo-distributed cloud datacenters. Such a framework will become increasingly vital as LLMs proliferate.
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.
Discovering Effective Policies for Land-Use Planning with Neuroevolution
How areas of land are allocated for different uses, such as forests, urban areas, and agriculture, has a large effect on the terrestrial carbon balance, and therefore climate change. Based on available historical data on land-use changes and a simulation of the associated carbon emissions and removals, a surrogate model can be learned that makes it possible to evaluate the different options available to decision-makers efficiently. An evolutionary search process can then be used to discover effective land-use policies for specific locations. Such a system was built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset LUH2 and the bookkeeping model BLUE. It generates Pareto fronts that trade off carbon impact and amount of land-use change customized to different locations, thus providing a potentially useful tool for land-use planning.
When Layers Play the Lottery, all Tickets Win at Initialization
Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network exists sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization from the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Leveraged by this observation, we propose to discover these winning tickets at initialization, eliminating the requirement of heavy computational resources for training the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce up to 51% of carbon emission, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization while tickets from filter removal (the standard structured LTH) hardly become winning tickets.
Lessons Learned from Mining the Hugging Face Repository
The rapidly evolving fields of Machine Learning (ML) and Artificial Intelligence have witnessed the emergence of platforms like Hugging Face (HF) as central hubs for model development and sharing. This experience report synthesizes insights from two comprehensive studies conducted on HF, focusing on carbon emissions and the evolutionary and maintenance aspects of ML models. Our objective is to provide a practical guide for future researchers embarking on mining software repository studies within the HF ecosystem to enhance the quality of these studies. We delve into the intricacies of the replication package used in our studies, highlighting the pivotal tools and methodologies that facilitated our analysis. Furthermore, we propose a nuanced stratified sampling strategy tailored for the diverse HF Hub dataset, ensuring a representative and comprehensive analytical approach. The report also introduces preliminary guidelines, transitioning from repository mining to cohort studies, to establish causality in repository mining studies, particularly within the ML model of HF context. This transition is inspired by existing frameworks and is adapted to suit the unique characteristics of the HF model ecosystem. Our report serves as a guiding framework for researchers, contributing to the responsible and sustainable advancement of ML, and fostering a deeper understanding of the broader implications of ML models.
Amortized Network Intervention to Steer the Excitatory Point Processes
We tackle the challenge of large-scale network intervention for guiding excitatory point processes, such as infectious disease spread or traffic congestion control. Our model-based reinforcement learning utilizes neural ODEs to capture how the networked excitatory point processes will evolve subject to the time-varying changes in network topology. Our approach incorporates Gradient-Descent based Model Predictive Control (GD-MPC), offering policy flexibility to accommodate prior knowledge and constraints. To address the intricacies of planning and overcome the high dimensionality inherent to such decision-making problems, we design an Amortize Network Interventions (ANI) framework, allowing for the pooling of optimal policies from history and other contexts, while ensuring a permutation equivalent property. This property enables efficient knowledge transfer and sharing across diverse contexts. Our approach has broad applications, from curbing infectious disease spread to reducing carbon emissions through traffic light optimization, and thus has the potential to address critical societal and environmental challenges.
The Stable Artist: Steering Semantics in Diffusion Latent Space
Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results is almost unfeasible in a one-shot fashion. On the contrary, text-guided image generation involves the user making many slight changes to inputs in order to iteratively carve out the envisioned image. However, slight changes to the input prompt often lead to entirely different images being generated, and thus the control of the artist is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process. The main component is semantic guidance (SEGA) which steers the diffusion process along variable numbers of semantic directions. This allows for subtle edits to images, changes in composition and style, as well as optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as 'carbon emission'. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.
Structuring Radiology Reports: Challenging LLMs with Lightweight Models
Radiology reports are critical for clinical decision-making but often lack a standardized format, limiting both human interpretability and machine learning (ML) applications. While large language models (LLMs) have shown strong capabilities in reformatting clinical text, their high computational requirements, lack of transparency, and data privacy concerns hinder practical deployment. To address these challenges, we explore lightweight encoder-decoder models (<300M parameters)-specifically T5 and BERT2BERT-for structuring radiology reports from the MIMIC-CXR and CheXpert Plus datasets. We benchmark these models against eight open-source LLMs (1B-70B), adapted using prefix prompting, in-context learning (ICL), and low-rank adaptation (LoRA) finetuning. Our best-performing lightweight model outperforms all LLMs adapted using prompt-based techniques on a human-annotated test set. While some LoRA-finetuned LLMs achieve modest gains over the lightweight model on the Findings section (BLEU 6.4%, ROUGE-L 4.8%, BERTScore 3.6%, F1-RadGraph 1.1%, GREEN 3.6%, and F1-SRR-BERT 4.3%), these improvements come at the cost of substantially greater computational resources. For example, LLaMA-3-70B incurred more than 400 times the inference time, cost, and carbon emissions compared to the lightweight model. These results underscore the potential of lightweight, task-specific models as sustainable and privacy-preserving solutions for structuring clinical text in resource-constrained healthcare settings.
Comprehensive Analysis of Transparency and Accessibility of ChatGPT, DeepSeek, And other SoTA Large Language Models
Despite increasing discussions on open-source Artificial Intelligence (AI), existing research lacks a discussion on the transparency and accessibility of state-of-the-art (SoTA) Large Language Models (LLMs). The Open Source Initiative (OSI) has recently released its first formal definition of open-source software. This definition, when combined with standard dictionary definitions and the sparse published literature, provide an initial framework to support broader accessibility to AI models such as LLMs, but more work is essential to capture the unique dynamics of openness in AI. In addition, concerns about open-washing, where models claim openness but lack full transparency, has been raised, which limits the reproducibility, bias mitigation, and domain adaptation of these models. In this context, our study critically analyzes SoTA LLMs from the last five years, including ChatGPT, DeepSeek, LLaMA, and others, to assess their adherence to transparency standards and the implications of partial openness. Specifically, we examine transparency and accessibility from two perspectives: open-source vs. open-weight models. Our findings reveal that while some models are labeled as open-source, this does not necessarily mean they are fully open-sourced. Even in the best cases, open-source models often do not report model training data, and code as well as key metrics, such as weight accessibility, and carbon emissions. To the best of our knowledge, this is the first study that systematically examines the transparency and accessibility of over 100 different SoTA LLMs through the dual lens of open-source and open-weight models. The findings open avenues for further research and call for responsible and sustainable AI practices to ensure greater transparency, accountability, and ethical deployment of these models.(DeepSeek transparency, ChatGPT accessibility, open source, DeepSeek open source)
Optimizing Memory Mapping Using Deep Reinforcement Learning
Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, reducing device wear-and-tear, and even potentially improving carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: That is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning. RL is a solution paradigm well-suited for sequential decision making problems that are amenable to planning, and combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.
A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning
We present a large real-world dataset obtained from monitoring a smart company facility over the course of six years, from 2018 to 2023. The dataset includes energy consumption data from various facility areas and components, energy production data from a photovoltaic system and a combined heat and power plant, operational data from heating and cooling systems, and weather data from an on-site weather station. The measurement sensors installed throughout the facility are organized in a hierarchical metering structure with multiple sub-metering levels, which is reflected in the dataset. The dataset contains measurement data from 72 energy meters, 9 heat meters and a weather station. Both raw and processed data at different processing levels, including labeled issues, is available. In this paper, we describe the data acquisition and post-processing employed to create the dataset. The dataset enables the application of a wide range of methods in the domain of energy management, including optimization, modeling, and machine learning to optimize building operations and reduce costs and carbon emissions.
ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval
State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial task-specific fine-tuning across multiple languages, often perform poorly in languages with minimal representation in the pretraining corpus, and struggle to incorporate new languages after the pretraining phase. In this work, we present a novel modular dense retrieval model that learns from the rich data of a single high-resource language and effectively zero-shot transfers to a wide array of languages, thereby eliminating the need for language-specific labeled data. Our model, ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages. Further analysis reveals that our modular approach is highly data-efficient, effectively adapts to out-of-distribution data, and significantly reduces energy consumption and carbon emissions. By demonstrating its proficiency in zero-shot scenarios, ColBERT-XM marks a shift towards more sustainable and inclusive retrieval systems, enabling effective information accessibility in numerous languages. We publicly release our code and models for the community.
A Comprehensive Survey of Compression Algorithms for Language Models
How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the gigantic size of language models, such as increased carbon emissions and expensive maintenance fees. While numerous compression algorithms have shown remarkable progress in compressing language models, it ironically becomes challenging to capture emerging trends and identify the fundamental concepts underlying them due to the excessive number of algorithms. In this paper, we survey and summarize diverse compression algorithms including pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. We not only summarize the overall trend of diverse compression algorithms but also select representative algorithms and provide in-depth analyses of them. We discuss the value of each category of compression algorithms, and the desired properties of low-cost compression algorithms which have a significant impact due to the emergence of large language models. Finally, we introduce promising future research topics based on our survey results.
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Fine-tuning large models is highly effective, however, inference can be expensive and produces carbon emissions. Knowledge distillation has been shown to be a practical solution to reduce inference costs, but the distillation process itself requires significant computational resources. Rather than buying or renting GPUs to fine-tune, then distill a large model, an NLP practitioner might instead choose to allocate the available budget to hire annotators and manually label additional fine-tuning data. In this paper, we investigate how to most efficiently use a fixed budget to build a compact model. Through extensive experiments on six diverse tasks, we show that distilling from T5-XXL (11B) to T5-Small (60M) is almost always a cost-efficient strategy compared to annotating more data to directly train a compact model (T5-Small). We further investigate how the optimal budget allocated towards computation varies across scenarios. We will make our code, datasets, annotation cost estimates, and baseline models available as a benchmark to support further work on cost-efficient training of compact models.
Efficient and Green Large Language Models for Software Engineering: Vision and the Road Ahead
Large Language Models (LLMs) have recently shown remarkable capabilities in various software engineering tasks, spurring the rapid growth of the Large Language Models for Software Engineering (LLM4SE) area. However, limited attention has been paid to developing efficient LLM4SE techniques that demand minimal computational cost, time, and memory resources, as well as green LLM4SE solutions that reduce energy consumption, water usage, and carbon emissions. This paper aims to redirect the focus of the research community towards the efficiency and greenness of LLM4SE, while also sharing potential research directions to achieve this goal. It commences with a brief overview of the significance of LLM4SE and highlights the need for efficient and green LLM4SE solutions. Subsequently, the paper presents a vision for a future where efficient and green LLM4SE revolutionizes the LLM-based software engineering tool landscape, benefiting various stakeholders, including industry, individual practitioners, and society. The paper then delineates a roadmap for future research, outlining specific research paths and potential solutions for the research community to pursue. While not intended to be a definitive guide, the paper aims to inspire further progress, with the ultimate goal of establishing efficient and green LLM4SE as a central element in the future of software engineering.
Reporting and Analysing the Environmental Impact of Language Models on the Example of Commonsense Question Answering with External Knowledge
Human-produced emissions are growing at an alarming rate, causing already observable changes in the climate and environment in general. Each year global carbon dioxide emissions hit a new record, and it is reported that 0.5% of total US greenhouse gas emissions are attributed to data centres as of 2021. The release of ChatGPT in late 2022 sparked social interest in Large Language Models (LLMs), the new generation of Language Models with a large number of parameters and trained on massive amounts of data. Currently, numerous companies are releasing products featuring various LLMs, with many more models in development and awaiting release. Deep Learning research is a competitive field, with only models that reach top performance attracting attention and being utilized. Hence, achieving better accuracy and results is often the first priority, while the model's efficiency and the environmental impact of the study are neglected. However, LLMs demand substantial computational resources and are very costly to train, both financially and environmentally. It becomes essential to raise awareness and promote conscious decisions about algorithmic and hardware choices. Providing information on training time, the approximate carbon dioxide emissions and power consumption would assist future studies in making necessary adjustments and determining the compatibility of available computational resources with model requirements. In this study, we infused T5 LLM with external knowledge and fine-tuned the model for Question-Answering task. Furthermore, we calculated and reported the approximate environmental impact for both steps. The findings demonstrate that the smaller models may not always be sustainable options, and increased training does not always imply better performance. The most optimal outcome is achieved by carefully considering both performance and efficiency factors.
CaBuAr: California Burned Areas dataset for delineation
Forest wildfires represent one of the catastrophic events that, over the last decades, caused huge environmental and humanitarian damages. In addition to a significant amount of carbon dioxide emission, they are a source of risk to society in both short-term (e.g., temporary city evacuation due to fire) and long-term (e.g., higher risks of landslides) cases. Consequently, the availability of tools to support local authorities in automatically identifying burned areas plays an important role in the continuous monitoring requirement to alleviate the aftereffects of such catastrophic events. The great availability of satellite acquisitions coupled with computer vision techniques represents an important step in developing such tools. This paper introduces a novel open dataset that tackles the burned area delineation problem, a binary segmentation problem applied to satellite imagery. The presented resource consists of pre- and post-fire Sentinel-2 L2A acquisitions of California forest fires that took place starting in 2015. Raster annotations were generated from the data released by California's Department of Forestry and Fire Protection. Moreover, in conjunction with the dataset, we release three different baselines based on spectral indexes analyses, SegFormer, and U-Net models.
A Multi-Level Framework for Accelerating Training Transformer Models
The fast growing capabilities of large-scale deep learning models, such as Bert, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing power, which incurs exponentially increasing energy cost and carbon dioxide emissions. It is thus critical to develop efficient training solutions to reduce the training costs. Motivated by a set of key observations of inter- and intra-layer similarities among feature maps and attentions that can be identified from typical training processes, we propose a multi-level framework for training acceleration. Specifically, the framework is based on three basic operators, Coalescing, De-coalescing and Interpolation, which can be orchestrated to build a multi-level training framework. The framework consists of a V-cycle training process, which progressively down- and up-scales the model size and projects the parameters between adjacent levels of models via coalescing and de-coalescing. The key idea is that a smaller model that can be trained for fast convergence and the trained parameters provides high-qualities intermediate solutions for the next level larger network. The interpolation operator is designed to break the symmetry of neurons incurred by de-coalescing for better convergence performance. Our experiments on transformer-based language models (e.g. Bert, GPT) as well as a vision model (e.g. DeiT) prove that the proposed framework reduces the computational cost by about 20% on training BERT/GPT-Base models and up to 51.6% on training the BERT-Large model while preserving the performance.
Single-atom catalysts boost nitrogen electroreduction reaction
Ammonia (NH3) is mainly produced through the traditional Haber-Bosch process under harsh conditions with huge energy consumption and massive carbon dioxide (CO2) emission. The nitrogen electroreduction reaction (NERR), as an energy-efficient and environment-friendly process of converting nitrogen (N2) to NH3 under ambient conditions, has been regarded as a promising alternative to the Haber-Bosch process and has received enormous interest in recent years. Although some exciting progress has been made, considerable scientific and technical challenges still exist in improving the NH3 yield rate and Faradic efficiency, understanding the mechanism of the reaction and promoting the wide commercialization of NERR. Single-atom catalysts (SACs) have emerged as promising catalysts because of its atomically dispersed activity sites and maximized atom efficiency, unsaturated coordination environment, and its unique electronic structure, which could significantly improve the rate of reaction and yield rate of NH3. In this review we briefly introduce the unique structural and electronic features of SACs, which contributes to comprehensively understand the reaction mechanism owing to their structural simplicity and diversity, and in turn expedite the rational design of fantastic catalysts at the atomic scale. Then, we summarize the most recent experimental and computational efforts on developing novel SACs with excellent NERR performance, including precious metal-, nonprecious metal- and nonmetal-based SACs. Finally, we present challenges and perspectives of SACs on NERR, as well as some potential means for advanced NERR catalyst.
Group Reasoning Emission Estimation Networks
Accurate greenhouse gas (GHG) emission reporting is critical for governments, businesses, and investors. However, adoption remains limited particularly among small and medium enterprises due to high implementation costs, fragmented emission factor databases, and a lack of robust sector classification methods. To address these challenges, we introduce Group Reasoning Emission Estimation Networks (GREEN), an AI-driven carbon accounting framework that standardizes enterprise-level emission estimation, constructs a large-scale benchmark dataset, and leverages a novel reasoning approach with large language models (LLMs). Specifically, we compile textual descriptions for 20,850 companies with validated North American Industry Classification System (NAICS) labels and align these with an economic model of carbon intensity factors. By reframing sector classification as an information retrieval task, we fine-tune Sentence-BERT models using a contrastive learning loss. To overcome the limitations of single-stage models in handling thousands of hierarchical categories, we propose a Group Reasoning method that ensembles LLM classifiers based on the natural NAICS ontology, decomposing the task into multiple sub-classification steps. We theoretically prove that this approach reduces classification uncertainty and computational complexity. Experiments on 1,114 NAICS categories yield state-of-the-art performance (83.68% Top-1, 91.47% Top-10 accuracy), and case studies on 20 companies report a mean absolute percentage error (MAPE) of 45.88%. The project is available at: https://huggingface.co/datasets/Yvnminc/ExioNAICS.
Net-Zero: A Comparative Study on Neural Network Design for Climate-Economic PDEs Under Uncertainty
Climate-economic modeling under uncertainty presents significant computational challenges that may limit policymakers' ability to address climate change effectively. This paper explores neural network-based approaches for solving high-dimensional optimal control problems arising from models that incorporate ambiguity aversion in climate mitigation decisions. We develop a continuous-time endogenous-growth economic model that accounts for multiple mitigation pathways, including emission-free capital and carbon intensity reductions. Given the inherent complexity and high dimensionality of these models, traditional numerical methods become computationally intractable. We benchmark several neural network architectures against finite-difference generated solutions, evaluating their ability to capture the dynamic interactions between uncertainty, technology transitions, and optimal climate policy. Our findings demonstrate that appropriate neural architecture selection significantly impacts both solution accuracy and computational efficiency when modeling climate-economic systems under uncertainty. These methodological advances enable more sophisticated modeling of climate policy decisions, allowing for better representation of technology transitions and uncertainty-critical elements for developing effective mitigation strategies in the face of climate change.
On The Impact of Replacing Private Cars with Autonomous Shuttles: An Agent-Based Approach
The European Green Deal aims to achieve climate neutrality by 2050, which demands improved emissions efficiency from the transportation industry. This study uses an agent-based simulation to analyze the sustainability impacts of shared autonomous shuttles. We forecast travel demands for 2050 and simulate regulatory interventions in the form of replacing private cars with a fleet of shared autonomous shuttles in specific areas. We derive driving-related emissions, energy consumption, and non-driving-related emissions to calculate life-cycle emissions. We observe reduced life-cycle emissions from 0.4% to 9.6% and reduced energy consumption from 1.5% to 12.2%.
HyperionSolarNet: Solar Panel Detection from Aerial Images
With the effects of global climate change impacting the world, collective efforts are needed to reduce greenhouse gas emissions. The energy sector is the single largest contributor to climate change and many efforts are focused on reducing dependence on carbon-emitting power plants and moving to renewable energy sources, such as solar power. A comprehensive database of the location of solar panels is important to assist analysts and policymakers in defining strategies for further expansion of solar energy. In this paper we focus on creating a world map of solar panels. We identify locations and total surface area of solar panels within a given geographic area. We use deep learning methods for automated detection of solar panel locations and their surface area using aerial imagery. The framework, which consists of a two-branch model using an image classifier in tandem with a semantic segmentation model, is trained on our created dataset of satellite images. Our work provides an efficient and scalable method for detecting solar panels, achieving an accuracy of 0.96 for classification and an IoU score of 0.82 for segmentation performance.
Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging
Methane emissions from livestock, particularly cattle, significantly contribute to climate change. Effective methane emission mitigation strategies are crucial as the global population and demand for livestock products increase. We introduce Gasformer, a novel semantic segmentation architecture for detecting low-flow rate methane emissions from livestock, and controlled release experiments using optical gas imaging. We present two unique datasets captured with a FLIR GF77 OGI camera. Gasformer leverages a Mix Vision Transformer encoder and a Light-Ham decoder to generate multi-scale features and refine segmentation maps. Gasformer outperforms other state-of-the-art models on both datasets, demonstrating its effectiveness in detecting and segmenting methane plumes in controlled and real-world scenarios. On the livestock dataset, Gasformer achieves mIoU of 88.56%, surpassing other state-of-the-art models. Materials are available at: github.com/toqitahamid/Gasformer.
NEBULA: A National Scale Dataset for Neighbourhood-Level Urban Building Energy Modelling for England and Wales
Buildings are significant contributors to global greenhouse gas emissions, accounting for 26% of global energy sector emissions in 2022. Meeting net zero goals requires a rapid reduction in building emissions, both directly from the buildings and indirectly from the production of electricity and heat used in buildings. National energy planning for net zero demands both detailed and comprehensive building energy consumption data. However, geo-located building-level energy data is rarely available in Europe, with analysis typically relying on anonymised, simulated or low-resolution data. To address this problem, we introduce a dataset of Neighbourhood Energy, Buildings, and Urban Landscapes (NEBULA) for modelling domestic energy consumption for small neighbourhoods (5-150 households). NEBULA integrates data on building characteristics, climate, urbanisation, environment, and socio-demographics and contains 609,964 samples across England and Wales.
How Green are Neural Language Models? Analyzing Energy Consumption in Text Summarization Fine-tuning
Artificial intelligence systems significantly impact the environment, particularly in natural language processing (NLP) tasks. These tasks often require extensive computational resources to train deep neural networks, including large-scale language models containing billions of parameters. This study analyzes the trade-offs between energy consumption and performance across three neural language models: two pre-trained models (T5-base and BART-base), and one large language model (LLaMA 3-8B). These models were fine-tuned for the text summarization task, focusing on generating research paper highlights that encapsulate the core themes of each paper. A wide range of evaluation metrics, including ROUGE, METEOR, MoverScore, BERTScore, and SciBERTScore, were employed to assess their performance. Furthermore, the carbon footprint associated with fine-tuning each model was measured, offering a comprehensive assessment of their environmental impact. This research underscores the importance of incorporating environmental considerations into the design and implementation of neural language models and calls for the advancement of energy-efficient AI methodologies.
Designing a sector-coupled European energy system robust to 60 years of historical weather data
As energy systems transform to rely on renewable energy and electrification, they encounter stronger year-to-year variability in energy supply and demand. However, most infrastructure planning is based on a single weather year, resulting in a lack of robustness. In this paper, we optimize energy infrastructure for a European energy system designed for net-zero CO_2 emissions in 62 different weather years. Subsequently, we fix the capacity layouts and simulate their operation in every weather year, to evaluate resource adequacy and CO_2 emissions abatement. We show that interannual weather variability causes variation of pm10\% in total system cost. The most expensive capacity layout obtains the lowest net CO_2 emissions but not the highest resource adequacy. Instead, capacity layouts designed with years including compound weather events result in a more robust and cost-effective design. Deploying CO_2-emitting backup generation is a cost-effective robustness measure, which only increase CO_2 emissions marginally as the average CO_2 emissions remain less than 1\% of 1990 levels. Our findings highlight how extreme weather years drive investments in robustness measures, making them compatible with all weather conditions within six decades of historical weather data.
Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice
As the possibilities for Artificial Intelligence (AI) have grown, so have concerns regarding its impacts on society and the environment. However, these issues are often raised separately; i.e. carbon footprint analyses of AI models typically do not consider how the pursuit of scale has contributed towards building models that are both inaccessible to most researchers in terms of cost and disproportionately harmful to the environment. On the other hand, model audits that aim to evaluate model performance and disparate impacts mostly fail to engage with the environmental ramifications of AI models and how these fit into their auditing approaches. In this separation, both research directions fail to capture the depth of analysis that can be explored by considering the two in parallel and the potential solutions for making informed choices that can be developed at their convergence. In this essay, we build upon work carried out in AI and in sister communities, such as philosophy and sustainable development, to make more deliberate connections around topics such as generalizability, transparency, evaluation and equity across AI research and practice. We argue that the efforts aiming to study AI's ethical ramifications should be made in tandem with those evaluating its impacts on the environment, and we conclude with a proposal of best practices to better integrate AI ethics and sustainability in AI research and practice.
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts
The rapid growth of artificial intelligence (AI), particularly Large Language Models (LLMs), has raised concerns regarding its global environmental impact that extends beyond greenhouse gas emissions to include consideration of hardware fabrication and end-of-life processes. The opacity from major providers hinders companies' abilities to evaluate their AI-related environmental impacts and achieve net-zero targets. In this paper, we propose a methodology to estimate the environmental impact of a company's AI portfolio, providing actionable insights without necessitating extensive AI and Life-Cycle Assessment (LCA) expertise. Results confirm that large generative AI models consume up to 4600x more energy than traditional models. Our modelling approach, which accounts for increased AI usage, hardware computing efficiency, and changes in electricity mix in line with IPCC scenarios, forecasts AI electricity use up to 2030. Under a high adoption scenario, driven by widespread Generative AI and agents adoption associated to increasingly complex models and frameworks, AI electricity use is projected to rise by a factor of 24.4. Mitigating the environmental impact of Generative AI by 2030 requires coordinated efforts across the AI value chain. Isolated measures in hardware efficiency, model efficiency, or grid improvements alone are insufficient. We advocate for standardized environmental assessment frameworks, greater transparency from the all actors of the value chain and the introduction of a "Return on Environment" metric to align AI development with net-zero goals.
ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets
Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
Nuclear Explosions for Large Scale Carbon Sequestration
Confronting the escalating threat of climate change requires innovative and large-scale interventions. This paper presents a bold proposal to employ a buried nuclear explosion in a remote basaltic seabed for pulverizing basalt, thereby accelerating carbon sequestration through Enhanced Rock Weathering (ERW). By precisely locating the explosion beneath the seabed, we aim to confine debris, radiation, and energy while ensuring rapid rock weathering at a scale substantial enough to make a meaningful dent in atmospheric carbon levels. Our analysis outlines the parameters essential for efficient carbon capture and minimal collateral effects, emphasizing that a yield on the order of gigatons is critical for global climate impact. Although this approach may appear radical, we illustrate its feasibility by examining safety factors, preservation of local ecosystems, political considerations, and financial viability. This work argues for reimagining nuclear technology not merely as a destructive force but as a potential catalyst for decarbonization, thereby inviting further exploration of pioneering solutions in the fight against climate change.
From Microbes to Methane: AI-Based Predictive Modeling of Feed Additive Efficacy in Dairy Cows
In an era of increasing pressure to achieve sustainable agriculture, the optimization of livestock feed for enhancing yield and minimizing environmental impact is a paramount objective. This study presents a pioneering approach towards this goal, using rumen microbiome data to predict the efficacy of feed additives in dairy cattle. We collected an extensive dataset that includes methane emissions from 2,190 Holstein cows distributed across 34 distinct sites. The cows were divided into control and experimental groups in a double-blind, unbiased manner, accounting for variables such as age, days in lactation, and average milk yield. The experimental groups were administered one of four leading commercial feed additives: Agolin, Kexxtone, Allimax, and Relyon. Methane emissions were measured individually both before the administration of additives and over a subsequent 12-week period. To develop our predictive model for additive efficacy, rumen microbiome samples were collected from 510 cows from the same herds prior to the study's onset. These samples underwent deep metagenomic shotgun sequencing, yielding an average of 15.7 million reads per sample. Utilizing innovative artificial intelligence techniques we successfully estimated the efficacy of these feed additives across different farms. The model's robustness was further confirmed through validation with independent cohorts, affirming its generalizability and reliability. Our results underscore the transformative capability of using targeted feed additive strategies to both optimize dairy yield and milk composition, and to significantly reduce methane emissions. Specifically, our predictive model demonstrates a scenario where its application could guide the assignment of additives to farms where they are most effective. In doing so, we could achieve an average potential reduction of over 27\% in overall emissions.
CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds
Clouds play a significant role in global temperature regulation through their effect on planetary albedo. Anthropogenic emissions of aerosols can alter the albedo of clouds, but the extent of this effect, and its consequent impact on temperature change, remains uncertain. Human-induced clouds caused by ship aerosol emissions, commonly referred to as ship tracks, provide visible manifestations of this effect distinct from adjacent cloud regions and therefore serve as a useful sandbox to study human-induced clouds. However, the lack of large-scale ship track data makes it difficult to deduce their general effects on cloud formation. Towards developing automated approaches to localize ship tracks at scale, we present CloudTracks, a dataset containing 3,560 satellite images labeled with more than 12,000 ship track instance annotations. We train semantic segmentation and instance segmentation model baselines on our dataset and find that our best model substantially outperforms previous state-of-the-art for ship track localization (61.29 vs. 48.65 IoU). We also find that the best instance segmentation model is able to identify the number of ship tracks in each image more accurately than the previous state-of-the-art (1.64 vs. 4.99 MAE). However, we identify cases where the best model struggles to accurately localize and count ship tracks, so we believe CloudTracks will stimulate novel machine learning approaches to better detect elongated and overlapping features in satellite images. We release our dataset openly at {zenodo.org/records/10042922}.
Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models
The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3, has been undergoing public scrutiny. Unfortunately, however, the equally important and enormous water (withdrawal and consumption) footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft's state-of-the-art U.S. data centers can directly evaporate 700,000 liters of clean freshwater, but such information has been kept a secret. More critically, the global AI demand may be accountable for 4.2 -- 6.6 billion cubic meters of water withdrawal in 2027, which is more than the total annual water withdrawal of 4 -- 6 Denmark or half of the United Kingdom. This is very concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us in the wake of the rapidly growing population, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and also must, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models' runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed CO_2 and/or H_2O. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
ACE2-SOM: Coupling to a slab ocean and learning the sensitivity of climate to changes in CO_2
While autoregressive machine-learning-based emulators have been trained to produce stable and accurate rollouts in the climate of the present-day and recent past, none so far have been trained to emulate the sensitivity of climate to substantial changes in CO_2 or other greenhouse gases. As an initial step we couple the Ai2 Climate Emulator version 2 to a slab ocean model (hereafter ACE2-SOM) and train it on output from a collection of equilibrium-climate physics-based reference simulations with varying levels of CO_2. We test it in equilibrium and non-equilibrium climate scenarios with CO_2 concentrations seen and unseen in training. ACE2-SOM performs well in equilibrium-climate inference with both in-sample and out-of-sample CO_2 concentrations, accurately reproducing the emergent time-mean spatial patterns of surface temperature and precipitation change with CO_2 doubling, tripling, or quadrupling. In addition, the vertical profile of atmospheric warming and change in extreme precipitation rates with increased CO_2 closely agree with the reference model. Non-equilibrium-climate inference is more challenging. With CO_2 increasing gradually at a rate of 2% year^{-1}, ACE2-SOM can accurately emulate the global annual mean trends of surface and lower-to-middle atmosphere fields but produces unphysical jumps in stratospheric fields. With an abrupt quadrupling of CO_2, ML-controlled fields transition unrealistically quickly to the 4xCO_2 regime. In doing so they violate global energy conservation and exhibit unphysical sensitivities of and surface and top of atmosphere radiative fluxes to instantaneous changes in CO_2. Future emulator development needed to address these issues should improve its generalizability to diverse climate change scenarios.
Understanding Environmental Posts: Sentiment and Emotion Analysis of Social Media Data
Social media is now the predominant source of information due to the availability of immediate public response. As a result, social media data has become a valuable resource for comprehending public sentiments. Studies have shown that it can amplify ideas and influence public sentiments. This study analyzes the public perception of climate change and the environment over a decade from 2014 to 2023. Using the Pointwise Mutual Information (PMI) algorithm, we identify sentiment and explore prevailing emotions expressed within environmental tweets across various social media platforms, namely Twitter, Reddit, and YouTube. Accuracy on a human-annotated dataset was 0.65, higher than Vader score but lower than that of an expert rater (0.90). Our findings suggest that negative environmental tweets are far more common than positive or neutral ones. Climate change, air quality, emissions, plastic, and recycling are the most discussed topics on all social media platforms, highlighting its huge global concern. The most common emotions in environmental tweets are fear, trust, and anticipation, demonstrating public reactions wide and complex nature. By identifying patterns and trends in opinions related to the environment, we hope to provide insights that can help raise awareness regarding environmental issues, inform the development of interventions, and adapt further actions to meet environmental challenges.
Sustainable AI: Environmental Implications, Challenges and Opportunities
This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.
A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration
There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabled control with the purpose of evaluating key sustainability metrics including carbon footprint, energy consumption, and observing temperature hotspots. We demonstrate these capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers. PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control.
A Framework for Scalable Ambient Air Pollution Concentration Estimation
Ambient air pollution remains a critical issue in the United Kingdom, where data on air pollution concentrations form the foundation for interventions aimed at improving air quality. However, the current air pollution monitoring station network in the UK is characterized by spatial sparsity, heterogeneous placement, and frequent temporal data gaps, often due to issues such as power outages. We introduce a scalable data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements. This approach provides a comprehensive dataset for England throughout 2018 at a 1kmx1km hourly resolution. Leveraging machine learning techniques and real-world data from the sparsely distributed monitoring stations, we generate 355,827 synthetic monitoring stations across the study area, yielding data valued at approximately \pounds70 billion. Validation was conducted to assess the model's performance in forecasting, estimating missing locations, and capturing peak concentrations. The resulting dataset is of particular interest to a diverse range of stakeholders engaged in downstream assessments supported by outdoor air pollution concentration data for NO2, O3, PM10, PM2.5, and SO2. This resource empowers stakeholders to conduct studies at a higher resolution than was previously possible.
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI
Contrails (condensation trails) are line-shaped ice clouds caused by aircraft and are likely the largest contributor of aviation-induced climate change. Contrail avoidance is potentially an inexpensive way to significantly reduce the climate impact of aviation. An automated contrail detection system is an essential tool to develop and evaluate contrail avoidance systems. In this paper, we present a human-labeled dataset named OpenContrails to train and evaluate contrail detection models based on GOES-16 Advanced Baseline Imager (ABI) data. We propose and evaluate a contrail detection model that incorporates temporal context for improved detection accuracy. The human labeled dataset and the contrail detection outputs are publicly available on Google Cloud Storage at gs://goes_contrails_dataset.
ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English
The topic of Climate Change (CC) has received limited attention in NLP despite its real world urgency. Activists and policy-makers need NLP tools in order to effectively process the vast and rapidly growing textual data produced on CC. Their utility, however, primarily depends on whether the current state-of-the-art models can generalize across various tasks in the CC domain. In order to address this gap, we introduce Climate Change Benchmark (ClimaBench), a benchmark collection of existing disparate datasets for evaluating model performance across a diverse set of CC NLU tasks systematically. Further, we enhance the benchmark by releasing two large-scale labelled text classification and question-answering datasets curated from publicly available environmental disclosures. Lastly, we provide an analysis of several generic and CC-oriented models answering whether fine-tuning on domain text offers any improvements across these tasks. We hope this work provides a standard assessment tool for research on CC text data.
A Dataset for Detecting Real-World Environmental Claims
In this paper, we introduce an expert-annotated dataset for detecting real-world environmental claims made by listed companies. We train and release baseline models for detecting environmental claims using this new dataset. We further preview potential applications of our dataset: We use our fine-tuned model to detect environmental claims made in answer sections of quarterly earning calls between 2012 and 2020 -- and we find that the amount of environmental claims steadily increased since the Paris Agreement in 2015.
PDRs4All. XII. FUV-driven formation of hydrocarbon radicals and their relation with PAHs
We present subarcsecond-resolution ALMA mosaics of the Orion Bar PDR in [CI] 609 um, C2H (4-3), and C18O (3-2) emission lines, complemented by JWST images of H2 and aromatic infrared band (AIB) emission. The rim of the Bar shows very corrugated structures made of small-scale H2 dissociation fronts (DFs). The [CI] 609 um emission peaks very close (~0.002 pc) to the main H2-emitting DFs, suggesting the presence of gas density gradients. These DFs are also bright and remarkably similar in C2H emission, which traces 'hydrocarbon radical peaks' characterized by very high C2H abundances, reaching up to several x10^-7. The high abundance of C2H and of related hydrocarbon radicals, such as CH3, CH2, and CH, can be attributed to gas-phase reactions driven by elevated temperatures, the presence of C+ and C, and the reactivity of FUV-pumped H2. The hydrocarbon radical peaks roughly coincide with maxima of the 3.4/3.3 um AIB intensity ratio, a proxy for the aliphatic-to-aromatic content of PAHs. This implies that the conditions triggering the formation of simple hydrocarbons also favor the formation (and survival) of PAHs with aliphatic side groups, potentially via the contribution of bottom-up processes in which abundant hydrocarbon radicals react in situ with PAHs. Ahead of the DFs, in the atomic PDR zone (where [H]>>[H2]), the AIB emission is brightest, but small PAHs and carbonaceous grains undergo photo-processing due to the stronger FUV field. Our detection of trace amounts of C2H in this zone may result from the photoerosion of these species. This study provides a spatially resolved view of the chemical stratification of key carbon carriers in a PDR. Overall, both bottom-up and top-down processes appear to link simple hydrocarbon molecules with PAHs in molecular clouds; however, the exact chemical pathways and their relative contributions remain to be quantified.
EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection
Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Mainstream NLP tasks, such as sentiment analysis, dominate the scene, but there remains an untouched space in the literature involving the analysis of environmental impacts of certain events and practices. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification, Stance Detection, and introducing an original approach for Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. Remarkable Inter-Annotator Agreement indicates that the annotation scheme produces consistent annotations of high quality. Subsequent classification experiments using BERT-based models, including ClimateBERT, are presented. These yield encouraging results, while also indicating room for a model specifically tailored for environmental texts. The dataset is made freely available to stimulate further research.
Using remotely sensed data for air pollution assessment
Air pollution constitutes a global problem of paramount importance that affects not only human health, but also the environment. The existence of spatial and temporal data regarding the concentrations of pollutants is crucial for performing air pollution studies and monitor emissions. However, although observation data presents great temporal coverage, the number of stations is very limited and they are usually built in more populated areas. The main objective of this work is to create models capable of inferring pollutant concentrations in locations where no observation data exists. A machine learning model, more specifically the random forest model, was developed for predicting concentrations in the Iberian Peninsula in 2019 for five selected pollutants: NO_2, O_3 SO_2, PM10, and PM2.5. Model features include satellite measurements, meteorological variables, land use classification, temporal variables (month, day of year), and spatial variables (latitude, longitude, altitude). The models were evaluated using various methods, including station 10-fold cross-validation, in which in each fold observations from 10\% of the stations are used as testing data and the rest as training data. The R^2, RMSE and mean bias were determined for each model. The NO_2 and O_3 models presented good values of R^2, 0.5524 and 0.7462, respectively. However, the SO_2, PM10, and PM2.5 models performed very poorly in this regard, with R^2 values of -0.0231, 0.3722, and 0.3303, respectively. All models slightly overestimated the ground concentrations, except the O_3 model. All models presented acceptable cross-validation RMSE, except the O_3 and PM10 models where the mean value was a little higher (12.5934 mu g/m^3 and 10.4737 mu g/m^3, respectively).
Recent global temperature surge amplified by record-low planetary albedo
In 2023, the global mean temperature soared to 1.48K above the pre-industrial level, surpassing the previous record by 0.17K. Previous best-guess estimates of known drivers including anthropogenic warming and the El Nino onset fall short by about 0.2K in explaining the temperature rise. Utilizing satellite and reanalysis data, we identify a record-low planetary albedo as the primary factor bridging this gap. The decline is caused largely by a reduced low-cloud cover in the northern mid-latitudes and tropics, in continuation of a multi-annual trend. Understanding how much of the low-cloud trend is due to internal variability, reduced aerosol concentrations, or a possibly emerging low-cloud feedback will be crucial for assessing the current and expected future warming.
Continuous Convolutional Neural Networks for Disruption Prediction in Nuclear Fusion Plasmas
Grid decarbonization for climate change requires dispatchable carbon-free energy like nuclear fusion. The tokamak concept offers a promising path for fusion, but one of the foremost challenges in implementation is the occurrence of energetic plasma disruptions. In this study, we delve into Machine Learning approaches to predict plasma state outcomes. Our contributions are twofold: (1) We present a novel application of Continuous Convolutional Neural Networks for disruption prediction and (2) We examine the advantages and disadvantages of continuous models over discrete models for disruption prediction by comparing our model with the previous, discrete state of the art, and show that continuous models offer significantly better performance (Area Under the Receiver Operating Characteristic Curve = 0.974 v.s. 0.799) with fewer parameters
Analyzing the Impact of Climate Change With Major Emphasis on Pollution: A Comparative Study of ML and Statistical Models in Time Series Data
Industrial operations have grown exponentially over the last century, driving advancements in energy utilization through vehicles and machinery.This growth has significant environmental implications, necessitating the use of sophisticated technology to monitor and analyze climate data.The surge in industrial activities presents a complex challenge in forecasting its diverse environmental impacts, which vary greatly across different regions.Aim to understand these dynamics more deeply to predict and mitigate the environmental impacts of industrial activities.
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I
This is the 1st part of the dissertation for my master degree and compares the power consumption using the default floating point (32bit) and Nvidia mixed precision (16bit and 32bit) while training a classification ML model. A custom PC with specific hardware was built to perform the experiments, and different ML hyper-parameters, such as batch size, neurons, and epochs, were chosen to build Deep Neural Networks (DNN). Additionally, various software was used during the experiments to collect the power consumption data in Watts from the Graphics Processing Unit (GPU), Central Processing Unit (CPU), Random Access Memory (RAM) and manually from a wattmeter connected to the wall. A benchmarking test with default hyper parameter values for the DNN was used as a reference, while the experiments used a combination of different settings. The results were recorded in Excel, and descriptive statistics were chosen to calculate the mean between the groups and compare them using graphs and tables. The outcome was positive when using mixed precision combined with specific hyper-parameters. Compared to the benchmarking, the optimisation for the classification reduced the power consumption between 7 and 11 Watts. Similarly, the carbon footprint is reduced because the calculation uses the same power consumption data. Still, a consideration is required when configuring hyper-parameters because it can negatively affect hardware performance. However, this research required inferential statistics, specifically ANOVA and T-test, to compare the relationship between the means. Furthermore, tests indicated no statistical significance of the relationship between the benchmarking and experiments. However, a more extensive implementation with a cluster of GPUs can increase the sample size significantly, as it is an essential factor and can change the outcome of the statistical analysis.
Modeling Sustainable City Trips: Integrating CO2e Emissions, Popularity, and Seasonality into Tourism Recommender Systems
Tourism affects not only the tourism industry but also society and stakeholders such as the environment, local businesses, and residents. Tourism Recommender Systems (TRS) can be pivotal in promoting sustainable tourism by guiding travelers toward destinations with minimal negative impact. Our paper introduces a composite sustainability indicator for a city trip TRS based on the users' starting point and month of travel. This indicator integrates CO2e emissions for different transportation modes and analyses destination popularity and seasonal demand. We quantify city popularity based on user reviews, points of interest, and search trends from Tripadvisor and Google Trends data. To calculate a seasonal demand index, we leverage data from TourMIS and Airbnb. We conducted a user study to explore the fundamental trade-offs in travel decision-making and determine the weights for our proposed indicator. Finally, we demonstrate the integration of this indicator into a TRS, illustrating its ability to deliver sustainable city trip recommendations. This work lays the foundation for future research by integrating sustainability measures and contributing to responsible recommendations by TRS.
Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading Argumentation
Misinformation about climate change is a complex societal issue requiring holistic, interdisciplinary solutions at the intersection between technology and psychology. One proposed solution is a "technocognitive" approach, involving the synthesis of psychological and computer science research. Psychological research has identified that interventions in response to misinformation require both fact-based (e.g., factual explanations) and technique-based (e.g., explanations of misleading techniques) content. However, little progress has been made on documenting and detecting fallacies in climate misinformation. In this study, we apply a previously developed critical thinking methodology for deconstructing climate misinformation, in order to develop a dataset mapping different types of climate misinformation to reasoning fallacies. This dataset is used to train a model to detect fallacies in climate misinformation. Our study shows F1 scores that are 2.5 to 3.5 better than previous works. The fallacies that are easiest to detect include fake experts and anecdotal arguments, while fallacies that require background knowledge, such as oversimplification, misrepresentation, and slothful induction, are relatively more difficult to detect. This research lays the groundwork for development of solutions where automatically detected climate misinformation can be countered with generative technique-based corrections.
Identifying Climate Targets in National Laws and Policies using Machine Learning
Quantified policy targets are a fundamental element of climate policy, typically characterised by domain-specific and technical language. Current methods for curating comprehensive views of global climate policy targets entail significant manual effort. At present there are few scalable methods for extracting climate targets from national laws or policies, which limits policymakers' and researchers' ability to (1) assess private and public sector alignment with global goals and (2) inform policy decisions. In this paper we present an approach for extracting mentions of climate targets from national laws and policies. We create an expert-annotated dataset identifying three categories of target ('Net Zero', 'Reduction' and 'Other' (e.g. renewable energy targets)) and train a classifier to reliably identify them in text. We investigate bias and equity impacts related to our model and identify specific years and country names as problematic features. Finally, we investigate the characteristics of the dataset produced by running this classifier on the Climate Policy Radar (CPR) dataset of global national climate laws and policies and UNFCCC submissions, highlighting the potential of automated and scalable data collection for existing climate policy databases and supporting further research. Our work represents a significant upgrade in the accessibility of these key climate policy elements for policymakers and researchers. We publish our model at https://huggingface.co/ClimatePolicyRadar/national-climate-targets and related dataset at https://huggingface.co/datasets/ClimatePolicyRadar/national-climate-targets.
First detections of CO absorption in the Magellanic Clouds and direct measurement of the CO-to-H_2 ratio
Molecular hydrogen (H_2) is by far the most abundant molecule in the Universe. However, due to the low emissivity of H_2, carbon monoxide (CO) is widely used instead to trace molecular gas in galaxies. The relative abundances of these molecules is expected to depend on both physical (e.g., density) and chemical (e.g., metal enrichment) properties of the gas, making direct measurements in diverse environments crucial. We present a systematic search for CO in absorption toward 34 stars behind H_2 gas in the Magellanic Clouds using the Hubble Space Telescope. We report the first two definitive detections of CO absorption in the Large Magellanic Cloud (LMC) and one in the Small Magellanic Cloud (SMC), along with stringent upper limits for the remaining sightlines. Non-detections of CO are consistent with models of low thermal pressures and/or low metallicities while detections at the lower metallicities of the Magellanic Clouds require higher thermal pressures, P_{rm th}=10^5-10^6,K,cm^{-3} than detections the Milky Way at similar N({rm H_2}). Notably, the high density derived from the rotational excitation of CO towards SK,143 in the SMC suggests full molecularization of CO in the absorbing cloud, with CO/H_2 = 8.3^{+2.0}_{-1.6}times10^{-5} consistent with the standard ratio (3.2times10^{-4}) measured in dense molecular gas in the Milky Way, scaled to the SMC's 0.2,Z_{odot} metallicity.
Exploring Public Attention in the Circular Economy through Topic Modelling with Twin Hyperparameter Optimisation
To advance the circular economy (CE), it is crucial to gain insights into the evolution of public attention, cognitive pathways of the masses concerning circular products, and to identify primary concerns. To achieve this, we collected data from diverse platforms, including Twitter, Reddit, and The Guardian, and utilised three topic models to analyse the data. Given the performance of topic modelling may vary depending on hyperparameter settings, this research proposed a novel framework that integrates twin (single and multi-objective) hyperparameter optimisation for the CE. We conducted systematic experiments to ensure that topic models are set with appropriate hyperparameters under different constraints, providing valuable insights into the correlations between CE and public attention. In summary, our optimised model reveals that public remains concerned about the economic impacts of sustainability and circular practices, particularly regarding recyclable materials and environmentally sustainable technologies. The analysis shows that the CE has attracted significant attention on The Guardian, especially in topics related to sustainable development and environmental protection technologies, while discussions are comparatively less active on Twitter. These insights highlight the need for policymakers to implement targeted education programs, create incentives for businesses to adopt CE principles, and enforce more stringent waste management policies alongside improved recycling processes.
Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts
Air pollution poses a significant threat to public health and well-being, particularly in urban areas. This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topological characteristics to forecast future levels of five major pollutants. The investigation delineates the process of data collection, detailing the combination of diverse data sources utilized in the study. Through experiments conducted in the Milan metropolitan area, the models demonstrate their efficacy in predicting pollutant levels for the forthcoming day, achieving a percentage error of around 30%. The proposed models are advantageous as they are independent of monitoring stations, facilitating their use in areas without existing infrastructure. Additionally, we have released the collected dataset to the public, aiming to stimulate further research in this field. This research contributes to advancing our understanding of urban air quality dynamics and emphasizes the importance of amalgamating satellite, meteorological, and topographical data to develop robust pollution forecasting models.