Data Terrariums - Part I
As part of our EFC Seminar, we helped write the following proposal for UCSD’s CA CARES Grant.
Terrariums are small-scale ecosystems in sealed glass containers; many students build one in elementary school as an exercise in learning about balanced ecosystems. In concept, the only input to a terrarium is solar energy, which passes through the glass. The natural water and life cycles of the plants, fungi, and animals within the sealed terrarium are sufficient to sustain life within the ecosystem. This is possible because the density of solar energy is sufficient to power these cycles. Terrariums are scale-limited, but at their size, a circular ecosystem powered by the sun can be sustained for very long spans of time: the oldest terrarium today is still functioning after nearly 60 years.
Data centers and workstations for data processing are usually large open systems with many inputs (water, power, bandwidth) and outputs (waste heat, warmed water or air, carbon emissions from power generation). However, the need for such inputs and outputs is a function of their scale. Because of the density of compute capacity available today, many tasks, including interacting with large language models and processing genomic data, can be done on a single GPU within a very limited power budget. In addition, the price and efficiency of solar cells have created an opportunity to sustain power-efficient compute capacity via the sun.
A data terrarium consists of the following components: a small PC with a power-efficient GPU; a small consumer lithium-ion battery backup; and sufficient solar capacity to match the power consumption of the PC (usually 100-500 watts). By shrinking the scale of data processing and carefully balancing the workload, power, compute, storage, cooling, and solar capacity, we can build an enclosed and self-sustaining ecosystem where the only inputs and outputs are solar energy and data flowing over WiFi. Each terrarium can be built for less than $10,000 and can run deep learning models that power user interaction with chatbots, image and video processing, generative AI, and genomic data processing.
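The balance between power draw, battery, and solar capacity described above can be sketched as a simple daily energy budget. This is a minimal illustration, not the proposal's actual sizing method: the function names, the 0.8 system efficiency, the 0.8 usable battery fraction, and the use of "peak sun hours" as the measure of solar availability are all assumptions made for the example.

```python
def required_panel_watts(pc_watts, hours_on, peak_sun_hours, system_efficiency=0.8):
    """Panel rating (W) needed so daily generation matches daily consumption.

    peak_sun_hours: equivalent hours per day of full-strength sunlight (assumed input).
    system_efficiency: assumed combined loss factor for wiring, charging, and conversion.
    """
    daily_consumption_wh = pc_watts * hours_on
    return daily_consumption_wh / (peak_sun_hours * system_efficiency)


def required_battery_wh(pc_watts, hours_without_sun, usable_fraction=0.8):
    """Battery capacity (Wh) needed to carry the PC through hours with no sun.

    usable_fraction: assumed share of rated capacity that can be used safely.
    """
    return pc_watts * hours_without_sun / usable_fraction


def daily_balance_wh(pc_watts, hours_on, panel_watts, peak_sun_hours, system_efficiency=0.8):
    """Daily surplus (positive) or deficit (negative) in watt-hours."""
    generated_wh = panel_watts * peak_sun_hours * system_efficiency
    consumed_wh = pc_watts * hours_on
    return generated_wh - consumed_wh


if __name__ == "__main__":
    # Illustrative inputs only: a 200 W PC running around the clock with 5 peak sun hours.
    print(required_panel_watts(pc_watts=200, hours_on=24, peak_sun_hours=5))
    print(required_battery_wh(pc_watts=200, hours_without_sun=14))
    print(daily_balance_wh(pc_watts=200, hours_on=24, panel_watts=1000, peak_sun_hours=5))
```

A negative daily balance means the workload has to shrink, the panel has to grow, or the terrarium has to move to a sunnier location to stay at net-zero external energy.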
By developing a scheduling program that uses geographic information systems (GIS) data about solar availability, we can demonstrate that a sustainable data system can be built at a small scale and learn how to integrate these systems into the local environment. Just as one might move a terrarium to a sunnier spot in winter and a cooler spot in summer, we will use this program to plan the location of our data terrarium to sustain net-zero external energy input. This software can be scaled to predict the solar capacity needs of larger data centers and to estimate and improve their level of sustainability based on their size, workload, and location. By demonstrating this is possible at a small scale, we can create a new method for planning data processing centers that is distributed, resilient, and sustainable.
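One way to picture the location planning described above is a month-by-month choice among candidate sites. The sketch below is hypothetical: the `plan_sites` function, the site names, and the sun-hour figures are placeholders invented for illustration, not GIS measurements, and the model considers only solar availability (it ignores temperature, so it does not capture the "cooler spot in summer" part of the analogy).

```python
def plan_sites(monthly_sun_hours, pc_watts, hours_on, panel_watts, system_efficiency=0.8):
    """Pick the sunniest candidate site for each month and flag net-zero months.

    monthly_sun_hours: dict mapping site name -> list of 12 average daily
        peak sun hours (January first). In practice these would come from GIS data.
    Returns a list of (month_number, site_name, is_net_zero) tuples.
    """
    daily_consumption_wh = pc_watts * hours_on
    plan = []
    for month in range(12):
        # Choose the site with the most sun this month.
        best_site = max(monthly_sun_hours, key=lambda site: monthly_sun_hours[site][month])
        generated_wh = panel_watts * monthly_sun_hours[best_site][month] * system_efficiency
        plan.append((month + 1, best_site, generated_wh >= daily_consumption_wh))
    return plan


if __name__ == "__main__":
    # Placeholder values for two made-up sites; not real solar data.
    candidate_sites = {
        "rooftop": [2.0] * 3 + [6.0] * 9,
        "courtyard": [2.5] * 3 + [4.0] * 9,
    }
    for month, site, net_zero in plan_sites(candidate_sites, pc_watts=100, hours_on=24, panel_watts=700):
        print(month, site, "net-zero" if net_zero else "deficit")
```

Months flagged as a deficit are the ones where the workload would need to be reduced or the solar capacity increased, which is the kind of trade-off the scheduling program is meant to expose.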