AMPERE and Energy Efficiency in Model-Based Designed Cyber-Physical Systems

Björn Forsberg & Thomas Benz
Photo 1

 

The Digital Circuits and Systems Group of ETH Zurich has been working on energy-efficient digital systems for many years. As part of AMPERE, the team is developing efficient energy optimization and monitoring techniques.

 

Energy is a first-class citizen in Cyber-Physical Systems of Systems. With technology scaling slowly stagnating and the demand for computational performance growing, modern computing systems rely on energy efficiency to operate correctly without overheating. The current trend towards 2.5D and 3D integration only intensifies this problem, as the transistors are covered by multiple stacks of memory, which significantly reduces heat flow and therefore cooling. Additionally, embedded or extreme-edge systems may run on battery power or even rely on energy harvesting to work off-grid. Increasing such a system's energy efficiency prolongs the time it can operate autonomously.

Measuring or estimating a system's power and a task's energy consumption is the first step towards improving energy efficiency. Classically, this is done with a dedicated power meter in the system's power supply. While this is the simplest method, it has major downsides:

  • a dedicated power meter is additional hardware that increases the system's cost, size, and power consumption.
  • modern SoCs consist of a multitude of on-chip functional units, rendering it extremely costly or impossible to place each in a dedicated power domain (with a dedicated power pin at the package) to allow individual monitoring of the units.
  • external measurement circuits are interfaced over low-bandwidth peripheral buses like I2C, limiting the temporal resolution of the available power data (see the back-of-the-envelope sketch after this list).
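
To put the last point in perspective, here is a back-of-the-envelope sketch (in Python) of the sampling rate achievable over I2C; the 48-bit transaction layout is an assumption for illustration and varies between power-monitor chips.

    def max_i2c_sample_rate(bus_clock_hz=400_000, transaction_bits=48):
        """Rough upper bound on how often a 16-bit power register can be
        polled over I2C; ignores software and bus-arbitration overhead."""
        seconds_per_read = transaction_bits / bus_clock_hz
        return 1.0 / seconds_per_read

    # Fast-mode I2C (400 kbit/s) with an assumed ~48-bit read transaction
    # (device address + register pointer + repeated start + two data bytes)
    # caps power sampling at roughly 8 kHz -- far below the rate at which
    # on-chip activity, and therefore power, changes.
    print(f"~{max_i2c_sample_rate():.0f} samples per second")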

The energy optimization component of the AMPERE ecosystem is centered around three main parts, as shown in Figure 1 below: the platform characteristics, the offline phase, and the online phase.

 

Figure 1: The platform characteristics, the offline phase, and the online phase of the energy optimization component.

 

The platform characterization holds platform-specific but workload-agnostic information about the system, which is used as input for the offline optimization phase. To the platform characterization, we add two pieces of information. First, a table holding the correlation between the individual performance counters and the system's power consumption, which we use to identify the smallest subset of counters needed during the offline and online phases. This is required because systems can only track a limited number of events at a time. Second, we generate a dataset to infer the workload's compute- or memory-boundedness from the performance counters, to enable efficient dynamic voltage and frequency scaling of individual system components. Depending on how memory- or compute-bound a process is, its performance is affected differently by frequency scaling of, e.g., the main memory and the processor.
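
As an illustration of how the correlation table can drive counter selection, the sketch below ranks counters by their correlation with measured power and keeps a small subset; the function name and the limit of six events are assumptions, not part of the AMPERE tooling.

    import numpy as np

    def select_counter_subset(counter_traces, power_trace, max_events=6):
        """Rank performance counters by the absolute correlation of their
        traces with the measured power trace and keep only the strongest
        few, since the PMU can track a limited number of events at a time.

        counter_traces: dict mapping counter name -> 1-D array of samples
        power_trace:    1-D array of power samples aligned with the counters
        """
        power_trace = np.asarray(power_trace, dtype=float)
        scores = {}
        for name, trace in counter_traces.items():
            trace = np.asarray(trace, dtype=float)
            if trace.std() == 0.0:     # constant counters carry no information
                continue
            scores[name] = abs(np.corrcoef(trace, power_trace)[0, 1])
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:max_events]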

The information from the platform characterization is used together with a single-frequency profiling run in the offline optimization stage. The offline phase provides a model to estimate a task's energy consumption at an arbitrary operating point, without the need to re-profile the system at that operating point. This makes energy-efficiency optimization efficient by cutting out the time-consuming step of frequent re-profiling. By evaluating our models at all possible operating points, we can determine the point at which the system works at its highest efficiency.
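
Conceptually, the search then reduces to evaluating the model at every operating point and keeping the minimum, as in the hedged sketch below; estimate_energy stands in for the model built from the tables described next.

    def find_best_operating_point(frequencies, alpha, estimate_energy):
        """Evaluate the offline energy model at every candidate operating
        point and return the one with the lowest estimated energy.

        frequencies:     iterable of candidate frequencies (e.g., in Hz)
        alpha:           arithmetic intensity of the task, taken from the
                         single profiling run
        estimate_energy: callable (f, alpha) -> estimated energy in joules
        """
        best_f, best_e = None, float("inf")
        for f in frequencies:
            e = estimate_energy(f, alpha)
            if e < best_e:
                best_f, best_e = f, e
        return best_f, best_e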

The main components of the offline optimizer are three tables: E_f(alpha), P_f(alpha), and t_f(alpha). Together, they provide an energy cost function E_f that, for a given arithmetic intensity alpha over a number of frames F (i.e., a time period [t_start, t_end)), yields an energy estimate E at a selected frequency f. An activity-frequency dependency table P_f maps changes in the controllable variable f, for a given alpha, to the resulting power P. The change in the time component t (as given by frame F) at each frequency f, which is needed to compute the energy as the product of power and time, is estimated by a look-up table t_f, trained using the counter-trace table provided in the Platform Characteristics.
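
A sketch of how the three tables could fit together is given below; the discretization of alpha and the nearest-neighbour look-up are assumptions made purely for illustration. The resulting estimate_energy can be handed to the operating-point sweep sketched above.

    import bisect

    class OfflineEnergyModel:
        """Table-based cost model: energy is the product of the looked-up
        power P_f(alpha) and execution time t_f(alpha) at frequency f."""

        def __init__(self, alpha_grid, power_table, time_table):
            self.alpha_grid = sorted(alpha_grid)  # discretized alpha values
            self.power_table = power_table        # power_table[f][alpha] -> W
            self.time_table = time_table          # time_table[f][alpha]  -> s per frame

        def _nearest_alpha(self, alpha):
            # Snap the measured arithmetic intensity to the closest grid point.
            i = bisect.bisect_left(self.alpha_grid, alpha)
            neighbours = self.alpha_grid[max(i - 1, 0):i + 1]
            return min(neighbours, key=lambda a: abs(a - alpha))

        def estimate_energy(self, f, alpha, n_frames=1):
            """E_f(alpha) over n_frames, i.e. over the period [t_start, t_end)."""
            a = self._nearest_alpha(alpha)
            return self.power_table[f][a] * self.time_table[f][a] * n_frames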

Finally, in the online phase, we make effective use of performance counters to estimate the system's energy consumption. The Online Monitor uses the performance counters to estimate the scoped energy usage of the system, where a scope S is defined by a triple (core, t_start, t_end). Scoping enables the tracking of individual tasks in the system, even across migrations: the scope S can be derived from the system's static schedule, or updated dynamically whenever a dynamic scheduling decision is taken.
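
A scoped estimate could be organized roughly as in the sketch below; the linear counter-to-power model, its weights, and the sample format are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Scope:
        """A scope S = (core, t_start, t_end): one core over one time window."""
        core: int
        t_start: float  # seconds
        t_end: float    # seconds

    def estimate_scoped_energy(scope, samples, weights, p_static):
        """Estimate the energy consumed inside a scope from counter samples.

        samples:  list of (t, dt, core, {counter: delta}) tuples, one per
                  sampling interval of length dt ending at time t
        weights:  per-counter power weights (W per event/s), fitted offline
        p_static: static power attributed to the core (W)
        """
        energy = 0.0
        for t, dt, core, deltas in samples:
            if core != scope.core or not (scope.t_start <= t < scope.t_end):
                continue
            # Linear power model: static part plus weighted counter activity.
            p_dyn = sum(weights.get(name, 0.0) * delta / dt
                        for name, delta in deltas.items())
            energy += (p_static + p_dyn) * dt
        return energy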

 

 

Photo 2

 

This approach achieves an accuracy on par with classical dedicated off-chip power monitors, but at a much lower runtime overhead. On top of that, it allows us to estimate the individual contributions of the chip's functional units, as performance counters are implemented at a very fine granularity, which enables the scoped energy usage component. The Online Monitor is invoked with (S, E_budget), i.e., the scope to measure and the energy budget for that scope. If the energy budget is violated, an event is triggered. Using this information, we can ensure that the system stays within the bounds computed during the offline phase, and take additional action to further optimize energy efficiency.
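
The budget-checking side of the Online Monitor could be sketched as follows; the callback mechanism and method names are assumptions for illustration, not the actual AMPERE interface.

    class OnlineMonitor:
        """Track the estimated energy of one scope against a budget and fire
        an event (here, a user-supplied callback) when the budget is violated."""

        def __init__(self, scope, e_budget, on_violation):
            self.scope = scope              # the scope S to measure
            self.e_budget = e_budget        # allowed energy for this scope (J)
            self.on_violation = on_violation
            self.e_consumed = 0.0
            self.violated = False

        def update(self, energy_increment):
            """Add the energy estimated for the latest sampling interval."""
            self.e_consumed += energy_increment
            if not self.violated and self.e_consumed > self.e_budget:
                self.violated = True
                self.on_violation(self.scope, self.e_consumed, self.e_budget)

    # Example: warn when the scope (core 0, first 100 ms) exceeds 0.5 J.
    monitor = OnlineMonitor(scope=(0, 0.0, 0.1), e_budget=0.5,
                            on_violation=lambda s, e, b: print("budget exceeded", s))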

The integration of these models with the model-based design front-end (AMALTHEA) used in AMPERE is in motion. By leveraging high-level model information as well as profiling data and platform characteristics, the energy-efficiency component will enable co-optimization with the other optimization criteria in the AMPERE framework, and allow parallel and efficient Cyber-Physical Systems of Systems to be designed using Model-Based Design tools.

 

 

Photo 3

 

 

Photos (c) ETH Zürich.