Understanding the physical processes that underlie the functioning of biological computing machinery often requires describing processes that occur far from thermodynamic equilibrium. In recent years significant progress has been made in this area, most notably through Jarzynski’s work relation and Crooks’ fluctuation theorem. In this talk I will explore how the dissipation of energy is related to a system’s information processing inefficiency. The focus is on driven systems that are embedded in a stochastic operating environment. If we describe the system as a state machine, then we can interpret its stochastic dynamics as performing a computation that results in an (implicit) model of the stochastic driving signal. I will show that the instantaneous non-predictive information, which serves as a measure of model inefficiency, provides a lower bound on the average dissipated work. This implies that learning systems with larger predictive power can operate in a more energy-efficient way. One could speculate that biological systems have evolved to reflect this kind of adaptation. One interesting insight is that purely physical considerations lead to a requirement that is perfectly in line with the general belief that a useful model must be predictive (at fixed model complexity). Our result thereby ties together ideas from learning theory with basic non-equilibrium thermodynamics.
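
For orientation, the bound described above might be written as follows. This is only a sketch under assumed notation, none of which appears in the abstract itself: x_t denotes the driving signal at time step t, s_t the system state, I[·;·] mutual information in nats, and β = 1/(k_B T) the inverse temperature of the surrounding heat bath. The abstract states only that a lower bound of this kind exists, so the exact per-step form shown here is an assumption.

```latex
% Assumed notation (not from the abstract): x_t = driving signal, s_t = system state,
% I[.;.] = mutual information in nats, \beta = 1/(k_B T).
\begin{align}
  I_{\mathrm{mem}}(t)  &= I[s_t ; x_t]
    && \text{information the state retains about the current signal} \\
  I_{\mathrm{pred}}(t) &= I[s_t ; x_{t+1}]
    && \text{part of that information that is predictive of the next signal value} \\
  \beta \,\langle W_{\mathrm{diss}}(t) \rangle &\ge I_{\mathrm{mem}}(t) - I_{\mathrm{pred}}(t)
    && \text{non-predictive information bounds the average dissipated work}
\end{align}
```

Read this way, a system whose state carries little information about the signal beyond what helps predict the next value keeps the right-hand side small, which is the sense in which greater predictive power permits more energy-efficient operation.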