$\renewcommand{\vec}[1]{\mathbf{#1}}$ $\newcommand{\tens}[1]{\mathrm{#1}}$ $\renewcommand{\matrix}[1]{\tens{#1}}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\suml}{\sum\limits}$ $\newcommand{\intl}{\int\limits}$ $\newcommand{deriv}[1]{\frac{\mathrm{d}}{\mathrm{d}#1}\,}$ $\newcommand{dd}[1]{\mathrm{d}#1}$

blogjou

It seems like at least the European CORONA crisis is coming to an end, so I need another socially accepted excuse for never being around anywhere. A blog!

  • Model–data synthesis in terrestrial carbon observation: methods, data requirements and data uncertainty specifications

    The focus of this paper is observation of the carbon cycle, and in particular its land-atmosphere components, as one part of an integrated earth observation system.

    Introduction

    • model data synthesis: the combination of the information contained in both observations and models through both parameter-estimation and data-assimilation techniques
      • model testing and data quality control
      • interpolation of spatially and temporally sparse data
      • inference from available observations of quantities which are not directly observable
      • forecasting
    • 3 themes:
      • model–data synthesis based on terrestrial biosphere models as an essential component of a terrestrial carbon observation system (TCOS)
      • data uncertainties are as important as data values
      • sound uncertainty specifications

    Purposes and attributes of a TCOS

    A succinct statement of the overall purpose of a TCOS might be: to operationally monitor the cycles of carbon and related entities (water, energy, nutrients) in the terrestrial biosphere, in support of comprehensive, sustained earth observation and prediction, and hence sustainable environmental management and socio-economic development.

    • a TCOS needs:
      • scientific credibility
      • respect carbon budgets
      • high spatial resolution
      • high temporal resolution
      • large number of entities
      • sufficient range of processes
      • partitioning of net fluxes
      • quantification of uncertainty
      • altogether: a Swiss army knife (in German: an “eierlegende Wollmilchsau”, an egg-laying, wool-and-milk-giving sow)

    Model–data synthesis: methods

    Overview

    All applications rest on three foundations: a model of the system, data about the system, and a synthesis approach.

    Model

    • ODE or difference equation, including a noise term
    • noise accounts for imperfect model formulation and stochastic variability in forcings and parameters

    Data

    • two types:
      • observations and measurements
        • $z=h(x, u) + \text{noise}$, where $h$ specifies the deterministic relationship between the measured quantities $z$, the forcings $u$, and the system state $x$; the noise term accounts for measurement error and representation error (see the sketch after this list)
      • prior estimates for model quantities
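    A minimal sketch (my own toy example, not from the paper) of such a dynamic model and its observation equation: a one-pool carbon stock as a noisy difference equation $x^{n+1}=f(x^n,u^n,p)+\text{noise}$, observed through $z=h(x,u)+\text{noise}$. All numbers and the pool structure are hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x, u, p):
        """Deterministic dynamics: input u minus first-order decay p * x."""
        return x + u - p * x

    def h(x, p):
        """Observation operator: assume the decay flux p * x is what gets measured."""
        return p * x

    p, x = 0.1, 50.0                           # decay rate, initial pool size
    for n in range(100):
        u = 5.0 + rng.normal(0.0, 0.5)         # stochastic forcing
        x = f(x, u, p) + rng.normal(0.0, 0.2)  # process (model) noise
    z = h(x, p) + rng.normal(0.0, 0.3)         # measurement + representation noise
    print(x, z)
    ```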

    Synthesis

    • finding optimal match between data and model
    • 3 kinds of output:
      • optimal estimates for model properties to be adjusted (target variables)
      • uncertainties about these estimates
      • assessment of how well the model fits the data, given the uncertainties
    • 3 basic choices:
      • target variables
      • cost function
      • search strategy for optimal values
        • nonsequential: all data treated at once
        • sequential: data incorporated step by step

    Target variables

    • model parameters $p$, forcing variables $u$, initial conditions $x^0$, and the state vector $x^n$ itself: all collected in the vector $y$
    • parameter estimation problems: $y=p$
    • data assimilation problems: target variables can be any model property, with emphasis on state variables

    Cost function

    • common choice:

    \begin{equation} \label{eqn:cf} J(y) = (z-h(y))^T[\operatorname{Cov}\,z]^{-1}(z-h(y)) + (y-\hat{y})^T[\operatorname{Cov}\,\hat{y}]^{-1}(y-\hat{y}) \end{equation}

    • $\hat{y}$ vector of priors (a priori estimates for target variables)
    • model–data synthesis problem: vary $y$ to minimize $J(y)$, subject to the constraint that $x(t)$ must satisfy the dynamic model
    • $y$ at the minimum is the a posteriori estimate of $y$, including information from the observations as well as the priors
    • Eq. \eqref{eqn:cf} yields the minimum-variance estimate for $y$
      • for any error distribution it is unbiased
      • it has minimum error covariance among all estimates that are linear in $z$ and unbiased
      • if the error distributions are Gaussian, it is even the maximum likelihood estimate for $y$, conditional on the data and the model dynamics
    • other choices for other problems or other error distributions
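    To make Eq. \eqref{eqn:cf} concrete, here is a minimal sketch of the cost function with made-up numbers (two observations, one target variable, a linear observation operator $h(y)=H\,y$); it is only an illustration, not the paper’s setup.

    ```python
    import numpy as np

    def J(y, z, Cov_z, y_prior, Cov_y, h):
        """J(y) = (z - h(y))^T Cov_z^{-1} (z - h(y)) + (y - y_prior)^T Cov_y^{-1} (y - y_prior)"""
        r_data  = z - h(y)
        r_prior = y - y_prior
        return (r_data  @ np.linalg.solve(Cov_z, r_data)
              + r_prior @ np.linalg.solve(Cov_y, r_prior))

    # toy example: two observations of a single target variable via h(y) = H y
    H = np.array([[1.0], [2.0]])
    h = lambda y: H @ y
    z       = np.array([1.1, 1.9])
    Cov_z   = np.diag([0.1**2, 0.2**2])
    y_prior = np.array([0.8])
    Cov_y   = np.array([[0.5**2]])
    print(J(np.array([1.0]), z, Cov_z, y_prior, Cov_y, h))
    ```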

    Search strategies for nonsequential problems

    Example

    Thus the cost function, and thence the entire minimization, takes a form in which neither the observations nor the prior estimates appear; they are replaced by quantities a and b scaled by the square roots of the inverse covariance matrices, which are measures of confidence. This is no mathematical nicety; rather it demonstrates that the data and the uncertainties are completely inseparable in the formalism. To put the point provocatively, providing data and allowing another researcher to provide the uncertainty is indistinguishable from allowing the second researcher to make up the data in the first place.

    Algorithms for nonsequential problems

    A high condition number of the Hessian of $J$ indicates that some linear combination(s) of the columns are nearly zero, that is, that the curvature is nearly zero in some direction(s), so that the minimization problem is ill-conditioned, as in the case of a valley with a flat floor.
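    A tiny numerical illustration of this point (my own example): a Hessian with almost no curvature in one direction has a huge condition number.

    ```python
    import numpy as np

    hessian = np.array([[2.0, 0.0],
                        [0.0, 1e-8]])   # a valley with an almost flat floor
    print(np.linalg.cond(hessian))      # ~2e8: the minimization is ill-conditioned
    ```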

    • analytical solution: only possible if $h(y)=H\,y+\text{noise}$ is linear
    • gradient descent: simple and low cost, but tends to find local minima near the starting value rather than the global minimum
    • global search: find global minimum by searching through the whole $y$ space: overcome local minimum problem, but high costs
      • for example, simulated annealing finds the vicinity of a global minimum
      • then apply gradient descent from there (see the sketch below)
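    A sketch of this two-stage strategy on a toy multi-modal cost function, using SciPy’s dual_annealing as one possible simulated-annealing-style global search, followed by a gradient-based local refinement (the cost function and settings are invented for illustration):

    ```python
    import numpy as np
    from scipy.optimize import dual_annealing, minimize

    def cost(y):
        return np.sin(3 * y[0]) + 0.1 * (y[0] - 2.0) ** 2   # several local minima

    bounds = [(-10, 10)]
    coarse = dual_annealing(cost, bounds)              # global stage
    fine   = minimize(cost, coarse.x, method="BFGS")   # local refinement
    print(coarse.x, fine.x, fine.fun)
    ```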

    Search strategies for sequential problems

    • Kalman filter, genetic methods
    • adjoint methods (backward integration)
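    As an illustration of the sequential idea, a minimal linear Kalman filter sketch (toy matrices, not from the paper): the state estimate and its covariance are updated observation by observation.

    ```python
    import numpy as np

    F = np.array([[1.0]])      # state transition
    Q = np.array([[0.01]])     # process-noise covariance
    H = np.array([[1.0]])      # observation operator
    R = np.array([[0.25]])     # observation-noise covariance

    x = np.array([[0.0]])      # initial state estimate
    P = np.array([[1.0]])      # initial state covariance

    for z in [0.9, 1.1, 1.0, 1.2]:                         # incoming observations
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(1) - K @ H) @ P

    print(x, P)
    ```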

    Discussion of model–data synthesis methods

    Differences between nonsequential and sequential strategies.
    • advantages sequential:
      • optimal state can differ from that embodied in model equations
      • the state vector $x^n$ can be included in $y$ (in nonsequential approaches this leads to intractable dimensionality)
      • size does not grow with length of model integration
      • can easily handle incremental extensions to time series observations
    • advantages nonsequential:
      • treat all data at once (see impacts of data at different points in time)
    Model and data error structures
    • often assumed Gaussian, with no temporal correlations
    • generalizations active area of research
    Nonsequential and sequential parameter estimation
    • usually done nonsequentially (LS)
    • but: one can incorporate $p$ as part of $y$ and do sequential analysis
      • allows the data to change the parameters, e.g. after catastrophic events

    Model–data synthesis: examples

    • parameter estimation
    • atmospheric inversion methods to infer surface-atmosphere fluxes from atmospheric composition observations and atmospheric transport models
    • combination, advantages:
      • different observations constrain different processes
      • different observations have different resolutions in space and time (also a problem)
    • weather forecast by atmospheric and ocean circulation models

    Data characteristics: uncertainty in measurement and representation

    We have emphasized that data uncertainties affect not only the predicted uncertainty of the eventual result of a model–data synthesis process, but also the predicted best estimate.

    • scale mismatches between measurements and models are part of the representation error

    An analogous temporal representation error arises when flask measurements (actually grab samples in time) are interpreted as longer-term means…

    A further contribution to representation errors for most atmospheric inversion studies to date has been the projection of possible source distributions to a restricted subspace, usually by dividing the earth into a number of large regions. This is done both for computational reasons and to reduce the error amplification arising from under-determined problems. Errors in the prescription of flux distributions within these regions give rise to a so-called aggregation error, described and quantified by Kaminski et al. (2001). This error can be avoided by using adjoint representations of atmospheric transport that do not require aggregation (Rodenbeck et al., 2003a,b).

    There are few experiments where representation errors can be evaluated, since this requires simultaneous knowledge of sources and atmospheric transport. However, one can use the range of model simulations as a guide (e.g. Law et al., 1996; Gurney et al., 2003).

    • GPP = [net assimilation]
    • net primary productivity (NPP) = [GPP - autotrophic respiration]
    • net ecosystem productivity (NEP) = [NPP - heterotrophic respiration]
    • net biome productivity (NBP) = [NEP - disturbance flux]
    • disturbance flux: grazing, harvest, and catastrophic events (fire, windthrow, clearing)

    Mismatches of measurement and model scale

    Observational issue

    There are several options to relate fine-scaled measurements to a coarse-scaled model:

    • $z_{\text{fine}}$ is a noisy sample of $z_{\text{coarse}}$, variability in $z_{\text{fine}}$ (covariance $R_{\text{fine}}$) treated as contribution to representation error
    • direct aggregation: $z_{\text{coarse}}$ is weighted average of $z_{\text{fine}}$
    • $z_{\text{fine}} = g(x_{\text{coarse}}, a_{\text{fine}})$: relate fine-scale observations to coarse-scale state variables and additional fine-scale ancillary data such as topography
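    A small sketch of the second option (direct aggregation), with invented numbers: the coarse-scale observation is a weighted average of the fine-scale measurements, and its error variance follows from the fine-scale covariance $R_{\text{fine}}$.

    ```python
    import numpy as np

    z_fine = np.array([2.1, 1.8, 2.4, 2.0])          # fine-scale measurements
    w      = np.array([0.4, 0.2, 0.2, 0.2])          # area (footprint) weights, sum to 1
    R_fine = np.diag([0.1, 0.1, 0.2, 0.1]) ** 2      # fine-scale error covariance

    z_coarse   = w @ z_fine                          # aggregated observation
    var_coarse = w @ R_fine @ w                      # its measurement/representation variance
    print(z_coarse, var_coarse)
    ```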

    Scaling of dynamic model

    Translate $\mathrm{d}x/\mathrm{d}t=f(x, u, p)$ between scales:

    fine-scale and coarse-scale equations are different (for instance, biased with respect to each other) because of interactions between fine-scale variability and nonlinearity in the fine-scale function f(x, u, p)

    Summary and conclusions

    Critical error properties include:

    • the diagonal elements $[\operatorname{Cov}(z)]_{mm}=\sigma_m^2$ of the measurement error covariance matrix (where $\sigma_m^2$ is the error magnitude for an observation $z_m$)
    • the correlations between different observations, quantified by the off-diagonal elements of the covariance matrix
    • the temporal and the spatial structure of errors
    • the error distribution
    • possible scale mismatches between measurements and models
    • the representation of the observations in the model
  • It's perhaps a historical peculiarity...

    It’s perhaps a historical peculiarity, but we also lack a living woman poet who can rival Emily Dickinson and Elizabeth Bishop. Ideological cheerleading does not necessarily nurture great, or even good, readers and writers; instead it seems to malform them.

  • Model Selection and Multimodel Inference

    We wrote this book to introduce graduate students and research workers in various scientific disciplines to the use of information-theoretic approaches in the analysis of empirical data. These methods allow the data-based selection of a “best” model and a ranking and weighting of the remaining models in a pre-defined set. Traditional statistical inference can then be based on this selected best model. However, we now emphasize that information-theoretic approaches allow formal inference to be based on more than one model (multimodel inference). Such procedures lead to more robust inferences in many cases, and we advocate these approaches throughout the book.

    Preface

    We recommend the information-theoretic approach for the analysis of data from observational studies. In this broad class of studies, we find that all the various hypothesis-testing approaches have no theoretical justification and may often perform poorly. For classic experiments (control–treatment, with randomization and replication) we generally support the traditional approaches (e.g., analysis of variance); there is a very large literature on this classic subject. However, for complex experiments we suggest consideration of fitting explanatory models, hence on estimation of the size and precision of the treatment effects and on parsimony, with far less emphasis on “tests” of null hypotheses, leading to the arbitrary classification “significant” versus “not significant.” Instead, a strength of evidence approach is advocated.

    Introduction

    1.1 Objectives of the Book

    • model parameters can provide insights even if not linked to directly observable variables
    • AIC used routinely in time series analysis
    • marriage of information theory and mathematical statistics: Kullback’s (1959) book
    • Akaike considered AIC an extension of R. A. Fisher’s likelihood theory
    • estimates of model selection uncertainty: inference problems arise from using the same data for both model selection and parameter estimation and inference; if ignored, precision is overestimated
    • multimodel inference (MMI): model averaging, confidence sets on models
    • small sample size: AIC$_c$ instead of AIC (Mina)

    1.2 Background Material

    1.2.1 Inference from Data, Given a Model

    • Fisher’s likelihood theory assumes that the model structure is known and correct, only the parameters are to be estimated

    1.2.2 Likelihood and Least Squares Theory

    • LS and likelihood yield identical estimators of the structural parameters if the residuals are normally distributed and independent
    • LS:
      • $y_i = \beta_0 + \beta_1\, x_i + \epsilon_i$ with $\epsilon_i\sim\mathcal{N}(0,\sigma^2)$ independent
      • LS gives $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the residual sum of squares $\operatorname{RSS} = \sum_i \hat{\epsilon}_i^2$ is minimized
    • in likelihood function, data are given and model is assumed; interest in estimating unknown parameters: likelihood is function of only the parameters
    • $\mathcal{L}(\theta\,|\,data,\,model) = \mathcal{L}(\theta\,|\,x,g)$ is the likelihood of the unknown parameter $\theta$, given data $x$ and model $g$
    • likelihood is a relative value
    • LS is a special case of ML
    • $\sigma^2$ is to be considered a parameter; $\hat{\sigma}^2$ differs by a multiplicative constant (depending on the number of parameters and the sample size) between LS and ML
    • in LS, $\operatorname{RSS}=n\hat{\sigma}^2$ is minimized, which is, for all parameters other than $\sigma^2$, equivalent to maximizing $-\tfrac{1}{2}\, n\,\log \hat{\sigma}^2$ (see the sketch after this list)
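    A minimal sketch (simulated data) of this equivalence for the simple linear model above: LS and ML give identical $\hat{\beta}_0$, $\hat{\beta}_1$, while $\hat{\sigma}^2$ differs only by the constant $\mathrm{RSS}/(n-2)$ versus $\mathrm{RSS}/n$.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n = 50
    x = np.linspace(0, 10, n)
    y = 1.5 + 0.8 * x + rng.normal(0, 2.0, n)

    X = np.column_stack([np.ones(n), x])
    beta_hat, rss, *_ = np.linalg.lstsq(X, y, rcond=None)   # LS (= ML) slope/intercept
    rss = float(rss[0])

    sigma2_ls = rss / (n - 2)    # usual unbiased LS estimate
    sigma2_ml = rss / n          # maximum likelihood estimate
    max_loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_ml) + 1)

    print(beta_hat, sigma2_ls, sigma2_ml, max_loglik)
    ```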

    1.2.3 The Critical Issue: “What Is the Best Model to Use?”

    As Potscher (1991) noted, asymptotic properties are of little value unless they hold for realized sample sizes.

    • model selection based on parsimony, information-theoretic criteria, selection uncertainty

    1.2.4 Science Inputs: Formulation of the Set of Candidate Models

    Building the set of candidate models is partially a subjective art; […] The most original, innovative part of scientific work is the phase leading to the proper question.

    • lots of exploratory work necessary
    • large datasets are likely to support more complexity: one has to correct for this
    • Freedman’s paradox: if the number of variables is comparable to the number of observations, a high $R^2$ and seemingly significant effects are possible even if $y$ is independent of the data (see the simulation sketch below)
    • “An inference from a model to some aspect of the real world is justified only after the model has been shown to adequately fit relevant empirical data.”
    • careful thinking rather than brute force
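    A quick simulation sketch of Freedman’s paradox (my own toy numbers): with 40 pure-noise predictors and 60 observations, an ordinary regression still yields a respectable $R^2$ and several seemingly significant $t$-values.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 60, 40
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                 # independent of X by construction

    Xd = np.column_stack([np.ones(n), X])
    beta, rss, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(rss[0])
    r2 = 1 - rss / np.sum((y - y.mean()) ** 2)

    sigma2 = rss / (n - p - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    t = beta / se
    print(r2, np.sum(np.abs(t[1:]) > 2))   # high R^2, several |t| > 2 by chance
    ```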

    1.2.5 Models Versus Full Reality

    • fundamental assumption: none of the candidate models is the “true model”, for the “true model” is infinite-dimensional

    Models, used cautiously, tell us “what effects are supported by the (finite) data available.” Increased sample size (information) allows us to chase full reality, but never quite catch it.

    If we were given a nonlinear formula with 200 parameter values, we could make correct predictions, but it would be difficult to understand the main dynamics of the system without some further simplification or analysis. Thus, one should tolerate some inexactness (an inflated error term) to facilitate a simpler and more useful understanding of the phenomenon.

    […] there are often several large, important effects, followed by many smaller effects, and, finally, followed by a myriad of yet smaller effects. […] Rare events that have large effects may be very important but quite difficult to study.

    Conceptually, the role of a good model is to filter the data so as to separate information from noise.

    1.2.6 An Ideal Approximating Model

    It is important that the best model is selected from a set of models that were defined prior to data analysis and based on the science of the issue at hand.

    There are many cases where two or more models are essentially tied for “best,” and this should be fully recognized in further analysis and inference, especially when they produce different predictions. In other cases there might be 4–10 models that have at least some support, and these, too, deserve scrutiny in reaching conclusions from the data, based on inferences from more than a single model.

    • good tool to assess model quality: narrow confidence intervals with high confidence ($\geq 0.95$) for parameter values

    1.3 Model Fundamentals and Notation

    Introduction of model classes and notation.

    1.3.1 Truth or Full Reality $f$

    Truth is denoted by $f$ as an abstract concept because it is unknown.

    1.3.2 Approximating models $g_i(x\,|\,\theta)$

    Ideally, the set of $R$ models will have been defined prior to data analysis. These models specify only the form of the model, leaving the unknown parameters ($\theta$) unspecified.

    1.3.3 The Kullback–Leibler Best Model $g_i(x\,|\,\theta_0)$

    • “K-L best” means relative to the unkown truth $f$

    The parameters that produce this conceptually best single model, in the class $g(x\,|\,\theta)$, are denoted by $\theta_0$. Of course, this model is generally unknown to us but can be estimated; such estimation involves computing the MLEs of the parameters in each model ($\hat{\theta}$) and then estimating K-L information as a basis for model selection and inference. The MLEs converge asymptotically to $\theta_0$, and the concept of bias is with respect to $\theta_0$, rather than our conceptual “true parameters” associated with full reality $f$.

    1.3.4 Estimated Models $g_i(x\,|\,\hat{\theta})$

    In a sense, when we have only the model form $g(x\,|\,\theta)$ we have an infinite number of models, where all such models have the same form but different values of $\theta$. Yet, in all of these models there is a unique K-L best model. Conceptually, we know how to find this model, given $f$.

    1.3.5 Generating Models

    One should not confuse a generating model or results based on Monte Carlo data with full reality $f$.

    1.3.6 Global Model

    Ideally, the global model has in it all the factors or variables thought to be important. Other models are often special cases of this global model. There is not always a global model. If sample size is small, it may be impossible to fit the global model. Goodness-of-fit tests and estimates of an overdispersion parameter for count data should be based (only) on the global model. The concept of overdispersion is relatively model-independent; however, some model must be used to compute or model any overdispersion thought to exist in count data.

    • In compartmental systems, a global model with $n$ pools is then the model where all pools are connected and all pools have external inputs and outputs
    • other models or special cases are models with missing connections
    • In terms of AIC and finding the right dimension ($n$, number of pools), should one first compare several “global” models with one another? If so, how (pure likelihood, AIC)? (see the sketch at the end of this section)

    The advantage of this approach is that if the global model fits the data adequately, then a selected model that is more parsimonious will also fit the data (this is an empirical result, not a theorem).
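    Regarding the question above on comparing several candidate “global” models: a minimal sketch of how AIC/AICc could be computed from least-squares fits of models with different dimension. The data and the two candidate designs are purely illustrative, not compartmental models.

    ```python
    import numpy as np

    def aic_aicc_ls(rss, n, k):
        """AIC and AICc for a Gaussian LS model with k parameters (incl. sigma^2)."""
        aic = n * np.log(rss / n) + 2 * k
        aicc = aic + 2 * k * (k + 1) / (n - k - 1)
        return aic, aicc

    rng = np.random.default_rng(1)
    n = 30
    x = np.linspace(0, 1, n)
    y = 2 + 3 * x + rng.normal(0, 0.3, n)

    candidates = {
        "linear":    np.column_stack([np.ones(n), x]),
        "quadratic": np.column_stack([np.ones(n), x, x**2]),
    }
    for name, X in candidates.items():
        _, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        k = X.shape[1] + 1                     # + 1 for sigma^2
        print(name, aic_aicc_ls(float(rss[0]), n, k))
    ```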

    1.3.7 Overview of Stochastic Models in the Biological Sciences

    • linear and nonlinear regression (based on LS and ML)
    • log-linear and logistic models (mostly for count data)
    • compartmental models as state transitions (more advanced: “random effects”, Kreft and deLeeuw 1998)
    • differential equations in general
    • open and closed capture-recapture, band recovery, distance sampling
    • spatial models (Kriging can be viewed as LS technique), Gibbs sampler
    • spatiotemporal (MCMC methods)

    There are general information-theoretic approaches for models well outside the likelihood framework (Qin and Lawless 1994, Ishiguo et al. 1997, Hurvich and Simonoff 1998, and Pan 2001a and b). There are now model selection methods for nonparametric regression, splines, kernel methods, martingales, and generalized estimation equations. Thus, methods exist for nearly all classes of models we might expect to see in the theoretical or applied biological sciences.

    1.4 Inference and the Principle of Parsimony

    1.4.1 Avoid Overfitting to Achieve a Good Model Fit

    “How many parameters does it take to fit an elephant?”

    [Wel] concluded that the 30-term model “may not satisfy the third-grade art teacher, but would carry most chemical engineers into preliminary design.”

    Wel’s finding is both insightful and humorous, but it deserves further interpretation for our purposes here. His “standard” is itself only a crude drawing—it even lacks ears, a prominent elephantine feature; hardly truth. A better target would have been a large, digitized, high-resolution photograph; however, this, too, would have been only a model (and not truth). Perhaps a real elephant should have been used as truth, but this begs the question, “Which elephant should we use?”

    1.4.2 The Principle of Parsimony

    Statisticians view the principle of parsimony as a bias versus variance tradeoff. In general, bias decreases and variance increases as the dimension of the model increases.

    If we believe that truth is essentially infinite-dimensional, then overfitting is not even defined in terms of the number of parameters in the fitted model.

    Instead, we reserve the terms underfitted and overfitted for use in relation to a “best approximating model”.

    Here, an underfitted model would ignore some important replicable (i.e., conceptually replicable in most other samples) structure in the data and thus fail to identify effects that were actually supported by the data. In this case, bias in the parameter estimators is often substantial, and the sampling variance is underestimated, both factors resulting in poor confidence interval coverage. Underfitted models tend to miss important treatment effects in experimental settings.

    Overfitted models, as judged against a best approximating model, are often free of bias in the parameter estimators, but have estimated (and actual) sampling variances that are needlessly large (the precision of the estimators is poor, relative to what could have been accomplished with a more parsimonious model). Spurious treatment effects tend to be identified, and spurious variables are included with overfitted models.

    The goal of data collection and analysis is to make inferences from the sample that properly apply to the population. The inferences relate to the information about structure of the system under study as inferred from the models considered and the parameters estimated in each model. A paramount consideration is the repeatability, with good precision, of any inference reached. When we imagine many replicate samples, there will be some recognizable features common to almost all of the samples. Such features are the sort of inference about which we seek to make strong inferences (from our single sample). Other features might appear in, say, 60% of the samples yet still reflect something real about the population or process under study, and we would hope to make weaker inferences concerning these. Yet additional features appear in only a few samples, and these might be best included in the error term $(\sigma^2)$ in modeling.

    The data are not being approximated; rather we approximate the structural information in the data that is replicable over such samples (see Chatfield 1996, Collopy et al. 1994). Quantifying that structure with a model form and parameter estimates is subject to some “sampling variation” that must also be estimated (inferred) from the data.

    Some model selection methods are “parsimonious” (e.g., BIC, Schwarz 1978) but tend, in realistic situations, to select models that are too simple (i.e., underfitted). One has only a highly precise, quite biased result.

    • precision: model results look similar for different datasets from the same source
    • overfitted models replicate the very dataset at hand and are thus imprecise, meaning that model parameters are uncertain

    This example illustrates that valid statistical inference is only partially dependent on the analysis process; the science of the situation must play an important role through modeling.

    • mass balance is such a scientific fact

    1.4.3 Model Selection Methods

    Generally, hypothesis testing is a very poor basis for model selection (Akaike 1974 and Sclove 1994b).

    • stepwise backwards (removing variables step by step): one misses parameter combinations and hence possible synergistic effects
    • cross-validation:
      • data are divided into two partitions, first partition is used for model fitting and the second is used for model validation
      • then a new partition is selected, and this whole process is repeated hundreds or thousands of times
      • some criterion is used as the basis for model selection, e.g. minimum squared prediction error
      • computationally expensive (see the sketch below)
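    A plain NumPy sketch of $K$-fold cross-validation as described above, with toy polynomial candidates and simulated data (the criterion is the mean squared prediction error on the held-out folds):

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    n = 100
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

    def cv_error(degree, folds=5):
        idx = rng.permutation(n)
        errs = []
        for f in range(folds):
            test = idx[f::folds]                       # held-out partition
            train = np.setdiff1d(idx, test)            # fitting partition
            coef = np.polyfit(x[train], y[train], degree)
            pred = np.polyval(coef, x[test])
            errs.append(np.mean((y[test] - pred) ** 2))
        return np.mean(errs)

    for degree in [1, 3, 5, 9]:
        print(degree, cv_error(degree))
    ```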

    1.5 Data Dredging, Overanalysis of Data, and Spurious Effects

    • data dredging: analyzing data and searching for patterns without questions or goals
    • resulting models are overfitted and without predictive power

    1.5.1 Overanalysis of Data

    • two versions of data dredging: iteratively adding variables and trying “all” models
    • with increasing computer power, both become more popular
    • better: think before doing data dredging

    Journal editors and referees rarely seem to show concern for the validity of results and conclusions where substantial data dredging has occurred. Thus, the entire methodology based on data dredging has been allowed to be perpetuated in an unthinking manner.

    We believe that objective science is best served using a priori considerations with very limited peeking at plots of the data, parameter estimates from particular models, correlation matrices, or test statistics as the analysis proceeds.

    • more use of likelihood (computationally more expensive, more flexible) than least squares
    • less hypothesis testing, more estimation of effects and confidence intervals
    • no formal test theory for model selection exists; how should different $P$-values from tests with different power be interpreted?
    • likelihood ratio tests require nested models

    1.6 Model Selection Bias

    • data are used to both select a parsimonious model and estimate the model parameters and their precision (i.e., the conditional sampling covariance matrix, given the selected model).
    • large biases in regression coefficients are often caused by data-based model selection
    • if a variable would be selected into a model (by model selection) in only a few of a large number of possible samples, its importance will be vastly overestimated when one happens to look at one of the datasets that suggests including it
    • you actually don’t know that because you only have this particular dataset available
    • even $t$-tests will tell you to include this variable

    1.7 Model Selection Uncertainty

    Denote the sampling variance of an estimator $\hat{\theta}$, given a model, by $\operatorname{var}(\hat{\theta}\,|\,\text{model})$. More generally, the sampling variance of $\hat{\theta}$ should have two components: (1) $\operatorname{var}(\hat{\theta}\,|\,\text{model})$ and (2) a variance component due to not knowing the best approximating model to use (and, therefore, having to estimate this). Thus, if one uses a method such as AIC to select a parsimonious model, given the data, and estimates a conditional sampling variance, given the selected model, then the estimated precision will be too small because the variance component for model selection uncertainty is missing.

    • problems for inference, probably not for mere data description
    • proper model selection is accompanied by a substantial amount of uncertainty
    • bootstrap techniques can allow insights into model uncertainty (see the sketch below)
    • choosing a model completely independent of the data has hidden costs in lack of reliability
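    A sketch of the bootstrap idea (illustrative data and candidate models only): refit and reselect on resampled data sets and record how often each candidate model “wins”, here by AIC; the spread of selection frequencies hints at the model selection uncertainty.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 40
    x = np.linspace(0, 1, n)
    y = 1 + 2 * x + rng.normal(0, 0.5, n)

    def aic(X, yy):
        _, rss, *_ = np.linalg.lstsq(X, yy, rcond=None)
        k = X.shape[1] + 1                              # + 1 for sigma^2
        return n * np.log(float(rss[0]) / n) + 2 * k

    designs = {"linear":    lambda x: np.column_stack([np.ones_like(x), x]),
               "quadratic": lambda x: np.column_stack([np.ones_like(x), x, x**2])}

    wins = {name: 0 for name in designs}
    for _ in range(500):
        idx = rng.integers(0, n, n)                     # bootstrap resample
        xb, yb = x[idx], y[idx]
        scores = {name: aic(build(xb), yb) for name, build in designs.items()}
        wins[min(scores, key=scores.get)] += 1
    print(wins)   # selection frequencies reflect model selection uncertainty
    ```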

    1.8 Summary

    • model selection includes scientific understanding: which models to include in the candidate sets and which not
    • data dredging weakens inferences
    • information-theoretic criteria can be used to select a model
    • multimodel inference: models are ranked and scaled to understand model uncertainty

    Data analysis is taken to mean the entire integrated process of a priori model specification, model selection, and estimation of parameters and their precision. Scientific inference is based on this process.

    • data-based selection of a parsimonious model is challenging
    • rewards: valid inferences
    • dangers: underfitting or overfitting, model selection bias and model selection uncertainty
  • Lust is the hero-villain...

    Lust is the hero-villain of this night-piece of the spirit, male lust for the “hell” that concludes the sonnet, hell being the Elizabethan–Jacobean slang for the vagina. The ancient commonplace of sadness-after-coition achieves its apotheosis in Sonnet 129, but at more than the expense of spirit. So impacted is this sonnet’s language that it evades its apparent adherence to the Renaissance belief that each sexual act shortens a man’s life.

  • I love "Sir Patrick Spence"...

    I love “Sir Patrick Spence” because it has a tragic economy almost unique in its stoic heroism. There is a sense throughout the poem that heroism is necessarily self-destructive, and yet remains admirable.

  • Mapping the deforestation footprint of nations reveals growing threat to tropical forests

    The authors provide a fine-scale representation of spatial patterns of deforestation associated with international trade. They find that many developed countries have increased the deforestation embodied in their imports. Consumption patterns of G7 countries drive an average loss of 3.9 trees per person per year. The results emphasize the need to reform zero-deforestation policies through strong transnational efforts and by improving supply chain transparency, public–private engagement and financial support for the tropics.

    Current situation

    • deforestation is continuously increasing
      • driven by international trade
      • negative impact on global climate and biodiversity
    • many developed/developing countries with net domestic forest gain but imported deforestation
    • spatial distribution of deforestation embodied in imports not well known

    Questions

    • Which deforestation hotspots are driven by which consumer countries?
    • Which forest ecosystems, tropical rain forests or other forest types are the top targets of global supply chains?

    Results

    • using a global supply chain model, high-resolution maps of the deforestation footprints of various nations were built
      • Germany: cocoa in Ivory Coast and Ghana, coffee in Vietnam
      • Japan: cotton and sesame seed in Tanzania
      • China: timber and rubber in Indochina
      • USA: Cambodia, Madagascar, Liberia, Central America, Chile, Amazon through timber, rubber, beef, fruits, nuts
    • many developed/developing countries increase their imported damage faster than their domestic mitigation
    • tree loss per capita and year
      • G7: 4
      • USA: 8
      • Sweden: 22 (biomass for energy supply)
    • different usage requires different tree types with different impacts on biodiversity
    • tropics: the USA, Germany, Singapore, China, and Russia have increasing net imports from all biomes, but most rapidly from the tropics
    • tropical and mangrove rainforest deforestation increased with GDP per capita in developed countries; trade patterns remained the same, leading to even more deforestation (except for Norway and Sweden)

    Discussion

    • maps can help countries to improve their deforestation footprint and its ecological impact (climate and biodiversity)
    • to maintain net forest gains, the G7, China, and India increasingly outsourced their deforestation
    • no subnational analysis included
    • maps can be used by each country to reflect on its own consumption behavior and supply chains
    • international strategies, collaboration of private and public sector, financial support for exporting countries, and transparent supply chains necessary

    My comments

    The 22 trees for Sweden are shocking given that only recently I watched a documentary praising Sweden for their smooth and successful transition to renewable energies.

  • Global maps of twenty-first century forest carbon fluxes

    The authors introduce a geospatial monitoring framework that integrates ground and Earth observation data to map annual forest-related greenhouse gas emissions and removals from 2001 to 2019. They estimate that global forests were a net carbon sink of $-7.6\,$GtCO$_2$e yr$^{-1}$ (gross removals of $-15.6$ plus gross emissions of $+8.1$). The final goal is to support forest-specific climate mitigation with both local detail and global consistency.

    Current situation

    • land use change patterns change faster than modelled
    • distinguishing anthropogenic from non-anthropogenic effects possible only by direct observation
    • different approaches lead to very different global net forest fluxes (projects, models vs inventories, countries, etc).
      $\to$ forests’ role in climate mitigation unclear
      $\to$ discouraging to take transformational actions

    What’s new

    • transparent, independent and spatially explicit global system for monitoring collective impact of forest-related climate policies by diverse actors across multiple scales
    • separation of sources from sinks

    Global distribution of forest emissions and removals

    • most uncertainty in global gross removals
    • tropical forests with highest gross fluxes, highest net sinks in temperate and boreal forests

    Fluxes for specific localities and drivers of forest change

    • the Brazilian Amazon forest is a net source, the greater Amazon River Basin a net sink
    • the smaller Congo River Basin is a six times larger net sink due to lower emissions

    A flexible data integration framework

    • three tiers of methods, parameters, and data sources with different complexity and accuracy
    • results most sensitive to data sources

    Forest fluxes in the global carbon budget

    • results not comparable to other global estimates (net vs gross, forests vs all, all GHGs vs CO$_2$)
    • no way here to distinguish between anthropogenic and non-anthropogenic effects (data not available on small scales)
    • net CO$_2$ forest sink larger than in Global Carbon Project because emissions might not be completely captured here by the medium resolution satellite observations used to underpin the analysis

    Limitations and future improvements

    • data spatially detailed with temporal inconsistencies:
      • lack of consistent time-series of forest regrowth
      • lack of consistent time-series for global loss product
    • for many forests, the required long-term inventories do not exist

    Conclusions

    • reducing deforestation is important
    • mitigation effects of intact middle-aged and old forests are often underestimated
    • maps better than tables

    The global forest carbon monitoring framework introduced here, and the main improvements identified above, allow for efficient prioritization and evaluation of how data updates and improvements influence GHG flux estimates and their uncertainties.

    My comments

    As far as I understood, the authors seem to see the problem that there is no time to install detailed and long-term monitoring systems for forests, so they propose to synthesize available data by means of general standards.

    Ideas

    Giulia asked whether a 30 m × 30 m resolution is rather necessary or hindering. What about an AIC for spatial resolution?

  • From self-information to thermodynamic entropy

    The goal of this post is to introduce Shannon entropy as an information-theoretic concept with its origin in the self-information of events, and then to link it to the thermodynamic concept of entropy through maximization.
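    A small numerical sketch of the starting point (my own example): the self-information of an event, Shannon entropy as its expectation, and the fact that, for a fixed number of states, the uniform distribution maximizes the entropy.

    ```python
    import numpy as np

    def self_information(p):
        return -np.log2(p)                    # in bits

    def shannon_entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                          # convention: 0 * log 0 = 0
        return float(np.sum(p * self_information(p)))

    print(self_information(0.5))              # 1 bit
    print(shannon_entropy([0.5, 0.25, 0.25])) # 1.5 bits
    print(shannon_entropy([0.25] * 4))        # 2 bits: uniform maximizes entropy
    ```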

  • Carbon cycle in mature and regrowth forests globally

    The authors compile the Global Forest Database (ForC) to provide a macroscopic overview of the C cycle in the world’s forests. They compute the mean and standard deviation of 24 flux and stock variables (no soil variables) for mature and regrowth (age < 100 years) forests. C cycling rates decrease from tropical to temperate to boreal forests. The majority of flux variables, together with most live biomass pools, increased significantly with the logarithm of stand age.

    1. Introduction

    • forests photosynthesize 69 GtC/year and act as a C sink accounting for 29% of fossil fuel emissions (problem: deforestation)
    • regrowth (= secondary) forests become increasingly important
    • biomes: categories for different climate and vegetation
    • NEP = GPP - $R_{\text{eco}}$: net ecosystem production = gross primary production - total ecosystem respiration
    • biomass accumulation increases rapidly in young forests, followed by a slow decline to near zero in old forests

    2. Methods and design

    • synthesis of many existing databases with the goal of understanding how C cycle varies depending on location and stand age
    • R scripts and manual edits
    • the unit dry organic matter is converted to C via C = 0.47 · OM (IPCC, 2018)
    • 4 biome types (tropical broadleaf, temperate broadleaf, temperate needleleaf, boreal needleleaf) and 2 age classes (young, mature)
    • C budget assumed closed if mean of components summed to within one standard deviation of the aggregate variable
    • effect of stand age tested by using mixed effects models
    • a logarithmic fit was chosen also due to the lack of sufficient data for more parameters (see the sketch below)
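    A minimal sketch of such a logarithmic age fit on simulated data; this shows only a fixed-effect regression on $\log_{10}$(stand age), whereas the paper uses mixed effects models, and all numbers are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(11)
    age = rng.uniform(5, 100, 80)                                # stand age in years
    flux = 2.0 + 3.0 * np.log10(age) + rng.normal(0, 0.8, 80)    # synthetic C flux

    slope, intercept = np.polyfit(np.log10(age), flux, 1)
    print(intercept, slope)     # flux ≈ intercept + slope * log10(age)
    ```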

    3. Review results and synthesis

    • mature forests:
      • fluxes: tropical > temperate > boreal
      • NEP: no significant trend
      • mean stocks: tropical > temperate > boreal
      • max. stocks in temperate biomes
    • young forests:
      • fluxes and stocks increase with $\log_{10}$ of age
      • fluxes: tropical > temperate > boreal
      • NEP: temperate > boreal

    4. Discussion

    • variation in NPP in mature forests less controlled by climate, more by moderate disturbance and $R_{\text{soil}}$ vs C inputs
    • organic layer (OL) highest in boreal forests due to slow decomposition
    • NEP increases for first 100 years
    • future forest C cycling will shape climate (Song et al. 2019, Schimel et al. 2015)
    • ForC contains ground data for variables that cannot be measured (at least directly) remotely, such as respiration fluxes

    5. Conclusions

    • loss of biomass from mature forests cannot be recovered on time scales relevant for mitigating climate change
    • conservation of mature forests most important

    Ideas

    By definition, future projections extend our existing observations and understanding to conditions that do not currently exist on Earth (Bonan and Doney 2018, Gustafson et al 2018, McDowell et al 2018). To ensure that models are giving the right answers for the right reasons (Sulman et al 2018), it is important to benchmark against multiple components of the C cycle that are internally consistent with each other (Collier et al 2018, Wang et al 2018).

    What about applying information partitioning to ForC?

  • The pleasures of reading...

    The pleasures of reading indeed are selfish rather than social. You cannot directly improve anyone else’s life by reading better or more deeply. I remain skeptical of the traditional social hope that care for others may be stimulated by the growth of individual imagination, and I am wary of any arguments whatsoever that connect the pleasures of solitary reading to the public good.