This paper concerns the problem of optimal monitoring network layout using information-theoretical methods. Numerous different objectives based on information measures have been proposed in recent literature, often focusing simultaneously on maximum information and minimum dependence between the chosen locations for data collection stations. We discuss these objective functions and conclude that a single-objective optimization of joint entropy suffices to maximize the collection of information for a given number of stations. We argue that the widespread notion of minimizing redundancy, or dependence between monitored signals, as a secondary objective is not desirable and has no intrinsic justification. The negative effect of redundancy on total collected information is already accounted for in joint entropy, which measures total information net of any redundancies. In fact, for two networks of equal joint entropy, the one with a higher amount of redundant information should be preferred for reasons of robustness against failure. In attaining the maximum joint entropy objective, we investigate exhaustive optimization, a more computationally tractable greedy approach that adds one station at a time, and we introduce the “greedy drop” approach, where the full set of stations is reduced one at a time. We show that no greedy approach can exist that is guaranteed to reach the global optimum.

Over the last decade, a large number of papers on information-theory-based design of monitoring networks have been published. These studies apply information-theoretical measures on multiple time series from a set of sensors, to identify optimal subsets. Jointly, these papers

In this paper, we do not answer the question “what is optimal?” with an optimal network. Rather, we reflect on the question of how to define optimality in a way that is logically consistent and useful within the monitoring network optimization context, thereby questioning the widespread use of minimum dependence between stations as part of the objectives. In fact, we argue that minimizing redundancy is a redundant objective.

The objective of a hydrological monitoring network depends on its purpose, which can usually
be framed as supporting decisions. The decisions can be relating to management of water systems, as for example considered by

The decision problem of choosing an optimal monitoring network layout
needs an explicit objective function to be optimized. While this objective could be stated in terms of a utility function

Although ultimately the objective will be a more general utility, the focus of this paper is on information-theoretical methods for monitoring network design, which typically do not optimize for a specific decision problem supported by the network. Because information and utility (value of information) are linked through a complex relationship, this does not necessarily optimize decisions for all decision makers. Since we do not consider a specific decision problem, the focus in the present paper is on methods for maximization of information retrieved from a sensor network.

In this paper, the rationale behind posing various information-theoretical objectives is discussed in detail. While measures from information theory provide a strong foundation for mathematically and conceptually rigorous quantification of information content, it is important to pay attention to the exact meaning of the measures used. This paper is intended to shed some light on these meanings in the context of monitoring network optimization and provides new discussion motivated in part by recently published literature.

We present three main arguments in this paper. Firstly, we argue that objective functions for optimizing monitoring networks can, in principle, not be justified by analyzing the resulting networks from application case studies. Evaluating performance of a chosen monitoring network would require a performance indicator, which in itself is an objective function. Case studies could be helpful in assessing whether one objective function (the optimization objective) could be used as an approximation of another, underlying, objective function (the performance indicator). However, from results of case studies we should not attempt to draw any conclusions as to what objective function should be preferred. In other words: the objective function is intended to assess the quality of the monitoring network, as opposed to a practice in which the resulting monitoring networks are used to assess the quality of the objective function. Secondly, we argue that, in purely information-based approaches, the joint entropy of all signals together is in principle sufficient to characterize information content and can therefore serve as a single optimization objective. Notions of minimizing dependence between monitored signals through incorporation of other information measures in the objective function lack justification and are therefore not desirable. Thirdly, we could actually argue for maximizing redundancy as a secondary objective because of its associated benefits for creating a network robust against individual sensors' failures. The reason is that the undesirable information inefficiencies associated with high dependency or redundancy are already accounted for in maximizing joint entropy. Minimization of redundancy would mean that each sensor becomes more essential, and therefore the network as a whole would be more vulnerable to failures in delivering information. Adding a trade-off with maximum redundancy is outside the scope of this paper but serves to further illustrate the argument against use of minimum redundancy.

The information-theoretical approach to monitoring network design is not the only option, and other objective functions have also been used for this problem. Examples are cost, geographical spread, and squared-error-based metrics. Some approaches use models describing spatial variability with certain assumptions, e.g., Kriging

In this paper, our main focus is to discuss the formulation of information-theoretical objective functions and previous literature. Therefore, we restrict our scope to those information-theory-based objective functions based on observed data on one single variable in multiple locations. Keeping this limited scope allows us to discuss the interpretation of these objective functions for monitoring network design, which formalize what we actually want from a network. Furthermore, we investigate whether the desired optimum in the objective function can be found by greedy approaches or whether exhaustive search is needed to prevent a loss of optimality.

Only after it is agreed on what is wanted from a network and this is captured as an optimization problem do other issues such as the solution or approximation of the solution to the problem become relevant. The numerical approach to this solution and calculation of information measures warrants independent discussion, which is outside our current scope and will be presented in a future paper. Our discussion is numerically illustrated by a case study using data from Brazos River in Texas, as presented in

In this paper, since we are discussing the appropriate choice of objective function, there is no experimental setup that could be used to provide evidence for one objective version vs. the other. Rather, we must make use of normative theoretical reasoning and shine light on the interpretation of the objectives used and their possible justifications. The practical case studies in this paper therefore serve as an illustration but not as evidence for all the conclusions advocated in this paper, some of which are arrived at through interpretation and argumentation in the Results and discussion section.

The paper is organized as follows. In the following Methodology section, we introduce the methods used to investigate and illustrate the role of objective functions. In Sect. 3, we discuss the case study on the streamflow monitoring network of Brazos River. Section 4 introduces the results for the various methods and then discusses the need for multiple objectives, the interpretation of trade-offs between redundancy and total information, and the feasibility of greedy algorithms reaching the optimum. The article concludes with summarizing the key messages and raising important questions about the calculation of the measures to be addressed in future research.

In monitoring network design, IT has been applied in the literature to evaluate data collection networks that serve a variety of purposes, including rainfall measurement, water quality monitoring, and streamflow monitoring. These evaluations are then used to optimize the placement of sensors. In the monitoring network optimization literature, three expressions from IT are often used in monitoring network design: (1) entropy

The Shannon entropy

In this paper, we argue for the maximum joint entropy (maxJE) objective for maximizing the total information collected by a monitoring network. This is equivalent to the GR3 objective proposed by

Venn diagram illustrating the relations between the relevant information measures. In the legend, the joint and marginal information-theoretical quantities (joint) entropy

Our research compares and contrasts a variety of objective functions from literature. Information-theory-based multi-objective optimization methods for monitoring networks have gained significant attention recently. Maximizing network information content, through either the sum of marginal entropy or joint entropy, is the common theme among existing methods

Various information-theoretical objectives used by methods proposed in recent literature.

The table shows whether an objective is maximized (max) or minimized (min) or forms part of a weighted objective function that is maximized with weights

Apart from the objective function, the optimization of monitoring networks is also characterized by constraints. These constraints can either be implemented for numerical reasons or for the reflection of practical aspects of the real-world problem. The majority of existing literature listed in Table

In this paper, for the maximization of joint entropy that we advocate, we consider and compare three cases for constraints with a large influence on computational cost, with the purpose of investigating whether these influence the results. We also interpret the constraints as reflections of placement strategies.
Firstly, the “greedy add” strategy is the commonly applied constraint that each time the network expands, the most favorable additional station is chosen, while leaving the already chosen network intact. The optimal network for

In this comparison, we investigate whether the exhaustive optimization yields a series of networks where an increase in network size may also involve relocating stations. This may not always be practically feasible or desired in actual placement strategies, where networks are slowly expanded one station at a time. Occurrence of relocation in the sequence of growing subsets would also show that no greedy algorithm could exist that guarantees optimality.

Template illustrations of information interactions with

In this paper, we argue that, due to the additive relations of information
measures (Eq.

There are two important caveats with these visualizations. In the general Venn diagram of three-variate interactions, the “interaction information”, represented by the area where three circles overlap, can become negative. Hence, the Venn diagram ceases to be an adequate visualization. For similar reasons, in the chord diagram, the sector size of outer arc lengths should not be interpreted as total information transferred

In this paper, we use Venn diagrams to illustrate information relations between three groups of variables. Group one is the set of all sensors that are currently selected as being part of the monitoring network, which we denote as

For the purpose of illustrating the main arguments of this study, we compare maxJE objective function (Eq.

Resulting maximum joint entropy (bits) for the different network sizes found with different methods for the Brazos River case study (JE used exhaustive optimization).

Note that MIMR's trade-off weight (

Optimal gauge orders found with different methods for the Brazos River case study.

Note that for the last five stations, indicated with

Brazos streamflow network and USGS stream gauges' locations.

Brazos River streamflow

In previous studies, the focus of the research has been on finding an optimal network for the subject case study with only little discussion on the theoretical justification of applying a new methodology. For this reason, the primary goal of this paper is critically discussing the rationale for use of several objective functions in monitoring network design. To illustrate differences between the methods, we decided to apply our methodology to the Brazos River streamflow network (Fig.

In

Spatial distribution of the top eight streamflow gauges ranked by different objectives.

Approximately proportional Venn diagrams showing the evolution of information measures when progressively (going down the rows) selecting stations (selected station for each step indicated by the numbers) using four different methods (in the different columns). The interpretation of the color-coded areas representing the information measures is the same as in Fig.

As indicated in the Introduction, we should not attempt to gauge the merits of the objective functions by the intuitive optimality of the resulting network. Rather, the merits of the networks should be gauged by the objective functions. Still, the case study can provide insight into some behaviors resulting from the objective functions.

To assess and illustrate the workings of the different objectives in retrieving information from the water system, we compared three existing methods with a direct maximization of the joint entropy of selected sensors,

The most notable difference between maxJE and the other methods is the selection of all three of the stations located most downstream. While other methods would not select these together due to high redundancy between them, maxJE still selects all stations because despite the redundancy, there is still found to be enough new information in the second-most and third-most downstream station. This can be in part attributed to the quantization choice of equally sized bins throughout the network, leading to higher information contents downstream. While this quantization choice is debatable, it is important, in our opinion, to not compensate artifacts from quantization by modifying the objective function, even if the resulting network may seem more reasonable, but rather to address those artifacts in the quantization choices themselves. To repeat the key point: an objective function should not be chosen based on whether it yields a “reasonable network” but rather based on whether the principles that define it are reasonable.

Though already necessarily true from the formulation of the objective functions, we use the case study to illustrate how other methods with a separate minimum redundancy objective lead to the selection of stations with lower new information content (green area in Fig.

Evolution of pairwise information interaction between already selected stations in the previous iterations and new proposed station. Green and red links represent proportional conditional entropy and mutual information, respectively. Links with black border emphasize the information interactions with the new proposed station in each iteration.

The resulting total correlation and joint entropy for all 924 possible combinations of 6 out of 12 sensor locations. In some past approaches, a Pareto front in the lower right corner is given importance. In this paper, we argue that this trade-off is irrelevant, and information can be maximized with the horizontal direction only. If a trade-off with reliability needs to be considered, the Pareto front of interest is in the top-right corner instead of the lower right corner that is previously recommended in the literature.

The existing approaches considered above have in common that they all involve some form of dependence criterion to be minimized. For example, the total correlation gives a measure of total redundant information within the selected set. This is information that is duplicated and therefore does not contribute to the total information content of the sensors, which is given by the joint entropy. Focusing fully on minimizing dependence, such as what is done in the minT objective optimization, makes the optimization insensitive to the amount of non-duplicated information added. This results in many low-entropy sensors being selected. It is important to note that the joint entropy already accounts for duplicated information and only quantifies the non-redundant information. This is exactly the reason why it is smaller than the sum of individual entropies. In terms of joint entropy, two completely dependent stations are considered to be exactly as informative as one of them. This means that the negative effect that dependency has on total information content is already accounted for by maximizing joint entropy only.

The Pareto front that would be interesting to explore in this context is the
trade-off between

In summary, the maximization of joint entropy while minimizing redundancy is akin to maximizing effectiveness while maximizing a form of efficiency (i.e., bits of unique information/bits collected). However, bits do not have any significant associated cost. If installing and maintaining a monitoring location has a fixed cost, then efficiency should be expressed as unique information gathered per sensor installed, which could be found by maximizing joint entropy (effectiveness) for a given number of stations, as we suggest in this paper.

All optimal combinations of sensors for the joint entropy objective. For number of sensors above 7, multiple optimal combinations can be found due to saturation of joint entropy. Black squares are selected sensors.

Different search strategies have been adopted in the literature for monitoring network design. The most commonly used greedy algorithms impose a constraint on exhaustive search space to reduce computational effort. We investigated three different search strategies to obtain the optimal network in the context of using maxJE as an objective function. We discuss the advantages and limitations of each search strategy in terms of optimality of the solution and computational effort.

The exhaustive optimization tests all possible new combinations, not restricted to those combinations containing the already selected set in a smaller network. Since the joint entropy of a set of locations does not depend on the order in which they are added, the number of possible combinations
is

Resulting station selections for the artificially permuted dataset with 860 data points.

Greedy approaches might be candidates for efficient algorithms. For the proposed joint entropy objective, we tested the optimality of greedy approaches against the benchmark of exhaustive optimization of all possible station combinations. For the Brazos River case study, both the “add” and “drop” greedy selection strategies resulted in the global optimum sets, i.e., the same gauge order and resulting joint entropy as was found by the exhaustive optimization. These results can be read from the last row of Tables

In a further test, using artificially generated data, we experimentally falsified the hypothesis that greedy approaches can guarantee optimality. For this test, we generated a correlated random Gaussian dataset for 12 stations, based on the covariance matrix of the data from the case study. We increased the number of generated observations to 860 time steps, to get a more reliable multidimensional probability distribution. Table

Based on our limited case study, the questions remain open: (1) whether faster algorithms can be formulated that yield guaranteed optimal solutions, and (2) in which cases the greedy algorithm provides a close approximation. It is also possible to formulate modified greedy methods with the ability of replacing a
limited number of stations instead of just adding stations. This leads to a significantly reduced computational burden while reaching the optimum more often than when adding stations one at a time. In Table

Resulting joint entropy for the artificially permuted dataset with 860 data points.

The aim of this paper was to contribute to better understanding the problem of optimal monitoring network layout using information-theoretical methods. Since using resulting networks and performance metrics from case studies to demonstrate that one objective should be preferred over the other would be circular, the results from our case study served as an illustration of the effects but not as arguments supporting the conclusions we draw about objective functions. We investigated the rationale for using various multiple-objective and single-objective approaches and discussed the advantages and limitations of using exhaustive vs. greedy search. The main conclusions for the study can be summarized as follows:

The purpose of the monitoring network governs which objective functions should be considered. When no explicit information about users and their decision problems can be identified, maximizing the total information collected by the network becomes a reasonable objective. Joint entropy is the only objective needed to maximize retrieved information, assuming that this joint entropy can be properly quantified.

We argued that the widespread notion of minimizing redundancy, or dependence between monitored signals, as a secondary objective is not desirable and has no intrinsic justification. The negative effect of redundancy on total collected information is already accounted for in joint entropy, which measures total information net of any redundancies.

When the negative effect on total information is already accounted for, redundant information is arguably beneficial, as it increases robustness of the network information delivery when individual sensors may fail. Maximizing redundancy as an objective secondary to maximizing joint entropy could therefore be argued for, and a trade-off between these objectives could be explored depending on the specific case.

The comparison of exhaustive and greedy search approaches shows that no greedy approach can exist that is guaranteed to give the true optimum subset of sensors for each network size. However, the exponential computational complexity, which doubles the number of sensor combinations to evaluate with every sensor added, makes exhaustive search prohibitive for large networks, illustrated by the following. During the COVID-19 response in March 2020, Folding@home, currently the world's largest distributed computing project, broke the exaFLOP barrier (

The constraints on the search space imposed by the greedy approach could also be interpreted as a logistical constraint. In a network expansion scenario, it disallows the replacement of stations already selected in the previous iteration.

We introduced the “greedy drop” approach that starts from the full set and deselects stations one by one. We have demonstrated that the two types of greedy approaches do not always lead to the same result, and neither approach guarantees the unconstrained true optimal solution. Synergistic interactions between variables may play a role, although this is not the only possible explanation. In our case study, the suboptimality of greedy algorithms was not visible in original data, but we demonstrated its existence with artificially generated data. In our specific case studies, differences between exhaustive and greedy approaches were small, especially when using a combination of the greedy add and greedy drop strategy. It remains to be demonstrated in further research how serious this loss of optimality is in a range of practical situations and how results compare to intermediate computational complexity approaches such as metaheuristic algorithms.

In this paper, we focused on the theoretical arguments for justifying the use of various objective functions and compared a maximization of joint entropy to other methods while using the same dataset and quantization scheme. Since the majority of previous research used greedy search tools to find optimal network configurations, we compared greedy and exhaustive search approaches to raise awareness in the scientific community that greedy optimization might fall into a local optimum, though its application can be justified considering the computation cost of the exhaustive approach.

Another important question that needs to be addressed in future research is to investigate how the choices and assumptions made (i.e., data quantization which influences probability distribution) in the numerical calculation of objective functions would affect network ranking. What many of these objective functions have in common is that they rely on multivariate probability distributions. For example, in our case study, the joint entropy is calculated from a 12-dimensional probability distribution. These probability distributions are hard to reliably estimate from limited data, especially in higher dimensions, since data requirements grow exponentially. Also, these probability distributions and the resulting information measures are influenced by multiple factors, including choices about the data's temporal scale and quantization. To have an unbiased comparison framework of objective functions, we kept data and quantization choices from a case study previously described in the literature. It is worth acknowledging that these assumptions, as well as data availability, can greatly influence station selection and require more attention in future research.

Numerically, the limited data size in the case study presents a problem for the calculation of multivariate information measures. Estimating multivariate discrete joint distributions exclusively from data requires quantities of data that exponentially grow with the number of variables, i.e., potential locations. When these data requirements are not met, and joint distributions are still estimated directly based on frequencies, independent data will be falsely qualified as dependent and joint information content severely underestimated. This can also lead to apparent earlier saturation of joint entropy, at a relatively low number of stations. For the case study presented here, we do not recommend interpreting this saturation as reaching the number of needed stations, since it could be a numerical artifact. This problem applies to all methods discussed in this paper. Before numerics can be discussed, clarity is needed on the interpretation and choice of the objective function. In other words, before thinking about how to optimize, we should be clear on what to optimize. We hope that this paper helped illuminate this.

Set of indices of selected stations' locations

Set of indices of potential monitoring locations not yet selected

The index of the monitoring station currently under consideration for addition

The (sets of) time series (variables) measured at the station(s) in the respective sets

The marginal probability distribution of random variable

The joint probability distribution of variable

A shorthand for

The entropy of the marginal distribution of time series in

The conditional joint entropy of variables in

Mutual information or transinformation between set of variables in

Total correlation, the amount of information shared between all variables

Information-redundancy trade-off weight

“select below median”, the constraint whereby only stations are considered that are below the median score of all potential stations

Histogram bin width

Station's streamflow value

Quantized value after discretization

Apportionment entropy

Ranking disorder index

Local spatiotemporal information of the grid

Probability distribution of the standard deviation

Network accuracy

Kriging variance

Detection time

The average of the shortest time among the detection times for monitoring station

Reliability

Binary choice of 1 or 0 for whether the contamination is detected or not

Recent literature has expanded the information-theoretical objectives with additional objectives. For instance, (1)

The code and data that were used to generate the results in this paper
are available from

SVW conceptualized study, and HF and SVW jointly performed the analysis and wrote the paper. SVW supervised HF.

The authors declare that they have no conflict of interest.

This research was supported by funding from Hossein Foroozand's NSERC CGS – Doctoral award and Steven V. Weijs's NSERC discovery grant. The authors sincerely thank the four reviewers for their critical views and interesting discussion, which helped improve the paper.

This research has been supported by an NSERC discovery grant (grant no. RGPIN-2016-04256) and an NSERC CGS – Doctoral award (grant no. CGSD2-535466-2019).

This paper was edited by Nunzio Romano and reviewed by Heye Bogena and three anonymous referees.