A Comparison of Decision-Making Models for Determining File Importance

Abstract

File replication is the most popular approach to improving system reliability and file availability in network-based environments. However, distributed file systems that support file replication require their users to specify how important their files are, so that the system can decide how to distribute replicas across the network. System users are therefore inevitably burdened with this responsibility. The problem can be partially alleviated if the system takes on more of the responsibility for determining file importance on its users' behalf. To achieve this goal, we need a better understanding of how users cognitively decide how important a file is. In this paper, we quantitatively compare the performance of three decision-making models commonly used to study juror decision-making, to examine how well each one models the process of determining file importance. Because juror decision-making shares several features with the determination of file importance, these three models are expected to shed some light on how users cognitively make such decisions.

The three models are the linear weighting model, the Bayesian model, and the Poisson model. The linear weighting model postulates that file importance is determined by linearly combining weighted pieces of information (referred to in this paper as predictors) gathered during the session of determining file importance. Using multiple regression analysis, the weights associated with the identified predictors can be chosen so that the predicted file importance is optimally correlated with the observed file importance.

The Bayesian model postulates that file importance is determined by a series of simple inferences, in which the assessed importance is revised according to the direct impact of each identified predictor, taken independently.
In other words, determining file importance with the Bayesian model amounts to determining the posterior odds for importance (Rn), defined as the ratio of the probability that the file is important given all the identified predictors to the probability that it is unimportant given the same predictors. Once Rn is determined, it is compared with the decision criterion (dc) adopted by the user: the file under consideration is judged important if Rn ≥ dc and unimportant if Rn < dc.

The Poisson model postulates that determining file importance is a Poisson process. It assumes that there exists an apparent weight (w) of the predictors that bear on the importance of the file under consideration. This apparent weight accumulates continuously over time during the session of determining file importance, until either a critical predictor is identified or the session ends. The accumulated apparent weight (wa) is then compared with the user's decision criterion: the file is judged important if wa ≥ dc and unimportant if wa < dc.

Five predictors were systematically identified in this study for model comparison: the number of characters keyed, the computer cost spent, file length, file dependency, and the frequency of file access. Correlation coefficients between observed and predicted file importance were used to evaluate the performance of the three models quantitatively. A computer program written in C++ was designed and implemented on a laptop to collect data on observed file importance and the five predictors. The collected data were classified into five importance ratings (from important to unimportant) and mapped proportionally onto an importance rating scale (from 1 to 5, respectively). Forty-one subjects, randomly selected in an academic environment, participated in the experiment; together they accessed a total of 169 files.
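The three decision rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in the study, and all function and parameter names are hypothetical. The Bayesian part assumes the predictors are conditionally independent, so that Rn equals the prior odds P(important)/P(unimportant) multiplied by each predictor's likelihood ratio P(Di | important)/P(Di | unimportant). The Poisson part assumes the apparent weight grows at a constant rate w per unit time; the probability function further assumes that critical predictors arrive as a Poisson process with a hypothetical parameter `rate`, so the time until the first one is exponentially distributed.

```python
import math


def linear_weighting(ratings, weights, intercept=0.0):
    """Predicted importance rating: intercept plus the weighted sum of predictor ratings.

    The weights would come from multiple regression; here they are simply inputs.
    """
    return intercept + sum(w * x for w, x in zip(weights, ratings))


def posterior_odds(prior_odds, likelihood_ratios):
    """Rn = prior odds multiplied by each predictor's likelihood ratio.

    Assumption (this sketch): predictors are conditionally independent.
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


def bayesian_is_important(prior_odds, likelihood_ratios, dc=1.0):
    """Judge the file important when Rn >= dc."""
    return posterior_odds(prior_odds, likelihood_ratios) >= dc


def accumulated_weight(w, session_length, critical_time=None):
    """wa: weight accrues at constant rate w (assumption) until a critical
    predictor appears at critical_time or the session ends, whichever is first."""
    stop = session_length if critical_time is None else min(critical_time, session_length)
    return w * stop


def poisson_is_important(w, session_length, dc, critical_time=None):
    """Judge the file important when wa >= dc."""
    return accumulated_weight(w, session_length, critical_time) >= dc


def poisson_prob_important(w, session_length, dc, rate):
    """P(wa >= dc), assuming critical predictors arrive as a Poisson process with
    the given rate, so P(first arrival >= t) = exp(-rate * t). Requires w > 0."""
    t_needed = dc / w  # time the weight needs to reach dc
    if t_needed > session_length:
        return 0.0  # the session ends before the weight can reach dc
    return math.exp(-rate * t_needed)
```

Keeping the parameters (regression weights, likelihood ratios, w, and rate) as inputs separates estimating them from data from the decision step itself. For example, `posterior_odds(1.0, [2.0, 0.5, 3.0])` yields 3.0, so `bayesian_is_important` with dc = 1 judges that file important.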
Because the subjects were asked to pick, at random, files they had created themselves, the sample may include many types of file content.

The comparison results are summarized as follows.

(1) The correlation coefficients computed for the three models indicate that, on the empirical data collected in this study, the linear weighting model and the Bayesian model with dc = 1 perform much better than the Poisson model. The poor performance of the Poisson model may stem from three possible sources of error: the collected data may not be representative, the assumptions made in this study may not hold for the model, or the model itself may be inferior. More studies are needed to clarify this issue.

(2) The linear weighting model determines file importance in a slightly different way from the Bayesian and Poisson models. The former determines how important the file under consideration is (a rated outcome), whereas the latter two determine whether or not the file is important (a binary outcome). Moreover, the linear weighting model relates file importance ratings directly to predictor ratings, whereas the Bayesian and Poisson models convert predictor ratings into predictor appearance probabilities, which may not be directly related to file importance ratings. As a result, the linear weighting model provides more information about how the predictors are correlated with one another and how each predictor is weighted by the subjects who participated in the experiment.

(3) There is no noticeable performance difference among the three models in either model implementation or file importance determination.
All three models require on the order of O(n×m) accesses to various data items for model implementation and O(m) accesses to determine the predicted importance of a file, where n is the number of files created by a subject and m is the number of predictors per file.

(4) The three models represent three quite different decision-making processes, each reflecting a different account of how the subjects who participated in the experiment cognitively decide how important a file is.
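One way the O(n×m) implementation cost and O(m) prediction cost can arise is sketched below for the Bayesian model, together with the Pearson correlation used to compare observed and predicted importance. This is a hypothetical illustration, not the study's procedure: it assumes a predictor counts as "present" when its rating reaches a chosen `threshold` (a stand-in for the predictor appearance probabilities mentioned above), and it applies add-one (Laplace) smoothing so that no likelihood ratio divides by zero. All function names are hypothetical.

```python
import math


def estimate_bayes_parameters(files, labels, threshold):
    """Estimate prior odds and per-predictor likelihood ratios in one pass.

    files:  n rows, each holding m predictor ratings.
    labels: n booleans, True if the subject rated the file important.
    Hypothetical rule: a predictor is "present" when its rating >= threshold.
    """
    m = len(files[0])
    present_imp = [0] * m
    present_unimp = [0] * m
    n_imp = sum(1 for label in labels if label)
    n_unimp = len(labels) - n_imp
    for row, important in zip(files, labels):  # n * m data accesses in total
        counts = present_imp if important else present_unimp
        for j, rating in enumerate(row):
            if rating >= threshold:
                counts[j] += 1
    lr_present, lr_absent = [], []
    for a, u in zip(present_imp, present_unimp):
        p_imp = (a + 1) / (n_imp + 2)      # smoothed P(present | important)
        p_unimp = (u + 1) / (n_unimp + 2)  # smoothed P(present | unimportant)
        lr_present.append(p_imp / p_unimp)
        lr_absent.append((1 - p_imp) / (1 - p_unimp))
    prior_odds = (n_imp + 1) / (n_unimp + 1)
    return prior_odds, lr_present, lr_absent


def predict_odds(row, prior_odds, lr_present, lr_absent, threshold):
    """Posterior odds Rn for one file, using m data accesses."""
    odds = prior_odds
    for rating, lp, la in zip(row, lr_present, lr_absent):
        odds *= lp if rating >= threshold else la
    return odds


def pearson_r(xs, ys):
    """Pearson correlation between observed and predicted importance.

    Both sequences must be non-constant, otherwise the denominator is zero.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Estimation touches every one of the n×m ratings once, while each prediction touches only the m ratings of a single file. With four toy files `[[5, 5], [4, 5], [1, 1], [2, 1]]` labeled important, important, unimportant, unimportant and `threshold=3`, the file `[5, 5]` receives posterior odds of 9, well above dc = 1.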

