If the number of categories used is small (e.g., 2 or 3), the probability that two raters agree by pure chance increases considerably. This is because both raters must confine themselves to the limited number of options available, which affects the overall agreement rate but not necessarily their propensity for “intrinsic” agreement (agreement is considered “intrinsic” if it is not due to chance). For both critical values, we determined absolute agreement (e.g., Liao et al., 2010) as the percentage of ratings that did not differ statistically. Absolute agreement was 100% when the reliable change index (RCI) was calculated on the basis of the intraclass correlation coefficient (ICC) for our sample. In contrast, absolute agreement was 43.4% when the reliability reported in the test manual was used to estimate the critical difference. With this more conservative measure of absolute agreement, the probability of obtaining a concordant rating did not differ from chance. This probability did not differ statistically between the two rating subgroups (parent-teacher and mother-father ratings), and thus across the entire study population, regardless of which RCI calculation was chosen. These results support the assumption that parents and daycare teachers were, in this case, similarly competent raters with regard to the children's early expressive vocabulary.
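To make the opening point about small category counts concrete, the following minimal sketch computes the probability that two raters agree purely by chance. It assumes the two raters choose independently of each other; the function names and the uniform category distributions are illustrative assumptions, not part of the study.

```python
def chance_agreement(probs_a, probs_b):
    """Probability that two independent raters pick the same category.

    probs_a and probs_b hold each rater's probability of choosing
    category i (hypothetical values; each list should sum to 1).
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both raters must use the same set of categories")
    return sum(pa * pb for pa, pb in zip(probs_a, probs_b))


def uniform(k):
    """Assumed distribution: a rater who picks each of k categories equally often."""
    return [1.0 / k] * k


# Chance agreement shrinks as the number of categories grows.
for k in (2, 3, 5, 10):
    print(k, round(chance_agreement(uniform(k), uniform(k)), 3))
```

Under the uniform assumption the chance agreement is simply 1/k, so two or three categories leave a large share of observed agreement that could be coincidental.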
Nevertheless, the RCIs obtained with different reliability estimates differed substantially in the specific estimates of absolute agreement they produced. The highly divergent amounts of absolute agreement obtained by using either the inter-rater reliability within a relatively small sample or the instrument's test-retest reliability, obtained with a larger and more representative sample, underline the need for caution when calculating reliable differences. Measuring the reliable difference between ratings on the basis of the inter-rater reliability in our study resulted in 100% rating agreement. In contrast, a considerable number of diverging ratings was identified when the RCI was calculated on the basis of the more conservative test-retest reliability from the manual; absolute agreement was 43.4%. With this conservative RCI estimate, the number of concordant ratings was not significantly higher or lower than the number of diverging ratings, either for a single rating subgroup or for the entire study population (see Table 2 for the results of the corresponding binomial tests). Therefore, the probability that a child received a concordant rating did not differ from chance. When the study's own reliability was used, the probability of obtaining concordant ratings was 100%, which is clearly above chance. So far, we have reported results on inter-rater reliability and on the number of diverging ratings within and between subgroups, using two different but equally legitimate reliability estimates. We also examined factors that could influence the likelihood of obtaining two statistically diverging ratings and described the magnitude of the differences observed.
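A comparison against chance of the kind summarised above can be sketched with an exact two-sided binomial test. This is only an illustration: the null probability of 0.5 is an assumption about how "chance" was defined, the counts (23 of 53) are illustrative values chosen to reproduce a rate of about 43.4% rather than figures quoted from the text, and the helper names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null hypothesis (assumed p = 0.5).
    """
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        pm
        for i in range(n + 1)
        if (pm := binom_pmf(i, n, p)) <= observed * tolerance
    )
    return min(1.0, total)


# Illustrative counts only: 23 concordant pairs out of 53 is roughly 43.4%.
p_conservative = binom_test_two_sided(23, 53)
# Every pair concordant, as in the 100% agreement case.
p_all_agree = binom_test_two_sided(53, 53)

print(f"{p_conservative:.3f}", p_conservative > 0.05)
print(p_all_agree < 0.001)
```

With these illustrative counts the first test does not reject the null hypothesis at the 5% level, while the second does, which mirrors the contrast between the two RCI-based agreement rates.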
These analyses focused on inter-rater reliability and agreement, as well as related measures. In this last section, we turn to the Pearson correlation coefficient to examine the linear relationship between ratings and its strength within and between rater subgroups. The reliable change index (RCI) was used to calculate the smallest number of T-points required for two ELAN scores to differ significantly from each other. We used two different reliability estimates to demonstrate their impact on measures of agreement. First, the ICC, calculated across the entire study population, was used as an estimate of the reliability of the ELAN in this study's population. Because the ICC is calculated within and between subjects, and not between specific groups of raters, this is a valid approach for estimating overall reliability across both rating subgroups.
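The critical difference in T-points can be sketched as follows. The passage does not spell out the formula, so this assumes the common Jacobson-Truax formulation, in which the standard error of the difference is derived from the standard error of measurement; the reliability values and function names below are hypothetical and serve only to show how the choice of estimate moves the threshold.

```python
import math

T_SCORE_SD = 10.0  # T-scores are scaled to a mean of 50 and an SD of 10


def critical_difference(reliability, sd=T_SCORE_SD, z=1.96):
    """Smallest score difference that counts as reliable at the 5% level.

    Assumed formulation: SEM = sd * sqrt(1 - r), S_diff = sqrt(2) * SEM,
    critical difference = z * S_diff.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem            # standard error of the difference
    return z * s_diff


def ratings_differ(score_a, score_b, reliability):
    """True if two T-scores differ by more than the critical difference."""
    return abs(score_a - score_b) > critical_difference(reliability)


# Hypothetical reliability estimates, not the values used in the study.
for r in (0.80, 0.95):
    print(r, round(critical_difference(r), 2), ratings_differ(50, 60, r))
```

A higher reliability estimate shrinks the critical difference, so the same pair of ratings can count as concordant under one estimate and as divergent under the other.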