Originally Posted by
ARambler
e) If you remove the variability associated with the zero days, you might be able to give a good representation of hiking variability just by reporting aggregate data. This data might also be the easiest to understand and use in a statistics-free way. I propose aggregating the data for each section into five groups. Because the section distances vary by such a large amount, the interval width for the groupings should also vary. I suggest that each of the five groups span m = 1, 2, or 3 days. You would then report 8 values per section: g1.days, m, n.g1, n.g2, n.g3, n.g4, n.g5, Slow. I'm not sure whether g1.days should be an integer and the start of the first interval. Assuming that it is, you would get numbers like:
5, 1, 12, 21, 32, 19, 11, 2. For the first section, 12 hikers would reach the GA line in 5.0 to 5.9 days, 21 hikers in 6.0 to 6.9 days, 32 hikers in 7.0 to 7.9 days, 19 hikers in 8.0 to 8.9 days, 11 hikers in 9.0 to 9.9 days, and 2 hikers in over 9.9 days (optional). By calculation, 105-(12+21+32+19+11+2)=8 hikers took less than 5.0 days. The relative distribution for Damascus to Waynesboro will not be exactly the same, but if it were, the data would be reported as 17, 3, 12, 21, 32, 19, 11, 2, and the groupings would be: 17 to 19.9 days, 20 to 22.9 days, 23 to 25.9 days, 26 to 28.9 days, and 29 to 31.9 days. Slow hikers would look at this raw data, see that 11 in 105 needed 9.0 to 9.9 days of food to reach the GA border and 29 to 31.9 days to get to Waynesboro, and plan on packing this amount. (Hopefully, not all at once.)
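
For anyone who wants to unpack these 8-value records mechanically, here is a rough Python sketch of the scheme described above. It assumes g1.days is an integer marking the start of the first interval, and that the total number of hikers (105 in the example) is known separately. The function name `expand_record` and the label format are made up for illustration; they are not part of the proposal.

```python
def expand_record(record, total_hikers):
    """Expand an 8-value section record into labelled day ranges.

    record is (g1_days, m, n_g1, n_g2, n_g3, n_g4, n_g5, slow).
    Assumes g1_days is the integer start of the first interval and
    m is the interval width in days (1, 2, or 3).
    """
    g1_days, m = record[0], record[1]
    group_counts = list(record[2:7])
    slow = record[7]

    rows = []
    for i, n in enumerate(group_counts):
        start = g1_days + i * m
        end = start + m - 0.1          # e.g. 5.0 to 5.9 when m = 1
        rows.append((f"{start:.1f} to {end:.1f} days", n))

    # Hikers slower than the last interval (the optional "Slow" value).
    last_end = g1_days + 5 * m - 0.1
    rows.append((f"over {last_end:.1f} days", slow))

    # Hikers faster than g1_days are whatever is left of the total.
    fast = total_hikers - sum(group_counts) - slow
    return fast, rows


if __name__ == "__main__":
    for record in [(5, 1, 12, 21, 32, 19, 11, 2),
                   (17, 3, 12, 21, 32, 19, 11, 2)]:
        fast, rows = expand_record(record, total_hikers=105)
        print(f"under {record[0]:.1f} days: {fast}")
        for label, n in rows:
            print(f"{label}: {n}")
        print()
```

Running it on the two example records prints one line per group, so the same eight numbers can be read either as raw counts or as day ranges for food planning.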