010. Frequency Distributions
Example 2.2. As the resident statistician for Pigs & People Airline, you have been asked by the chairman of the board to collect and organize data regarding flight operations. You are primarily interested in the daily values for two variables: (1) number of passengers and (2) number of miles flown, rounded to the nearest tenth. You are able to obtain these data from daily flight records for the past 50 days, and you have recorded this information in Table 2.1 and Table 2.2. However, in this raw form it is unlikely that the chairman could gain any useful knowledge regarding operations.
Table 2.1 – Raw Data on the Number of Passengers for P&P Airlines
68 | 71 | 77 | 83 | 79 | 72 | 74 | 57 | 67 | 69 |
50 | 60 | 70 | 66 | 76 | 70 | 84 | 59 | 75 | 94 |
65 | 72 | 85 | 79 | 71 | 83 | 84 | 74 | 82 | 97 |
77 | 73 | 78 | 93 | 95 | 78 | 81 | 79 | 90 | 83 |
80 | 84 | 91 | 101 | 86 | 93 | 92 | 102 | 80 | 69 |
Table 2.2 – Raw Data on Miles Flown for P&P Airlines
569.3 | 420.4 | 468.5 | 443.9 | 403.7 | 519.7 | 518.7 | 445.3 | 459.0 | 373.4 |
493.7 | 505.7 | 453.7 | 397.1 | 463.9 | 618.3 | 493.3 | 477.0 | 380.0 | 423.7 |
391.0 | 553.5 | 513.7 | 330.0 | 419.8 | 370.7 | 544.1 | 470.0 | 361.9 | 483.8 |
405.7 | 550.6 | 504.6 | 343.3 | 497.9 | 453.3 | 604.3 | 473.3 | 393.9 | 478.4 |
437.9 | 320.4 | 473.3 | 359.3 | 568.2 | 450.0 | 413.4 | 469.3 | 383.7 | 469.1 |
There are several ways to organize the data for presentation.
1. A frequency distribution (frequency table) provides some order to the data by dividing them into classes and recording the number of observations in each class. Table 2.3 presents the frequency distribution for the daily number of passengers over the last 50 days.
Each class has a lower boundary and an upper boundary. The exact limits of these boundaries are quite important: it is essential that class boundaries do not overlap.
Table 2.3 – Frequency Distribution for Passengers
Class (passengers) | Frequency (days) |
50 to 59 | 3 |
60 to 69 | 7 |
70 to 79 | 18 |
80 to 89 | 12 |
90 to 99 | 8 |
100 to 109 | 2 |
Total | 50 |
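The tally behind Table 2.3 can be reproduced with a short script. This is only a minimal sketch: the function name `discrete_frequency_table` and its parameters are illustrative assumptions rather than part of the original example, and it assumes integer data with equal-width classes of 10 passengers starting at 50.

```python
# Daily passenger counts copied from Table 2.1 (50 days).
passengers = [
    68, 71, 77, 83, 79, 72, 74, 57, 67, 69,
    50, 60, 70, 66, 76, 70, 84, 59, 75, 94,
    65, 72, 85, 79, 71, 83, 84, 74, 82, 97,
    77, 73, 78, 93, 95, 78, 81, 79, 90, 83,
    80, 84, 91, 101, 86, 93, 92, 102, 80, 69,
]


def discrete_frequency_table(values, start, width, n_classes):
    """Tally integer values into classes such as 50 to 59, 60 to 69, ...

    Hypothetical helper: each class covers `width` consecutive integers,
    so the classes cannot overlap.
    """
    classes = [(start + i * width, start + (i + 1) * width - 1)
               for i in range(n_classes)]
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width  # position of the class holding v
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside every class")
        counts[index] += 1
    return list(zip(classes, counts))


passenger_table = discrete_frequency_table(passengers, start=50, width=10, n_classes=6)
for (low, high), freq in passenger_table:
    print(f"{low} to {high}: {freq}")
print("Total:", sum(freq for _, freq in passenger_table))
```

Because every integer from 50 to 109 belongs to exactly one class, each day is counted once and the frequencies sum to the 50 observations.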
Boundaries such as
50 to 60
60 to 70
70 to 80
…
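are confusing, because it is unclear whether an observation of exactly 60 or 70 belongs to the lower or to the upper class. Since the number of passengers is a discrete variable, the gaps between classes such as 50 to 59 and 60 to 69 pose no problem: fractional values such as 59.9 are impossible. Miles flown, on the other hand, is a continuous variable, since it is possible to fly a fraction of a mile. It would be improper to set the boundaries as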
300 to 349
350 to 399
400 to 449
since it is unclear in which class observations such as 349.9 or 399.9 should be tallied. The frequency distribution for miles flown might instead appear as in Table 2.4, where each class includes its lower boundary and excludes its upper boundary. The chairman can now detect a pattern in flight operations that is not apparent from the raw data in Table 2.2. For example, P&P never flew over 650 miles on any of the 50 days examined. They flew between 450 and 500 miles more often than any other range of distances. On 26 of the 50 days examined, total mileage was between 400 and 500 miles.
Table 2.4 – Frequency Distribution for Miles Flown
Class (miles) | Frequency (days) |
300 and under 350 | 3 |
350 and under 400 | 9 |
400 and under 450 | 9 |
450 and under 500 | 17 |
500 and under 550 | 6 |
550 and under 600 | 4 |
600 and under 650 | 2 |
Total | 50 |
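For a continuous variable, the "and under" wording of Table 2.4 corresponds to half-open classes: the lower boundary is included and the upper boundary is excluded. The sketch below illustrates this under that assumption; the function name `continuous_frequency_table` and the `edges` list are hypothetical names introduced here for illustration.

```python
# Daily miles flown copied from Table 2.2 (50 days).
miles = [
    569.3, 420.4, 468.5, 443.9, 403.7, 519.7, 518.7, 445.3, 459.0, 373.4,
    493.7, 505.7, 453.7, 397.1, 463.9, 618.3, 493.3, 477.0, 380.0, 423.7,
    391.0, 553.5, 513.7, 330.0, 419.8, 370.7, 544.1, 470.0, 361.9, 483.8,
    405.7, 550.6, 504.6, 343.3, 497.9, 453.3, 604.3, 473.3, 393.9, 478.4,
    437.9, 320.4, 473.3, 359.3, 568.2, 450.0, 413.4, 469.3, 383.7, 469.1,
]


def continuous_frequency_table(values, edges):
    """Count values in half-open classes: edges[i] <= v < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:  # "low and under high"
                counts[i] += 1
                break
        else:
            raise ValueError(f"{v} falls outside every class")
    return counts


edges = list(range(300, 651, 50))  # 300, 350, ..., 650
miles_counts = continuous_frequency_table(miles, edges)
for low, high, freq in zip(edges, edges[1:], miles_counts):
    print(f"{low} and under {high}: {freq}")
print("Total:", sum(miles_counts))
```

With this convention an observation such as 349.9 falls unambiguously in the first class, and 450.0 falls in "450 and under 500" rather than in the class below it.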
2. The number of classes in a frequency table is somewhat arbitrary. In general, a statistical table should have between 5 and 20 classes. Too few classes would not reveal any details about the data; too many would prove as confusing as the list of raw data itself.
There is a simple rule to approximate the number of classes:
2^C ≥ N, (2.1)
or
C = 1 + 3.322*lg N, (2.2)
where C is the number of classes, N is the number of observations, and lg is the base-10 logarithm.
In Example 2.2 for P&P the number of observations is N = 50. Thus,
2^C ≥ 50.
Solving for C, it can be found that 2^5 = 32 is too small, while 2^6 = 64 exceeds N. This rule suggests that there should be six classes in the frequency table.
This rule should not be taken as the final determining factor. For convenience, more classes or fewer classes may be used.
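Both rules are easy to evaluate numerically. The following is a minimal sketch, with the function names `classes_by_power_rule` and `classes_by_log_formula` chosen here purely for illustration. It assumes rule (2.1) is read as "the smallest C for which 2^C is at least N". Formula (2.2) generally returns a non-integer value, so in practice it has to be rounded, and the two rules need not agree exactly.

```python
import math


def classes_by_power_rule(n):
    """Smallest integer C >= 1 with 2**C >= n, following rule (2.1)."""
    c = 1
    while 2 ** c < n:
        c += 1
    return c


def classes_by_log_formula(n):
    """C = 1 + 3.322 * lg(n), with lg the base-10 logarithm, formula (2.2)."""
    return 1 + 3.322 * math.log10(n)


n_days = 50  # number of observations in Example 2.2
print("Rule (2.1):", classes_by_power_rule(n_days))
print("Formula (2.2):", round(classes_by_log_formula(n_days), 2))
```

For N = 50 the first rule gives six classes, matching the text, while formula (2.2) gives a value between six and seven, which is consistent with treating either rule as a guideline rather than a fixed requirement.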