Exercises
Unit A-5

1.-10. Here are data on the planets.  Year is the number of Earth days it takes for the planet to orbit the sun, and Moons is the number of moons the planet has.

                  Year  Moons
    Mercury       88.0      0
    Venus        224.7      0
    Earth        365.26     1
    Mars         687.0      2
    Jupiter     4332.6     16
    Saturn     10759.2     18
    Uranus     30685.4     15
    Neptune    60189.0      8
    Pluto      90465.0      1
    
  1. Draw the stem-and-leaf plot for the Moons variable, using the tens digit as the stem and the ones digit as the leaf.
    (Be sure to include a prototype for each stem-and-leaf plot.)
  2. Which stem is the modal stem, or stems if there are more than one?
  3. Does the plot exhibit any outliers?
  4. Redo the plot, but with the tens and ones digits being the stem. (What are the leaves?)
  5. What is the modal stem, or stems if there are more than one?
  6. What kind of skewness, if any, do you see?
  7. Are there any outliers evident?  Are there clusters?
  8. Draw the stem-and-leaf plot for the Year variable, with the ten-thousands digit the stem and the thousands digit the leaf.
  9. What kind of skewness, if any, do you see?
  10. Redo the plot with the thousands digit the stem and the hundreds digit the leaf.
    What drawback do you see for this plot?
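A hand-drawn plot can be checked with a short script. The following is a minimal Python sketch, not part of the original unit: the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, and it assumes that digits below the leaf position are dropped (truncated) rather than rounded.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return stem-and-leaf rows as (stem, leaves) pairs.

    Each value is truncated to a whole number of leaf_unit; the last digit
    of that number is the leaf and the leading digits are the stem.
    """
    rows = defaultdict(list)
    for v in values:
        scaled = int(v // leaf_unit)      # drop digits below the leaf position
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    lo, hi = min(rows), max(rows)
    # Include empty stems so that gaps in the data stay visible.
    return [(s, sorted(rows.get(s, []))) for s in range(lo, hi + 1)]

moons = [0, 0, 1, 2, 16, 18, 15, 8, 1]   # Moons column from the table above

print('Prototype: "1 | 6" represents 16 moons.')
for stem, leaves in stem_and_leaf(moons):
    print(f"{stem} | {''.join(str(d) for d in leaves)}")
```

For the Year variable, calling `stem_and_leaf(years, leaf_unit=1000)` on the Year column would put the ten-thousands digit in the stem and the thousands digit in the leaf.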

11.-15.  Here is a stem-and-leaf plot of the percentage enrolled in school for the 50 states of the United States plus D.C.

Prototype: "84 | 8" represents 84.8.
84 | 8
85 | 9
86 | 039
87 | 14699
88 | 1225
89 | 257
90 | 345
91 | 0024577
92 | 00123569
93 | 04
94 | 3566
95 | 15
96 | 015
97 | 2
98 | 3
99 | 66
100 |
101 |
102 |
103 |
104 |
105 |
106 | 0
  1. Are there any outliers evident?
    Which state (or DC) is it?  (Look through the data.)
    What is wrong with the value, if anything?
  2. Complete the table below for the data, leaving out the 106 (so that there are only 50 observations), in preparation for drawing a histogram of the data with the given bins.  (Use the data from the stem-and-leaf plot.)

    Bin          Frequency   Relative Frequency
    84 to 86
    86 to 88
    88 to 90
    90 to 92
    92 to 94
    94 to 96
    96 to 98
    98 to 100
   
  1. Draw the histogram using the table from the previous part.
  2. Based on the histogram, approximately what percentage of the states have school enrollment over 95%?
  3. Based on the histogram, approximately what percentage of the states have school enrollment under 85%?
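The frequency table can be checked with a few lines of Python. This is only a sketch: the variable names are hypothetical, and it assumes that a value falling exactly on a bin boundary (such as 86.0) is counted in the bin that starts there, which the exercise does not specify.

```python
# Stem-and-leaf rows copied from the plot above ("84 | 8" represents 84.8).
plot = {
    84: "8", 85: "9", 86: "039", 87: "14699", 88: "1225", 89: "257",
    90: "345", 91: "0024577", 92: "00123569", 93: "04", 94: "3566",
    95: "15", 96: "015", 97: "2", 98: "3", 99: "66", 106: "0",
}

# Work in tenths of a percent so that all arithmetic stays in integers.
tenths = [stem * 10 + int(leaf) for stem, leaves in plot.items() for leaf in leaves]
tenths = [t for t in tenths if t < 1000]    # leave out the 106.0 observation

edges = list(range(840, 1001, 20))           # 84.0, 86.0, ..., 100.0 in tenths
freq = [0] * (len(edges) - 1)
for t in tenths:
    freq[(t - edges[0]) // 20] += 1          # assumed convention: left edge included

n = len(tenths)
print(f"{'Bin':<10}{'Frequency':>10}{'Relative Frequency':>20}")
for lo, f in zip(edges, freq):
    print(f"{lo // 10} to {lo // 10 + 2:<4}{f:>10}{f / n:>20.2f}")
```

The relative frequency of a bin is its frequency divided by the number of observations, so the relative frequencies should add up to 1.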

Interactivity:

Here are the brain weights in grams of 24 species of mammal:

Mountain beaver    8.1   Human             1320
Cow              423     African elephant  5712
Gray wolf        119.5   Rhesus monkey      179
Goat             115     Kangaroo            56
Guinea pig         5.5   Golden hamster       1
Asian elephant  4603     Mouse                0.4
Donkey           419     Rabbit              12.1
Horse            655     Sheep              175
Potar monkey     115     Jaguar             157
Cat               25.6   Chimpanzee         440
Giraffe          680     Mole                 3
Gorilla          406     Pig                180

Click on the button below and use the applet before answering the questions below.

  1. Note down the median and the upper and lower quartiles.
  2. Find the interquartile range.
  3. Which is closer to the median, the upper quartile or the lower quartile?
    Does this suggest skewness to the higher values or towards the lower values, or neither?
  4. Find the lower and upper fences. Which numbers are outside of these fences?  (That is, which are the outliers?)
    Which animals are these?  (You might wish to press the "Order data" button.)
  5. Now delete the three outliers from the area to the left of the plot, and press "OK."
    How do the median and quartiles here compare to the same quantities when the outliers were included?
  6. What is the interquartile range, and how does it compare to that when the outliers were included?
  7. Is there skewness?  Are there further outliers?
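The quantities asked for above can also be computed directly. The sketch below is an illustration under two stated assumptions: quartiles are taken as the medians of the lower and upper halves of the sorted data, and the fences lie 1.5 interquartile ranges beyond the quartiles. The applet may use a slightly different quartile convention, so small differences from its readout are possible. The helper name `quartiles` is hypothetical.

```python
from statistics import median

# Brain weights copied from the table above.
brain = {
    "Mountain beaver": 8.1, "Cow": 423, "Gray wolf": 119.5, "Goat": 115,
    "Guinea pig": 5.5, "Asian elephant": 4603, "Donkey": 419, "Horse": 655,
    "Potar monkey": 115, "Cat": 25.6, "Giraffe": 680, "Gorilla": 406,
    "Human": 1320, "African elephant": 5712, "Rhesus monkey": 179,
    "Kangaroo": 56, "Golden hamster": 1, "Mouse": 0.4, "Rabbit": 12.1,
    "Sheep": 175, "Jaguar": 157, "Chimpanzee": 440, "Mole": 3, "Pig": 180,
}

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the two halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]   # the middle value is skipped when the count is odd
    return median(lower), median(xs), median(upper)

q1, med, q3 = quartiles(brain.values())
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # assumed rule: 1.5 IQRs beyond each quartile
upper_fence = q3 + 1.5 * iqr

outliers = sorted(name for name, w in brain.items()
                  if w < lower_fence or w > upper_fence)

print("Q1, median, Q3:", q1, med, q3)
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outside the fences:", outliers)
```

To repeat the calculation without the outliers, remove those entries from `brain` and call `quartiles` again.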

23.-50. Here are heights in inches and weights in pounds of 132 professional male athletes, in two sports.  Also included are their body mass index numbers, which are defined by

BMI = Body Mass Index = (Weight in pounds)*703/((Height in inches)^2)

BMI is supposed to measure how overweight or underweight one is.  A value in the range 20-25 is fine; more is deemed overweight; and under 20 is deemed underweight.   It is fairly easy to be overweight under this measure.
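As a quick illustration of the formula, here is a small Python sketch. The two athletes in it are made up for the example and are not taken from the data set; the function names `bmi` and `classify` are likewise hypothetical. A value of exactly 25 is treated as fine, following the wording above.

```python
def bmi(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches."""
    return weight_lb * 703 / height_in ** 2

def classify(value):
    """Apply the ranges given above: under 20, 20 to 25, over 25."""
    if value < 20:
        return "underweight"
    if value <= 25:
        return "fine"
    return "overweight"

# Hypothetical athletes (not from the data set): (height in inches, weight in pounds)
for height, weight in [(72, 180), (75, 300)]:
    value = bmi(weight, height)
    print(f"{height} in, {weight} lb: BMI = {value:.1f} ({classify(value)})")
```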

Here are boxplots for the three variables:

  1. What is the median height, approximately?
    How tall is the tallest person?
    About what percentage of these athletes are taller than six feet? Are these people generally taller than the average male?
  2. What are the two quartiles for the Weight variable?
    How heavy is the heaviest person?
    Are these athletes heavier than the average male, in general?
  3. What is the median BMI number?
    Approximately what percentage is "overweight", i.e., has BMI over 25?
    What percentage is "underweight"?

Interactivity:

The button below opens a histogram for the heights.  You can adjust the widths of the bins by moving the slider to the left and to the right.  Try different values.

  1. How many bins reveal the data best?  What feature do you see in the data?

Interactivity:

Below is a histogram for the weights:

  1. How many bins reveal the data best?
    What feature do you see in the data?
  2. About what percentage of these athletes weigh more than 350 pounds?

Interactivity:

The histogram for the BMI's:

Adjust the slider until the bin widths are small enough that two distinct clusters appear.

  1. What are the medians, approximately, of the two clusters?
  2. Which of the two clusters is more spread out?
  3. Of the three histograms, which showed most clearly two clusters?
  4. Which showed the clusters better, the histograms or the boxplots?

Interactivity:

The following problems use the same data as the previous problems, but look at the athletes from the two sports separately.

  1. First are the two boxplots for the heights. 
    What are the medians for the two groups of athletes?
  2. What are the shortest heights for the two groups?
  3. Which group has the larger interquartile range?
  4. Which group is generally taller?
  5. Is there any overlap in the heights for the two groups?
  6. Now choose the "Weight" variable in the list to the left of the boxplots.  What are the medians for the two groups?
  7. Is there more or less overlap in the weights than in the heights?
  8. What percentage in each group has weight over 300 pounds?
  9. Which group has the larger interquartile range?
  10. What outliers are there?
  11. Which group is generally heavier?
  12. Finally, choose the "BMI" variable.  What are the medians for the two groups?
  13. Is there any overlap in the BMI's for the two groups?
  14. What percentage of each group is overweight, that is, has BMI over 25?
  15. What percentage of each group has BMI over 30?
  16. What are the interquartile ranges for the two groups?
    Are they similar?
  17. Taking the three variables into account, how would you characterize the two groups of athletes?
  18. (Optional) Guess which sports the two groups are associated with.

Interactivity:

Use this "Pets" histogram for the next questions:

  1. Do you see skewness?  If so, in which direction?
  2. Are there any obvious outliers?  About how many?
  3. Estimate the median.  (Click the mouse on the plot to find the estimated percentage to the left of where you click.)  It is likely your answer will not be an integer, even though the actual data consist of just integers.   (Why?)
  4. Estimate the quartiles, and the interquartile range.
  5. About what percentage of people had fewer than 5 dogs plus cats?
  6. About what percentage of people had more than 10 dogs plus cats?
  7. About what percentage of people had more than 75 dogs plus cats?

Interactivity:

The next plots are based on the same class as above.  The data are split into groups based on people's heights in inches.  The variable of interest is "MPH," the fastest each person has ever driven a car, in miles per hour.

  1. With the splitting value for height being the median height, 66 inches, what differences are there on the MPH variable between the taller and shorter people?
  2. Change the splitting value for height to be approximately the upper quartile for height. What is the upper quartile for height?
  3. With the splitting value from the previous part, compare the shorter and taller people on MPH.
  4. Now use the weight variable as the splitting variable, and splitting value of 135, the median weight.  (That is, choose "Weight" in the upper "Split on" list.)
    What differences do you see in the MPH variable for the heavier and lighter people?
  5. Overall, who tends to drive faster, bigger people or smaller people?
  6. Does size really affect how fast people drive, or could there be another factor influencing both size and speed?  (Hint: See the Practice Material for 'Uses in Practice 4'.)

Interactivity:

Here are the average January temperatures for 59 metropolitan areas in the United States:
27 23 29 45 35 45 30 30 24 27 42 26
28 31 46 30 30 27 24 24 40 27 55 29
32 53 35 42 67 20 12 40 30 54 33 32
38 29 33 39 25 32 55 48 49 40 28 24
23 37 32 33 24 33 28 34 31 29 26

 

  1. Create the stem-and-leaf plot with these data.  Type the leaves into the plot below. The prototype is given in the plot.  When you are done, press "Order leaves" to order the leaves within each stem, then press "OK" to see the corresponding histogram and boxplot.
  2. What is the modal stem and bin?
  3. Estimate the median.
  4. What are the highest and lowest January temperatures?
  5. Is there skewness?  If so, which way?
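With 59 values it is easy to miss a leaf when typing them in. The following Python sketch is one way to check a finished plot: it tallies how many leaves each stem should hold and reports the summary values asked about above. It assumes the tens digit is the stem and the ones digit is the leaf; the variable names are hypothetical.

```python
from collections import Counter
from statistics import median

# January temperatures copied from the list above.
temps = [
    27, 23, 29, 45, 35, 45, 30, 30, 24, 27, 42, 26,
    28, 31, 46, 30, 30, 27, 24, 24, 40, 27, 55, 29,
    32, 53, 35, 42, 67, 20, 12, 40, 30, 54, 33, 32,
    38, 29, 33, 39, 25, 32, 55, 48, 49, 40, 28, 24,
    23, 37, 32, 33, 24, 33, 28, 34, 31, 29, 26,
]

# Number of leaves per stem (assumed: tens digit is the stem).
stem_counts = Counter(t // 10 for t in temps)
modal_stem, modal_count = stem_counts.most_common(1)[0]

for stem in range(min(stem_counts), max(stem_counts) + 1):
    print(f"{stem} | {stem_counts.get(stem, 0)} leaves")

print("Modal stem:", modal_stem, "with", modal_count, "leaves")
print("Median:", median(temps))
print("Lowest and highest:", min(temps), max(temps))
```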


Load the "usstates.dat" data set into DataTools. Create the boxplot for the "Poverty" variable: Choose "Boxplot" from the "Graphics" menu, then choose "Poverty" from the list that appears. Then press "Create Graph!"

69. What is the median, approximately?

70. Are there any notable outliers?

71. Is there any skewness?

Now find the boxplot for the "Employment" variable.

72. What is the median, approximately?

73. Is there any skewness?

74.-77. 

74.  The next problem is based on data for the cities in the United States with populations over 200,000.  The boxplots below are for the populations in 100,000's, divided into groups depending on the cities' land areas in square miles.

Which group has the most extreme outliers?

The next plot is the same as above, but without those three largest outliers so that it is easier to see the boxplots.

  1. Compare the median populations for the four groups.
  2. Compare the interquartile ranges.
  3. Compare the ranges.

Copyright © 2000 CyberGnostics, Inc. All rights reserved.