Objective: The goal of this activity is to apply basic statistical concepts using a real-world dataset within the Excel environment.
- A computer with Microsoft Excel installed
- An Internet connection to download datasets
Data Set: We will use a public dataset related to computer hardware performance from the UCI Machine Learning Repository. It contains various numerical variables related to the performance of different models of computer processors.
You can download the dataset here: Computer Hardware Data Set
- Data Understanding and Preparation:
- Download the dataset and load it into an Excel spreadsheet.
- Examine the variables (columns) in the dataset and identify which are numerical and categorical.
- Central Tendency:
- Select a numerical variable (like Maximum main memory (MMAX)).
- Calculate the mean, median, and mode for this variable using Excel functions:
- Still focusing on the same numerical variable:
- Calculate the range, variance, and standard deviation using Excel functions: for range
MAX(range) - MIN(range), for variance
VAR.S(range), and for standard deviation
- Hypothesis Testing:
- Consider a hypothesis related to the dataset. For example, “The newer models of processors have higher clock speed (MYCT) on average”.
- Divide the dataset into two subsets based on the year of manufacture (e.g., before 1985 and after 1985).
- Use the
AVERAGE(range)function to calculate the mean clock speed for both groups.
- Use the
T.TEST(range1, range2, tails, type)function in Excel to conduct a T-test comparing the means of both groups. This will give you a p-value to help determine if the difference in means is statistically significant.
- Correlation and Regression:
- Choose two numerical variables (for instance, Maximum main memory (MMAX) and Cache memory size (CHMAX)).
- Calculate the correlation between these two variables using the
CORREL(array1, array2)function in Excel.
- Perform a simple linear regression using Excel’s Data Analysis ToolPak. (You might need to activate this Excel add-in in the Options menu). Go to the Data tab, select Data Analysis, then choose Regression. Select your Y (dependent variable) and X (independent variable) ranges. The output will provide you with regression statistics, including the equation of the line of best fit.
Submission: Once you have completed the steps above, prepare a short report summarizing your findings from each step, including any Excel screenshots that help illustrate your points.
Assessment: Your work will be assessed on the completeness and accuracy of your statistical calculations, the clarity and correctness of your interpretations, and the presentation of your report.