# Activity: Exploring Basic Statistics in Computer Engineering with Excel

Objective: The goal of this activity is to apply basic statistical concepts using a real-world dataset within the Excel environment.

Materials Needed:

• A computer with Microsoft Excel installed

Data Set: We will use a public dataset related to computer hardware performance from the UCI Machine Learning Repository. It contains various numerical variables related to the performance of different models of computer processors.

Steps:

1. Data Understanding and Preparation:
• Examine the variables (columns) in the dataset and identify which are numerical and categorical.
2. Central Tendency:
• Select a numerical variable (like Maximum main memory (MMAX)).
• Calculate the mean, median, and mode for this variable using Excel functions: `AVERAGE(range)`, `MEDIAN(range)`, and `MODE.SNGL(range)`, respectively.
3. Dispersion:
• Still focusing on the same numerical variable:
• Calculate the range, variance, and standard deviation using Excel functions: for range `MAX(range) - MIN(range)`, for variance `VAR.S(range)`, and for standard deviation `STDEV.S(range)`.
4. Hypothesis Testing:
• Consider a hypothesis related to the dataset. For example, “The newer models of processors have higher clock speed (MYCT) on average”.
• Divide the dataset into two subsets based on the year of manufacture (e.g., before 1985 and after 1985).
• Use the `AVERAGE(range)` function to calculate the mean clock speed for both groups.
• Use the `T.TEST(range1, range2, tails, type)` function in Excel to conduct a T-test comparing the means of both groups. This will give you a p-value to help determine if the difference in means is statistically significant.
5. Correlation and Regression:
• Choose two numerical variables (for instance, Maximum main memory (MMAX) and Cache memory size (CHMAX)).
• Calculate the correlation between these two variables using the `CORREL(array1, array2)` function in Excel.
• Perform a simple linear regression using Excel’s Data Analysis ToolPak. (You might need to activate this Excel add-in in the Options menu). Go to the Data tab, select Data Analysis, then choose Regression. Select your Y (dependent variable) and X (independent variable) ranges. The output will provide you with regression statistics, including the equation of the line of best fit.

Submission: Once you have completed the steps above, prepare a short report summarizing your findings from each step, including any Excel screenshots that help illustrate your points.

Assessment: Your work will be assessed on the completeness and accuracy of your statistical calculations, the clarity and correctness of your interpretations, and the presentation of your report.