Upon using a spreadsheet to create a graph of raw data it is frequently desirable to generate the best fit equation for that data. Most spreadsheets have the capability of performing a polynomial regression analysis of any order. By means of an example, the following steps can be followed to obtain the equation of best fit for a set of data. In an attempt to keep this discussion general, a discipline specific example will not be presented. Also, nonlinear data will be used, although linear data can also be analyzed.
1. Gather the data. For the example, the following data will be used.
Y Data | X Data |
---|---|
0 | 0 |
0.2 | 1 |
0.6 | 2 |
1.8 | 3 |
5.4 | 4 |
16.2 | 5 |
2. Enter the data into a spreadsheet in two columns with the Y data to the left of the X data. Since the data is obviously nonlinear, create a third column that is the X data raised to the second power (X^{2}) as shown below.
Y Data | X Data | X^{2} |
---|---|---|
0 | 0 | 0 |
0.2 | 1 | 1 |
0.6 | 2 | 4 |
1.8 | 3 | 9 |
5.4 | 4 | 16 |
16.2 | 5 | 25 |
3. Select the regression analysis option within the spreadsheet you are using. Highlight the column of Y data as the dependent variable and the X and X^{2} data as the independent variables. Selecting only the X data as the independent variable will yield a linear equation.
4. The menu should have an OUTPUT selection option. Select OUTPUT and then place the cursor somewhere off of your raw data. It is necessary to place the output off your raw data so that your data is not erased when the analysis is performed.
5. Select GO from the regression analysis menu. Your output should look like the following.
Regression Output: Constant 0.964286 Std Err of Y Est 1.77477 R Squared 0.952187 No. of Observations 6 Degrees of Freedom 3 X Coefficient(s) 3.08071 1.175 Std Err of Coef. 1.513022 0.290465
6. The equation correlating the Y and X data is as follows,
- Y = 0.964286 - 3.08071*X + 1.175*X^{2} (1)
The R Squared (correlation coeffient) value is used to determine if the equation is of adequate accuracy. The closer R Squared is to 1.0, the better the equation fits the data. For this data, a value of 0.952187 may or may not be of adequate accuracy depending on what the user's requirements are. Let's assume a better fit is needed. The analysis will be rerun using X through X^{4} data.
7. Expand the X data to include X^{3} and X^{4} values, as shown below.
Y Data | X Data | X^{2} | X^{3} | X^{4} |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0.2 | 1 | 1 | 1 | 1 |
0.6 | 2 | 4 | 8 | 16 |
1.8 | 3 | 9 | 27 | 81 |
5.4 | 4 | 16 | 64 | 256 |
16.2 | 25 | 125 | 625 |
8. In the regression analysis option, highlight the X, X^{2}, X^{3} and X^{4} data as the independent data. Rerun the regression analysis and the output will look like the following.
Regression Output: Constant 0.00873 Std Err of Y Est 0.138587 R Squared 0.999903 No. of Observations 6 Degrees of Freedom 1 X Coefficient(s) 0.53201 1.098611 0.50648 0.0875 Std Err of Coef. 0.510028 0.487215 0.153745 0.015278
9. The new equation correlating the data is as follows,
- Y = 0.00873 - 0.53201*X + 1.098611*X^{2} - 0.50648*X^{3} + 0.0875*X^{4}. (2)
The new R Squared value is 0.999903, which indicates a very good fit for the data.
10. The equation can now be programmed inside the spreadsheet and, if desired, used to calculate values for Y so that a comparison between the raw data and calculated data can be made. Also, it would now be possible to generate a graph with a smooth curve through the raw data.