Polynomial regression is a nonlinear relationship between independent x and dependent y variables. Fitting such type of regression is essential when we analyze fluctuated data with some bends.
In this post, we'll learn how to fit and plot polynomial regression data in R. We use an lm() function in this regression model. Although it is a linear regression model function, lm() works well for polynomial models by changing the target formula type. The tutorial covers:
- Preparing the data
- Fitting the model
- Finding the best fit
- Source code listing
Preparing the data
We'll start by preparing test data for this tutorial as below.
peq = function(x) x^3+2*x^2+5
x = seq(-0.99, 1, by = .01)
y = peq(x) + runif(200)
df = data.frame(x = x, y = y)
head(df)
x y
1 -0.99 6.635701
2 -0.98 6.290250
3 -0.97 6.063431
4 -0.96 6.632796
5 -0.95 6.634153
6 -0.94 6.896084
We can visualize the the 'df' data to check visually in a plot. Our task is to fit this data with the best curve.
plot(df$x, df$y, pch=20, col="gray")
Fitting the model
We build a model with lm() function with a formula. I(x^2) represents x2 in a formula. We can also use poly(x,2) function and it is the same with the expression of I(x^2).
model = lm(y~x+I(x^3)+I(x^2), data = df)
summary(model)
Call:
lm(formula = y ~ x + I(x^3) + I(x^2), data = df)
Residuals:
Min 1Q Median 3Q Max
-0.49598082 -0.21488892 -0.01301059 0.18515573 0.58048188
Coefficients:
Estimate Std. Error t value
(Intercept) 4.3634157 0.1091087 39.99144
x -0.1078152 0.9309088 -0.11582
I(x^3) -0.5925309 1.3905638 -0.42611
I(x^2) 3.6462591 2.1359770 1.70707
Pr(>|t|)
(Intercept) < 0.0000000000000002 ***
x 0.908039
I(x^3) 0.670983
I(x^2) 0.091042 .
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2626079 on 96 degrees of freedom
Multiple R-squared: 0.9243076, Adjusted R-squared: 0.9219422
F-statistic: 390.7635 on 3 and 96 DF, p-value: < 0.00000000000000022204
Next, we'll predict data with a trained model.
pred = predict(model,data=df)
Finding the best fit
Finding the best-fitted curve is important. We check the model with various possible functions. Here, we apply four types of function to fit and check their performance.
The orange line (linear regression) and yellow curve are the wrong choices for this data. The pink curve is close, but the blue curve is the best match for our data trend. Thus, I use the y~x3+x2 formula to build our polynomial regression model.
You may find the best-fit formula for your data by visualizing them in a plot.
The source code of a plot is listed below.
windows(width=8, height=6)
plot(x=df$x, y=df$y, pch=20, col="grey")
lines(df$x, predict(lm(y~x, data=df)), type="l", col="orange1", lwd=2)
lines(df$x, predict(lm(y~I(x^2), data=df)), type="l", col="pink1", lwd=2)
lines(df$x, predict(lm(y~I(x^3), data=df)), type="l", col="yellow2", lwd=2)
lines(df$x, predict(lm(y~poly(x,3)+poly(x,2), data=df)), type="l", col="blue", lwd=2)
legend("topleft",
legend = c("y~x, - linear","y~x^2", "y~x^3", "y~x^3+x^2"),
col = c("orange","pink","yellow","blue"),
lty = 1, lwd=3
)
Plotting the result
1. Plotting with a plot() function.
pred = predict(model,data = df)
lines(df$x, pred, lwd = 3, col = "blue")
2. Plotting with a ggplot().
Polynomial regression data can be easily fitted and plotted with ggplot().
library(ggplot2)
ggplot(data=df, aes(x,y)) +
geom_point() +
geom_smooth(method="lm", formula=y~I(x^3)+I(x^2))
In this tutorial, we have briefly learned how to fit polynomial regression data and plot the results with a plot() and ggplot() functions in R. The full source code is listed below.
Source code listing
peq = function(x) x^3+2*x^2+5
x = seq(-0.99, 1, by = .01)
y = peq(x) + runif(200)
df = data.frame(x = x, y = y)
head(df)
plot(df$x, df$y, pch=20, col="gray")
model = lm(y~x+I(x^3)+I(x^2), data = df)
summary(model)
pred = predict(model,data=df)
windows(width=8, height=6)
plot(x=df$x, y=df$y, pch=20, col="grey")
lines(df$x, predict(lm(y~x, data=df)), type="l", col="orange1", lwd=2)
lines(df$x, predict(lm(y~I(x^2), data=df)), type="l", col="pink1", lwd=2)
lines(df$x, predict(lm(y~I(x^3), data=df)), type="l", col="yellow2", lwd=2)
lines(df$x, predict(lm(y~poly(x,3)+poly(x,2), data=df)), type="l", col="blue", lwd=2)
legend("topleft",
legend = c("y~x, - linear","y~x^2", "y~x^3", "y~x^3+x^2"),
col = c("orange","pink","yellow","blue"),
lty = 1, lwd=3
)
pred = predict(model,data = df)
lines(df$x, pred, lwd = 3, col = "blue")
library(ggplot2)
ggplot(data=df, aes(x,y)) + geom_point()
+ geom_smooth(method="lm", formula=y~I(x^3)+I(x^2))
No comments:
Post a Comment