# Linear Model Function

The `lm()` function (short for "linear model") ships with base R, in the `stats` package, and, as the name suggests, fits a linear model. The model can include multiple variables, as well as interaction terms and squared terms. A brief discussion of its use is provided below.

## Syntax

Much like we use '==' rather than a lone '=' in an 'if' statement, we do not use the equals sign when writing model equations in R. Instead we use a tilde ('~'), found at the top left of most US keyboards: the response goes on the left of the '~' and the predictors go on the right. Otherwise, our equations follow the same format as when we write them by hand.
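To make the role of the tilde concrete, here is a minimal sketch showing that a formula is an ordinary R object that can be stored and inspected before any model is fitted. The name `f` is only illustrative; `mpg` and `wt` are columns of the mtcars dataset used later in this discussion.

```r
# A formula is an unevaluated description of a model:
# response on the left of '~', predictors on the right.
f <- mpg ~ wt

class(f)      # "formula"
all.vars(f)   # "mpg" "wt"
```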

## Create a Linear Model

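The `lm()` function is usually called with two arguments, as shown below:

`lm(formula, data)`

*formula*: the equation you are using to create your model

*data*: the name of the dataset from which you are building the model

The examples in this discussion pass `data` first; because it is named explicitly, R still matches the formula to the correct argument.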

For this discussion, we will be using the mtcars dataset, included in base R, to demonstrate.
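Before fitting anything, it can help to glance at the columns we will use. This is a small sketch that prints the size of mtcars and the first rows of three of its columns (`mpg`, `wt`, and `hp`); the name `cols` is just a helper for this example.

```r
# mtcars ships with R, so no import or download is needed
cols <- c("mpg", "wt", "hp")

dim(mtcars)            # 32 rows, 11 columns
head(mtcars[, cols])   # miles per gallon, weight (in 1000 lbs), horsepower
```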

Let's begin by modeling a car's miles per gallon (mpg) as a function of its weight (wt):

`library(ggplot2)`

`myModel = lm(data = mtcars, mpg ~ wt)`

`myModel`

`ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point(col = 'red') + geom_abline(intercept = 37.285, slope = -5.344)`

As you can see, creating a model in R is very simple. Also note that because we pass the dataset through the `data` argument, we do not need to use a `$` or enter the variable names as strings within the `lm()` function.
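Rather than copying the intercept and slope from the printed output, the fitted values can be pulled from the model object. The sketch below uses `coef()` and `predict()` from base R; the names `fit`, `b`, and `new_car`, and the example weight of 3 (that is, 3,000 lbs), are assumptions made for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Named numeric vector with entries "(Intercept)" and "wt"
b <- coef(fit)
b

# Predicted mpg for a hypothetical car weighing 3,000 lbs (wt is in 1000s of lbs)
new_car <- data.frame(wt = 3)
predict(fit, newdata = new_car)   # roughly 21.25

# The same prediction computed by hand from the coefficients
unname(b["(Intercept)"] + b["wt"] * 3)
```

The values in `b` could also be passed to `geom_abline()` in place of typed numbers, which keeps the plotted line in sync with the model if the data change.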

The `fmodel()` function from the 'statisticalModeling' package is useful for plotting the line of the equation and requires only the name of the model (`myModel` in this case). It does not, however, include the dataset in its plot, so we use ggplot here to make the points easily visible for this demonstration.

## Multiple terms

While the mtcars data serves as a simple example, the data you'll encounter in the workplace is far more complex, and a single variable is rarely enough to build a useful model. Luckily, adding variables to our model is quite simple: we use the '+' symbol, followed by our new term, as demonstrated below:

`myModel = lm(data = mtcars, mpg ~ wt + hp)`

`myModel`
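One way to see what the extra term buys us is to compare the two models. The following is a rough sketch, not a full model-selection procedure: it fits both models and reads R-squared from `summary()`. The names `fit_one` and `fit_two` are illustrative.

```r
fit_one <- lm(mpg ~ wt, data = mtcars)
fit_two <- lm(mpg ~ wt + hp, data = mtcars)

names(coef(fit_two))   # "(Intercept)" "wt" "hp"

# R-squared cannot decrease when a predictor is added to the model
summary(fit_one)$r.squared
summary(fit_two)$r.squared

# Adjusted R-squared penalizes extra terms, so it is often the fairer comparison
summary(fit_one)$adj.r.squared
summary(fit_two)$adj.r.squared
```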

## Higher Order Terms

However, sometimes a relationship between variables isn't purely linear: it may have a quadratic term, or the effect of one variable may depend on the value of another, in which case we need an interaction term in the model. While we could code these terms into our dataset by hand using R and dplyr, we do not need to do that here; the formula syntax can express them directly.
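For the quadratic case, a minimal sketch follows. Inside a formula, `^` has a special meaning (it crosses terms rather than squaring them), so the square is wrapped in `I()` to be treated as ordinary arithmetic. The name `quadModel` is an assumption for this example.

```r
quadModel <- lm(mpg ~ wt + I(wt^2), data = mtcars)
names(coef(quadModel))   # "(Intercept)" "wt" "I(wt^2)"

# Without I(), wt^2 is read as formula crossing and collapses to plain wt
attr(terms(mpg ~ wt^2), "term.labels")   # "wt"
```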

To create an interaction term, we can use one of two methods: if we wish to include both terms and their interaction in the model, we can simply use the asterisk:

`myModel = lm(data = mtcars, mpg ~ wt * hp)`

`myModel`
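To confirm what the asterisk expands to, the sketch below lists the terms of the formula and checks that the shorthand and the spelled-out version give the same coefficients. The names `fit_star` and `fit_long` are illustrative.

```r
# The asterisk is shorthand for both individual terms plus their interaction
attr(terms(mpg ~ wt * hp), "term.labels")   # "wt" "hp" "wt:hp"

fit_star <- lm(mpg ~ wt * hp, data = mtcars)
fit_long <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
all.equal(coef(fit_star), coef(fit_long))   # TRUE
```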

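However, if we wish to include only one (or neither) of the individual terms, we use a colon for the interaction term instead. The colon adds only the interaction itself, so any individual term we want to keep must be added separately with '+' (for example, `mpg ~ wt + wt:hp`). The model below includes the interaction alone: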

`myModel = lm(data = mtcars, mpg ~ wt : hp)`

`myModel`
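The difference between the colon-only model and one that keeps a single individual term shows up in the coefficient names. A brief sketch, with `fit_int` and `fit_wt_int` as illustrative names:

```r
# Interaction only: neither wt nor hp appears on its own
fit_int <- lm(mpg ~ wt:hp, data = mtcars)
names(coef(fit_int))      # "(Intercept)" "wt:hp"

# Interaction plus one individual term, added explicitly with '+'
fit_wt_int <- lm(mpg ~ wt + wt:hp, data = mtcars)
names(coef(fit_wt_int))   # "(Intercept)" "wt" "wt:hp"
```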