In today’s post, I’m going to use RODBC to connect to SQL Server and perform a few operations.
There are a couple of steps we’re going to have to do in order to get RODBC to work with SQL Server.
Grab The Notebook
The first step is to grab the Basic SQL Server Connection notebook from my GitHub repo. We’ll follow along with this notebook throughout the blog post.
Grab The Database Backup
If you want to follow along, I have a backup of PASSData which you can download from the GitHub repo. It’s about 20 MB and does require SQL Server 2016. If you don’t want to download this file or don’t have SQL Server 2016 installed, you can still get most of the benefit from this notebook by using your own DSN and writing your own queries.
Set Up Your DSN
If You’re On Linux
First, if you’re running on Linux, you’ve got a bit of work to do. For this post, I’m going to assume that you’re running Windows, but if you are on Linux, follow the instructions above and you should be able to follow along. You’ll probably have to change the connection to use SQL authentication instead of Windows authentication, but that’s a minor one-line change in the
conn <- obdcConnect("PASSDATA") block.
If You’re On Windows
Before you begin connecting to a SQL Server instance, you need to set up an ODBC connection. To do this in Windows, go to the ODBC Data Sources application and create a SYSTEM-level data source.
Once you have that data source loaded, you can use RODBC to connect to SQL Server using that DSN. In this case, I created a data source named “PASSData” which connects to a local SQL Server instance’s PASSData database.
Learning What RODBC Can Do
First, we need to get RODBC from CRAN. RODBC is a package which allows you to connect to ODBC-compliant databases using R. This package allows you to connect to a wide variety of database products, which has its positives and negatives. The big positive is that you don’t need to use SQL Server on the back end: you can set up an Oracle, MySQL, Postgres, or a large number of other products. The big negative is that this means less effort can be put into vendor-specific extensions. RODBC is not a least-common-denominator product, but won’t be optimized for every product’s use cases.
We install RODBC with the following commands:
install.packages("RODBC", repos = "http://cran.us.r-project.org") library(RODBC)
If you’d like more details on RODBC, I recommend checking out the documentation.
help(RODBC) RShowDoc("RODBC", package="RODBC")
RODBC provides the odbcConnect function, which lets you connect to a database using a DSN or a connection string. We’re going to use the DSN we created above.
We have a couple methods for getting data. First, we can get metadata using built-in methods. Here’s how to get a list of all tables:
sqlTables(conn, schema = "dbo")
Once you have that, you can easily grab all of the rows in a table using the sqlFetch command:
registrations <- sqlFetch(conn, "dbo.SQLSatRegistrations")
If you want a more fine-grained query or want to write SQL to perform some of the early data cleansing, write a sqlQuery:
sessions <- sqlQuery(conn, paste("SELECT TOP(50) EventName, EventDate, SessionName", "FROM dbo.SQLSatSessions", "WHERE SatNum = 217", "ORDER BY SessionName DESC"))
You can also insert and update data, create tables, and manage tables using RODBC, but for the most part, aside from inserting cleansed data, I’d personally shy away from using a data analysis tool for database management.
At any rate, once you’re done with the connection, close it using the close command:
Given the continued immense popularity of relational databases, it makes sense to know how to connect to a relational database using R. RODBC is just one package which is up to the task—RJDBC and RSQLServer are two others. If you find yourself having performance problems with RODBC on SQL Server, try the others.