Access Data from EXASOL Using ODBC

Data

Access Data from EXASOL Using ODBC

by Archana Chinnaswamy
//

Unlike JDBC, ODBC is language-independent. ODBC is an open interface that can be used to communicate with databases. All you should do is create an ODBC connection to EXASOL database in your system. For windows, follow these steps to add ODBC creation. Here is an example of using EXASOL data for predictive analysis in R tool.

Steps to Create ODBC Connection

  1. Click on the Start Menu
  2. Select Control Panel
  3. Select Administrative Tools and double-click the ODBC Data Sources (either 32-bit or 64-bit) icon
  4. Click on the System DSN tab
  5. Click the Add button
  6. Select EXASolution Driver from the driver’s list and click the Finish button

ODBC Data Source Administrator

EXASOL Data Retrieval

How To Import Data Into EXASOL

In this example, I am using two datasets: train and test. Train dataset is used to build a predictive model and test dataset is used to validate or test the model built. Do not panic if you do not have datasets handy in EXASOL. All you should do is use import statements to load data from your local machine. An import statement in EXASOL can get you gigabytes of data in a few minutes.

I used the import statement for the first time and it was so fast. It seemed like the dataset was already there in EXASOL and I didn’t need to do any importing. Remember to create tables in EXASOL as per the dataset before importing the data.

I already have the train.csv and test.csv file in C: drive and I am simply giving the path to import. Since it is a CSV file, column separator is “comma.”

IMPORT INTO retail.TRAIN FROM LOCAL CSV FILE ‘C:train.csv’

ENCODING = ‘UTF-8’

ROW SEPARATOR = ‘CRLF’

COLUMN SEPARATOR = ‘,’

COLUMN DELIMITER = ‘”’

SKIP = 1

REJECT LIMIT 0;



IMPORT INTO retail.TEST FROM LOCAL CSV FILE ‘C:test.csv’

ENCODING = ‘UTF-8’

ROW SEPARATOR = ‘CRLF’

COLUMN SEPARATOR = ‘,’

COLUMN DELIMITER = ‘”’

SKIP = 1

REJECT LIMIT 0;

 

EXASOL Data To Build Predictive Models In R

First, you need to install EXASOL package in R. The package can be found on GitHub. This example demonstrates the use of the random forest technique to predict image id in test data given the pixel input.

# packages
install.packages("RODBC")

library(RODBC)

install.packages("randomForest")

library(randomForest)

install.packages("devtools")

devtools::install_github("exasol/r-exasol")

Connect to EXASOL using the ODBC connection that was made in the first step and read the data from tables in EXASOL.

# connect

C <- odbcConnect("exa_db")

# read data

train <- exa.readData(C, "SELECT * FROM retail.TRAIN")

test  <- exa.readData(C, "SELECT * FROM retail.TEST")

Remove image id from the test data since we are using this data to predict values of image id using the model. Also, extract the labels from the train data. “Labels” in the train data is simply the target variable.

# remove imageid from test

test     <- test[order(test[1]),]

imageid  <- as.factor(test[,1])

test     <- test[,-1]

# extract labels from train
labels <- as.factor(train[,1])

train  <- train[,-1]

I am using the random forest technique to build 500 trees. Random forests correct overfitting problems created by decision trees, because random forests are an ensemble method of classification that operates by constructing many decision trees.

# random forest algorithm
model  <- randomForest(train, labels, xtest=test, ntree=500, keep.forest=TRUE)

predictions <- cbind(imageid ,levels(labels)[rf$test$predicted])

# print prediction
predictions
# print forest
model

Random Forest Algorithm technique

You can also save the model built locally and reuse it with a different dataset.

# save the model locally
save(model,file="model")

# reload and reuse
load("model")
predict(model,test)

The model turned out to have a misclassification rate of 60%. We can see that the predicted values for image id four to nine are wrong. The model could definitely be trained in conjunction with larger datasets. This is where EXASOL comes in handy since we can load and access huge datasets within mere seconds.  

More About the Author

Archana Chinnaswamy

Analytics Consultant
How to Load Data from JSON to EXASOL If you’ve been looking for a way to load JSON data into EXASOL with just a click, then you’ll love this post! As data scientists, most ...
EXASOL – Small Business Edition Many of you must have heard about EXASOL. If not, I hope this blog gives you the motivation to discover it’s capabilities for yourself. ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!