JRower


Contents:

Overview
Quick Start
Validating input data
Csv Data Files
Delimited Data Files
Fixed Length Data Files
Excel Files
Adding custom logic
Embedding JRower in your program
JRower Servlet
Developing JRower
Acknowledgements and License






Overview


JRower is a tool for loading Database tables from flat files. The flat file can be a text file in CSV, delimited, or fixed length format. It can also be an Excel file. You supply a XML file describing the fields in the file and thier mapping to the database table columns. You can also add validations that the input data must pass before it is inserted to the database. A command line tool for running JRower is provided and also a Servlet for running JRower from a web page.







Quick Start


First download jrower.jar.
To perform the load enter "java -jar jrower.jar MyLoad.xml" on the command line.

As an example, assume we have a simple database table for the persons on a bowling team. The table and it's Oracle sequence looks like this:

   CREATE TABLE PERSON
   (
     person_id     number(6),
     name          varchar2(100),
     nick_name     varchar2(100),
     age           number(3),
     birthday      date,
     score         number,
     team_name     varchar2(100),
     favorite_food varchar2(400),
     created       date
   );

   create sequence person_seq increment by 1 start with 1 nomaxvalue nominvalue nocache;
  

Every so often we get a new data file with the updated persons. We wish to delete all the rows from the person table and insert the fresh data from the file. The fields in the data file are delimited by a vertical bar "|" character. The file has five lines of header at the top and includes an extra column "Shoe Size", that we will ignore, because we are not storing it in our person table
The data file is named Test-delimited.txt, and looks like this:

                  The "Pin Pals" bowling team members

    Name|Shoe size|Nickname|Age|Birthday|Score|Favorite Foods
    *************************************************************************************************
    John|9|Johnnie|20|12/25/1985|1.2|Eggs and Ham, Ham sandwich, Roast Beef
    Harold|11|Harrythee|22|02/15/1966|2.3|Cereal,Pizza,Macaroni and cheese
    Stan|8.5|"TheMan!"|55|12/15/1970|3.4|Pancakes,Turkey on Rye, Beef wellington
    Steve|9|"The Stevinator", "The SteveMeister"|66|01/30/1945|5.6|BLT, Bolonga Sandwich, Lobster 
  

We will create an XML file, called loadSimpleDelimited.xml, that will specify how we wish to load the data from Test-delimited.txt to the "person" database table

<?xml version="1.0" encoding="UTF-8"?> <load> <!-- Oracle Database and table we are inserting records to --> <dbType>Oracle</dbType> <dbUrl>jdbc:oracle:thin:@localhost:1521:jddb</dbUrl> <dbUserName>scott</dbUserName> <dbPassword>tiger</dbPassword> <table>person</table> <sequence name="person_seq" pkColumn="person_id" /> <sqlToRunBeforeLoad>delete from person</sqlToRunBeforeLoad> <!-- Input data file --> <file pathAndName="Test-delimited.txt" type="delimited" delimiter="|" /> <skipLines top="5" /> <!-- Input data file fields and associated table coulmns--> <column fieldNumber="1" dbColumn="name" type="String" /> <column fieldNumber="3" dbColumn="nick_name" type="String" /> <column fieldNumber="4" dbColumn="age" type="Integer" /> <column fieldNumber="5" dbColumn="birthday" type="Date" > <simpleDateFormat>MM/dd/yyyy</simpleDateFormat> <invalidDateErrorMessage>Invalid birthday</invalidDateErrorMessage> </column> <column fieldNumber="6" dbColumn="score" type="Float" /> <column fieldNumber="7" dbColumn="favorite_food" type="String" /> <column constantValue="Pin Pals" dbColumn="team_name" type="String" /> <column sqlToGetValue="(sysdate)" dbColumn="created" /> </load>

To perform the load enter "java -jar jrower.jar loadSimpleDelimited.xml" on the command line.
The jrower.jar, Test-delimited.txt data file, and loadSimpleDelimited.xml should be in the current directory.

As it runs JRower will log messages to the command line, and also to a file "jrower.log". The logging level defaults to "SEVERE", so that you won't be bothered with any messages unless they are severe errors.
You can the loging level by providing the level on the command line as the 2nd parameter. The logging level are the Java JDK: java.util.logging levels: SEVERE, WARNING, INFO, CONFIG, FINE, FINER, and FINEST.
"java -jar jrower.jar loadSimpleDelimited.xml FINEST" will log all.


Explanation of XMl file


The <dbUrl>, <dbUserName>, and <dbPassword> point to the database with the <table> we are loading.
The person table has a primary key that is supplied by a Oracle sequence.
The sequence name and the primary key column is given by the: <sequence name="person_seq" pkColumn="person_id"> tag.

The next tag: <file pathAndName="Test-delimited.txt" type="delimited" delimiter="|" >
Specifies that we want to load a file called Test-delimted.txt.
In the file the data fields are separated by a delimiter and that delimiter is a vertical bar "|".
This <skipLines top="5" > tag tells us that the first five lines of this file are header and should be ignored.

The <column> tags specify a field in the input data file that we wish to load to a database table column.
The first <column> tag: <column fieldNumber="1" dbColumn="name" type="String" >
Specifies to take the first field in the input file (terminated by a "|") and put it into the database column "NAME" on the person table. The type=String means we will treat this as String data by calling preparedStatement.setString() when inserting.

The next column tag is another String column. <column fieldNumber="3" dbColumn="nick_name" type="String" >
Notice that the fieldNumber is three. We skipped the second, "Shoe Size", field in the input data file, because we are not loading it to our datbase table.

The next column is an Integer data type.
<column fieldNumber="4" dbColumn="age" type="Integer" >
This field will be treated as a Integer by calling preparedStatement.setInteger() when inserting. Likewise we will preparedStatement.setFloat() when loading the sixth field- "SCORE".

The "BIRTHDAY" column is a date and is defined like this:.
<column fieldNumber="5" dbColumn="birthday" type="Date" >
  <simpleDateFormat>MM/dd/yyyy</simpleDateFormat>
  <invalidDateErrorMessage>Invalid birthday</invalidDateErrorMessage>
</column>
For this field we need to specify the java.text.SimpleDateFormat format string inside of the <simpleDateFormat> for parsing this date. If the input field can't be parsed according to this format, we will display the error message given in the <invalidDateErrorMessage> tag.

<column constantValue="Pin Pals" dbColumn="team_name" type="String">
This column tag is different in that it doesn't have a fieldNumber="" attribute. The value to insert for this datatbase column does not come from the input data file- but is given in the constantValue="" attribute. This tag will put the String "Pin Pals" in the TEAM_NAME column of every row on the database table.

<column sqlToGetValue="(sysdate)" dbColumn="created" >
This column tag also doesn't have a fieldNumber="" attribute. The value to insert for this column is the result of running the SQL given. This tag will load the current Oracle system date and time in the "CREATED" column. If this was the only column defined then the SQL insert statement would look like this: "insert into person ( created) values ( (sysdate) )".
You can put any SQL here that returns a single value.

The sql given in the <sqlToRunBeforeLoad> tag will be run before the load.
To scratch the table and completely refresh it use:
<sqlToRunBeforeLoad>delete from person</sqlToRunBeforeLoad>
To only load the delete the "Pin Pals" rows from the table use:
<sqlToRunBeforeLoad>delete from person where team_name = 'Pin Pals'</sqlToRunBeforeLoad>
If you don't want to delete at all- but append new data- then leave this tag out.







Validating input data


You can supply input error checking logic by adding validation tags inside the <column> tags. Here is an example. The Person table's "NAME" column must by 1 to 100 letters, and can't be blank.

<column fieldNumber="1" dbColumn="name" type="String" > <validation errorWhen="notMatch" > <!-- must be all letters, otherwise show the error message --> <regExp>^[A-Za-z]{1,100}$</regExp> <message>Invalid name. Name can't be blank and can't contain blanks, numbers, or special characters.</message> </validation> <validation errorWhen="match" > <!-- Must not be blank --> <message>Name can't be blank</message> <regExp>^ *$</regExp> </validation> </column>

The two <validation> tags defines conditions that must be met before the user can save the record. The <regExp> tag defines the regular expression to compare the user input to, and the <message> tag has the message to display if an error is detected. The errorWhen="match/notMatch" specifies if the error is when the regular expression matches the user input or when it fails to match.

The next validation example is used for the Age column which should be 1 to 3 digits.

<validation errorWhen="notMatch" > <regExp>^\d{1,3}$</regExp> <message>Invalid age must be a number, maximum 3 digits.</message> </validation>

This validation ensures the date is in the mm/dd/yyyy format.

<validation errorWhen="notMatch" > <regExp>^\d\d/\d\d/\d\d\d\d$</regExp> <message>Invalid Date must be in format mm/dd/yyyy.</message> </validation>

If any validation fails the load will abort and the database won't be changed.







Csv Data Files


Data files can come in many formats. One of the most common is CSV (Comma separated values) format. In CSV files each field has a comma at the end to separate it from the following field.
If a field contains a comma within it, the entire field is surrounded with double quotes
If a field contains a double quote within it, then the double quote is repeated.
In our bowling leauge example the CSV file would look like this.


              The "Pin Pals" bowling team members

Name   Shoe   Nickname         Age      Birthday     Score  Favorite Foods
-----  -----  --------        ----      ---------    -----  --------------------------------------
John,    9, Johnnie,        20,      12/25/1985,  1.2,     "Eggs and Ham,  Ham sandwich, Roast Beef"
Harold, 11, Harry thee,     22,      02/15/1966,  2.3,     "Cereal, Pizza, Macaroni and cheese"
Bill,  8.5, """The Man!""", 55,      12/15/1970,  3.4,     "Pancakes, on Rye, Beef wellington"
Steve,   9, "\"The Stevinator\"\, \"The SteveMeister\"",  66, 01/30/1945,  5.6, "BLT, Bolonga Sandwich,  Lobster" 
 

The Favorite Foods columns are surrounded by double quotes because they contain commas. The nickname columns for Bill and Steve are surrounded by double quotes because they contain embedded double quotes or commas. The embedded quotes around "The Man!" in Bill's nickname are repeated. So that: "The Man!" becomes: ""The Man!""
Some CSV files escape thier embedded double quotes and commas with a \ in front. The embedded quotes in Steve's nickname are escaped.

The tag: <file pathAndName="testing/csv/Test.csv" type="csv" delimiter="," >
Tells us that this is a csv file and the delimter is a comma.
Each column tag gives a "fieldNumber" attribute that corresponds to it's order amongst the comma separated fields in the input file.
The "dbColumn" attribute specifies the database column to save the data to.
See the example CSV load file: loadCsv.xml

The Ant task "ant testCsv" runs the Csv example load.







Delimited Data Files


A delimited file is one where the fields are separated by a special delimiter character. Common delimiters are tabs, vertical bars, slashes, and semi-colons. It is important that the delimiter character chosen not occur naturally in the data.

Here is an example of a delimited file with our example bowling team's data.


              The "Pin Pals" bowling team members

Name|Shoe size|Nickname|Age|Birthday|Score|Favorite Foods
*************************************************************************************************
John|9|Johnnie|20|12/25/1985|1.2|Eggs and Ham, Ham sandwich, Roast Beef
Harold|11|Harrythee|22|02/15/1966|2.3|Cereal,Pizza,Macaroni and cheese
Stan|8.5|"TheMan!"|55|12/15/1970|3.4|Pancakes,Turkey on Rye, Beef wellington
Steve|9|"The Stevinator", "The SteveMeister"|66|01/30/1945|5.6|BLT, Bolonga Sandwich, Lobster 
 

The XML file for delimited file is simular to the to the Csv load XML file.
The tag: < file pathAndName="testing/delimited/Test-delimited.txt" type="delimited" delimiter="|" >
Tells us that this is a delimited file and the delimiter is a vertical bar "|".
Each column tag gives a "fieldNumber" attribute that corresponds to it's order amongst the delimiter separated fields in the input file.
The "dbColumn" attribute specifies the database column to save the data to.
See the example delimited load file: loadDelimited.xml

The Ant task "ant testDelimited" runs the delimited example load.







Fixed Length Data Files


A fixed length file is one where the fields have hard-coded lengths and always begin at the same position on the input line.
Here is an example of a fixed length file with our example bowling team's data.


              The "Pin Pals" bowling team members

Name      Shoe  Nickname      Age Birthday   Score Favorite Foods
-----     ----  --------      --- ---------  ----- ------------------------------------
John        9  Johnnie        20  12/25/1985  1.2  Eggs, Ham sandwich, Beef
Harold     11  Harry          22  02/15/1966  2.3  Cereal,  Pizza, and cheese
William   8.5  Bill           55  12/15/1970  3.4  Pancakes, Turkey on Rye, wellington
Stephen     9  Steve          66  01/30/1945  5.6  BLT, Sandwich, and crab 
 

See the example fixed length load file: loadFixed.xml
The tag: < file pathAndName="testing/fixed-length/Test-fixed.txt" type="fixed" > Tells us that this is a fixed field length file.

The XML file for fixed length file is different from the Csv or delmited files in that the order the field appears in the input file is not important. The "fieldNumber" attribute is missing from the column tag.
Instead each column is given a start and end position.



  <column fieldNumber="1" dbColumn="name"          start="0"  end="9"   type="String">
  <column fieldNumber="2" dbColumn="nick_name"     start="15" end="29"  type="String">
  <column fieldNumber="3" dbColumn="age"           start="30" end="31"  type="Integer">
  <column fieldNumber="4" dbColumn="birthday"      start="34" end="43"  type="Date">
  <column fieldNumber="5" dbColumn="score"         start="45" end="48"  type="Float">
  <column fieldNumber="6" dbColumn="favorite_food" start="51" end="100" type="String">
  

For example the person's name starts at position 0 and ends at position 9.
The Ant task "ant testFixed" runs the fixed length example load.







Excel Files




Here is an example of an Excel spreadsheet with our example bowling team's data.



See the example Excel load file: loadExcel.xml
The tag: < file pathAndName="testing/excel/Test-excel.xls" type="Excel" sheetNumberInWorkBook="0" >
Tells us that this is an Excel file and we want to load the first (zero) spreadsheet in the workbook.
The <skipLines top="5"> tags tells us to skip the first 5 header lines.
Each column tag gives a "fieldNumber" attribute that corresponds to it's order amongst the cells of the spreadsheet.
The "dbColumn" attribute specifies the database column to save the data to.
This column tag: < column fieldNumber="3" dbColumn="nick_name" type="String" >
Tells us that the NICK_NAME column on the database PERSON table is loaded from the third column of cells in the spreadsheet.
The Ant task "ant testExcel" runs the Excel example load.







Adding custom logic


Sometimes there may not be a neat one-to-one correspondence between a field in the input data and a column on the database table. You may wish to combine several input fields into one database column, or split one input field amongst two columns. Or you may need to perform some calculation to get generate the value. To do this you can provide a Java class with the custom logic by using the loadDecoratorClass tag.

Here is an example of invoking custom logic:
<loadDecoratorClass>com.jdonohue.jrower.loaddecorator.ExampleLoadDecorator</loadDecoratorClass>
The specified Java class "ExampleLoadDecorator" will be instantiated at run time.
This class should extend the LoadDecorator abstract class and is required to implement two methods: decorateColumns(List columns) and decorateRecord(List record). The ExampleLoadDecorator class "decorates" the load logic with custom behavior.

The decorateColumns() method is given a ArrayList of the column tags defined in the load.xml file. These columns generally correspond to the database columns that will be included in the sql INSERT statement. If you wish to customize what database columns get loaded you can add or delete from this list. The decorateColumns() method is called only once per load; before the database inserts are done.

The decorateRecord() method is given a ArrayList of the record fields that will be used as the values to place in each database column. You can add or delete elements from this list. The decorateRecord() method is called prior to each insert.

Imaginge if our bowling team example had an input data file format such that the "Favorite Food" was split up into three coulmns: Favorite Breakfast, Favorite Lunch, and Favorite Diner. We want to concatenate together and save all three of these input columns to the "FAVORITE_FOODS" column on the database.

Here is the example input.

              The "Pin Pals" bowling team members

Name|Shoe size|Nickname|Age|Birthday|Score|Favorite Breakfast|Favorite Lunch| FavoriteDiner
*************************************************************************************************
John|9|Johnnie|20|12/25/1985|1.2|Eggs and Ham|Ham sandwich|Roast Beef
Harold|11|Harrythee|22|02/15/1966|2.3|Cereal|Pizza|Macaroni and cheese
Stan|8.5|TheMan|55|12/15/1970|3.4|Pancakes|Turkey on Rye|Beef wellington
Steve|9|Stevie|66|01/30/1945|5.6|BLT|Bolonga Sandwich|Lobster 

See the example load xml file : loadCustomLogicDelimited.xml
This file has the line: <loadDecoratorClass>com.jdonohue.jrower.loaddecorator.ExampleLoadDecorator</loadDecoratorClass>
Based on this line we will instantiate the class ExampleLoadDecorator, and call it's decorateColumns() and decorateRecord() methods during the load.
Notice in loadCustomLogicDelimited.xml that we don't define the 7th "FAVORITE_FOODS" column. We will instead add it dynamically at run time.

The ExampleLoadDecorator custom class:

public class ExampleLoadDecorator extends LoadDecorator {

  /** 
   *
   */
  public void decorateColumns(List columns) { 
    logger.finest("Decorating Columns");
    //Addding a new 7th column: "favorite_food"
    Column column = new Column(7, "favorite_food", "String");
    columns.add(column);

  }

  /** 
   *
   */
  public void decorateRecord(List record) {
    logger.finest("Decorating record");
    //We wish to concatenate fields 6, 7, 8 to one field.
    //So that the 3 columns: Favorite Breakfast, Favorite Lunch, Favorite Dinner are pasted together 
    //and inserted as one into the "favorite_foods" column.
    String combinedFields = record.get(6) + ", " + record.get(7) + ", " + record.get(8);
    record.remove(8); record.remove(7); record.remove(6);
    record.add(combinedFields);
  }

}

This will take the three input fields: "Eggs and Ham", "Ham sandwich", and "Roast Beef". At positions 6, 7, and 8 in the input record and concatenate them and save them in the new 7th column.
The Ant task "ant testCustomLogicDelimited" runs the custom logic example load.






Embedding JRower in your program


If you wish to use JRower in your program use the main() and load() methods in the JRower class as examples. Or use the JRowerServlet.java class as an example.






JRower Servlet


The JRowerServlet is a J2EE web application (JRower.war file) that runs pre-defined JRower loads from a web page. You create the xml files that describe loads. The users can upload new data files and then load it using one of these "xml load" files. You are only giving the user the ability to perform a pre-defined load using a data file, that they upload. The user can upload new data files, but can't change the load logic. You can enforce validations that the input file must pass before the load is done.

Screen Shots:

Login





Step 1: Upload a file




The user has clicked "Browse" and selected a file called "Test-delimited.txt", and then clicked "Upload". The file was uploaded ok.





Step 2: Validate a file where file has bad data


The user has validated a data file called "Bad.csv" against the "loadCvs.xml" file.
The data file "Bad.csv" has errors in its data. The errors are show in red.





Step 2: Validate a file, where file has good data



The user has validated a data file called "Test-delimited.txt" against the "loadDelimited.xml" file. The validation was succesfull. The data file is ready to be loaded using the logic in the loadDelimited.xml file.
The "Number of rows of data in input file:" shows how many new rows will be loaded.
The "Number of rows that will be deleted from data base:" shows how many rows will be deleted from database in preparation for the load.
The user should look at these numbers and check that they make sense, before clicking the "Load" button.





Step 3: Load to Database



The user has clicked the "Load" button. The load was succesfull and 4 rows were inserted to the database.


Servlet Install

To run the servlet, install the jrower.war, and go to:
   http://localhost:7001/JRower/JRowerServlet.

The file "jrower.properties" defines the password and gives the directories jrower.jar will search for "load xml" and data files.
This file should be put under the home directory of the user who runs the App server. For WebLogic this might be: C:\bea\user_projects\domains\mydomain. The ant task "ant -buildfile webapp/build.xml deployExplodedWebApp" will copy the example properties file to the web app server directory.
The example jrower.properties file is shown below:

    dataFilesDirectory=jrower/dataFiles
    loadXmlDirectory=jrower/loadXmlFiles
    password=admin
   


You will need to create the directories specified in the properties file.
Like so: mkdir jrower; cd jrower; mkdir dataFiles; mkdir loadXmlFiles.






Developing JRower


To develop JRower first download the source tree: jrower-src.zip. You must have ant installed (tested with 1.6.1)


The main Ant tasks are:
ant compileCompile main jrower classes
ant testCvsTest with the example CSV data file.
testDelimited Test with the example delimited data file.
testFixed Test with the example fixed length data file.
testExcel Test with the example Excel data file.
testCustomLogicDelimitedTest with the example that uses custom load logic.
ant jar Create jrower.jar file for use when running from command line. The jrower.jar includes all required libirary classes in it for convience. We unzipped the libray jar files and re-jarred them as one jar.
cleanDelete all compiled class files.

The Ant tasks for the JRowerServlet are:
ant -buildfile webapp/build.xml deployExplodedWebApp Compile and deploy the JRowerServlet web application to the local app server.
ant -buildfile webapp/build.xml createWar Compile and deploy the JRowerServlet web application to the local app server.
ant -buildfile webapp/build.xml clean Delete all files created as part of the JRowerServlet web application.



Note: The webapp/build.xml has hardcoded paths to the J2EE Web App server assumed to be installed on the local machine. You will need to edit lines 9, 10, and 24 as appropriate for your development environment.
These are the lines that need to change:

    9:  <property name="DEPLOYDIR" value="C:\bea\user_projects\domains\mydomain\applications"/> 
   10:  <property name="RUNTIMEDIR" value="C:\bea\user_projects\domains\mydomain"/> 
   24:  <fileset dir="C:\bea\weblogic81\server\lib">
  






Acknowledgements and License


This tool makes use of or includes several JAR libraries. They are listed below:

Jdom
POI- HSSF
Xerces


JRower License

The JRower tool is licensed under the GNU GPL.

Questions or Comments to:  


Up to Java Pages
Up to John Donohue's home page