The beet build generates a command-line tool suite for analyzing beet logs. The tool suite can be used to convert binary logs to CSV or XML, apply custom XSL transforms, and perform efficient bulk loads of beet data to an Oracle database. The tools require the following:
There are two packages, one for Java 5 and one for Java 6; you only need to install the version appropriate to your local SDK.
After you have unpacked the archive, you can run any of the following utilities from a command prompt within the created beet directory. All of these instructions assume you are logging in the default format, GZIP-compressed FastInfoSet (i.e. binary XML).
(first time only) Run the provided etl/create_etl.sql script to create the required data structures in your target Oracle database.
Run the import script:
> ./load-event.sh user/pass@sid path/to/log.bxml.gz
The time required for this process will vary with available system resources, the size of the log, the speed of your connection to the database, and so on. Examine the resulting log files load_event_csv.log and (if there were error records) BAD_EVENT_CSV.log. Typically errors occur only if you have tried to insert values too large for the target schema. If this is the case, you may want to update the schema to accommodate the larger values, or truncate the bad data (in BAD_EVENT_CSV.log) and attempt the load again. The provided structures are adequate for most needs, so errors should be rare.
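If you choose to truncate the bad data rather than widen the schema, a small script can clip over-long fields before you re-load. The sketch below is hypothetical: it assumes comma-delimited records, and the column widths are placeholders that you would replace with the actual limits from your event table.

```python
import csv
import io

# Hypothetical per-column width limits; replace these with the
# actual column definitions from your target schema.
MAX_WIDTHS = [32, 64, 256]

def truncate_records(reader, writer, widths):
    """Clip each field to the width allowed by the target schema."""
    for row in reader:
        writer.writerow(field[:width] for field, width in zip(row, widths))

# Example: a bad record whose third field is too long for the schema.
bad = io.StringIO('evt-1,login,' + 'x' * 300 + '\n')
fixed = io.StringIO()
truncate_records(csv.reader(bad), csv.writer(fixed), MAX_WIDTHS)
print(fixed.getvalue())
```

The truncated output can then be appended to a fresh load file and run through the import script again.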
While the script supports Cygwin, an Oracle sqlldr limitation forces the use of very large temporary files under Cygwin. It is strongly recommended that you execute the upload scripts from a true Unix environment with stronger pipeline support, such as Solaris or Linux.
Use of the provided script is simple, but database administration is up to you. Depending on what you hope to do with your data, you will likely want to customize the ETL process to suit your needs; familiarity with sqlldr and basic Oracle database administration is therefore assumed here. Examine the provided scripts and make sure you understand what they do before using them.
You can easily export a binary log to a simple XML format legible to humans or other XML processing utilities:
> zcat path/to/log.bxml.gz | java -jar bt-utils.jar -tool xml > result.xml
Be careful, though; the default compressed-binary format has a compression ratio of around 20:1 compared to its plain-text counterpart, so you can use a lot of disk this way. If you plan to apply an XSL transform to the output document, consider the XSLT mode of the export tool, as outlined below.
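You can gauge how much disk a full export will need before running it: GZIP stores the uncompressed length (modulo 4 GiB, so treat it as an estimate for very large logs) in the last four bytes of the file. A minimal sketch; the generated sample file stands in for a real compressed log:

```python
import gzip
import os
import struct
import tempfile

def gzip_uncompressed_size(path):
    """Read the GZIP ISIZE trailer: original length modulo 2**32."""
    with open(path, 'rb') as f:
        f.seek(-4, os.SEEK_END)
        return struct.unpack('<I', f.read(4))[0]

# Build a small, highly repetitive stand-in for a compressed log.
payload = b'<event-log>' + b'<event/>' * 10000 + b'</event-log>'
with tempfile.NamedTemporaryFile(suffix='.bxml.gz', delete=False) as tmp:
    tmp.write(gzip.compress(payload))

estimate = gzip_uncompressed_size(tmp.name)
compressed = os.path.getsize(tmp.name)
print(f'compressed: {compressed} bytes, uncompressed estimate: {estimate} bytes')
```

Checking the estimate first lets you confirm there is enough free disk before committing to a multi-gigabyte export.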
Similarly, you can export to a CSV file for use in a spreadsheet or older EDI tools:
> zcat path/to/log.bxml.gz | java -jar bt-utils.jar -tool csv > result.csv
Again, assume that your CSV data will be quite a bit larger than the compressed-binary data.
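Before handing the CSV to a spreadsheet, a quick sanity check of the row and field counts can catch a truncated or malformed export early. A sketch using Python's csv module; the sample rows and column names here are stand-ins, not the actual columns the csv tool emits:

```python
import csv
import io

def summarize(csv_text):
    """Return (data_row_count, field_count), verifying rows are uniform."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    assert all(len(r) == len(header) for r in data), 'ragged rows'
    return len(data), len(header)

# Hypothetical stand-in for the first few lines of result.csv.
sample = 'id,name,elapsed\n1,login,12\n2,logout,3\n'
print(summarize(sample))  # (2, 3)
```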
You can certainly use the XML export and stream the result to an XSL transformer. However, executing XSL transforms on large XML documents can be extremely resource-intensive without specialized tools. Included with the log analysis package is an analyzer that splits a large input document into fragments based on an XPath query, and then applies an XSL transform to each fragment. For transforms that are stateless or only need to examine a small part of a document, this is vastly more efficient than loading an entire document to invoke the transform.
The following example splits the input document into one fragment per 'event' element, applying the given XSL transform file to each fragment and streaming the result to standard out:
> zcat path/to/log.bxml.gz | java -jar bt-utils.jar -tool xslt -split event-log/event -xsl etl/insert_events.xsl > result.csv
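When you do not need full XSLT, the same split-then-process idea can be mimicked with any streaming XML parser: handle each fragment as it completes, then discard it, so memory use stays flat regardless of document size. A sketch with Python's xml.etree.ElementTree.iterparse, using the event-log/event layout from the example above; the sample document stands in for real decompressed log output:

```python
import io
import xml.etree.ElementTree as ET

def each_event(stream):
    """Yield each completed <event> element, then free its memory."""
    for _, elem in ET.iterparse(stream, events=('end',)):
        if elem.tag == 'event':
            yield elem
            elem.clear()  # discard the fragment so memory stays flat

# Stand-in for a decompressed log; in practice the stream would come
# from zcat or gzip.open on the real .bxml.gz file.
doc = io.BytesIO(
    b'<event-log>'
    b'<event name="login" elapsed="12"/>'
    b'<event name="logout" elapsed="3"/>'
    b'</event-log>'
)
total = sum(int(e.get('elapsed')) for e in each_event(doc))
print(total)  # 15
```

Like the XSLT splitter, this only works for per-fragment processing that is stateless or nearly so; anything that must see the whole document still needs a full parse.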
The included file etl/insert_events.xsl provides an example transform document, including some custom XSL functions available to transforms invoked in this way.