Why I created a blog

Its been many years since I first created this blog. It has remained true to Essbase and related information over those years. Hopefully it has answered questions and given you insight over those years. I will continue to provide my observations and comments on the ever changing world of EPM. Don't be surprised if the scope of the blog changes and brings in other Hyperion topics.


Tuesday, September 18, 2012

A little surprise in 11.1.2.2 data loading

Today on OTN was a tread asking about wildcards in MaxL import statements. OTN Thread . I didn’t realize it was possible and up until 11.1.2.2 it was not. John Goodwin researched the new features guide and found

11.1.2.2 essbase new features readme
"Block Storage Parallel Data Load
Parallel data load refers to the concurrent loading of multiple data files into an Essbase database. When working with large data sets (for example, a set of ten 2 GB files), loading the data sources concurrently enables you to fully utilize the CPU resources and I/O channels of modern servers with multiple processors and high-performance storage subsystems.
Parallel data load uses multiple parallel pipelines on the server side, and multiple threads on the client-side, to load multiple data files concurrently, thus enabling data loads to be truly optimized to the capabilities of modern servers."

In the tech reference (for BSO cubes only) is the ability to use wild cards.

For the import statement

Specify whether the data import file(s) are local or on the server, and specify the type of import file(s).

To import from multiple files in parallel, use the wildcard characters * and/or ? in the IMP-FILE name so that all intended import files are matched.

  • * substitutes any number of characters, and can be used anywhere in the pattern. For example, day*.txt matches an entire set of import files ranging from day1.txt - day9.txt.

  • ?* substitutes one occurrence of any character, and can be used anywhere in the pattern. For example, 0?-*-2011.txt matches data source files named by date, for the single-digit months (Jan to Sept).

Example:

import database Sample.Basic
data from local data_file '/nfshome/data/foo*.txt'
using local rules_file '/nfshome/data/foo.rul'
on error abort;


and for



using max_threads INTEGER



Optionally specify a maximum number of threads to use, if this is a parallel data load.



Example:



import database Sample.Basic using max_threads 12
data from data_file '/nfshome/data/foo*.txt'
using rules_file '/nfshome/data/foo.rul'
on error write to 'nfshome/error/foo.err';


If this clause is omitted for a parallel data load, Essbase uses a number of pipelines equal to the lesser of number of files, or half the number of CPU cores.



While I have not had a chance to try this, it would have been very useful to me in the past. I’m guessing it was added for Exalytics efficiency, but we certainly reap the benefits of it.



Another tidbit



While not really a MaxL but more of a windows tip that has been around, but most don’t know. In a path statement in a windows batch file we know if we want to have a path statement in a script (like the import) most people use something like c:\\datafile\\Sample\\mydata.txt. We have to use the \\ because MaxL uses the backslash as an escape character. Did you know you can use forward slashes instead, just like on Unix.



c:/datafiles/Sample/mydat.txt. It can make your life easier