Multi-File Joins and Lookup Transforms

Joining large tables to satisfy queries taxes DBMS performance. There has also been no efficient way to compare large files and identify field changes (inserts, updates, deletes) over time.

By defining intersections in database extracts and legacy files, CoSort users can simultaneously discover, transform, and report on related data. By performing join and lookup functions on flat files, CoSort users can also (1) relieve the DBMS of query overhead and (2) incorporate mainframe/index file, spreadsheet, and other data into the process.
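A flat-file join can also answer the change-detection question raised above: which records were inserted, updated, or deleted between two extracts. The following is a minimal, generic sort-merge sketch in Python. It is not CoSort's implementation or syntax. It assumes both files are CSV with a unique key column (the hypothetical `id`) and that the data fit in memory.

```python
import csv
import io

def read_sorted(text, key):
    """Parse CSV text and return its rows sorted by the key column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda r: r[key])

def diff_files(old_rows, new_rows, key):
    """Merge two key-sorted row lists and classify each key as an
    insert, update, or delete. Unchanged records are skipped.
    Assumes each key appears at most once per file."""
    changes = []
    i = j = 0
    while i < len(old_rows) or j < len(new_rows):
        if j == len(new_rows) or (i < len(old_rows) and old_rows[i][key] < new_rows[j][key]):
            changes.append(("delete", old_rows[i][key]))  # key only in old file
            i += 1
        elif i == len(old_rows) or new_rows[j][key] < old_rows[i][key]:
            changes.append(("insert", new_rows[j][key]))  # key only in new file
            j += 1
        else:
            if old_rows[i] != new_rows[j]:  # same key, some field changed
                changes.append(("update", old_rows[i][key]))
            i += 1
            j += 1
    return changes

# Hypothetical sample extracts
old = "id,name,city\n1,Ann,Boston\n2,Bob,Denver\n3,Cy,Austin\n"
new = "id,name,city\n1,Ann,Boston\n2,Bob,Reno\n4,Di,Miami\n"
result = diff_files(read_sorted(old, "id"), read_sorted(new, "id"), "id")
print(result)  # [('update', '2'), ('delete', '3'), ('insert', '4')]
```

Sorting both inputs first lets the merge compare the files in a single linear pass, which is why sort-based tools lend themselves to this kind of join.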

“In addition to offloading DBMSs, multi-file joins offload data integration tools, by merging data before it hits the tool,” said Philip Russom, senior manager at The Data Warehousing Institute. “At the high end, this is useful with the distributed architectures that many users apply to scaling up their data integration solutions. At the other extreme, multi-file joins may eliminate the need for a data integration tool.”

Data cleansing, multi-table joins, and complex computations that produce discrete solutions are resource-intensive operations. Where a simple lookup can replace a runtime computation (e.g., a mathematical expression or pseudonymization), the performance gain can be significant, because retrieving a stored value from memory is faster than computing that value.
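The idea of replacing a computation with a lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how CoSort performs pseudonymization. A truncated SHA-256 hash stands in for the expensive transform, and the helper names `pseudonym` and `build_lookup` are hypothetical. The table is built once per distinct value, and every later occurrence is a dictionary retrieval.

```python
import hashlib

def pseudonym(value: str) -> str:
    """Stand-in for a costly runtime transform (illustrative only;
    an unsalted truncated hash is not a recommended pseudonymization scheme)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def build_lookup(values):
    """Compute the transform once per distinct input value."""
    return {v: pseudonym(v) for v in set(values)}

names = ["alice", "bob", "alice", "carol", "bob"]
table = build_lookup(names)              # 3 computations, not 5
masked = [table[n] for n in names]       # each row is a retrieval, not a recomputation
print(masked[0] == masked[2])            # True: same input, same pseudonym
```

The savings grow with the number of repeated values; for columns such as names or codes that recur across millions of rows, most records hit the table instead of the computation.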

To achieve these fast retrievals, CoSort users specify lookups against set files. By referencing multi-column files, users get faster answers to discrete questions, such as the correct zip code for a given city and state. Russom added that “when multi-column files are sources for a data warehouse, multi-dimensional file lookups can generate cubes and other multi-dimensional structures for the warehouse and analysis tools.”
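A multi-column lookup such as city and state to zip code amounts to indexing a file on a composite key. The Python sketch below shows the general idea. It does not reproduce CoSort's set-file format or syntax: the CSV layout, the sample rows, and the helpers `load_lookup` and `lookup_zip` are all illustrative assumptions. It also assumes one zip per (city, state) pair, although real cities often span many zip codes.

```python
import csv
import io

# Illustrative lookup file; assumes one row per (city, state) pair.
SET_FILE = """city,state,zip
Springfield,IL,62701
Springfield,MA,01103
Portland,OR,97201
Portland,ME,04101
"""

def load_lookup(text):
    """Index the file on the composite (city, state) key.
    Zips stay strings so leading zeros such as 01103 survive."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["city"], row["state"]): row["zip"] for row in reader}

def lookup_zip(table, city, state, default=None):
    """Return the zip for a city/state pair, or `default` if absent."""
    return table.get((city, state), default)

zips = load_lookup(SET_FILE)
print(lookup_zip(zips, "Springfield", "MA"))           # 01103
print(lookup_zip(zips, "Portland", "CA", "unknown"))   # unknown
```

Keying on both columns matters here: a lookup on city alone could not distinguish Springfield, IL from Springfield, MA.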

Find this article at: http://www.labnol.org/internet/pressrelease/multi-file-joins-and-lookup-transforms/1153/
