How to consume CSV files
Using Data Pipeline to ingest data
When you need to take in data from an external source, the first option you should consider is to use the platform's Data Pipeline component.
This enables you to define the data source, transform it to match fields in your tables, then insert or update the data in the specified table. This can be set up to handle streaming data or static files.
A good practical example is ingesting Copp Clark holiday data that arrives in CSV files.
You can also use the Data Pipeline component to send out data to external systems. This works the same way in reverse, taking data from tables in your Genesis application and transforming it into the format required by the target system.
Here, we are going to focus on incoming data in static files.
Transforming data
To create a Data Pipeline for ingesting data, you create a *-data-pipeline.kts file (for example, myapp-data-pipeline.kts) to define a pipeline that maps data from an external source (database, file) to tables in your application.
Each Data Pipeline must define three things; a minimal sketch follows this list:
- source specifies the location of the incoming data
- operator parses the source and maps the data to fields in the application's database; typically, you map the column headings in the incoming file to the relevant fields in the target table
- sink is the destination of the transformed data within your application. This can be a file or a queue, for example. You can use it to update the data further and store it somewhere else - such as a different database or a log
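Putting those three parts together, a pipeline definition has this shape. This is a minimal sketch only: the pipeline name and file location are illustrative assumptions, and the worked example later on this page fills in the mapping stage.
pipelines {
    pipeline("MY_PIPELINE") {
        // source: where the data comes from - here, a CSV file in a local folder (illustrative path)
        source(camelSource { location = "file:///data/inbound?fileName=holidays.csv" })
            // operator: split the file into rows, then map each row to the target table's fields
            .split(csvRawDecoder())
            .map { row -> TODO("map CSV columns to a table record - see the worked example below") }
            // sink: where the transformed data ends up - here, the application's database
            .sink(dbSink())
    }
}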
Examples
To show how to use Data Pipeline to ingest data, we have provided an example application. This covers different scenarios based on incoming Copp Clark holiday files, starting with the most basic case and progressing through increasing levels of complexity.
The examples are all within one complete application, which includes a front end so that you can run it and see the data. You can clone the repo to see the code, which includes comments at the key points to highlight what is being specified.
Setting up a pipeline in your own app
For instructions on how to create an app similar to our example, see the readme file in the repository.
When you want to create a Data Pipeline in your own application, there are other things you need to do in addition to creating the pipeline files themselves.
Processes.xml
Every process in your app must be configured in the application's *-processes.xml file. See the howto-csv-ingress-processes.xml file for a simple example.
At minimum, you need to:
- add genesis-pal-datapipeline to <module>
- add global.genesis.pipeline to <package>
- add the relevant GPAL script (e.g. howto-csv-ingress-cc-simple-data-pipeline.kts) to <script>
For example:
<processes>
  <process name="MYAPP_MANAGER">
    ...
    <module>genesis-pal-datapipeline</module>
    <package>global.genesis.pipeline</package>
    <script>myapp-simple-data-pipelines.kts</script>
    ...
  </process>
</processes>
We would then add code into the data pipeline file like this:
pipelines {
    pipeline("MY_PIPELINE") {
        // source: watch the folder configured in the system definition for the named file
        source(camelSource {
            location = getDefaultLocalFileCamelLocation(systemDefinition.get("FolderLocation").get(), "FileName")
        })
            // log each time the pipeline is triggered, passing the data through unchanged
            .map { LOG.info("Triggered MY_PIPELINE"); it }
            // split the file into rows, one per CSV line
            .split(csvRawDecoder())
            // map each row to the fields of the target table
            .map { row ->
                ...
            }
            // write the transformed data into the application's database
            .sink(dbSink())
            // run any follow-up action once the file has been processed, e.g. logging
            .onCompletion { LOG.info("MY_PIPELINE completed") }
    }
}
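The body of the second map stage is elided above; this is where you turn each CSV row into a database operation. As a hypothetical illustration - the HOLIDAY table, its fields and the CSV column names are assumptions, not taken from the example application, and we assume csvRawDecoder() yields each row as a map of column heading to value - the mapping could look something like this:
// Hypothetical mapping - table, field and column names are assumptions; adjust to your schema
.map { row: Map<String, String> ->
    DbOperation.Upsert(
        Holiday {
            holidayDate = DateTime.parse(row.getValue("HolidayDate"))
            countryCode = row.getValue("CountryCode")
        }
    )
}
DbOperation.Upsert inserts the record if it does not already exist and updates it otherwise; dbSink() then applies each operation to the database.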
System definition
Every application also has a main configuration file called genesis-system-definition.kts. Check this file and make sure that the application knows where to source the files from.
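For example, the pipeline above reads its folder location from a FolderLocation item. A minimal sketch, assuming that key and an illustrative path:
systemDefinition {
    global {
        // read by the pipeline's camelSource; the path is an illustrative assumption
        item(name = "FolderLocation", value = "/data/inbound/")
    }
}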
Testing
Go to our Testing page for details of our testing tools and the dependencies that you need to declare.
Full technical details
You can find full technical details of how to use the Data Pipeline in our reference documentation.