Hadoop HDFS Listener
Within the eiConsole or eiPlatform, the Hadoop HDFS Listener picks up files from the Hadoop Distributed File System (HDFS).
Listener (Adapter) Configuration Drop-Down List
Connection Hadoop HDFS Listener Configuration Options
On the Connection tab, specify the Host of the Hadoop master server to connect to and the Port on that server that is open for connections.
Hadoop HDFS Listener Connection Configuration Options
FileSystem Hadoop HDFS Listener Configuration Options
The FileSystem tab allows you to set:
- Polling Interval – to specify how often the directory is polled for new files (in seconds, minutes, hours, days or weeks)
- Polling Directory – to specify the directory path to be polled, from the root of the HDFS system
- File Name Restriction – to specify a string or regular expression that a file name must match in order to be retrieved
- File Extension Restriction – to specify a comma-separated list of extensions; a file must have one of these extensions in order to be retrieved
Hadoop HDFS Listener FileSystem Configuration Options
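The effect of the two restriction settings can be illustrated with a minimal sketch. This is not the Listener's implementation; the function names are hypothetical, and the sketch assumes that the name restriction is a regular expression that must match the whole file name and that extensions are compared case-insensitively.

```python
import re
from typing import Iterable, List


def matches_restrictions(file_name: str, name_pattern: str = "", extensions: str = "") -> bool:
    """Return True if file_name passes both (hypothetical) restrictions."""
    # Assumption: the name restriction must match the entire file name.
    if name_pattern and not re.fullmatch(name_pattern, file_name):
        return False
    # The extension restriction is a comma-separated list such as "csv, txt".
    allowed = [e.strip().lstrip(".").lower() for e in extensions.split(",") if e.strip()]
    if allowed:
        _, dot, ext = file_name.rpartition(".")
        # A file without any extension cannot satisfy a non-empty restriction.
        if not dot or ext.lower() not in allowed:
            return False
    return True


def select_files(names: Iterable[str], name_pattern: str = "", extensions: str = "") -> List[str]:
    """Keep only the names that would be retrieved under the given restrictions."""
    return [n for n in names if matches_restrictions(n, name_pattern, extensions)]
```

For example, with the pattern `orders_\d+\..*` and the extension list `csv, txt`, a directory listing of `orders_01.csv`, `orders_02.txt`, and `readme.md` would yield only the first two files.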
Advanced Hadoop HDFS Listener Configuration Options
The Advanced tab allows you to specify whether the Listener should run only when it is triggered externally, how many elements should be serialized, and whether to also scan sub-folders.
Also, you can set the following:
- Allow Command-Line Invocation – if enabled, the Listener can be invoked using the CLI client application
- Restart on Listening Error – if enabled, the Listener will be restarted after an error occurs
- FIFO Queue Name – the FIFO options enable a “First In, First Out” queuing mechanism between Listeners and Transports. If a FIFO Queue Name is provided, it is used as the key for a transaction queue. Transactions are written to this queue before they reach a Transport and are ordered according to when the Listener created them. Providing a queue name guarantees that a given Transport sends transactions in the same order in which the Listener created them.
- FIFO Queue Delay – the interval between updates or checks against that queue
Hadoop HDFS Listener Advanced Configuration Options
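The ordering guarantee described for the FIFO options can be sketched as a set of named queues that release transactions in creation order. This is an illustration only, not the product's queue; the `FifoQueues` class and its method names are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class FifoQueues:
    """Hypothetical named queues that release transactions in creation order."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[Tuple[float, int, str]]] = {}
        # Tie-breaker so that equal timestamps keep their arrival order.
        self._sequence = itertools.count()

    def put(self, queue_name: str, created_at: float, transaction_id: str) -> None:
        """Write a transaction to the queue keyed by queue_name."""
        heap = self._queues.setdefault(queue_name, [])
        heapq.heappush(heap, (created_at, next(self._sequence), transaction_id))

    def drain(self, queue_name: str) -> List[str]:
        """Return all queued transactions, oldest creation time first."""
        heap = self._queues.get(queue_name, [])
        ordered = []
        while heap:
            _, _, transaction_id = heapq.heappop(heap)
            ordered.append(transaction_id)
        return ordered
```

Even if transactions reach the queue out of order, draining it returns them by creation time, which is the behavior the queue name is meant to guarantee at the Transport.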
Transaction Logging Hadoop HDFS Listener Configuration Options
The Transaction Logging tab allows you to specify:
- Transaction Logging Enabled – if enabled, allows transaction events originating from this Listener to be logged by a TransactionEventListener
- Log Transaction Data – if enabled, logs transaction data body
- Log Transaction Data Base64 – if enabled, logs transaction data body as Base64
- Log Transaction Attributes – if enabled, logs transaction attributes
- Log All Attributes – if enabled, no attributes will be filtered
- Allowed Attributes – attributes that are allowed to be logged
Hadoop HDFS Listener Transaction Logging Configuration Options
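How these options could combine into a single log record is sketched below. The function and parameter names are hypothetical, and the sketch assumes that the Base64 option only changes how the data body is encoded when data logging is enabled; the actual interaction between the two options may differ.

```python
import base64
from typing import Any, Dict, Iterable


def build_log_record(
    body: bytes,
    attributes: Dict[str, Any],
    *,
    log_data: bool = True,
    log_base64: bool = False,
    log_attributes: bool = True,
    log_all_attributes: bool = False,
    allowed_attributes: Iterable[str] = (),
) -> Dict[str, Any]:
    """Assemble a hypothetical log record from the Transaction Logging options."""
    record: Dict[str, Any] = {}
    if log_data:
        if log_base64:
            # Base64 keeps binary payloads safe to store as text.
            record["data"] = base64.b64encode(body).decode("ascii")
        else:
            record["data"] = body.decode("utf-8", errors="replace")
    if log_attributes:
        if log_all_attributes:
            # No filtering: every attribute is logged.
            record["attributes"] = dict(attributes)
        else:
            allowed = set(allowed_attributes)
            record["attributes"] = {k: v for k, v in attributes.items() if k in allowed}
    return record
```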
Inactivity Hadoop HDFS Listener Configuration Options
The Inactivity tab allows you to specify:
- Enable Inactivity Monitor – check this box to enable inactivity monitoring. This will throw a non-transaction exception if the specified number of transactions has not been processed within the specified time interval.
- Min. Transactions to Expect – the number of transactions to expect to be completed per monitoring interval
- Monitoring Interval – how often to check whether the specified number of transactions has been processed
- Times to Monitor – if set, monitoring is performed only during the defined times of day. To ignore this setting, set the start and end times to the same value.
- Days to Exclude from Monitoring – inactivity monitoring will not occur on the days specified
- Include Errors in Transaction Count – if checked, transactions that attempted to start, but failed at the Listener stage, will also be counted
Hadoop HDFS Listener Inactivity Configuration Options
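The checks an inactivity monitor of this kind performs can be sketched as two small functions. The names are hypothetical, weekdays are represented as integers (Monday is 0), and the handling of a window that wraps past midnight is an assumption rather than documented behavior.

```python
from datetime import datetime, time
from typing import Iterable


def should_monitor(now: datetime, start: time, end: time,
                   excluded_weekdays: Iterable[int] = ()) -> bool:
    """Decide whether monitoring applies at this moment."""
    if now.weekday() in set(excluded_weekdays):
        return False
    if start == end:
        # Equal start and end times mean the time window is ignored.
        return True
    if start < end:
        return start <= now.time() < end
    # Assumption: a window such as 22:00-06:00 wraps past midnight.
    return now.time() >= start or now.time() < end


def is_inactive(completed: int, failed_at_listener: int, min_expected: int,
                include_errors: bool = False) -> bool:
    """Return True if fewer transactions than expected were seen in the interval."""
    count = completed + (failed_at_listener if include_errors else 0)
    return count < min_expected
```

A monitor would call `should_monitor` once per Monitoring Interval and, when it returns true, raise the non-transaction exception if `is_inactive` also returns true.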
Throttling Hadoop HDFS Listener Configuration Options
The Throttling tab allows you to specify:
- Throttling Mode – the throttling mode to use for limiting the number of transactions or messages emitted by this Listener. “Timed” will limit transactions based on time intervals, while “Concurrent” will limit based on a concurrent number of transactions. “Concurrent” mode requires a Throttling Response Processor step later in your interface workflow to acknowledge completion.
Hadoop HDFS Listener Throttling Mode
- Throttling Mechanism – the mechanism to use for throttling messages. “Blocking” prevents the Listener from continuing to process and emit messages altogether, while “Queued” pushes received messages into the interface queue or a default, in-memory queue.
- Max Concurrent Messages – how many messages can be processed concurrently, under either time-based limits (allow X per second) or synchronous limits (allow X at any time)
- Timed Emission Interval – the interval for time-based limits (allow X messages per timed emission interval)
- Synchronous Timeout Interval – the interval to wait for a synchronous response before failing
Hadoop HDFS Listener Throttling Configuration Options
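The difference between the “Timed” and “Concurrent” modes can be sketched with two small limiters. Both classes are hypothetical illustrations, not the product's throttling code; the timed variant uses a simple fixed window, and `acknowledge` stands in for the completion signal that a Throttling Response Processor would provide.

```python
class TimedThrottle:
    """Allow at most max_messages per interval (fixed-window sketch)."""

    def __init__(self, max_messages: int, interval_seconds: float) -> None:
        self.max_messages = max_messages
        self.interval_seconds = interval_seconds
        self._window_start = None
        self._count = 0

    def try_emit(self, now: float) -> bool:
        # Start a new window once the emission interval has elapsed.
        if self._window_start is None or now - self._window_start >= self.interval_seconds:
            self._window_start = now
            self._count = 0
        if self._count < self.max_messages:
            self._count += 1
            return True
        return False


class ConcurrentThrottle:
    """Allow at most max_messages in flight until each is acknowledged."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self.in_flight = 0

    def try_emit(self) -> bool:
        if self.in_flight < self.max_messages:
            self.in_flight += 1
            return True
        return False

    def acknowledge(self) -> None:
        # Called when a downstream step reports that a message has completed.
        if self.in_flight > 0:
            self.in_flight -= 1
```

When `try_emit` returns false, a “Blocking” mechanism would stop the Listener from emitting until capacity returns, whereas a “Queued” mechanism would park the message in a queue for later emission.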
Post-Process Hadoop HDFS Listener Configuration Options
The Post-Process tab allows you to specify what happens to a file after it has been picked up. You can Keep, Delete or Move the file. If you choose Move, the Target Directory is required; it specifies the directory that receives the processed files.
Hadoop HDFS Listener Post-Process Configuration Options
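The three post-process behaviors can be sketched as follows. The sketch uses the local file system through `pathlib` and `shutil` as a stand-in for HDFS, and the `post_process` function is a hypothetical name, not part of the product.

```python
import shutil
from pathlib import Path
from typing import Optional


def post_process(file_path: Path, action: str,
                 target_directory: Optional[Path] = None) -> Optional[Path]:
    """Apply Keep, Delete or Move to a processed file; return its final path, if any."""
    action = action.lower()
    if action == "keep":
        return file_path
    if action == "delete":
        file_path.unlink()
        return None
    if action == "move":
        # Move is the only action for which a target directory is required.
        if target_directory is None:
            raise ValueError("Target Directory is required when the action is Move")
        target_directory.mkdir(parents=True, exist_ok=True)
        destination = target_directory / file_path.name
        shutil.move(str(file_path), str(destination))
        return destination
    raise ValueError(f"Unknown post-process action: {action}")
```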
Scheduling Hadoop HDFS Listener Configuration Options
The Scheduling tab allows you to create a schedule for how often the chosen Listener should be run. You can easily modify the start time or end time.
- Scheduled Start Time – specify the scheduled start time. If left blank, the system will defer to the polling interval listed on the Basic tab.
- Scheduled End Time – specify the scheduled end time. If left blank, the system will defer to the polling interval listed on the Basic tab.
- Week Days to Exclude – specify days of the week to exclude from scheduling
- Dates to Exclude – specify individual dates to exclude from scheduling
- Time Zone – specify the Time Zone that should be used for scheduling. By default, it is set to the Time Zone of the eiConsole during the initial configuration.
To modify the scheduled start or end time, click the three dots next to the corresponding line. A dialog box like the following appears:
Hadoop HDFS Listener Scheduling Configuration Options
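How the scheduling options might combine into a single decision is sketched below. The `is_scheduled` function is hypothetical; the sketch assumes that a blank start or end time means no time window is applied (so the polling interval alone governs) and that exclusions are evaluated in the configured time zone.

```python
from datetime import date, datetime, time, tzinfo
from typing import Iterable, Optional


def is_scheduled(now_utc: datetime, tz: tzinfo,
                 start: Optional[time], end: Optional[time],
                 excluded_weekdays: Iterable[int] = (),
                 excluded_dates: Iterable[date] = ()) -> bool:
    """Return True if the Listener may run at now_utc under the schedule."""
    # Evaluate the schedule in the configured time zone, not in UTC.
    local = now_utc.astimezone(tz)
    if local.weekday() in set(excluded_weekdays):  # Monday is 0
        return False
    if local.date() in set(excluded_dates):
        return False
    if start is None or end is None:
        # Assumption: a blank start or end time applies no time window.
        return True
    return start <= local.time() < end
```

Passing the time zone explicitly keeps the example independent of the machine's local zone, which mirrors the purpose of the Time Zone option.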