The Hadoop HDFS listener is used to pick up files from the Hadoop HDFS filesystem.
1. On the Connection tab, specify the Hostname of the Hadoop master server to connect to and the Port on that server that is open for connections.
2. The FileSystem tab allows you to set:
the Polling interval (in seconds, minutes, hours, days or weeks) to specify how often the directory is polled for new files;
the Polling directory to specify the directory path to be polled, from the root of the HDFS system;
the File name restriction to specify a string or regular expression that a file name must match in order to be retrieved;
the File extension restriction to specify a comma-separated list of extensions, one of which the file must have in order to be retrieved.
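The two restrictions above can be pictured as a simple filter applied to each file name found in the polling directory. The following is a minimal sketch only, not the Listener's actual implementation: the function name `matches_restrictions` and its parameters are hypothetical, and it assumes the regular expression is applied as a search anywhere in the name and that extensions are compared case-insensitively.

```python
import re


def matches_restrictions(file_name, name_pattern=None, extensions=None):
    """Return True if file_name passes both (hypothetical) restrictions.

    name_pattern: regular expression the file name must match (assumed: search).
    extensions:   comma-separated list such as "csv, txt" (assumed: case-insensitive).
    """
    # File name restriction: skip files whose name does not match the pattern.
    if name_pattern and not re.search(name_pattern, file_name):
        return False

    # File extension restriction: the file must have one of the listed extensions.
    if extensions:
        allowed = {
            ext.strip().lstrip(".").lower()
            for ext in extensions.split(",")
            if ext.strip()
        }
        _, dot, ext = file_name.rpartition(".")
        if not dot or ext.lower() not in allowed:
            return False

    return True


candidates = ["orders_2024.csv", "orders_2024.xml", "invoice.csv", "README"]
picked = [f for f in candidates if matches_restrictions(f, r"^orders_", "csv, txt")]
```

With the values shown, only `orders_2024.csv` satisfies both the name pattern and the extension list.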
3. The Advanced tab allows you to specify whether the Listener should run only when it is triggered externally, how many elements should be serialized, and whether to also scan sub-folders.
You can also set:
Initialize on trigger only: If enabled, the Listener doesn’t start up until a trigger initializes it.
Allow command-line invocation: If enabled, the listener can be invoked using the CLI client application.
Restart on listening error: If enabled, the listener will be restarted after an error occurs.
FIFO Queue Name: The FIFO field enables a “First In, First Out” queuing mechanism between Listeners and Transports. If a “FIFO Queue Name” is provided, that name will be used as the key for a queue that Transactions will be pushed into before reaching a Transport. They are ordered in this queue according to when the Listener created them.
FIFO Queue Delay: The interval between updates/checks against that queue. Providing a queue name guarantees that a given Transport sends Transactions in the same order the Listener created them.
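The ordering guarantee described for the FIFO queue can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how the product implements its queues: the class `FifoQueues`, its methods, and the `created_seq` creation counter are all hypothetical names introduced for illustration.

```python
import heapq
from collections import defaultdict


class FifoQueues:
    """Hypothetical model: one queue per FIFO Queue Name, ordered by creation."""

    def __init__(self):
        # Maps a queue name to a heap of (created_seq, transaction) pairs.
        self._queues = defaultdict(list)

    def push(self, queue_name, created_seq, transaction):
        # created_seq is assumed to be a unique counter assigned by the
        # Listener at the moment it creates the transaction.
        heapq.heappush(self._queues[queue_name], (created_seq, transaction))

    def drain(self, queue_name):
        # Called on each check of the queue (the "FIFO Queue Delay" interval);
        # returns transactions in the order the Listener created them.
        heap = self._queues[queue_name]
        ordered = []
        while heap:
            ordered.append(heapq.heappop(heap)[1])
        return ordered


queues = FifoQueues()
# Transactions may finish intermediate processing out of order...
queues.push("hdfs-orders", 2, "second.csv")
queues.push("hdfs-orders", 1, "first.csv")
queues.push("hdfs-orders", 3, "third.csv")
# ...but the Transport still receives them in creation order.
sent = queues.drain("hdfs-orders")
```

Keying the queues by name is what lets several Listeners and Transports share one ordering: everything pushed under the same name is released in a single sequence.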
4. The Postprocess tab allows you to specify what happens to a file after it has been picked up. You can Keep, Delete or Move the file. If you choose Move, the Target directory becomes required; it specifies the directory where processed files are placed.
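The three post-processing options can be sketched as follows. This is an illustration only: it uses the local filesystem through Python's standard library as a stand-in for HDFS, and the function name `postprocess` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def postprocess(path, action, target_dir=None):
    """Apply a hypothetical Keep / Delete / Move action to a picked-up file.

    Returns the file's resulting location, or None if it was deleted.
    """
    path = Path(path)
    if action == "keep":
        # Leave the file where it was found.
        return path
    if action == "delete":
        path.unlink()
        return None
    if action == "move":
        # Mirrors the rule that Target directory is required for Move.
        if not target_dir:
            raise ValueError("Target directory is required when the action is Move")
        target = Path(target_dir)
        target.mkdir(parents=True, exist_ok=True)
        return Path(shutil.move(str(path), str(target / path.name)))
    raise ValueError(f"Unknown action: {action!r}")
```

For example, `postprocess("/data/in/orders.csv", "move", "/data/processed")` would relocate the file and return its new path. Note that with Keep the file stays in the polling directory, so this sketch does nothing to prevent it from being seen again on the next poll.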
5. The Scheduling configuration tab allows you to specify the start and end times for scheduled execution. In addition, you can specify days of the week or particular dates to exclude from scheduling. The Time Zone drop-down menu allows you to specify the Time Zone that should be used for scheduling; by default, it is set to the Time Zone of the console during initial configuration.
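As a rough illustration of how a start/end window combines with excluded days and dates, consider the sketch below. It is not the product's scheduler: the function name `is_scheduled` and its parameters are hypothetical, and it assumes `now` has already been converted to the configured Time Zone.

```python
from datetime import date, datetime, time


def is_scheduled(now, start, end, excluded_weekdays=frozenset(), excluded_dates=frozenset()):
    """Return True if the (hypothetical) schedule allows execution at `now`.

    now:               datetime already expressed in the configured Time Zone.
    start, end:        daily window boundaries as datetime.time values.
    excluded_weekdays: weekday numbers to skip (Monday is 0, Sunday is 6).
    excluded_dates:    specific datetime.date values to skip.
    """
    if now.weekday() in excluded_weekdays:
        return False
    if now.date() in excluded_dates:
        return False
    return start <= now.time() <= end


# Window of 08:00 to 17:00, skipping weekends and one particular date.
allowed = is_scheduled(
    datetime(2024, 1, 2, 9, 30),
    start=time(8, 0),
    end=time(17, 0),
    excluded_weekdays={5, 6},
    excluded_dates={date(2024, 1, 1)},
)
```

This sketch only covers a window that starts and ends on the same day; a window spanning midnight would need additional handling.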