Processing multiple files

Provided that the reader supports it, you can also process multiple files, one after the other. For that, you either specify them explicitly (as multiple arguments to the --input option) or use glob syntax (e.g., --input "*.json"). In the latter case, you should surround the argument with double quotes to prevent the shell from expanding the file names itself.
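The two styles look roughly like this; the reader and writer names (from-data, to-data) are placeholders only, so substitute the actual reader and writer for your data format:

    # input files listed explicitly
    llm-convert \
        from-data --input file1.json file2.json \
        to-data --output out/

    # glob syntax, quoted so the shell does not expand it
    llm-convert \
        from-data --input "*.json" \
        to-data --output out/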

If you have a lot of files, it is more efficient to store them in text files (one file per line) and pass these to the reader using the --input_list option (assuming that the reader supports it). Such file lists can be generated with the llm-find tool; see Locating files for examples.
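The following sketch assumes a list file named files.txt with one input file per line; it is created here with the standard find command, but the llm-find tool works just as well, and the reader/writer names are again placeholders:

    # create the file list, one path per line
    find data/ -name "*.json" > files.txt

    # pass the list to the reader instead of the files themselves
    llm-convert \
        from-data --input_list files.txt \
        to-data --output out/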

As for the output, you simply specify an output directory. The output file name is then generated automatically from the name of the input file currently being processed.
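For example, assuming the placeholder writer from above produces .jsonl files, a run like the following creates one output file per input file:

    llm-convert \
        from-data --input "*.json" \
        to-data --output out/
    # out/ now contains, e.g., out/file1.jsonl and out/file2.jsonl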

If you want to compress the output files, you need to specify your preferred compression format via the global -c/--compression option of the llm-convert tool. By default, no compression is used.
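A sketch of enabling compression follows; the format name gzip is an assumption, so check llm-convert --help for the values that are actually supported:

    # -c/--compression is a global option, passed to llm-convert itself
    llm-convert -c gzip \
        from-data --input "*.json" \
        to-data --output out/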

Please note that when using a stream writer (e.g., for text or jsonlines output) in conjunction with an output directory, each record gets stored in a separate file. In order to write all records into a single file, you have to explicitly specify that file as the output.
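The difference looks roughly like this, with to-jsonlines standing in as a placeholder name for a stream writer and combined.jsonl as a hypothetical output file:

    # output directory: one file per record
    llm-convert \
        from-data --input "*.json" \
        to-jsonlines --output out/

    # explicit output file: all records collected in a single file
    llm-convert \
        from-data --input "*.json" \
        to-jsonlines --output out/combined.jsonl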