Home
The llm-dataset-converter library (and its dependent libraries) can be used for converting Large Language Model (LLM) datasets from one format into another. It has support for the following domains:
- Pretrain
- Supervised
- Classification
- Pairs (Q&A, P/R)
- Translation
Please refer to the dataset formats section for more details on supported formats.
The library does more than convert datasets: you can also slot in complex filter pipelines to process or clean the data along the way.
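To make the idea of a conversion with a filter pipeline concrete, here is a minimal, library-independent sketch in plain Python. It does not use the llm-dataset-converter API; every function name (`read_alpaca`, `strip_whitespace`, `drop_empty_output`, `write_csv`, `convert`) is hypothetical and exists only for illustration. It assumes an Alpaca-style JSON array as input and a CSV file of pairs as output.

```python
import csv
import io
import json


def read_alpaca(text):
    """Reader (hypothetical): yield one record per entry of an Alpaca-style JSON array."""
    for item in json.loads(text):
        yield {
            "instruction": item.get("instruction", ""),
            "input": item.get("input", ""),
            "output": item.get("output", ""),
        }


def strip_whitespace(records):
    """Filter (hypothetical): trim leading/trailing whitespace in every field."""
    for rec in records:
        yield {key: value.strip() for key, value in rec.items()}


def drop_empty_output(records):
    """Filter (hypothetical): discard records whose output is blank."""
    for rec in records:
        if rec["output"].strip():
            yield rec


def write_csv(records):
    """Writer (hypothetical): serialize the records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["instruction", "input", "output"],
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


def convert(text, filters):
    """Chain reader -> filters (in order) -> writer."""
    records = read_alpaca(text)
    for apply_filter in filters:
        records = apply_filter(records)
    return write_csv(records)


# Illustrative sample data, not taken from any real dataset.
SAMPLE = json.dumps([
    {"instruction": " Name a color. ", "input": "", "output": "Blue"},
    {"instruction": "Say hi", "input": "", "output": "   "},
])

if __name__ == "__main__":
    print(convert(SAMPLE, [strip_whitespace, drop_empty_output]))
```

Because each stage is a generator, records stream through the pipeline one at a time, and filters can be reordered or removed without touching the reader or the writer. The library itself expresses such pipelines with its own tooling; see the usage examples listed on this site for the actual syntax.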
On this website you can find examples for:
- Downloader usage
- General usage
- Processing multiple files
- Locating files
- Compression
- Filter usage
- Docker usage
Examples for the additional libraries: