How can I build a high-quality dataset?

I want to build a high-quality Persian assistant dataset for fine-tuning a small language model (SLM).

I have already used models like ChatGPT to generate a small Persian assistant dataset, but the overall quality was not good enough. Since the model I want to fine-tune is small, I need a larger dataset with much less noise and better overall quality.

How can I build a fine-tuning dataset like this while keeping the quality high?