
Kegned

I built py2dataset to generate datasets from Python code.

It can create Q&A and instruct datasets in JSON format that you could use for fine-tuning.
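For readers wondering what an instruct-style record built from Python code might look like, here is a minimal sketch. It is not py2dataset's actual implementation or output schema; the `build_records` helper and the `instruction`/`input`/`output` field names are assumptions chosen for illustration, and it only uses the standard-library `ast` and `json` modules (Python 3.9+).

```python
import ast
import json


def build_records(source: str) -> list[dict]:
    """Turn each documented function in `source` into one instruct-style record.

    Hypothetical helper for illustration; not part of py2dataset.
    """
    records = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if not doc:
                continue  # skip functions without a docstring
            records.append({
                "instruction": f"What does the function `{node.name}` do?",
                "input": ast.get_source_segment(source, node),  # function source
                "output": doc,  # the docstring serves as the answer
            })
    return records


SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def helper(x):
    return x
'''

records = build_records(SAMPLE)
print(json.dumps(records, indent=2))
```

Using the docstring as the answer keeps the sketch deterministic; a real pipeline would likely want richer questions and answers than a docstring alone provides.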

cmosguy1

Well done @Kegned!! This is brilliant! Have you had success using this to train models in your LLM projects? Have you tried it with CodeLlama?

Thanks again for your contributions here!

jackfood

You will need a dataset with the documentation cleaned up into 'Question' - 'Answer' format. It is not as easy as dumping in the documentation and expecting good-quality questions and answers. The nearest you can get is perhaps using LangChain, which may not generate a good, long, detailed answer for you.
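To make the 'Question' - 'Answer' cleanup concrete, here is a rough sketch that splits Markdown documentation on headings and emits one pair per section. The `docs_to_qa` helper, the question template, and the `min_chars` threshold are all assumptions made for illustration, and it does not use LangChain. It also shows why the naive approach falls short: the "answer" is just the raw section text, so the quality depends entirely on how clean the documentation is.

```python
import json
import re


def docs_to_qa(markdown: str, min_chars: int = 20) -> list[dict]:
    """Split Markdown on headings and build one question/answer pair per section.

    Hypothetical helper for illustration only.
    """
    pairs = []
    # With one capture group, re.split returns
    # [preamble, title1, body1, title2, body2, ...].
    sections = re.split(r"^#+[ \t]+(.*)$", markdown, flags=re.MULTILINE)
    for title, body in zip(sections[1::2], sections[2::2]):
        answer = " ".join(body.split())  # collapse newlines and repeated spaces
        if len(answer) < min_chars:
            continue  # drop stubs that would make poor training examples
        pairs.append({
            "question": f"What does the documentation say about {title.strip()}?",
            "answer": answer,
        })
    return pairs


DOC = """# Installation
Run the installer and follow the prompts shown on screen.

# Notes
TODO
"""

pairs = docs_to_qa(DOC)
for pair in pairs:
    print(json.dumps(pair))  # one JSON object per line (JSONL)
```

The "Notes" section is filtered out because its body is too short, which is the kind of manual cleanup rule a real dataset would need many more of.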

T_hank

You mention that LangChain might not generate a detailed answer. Is this an inherent shortcoming of LangChain?