How to pre-train or fine-tune an LLM on a structured dataset so that it can reason about the relationships between data objects

brook-l · December 1, 2023, 3:17am

I want to train an LLM on a structured dataset, such as a database with multiple tables. Here is a simple example:
t_user:

id	name
1	Jason
2	Eric
3	David

t_book:

id	title
1	Gone with The Wind
2	Brave New World
3	Native Son

t_bookstore:

id	name
1	The Book Nook
2	The Literary Loft
3	Wordsmith Books

t_order:

user_id	bookstore_id	book_id
1	2	1
1	2	2
2	3	3

After training, the LLM should be able to reason over this relational database. For example, if I ask:
"How many books did Jason buy at The Literary Loft bookstore?", it should answer: "2 books: Gone with The Wind and Brave New World."
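For reference, the expected answer is what you get by following the foreign keys in t_order to the other three tables. Below is a minimal stdlib-only Python sketch of that lookup, assuming Python 3.9+; the helper name books_bought is hypothetical. Something like this can serve as ground truth when generating or checking training examples.

```python
# Tables from the question, stored as plain Python structures.
t_user = {1: "Jason", 2: "Eric", 3: "David"}
t_book = {1: "Gone with The Wind", 2: "Brave New World", 3: "Native Son"}
t_bookstore = {1: "The Book Nook", 2: "The Literary Loft", 3: "Wordsmith Books"}
# Each t_order row is (user_id, bookstore_id, book_id).
t_order = [(1, 2, 1), (1, 2, 2), (2, 3, 3)]


def books_bought(user_name: str, store_name: str) -> list[str]:
    """Hypothetical helper: follow t_order's foreign keys to list the titles a user bought in a store."""
    return [
        t_book[book_id]
        for user_id, store_id, book_id in t_order
        if t_user[user_id] == user_name and t_bookstore[store_id] == store_name
    ]


titles = books_bought("Jason", "The Literary Loft")
print(f"{len(titles)} books: {', '.join(titles)}")
# prints: 2 books: Gone with The Wind, Brave New World
```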

How can I prepare a training corpus from the database, and how should I train the LLM on it?
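One common approach (not the only one) is to resolve the joins yourself and emit text the model can learn from: natural-language fact sentences for continued pre-training, plus templated question/answer pairs for supervised fine-tuning. Below is a minimal stdlib-only sketch, assuming Python 3.9+. The question template and the "prompt"/"completion" field names are assumptions; use whatever format your training framework expects.

```python
import json
from itertools import product

# Tables from the question; t_order rows are (user_id, bookstore_id, book_id).
t_user = {1: "Jason", 2: "Eric", 3: "David"}
t_book = {1: "Gone with The Wind", 2: "Brave New World", 3: "Native Son"}
t_bookstore = {1: "The Book Nook", 2: "The Literary Loft", 3: "Wordsmith Books"}
t_order = [(1, 2, 1), (1, 2, 2), (2, 3, 3)]


def fact_sentences() -> list[str]:
    """Turn each order row into a sentence with every foreign key resolved to a name."""
    return [f"{t_user[u]} bought {t_book[b]} at {t_bookstore[s]}." for u, s, b in t_order]


def qa_pairs() -> list[dict]:
    """One templated question per (user, store) pair, answered by joining t_order."""
    records = []
    for (uid, user), (sid, store) in product(t_user.items(), t_bookstore.items()):
        titles = [t_book[b] for u, s, b in t_order if u == uid and s == sid]
        n = len(titles)
        answer = f"{n} book{'' if n == 1 else 's'}"
        if titles:
            answer += ": " + ", ".join(titles)
        records.append({
            "prompt": f"How many books did {user} buy at {store}?",
            "completion": answer + ".",
        })
    return records


corpus = fact_sentences()  # plain-text facts, e.g. for continued pre-training
records = qa_pairs()       # question/answer pairs, e.g. for supervised fine-tuning
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
print(jsonl)
```

Generating questions for every combination, including ones whose answer is zero, gives the model negative examples as well as positive ones.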

I have looked at some solutions, such as asking the LLM to translate prompts into SQL queries, but that's not what I want.

panigrah · December 1, 2023, 9:40am

What about table question answering? See "What is Table Question Answering?" on Hugging Face.

brook-l · December 4, 2023, 1:41am

Table question answering seems to handle only a single table; it can't reason about relationships across multiple tables. Are there any other suggestions?
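One workaround for a single-table model is to perform the joins yourself and pass it one denormalized table. Below is a minimal sketch, assuming Python 3.9+; the helper name denormalize is hypothetical. It builds a dict mapping each column name to a list of string values. To my understanding, this dict-of-columns shape is accepted by Hugging Face's table-question-answering pipeline, but check the documentation for the model you use.

```python
# Tables from the question; t_order rows are (user_id, bookstore_id, book_id).
t_user = {1: "Jason", 2: "Eric", 3: "David"}
t_book = {1: "Gone with The Wind", 2: "Brave New World", 3: "Native Son"}
t_bookstore = {1: "The Book Nook", 2: "The Literary Loft", 3: "Wordsmith Books"}
t_order = [(1, 2, 1), (1, 2, 2), (2, 3, 3)]


def denormalize() -> dict[str, list[str]]:
    """Hypothetical helper: inner-join t_order with the three lookup tables into one wide table."""
    table: dict[str, list[str]] = {"user": [], "bookstore": [], "book": []}
    for user_id, store_id, book_id in t_order:
        table["user"].append(t_user[user_id])
        table["bookstore"].append(t_bookstore[store_id])
        table["book"].append(t_book[book_id])
    return table


table = denormalize()
print(table)
```

Because this is an inner join on t_order, entities that have no orders (David and The Book Nook, for example) do not appear in the flattened table. The approach also gets unwieldy as the schema grows.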

panigrah · December 5, 2023, 2:07am

There is at least one model for multi-table QA. Would that work for your data?
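If a multi-table model (or a general LLM) takes its tables as text, they must be serialized into a single context that includes table names and headers, so the join keys are visible. The exact input format depends on the model. The pipe-delimited layout and the `table:`/`question:` prefixes below are assumptions for illustration, and linearize is a hypothetical helper.

```python
# Each table: first row is the header, remaining rows are data.
tables = {
    "t_user": [["id", "name"], [1, "Jason"], [2, "Eric"], [3, "David"]],
    "t_book": [["id", "title"], [1, "Gone with The Wind"], [2, "Brave New World"], [3, "Native Son"]],
    "t_bookstore": [["id", "name"], [1, "The Book Nook"], [2, "The Literary Loft"], [3, "Wordsmith Books"]],
    "t_order": [["user_id", "bookstore_id", "book_id"], [1, 2, 1], [1, 2, 2], [2, 3, 3]],
}


def linearize(tables: dict[str, list[list]]) -> str:
    """Hypothetical helper: serialize several tables into one text block, one table per paragraph."""
    blocks = []
    for name, rows in tables.items():
        lines = [f"table: {name}"]
        lines += [" | ".join(str(cell) for cell in row) for row in rows]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


context = linearize(tables)
question = "How many books did Jason buy at The Literary Loft?"
prompt = f"{context}\n\nquestion: {question}"
print(prompt)
```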
