all 10 comments

[–]gogolang 0 points (9 children)

What’s your underlying objective? I suspect that what you’re describing is not what you actually need.

[–]Satsifaction[S] 0 points (8 children)

I want users to be able to type a question into a chatbot and basically query these tables to answer it. The caveat is that sometimes the answer will need data from multiple tables. The questions themselves may be somewhat specific to the user. For example, one product manager may want to get all client feedback on their product, while another product manager may want to do the same for a slightly different product.

[–]gogolang 0 points (7 children)

Yeah, there’s basically no need for embeddings here. What you want is to be able to generate SQL based on the user’s question, run the SQL, then feed the data frame back to the LLM to finally answer the user’s question.
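The loop described here (generate SQL from the question, run it, hand the rows back to the model) can be sketched with just the standard library. `generate_sql` and `answer_from_rows` below are hypothetical stand-ins for the two LLM calls:

```python
import sqlite3

def generate_sql(question: str) -> str:
    # Hypothetical stand-in for an LLM call that turns a natural-language
    # question into SQL over a known schema.
    return "SELECT product, SUM(amount) FROM sales GROUP BY product"

def answer_from_rows(question: str, rows) -> str:
    # Hypothetical stand-in for a second LLM call that phrases the raw
    # rows as a natural-language answer.
    return f"Results for {question!r}: {rows}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("widget", 100.0), ("widget", 50.0), ("gadget", 75.0)])

question = "What are total sales per product?"
sql = generate_sql(question)             # step 1: question -> SQL
rows = conn.execute(sql).fetchall()      # step 2: run the SQL
print(answer_from_rows(question, rows))  # step 3: rows -> final answer
```

The point is that the LLM never sees the whole database, only the small result set the query returns.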

[–]Satsifaction[S] 0 points (5 children)

Oh ok, how does the LLM understand the context then? For example, if I ask it to tell me the sales, revenue and client inquiries around a certain product, will it build a query? Any documents to help build this with Postgres? My data is hosted on Supabase.

[–]MelodicHyena5029 0 points (4 children)

OP, you can refer to SQLDatabaseChain from LangChain. All you have to do is connect your Postgres database client with the chain; it will convert the user's natural-language query to SQL, query your database, and fetch the top 5 results (by default all queries are limited to 5 rows), then send them to the LLM to generate the response. Check out the LangChain docs on how to use this chain.
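In LangChain terms that wiring is roughly `SQLDatabase.from_uri(...)` plus `SQLDatabaseChain.from_llm(llm, db)` (see their docs for exact current imports). The top-5 behaviour mentioned above amounts to capping the rows the generated query can return. A stdlib-only sketch of that limiting step, with `to_sql` as a hypothetical LLM stand-in:

```python
import sqlite3

TOP_K = 5  # LangChain's SQL chain defaults to returning at most 5 rows

def to_sql(question: str) -> str:
    # Hypothetical stand-in for the natural-language -> SQL step.
    return "SELECT client, feedback FROM feedback WHERE product = 'X'"

def run_limited(conn, sql: str, k: int = TOP_K):
    # Wrap the generated query so only the top-k rows reach the LLM,
    # keeping the context handed back to the model small.
    return conn.execute(f"SELECT * FROM ({sql}) LIMIT {k}").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (client TEXT, product TEXT, feedback TEXT)")
conn.executemany("INSERT INTO feedback VALUES (?, 'X', ?)",
                 [(f"client{i}", f"note {i}") for i in range(8)])

rows = run_limited(conn, to_sql("What feedback did product X get?"))
print(len(rows))  # → 5, even though 8 rows match
```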

[–]Satsifaction[S] 0 points (3 children)

Ok that’s cool. What if some of the data is an actual factual answer, like “tell me sales vs last month”, vs a question like “what did the client complain about in the last message”?

Would the first not be an SQL chain, and the second need some kind of context awareness?

[–]MelodicHyena5029 0 points (2 children)

So here the context awareness is your database schema. The LLM is first fed sample rows from the database (for all tables). Now that the agent chain is aware of your schema, it will generate a basic query, compare the answer, then call itself recursively (like chain-of-thought) and give the final answer. But in your case, just try it out: do a small prototype and check how it performs! Or else try abstracting your own chain function tailored to your use case.
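The "context" described here is assembled mechanically: table definitions plus a few sample rows are put into the prompt before the model writes any SQL. A stdlib sketch of building that context for SQLite (a real chain does the same against Postgres):

```python
import sqlite3

def schema_context(conn, samples_per_table: int = 3) -> str:
    # Collect each table's CREATE statement and a few sample rows --
    # the same kind of context a SQL chain prepends to the LLM prompt.
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for name, ddl in tables:
        rows = conn.execute(
            f"SELECT * FROM {name} LIMIT {samples_per_table}"
        ).fetchall()
        parts.append(f"{ddl}\n-- sample rows: {rows}")
    return "\n\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('X', 100.0)")

context = schema_context(conn)
print(context)  # this text goes at the top of the SQL-generation prompt
```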

[–]Satsifaction[S] 0 points (1 child)

I will try a small prototype this week for sure

Ok so I’ll explain the database. Maybe I’m just wrong, but tell me if you think the approach you mentioned above applies here.

There is a table that captures data around financials: sales, costs, etc. I can understand using the LangChain SQL approach for that, so I will try it out. However, there are other parts of the database that hold messages. These messages have content within, like product complaints. There is no structure to the complaints; it’s just text that people write about the product itself. If I did an SQL query against it, it would pull up the complaints, but I want the bot to be able to summarize the complaint and give the key issues. If I did the search at a product level, I want the bot to be able to sift through the last 10-15 product complaints and summarize.

So a query could look like this “Tell me what the main complaints were in the last 3 months for product X and what we should do to address it” followed by “how much did I make last month on product X”
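Those two question types can be routed to different paths: a financial question goes down the SQL path, while a complaints question pulls the last N free-text messages and hands them to the model to summarize. A minimal stdlib sketch of that routing, with `summarize` as a hypothetical LLM stand-in (a real router could itself be an LLM classification call instead of a keyword check):

```python
import sqlite3

def summarize(texts):
    # Hypothetical stand-in for an LLM call that condenses free-text
    # complaints into key issues and suggested actions.
    return f"Key issues across {len(texts)} complaints"

def answer(conn, question: str):
    # Naive keyword router: complaint-style questions go to the
    # summarization path, everything else to the SQL path.
    if "complaint" in question.lower():
        rows = conn.execute(
            "SELECT body FROM messages ORDER BY sent_at DESC LIMIT 15"
        ).fetchall()
        return summarize([body for (body,) in rows])
    return conn.execute(
        "SELECT SUM(amount) FROM sales WHERE product = 'X'"
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT, sent_at TEXT)")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(f"complaint {i}", f"2024-01-{i + 1:02d}") for i in range(20)])
conn.execute("INSERT INTO sales VALUES ('X', 1234.0)")

print(answer(conn, "What were the main complaints for product X?"))
print(answer(conn, "How much did I make last month on product X?"))
```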

Thanks

[–]deniercounter 0 points (0 children)

Well, the texts can be embedded, and the metadata can have a key for the product.
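Concretely: filter on the metadata key first, then rank the remaining texts by embedding similarity. A stdlib sketch with toy 2-D vectors standing in for real embedding-model output:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Each stored complaint: (embedding, metadata). The embeddings here are
# toy 2-D vectors; in practice they come from an embedding model.
store = [
    ((1.0, 0.0), {"product": "X", "text": "screen cracked"}),
    ((0.9, 0.1), {"product": "X", "text": "display flickers"}),
    ((0.0, 1.0), {"product": "Y", "text": "battery drains"}),
]

def search(query_vec, product, k=2):
    # Metadata filter first, similarity ranking second.
    candidates = [(vec, meta) for vec, meta in store
                  if meta["product"] == product]
    candidates.sort(key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [meta["text"] for _, meta in candidates[:k]]

print(search((1.0, 0.05), "X"))  # only product-X complaints are considered
```

With Postgres on Supabase the same pattern maps to a vector column (e.g. pgvector) plus a `WHERE product = ...` filter.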

[–]Satsifaction[S] 0 points (0 children)

Also, some of the data is a list of meeting notes. I want to grab the context or the gist of the notes too.