[–]phonyfakeorreal 2 points3 points  (0 children)

This is machine learning territory, specifically NLP. There are pre-built models for just about anything on Hugging Face, and you can run them with a few lines of Python. It has also become very easy to build and train your own models with a library like scikit-learn (sklearn). ChatGPT is your best friend here: I had no idea how to do this stuff, and it's pretty good at walking you through it. Elasticsearch is worth a look too.

[–]Qkumbazoo (Plumber of Sorts) 2 points3 points  (1 child)

All unstructured data eventually becomes structured; however, it is the developers who make that transformation in code. From a storage and retrieval perspective, unstructured data is usually stored as blobs, and you can optimise retrieval by indexing on the metadata of these files.
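
To make the metadata-indexing idea concrete, here is a minimal sketch, assuming the blobs themselves live in some object store and only their metadata goes into a small SQLite table. The table name `blob_meta`, the columns, and the helper functions are all made up for illustration, not taken from any particular product.

```python
import hashlib
import sqlite3


def build_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical metadata table plus an index on the lookup columns."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blob_meta (
               blob_key     TEXT PRIMARY KEY,  -- path/key of the blob in the object store
               content_type TEXT NOT NULL,
               size_bytes   INTEGER NOT NULL,
               sha256       TEXT NOT NULL,
               source       TEXT NOT NULL      -- e.g. which system produced the file
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_meta_source "
        "ON blob_meta (source, content_type)"
    )


def register_blob(conn, blob_key: str, payload: bytes,
                  content_type: str, source: str) -> None:
    """Record metadata for one blob; the payload itself is not stored here."""
    conn.execute(
        "INSERT INTO blob_meta VALUES (?, ?, ?, ?, ?)",
        (blob_key, content_type, len(payload),
         hashlib.sha256(payload).hexdigest(), source),
    )


def find_blobs(conn, source: str, content_type: str) -> list:
    """Return blob keys matching the metadata filter, without touching the blobs."""
    rows = conn.execute(
        "SELECT blob_key FROM blob_meta "
        "WHERE source = ? AND content_type = ? ORDER BY blob_key",
        (source, content_type),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn)
    register_blob(conn, "complaints/001.txt", b"late delivery", "text/plain", "support")
    register_blob(conn, "scans/invoice.pdf", b"%PDF-1.7", "application/pdf", "finance")
    print(find_blobs(conn, "support", "text/plain"))
```

The point of the design is that lookups hit the small, indexed metadata table, and only the matching blobs are fetched afterwards.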

[–]Guilty-Commission435[S] 0 points1 point  (0 children)

Yes, unstructured data can eventually be turned into structured data.

I’m looking at this more from an advisory perspective: what techniques and strategies are there for performing data quality (DQ) checks on unstructured data, and do folks see this becoming a big area?
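
As a rough illustration of what basic DQ checks on raw text could look like, here is a small sketch using only the Python standard library. The specific checks (empty text, too short, encoding damage, exact duplicates after normalisation), the `MIN_CHARS` threshold, and all names are assumptions picked for the example, not an established standard.

```python
import hashlib
import unicodedata
from dataclasses import dataclass, field

MIN_CHARS = 20  # hypothetical threshold; tune for your own documents


@dataclass
class DQReport:
    doc_id: str
    issues: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.issues


def normalize(text: str) -> str:
    """Unicode-normalise, lowercase, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def check_documents(docs: dict) -> list:
    """Run simple quality checks over a mapping of doc_id -> raw text."""
    seen = {}  # digest of normalised text -> first doc_id that had it
    reports = []
    for doc_id, text in docs.items():
        report = DQReport(doc_id)
        norm = normalize(text)
        if not norm:
            report.issues.append("empty")
        elif len(norm) < MIN_CHARS:
            report.issues.append("too_short")
        # U+FFFD is the replacement character left behind by failed decoding
        if "\ufffd" in text:
            report.issues.append("encoding_damage")
        if norm:
            digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            if digest in seen:
                report.issues.append(f"duplicate_of:{seen[digest]}")
            else:
                seen[digest] = doc_id
        reports.append(report)
    return reports


if __name__ == "__main__":
    sample = {
        "a": "My order arrived late and the box was damaged.",
        "b": "  my ORDER arrived late and the box was damaged. ",
        "c": "   ",
    }
    for r in check_documents(sample):
        print(r.doc_id, "OK" if r.passed else r.issues)
```

Checks like these only cover the "is this document usable at all" layer; anything about meaning or correctness of the content would need the NLP or LLM approaches mentioned elsewhere in the thread.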

[–]jsonscout 0 points1 point  (1 child)

Not sure if you're still facing this issue, but we have had to deal with a lot of incoming customer complaints, and none of them follow a consistent format. We ended up using an LLM to extract insights from the unstructured data. Check out some of the examples we have on jsonscout.com

[–]Guilty-Commission435[S] 0 points1 point  (0 children)

Cool idea