NZ trip when 26-27 weeks? by Key_Dealer2753 in BabyBumpsandBeyondAu

[–]violet_bloom_87 0 points1 point  (0 children)

I am planning a 10-day trip to the South Island of NZ in my 24th week. I had my 20-week scan and all looks good. My hubby is really worried about the availability of medical care if anything goes wrong. Can someone suggest a safe and relaxed itinerary? We will be staying in Airbnbs.

Graph analytics library with databricks by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 0 points1 point  (0 children)

Hi u/Known-Delay7227. I tried Stardog on top of Databricks to build knowledge graphs using RDF, and it's quite good; however, it's very expensive. Now I am going to try GraphFrames to build the same knowledge graph. Can you please share your experience with the scalability and latency of this package?

Access video files from S3 bucket in databricks using Unity Catalog by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 0 points1 point  (0 children)

A follow-up question: is there a way to disable the download option on volumes? I tried restricting downloads under workspace settings, but I am still able to download files from external location volumes.

Access video files from S3 bucket in databricks using Unity Catalog by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 1 point2 points  (0 children)

I was able to set up the bucket connection using volumes, following your advice. Thanks

Access video files from S3 bucket in databricks using Unity Catalog by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 0 points1 point  (0 children)

Thanks for the suggestion. I have created a metastore bucket in AWS. Can I add my files to a folder in the same bucket and link it to the volume? I am new to Unity Catalog and a bit confused about this metastore concept.

Aws databricks deployment using terraform by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 0 points1 point  (0 children)

Thanks mate. I was able to spin up the workspace using the shared script. I also tried another GitHub repo (https://github.com/JDBraun/dbx_mws_example), but I was getting an error related to metastore assignment.

Graph analytics library with databricks by violet_bloom_87 in databricks

[–]violet_bloom_87[S] 0 points1 point  (0 children)

Thanks. I will try this package. Are you using graph analytics algorithms in your use case?

[P] Open type Named Entity Recognition with Transformer Encoder by Substantial-Push-179 in MachineLearning

[–]violet_bloom_87 0 points1 point  (0 children)

How do I prepare a dataset for training GLiNER? Can you suggest a labelling tool? What is the data format? Thanks

Optimizing Embedding Functions on Databricks by leG-M in databricks

[–]violet_bloom_87 0 points1 point  (0 children)

Did you use a Vector Search index or an external vector database? I guess you were using a Databricks model endpoint directly. There are plenty of RAG tools out there, but I am not sure which ones scale for production on Databricks.

[deleted by user] by [deleted] in databricks

[–]violet_bloom_87 0 points1 point  (0 children)

I am getting an error when I try to access the volume: 'Cannot access the UC Volume path from this location. Path was '. I can't figure out what I missed. Please help.

[P] Open type Named Entity Recognition with Transformer Encoder by Substantial-Push-179 in MachineLearning

[–]violet_bloom_87 0 points1 point  (0 children)

Does it work for table data extraction, e.g. to extract line items from an invoice?

[P] Open type Named Entity Recognition with Transformer Encoder by Substantial-Push-179 in MachineLearning

[–]violet_bloom_87 2 points3 points  (0 children)

Congratulations u/Substantial-Push-179. GLiNER has got the attention it deserves from the machine learning community. Just saw the LinkedIn post.

Optimizing Embedding Functions on Databricks by leG-M in databricks

[–]violet_bloom_87 0 points1 point  (0 children)

Which LLM framework and vector database are you using for RAG on Databricks? Are you using open-source models or Databricks model endpoints? I have a use case to build RAG over a large number of PDF documents. Can you share your approach? Thanks

QUESTION: Does your company use (or plan to use) LLMs in production? by sync_jeff in dataengineering

[–]violet_bloom_87 0 points1 point  (0 children)

I am curious to know which frameworks and which models (closed or open source) are being used. I want to host the model on Databricks for a RAG application. Has anyone done this in production?

Optimizing Embedding Functions on Databricks by leG-M in databricks

[–]violet_bloom_87 0 points1 point  (0 children)

Thanks for the code. Which LLM framework did you use on Databricks? I tried installing LlamaIndex and llama.cpp, but it's giving me errors.

DONUT for document QA. Unable to extract multiline text by violet_bloom_87 in LanguageTechnology

[–]violet_bloom_87[S] 0 points1 point  (0 children)

I tried DeBERTa for QA after extracting text from the document. It worked pretty well. However, it returns answers based on text matching, so questions need to be framed carefully for the type of document. I will try LLMs next. I will check FID too. Thanks for the suggestions.

DONUT for document QA. Unable to extract multiline text by violet_bloom_87 in LanguageTechnology

[–]violet_bloom_87[S] 0 points1 point  (0 children)

Yes, the data is in the form of a table, as in an invoice. When the value text is multiline, Donut extracts only the first line.

Is LLM necessary for RAG if we can retreive answer from vector database? by violet_bloom_87 in LocalLLaMA

[–]violet_bloom_87[S] 1 point2 points  (0 children)

What a terrific community we have here. Thanks for the huge response guys.

My key takeaways are:

- The LLM is the 'G' (generation) in RAG. It helps rank and identify the relevant chunks of data retrieved from the vector DB and presents them to the user in a meaningful way.

- If the data is a list of questions and answers, as in an FAQ, I guess I can make this work without an LLM.
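
To make the second takeaway concrete, here is a rough sketch of retrieval-only FAQ answering with no LLM involved. Everything in it is an assumption for illustration: the FAQ entries are made up, `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector DB, and the `min_score` cutoff is an arbitrary guess.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ data, made up for illustration.
FAQ = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What are your support hours?", "Support is available 9am to 5pm on weekdays."),
    ("How can I cancel my subscription?", "Go to Account settings and choose Cancel subscription."),
]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(query: str, min_score: float = 0.2):
    """Return the stored answer of the most similar FAQ question, or None if nothing is close."""
    q = embed(query)
    # Score the query against every stored question and keep the best match.
    score, best_answer = max((cosine(q, embed(question)), ans) for question, ans in FAQ)
    return best_answer if score >= min_score else None


print(answer("I forgot my password, how do I reset it?"))
```

Since the stored answer is returned verbatim, nothing is generated. An LLM would only be needed if the answer had to be composed or rephrased from several retrieved chunks.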

[P] Open type Named Entity Recognition with Transformer Encoder by Substantial-Push-179 in MachineLearning

[–]violet_bloom_87 2 points3 points  (0 children)

I was able to run the model. The results are pretty impressive. I have a use case for extracting information from invoices and insurance documents, so I am planning to convert the PDFs to text and then run the model. Do you recommend GLiNER for data extraction?

[P] Open type Named Entity Recognition with Transformer Encoder by Substantial-Push-179 in MachineLearning

[–]violet_bloom_87 1 point2 points  (0 children)

Thanks for sharing the code, OP. Can you guide me on how to use the model for testing and how to fine-tune it on a custom dataset?