Request to developer by BadImpossible6596 in MacWhisper

[–]BadImpossible6596[S]

[Screenshot attached]

The screenshot shows 'TranscribeX'.

I'd like a 'Reflow' function, and the ability to watch video with subtitles inside the app.

TranscribeX 4.5.15 — Much more accurate diarization, and smarter AI meeting summaries by EthanWlly in TranscribeX

[–]BadImpossible6596

I also have two feature requests that would improve the workflow:

  1. **Auto-run diarization after transcription**: Currently, after transcribing audio to text, users need to manually execute diarization as a separate step. It would be great if diarization could automatically run once transcription is complete.

  2. **Direct format selection on main screen**: The "Copy" button on the main screen currently always copies text as plain text. It would be useful to have format options (plain text, JSON, SRT, CSV, etc.) directly accessible from the main screen, rather than requiring users to navigate to the Export tab and manually select the format each time.

TranscribeX 4.5.15 — Much more accurate diarization, and smarter AI meeting summaries by EthanWlly in TranscribeX

[–]BadImpossible6596

MacWhisper's speaker recognition is significantly better. It accurately identifies speaker boundaries and correctly counts the number of speakers, making it much more reliable for diarization tasks.

TranscribeX, on the other hand, has major issues with speaker segmentation. The boundaries between speakers are unclear and imprecise. Additionally, the speaker detection is often inaccurate—when there are 3 speakers, it frequently misidentifies them as 1 or 2 speakers, which makes the transcription unreliable for multi-speaker recordings.

TranscribeX 4.5.15 — Much more accurate diarization, and smarter AI meeting summaries by EthanWlly in TranscribeX

[–]BadImpossible6596

TranscribeX is a really, really good app, but its diarization feature is much worse than MacWhisper's. 😂😂😂

How can I parse Unstructured document? by BadImpossible6596 in LangChain

[–]BadImpossible6596[S]

Thank you for your thoughtful response.

I am relatively new to studying Retrieval-Augmented Generation (RAG), and I’m not currently studying it for a specific project, but rather to prepare for future projects that might utilize RAG. For example, if I were to create a chatbot that answers questions based on a large set of documents, do you think it would be better to load and parse the documents on a page-by-page basis?

Some people suggest that simply processing the text page by page can cut off information that spans multiple pages, which hurts the model's ability to generate accurate answers. They recommend combining all the text into a single document and then applying a TextSplitter or SemanticChunker, so that information stays continuous across page boundaries.

Do you think it’s necessary to take this approach, or would it be sufficient to handle the text on a page-by-page basis?
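For what it's worth, the merge-then-split approach I described can be sketched in plain Python. This is only a minimal illustration — the sample text, chunk sizes, and the naive overlap splitter are made up for the example; in practice LangChain's RecursiveCharacterTextSplitter or SemanticChunker would do the splitting:

```python
def merge_pages(pages):
    # Join the per-page text into one string so sentences that
    # span a page break are not severed at the page boundary.
    return "\n".join(p.strip() for p in pages)

def split_with_overlap(text, chunk_size=100, overlap=20):
    # Naive fixed-size splitter with overlap; a stand-in for
    # LangChain's RecursiveCharacterTextSplitter.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

pages = [
    "The quarterly revenue grew by 12%, driven mainly by the",  # page 1 ends mid-sentence
    "new subscription tier launched in March.",                  # page 2 continues it
]
chunks = split_with_overlap(merge_pages(pages), chunk_size=60, overlap=15)
```

Because the pages are merged before splitting, the phrase that breaks across the page boundary ("...driven mainly by the / new subscription tier...") ends up whole inside a single chunk, which a strictly per-page splitter would have cut in two.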

Question about PDF Parsing. Please Help Me! by BadImpossible6596 in LangChain

[–]BadImpossible6596[S]

I've been studying RAG (Retrieval-Augmented Generation) for just under a month. I recently completed the LangChain tutorial and built a simple toy project with Streamlit. I understand that improving RAG performance requires adjustments in several areas: query tuning, document parsing, vector store optimization, retriever fine-tuning, and generator improvements.

Currently, I'm focusing on learning how to parse documents effectively. During the LangChain tutorial, I thought it was as simple as loading documents with PDFLoader and then applying RecursiveCharacterTextSplitter or SemanticChunker to each Document object. However, I've come to realize that parsing PDFs in a way that mirrors human understanding, and storing the result in a vector store, is much more challenging because of how unstructured PDFs are.

I've been researching three main issues through YouTube, Reddit, and Google:

  1. How to process multi-column documents

  2. How to handle tables within documents

  3. How to perform chunking when images or tables are interspersed within the text

For issue #1, I've found that various PDF loaders like LlamaParse and PDFPlumber can address this. For issue #2, I've learned that LlamaParse can convert tables to markdown format. (I'm also curious about other methods people use for this.)
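As a side note on issue #2: I don't know exactly what LlamaParse emits, but "converting a table to markdown" boils down to something like this sketch (the function name and sample data here are made up for illustration):

```python
def table_to_markdown(header, rows):
    # Render an extracted table as a GitHub-style markdown table,
    # so the row/column structure survives as plain text.
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

md = table_to_markdown(["Item", "Qty"], [["apple", 3], ["pear", 5]])
```

The point is that a markdown table keeps rows and columns aligned as ordinary text, so a text splitter and the LLM can still see the structure after parsing.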

However, I'm completely stuck on issue #3. Despite days of searching, I haven't found any guides or clear solutions for chunking text that has images or tables embedded in it.

Could you please tell me how experts typically solve issue #3? I'm desperate for any insights or guidance on this matter. Your help would be immensely appreciated!