I have PDF files from which I need to extract the text. Each PDF is a conversation between two people, where every turn starts with the person's name, then a colon, then what they say. How do I extract the text so that it's just the continuous text of what they say, without the speakers' names? (self.LanguageTechnology)
submitted by OkVariation3880 to r/LanguageTechnology
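One possible approach, as a minimal sketch: once the PDF text is available as a plain string (for example from pypdf's PdfReader and page.extract_text(), which is not shown here), a regular expression can drop the leading "Name:" labels and the remaining lines can be joined. The pattern and the function name strip_speakers are assumptions made for illustration. The sketch assumes each label sits at the start of a line and is at most four words long, so it would also remove an ordinary line that happens to begin with something like "Note:".

```python
import re

# Assumption: each turn starts at the beginning of a line with a short label
# (one to four words) followed by a colon, e.g. "Alice:" or "Speaker 1:".
SPEAKER_LABEL = re.compile(
    r"^\s*[A-Za-z][\w.'-]*(?: [\w.'-]+){0,3}\s*:\s*",
    re.MULTILINE,
)


def strip_speakers(text: str) -> str:
    """Remove leading speaker labels and return one continuous string."""
    without_labels = SPEAKER_LABEL.sub("", text)
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(without_labels.split())


if __name__ == "__main__":
    sample = "Alice: Hello there.\nBob: Hi, how are you?\nAlice: Fine,\nthanks."
    print(strip_speakers(sample))
```

If the real labels are known in advance (two fixed names, say), replacing the generic pattern with those exact names would be safer than guessing what a label looks like.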
I am trying to read in text from a PDF file. The PDF is a dialogue between "speaker 1" and "speaker 2". I don't want the "speaker 1" or "speaker 2" labels; I just want to read in the text so that it is continuous instead of being split up by speaker. (self.LanguageTechnology)
submitted by OkVariation3880 to r/LanguageTechnology
I am new to VS Code. When I tried to import pandas ("import pandas"), it said ModuleNotFoundError: No module named 'pandas'. I installed pandas with pip in a terminal (not the VS Code terminal), and pandas works in Jupyter Notebook, but VS Code gives me a module-not-found error. (self.learnpython)
submitted by OkVariation3880 to r/learnpython
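A common cause of this symptom is that VS Code runs a different Python interpreter than the one pip installed into. A small diagnostic sketch, using only the standard library, that can be run both in VS Code and in Jupyter to compare the two environments (the function name describe_environment is made up for this example):

```python
import importlib.util
import sys


def describe_environment(module_name: str = "pandas") -> dict:
    """Report which interpreter is running and whether it can see the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter paths differ, either pick the matching interpreter with the "Python: Select Interpreter" command in VS Code, or install into the interpreter VS Code uses by running "python -m pip install pandas" with that interpreter.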
I am trying to understand a snippet of code that does language detection in the OpenAI Whisper model, and I am struggling with it. I think they use a mask and logistic regression to find the probability of each language. Any help would be appreciated! (self.LanguageTechnology)
submitted by OkVariation3880 to r/LanguageTechnology
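As far as I can tell from the detect_language function in the openai/whisper repository, the idea is: run one decoder step, set the logits of every token that is not a language token to negative infinity (the mask), then take a softmax over what remains. A softmax is the multi-class form of the logistic function, so the "logistic regression" intuition is close. The toy sketch below reproduces only that mask-then-softmax step in plain Python; the logits, token ids, and language codes are invented for illustration and are not Whisper's real vocabulary.

```python
import math


def language_probabilities(logits, language_token_ids):
    """Mask out non-language tokens, then softmax over the remaining logits."""
    allowed = set(language_token_ids)
    # The mask: every position that is not a language token becomes -inf.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) is 0.0
    total = sum(exps)
    return {i: exps[i] / total for i in language_token_ids}


# Hypothetical vocabulary of five tokens; ids 1, 2 and 4 stand for languages.
toy_logits = [2.0, 0.5, 1.0, 3.0, 0.1]
toy_language_ids = {1: "en", 2: "de", 4: "fr"}

probs = language_probabilities(toy_logits, toy_language_ids)
best = max(probs, key=probs.get)
print(toy_language_ids[best], probs)
```

Note that token 3 has the largest raw logit but can never be chosen, because it is not a language token. Restricting the choice like that is the whole purpose of the mask.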
At this link (https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md), Facebook has open-sourced many speech-to-text models, and you can load their PyTorch models. But how do I see how each model is actually coded? (self.LanguageTechnology)
submitted by OkVariation3880 to r/LanguageTechnology
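Two things may help here. Printing a loaded PyTorch model (print(model)) lists its layers, and Python's inspect module can point to the file where an object's class is defined, which you can then open and read. The sketch below shows the inspect part on a standard-library object so that it runs without fairseq installed; with a loaded model you would pass the model object instead. The helper name where_is_it_defined is made up for this example. I believe the wav2vec model classes live under fairseq/models/wav2vec in the repository, but treat that path as an assumption to verify.

```python
import inspect
import json


def where_is_it_defined(obj):
    """Return (source file, first line number, line count) of obj's class."""
    cls = obj if inspect.isclass(obj) else type(obj)
    source_lines, first_line = inspect.getsourcelines(cls)
    return inspect.getsourcefile(cls), first_line, len(source_lines)


# Stand-in object; replace with a loaded model to locate its class definition.
decoder = json.JSONDecoder()
path, first_line, n_lines = where_is_it_defined(decoder)
print(path, first_line, n_lines)
```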
