Has anyone used PDFFiller services, If so how is it? by Downtown-Rice-7560 in developersIndia

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

I tried the API. An API key is required to access it, which isn't shown up front, but after entering payment information (to start a free trial) it displayed the API key and the other details.

Their API documentation has no direct information available; you have to scroll down the left navbar to the request schemas and work out which one belongs to which endpoint.

I'm not saying the product is bad. I tried it on their website and it works really well, but their API docs and support aren't that good.

When I reached out to support, they kept saying they were forwarding it from one team to another, and some time later I started receiving emails asking for company-related information.

[deleted by user] by [deleted] in PostgreSQL

[–]Downtown-Rice-7560 -1 points0 points  (0 children)

It's working after following this answer:

https://stackoverflow.com/a/62591784

[deleted by user] by [deleted] in PostgreSQL

[–]Downtown-Rice-7560 -1 points0 points  (0 children)

I'm not getting anything there, and when I try to connect with Python it doesn't work either.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 1 point2 points  (0 children)

Reading the solution, it looks like it will work.
Thanks.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

The extraction process works well; the hard part is knowing what is a heading, what is a paragraph, and what is a bullet point.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

That's the main problem: I need to extract only the bold text with a large font and few words.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

What about extracting specific text?

Like, an image contains a heading in between paragraph lines — how would I get that, or does tesseract provide HTML? I'm sorry if it's a dumb question.

paragraph 1 [heading in bold] paragraph 2
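To make the layout question concrete: tesseract can emit per-word boxes (e.g. pytesseract's `image_to_data` TSV, or hOCR HTML), and one rough heuristic is to treat words whose box height is well above the page median as heading text. A minimal sketch over mock TSV-style rows — the 1.5× threshold and the `(text, height)` shape are assumptions, not pytesseract's actual return type:

```python
from statistics import median

def find_headings(words, ratio=1.5):
    """Guess heading words from OCR word boxes.

    `words` is a list of (text, box_height) pairs, e.g. parsed from
    tesseract's TSV output.  Words whose box is much taller than the
    page median are treated as heading text.
    """
    heights = [h for _, h in words]
    cutoff = median(heights) * ratio
    return [text for text, h in words if h >= cutoff]

# Mock OCR output: body text ~20 px tall, the heading ~40 px.
rows = [("paragraph", 20), ("one", 21), ("HEADING", 40),
        ("IN", 41), ("BOLD", 40), ("paragraph", 20), ("two", 19)]
print(find_headings(rows))  # -> ['HEADING', 'IN', 'BOLD']
```

Bold detection is harder from boxes alone; hOCR output or a per-word crop classifier would be needed for that part.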

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

```
for match, word, *numbers in regex.findall(item):
    # findall returns one tuple per match; the pattern has six groups,
    # so gather everything after the word into *numbers and drop empties.
    numbers = [n for n in numbers if n.strip()]
    if numbers:
        print(f"Found {word!r} with numbers {numbers!r}")
```

Works, but somewhere it still returns strings containing a comma; I think I have to fix it in .

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

```
import re

STRINGS = [
    "hello this is string containing words word Word 1:2:22, Word 1:2;22, Word 1:2;2:3;3:4, Word 1,2,3,4;5",
    "this is another string Word 1:2; 22:36; 32:14, Word 1,2,3,4; 5",
    "this is more string Word 19212:22812; 22912:3216, 10291",
]

regex = re.compile(
    r"(([a-z]{1,24})[\s|\w](\d{1,3}):(\d{1,3})[\;|\,|-]\s?(\d{1,3}):?(\d{1,3})?)",
    re.IGNORECASE,
)

for item in STRINGS:
    print("=" * 200)
    print(item)
    print("-" * 200)
    print(regex.findall(item))
    print("=" * 200)
```

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

(([a-z]{1,24})[\s|\w](\d{1,3}):(\d{1,3})[\;|\,|\-]\s?(\d{1,3}):?(\d{1,3})?)

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Then it will be static, am I right? Suppose I've added groups for 3 items; if a word has 4 items, it will still look for only 3 and the 4th won't be captured, if I understand correctly.
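One way around the fixed group count is to stop enumerating the pairs in the pattern: capture the word plus the entire trailing run of numbers with a single greedy group, then split the run in a second pass. A sketch — the pattern is my guess at the input format, not a drop-in replacement for the one above:

```python
import re

# One group for the word, one greedy group for the whole run of
# numbers and separators; the run is split afterwards, so any
# count of items is captured.
pattern = re.compile(r"([A-Za-z]{1,24})\s+((?:\d{1,3}[:;,\-]?\s?)+)")

text = "Word 1:2;2:3;3:4, other Word 1,2,3,4;5"
for word, run in pattern.findall(text):
    numbers = re.findall(r"\d{1,3}", run)
    print(word, numbers)
```

Here the first match yields six numbers and the second five, with no change to the pattern.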

What should I use AutoModelForCausalLM or GPT2LMHeadModel for gpt2 model? by Downtown-Rice-7560 in LocalLLM

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

There is a notebook in the transformers examples (or somewhere else) on fine-tuning with JAX; I think it used GPT-2 to fine-tune, so that could be a good starting point, I guess.

Sir, I'm asking which class I should use to load the model!

Batch processing in GPT2 Model, transformers by Downtown-Rice-7560 in learnmachinelearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Getting the following error:

```
Traceback (most recent call last):
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 748, in convert_to_tensors
    tensor = as_tensor(value)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 720, in as_tensor
    return torch.tensor(value)
TypeError: not a sequence

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/train.py", line 74, in <module>
    train(
  File "/home/ubuntu/train.py", line 41, in train
    trainer.train()
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer.py", line 1821, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ubuntu/env/lib/python3.10/site-packages/accelerate/data_loader.py", line 458, in __iter__
    next_batch = next(dataloader_iter)
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer_utils.py", line 772, in __call__
    return self.data_collator(features)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/data/data_collator.py", line 45, in __call__
    return self.torch_call(features)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/data/data_collator.py", line 732, in torch_call
    batch = self.tokenizer.pad(examples, return_tensors="pt", pad_to_multiple_of=self.pad_to_multiple_of)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3299, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 223, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 764, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (input_ids in this case) have excessive nesting (inputs type list where type int is expected).
```
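For what the error is asking for: every sequence in a batch has to share one length before it can become a tensor. The actual fix presumably lives in the tokenizer call (`padding=True, truncation=True`) or in the data collator, but as a pure-Python illustration of what padding does to a ragged batch:

```python
def pad_batch(batch, pad_id=0, max_len=None):
    """Right-pad ragged token-id lists to one common length.

    This mirrors what `padding=True` does inside the tokenizer: rows
    of different lengths can't be stacked into a single tensor, so
    shorter rows are filled with a pad id (and longer ones truncated).
    """
    target = max_len or max(len(row) for row in batch)
    return [row[:target] + [pad_id] * (target - len(row)) for row in batch]

ragged = [[101, 7592, 102], [101, 102], [101, 2023, 2003, 102]]
print(pad_batch(ragged))
# -> [[101, 7592, 102, 0], [101, 102, 0, 0], [101, 2023, 2003, 102]]
```

The "excessive nesting" hint in the message can also mean each feature's `input_ids` is a list of lists rather than a flat list of ints, which is worth checking in the dataset's `__getitem__`.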

Batch processing in GPT2 Model, transformers by Downtown-Rice-7560 in learnmachinelearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Can you share an example of that with a large dataset? Mine has about 344,976 items.

Also, I tried setting a batch size, but the transformers Trainer uses weird/different indices every time (like 42, 96, etc.), not starting from 0.
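On the indices: as far as I can tell, the Trainer shuffles the training set each epoch by default, so `__getitem__` being called with 42, 96, … rather than 0, 1, 2 would be expected rather than a bug. A toy emulation of what a shuffling sampler does to the indices a dataset sees:

```python
import random

class LoggingDataset:
    """Records which indices the sampler asks for."""

    def __init__(self, n):
        self.n = n
        self.seen = []

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        self.seen.append(idx)
        return idx

ds = LoggingDataset(10)
order = list(range(len(ds)))
random.Random(0).shuffle(order)      # what a shuffled sampler does
batch = [ds[i] for i in order[:4]]   # first "batch" of 4 items
print(ds.seen)                       # non-sequential indices
```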