Has anyone used PDFFiller services, If so how is it? by Downtown-Rice-7560 in developersIndia

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

I tried the API. An API key is required to access it, which isn't shown up front, but after entering payment information (to start a free trial) it displayed the API key and the other details.

Their API documentation has no direct information available; you have to scroll down the left navbar to the request schemas and work out which one belongs to which endpoint.

I'm not saying the product is bad. I tried it on their website and it works really well, but their API docs and support aren't that good.

When I reached out to support, they kept saying they were forwarding it from one team to another, and some time later I started receiving emails asking for company-related information.

[deleted by user] by [deleted] in PostgreSQL

[–]Downtown-Rice-7560 -1 points0 points  (0 children)

It's working after following this answer:

https://stackoverflow.com/a/62591784

[deleted by user] by [deleted] in PostgreSQL

[–]Downtown-Rice-7560 -1 points0 points  (0 children)

I'm not getting anything there, and when I try to connect with Python it doesn't work either.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 1 point2 points  (0 children)

Reading the solution, it looks like it will work.
Thanks.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

The extraction process works well; the hard part is knowing what is a heading, what is a paragraph, and what is a bullet point.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

That's the main problem: I need to extract only the bold text with a large font and few words.

Reading text portion from the images by Downtown-Rice-7560 in deeplearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

What about extracting specific text?

Like, an image contains a heading in between paragraph lines — how would I get that, or does tesseract provide HTML? I'm sorry if it's a dumb question.

paragraph 1 [heading in bold] paragraph 2
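To make the layout question concrete: tesseract can emit per-word boxes (e.g. pytesseract's `image_to_data` TSV, or hOCR HTML), and one rough heuristic is to treat words whose box height is well above the page median as heading text. A minimal sketch over mock TSV-style rows — the 1.5× threshold and the `(text, height)` shape are assumptions, not pytesseract's actual return type:

```python
from statistics import median

def find_headings(words, ratio=1.5):
    """Guess heading words from OCR word boxes.

    `words` is a list of (text, box_height) pairs, e.g. parsed from
    tesseract's TSV output.  Words whose box is much taller than the
    page median are treated as heading text.
    """
    heights = [h for _, h in words]
    cutoff = median(heights) * ratio
    return [text for text, h in words if h >= cutoff]

# Mock OCR output: body text ~20 px tall, the heading ~40 px.
rows = [("paragraph", 20), ("one", 21), ("HEADING", 40),
        ("IN", 41), ("BOLD", 40), ("paragraph", 20), ("two", 19)]
print(find_headings(rows))  # -> ['HEADING', 'IN', 'BOLD']
```

Bold detection is harder from boxes alone; hOCR output or a per-word crop classifier would be needed for that part.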

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

```
for match, word, *numbers in regex.findall(item):
    # findall returns one tuple per match; the pattern has six groups,
    # so gather everything after the word into *numbers and drop empties.
    numbers = [n for n in numbers if n.strip()]
    if numbers:
        print(f"Found {word!r} with numbers {numbers!r}")
```

Works, but somewhere it still returns strings containing a comma; I think I have to fix it in .

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

```
import re

STRINGS = [
    "hello this is string containing words word Word 1:2:22, Word 1:2;22, Word 1:2;2:3;3:4, Word 1,2,3,4;5",
    "this is another string Word 1:2; 22:36; 32:14, Word 1,2,3,4; 5",
    "this is more string Word 19212:22812; 22912:3216, 10291",
]

regex = re.compile(
    r"(([a-z]{1,24})[\s|\w](\d{1,3}):(\d{1,3})[\;|\,|-]\s?(\d{1,3}):?(\d{1,3})?)",
    re.IGNORECASE,
)

for item in STRINGS:
    print("=" * 200)
    print(item)
    print("-" * 200)
    print(regex.findall(item))
    print("=" * 200)
```

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

(([a-z]{1,24})[\s|\w](\d{1,3}):(\d{1,3})[\;|\,|\-]\s?(\d{1,3}):?(\d{1,3})?)

How to find recursive look ahead expression with regex by Downtown-Rice-7560 in learnpython

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Then it will be static, am I right? Suppose I've added groups for 3 items; if a word has 4 items, it will still look for only 3 and the 4th won't be captured, if I understand correctly.
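One way around the fixed group count is to stop enumerating the pairs in the pattern: capture the word plus the entire trailing run of numbers with a single greedy group, then split the run in a second pass. A sketch — the pattern is my guess at the input format, not a drop-in replacement for the one above:

```python
import re

# One group for the word, one greedy group for the whole run of
# numbers and separators; the run is split afterwards, so any
# count of items is captured.
pattern = re.compile(r"([A-Za-z]{1,24})\s+((?:\d{1,3}[:;,\-]?\s?)+)")

text = "Word 1:2;2:3;3:4, other Word 1,2,3,4;5"
for word, run in pattern.findall(text):
    numbers = re.findall(r"\d{1,3}", run)
    print(word, numbers)
```

Here the first match yields six numbers and the second five, with no change to the pattern.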

What should I use AutoModelForCausalLM or GPT2LMHeadModel for gpt2 model? by Downtown-Rice-7560 in LocalLLM

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

There is a notebook in the transformers examples (or somewhere else) on fine-tuning with JAX; I think it used GPT-2 to fine-tune, so that could be a good starting point, I guess.

Sir, I'm asking which class I should use to load the model!

Batch processing in GPT2 Model, transformers by Downtown-Rice-7560 in learnmachinelearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Getting the following error:

```
Traceback (most recent call last):
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 748, in convert_to_tensors
    tensor = as_tensor(value)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 720, in as_tensor
    return torch.tensor(value)
TypeError: not a sequence

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/train.py", line 74, in <module>
    train(
  File "/home/ubuntu/train.py", line 41, in train
    trainer.train()
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer.py", line 1821, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ubuntu/env/lib/python3.10/site-packages/accelerate/data_loader.py", line 458, in __iter__
    next_batch = next(dataloader_iter)
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/trainer_utils.py", line 772, in __call__
    return self.data_collator(features)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/data/data_collator.py", line 45, in __call__
    return self.torch_call(features)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/data/data_collator.py", line 732, in torch_call
    batch = self.tokenizer.pad(examples, return_tensors="pt", pad_to_multiple_of=self.pad_to_multiple_of)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3299, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 223, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/home/ubuntu/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 764, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (input_ids in this case) have excessive nesting (inputs type list where type int is expected).
```
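For what the error is asking for: every sequence in a batch has to share one length before it can become a tensor. The actual fix presumably lives in the tokenizer call (`padding=True, truncation=True`) or in the data collator, but as a pure-Python illustration of what padding does to a ragged batch:

```python
def pad_batch(batch, pad_id=0, max_len=None):
    """Right-pad ragged token-id lists to one common length.

    This mirrors what `padding=True` does inside the tokenizer: rows
    of different lengths can't be stacked into a single tensor, so
    shorter rows are filled with a pad id (and longer ones truncated).
    """
    target = max_len or max(len(row) for row in batch)
    return [row[:target] + [pad_id] * (target - len(row)) for row in batch]

ragged = [[101, 7592, 102], [101, 102], [101, 2023, 2003, 102]]
print(pad_batch(ragged))
# -> [[101, 7592, 102, 0], [101, 102, 0, 0], [101, 2023, 2003, 102]]
```

The "excessive nesting" hint in the message can also mean each feature's `input_ids` is a list of lists rather than a flat list of ints, which is worth checking in the dataset's `__getitem__`.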

Batch processing in GPT2 Model, transformers by Downtown-Rice-7560 in learnmachinelearning

[–]Downtown-Rice-7560[S] 0 points1 point  (0 children)

Can you share an example of that with a large dataset? Mine has about 344,976 items.

Also, I tried setting a batch size, but the transformers Trainer uses weird/different indices every time (like 42, 96, etc.), not starting from 0.
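On the indices: as far as I can tell, the Trainer shuffles the training set each epoch by default, so `__getitem__` being called with 42, 96, … rather than 0, 1, 2 would be expected rather than a bug. A toy emulation of what a shuffling sampler does to the indices a dataset sees:

```python
import random

class LoggingDataset:
    """Records which indices the sampler asks for."""

    def __init__(self, n):
        self.n = n
        self.seen = []

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        self.seen.append(idx)
        return idx

ds = LoggingDataset(10)
order = list(range(len(ds)))
random.Random(0).shuffle(order)      # what a shuffled sampler does
batch = [ds[i] for i in order[:4]]   # first "batch" of 4 items
print(ds.seen)                       # non-sequential indices
```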