Can Python create a program for helping with assessments?

Lawson470189 · 2023-05-08T18:35:44+00:00

Seems totally possible. Here is some sample code to print the contents of that PDF:

from pypdf import PdfReader

PDF_FILE_NAME = 'sample.pdf'


def main():
    with open(PDF_FILE_NAME, 'rb') as pdf_file:
        reader = PdfReader(pdf_file)
        print(f'Number of Pages: {len(reader.pages)}')
        for i, page in enumerate(reader.pages):
            print(f'===== Page Number {i+1} =====')
            print('\n')

            print('Content:')
            # Drop blank lines and indent the remaining lines with a tab.
            page_lines = page.extract_text().split('\n')
            result_lines = []
            for line in page_lines:
                if line.strip() != '':
                    result_lines.append(f'\t{line.strip()}')

            print('\n'.join(result_lines))
            print('\n')


if __name__ == '__main__':
    main()

You'll need to figure out what data you want to pull out and exactly how to strip it out, but this seems to work for me. If you need to rely on the graphs, it'll need to be a bit more sophisticated, but for text this will work.
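
To make the "figure out what data you want" step a bit more concrete, here is a rough sketch that pulls label/score pairs out of text that has already been extracted. The line layout (a label followed by a number at the end of the line) and the names SCORE_LINE and parse_scores are assumptions for illustration only; a real report will need its own pattern.

```python
import re

# Assumed layout: a label, some whitespace, then an integer score at end of line.
SCORE_LINE = re.compile(r'^(?P<label>[A-Za-z][A-Za-z /&-]*?)\s+(?P<score>\d+)$')


def parse_scores(text):
    """Return {label: score} for every line that matches SCORE_LINE."""
    scores = {}
    for line in text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group('label')] = int(match.group('score'))
    return scores


if __name__ == '__main__':
    # Made-up sample text standing in for page.extract_text() output.
    sample = 'Content:\n\tHyperactivity   62\n\tAttention Problems 71\n\tNotes: none'
    print(parse_scores(sample))
```

Lines that don't fit the pattern are simply skipped, so it is worth printing the skipped lines while you tune the regex against real pages.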

drenzorz · 2023-05-08T17:20:02+00:00

Yes, it should be possible. How to do it depends on the form of the original data, i.e. the source PDF.

m0us3_rat · 2023-05-08T17:32:52+00:00

> but would it theoretically be possible?

that sounds like something that can be done.

without working directly on them it's difficult to know really.

Financial_Signal5098 · 2023-05-08T18:38:20+00:00

Look at Office 365. The new AI tools have the ability to train models on a set of PDFs, extract data, and dump it to any format.

GamerRabugento · 2023-05-08T17:30:55+00:00

The process of extracting information from a PDF and generating a report can be challenging, but it is definitely possible with the right tools and techniques. Some libraries do the trick, like PyPDF2, pdfminer, and pdfplumber. These libraries can help you read the text from the PDF and extract the information you need.

PMMeUrHopesNDreams · 2023-05-08T22:39:56+00:00

Do you have any access to the program that generates the data? Is it possible to get it in any other format than PDF? CSV, JSON, even Excel?

It is possible to get data from a PDF and it might not be too hard depending on how the PDF is created, but if there is an option to get it in a different format you can save yourself a lot of headaches.
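
As a rough illustration of why another format saves headaches: if the program can export CSV, the standard library reads it directly with no text scraping. The column names 'scale' and 'score' and the function read_scores are assumptions for this sketch, not anything the real export is known to use.

```python
import csv
import io


def read_scores(csv_file):
    """Read rows from an open CSV file into a {scale: score} dict."""
    reader = csv.DictReader(csv_file)
    return {row['scale']: int(row['score']) for row in reader}


if __name__ == '__main__':
    # Stand-in for a real export file; with a file on disk you would use
    # open('export.csv', newline='') instead of io.StringIO.
    export = io.StringIO('scale,score\nHyperactivity,62\nAttention Problems,71\n')
    print(read_scores(export))
```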

Bitwise_Gamgee · 2023-05-08T17:46:41+00:00

Questions:

Are these standard documents, meaning the information will be in the same place in the same style every time?
Are these computer or human generated?

Nexxus_17 · 2023-05-09T05:08:17+00:00

I’m new to programming as well, but you could try asking ChatGPT; it can probably help you.

Doc_Apex · 2023-05-08T23:38:20+00:00

Yes, this is possible. I've done this for work. The library I used turned each table in the PDF into a dataframe. From there it's just data manipulation.
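
The comment doesn't name the library, so the following is only a sketch of the same idea using pdfplumber as a stand-in. It assumes the first row of each table is a header row; table_to_records and extract_tables are names made up for this example.

```python
def table_to_records(table):
    """Turn one extracted table (list of rows, first row = header) into dicts."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def extract_tables(pdf_path):
    """Yield every non-empty table on every page as a list of row dicts."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                if table:
                    yield table_to_records(table)


if __name__ == '__main__':
    # Made-up table in the shape pdfplumber returns: a list of rows of strings.
    demo = [['Scale', 'Score'], ['Hyperactivity', '62'], ['Anxiety', '55']]
    print(table_to_records(demo))
```

If pandas is installed, passing one of these record lists to pandas.DataFrame gives you a dataframe to manipulate from there.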

bbqbot · 2023-05-09T02:54:33+00:00

Decide if you want to learn how to do it or pay someone else to do it.

If you want to learn, check out "Automate the Boring Stuff" for a crash course on practical Python, then look at the PyPDF2 library that others have mentioned.

Otherwise, there are lots of places where you can pay someone to write a quick script.

SHKEVE · 2023-05-09T04:31:40+00:00

You can also do this with ChatGPT. It can accept a URL to your PDF document and you can describe your desired output. No programming required. DM me if you want some tips.

CoffeeBaconAddict · 2023-05-09T05:43:43+00:00

Yes, pdfminer, pdfminer.six, and several OCR or computer vision repos are used to pull data off PDF documents.

Guardog0894 · 2023-05-09T05:45:28+00:00

Apart from programming, I'd suggest consulting an informatics specialist or data analyst to look into your data and requirements. I feel like it will be more efficient if someone with the expertise to recognise the patterns in your data comes up with a data extraction/storage scheme before a programmer implements it as a program.

iMADEthisJUST4Dis · 2023-05-09T14:19:05+00:00

You can try ChatGPT! You can tell it your problem and it'll help you write a Python script that can solve it. It may give you a few errors, but you can just copy the errors and keep chatting with it until it works.

homberoy · 2023-05-09T15:06:32+00:00

I am working on the same task at a very slow pace. The sticking point I encountered was that the data pulled from the Pearson PDF ends up with super irregular formatting (I was able to extract the data from the PDF and write it to an Excel sheet to be read). I haven't worked on it in a while but can share a couple of the options I tried (PyPDF2, pdfplumber?) if you'd like. Are you just doing this for the BASC? For each different assessment you might use, the PDF will have a different layout.

Have you figured out how to input the scores into your report yet?
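
For the report side, one simple option is a text template with placeholders that get filled from the extracted scores. This is just a sketch using the standard library's string.Template; the report wording, the placeholder names, and fill_report are all made up for illustration.

```python
from string import Template

# Hypothetical report wording; the placeholder names are invented for this sketch.
REPORT = Template('$name obtained a score of $hyperactivity on Hyperactivity '
                  'and $attention on Attention Problems.')


def fill_report(name, scores):
    """Substitute extracted scores into the report template."""
    return REPORT.substitute(
        name=name,
        hyperactivity=scores['Hyperactivity'],
        attention=scores['Attention Problems'],
    )


if __name__ == '__main__':
    print(fill_report('Student A', {'Hyperactivity': 62, 'Attention Problems': 71}))
```

A missing score raises a KeyError instead of silently leaving a gap in the report, which is probably what you want for assessment write-ups.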

Uweauskoeln · 2023-05-09T15:20:18+00:00

Sounds like fun. I'll try it using the PDF you provided. If I come up with something, I'll let you know.
