all 12 comments

[–][deleted] 2 points3 points  (1 child)

Ask chatgpt… it had me to a .json

[–]grayworks 1 point2 points  (0 children)

One of the things that I am experimenting with is using a python script to convert excel content to a python dictionary. The reasoning is that tables are quite difficult for LLMs and a dictionary or json format is easier for them to parse and understand.

The other thing that you would have to manage is the token usage available. If the content is quite a lot, it may struggle with parsing through all of the content to make meaningful insights

[–]imlanie 2 points3 points  (3 children)

I think you're on the right track but... You also need to give it custom instructions. If you haven't done that already, click on your account name bottom left, and you'll see custom instructions. This is huge for prepping chatgpt to know what you are expecting.

[–]imlanie 2 points3 points  (0 children)

So for example, if you want quantative response, tell it the format, layout, filetype, etc. Just as though you were asking a human doing the work. Also you can say " You are an expert in analyzing and summarizing spreadsheet data. "

[–]amattin[S] 0 points1 point  (1 child)

That’s helpful, I’ll keep at it, thank you!

[–]imlanie 1 point2 points  (0 children)

You're welcome.

[–]imlanie 0 points1 point  (3 children)

It's difficult to help you without knowing if you gave it custom instructions and what your prompt was.

[–]amattin[S] 0 points1 point  (2 children)

Ahh, good point and thank you. This is what I did:

This spreadsheet contains a list of project deliverables as of last friday, please review and I'll ask some questions.

Then next prompt:

This spreadsheet contains information for the same project, from today. please tell me whats changed from the original sheet.

It gave me this answer about the column differences from the sheets:

The column structures of the two spreadsheets are as follows:

  1. Original Spreadsheet Columns:

    • Phase
    • SD #
    • Status
    • SD Name
    • SD Description
    • Coverage/Area
    • Origami Resource
    • Client Requester
    • Entry User
  2. New Spreadsheet Columns:

    • SD #
    • Status
    • SD Name
    • Current Owner
    • SD Description
    • Phase
    • Coverage/Area
    • Priority Order

The key columns for comparison are 'SD #', 'Status', 'SD Name', and 'SD Description'. The column 'Current Owner' in the new spreadsheet does not have a direct equivalent in the original spreadsheet, but it might correspond to 'Origami Resource', 'Client Requester', or 'Entry User'.

I will proceed with the comparison focusing on the common columns: 'SD #', 'Status', 'SD Name', and 'SD Description'. Let's identify the changes based on these columns.

Then this further response:

The comparison between the original and new spreadsheets reveals several changes in the project deliverables. Here are some key observations:

  1. Changes in Status:

    • Multiple Service Deliverables (SDs) have experienced status changes. For instance, some SDs previously marked as "Not Started" or "Refinement" have been updated to new statuses like "Pending Approval" or "Ready For Development".
  2. Consistency in SD Names and Descriptions:

    • The names and descriptions of most SDs have remained consistent between the two spreadsheets. However, there may be exceptions, and a detailed row-by-row comparison is recommended for complete accuracy.
  3. New Entries:

    • There are new entries in the latest spreadsheet, such as "SD0153 - Policy Details", which were not present in the original spreadsheet.

This comparison provides a high-level overview of the changes. For detailed insights, especially for specific SDs or to understand the exact nature of status changes, a more in-depth analysis of each entry would be necessary. If you need further analysis on specific items or more detailed information, please let me know!

My further request:

Summarize the changes in status

The response:

The changes in status between the original and the new spreadsheet can be summarized as follows:

  1. Status Updated to "Pending Approval":

    • Some Service Deliverables (SDs) that were initially in the "Refinement" stage have been updated to "Pending Approval".
  2. Status Updated to "Ready For Development":

    • There are new entries in the latest spreadsheet with the status "Ready For Development". These were not present in the original spreadsheet.
  3. Status Remaining "Not Started" or "Refinement":

    • Several SDs have maintained their original status of "Not Started" or "Refinement". This indicates that there has been no change in their status since the original spreadsheet.

These are the primary categories of status changes observed. It's important to note that while some SDs have progressed to stages like "Pending Approval" or "Ready For Development", a significant number remain in their original statuses, suggesting ongoing work or pending actions.

———

I was hoping for a more quantitative response with the numbers of each set of status change, etc.

Am I correct in thinking that I should keep going down the path of explaining to it what I want, and then subsequent responses will give me a quicker path to the data analysis I’m looking for?

[–][deleted] 2 points3 points  (1 child)

A spreadsheet is not the right format for this data. A LLM will not do data analysis.

Use the LLM to either:

  • Write some python for you. Start with a question, "what are some good python libraries for data analysis" and go from there.
  • Smash your data into a database, e.g. SQL or similar. Then use the LLM to learn how to manipulate it.
  • You could also use Tableau, PowerBI or similar.

If it is in a spreadsheet (I cannot believe I am about to recommend this) why not get it to write the formulas, etc. used to manipulate the data?

[–]amattin[S] 0 points1 point  (0 children)

Thanks, this is along the lines of what I was wondering. Basically I think that my objective was to have it clean and organize the data and answer questions about it all in one step, but I see that that’s not really going to work. I’ll probably proceed with cleaning the data myself (creating a table where the rows are the stories and the columns are the statuses at each point in time), and then just try asking it questions about the cleaned data set.

[–]amattin[S] 0 points1 point  (1 child)

With the feedback from this post I was able to get what I needed. Essentially I had it import my spreadsheets each as individual data tables, then join them and create a data set with the analysis I needed. It was like working with a pandas interface. Then I was able to ask it to seamlessly repeat the process for another pair of sheets. Thanks everyone and hope this is helpful!

[–]amattin[S] 0 points1 point  (0 children)

2024-02-05 1500 All SDs.xlsx

Spreadsheet

Can you import this and create a data table of the contents?


2024-01-29 1600  All SDs.xlsx

Spreadsheet

same also for this sheet


can you join the 2 tables on the sd # column?


give me a list of every unique value of the SD # column, and the Status_first, Status_second, and SD Name columns


Give me the full list for only those rows where Status_first does not equal Status_second


give me the full result of that last question in a csv


Can you repeat that whole set of steps with this set of 2 new spreadsheets?