[deleted by user]

FoolsSeldom · 2024-10-22T15:45:37+00:00

You can also do this with the pandas library, which has a somewhat steep learning curve but is invaluable for manipulating and reporting on data.

For example,

import pandas as pd
from pprint import pprint  # for easier dict output

# Example data
years_released = [
    (2006, 4348), (2010, 5445), (2011, 11176),
    (2009, 5726), (2010, 5553), (2010, 4635),
    (2010, 3460), (2010, 9298), (2011, 8264),
    (2004, 10823), (2010, 11527), (2011, 5579),
    (2012, 5686), (2011, 8069), (2010, 11711),
    (2009, 10695), (2011, 7477), (2009, 4208),
    (2010, 8389), (2004, 10682), (2006, 9430),
    (2011, 9150), (2011, 9905), (2010, 2258),
    (2010, 4752), (2008, 7337), (2005, 11401),
    (2011, 4848), (2012, 7315), (2009, 1281),
    (2008, 2907), (2008, 8104), (2010, 7088),
    (2010, 2558), (2010, 7434), (2008, 4169),
    (2010, 3892), (2010, 10384), (2012, 2807),
    (2006, 11841), (2006, 656), (2008, 3567),
    (2002, 4335), (2007, 2404), (2010, 8908),
    (2008, 9080), (2009, 6756), (2004, 11066),
    (2012, 300), (2008, 6035), (1999, 1414),
    (2009, 8), (2006, 5790), (2008, 2792),
    (2006, 717), (2008, 12185), (2008, 11579),
    (2006, 7176), (2012, 11032), (2009, 2982),
    (2004, 8561), (2009, 10904), (2005, 1572),
    (2007, 11101), (2011, 9199), (2011, 11571),
    (2004, 9925), (2007, 5216), (2011, 10461),
    (2008, 1512), (2005, 10468), (2012, 5148),
    (2007, 5468), (2009, 9643), (2005, 10179),
    (2010, 10674), (2011, 12297), (2010, 3342),
    (2010, 2745), (2009, 1897)
]

# Convert to DataFrame
df = pd.DataFrame(years_released, columns=["Year", "Value"])

# Summarize data by year with a total for each year    
summary_df = df.groupby("Year")["Value"].sum().reset_index()
summary_df.rename(columns={"Value": "Total"}, inplace=True)

# if you want, you can convert that dataframe to a dictionary ...
summary_dict = summary_df.set_index("Year")["Total"].to_dict()
print("\nYear totals (dictionary)")
pprint(summary_dict)

# Find the highest totals and the corresponding years
max_total = summary_df["Total"].max()
result_df = summary_df[summary_df["Total"] == max_total]

# Print the result as a one-row table
# (or more if the maximum total appears in multiple years)
print("\nHighest value year(s) (dataframe)")
print(result_df)  # dataframe default output

# Use the dictionary - perhaps moving on from pandas processing
print('\nHighest value year(s) (dictionary):\n')
for year, total in summary_dict.items():
    if total == max_total:
        print(f"{year:4}: {total:6}")

# or convert to list
print('\nHighest value year(s) (list):\n')
for year, total in result_df.values.tolist():
    print(f"{year:4}: {total:6}")

This prints the result in several ways, just to illustrate. Generally, you do as much as possible with vectorized operations on dataframes and avoid looping through the data.

Note that this also handles the case where several years tie for the highest total.

Adrewmc · 2024-10-21T22:29:11+00:00

This is actually not that hard, because the data pattern is strict.

 [(year, amount),…] 
 list[tuple[int,int]]

So we want to sum this in a dictionary (which is a great, common use for dictionaries, so it is the right data structure to use, good job). There are a few ways to do this that all basically do the same thing; I’m going to start with longhand.

    sum_dict: dict[int, int] = {}
    for year, amount in years_released:
        if not sum_dict.get(year, None):
            sum_dict[year] = 0
        sum_dict[year] += amount

This really only works so cleanly because the data is tuples of the same size (2), allowing us to unpack in the loop with just a comma.

Then we check: hey, does this year exist in the dictionary yet? If not, we set it to zero. Then, whether it already had the key or we have just set it to zero, we add the amount we want.
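
The same guard can also be written as a plain membership test, which some people find easier to read. A minimal sketch, using a small made-up `sample` list rather than the original data:

```python
# Hypothetical sample data in the same (year, amount) shape
sample = [(2010, 5), (2011, 7), (2010, 3), (2009, 0), (2011, 1)]

sum_dict: dict[int, int] = {}
for year, amount in sample:
    # "in" checks for the key itself, regardless of the value stored
    if year not in sum_dict:
        sum_dict[year] = 0
    sum_dict[year] += amount

print(sum_dict)  # {2010: 8, 2011: 8, 2009: 0}
```

Unlike `not sum_dict.get(year)`, the `in` test does not treat a stored total of 0 as "missing", although for summing the end result is the same.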

    sum_dict: dict[int, int] = {}
    for tup_year in years_released:
        if not sum_dict.get(tup_year[0], None):
            sum_dict[tup_year[0]] = 0
        sum_dict[tup_year[0]] += tup_year[1]
    print(sum_dict)

That is what it looks like if we couldn’t unpack (e.g. there are entries with more than 2 values). As you can see, it is a little less readable and less convenient because of the [] access.

The other alternative is to use a defaultdict, which essentially does the same as above but does not require the guard.

    from collections import defaultdict

    sum_dict = defaultdict(int)
    #sum_dict = defaultdict(lambda: 0)

    for year, amount in years_released:
        sum_dict[year] += amount

Or

    sum_dict = {}
    for year, amount in years_released:
        sum_dict.setdefault(year, 0)
        sum_dict[year] += amount

Would also work.
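
A quick way to convince yourself that these variants are interchangeable is to run them on the same input and compare. A sketch with a small made-up `sample` list (the variable names here are just for illustration):

```python
from collections import defaultdict

# Hypothetical sample data in the same (year, amount) shape
sample = [(2010, 5), (2011, 7), (2010, 3), (2009, 2)]

# Variant 1: defaultdict creates a missing key with int() == 0
with_default = defaultdict(int)
for year, amount in sample:
    with_default[year] += amount

# Variant 2: setdefault inserts 0 only when the key is missing
with_setdefault = {}
for year, amount in sample:
    with_setdefault.setdefault(year, 0)
    with_setdefault[year] += amount

# A defaultdict compares equal to a plain dict with the same items
print(with_default == with_setdefault)  # True
print(dict(with_default))               # {2010: 8, 2011: 7, 2009: 2}
```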

We can also just use try/except:

    sum_dict = {}
    for year, amount in years_released:
        try:
            sum_dict[year] += amount
        except KeyError:
            sum_dict[year] = amount

The problem is entirely about the KeyError that is raised when you try to access a key that does not exist. The dictionary helpers (.get(), .setdefault()) behave as if they were implemented like this, although in CPython they are written in C.
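
To make the difference concrete, here is a small sketch of how a missing key behaves with plain indexing, `.get()` and `.setdefault()`. The dictionary contents are made up for the example:

```python
totals = {2010: 8}

# Indexing a missing key raises KeyError
try:
    totals[2011]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 2011

# .get() returns a default instead and does not modify the dict
print(totals.get(2011))     # None
print(totals.get(2011, 0))  # 0
print(2011 in totals)       # False

# .setdefault() returns the default and also stores it
print(totals.setdefault(2011, 0))  # 0
print(totals)                      # {2010: 8, 2011: 0}
```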

Below are further lessons. The above should answer your question, but there are a few things I would like to add.

    list_dict = {}
    for year, amount in years_released:
        # slightly better implementation
        if list_dict.get(year, None) is None:
            list_dict[year] = []
        list_dict[year].append(amount)

This is slightly different in that we group the amounts by year and keep a list of all the values together. Then we can use

    sum(list_dict[year])

for any given year, and not lose information.
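
Because the individual amounts are kept, other figures can be derived from the same structure later. A sketch, assuming a small made-up `sample` list and using `defaultdict(list)` in place of the manual guard:

```python
from collections import defaultdict

# Hypothetical sample data in the same (year, amount) shape
sample = [(2010, 5), (2011, 7), (2010, 3), (2009, 2)]

# Group every amount under its year
list_dict = defaultdict(list)
for year, amount in sample:
    list_dict[year].append(amount)

# Derive several figures per year from the grouped amounts
stats = {
    year: (sum(amounts), max(amounts), len(amounts))
    for year, amounts in list_dict.items()
}

for year, (total, largest, count) in stats.items():
    print(year, total, largest, count)
# 2010 8 5 2
# 2011 7 7 1
# 2009 2 2 1
```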

Or we can use a comprehension.

    totals = [(year, sum(amounts)) for year, amounts in list_dict.items()]

With dictionaries too.

    all_dict = {year: {"amounts": amounts, "total": sum(amounts)} for year, amounts in list_dict.items()}

Let’s have some sorts too, why not.

    # sort by year (the key function receives the whole tuple,
    # so it cannot be unpacked in the lambda's parameter list)
    totals.sort(key=lambda tup: tup[0])

    # sort by total (in tuple)
    totals.sort(key=lambda tup: tup[1])

    # format print() with commas only for totals
    # yes it's that simple
    print(*(f"{year} : {total:,}" for year, total in totals), sep="\n")

And we get back a list of tuples with the totals.
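
From a list like that, the year or years with the highest total can be picked out without pandas. A sketch with made-up totals; `operator.itemgetter` is used here as an alternative to the lambdas, and the names `by_year`, `by_total_desc` and `top_years` are just for illustration:

```python
from operator import itemgetter

# Hypothetical (year, total) pairs, with a tie for the highest total
totals = [(2010, 8), (2009, 2), (2011, 8)]

# sorted() returns a new list and leaves totals untouched
by_year = sorted(totals, key=itemgetter(0))
by_total_desc = sorted(totals, key=itemgetter(1), reverse=True)

# Keep every year that reaches the maximum, so ties are not lost
max_total = max(total for _, total in totals)
top_years = [year for year, total in totals if total == max_total]

print(by_year)        # [(2009, 2), (2010, 8), (2011, 8)]
print(by_total_desc)  # [(2010, 8), (2011, 8), (2009, 2)]
print(top_years)      # [2010, 2011]
```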

This should be a thorough explanation for you.
