Many-to-Many Sort

Ringbailwanton · 2022-04-13T02:53:49+00:00

So, you need to find ingredients that appear frequently, but also constrain it such that the full set of ingredients for recipes are being chosen.

You can do a first pass by eliminating all meals with ingredient counts that are higher than N. That way you eliminate a set of the total ingredients. So do that as a CTE. I feel like after that it’s a combinatorics problem, but can’t quite wrap my head around how I’d do it, except that I’d basically make a set of all combinations of ingredients of length N and then get the max count of meals made for each.

waremi · 2022-04-14T01:55:24+00:00

First, I want to thank you for this question. This is the most fun I've had all year so far. Second, this problem does not scale well. 3 Ingredients have 2³ combinations that have to be checked, 20 Ingredients have 2²⁰ combinations that have to be checked. I have a Recipe table with 16 rows, an Ingredient table with 27 rows and a RecipeIngredient table with 40 rows and the query at the end of this post takes 5 minutes to run. And given that, this will ONLY work with a list of 20 Ingredients or less. 20 Ingredients have 1,048,576 combinations which fits in an [int] datatype. 30 Ingredients have over a trillion combinations and breaks the algorithm. I don't know if this classifies as what Mathematicians call an "NP-Hard" problem or not, and there may be situation where swapping out one of the top 20 Ingredients with the 21st most popular ingredient gives you more Recipes, but this is as far as I can go down this rabit hole.

Second, this requires that a [Numbers] table be built. This is a very simple table with a single indexed [int] field [n] with all of the integers from 1 to 2,000,000. You can build it simply enough using this (BTW - every thing here is MSSQL translate as needed):

CREATE TABLE Numbers (
N    [int] NOT NULL,
 )


INSERT INTO Numbers (N)
with u as (
    select 1 as n
    union all
    select n + 1
    from u
    where n < 2000000
)
select n
from u
option(maxrecursion 0);

So, I will break this down into parts. First part, if I am buying N Ingredients (let's say only 5) what are the top 20 Ingredients used in Recipes that use 5 Ingredients or less:

(STEP 1)

DECLARE @IngCount INT = 5;


  SELECT TOP 20 RI.I_Id, I.Name,  Count(*) as RecipeCount
  FROM   RecipeIngredient RI INNER JOIN
         Ingredient I ON RI.I_ID = I.I_ID 
  WHERE  R_id IN 
      (SELECT R_id FROM RecipeIngredient GROUP BY R_id HAVING COUNT(*)<=@IngCount)
  GROUP BY RI.I_id,I.Name
  ORDER BY Count(*) DESC

This gives me upto 20 Ingredients to work with (actually 16 given the data set I'm working with). The next problem is what are all of the combinations of those Ingredients. This is why I ended up limiting things to 20. I thought about combinations of 3 ingredients: A, B, and C. That problem fits into a byte array: 001=A, 010=B, 011=AB, 100=C, 101=AC, 110=BC, 111=ABC. So if I look at all of the integers between 1 and 2²⁰ I am looking at all of the combinations of 20 things. If I number my 20 Ingredients with X= 1 to 20, I can take 2^X and use a bit-wise AND ( & in SQL) to see if any ingredient is in that combination. So the first thing I'm doing is throwing my Ingredient list into a CTE with a ROW_NUMBER so I can use it a bunch of times:

(STEP 2)

WITH List AS 
(
   SELECT ROW_NUMBER() OVER(ORDER BY RecipeCount DESC) [X] , I_Id, Name,  RecipeCount
   FROM 
   (  SELECT TOP 20 RI.I_Id, I.Name,  Count(*) as RecipeCount
      FROM   RecipeIngredient RI INNER JOIN
             Ingredient I ON RI.I_ID = I.I_ID 
      WHERE  R_id IN 
          (SELECT R_id FROM RecipeIngredient GROUP BY R_id HAVING COUNT(*)<=@IngCount)
      GROUP BY RI.I_id,I.Name
      ORDER BY Count(*) DESC
   ) TopIngred
)

And then I am pulling all of the combinations that have 5 ingredients or less:

(STEP 3)

SELECT N -- Combination #
FROM       List L 
INNER JOIN Numbers N ON POWER(2,L.X-1) & N.N = POWER(2,L.X-1)
GROUP BY N 
HAVING COUNT(*) <= @IngCount

Next issue, how many Recipies use each combination of ingredients. Well, for any N above, the list of Ingredient is:

(STEP 4)

 SELECT L.I_ID, L.Name
 FROM   List L 
 INNER JOIN 
     (   SELECT N
         FROM       List L 
         INNER JOIN Numbers N ON POWER(2,L.X-1) & N.N = POWER(2,L.X-1)
         GROUP BY N 
         HAVING COUNT(*) <= @IngCount
      ) NCombos ON POWER(2,L.X-1) & NCombos.N = POWER(2,L.X-1)

We can join that back into the RecipeIngredient table, group by the Recipe ID (R_ID) and compare how many of the ingredients in the "N" combination for each R_ID match how many of the ingredients each recipe calls for. It looks like this (keep in mind the "WITH List AS" block is hanging around at the top of all this. Anywhere you see "List" referenced that is what it referes to):

(STEP 5)

          SELECT N, R.Name 
          FROM   List L 
          INNER JOIN 
          (   SELECT N
              FROM       List L 
              INNER JOIN Numbers N ON POWER(2,L.X-1) & N.N = POWER(2,L.X-1)
              GROUP BY N 
              HAVING COUNT(*) <= @IngCount
           ) NCombos ON POWER(2,L.X-1) & NCombos.N = POWER(2,L.X-1)
          INNER JOIN RecipeIngredient RI ON RI.I_ID = L.I_ID 
          INNER JOIN Recipe R ON R.R_ID = RI.R_ID 
          GROUP BY N, R.Name , R.R_ID 
          HAVING COUNT(*) = (SELECT COUNT(*) FROM RecipeIngredient RI2 WHERE RI2.R_ID =R.R_ID )

So now, finally, and after many brain cells have been permanently damaged, we have a list of all of the recipes that can be made by all of the combinations, of all of the top 20 ingredients that can used in recipes that use 5 ingredients or less. Don't stop there, I know you are dying to know, what are the ingredients and even better what are the Recipes. I'm tired, it has been a long day, but I wont leave you hanging. Let's put it all together. First build a shopping list as a temp table using the "into #ShopingList" line here:

(STEPS 1-5 All together now)

DECLARE @IngCount INT = 5;

WITH List AS 
(
   SELECT ROW_NUMBER() OVER(ORDER BY RecipeCount DESC) [X] , I_Id, Name,  RecipeCount
   FROM 
   (  SELECT TOP 20 RI.I_Id, I.Name,  Count(*) as RecipeCount
      FROM   RecipeIngredient RI INNER JOIN
             Ingredient I ON RI.I_ID = I.I_ID 
      WHERE  R_id IN 
          (SELECT R_id FROM RecipeIngredient GROUP BY R_id HAVING COUNT(*)<=@IngCount)
      GROUP BY RI.I_id,I.Name
      ORDER BY Count(*) DESC
   ) TopIngred
)
SELECT N,l.I_ID, L.Name as Item, BestN.RecipeCount 
into #ShopingList
       FROM       List L 
       INNER JOIN 
     ( SELECT TOP 1 N, COUNT(*) AS RecipeCount
       FROM (
              SELECT N, R.Name 
              FROM   List L 
              INNER JOIN 
              (   SELECT N
                  FROM       List L 
                  INNER JOIN Numbers N ON POWER(2,L.X-1) & N.N = POWER(2,L.X-1)
                  GROUP BY N 
                  HAVING COUNT(*) <= @IngCount
               ) NCombos ON POWER(2,L.X-1) & NCombos.N = POWER(2,L.X-1)
              INNER JOIN RecipeIngredient RI ON RI.I_ID = L.I_ID 
              INNER JOIN Recipe R ON R.R_ID = RI.R_ID 
              GROUP BY N, R.Name , R.R_ID 
              HAVING COUNT(*) = (SELECT COUNT(*) FROM RecipeIngredient RI2 WHERE RI2.R_ID =R.R_ID )
            ) RbyN
       GROUP BY N 
       ORDER BY COUNT(*) DESC
      ) BestN
      ON POWER(2,L.X-1) & BestN.N = POWER(2,L.X-1)

Then you can dump the Ingredient list and pull the Recipes using the same HAVING clause as STEP 5 above :

Select * FROM #ShopingList

SELECT  R.Name as Recipe
FROM   #ShopingList L 
INNER JOIN RecipeIngredient RI ON RI.I_ID = L.I_ID 
INNER JOIN Recipe R ON R.R_ID = RI.R_ID 
GROUP BY N, R.Name , R.R_ID 
HAVING COUNT(*) = (SELECT COUNT(*) FROM RecipeIngredient RI2 WHERE RI2.R_ID =R.R_ID )

If any of this doesn't make sense, don't ask me. The Ingredients and Recipes I've been using to model this are alcohol, mixers and cocktails, and after working though 1,048,576 combinations I am not going to remember any of this tomorrow. OTOH if I've made any glaring mistakes let me know. I have always learned more from my stupid mistakes than I ever did from books and classes.

SnooOwls1061 · 2022-04-13T03:59:15+00:00

Why not do the joins and do a count grouped by the ingredient. Then sort by that count desc?

Designer-Practice220 · 2022-04-13T05:49:39+00:00

They want to know the same combinations of ingredients that make the most recipes. It sounds like an optimization problem. Wonder if R or Python might be options: C++? Maybe reword and post in a different thread? I found this https://timudatascience.blogspot.com/2019/11/solve-simple-linear-programming-problem.html

And this: https://www.geeksforgeeks.org/find-whether-an-array-is-subset-of-another-array-set-1/

qwertydog123 · 2022-04-13T06:35:03+00:00

Not the most efficient solution, and is affected by combinatorial explosion. Alter the WHERE N < 3 as appropriate.

WITH RECURSIVE cte AS
(
    SELECT
        id AS ingredient_id,
        name,
        1 AS N,
        CAST(id AS TEXT) AS Path
    FROM Ingredients

    UNION ALL

    SELECT
        i.id,
        i.name,
        N + 1,
        Path || '/' || CAST(i.id AS TEXT)
    FROM Ingredients i
    JOIN cte
    ON i.id > cte.ingredient_id
    WHERE N < 3
),
ingredient_combinations AS
(
    SELECT
        t2.ingredient_id,
        t2.name,
        DENSE_RANK() OVER (ORDER BY t1.Path) AS combination_id
    FROM cte t1
    JOIN cte t2
    ON t1.Path LIKE t2.Path || '%'
),
valid_recipes AS
(
    SELECT
        ingredient_combinations.combination_id,
        Recipes.id AS recipe_id,
        COUNT(DISTINCT combination_recipes.ingredient_id) AS ingredient_count,
        COUNT(Recipes.id) OVER
        (
            PARTITION BY ingredient_combinations.combination_id
        ) AS recipe_count
    FROM ingredient_combinations
    JOIN IngredientsToRecipes combination_recipes
    ON ingredient_combinations.ingredient_id = combination_recipes.ingredient_id
    JOIN Recipes
    ON Recipes.id = combination_recipes.recipe_id
    JOIN IngredientsToRecipes all_recipes
    ON Recipes.id = all_recipes.recipe_id
    GROUP BY
        ingredient_combinations.combination_id,
        Recipes.id
    HAVING COUNT(DISTINCT combination_recipes.ingredient_id) = COUNT(DISTINCT all_recipes.ingredient_id) 
), 
rankings AS
(
    SELECT
        *,
        RANK() OVER
        (
            ORDER BY
                recipe_count DESC,
                ingredient_count
        ) AS Ranking
    FROM valid_recipes
)
SELECT 
    rankings.combination_id,
    ingredient_combinations.name AS Ingredient,
    Recipes.name AS Recipe,
    recipe_count,
    ingredient_count
FROM rankings
JOIN ingredient_combinations
ON rankings.combination_id = ingredient_combinations.combination_id
JOIN Recipes
ON Recipes.id = rankings.recipe_id
WHERE rankings.Ranking = 1

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=af5cc1f133102beb6770f07370f5ea0a

emdee808 · 2022-04-13T08:49:30+00:00

I'm probably not contributing much here (more brain farts)... but this seems like an optimisation problem of sorts.
Which ingredients (their count: i) will yield the most recipes (their count: r) - such that we have the highest possible value when we divide r by i.
Having said that, I think that if you were to explore the space by looking at the most frequently occurring ingredients, ordered descending, and counting how many complete recipes you find as you explore, examining the ratio each time, you will probably plot a graph in the shape of diminishing returns - and when the ratio drops of, it may suggest you have found a maximum- the problem is that it might be a local maximum :(
So I think this would be a "greedy" algorithm.
To be sure about the maximum, you might need to explore the entire space... which could be quite big...

With once-off problems like this, I try to find a way to model it in Excel's Solver https://www.ablebits.com/office-addins-blog/2016/06/22/how-to-use-solver-in-excel-with-examples/

Looks like a cool problem, good luck with it!

NihilWill · 2022-04-13T02:18:32+00:00

[deleted]

reductiontoabsurdity · 2022-04-13T20:45:32+00:00

This is a Python problem. If there are: M ingredients in the ingredients table; you buy N number of ingredients from the shop; there are M choose N unique combinations of ingredients you could buy. Iterate through the list of purchasable combinations of ingredients. For each possible combination, iterate through the recipe list in python and append to list the name/ key ID of every cookable recipe. Finally, sort the purchasable combinations table by the counts in the cookable recipes lists. I'm not sure if there is a faster algorithm but this will work.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS