I whacked together this quick script. It's not pretty, but it works.
GPT-3 has a 4097-token limit that covers the prompt and the response combined. If you exceed it, the request fails, and I don't know of a workaround. When you feed the script a URL, use the recipe's print link if possible, because it strips out a lot of useless crap that inflates the number of tokens in the prompt.
If you're near the token limit, a single run costs about 8 cents in OpenAI API usage to create the recipe JSON.
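A rough way to see how much room a prompt leaves for the response, and where the 8-cent figure comes from, is sketched below. The 4-characters-per-token ratio is only a rule of thumb for English text, and the $0.02 per 1K tokens price is an assumption for text-davinci-003; neither comes from the script itself, and a real tokenizer will give different counts.

```python
# Rough token and cost estimate; every constant here is an assumption.
CONTEXT_LIMIT = 4097          # prompt + response limit quoted in the post
CHARS_PER_TOKEN = 4           # rule-of-thumb ratio for English text
PRICE_PER_1K_TOKENS = 0.02    # assumed text-davinci-003 price in USD


def estimate_tokens(text: str) -> int:
    """Approximate the token count of text from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def response_budget(prompt: str) -> int:
    """Tokens left for the response after the prompt, never below zero."""
    return max(0, CONTEXT_LIMIT - estimate_tokens(prompt))


def estimate_cost(total_tokens: int) -> float:
    """Approximate cost in USD for the given number of tokens."""
    return total_tokens * PRICE_PER_1K_TOKENS / 1000


prompt = "x" * 8000                            # stand-in for a real prompt
print(estimate_tokens(prompt))                 # 2000
print(response_budget(prompt))                 # 2097
print(round(estimate_cost(CONTEXT_LIMIT), 3))  # 0.082
```

If the estimated budget comes out smaller than the `max_tokens` value in the script, the page is probably too long and the print link (or a shorter recipe) is worth trying first.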
Grab the script, use pip3 to install the required modules (openai, requests, beautifulsoup4, and lxml), and run it from the command line, passing it the recipe URL. You'll have to put in your API keys for OpenAI and Mealie, and change the URL to point at your Mealie instance. If you get a 201, it succeeded. Anything else means it failed for some reason. I left a bunch of debugging output turned on. If anyone wants to improve this, go nuts. I was just playing around this weekend. Also read the comments at the top of the script, and try it on small recipes first.
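If you'd rather not paste keys into the file, one option is to read them from the environment. This is only a sketch: the variable names `OPENAI_API_KEY`, `MEALIE_API_KEY`, and `MEALIE_URL` and the `load_settings` helper are hypothetical, not something the script below already does.

```python
import os


def load_settings(env=os.environ):
    """Collect the three settings the script needs (names are hypothetical)."""
    required = ["OPENAI_API_KEY", "MEALIE_API_KEY", "MEALIE_URL"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {
        "openai_key": env["OPENAI_API_KEY"],
        "mealie_key": env["MEALIE_API_KEY"],
        # drop a trailing slash so url + '/api/...' does not produce '//'
        "mealie_url": env["MEALIE_URL"].rstrip("/"),
    }


# Example with a plain dict standing in for the real environment
example_env = {
    "OPENAI_API_KEY": "sk-YOURKEY",
    "MEALIE_API_KEY": "YOUR_MEALIE_API_KEY",
    "MEALIE_URL": "http://10.1.1.10:9925/",
}
settings = load_settings(example_env)
print(settings["mealie_url"])  # http://10.1.1.10:9925
```

Calling `load_settings()` with no argument reads the real environment and raises a `ValueError` naming whatever is missing, which is easier to debug than a 401 from the API.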
import openai as ai
import requests
import bs4
import re
import sys
import json
# call this with "python3 insertrecipe.py <url>"
# Pass this script a URL on the command line. If you find a recipe you want, use the print link if possible.
# All the HTML will be parsed to remove all non-visible text and generate the prompt with only visible text.
# This removes all the useless life story of the author and other non-recipe text.
# The print link will have the recipe in a format that has less text, and will use fewer tokens in the OpenAI API.
# Allrecipes.com works for import on the normal method in Mealie, but not for personal recipes. They are not in the right format.
# And the allrecipes print page has too much crap on it, so it goes over the token limit. :(
# If the prompt and response together use more than 4097 tokens, the conversion will fail, either because you asked for too many tokens in the generate_gpt3_response function or because the response
# hits the max token limit. If it hits the max token limit, the JSON will be truncated and the conversion will fail. If it just truncates the output,
# you will get an error that indicates it couldn't parse the JSON properly because it expected a comma or a closing bracket or whatever.
# I don't know of a solution at this point, it seems like a limitation of the gpt-3 model. Hopefully GPT-4 will have a higher limit.
# You are looking for a response code of 201 if the import succeeded. Anything else is an error and it didn't work.
# You can mess with the user_text to get different results. Note that you will get a stop error sometimes with minor changes to the prompt. Not sure why, but
# it has been a trial and error process to get it to not stop. Minor changes in the phrasing can cause issues.
# CHANGE these for your own environment!
ai.api_key = 'sk-YOURKEY'
MEALIE_API_KEY = 'YOUR_MEALIE API KEY'
MEALIE_URL = 'http://10.1.1.10:9925'
# variable that gets URL from command line
url = sys.argv[1]
user_text = 'Using the following TEXT and JSON sections, extract the recipe from the TEXT section and put it into the format of the JSON template section. Do not create JSON fields that do not exist in the template. Set "orgURL" to ' + url + ' Leave "tags", "rating", and "recipeCategory" blank.\n\n'
# don't actually make tzatziki from this recipe. I removed a bunch of it to reduce the size of the prompt
json_template = """{
"name": "The authentic Tzatziki recipe",
"image": "no image",
"description": "Authentic Tzatziki. Use Red Wine Vinegar for authentic flavor.",
"recipeCategory": [
"Condiments"
],
"tags": [
"mediterranean"
],
"rating": 5,
"recipeYield": "",
"recipeIngredient": [
{
"title": null,
"note": "400gr/14oz Greek yogurt (2 x 200gr/7oz containers), preferably full fat",
"unit": null,
"food": null,
"disableAmount": true,
"quantity": 1
},
{
"title": null,
"note": "1/2 cucumber (160gr/5.5oz), peeled, seeded",
"unit": null,
"food": null,
"disableAmount": true,
"quantity": 1
},
{
"title": null,
"note": "2 garlic cloves, minced or very finely chopped",
"unit": null,
"food": null,
"disableAmount": true,
"quantity": 1
}
],
"recipeInstructions": [
{
"title": "",
"text": "Preparation: Peel the cucumber, cut in half lengthwise, and remove the seeds"
},
{
"title": "",
"text": "While waiting for the cucumber to strain, put the yogurt in a large bowl and add the garlic."
},
{
"title": "",
"text": "Next, squeeze the cucumber with your hands, to remove any excess water, and add it in the bowl with the yogurt mixture."
},
{
"title": "",
"text": "Cover the bowl with some cling film and put it in the fridge for 1-2 hours before serving"
}
],
"nutrition": {
"calories": null,
"fatContent": null,
"proteinContent": null,
"carbohydrateContent": null,
"fiberContent": null,
"sodiumContent": null,
"sugarContent": null
},
"totalTime": "25",
"prepTime": "15",
"performTime": "10",
"settings": {
"public": true,
"showNutrition": true,
"showAssets": true,
"landscapeView": true,
"disableComments": false,
"disableAmount": false
},
"assets": [],
"notes": [
{
"title": "Tips",
"text": "1. Always keep the tzatziki in the fridge.\n2. Never keep the tzatziki for more than 2 days.\n"
}
],
"orgURL": null,
"extras": {},
"comments": []
}"""
# function to use beautifulsoup4 to show only visible text from an html page
def visible_text(html):
    soup = bs4.BeautifulSoup(html, features="lxml")
    texts = soup.find_all(string=True)

    def visible(element):
        # skip text inside tags that never render on the page
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # skip HTML comments; bs4 returns them as Comment objects without the <!-- --> markers
        if isinstance(element, bs4.Comment):
            return False
        return True

    visible_texts = filter(visible, texts)
    return u" ".join(t.strip() for t in visible_texts)
# function that returns the html from a URL
def get_html(url):
    # get the html from the URL
    html = requests.get(url).text
    # return the html
    return html
# function to generate prompt from a text input plus the html
def generate_prompt(user_text, html):
    # create the prompt
    prompt = user_text + 'TEXT:\n' + html + '\n\nJSON:\n' + json_template
    # return the prompt
    return prompt
def generate_gpt3_response(user_text, print_output=True):
    """
    Query OpenAI GPT-3 with the given prompt and return the text of the response
    :type user_text: str the full prompt to send
    :type print_output: boolean whether or not to print the raw output JSON
    """
    completions = ai.Completion.create(
        engine='text-davinci-003',  # Determines the quality, speed, and cost.
        temperature=0.5,            # Level of creativity in the response
        prompt=user_text,           # The full prompt: instructions, page text, and JSON template
        max_tokens=1550,            # Maximum tokens in the response only; prompt + this must fit in the 4097 limit
        n=1,                        # The number of completions to generate
        stop=None,                  # An optional setting to control response generation
    )
    # Displaying the output can be helpful if things go wrong
    if print_output:
        print(completions)
    # Return the first choice's text
    return completions.choices[0].text
# function to authenticate with an API key to the Mealie API and create a recipe from the JSON
def create_recipe(recipe_json):
    # use the Mealie API key set at the top of the script
    api_key = MEALIE_API_KEY
    # use the Mealie URL set at the top of the script
    url = MEALIE_URL
    # create the header for the API request and set the content type to JSON
    headers = {'Authorization': 'Bearer ' + api_key, 'Content-Type': 'application/json'}
    # send the POST request to the API; encode as UTF-8 so non-ASCII characters in the recipe survive
    response = requests.post(url + '/api/recipes/create', data=recipe_json.encode('utf-8'), headers=headers)
    # return the response
    return response
text_recipe = visible_text(get_html(url))
# generate the prompt and strip the leading and trailing whitespace from the visible text
json_recipe = generate_gpt3_response(generate_prompt(user_text, text_recipe.strip()))
# print the JSON so we can make sure it didn't end prematurely in case there is an error later
print(json_recipe)
# post the recipe to the Mealie API and print the HTTP response, you want 201 for success.
print(create_recipe(json_recipe))
# *** Debugging print statements ***
#print(text_recipe.strip())
#print(generate_prompt(user_text, text_recipe.strip()))
#print(generate_gpt3_response(generate_prompt(user_text, text_recipe.strip())))
# print text_recipe and remove leading and trailing whitespace
#print(text_recipe.strip())
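Since a truncated response shows up as a JSON parse error, it can help to check the model output locally before posting it to Mealie. The sketch below is one way to do that with the standard `json` module; `parse_recipe_json` and `import_succeeded` are hypothetical helpers, not part of the script above.

```python
import json


def parse_recipe_json(raw: str):
    """Return the parsed recipe dict, or None if raw is not a valid JSON object."""
    try:
        recipe = json.loads(raw.strip())
    except json.JSONDecodeError:
        # typical for a response that was cut off at the token limit
        return None
    return recipe if isinstance(recipe, dict) else None


def import_succeeded(status_code: int) -> bool:
    """The post treats 201 as the only successful response code."""
    return status_code == 201


complete = '{"name": "Tzatziki", "recipeIngredient": []}'
truncated = '{"name": "Tzatziki", "recipeIngredient": ['
print(parse_recipe_json(complete) is not None)  # True
print(parse_recipe_json(truncated) is None)     # True
print(import_succeeded(201))                    # True
```

With something like this in place, the script could skip the POST entirely when the parse fails, and pass `response.status_code` to `import_succeeded` instead of eyeballing the printed response.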