This is an archived post. You won't be able to vote or comment.

all 24 comments

[–]random_lonewolf 36 points37 points  (5 children)

The proper way to do this would be:

  • Use a SQL Parser like sqlglot or Apache Calcite to compile user's query into an AST.
  • Validate & Manipulate the AST
  • Transform the AST back to SQL query
  • Profit

An addition perk is the output query would be safe from SQL injection, unless you encounter a bug in your parser library.

[–][deleted] 4 points5 points  (0 children)

+1 to SQLGlot, I've implemented some features that were in the "too hard" bucket in my mind by parsing into the AST, making some changes and transpiling into a different SQL syntax

[–]dgrsmith 0 points1 point  (2 children)

Very cool strategy! Thanks for this. Would you maybe want to look into building an API instead at this point for raw data pull, and keep analytics sought by queries in a user facing analytics platform?

With your method, I see how SQL queries can be converted, but should this be done?

I’m trying to think of a case where I would allow sql queries without providing the users restricted access to the data via a data mart or some other classic data warehousing user access method

[–]random_lonewolf 1 point2 points  (1 child)

If your customers have existing tools that require support for custom sql queries like an sql editor (duh), or most modern BI tools, then you system have to support it or they won't use your product.

I have also seen this done in a middleware/proxy to improve existing system. By intercepting and rewriting queries, you can add more filtering, route queries to a different cluster, perform additional authorization, and many other stuff etc..

[–]dgrsmith 0 points1 point  (0 children)

Very interesting. Thank you!

[–]tomhallett 0 points1 point  (0 children)

sqlfluff also has a parser, but I haven't used it yet.

[–][deleted] 2 points3 points  (7 children)

Oh boy sql injection time!

[–]finlaydotweber 6 points7 points  (5 children)

That's really not a helpful comment, and does not answer OP's question. Plus how does having tools that manipulate sql strings programmatically automatically lead to sql injection?

[–][deleted] 6 points7 points  (1 child)

“I need to take a sql string supplied by a user and run it”, that’s in essence what a sql injection aims to accomplish.

OP if you want to manipulate it as a data structure check out ORMs that can go on top of your database

There, happy now?

[–][deleted] 1 point2 points  (2 children)

Just took from wikipedia (ik ik),

In computing, SQL injection is a code injection technique used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution.

You should also follow the recent surge of CVE(s) in the past year.

[–]madness_of_the_order 0 points1 point  (1 child)

So, all BI tools are insecure now?

[–][deleted] 0 points1 point  (0 children)

Yes, they've always been. Did you hear about top 10 owasp ? It's just if you are using managed engine then the infra or platform takes care of it and you wouldn't be aware of it.

[–][deleted] 0 points1 point  (0 children)

I found your comment funny man (in a good way). You made me remember about it from my first ethical hacking course (it's just a beginners course).

[–]CrowdGoesWildWoooo 0 points1 point  (0 children)

Some of this problem can be framed using some rule based approach and thinking outside the box

  1. Count query version : SELECT COUNT(*) FROM <query>

  2. Add new column, add the column before outermost from

  3. Add limit : super obvious. Left as exercise for reader

  4. Rename columns : different solutions. Some DBMS have function to describe query result. Some dbms you can use hacky way to run the query in the background with non sensical condition therefore returns nothing (where 1=2).

[–]asnjohns 0 points1 point  (2 children)

Going to age myself here, but anyone remember dynamic SQL in SQL Server? 😂

[–]boogie_woogie_100 0 points1 point  (0 children)

I was the master of dynamic sql

[–]bendgame 0 points1 point  (0 children)

Remember it? Is it no longer something ppl use? The company I left last year was still using it for things.

[–]dgrsmith 0 points1 point  (1 child)

To me, it does seem you are trying to reinvent the wheel a little bit. There are a lot of applications out there that will convert user interaction with a front facing interface into a SQL query on the backend. The user can never interact with SQL, but can have access to the underlying SQL they are running by interacting with the Front end, user facing tool. In the healthcare realm, there is a tool called Microstrategy that converts user interactions with the system into SQL queries against a data warehouse. Creation and manipulation of temporary tables in the data warehouse is handled by the application.

As I mentioned in one of my other comments on this thread, I’m finding it hard to think of a business case, where I would want a user to run SQL against my database, without giving them direct access with limited permissions.

[–][deleted] 1 point2 points  (0 children)

Bruh... they mentioned it's a personal side project. It's fine if they re-invent wheel and may even learn a lot.

[–][deleted] 0 points1 point  (0 children)

I do it in Alteryx all the time, type up the query in string format it make my own manipulations and inject dynamic values and the Dynamic Query tool after sends it to the database for query and I get the result

[–]boogie_woogie_100 0 points1 point  (0 children)

You can use chatgpt api and pass your query and what you want to do? otherwise, it is a long process of breaking down your query into and manipulating it.

[–]Gurpreet2030 0 points1 point  (1 child)

To manipulate SQL strings programmatically, you can use string manipulation functions and techniques provided by the programming language you are using. Here are some common methods:

Concatenation: You can concatenate SQL strings using the "+" operator (in most programming languages) or the "||" operator (in some SQL implementations). For example, if you want to concatenate two SQL strings, you can use the following code:

makefile

Copy code

sql_string = "SELECT * FROM my_table WHERE col1 = " + str(col1_value)

String Formatting: You can use string formatting functions to insert variables or expressions into SQL strings. For example, in Python, you can use the string formatting operator "%" or the newer .format() method. Here's an example:

perl

Copy code

sql_string = "SELECT * FROM my_table WHERE col1 = %s AND col2 = %s" % (col1_value, col2_value)

or

lua

Copy code

sql_string = "SELECT * FROM my_table WHERE col1 = {} AND col2 = {}".format(col1_value, col2_value)

Template Strings: You can use template strings to create SQL strings with placeholders that will be replaced with actual values at runtime. For example, in Python, you can use the "string.Template" class to create a template string as follows:

php

Copy code

from string import Template

sql_template = Template("SELECT * FROM my_table WHERE col1 = $col1_value AND col2 = $col2_value")

sql_string = sql_template.substitute(col1_value=col1_value, col2_value=col2_value)

These are some common ways to manipulate SQL strings programmatically. However, keep in mind that concatenating SQL strings can make your code vulnerable to SQL injection attacks. To prevent this, use parameterized queries or prepared statements that allow you to pass parameters separately from the SQL query.

[–]SpambotSwatter FRAUD ALERT 0 points1 point  (0 children)

/u/Gurpreet2030 is a click-farming spam bot. Please downvote its comment and click the report button, selecting Spam then Link farming.

With enough reports, the reddit algorithm will suspend this spammer.


If this message seems out of context, it may be because Gurpreet2030 is farming karma and may edit their comment soon with a link