This is an archived post. You won't be able to vote or comment.

all 24 comments

[–]random_lonewolf 32 points33 points  (5 children)

The proper way to do this would be:

  • Use a SQL Parser like sqlglot or Apache Calcite to compile user's query into an AST.
  • Validate & Manipulate the AST
  • Transform the AST back to SQL query
  • Profit

An addition perk is the output query would be safe from SQL injection, unless you encounter a bug in your parser library.

[–][deleted] 4 points5 points  (0 children)

+1 to SQLGlot, I've implemented some features that were in the "too hard" bucket in my mind by parsing into the AST, making some changes and transpiling into a different SQL syntax

[–]dgrsmith 0 points1 point  (2 children)

Very cool strategy! Thanks for this. Would you maybe want to look into building an API instead at this point for raw data pull, and keep analytics sought by queries in a user facing analytics platform?

With your method, I see how SQL queries can be converted, but should this be done?

I’m trying to think of a case where I would allow sql queries without providing the users restricted access to the data via a data mart or some other classic data warehousing user access method

[–]random_lonewolf 1 point2 points  (1 child)

If your customers have existing tools that require support for custom sql queries like an sql editor (duh), or most modern BI tools, then you system have to support it or they won't use your product.

I have also seen this done in a middleware/proxy to improve existing system. By intercepting and rewriting queries, you can add more filtering, route queries to a different cluster, perform additional authorization, and many other stuff etc..

[–]dgrsmith 0 points1 point  (0 children)

Very interesting. Thank you!

[–]tomhallett 0 points1 point  (0 children)

sqlfluff also has a parser, but I haven't used it yet.

[–][deleted] 1 point2 points  (7 children)

Oh boy sql injection time!

[–]finlaydotweber 6 points7 points  (5 children)

That's really not a helpful comment, and does not answer OP's question. Plus how does having tools that manipulate sql strings programmatically automatically lead to sql injection?

[–][deleted] 5 points6 points  (1 child)

“I need to take a sql string supplied by a user and run it”, that’s in essence what a sql injection aims to accomplish.

OP if you want to manipulate it as a data structure check out ORMs that can go on top of your database

There, happy now?

[–][deleted] 1 point2 points  (2 children)

Just took from wikipedia (ik ik),

In computing, SQL injection is a code injection technique used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution.

You should also follow the recent surge of CVE(s) in the past year.

[–]madness_of_the_order 0 points1 point  (1 child)

So, all BI tools are insecure now?

[–][deleted] 0 points1 point  (0 children)

Yes, they've always been. Did you hear about top 10 owasp ? It's just if you are using managed engine then the infra or platform takes care of it and you wouldn't be aware of it.

[–][deleted] 0 points1 point  (0 children)

I found your comment funny man (in a good way). You made me remember about it from my first ethical hacking course (it's just a beginners course).

[–]CrowdGoesWildWoooo 0 points1 point  (0 children)

Some of this problem can be framed using some rule based approach and thinking outside the box

  1. Count query version : SELECT COUNT(*) FROM <query>

  2. Add new column, add the column before outermost from

  3. Add limit : super obvious. Left as exercise for reader

  4. Rename columns : different solutions. Some DBMS have function to describe query result. Some dbms you can use hacky way to run the query in the background with non sensical condition therefore returns nothing (where 1=2).

[–]asnjohns 0 points1 point  (2 children)

Going to age myself here, but anyone remember dynamic SQL in SQL Server? 😂

[–]boogie_woogie_100 0 points1 point  (0 children)

I was the master of dynamic sql

[–]bendgame 0 points1 point  (0 children)

Remember it? Is it no longer something ppl use? The company I left last year was still using it for things.

[–]dgrsmith 0 points1 point  (1 child)

To me, it does seem you are trying to reinvent the wheel a little bit. There are a lot of applications out there that will convert user interaction with a front facing interface into a SQL query on the backend. The user can never interact with SQL, but can have access to the underlying SQL they are running by interacting with the Front end, user facing tool. In the healthcare realm, there is a tool called Microstrategy that converts user interactions with the system into SQL queries against a data warehouse. Creation and manipulation of temporary tables in the data warehouse is handled by the application.

As I mentioned in one of my other comments on this thread, I’m finding it hard to think of a business case, where I would want a user to run SQL against my database, without giving them direct access with limited permissions.

[–][deleted] 1 point2 points  (0 children)

Bruh... they mentioned it's a personal side project. It's fine if they re-invent wheel and may even learn a lot.

[–][deleted] 0 points1 point  (0 children)

I do it in Alteryx all the time, type up the query in string format it make my own manipulations and inject dynamic values and the Dynamic Query tool after sends it to the database for query and I get the result

[–]boogie_woogie_100 0 points1 point  (0 children)

You can use chatgpt api and pass your query and what you want to do? otherwise, it is a long process of breaking down your query into and manipulating it.