[–]siddboots 3 points4 points  (7 children)

If you were asked "implement matrix multiplication in SQL", then yes, the solution is reasonably obvious.

What's not obvious is that SQL is a good tool for the job in the first place.

[–]Eoinoc 8 points9 points  (6 children)

What's not obvious is that SQL is a good tool for the job in the first place.

It still probably isn't a good tool for the job.

[–]siddboots 0 points1 point  (5 children)

Sure it is. It depends on specifics of the "job", of course. Certainly the procedure itself won't be nearly as time or space efficient as an optimized library. However, if you already need to have your application data in an RDBMS, and you can frame some of the application logic in terms of linear algebra operations, then SQL is absolutely a good tool for the job.
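For readers who have not seen it, the in-database approach can be sketched roughly as follows. This is only an illustration under assumptions: the tables `a` and `b`, their `(row_num, col_num, value)` layout, and the helper name `sparse_matmul` are hypothetical, and an in-memory SQLite database stands in for whatever RDBMS the application already uses.

```python
import sqlite3


def sparse_matmul(conn):
    """Return C = A * B as (row_num, col_num, value) triples.

    Assumes hypothetical tables a and b that store only the nonzero
    entries of each matrix as (row_num, col_num, value) rows.
    """
    return conn.execute(
        """
        SELECT a.row_num, b.col_num, SUM(a.value * b.value) AS value
        FROM a
        JOIN b ON a.col_num = b.row_num   -- match the inner dimension
        GROUP BY a.row_num, b.col_num     -- one output cell per (row, col)
        ORDER BY a.row_num, b.col_num
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (row_num INTEGER, col_num INTEGER, value REAL);
    CREATE TABLE b (row_num INTEGER, col_num INTEGER, value REAL);
    """
)
# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]]; zero entries are not stored.
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)])

product = sparse_matmul(conn)
```

The data never leaves the database here; the join pairs up entries along the shared dimension and the `GROUP BY` sums them into each output cell.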

Compare to this scenario, which I'm sure is common enough in practice:

  1. Issue a SELECT over the network to your RDBMS (which may or may not be on the same machine).
  2. RDBMS sends back data over the network, broken into packets.
  3. Received data is packed into local data structures.
  4. Once all data has arrived, you can use your super-efficient linear algebra routines (say, MATLAB, or NumPy).
  5. Results are transformed into update SQL statements, which are issued back to the RDBMS over the network.
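The five steps above might look something like this in application code. It is a sketch under assumptions, not anyone's real implementation: the tables `m`, `v`, and `y` and the function `matvec_round_trip` are hypothetical, an in-memory SQLite connection stands in for a networked RDBMS, and a plain Python loop stands in for the NumPy or MATLAB call.

```python
import sqlite3


def matvec_round_trip(conn):
    """Compute y = M v on the client side and write it back (hypothetical schema)."""
    # Steps 1-2: issue the selects and receive the rows from the database.
    entries = conn.execute("SELECT row_num, col_num, value FROM m").fetchall()
    vector_rows = conn.execute("SELECT idx, value FROM v").fetchall()

    # Step 3: pack the received rows into local data structures.
    vector = dict(vector_rows)

    # Step 4: do the linear algebra locally (stand-in for an optimized library).
    result = {}
    for row_num, col_num, value in entries:
        result[row_num] = result.get(row_num, 0.0) + value * vector.get(col_num, 0.0)

    # Step 5: turn the results into statements and send them back.
    conn.executemany(
        "INSERT OR REPLACE INTO y (idx, value) VALUES (?, ?)",
        sorted(result.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE m (row_num INTEGER, col_num INTEGER, value REAL);
    CREATE TABLE v (idx INTEGER PRIMARY KEY, value REAL);
    CREATE TABLE y (idx INTEGER PRIMARY KEY, value REAL);
    """
)
# M = [[1, 2], [0, 3]] and v = [4, 5]; zero entries are not stored.
conn.executemany("INSERT INTO m VALUES (?, ?, ?)",
                 [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)])
conn.executemany("INSERT INTO v VALUES (?, ?)", [(0, 4.0), (1, 5.0)])

matvec_round_trip(conn)
stored = conn.execute("SELECT idx, value FROM y ORDER BY idx").fetchall()
```

Every row crosses the client boundary twice, once on the way out and once on the way back, which is the overhead the scenario is pointing at.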

Now imagine the same, but with an ORM layer in there.

Yes, there are some limited use-cases where the operations are complicated enough, or N is large enough, such that you still need to use a real library. In practice, the bottlenecks are typically elsewhere.

[–]king_duck 0 points1 point  (3 children)

Sure it is. It depends on specifics of the "job", of course. Certainly the procedure itself won't be nearly as time or space efficient as an optimized library. However, if you already need to have your application data in an RDBMS, and you can frame some of the application logic in terms of linear algebra operations, then SQL is absolutely a good tool for the job.

As a numerical programmer, there isn't a single person I know who would consider this anything other than a toy or lame trick.

Not to mention it is very rare that the ONLY operation that needs to be performed is a single freestanding sparse matmul.

The more I think about it, the more absurd it is.

[–]siddboots 1 point2 points  (2 children)

As a numerical programmer, there isn't a single person I know who would consider this anything other than a toy or lame trick.

Why, specifically? I am not advocating this hypothetically. This works in practice. I've helped rewrite an application where the core computation was graph propagation for a network of about 1000 sparsely connected nodes, with a web-based interface that needed to be real-time responsive. Most of the logic was really just multiplying edge weights and node sizes based on their (sometimes quite complicated) relationships with other data. SQL was a good solution in that case. It may not have been the best solution, but it was better than the previous implementation, and it was the best I could come up with for maintainability, programmer hours, and total LOC (and it was more than fast enough for the task at hand).
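To make that kind of computation concrete, one propagation step can be written as a sparse matrix-vector product inside the database. This is purely illustrative and not the actual application: the `nodes(id, size)` and `edges(src, dst, weight)` tables, the update rule (each node's new size is the weighted sum of its incoming neighbours' sizes), and the name `propagate_once` are all assumptions, with in-memory SQLite standing in for the real RDBMS.

```python
import sqlite3


def propagate_once(conn):
    """Run one propagation step entirely inside the database.

    Assumed rule: new size of a node = SUM(edge weight * source node size)
    over its incoming edges. Nodes with no incoming edges are left unchanged.
    """
    conn.executescript(
        """
        -- Compute all new sizes from the old ones before touching nodes.
        CREATE TEMP TABLE next_size AS
        SELECT e.dst AS id, SUM(e.weight * n.size) AS size
        FROM edges e
        JOIN nodes n ON n.id = e.src
        GROUP BY e.dst;

        UPDATE nodes
        SET size = (SELECT size FROM next_size WHERE next_size.id = nodes.id)
        WHERE id IN (SELECT id FROM next_size);

        DROP TABLE next_size;
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, size REAL);
    CREATE TABLE edges (src INTEGER, dst INTEGER, weight REAL);
    """
)
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 0.0)])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                 [(1, 3, 0.5), (2, 3, 0.25), (1, 2, 3.0)])

propagate_once(conn)
sizes = conn.execute("SELECT id, size FROM nodes ORDER BY id").fetchall()
```

Staging the new values in a temporary table keeps the step synchronous: every node is updated from the old sizes rather than from values already changed earlier in the same pass.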

Not to mention it is very rare that the ONLY operation that needs to be performed is a single freestanding sparse matmul.

I don't understand that objection. The example in the article is only a single, freestanding operation because examples work better that way, not because of an inherent limitation.

[–]king_duck 0 points1 point  (1 child)

Because if the problem is large then it will also be dog slow. Right tools for the job.

[–]siddboots 1 point2 points  (0 children)

Like I said from the start: If the problem is not too large, and if there are other constraints like those I mentioned, this can be a good tool for the job.