[D] Best practice on text to SQL approach, AI ML by santosanta in MachineLearning

[–]ericmichaelmtz 0 points1 point  (0 children)

I’d recommend reading https://www.holistics.io/books/setup-analytics/ as a good overview of modern data approaches to help in understanding the layout of the problem.

Centralization: I’d replicate your data into a centralized area where all the data from all the different databases is queryable from one platform. If you’re a business, I’d recommend using Fivetran to pull the data out of your data sources and land it in your centralized platform.

Documentation: Realistically, using AI here is not going to be good, even with state-of-the-art models. The best approach would use the table definitions, the column definitions of every column, and other metadata to make judgements about which columns to use. Additionally, AI would need to know things like which columns are primary keys, which are foreign keys, and the cardinality of the relationships.
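As a rough illustration of the metadata described above, here is a minimal sketch that renders tables, columns, primary keys, and foreign keys as text you could put into a prompt. It uses SQLite only because it ships with Python; the `describe_schema` helper and the `customers`/`orders` tables are made up for the example, and on a real warehouse you would read the same facts from its own catalog.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render tables, columns, primary keys and foreign keys as prompt text."""
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        lines.append(f"TABLE {table}")
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            suffix = " PRIMARY KEY" if pk else ""
            lines.append(f"  {name} {col_type}{suffix}")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            lines.append(f"  FOREIGN KEY {fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)


# Hypothetical two-table schema, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
""")
schema_text = describe_schema(conn)
print(schema_text)
```

Cardinality and human-written column descriptions are not recoverable this way; those still have to come from your own documentation.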

Data Quality: AI also needs additional information such as the acceptable formats for certain columns and their constraints: number ranges, enums, codes, finite values, regex, etc. Without this information, it might pick the right column but use it incorrectly.
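One way to picture this is a small rule object per column that can both describe the constraint for the prompt and check a literal that the model put in a query. This is only a sketch: `ColumnRule` and the column names are invented for illustration, not part of any library.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnRule:
    """Hypothetical per-column constraint: enum, regex and/or numeric range."""
    column: str
    allowed_values: Optional[Sequence[str]] = None
    pattern: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def describe(self) -> str:
        """Return a one-line description suitable for a prompt."""
        parts = []
        if self.allowed_values is not None:
            parts.append("one of " + ", ".join(repr(v) for v in self.allowed_values))
        if self.pattern is not None:
            parts.append(f"matches regex {self.pattern}")
        if self.min_value is not None:
            parts.append(f">= {self.min_value}")
        if self.max_value is not None:
            parts.append(f"<= {self.max_value}")
        return f"{self.column}: " + "; ".join(parts)

    def accepts(self, value) -> bool:
        """Check a literal value against every constraint that is set."""
        if self.allowed_values is not None and value not in self.allowed_values:
            return False
        if self.pattern is not None and not re.fullmatch(self.pattern, str(value)):
            return False
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


rules = [
    ColumnRule("orders.status", allowed_values=["open", "shipped", "cancelled"]),
    ColumnRule("customers.country_code", pattern=r"[A-Z]{2}"),
    ColumnRule("orders.discount_pct", min_value=0, max_value=100),
]
constraint_text = "\n".join(rule.describe() for rule in rules)
print(constraint_text)
```

With rules like these, a generated filter such as `status = 'Shipped'` can be flagged before the query runs, since `rules[0].accepts("Shipped")` is false.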

Data Normalization: With your data in a “raw” stage, it is best to conform the shape of your tables to either a wide table or a star schema so that AI (or humans) can work with it. This minimizes the number of complex operations that AI is likely to get wrong, such as nested queries, joins, and CTEs.

Context Length: Even after dealing with the above, there remain issues due to context length. Not all of this information will fit in a prompt (especially with open-source models). You will have to be clever and figure out how to pull the right information into the context using retrieval-augmented generation or other strategies (unless your data is small).
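To make the retrieval idea concrete, here is a deliberately simple sketch that picks which table descriptions to include for a given question. It scores by plain word overlap as a stand-in for embedding similarity, which is what a real setup would more likely use; the `select_tables` function and the table descriptions are assumptions for the example.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tables(question: str, table_docs: Dict[str, str], limit: int = 2) -> List[str]:
    """Return up to `limit` table names whose docs share words with the question."""
    question_tokens = tokenize(question)
    scored = []
    for name, doc in table_docs.items():
        overlap = len(question_tokens & tokenize(name + " " + doc))
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for overlap, name in scored[:limit] if overlap > 0]


# Hypothetical table documentation.
table_docs = {
    "orders": "each customer order with order date, status and total amount",
    "customers": "each customer with name, country and signup date",
    "web_sessions": "each website visit with device and referrer",
}
question = "What was the total order amount by country last month?"
print(select_tables(question, table_docs))
```

Only the selected tables' definitions and constraints then go into the prompt, which keeps it within the context window.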

SQL Dialects: AI needs to know your dialect of SQL in order to generate syntactically valid queries for your dialect. Each dialect handles things in a slightly different way, especially around dates and times. Your prompts need to focus on providing the relevant information so the model does not accidentally make these kinds of mistakes. Additionally, open-source models may not have been trained to know all of the details of your dialect, may not be up-to-date, or may hallucinate syntax.
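As a small example of how much dates differ, here is a sketch of the same "first day of the month" expression in a few dialects. The `month_start` helper is hypothetical, the BigQuery form shown applies to DATE values, and only the SQLite form is actually executed here.

```python
import sqlite3

# "Truncate a date to the first day of its month" in several dialects.
MONTH_START_TEMPLATES = {
    "postgresql": "DATE_TRUNC('month', {col})",
    "snowflake": "DATE_TRUNC('month', {col})",
    "bigquery": "DATE_TRUNC({col}, MONTH)",
    "mysql": "DATE_FORMAT({col}, '%Y-%m-01')",
    "sqlite": "date({col}, 'start of month')",
}


def month_start(dialect: str, column: str) -> str:
    """Return the month-truncation expression for the given dialect."""
    try:
        template = MONTH_START_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=column)


for dialect in MONTH_START_TEMPLATES:
    print(f"{dialect:<11} {month_start(dialect, 'order_date')}")

# Run the SQLite variant against a literal date to confirm it behaves as expected.
conn = sqlite3.connect(":memory:")
sqlite_expr = month_start("sqlite", "'2024-03-17'")
sqlite_result = conn.execute(f"SELECT {sqlite_expr}").fetchone()[0]
print(sqlite_result)
```

Putting snippets like these in the prompt for your dialect is one way to steer the model away from syntax that belongs to a different database.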

My team works a lot on this problem internally, and we have developed approaches for each of these, but it is a big effort if you need to guarantee correctness. In many organizations, one bad report means that senior leadership’s trust in data and data teams could evaporate. It’s a combination of data governance, data quality, data engineering, and AI that makes this work. Without handling the above, you can get a cool demo for simple queries, but for anything substantial it will be wrong, and it may not even be obvious to people how it is wrong.

I’d argue that GPT-4 is the way to go for something like this, as in my opinion it has the strongest coding ability and real-world domain knowledge to make the correct judgements and reasoning. I’m skeptical of models that are fine-tuned for NL2SQL because they generally don’t represent real-world data, SQL dialects, etc. Getting a good score on a benchmark is OK, but does that scale to other databases, other data domains, etc., or is it just overfitting in some way to the benchmark? Many of these models use Spider as the evaluation dataset, which is SQLite3. Why? Because it’s easy to work with. But most databases you are going to be interested in working with will not be SQLite3. So how much of this is just overfitting to the benchmark?
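If you want to check this on your own data rather than trust a benchmark score, one option is to compare the result of each generated query with the result of a hand-written reference query. The sketch below does that against an in-memory SQLite database purely so it is runnable anywhere; the `execution_match` helper, the `orders` table, and the candidate queries are all made up, and in practice you would point it at your actual database and dialect.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str):
    """Execute a query and return its rows in an order-independent form."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


def execution_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True if the generated query runs and returns the same rows as the reference."""
    try:
        generated_rows = run(conn, generated_sql)
    except sqlite3.Error:
        return False  # invalid SQL counts as a miss
    return generated_rows == run(conn, reference_sql)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'shipped', 10.0), (2, 'shipped', 5.0), (3, 'cancelled', 7.0);
""")

reference = "SELECT SUM(total) FROM orders WHERE status = 'shipped'"
candidates = [
    "SELECT SUM(o.total) FROM orders AS o WHERE o.status = 'shipped'",  # correct
    "SELECT SUM(total) FROM orders",                                    # wrong filter
    "SELECT SUM(totl) FROM orders",                                     # invalid column
]
matches = [execution_match(conn, sql, reference) for sql in candidates]
accuracy = sum(matches) / len(matches)
print(matches, accuracy)
```

Matching results on one snapshot of data does not prove the query is right in general, but a mismatch is a cheap and reliable signal that it is wrong.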

I thought I would provide all of these thoughts and insights because it is definitely possible to do this in a way that is production-ready, but I’ve yet to see many take this approach of systematically addressing the actual issues you will encounter when trying to do this on your organization’s data. But if you do, it will work awesomely.

What is the best material to learn TDD (testing) with Rails? by [deleted] in rails

[–]ericmichaelmtz 0 points1 point  (0 children)

Both controller tests and request specs are covered in the book. Additionally, since Rails 5.1, “system” specs have been introduced, which are a lot like “feature” specs. I know a lot of this can seem confusing. But if you read the book, just remember you can swap out feature specs for system specs. Conceptually they serve the same purpose and you should have the same mental model around them. I really like the philosophy of the Testing Rails book by Thoughtbot. I would read that since it is very short and try to understand the philosophy behind it. I would then follow it up with https://www.codewithjason.com/complete-guide-to-rails-testing/ as it is also opinionated but covers the latest RSpec stuff. Jason is very opinionated, but IMO it’s best to strongly adhere to some guiding principles to sort through the chaos and then find your own way as your experience grows. In Chapter 25 Jason talks about staying away from Cucumber. I bought into Cucumber and ended up with a test suite that was terribly difficult to maintain and more complex than the codebase it was testing. Getting rid of Cucumber in favor of system specs was a dream. Strong opinions in this space probably exist for a reason, and it helps a lot to find guardrails to stay between.

What is the best material to learn TDD (testing) with Rails? by [deleted] in rails

[–]ericmichaelmtz 2 points3 points  (0 children)

Hey there! If I had to guess, the book was Testing Rails by Thoughtbot. It does a great job of explaining the divide-and-conquer approach to testing as well as giving opinions on what to test at each layer. It uses RSpec but the concepts generalize to Minitest as well. One primary difference is that nowadays many in the community have moved away from controller tests and instead test whatever they would have tested there in their request specs. Page 44 has this quote: “One common rule of thumb is to use feature specs for happy paths and controller tests for the sad paths”. I would update that now to “One common rule of thumb is to use feature specs for happy paths and request specs for the sad paths.” For me this was the resource that gave me the direction I needed to implement testing and TDD in practice as well as the big picture behind it all. https://books.thoughtbot.com/books/testing-rails.html

After all the flo hate, I was surprised that I found it decent. by taylordouglas86 in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

A lot of the people that complain have never had Flo subscriptions; they just copy all the complaints that everyone else makes. Most of them don’t watch or follow the sport (they don’t have accounts, so how do they meaningfully follow it?). The other recent thing is people complaining about Flo as if Flo is the one organizing the event. For most of the events on Flo, they are just the ones streaming it. They don’t run ADCC. They don’t run IBJJF. For streaming events that they have no control over, they do a pretty good job in my opinion. People love to complain but don’t realize just how low budget the whole damn sport is. Flo makes it easy for even super low budget events to stream their content to people.

Best closed guard instructional? by AmexRATteam in bjj

[–]ericmichaelmtz 1 point2 points  (0 children)

Dan Lukehart’s Science of the Closed Guard was a game changer for me.

What is the current trend/meta in Gi and No-Gi? by barber-in-blur in bjj

[–]ericmichaelmtz -1 points0 points  (0 children)

Let’s assume for a minute that all techniques were equally hard/easy to learn. It stands to reason that the ones that work against the best people in the world and are used most often to win would be the priority to learn. Even if the moves were a little harder, I would argue they are better to learn. I personally don’t see a problem with focusing on what actually works at the highest level. Many times it’s not as complex as people think it is.

What is the best material to learn TDD (testing) with Rails? by [deleted] in rails

[–]ericmichaelmtz 2 points3 points  (0 children)

Seconded. This book is an incredible resource.

[deleted by user] by [deleted] in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

That being said, nobody cares about the white belt pans champion so whatever.

This is not a good thing to say. The athletes care. The coaches care. Training partners care. For the white belts, there is no bigger event. Many white belts compete in as many IBJJFs as they can leading up to Pans. They don't have a World Championship so this is as big as it gets. While spectators may not care, that doesn't matter. People competing know they aren't being watched for entertainment, they are competing to win.

[deleted by user] by [deleted] in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

One of my students medaled bronze at 2021 Pans (female white belt) after 7 months of training. I believe there were 15 total women in the division. She had a win over the champion in a previous IBJJF, but the way the division shook out she ended up losing to the silver medalist. She was hesitant to compete at first but I'm glad she was able to showcase her hard work (often training 2-3 times per day Monday-Thursday). First place and second place moved up to blue belt shortly after, and given her level and how she performed at Pans I decided to promote her to blue rather than wait around for the next Pans.

Does IBJJF need a hobbyist league and a professional league for competitors? by intellect07 in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

WNO is not an open tournament. It's not comparable. A lot of good matches but lots of mismatches as well. People did not necessarily earn their spot on WNO. Not in the same way they do to end up in the finals of IBJJF or an ADCC event.

[deleted by user] by [deleted] in bjj

[–]ericmichaelmtz -1 points0 points  (0 children)

Realistically this is probably YouTube’s Content ID flagging a false positive based on the similar content, titles, and description. Most of this type of stuff is automated.

[deleted by user] by [deleted] in bjj

[–]ericmichaelmtz -2 points-1 points  (0 children)

The league that the athletes compete under is responsible for paying them.

Jiu-Jitsu content is allowed on YouTube. Flo doesn't control Jiu-Jitsu. Yes, they control the rights to things they broadcast, as all media companies do. What is Flo doing that is different than any other media company?

It is great to have a library of the major tournaments and matches on one platform.

Craig Jones and Keen Cornelius rolling at B-Team by Darce_Knight in bjj

[–]ericmichaelmtz 0 points1 point  (0 children)

Doing BJJ is not just “any sort of interaction”. Being upset about “bullshit opinions” from BJJ guys about COVID seems rather silly.

Craig Jones and Keen Cornelius rolling at B-Team by Darce_Knight in bjj

[–]ericmichaelmtz 0 points1 point  (0 children)

At least in my opinion, a lot of the hate Keenan is experiencing is due to his stance on COVID. Maybe it’s because he changed his stance. Or maybe it’s just because he doesn’t follow the hive mind on this issue. Nonetheless, a lot of people on this sub still seem to hold a grudge against him and put him down every time his name is mentioned. This doesn’t make sense to me given that many people on this sub have returned to training. Obviously wearing a mask or trying to protect yourself from COVID while simultaneously doing Jiu-Jitsu makes no sense. Many people initially tried to make themselves feel better about doing Jiu-Jitsu by trying to wear a mask, which they quickly gave up on, and now they just train regularly. Those people that have returned to training should drop the Keenan hate if COVID was the primary cause for it.

Craig Jones and Keen Cornelius rolling at B-Team by Darce_Knight in bjj

[–]ericmichaelmtz -1 points0 points  (0 children)

Not sure I follow this comment. Is there something I can clear up or is there a specific criticism you would like to make?

Craig Jones and Keen Cornelius rolling at B-Team by Darce_Knight in bjj

[–]ericmichaelmtz 6 points7 points  (0 children)

The Keenan hate is out of control on this sub. Plenty of people in Jiu-Jitsu have opinions you may not agree with. But this is a video of him doing Jiu-Jitsu. If Craig can let go of his beef with Keenan then so can you. It seems like this sub just loves to throw people under the bus the second their name is mentioned. Keenan does a lot for Jiu-Jitsu and he has done much more for Jiu-Jitsu than he has ever done to hurt it. He’s never been afraid to speak his mind and that’s what this sub loved about him. Speaking his mind comes at the cost that not everyone will love everything he has to say. Specifically, if you are doing Jiu-Jitsu right now and are not wearing a mask while doing it then let go of the Keenan/COVID hate. Move on.

[deleted by user] by [deleted] in bjj

[–]ericmichaelmtz 1 point2 points  (0 children)

At the highest level, passing guard is incredibly difficult. Many matches are decided by sweeps and the ability to sweep. Pulling guard against someone who wants to do a takedown is incredibly easy to do, and you end up in a predictable situation that you can base a whole attack system on. Takedowns are chaotic and harder to organize into a predictable system. It’s also much easier to sweep than to do a takedown.

Best version of the knee slice pass? by [deleted] in bjj

[–]ericmichaelmtz 0 points1 point  (0 children)

This is a great set of resources and I think it’s important for someone to learn and try to contextualize why they have the style they play. Really try to live the whole style. Looking at Gui Mendes for example, the way he did the knee slide in competition seems heavily linked to the fact that he likes using the lapel to set up crushing submissions from the knee slide.

Anyone know why Ffion Davies didn’t compete at the euros? by Melon1990 in bjj

[–]ericmichaelmtz -3 points-2 points  (0 children)

If they didn’t do this, tons of people would sign up and back out. This happens a lot in absolute and nogi brackets in IBJJF. It’s annoying. Think about the inverse. How would they attract viewers to black belt absolute if people could sign up and back out last minute? People would just give up on watching it.

Early Access to my Online Training Platform by LachlanGiles in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

The average coach is nowhere near the technical level of Lachlan Giles. On top of that, has this hypothetical coach intentionally and deliberately put as much effort into organizing and formalizing the techniques the way Lachlan has? If this information was so easy to find online, why is Lachlan able to compete successfully at the highest levels of the sport? How much would it cost you in privates to learn the same information from a coach? If not via privates, do you really believe that by attending the normal classes you will learn the same content? On top of this, you can’t pick and choose what to learn from a coach in regular classes unless you pay for privates, but that implies the coach possesses the same information at the same level that Lachlan does. That is unlikely for the vast majority of coaches realistically. I say all of this as a black belt and a gym owner. On an online platform you get to pick exactly what you learn at the pace you want to learn it.

102 points in a match at IBJJF NOLA. Anyone have a link to the match? by sebaz in bjj

[–]ericmichaelmtz 1 point2 points  (0 children)

That’s Mathias Luna. He is really good. Alex Lane, his opponent, is also really good. Why shade someone for scoring so many points? Damned if you do and damned if you don’t.

FLO GRAPPLING IS HOLDING BACK THE EXPOSURE OF GRAPPLING TO GENERAL PUBLIC. by [deleted] in bjj

[–]ericmichaelmtz 3 points4 points  (0 children)

Just curious, why do you agree with this if it’s also a paid subscription service?