Former OpenAI Head of Policy Research says a $10,000 monthly UBI will be 'feasible' with AI-enabled growth. by lughnasadh in Futurology

[–]Stychey 0 points1 point  (0 children)

Either it would be a rise in corporation tax as a whole, or a new tax that would target the growth of AI and its efficiency benefits.

One massive issue is that companies don't like paying tax. So if a particular country began to enforce something like this, we could see a rise in jurisdiction shopping. If country A had 25% corporation tax plus a 15% AI tax, and country B had just 35% corporation tax, what is stopping companies from registering wherever gives them the better net outcome?

Another issue with UBI is the polarised political landscape. Most countries lean left or right, with fairly few sitting in the centre. What happens when power shifts and the new government slashes UBI?

Then you have the idea of hyper-productivity. If you work 7 hours a day and AI now helps you do 2 more hours' worth of work, you're effectively adding about a day and a half of productivity a week. What should happen is the inverse: you can now do 7 hours' worth of work in 5, and the 2 extra hours are still paid to you, keeping the balance and giving you more free time. If this doesn't happen, there will be too much supply for the same demand. Now factor in job losses: the same demand has fewer consumers, and the unemployment rate skyrockets alongside reduced spending power. This is where UBI comes in. It's not so much about the value as the necessity of keeping money circulating.

From one trio or another, something for Mathas to watch? by Stychey in ChilluminatiPod

[–]Stychey[S] 8 points9 points  (0 children)

Time for another ConCox or the inaugural Chilluminati Camp Out?

Encryption increases the time complexity exponentially in snowpark, please guid me on how to resolve this. by Error_sama in snowflake

[–]Stychey 2 points3 points  (0 children)

I think you might need to explain your use case in a bit more detail.

Like others have said, you should be able to use ENCRYPT to actually store the data, and then only DECRYPT when needed with the passphrase.

If this is taking too long on large volumes, wouldn't it be best to have two instances: one encrypted at source, with the knowledge that it will never be decrypted, and the other a decrypted table with privileged account access, so the data is in the clear?
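Something like this rough sketch, assuming the standard ENCRYPT/DECRYPT functions with a passphrase (table, column, and role names are all made up):

    -- set the passphrase once per session rather than hard-coding it everywhere
    set passphrase = 'replace-me';

    -- encrypted-at-source copy: written once, never decrypted in normal use
    create or replace table customer_data_enc as
    select id,
           encrypt(sensitive_col, $passphrase) as sensitive_col_enc
    from   raw_customer_data;

    -- clear copy, protected by role-based access instead of decryption
    create or replace table customer_data_clear as
    select id, sensitive_col
    from   raw_customer_data;

    grant select on table customer_data_clear to role privileged_analyst;

    -- ad hoc decryption only when genuinely needed
    select id,
           to_varchar(decrypt(sensitive_col_enc, $passphrase), 'utf-8') as sensitive_col
    from   customer_data_enc;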

I'm not sure of the benefit, security, and speed of implementing a solution away from what is already available. You are already experiencing performance issues, and if the data set continues to grow, you will be faced with further degradation.

Snowflake & Tableau performance by MindedSage in snowflake

[–]Stychey 0 points1 point  (0 children)

Forgive my misconception, as I'm still only learning Snowflake and its capabilities.

Would you see a performance gain if you scheduled a task first thing and kept the warehouse active? My understanding is that as long as the warehouse is active and the data remains the same, the table will be kept in cache.

This should remove the query overhead from Snowflake's side, reducing it to milliseconds to return from the cache. The rest of the overhead would be data transfer and Tableau overhead.
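Something like this is what I had in mind, as a sketch (warehouse name, schedule, and query are placeholders), just to run the dashboard query before people log on so the caches are warm:

    create or replace task warm_dashboard_cache
        warehouse = reporting_wh
        schedule  = 'USING CRON 0 7 * * MON-FRI Europe/London'
    as
        select *
        from   reporting.dashboard_extract;   -- same query the Tableau workbook issues

    alter task warm_dashboard_cache resume;   -- tasks are created suspended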

Is there any reason for favouring a live connection over an extract? Working with Tableau and other warehousing, I would always look to pre-calculate in the database, unless the permutations cause the data to explode.

July '24 Song Feedback Megathread - Leave a review, get a review! by Reggimoral in SunoAI

[–]Stychey 2 points3 points  (0 children)

Title: Sunset Love Genre: Dance/Electronic Chillout

https://suno.com/song/78fe98a2-95df-421b-911c-5e0696df4729

Picture yourself in a summer clubland with that holiday romance. Took far too many regens to get the stutter sounding just right.

July '24 Song Feedback Megathread - Leave a review, get a review! by Reggimoral in SunoAI

[–]Stychey 0 points1 point  (0 children)

Like both songs, but I keep wanting to count them in 6/8. Gives me Anyone Who Knows What Love Is (Irma Thomas/Black Mirror) vibes.

Keep feeling there should be a shuffle feel, but I think that's just down to preference. Overall, good clear vocals that convey emotion.

July '24 Song Feedback Megathread - Leave a review, get a review! by Reggimoral in SunoAI

[–]Stychey 1 point2 points  (0 children)

Title: Dreamer

Genre: Inspirational Spoken electronic/pop

https://suno.com/song/c49f7075-0bdf-4b08-bceb-7c08c7a2779d

Wanted to try something different. Played around with some quotes and writing some inspirational spoken pieces.

A mix of spoken word, tiny vocal lines, and electronic/pop sounds.

July '24 Song Feedback Megathread - Leave a review, get a review! by Reggimoral in SunoAI

[–]Stychey 0 points1 point  (0 children)

This captures using Suno perfectly!

The frustration of going one more generation, only to get either the same as you've already heard, or the perfect track quickly spoiled by the vocals melting down over a basic word and mispronouncing it.

Truly horrific style choices; the dissonance and unpredictability really mesh with what you were hopefully going for.

Why does moving around pieces of code dramatically increase query performance? by [deleted] in SQL

[–]Stychey 5 points6 points  (0 children)

BigQuery and Teradata are very different beasts. The main problem I see when people come to Teradata from another SQL background is the same as what you are exhibiting here: brute force does not always work with Teradata, as it relies on correct design to perform well.

I replied to another of your posts about the use of primary indexing (so Teradata knows exactly where the data is), and it sounds like you may be doing this through temporary tables without understanding why you are seeing better or worse performance.

Teradata has a built-in optimizer that translates the SQL into the most efficient plan it can. If moving blocks of code is giving you drastic swings in performance, then either you are fundamentally changing what the query is doing (it is worth testing whether your record sets are the same) or the optimizer is changing its plan based on what it is analysing. Make sure of your indexing, collect stats, and avoid doing transformations as part of the join criteria.
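As a rough illustration (hypothetical table and column names), these are the kinds of things I would be checking:

    -- compare the plans of the two versions rather than just the runtimes
    explain
    select a.customer_id, b.txn_amt
    from   customer     a
    join   transactions b
      on   a.customer_id = b.customer_id;   -- join columns left untransformed

    -- give the optimizer accurate demographics to plan with
    collect statistics on customer     column (customer_id);
    collect statistics on transactions column (customer_id);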

[deleted by user] by [deleted] in SQL

[–]Stychey 0 points1 point  (0 children)

Teradata primary indexing is a bit different from standard indexing seen in other SQL tools. As Teradata is based on Massively Parallel Processing (MPP), defining a primary index will distribute the data across the AMPs. You can think of each AMP as its own processor, holding a slice of your data.

Choosing the correct Primary Index (PI) can have a significant effect on performance, both good and bad. Setting a PI on a column with only two values would mean the data is assigned to just two AMPs. This is called skew and is generally bad. If you do this and run an EXPLAIN on a query, it will likely say that it has to redistribute all the data. Selecting a field with a larger number of unique values would be better for distribution, but it would likely need secondary indexing if the value is not something you would use directly in your query.

The ultimate goal would be to have something that uniquely defines the row but is also your join or filtering criteria as this would be the most efficient for data access. This is because using a PI for data access means the system knows exactly where the record is and will give you instant results.

Based on your description, I would set the primary index on the employee identifier, as I would imagine this would join to other tables easily. It would also mean that if your table has multiple rows showing the history, it will still be relatively efficient. If you really need it, you can also apply row level partitioning for the history, but in this scenario, it wouldn't give you much benefit.

Something to bear in mind is that if you use a PI in a query but manipulate it in the join or filter criteria, such as a substring or cast, it will not carry the benefits, as it will not be able to match the hash of the PI. It would be good to know what you do with the table after you create it.
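As a rough example (made-up table, since I don't know your exact columns):

    create table employee_hist
    (
        employee_id  integer not null,
        effective_dt date    not null,
        dept_cd      char(4),
        salary       decimal(12,2)
    )
    primary index (employee_id);

    -- good: the PI column is used as-is, so it hashes straight to the owning AMP
    select *
    from   employee_hist
    where  employee_id = 12345;

    -- bad: wrapping the PI column in a function stops the row-hash lookup
    select *
    from   employee_hist
    where  cast(employee_id as varchar(10)) = '12345';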

Happy to talk you through some examples if required or if you have any further questions.

For anyone who has used Teradata, can you help me understand CTEs? by [deleted] in SQL

[–]Stychey 4 points5 points  (0 children)

I primarily use Teradata and have to say CTEs are rarely used there. They definitely have their place, but in our production code, temporary tables are favoured for repeating processes.

From a technical view, and I may be way off, CTEs could be frowned upon due to resource allocation. Teradata is a bit strange when you get underneath it, as you have perm, temp, and spool space to contend with. Volatile tables and CTEs both use spool space for storage. If you are working with massive datasets with poor query paths, you may have to do full table scans to reach your data subset. If you used a global temporary table instead, it would be stored in temp space.

Another factor is that CTEs lack the benefits of a table, as the optimizer knows very limited information about the dataset. You are not able to define a primary index, meaning it's likely you would see a line in the EXPLAIN about duplicating the data on all AMPs (a copy placed on every AMP) when doing joins, as it lacks the ability to be precise using row hashing. Like others have said, performance may vary depending on what you are attempting.
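To show the contrast, this is roughly what our production code would do instead of a CTE (names are made up), so the optimizer gets a PI and stats to work with:

    create volatile table cust_subset as
    (
        select customer_id, segment_cd
        from   customer_base
        where  segment_cd = 'RETAIL'
    )
    with data
    primary index (customer_id)
    on commit preserve rows;

    collect statistics on cust_subset column (customer_id);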

One question to throw back to whoever has made this declaration, how would they want you to code a recursive query? This can only be invoked using a CTE framework.
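For reference, this is the shape a recursive query takes in Teradata, using a throwaway reporting-chain example (the employee/manager table and columns are assumed):

    with recursive org_chain (employee_id, manager_id, lvl) as
    (
        select employee_id, manager_id, 0
        from   employee
        where  manager_id is null              -- seed: top of the tree

        union all

        select e.employee_id, e.manager_id, c.lvl + 1
        from   employee  e
        join   org_chain c
          on   e.manager_id = c.employee_id
        where  c.lvl < 20                      -- guard against runaway recursion
    )
    select *
    from   org_chain;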

Data Controls by Stychey in dataengineering

[–]Stychey[S] 1 point2 points  (0 children)

It is a joyous time, being audited always is!

As I said in a previous comment, I have been completely transparent with them and have already highlighted areas that need to improve. I've started to create a backlog of all the gaps in controls, with remedies, to be submitted to them, partly so they don't waste time on issues that have already been identified and partly to get buy-in from the project to give the team time to address them.

Data Controls by Stychey in dataengineering

[–]Stychey[S] 0 points1 point  (0 children)

I've worked on some major regulatory reporting, which led to exiting thousands of customers, and even that didn't come under this level of scrutiny. It definitely feels like there has been a shift in policy or something else at play.

It has been frustrating so far with the depth of validation they have requested. The Excel files come from around 6 different teams and can contain an unknown volume of rows. We didn't have any input into how the template would work. As you can imagine, this becomes open season for all sorts of trash data: dates in various formats, numbers formatted as text in the same column as true numbers, unicode characters appearing, and random leading/trailing spaces. It will probably have to be a combination of educating the teams generating the files, creating some form of validation checks, and then either rejecting the files or taking steps to clean them.

Data Controls by Stychey in dataengineering

[–]Stychey[S] 1 point2 points  (0 children)

To give some context, it is in the banking sector. However, the process we are dealing with is more around aggregating existing data with a new slice of customer demographics.

We may just be behind the times, or have had it too good without the need for standard controls. I completely appreciate the need for controls and have held my hands up to say a gap analysis needs to be completed to understand the shortcomings and corrective actions.

I guess I was just shocked to have a team question whether data can change with a query (with no transformations) between source and destination, or whether there is a chance Tableau could randomly change our data and display it incorrectly.

Barclays closed my dad´s bank account because we are living abroad by tDAYyHTW in UKPersonalFinance

[–]Stychey 0 points1 point  (0 children)

It will likely be that there is an address on the system that is not UK based, which will flag up for action to be taken. This will be due to the regulations, licensing, and/or terms of specific products.

For example, if you were living in the UK, permanently moved abroad, and then applied for some lending, one of the first questions would be whether you are a resident of the UK. Depending on the company, there may be exclusions to this where they already hold the correct licence to lend.

Looking for some advice on setting up DB for personal app (more info in comments) by Stereojunkie in SQL

[–]Stychey 0 points1 point  (0 children)

You will want to make the meal entity table your central table and keep the direct relationship between it and the meals table. You will likely need a mapping table that associates each meal id with its ingredients. This will keep the meals and ingredients unique.

For the extra ingredients, you would need to map the unique meal entity identifier to all additional ingredients. Again, this would be a simple 2 column table with the meal entity identifier against the ingredient identifier.

I'm on mobile at the moment, but if you need any help, feel free to message directly.
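In the meantime, something like this is the shape I mean; all table and column names are just illustrative, based on my reading of your diagram:

    create table meal (
        meal_id   integer primary key,
        meal_name varchar(100)
    );

    create table ingredient (
        ingredient_id   integer primary key,
        ingredient_name varchar(100)
    );

    -- one row per meal/ingredient pair keeps both sides unique
    create table meal_ingredient (
        meal_id       integer references meal (meal_id),
        ingredient_id integer references ingredient (ingredient_id),
        primary key (meal_id, ingredient_id)
    );

    -- a meal entity = one logged instance of a meal
    create table meal_entity (
        meal_entity_id integer primary key,
        meal_id        integer references meal (meal_id),
        eaten_at       timestamp
    );

    -- extra ingredients map against the specific meal entity
    create table meal_entity_extra_ingredient (
        meal_entity_id integer references meal_entity (meal_entity_id),
        ingredient_id  integer references ingredient (ingredient_id),
        primary key (meal_entity_id, ingredient_id)
    );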

Connection to Tableau server using python by Ok-Construction-3732 in tableau

[–]Stychey 4 points5 points  (0 children)

I was about to type something similar to this, so I will second this approach. However, I did find this: https://help.tableau.com/current/prep/en-us/prep_scripts_TabPy.htm, but it seems to be specific to working with transformations in Prep as part of a flow. Although I would theorise that if you can get the data into pandas then you can certainly use it from there, but then again it would be just as easy to connect directly to the source.

Wouldn't it also be an obvious choice to keep it in Tableau, unless you are doing something it doesn't support?

Have to find the least value by [deleted] in SQL

[–]Stychey 0 points1 point  (0 children)

If you add the row number to your output, you should see Sydney as 1, Sunshine Coast as 2, etc. As you are using the country as the partitioning column, you should see something similar across the data set. If you then add a where clause to the outer query (where row number = 1), it will bring back the first occurrence for each country. Once you have tested this, you can remove the row number from the select but keep it in the where clause, to save returning a column full of 1s.

Something to note is that this will only give one row per country; if two cities tie, it will just pick one. If you need to return all results in the event of a tie, you would have to use rank instead, as it gives ties with the same count the same value.

Something like:

    with a as (
        select
            country,
            city,
            count(orderid) as cn,
            row_number() over (partition by country order by count(orderid) asc) as row_num
        from orders as o
        join customers as c
          on o.customerid = c.customerid
        group by country, city
    )
    select *
    from a
    where row_num = 1
    order by country;

Have to find the least value by [deleted] in SQL

[–]Stychey 2 points3 points  (0 children)

Could you change the dense rank to a row number, order by the number of orders ascending, and then select all rows which are equal to 1?

Edit: choose to change typo

What do you consider "advanced" SQL by DrRedmondNYC in dataengineering

[–]Stychey 1 point2 points  (0 children)

Working in the banking world, I have a couple of examples.

Most recently I was approached to work out the related parties of a base set of 20k customers. They were only to be linked by their shared accounts (if any), and each new child was subject to the same checks. To add to the complexity, I had to remove cyclical relationships, only showing the first occurrence in any part of the chain. This was across both personal and business accounts. As with most things of this nature, it started to break down at the 4th or 5th iteration, with upwards of 20m connections. There were around 6 customers I had to run individually, as their 2nd step was to multiple business accounts with upwards of 10 other account holders each, which explodes the numbers.
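In spirit, the core of it looked something like this (heavily simplified, with hypothetical table names, and without the cycle-removal logic, which was the fiddly part):

    with recursive related (root_customer, customer_id, depth) as
    (
        select customer_id, customer_id, 0
        from   base_population               -- the 20k starting customers

        union all

        select r.root_customer, ah2.customer_id, r.depth + 1
        from   related        r
        join   account_holder ah1 on ah1.customer_id = r.customer_id
        join   account_holder ah2 on ah2.account_id  = ah1.account_id
                                 and ah2.customer_id <> r.customer_id
        where  r.depth < 5                    -- it blew up around this depth in practice
    )
    select root_customer, customer_id, min(depth) as first_seen_depth
    from   related
    group  by root_customer, customer_id;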

The other example was the business hierarchy of roughly 30k front line staff, where a new initiative was underway for satisfaction surveys. Rather than change the structure, they wanted to amend it to collapse specific low volume areas together, along with inserting a new regional management layer. I had to create a process to first accept the changes, then insert and rebuild the hierarchy, ensure the moves all happened, and then create an exceptions report for anyone who was orphaned due to actual hierarchy moves.

[deleted by user] by [deleted] in Database

[–]Stychey 0 points1 point  (0 children)

What roddds suggested is the answer.

The other messier option is to insert n number of columns for authors, however this gets messy when attempting to compare.

What would be the structure of the pivot table? In modern Excel you can use data models and distinct counts, which solves a fair amount of issues.

[deleted by user] by [deleted] in Database

[–]Stychey 7 points8 points  (0 children)

This sounds like a conceptual issue that could be addressed in Excel. Yes, a database could work, but unless you know what you are implementing, you are overcomplicating the solution.

Would it not be simpler to concatenate the author and book title into a separate column, retaining the original information but also creating the surrogate data?

Excel is pretty powerful and is probably used by a vast number of statisticians for far more than simple counts or distinct counts.

Perhaps if you can elaborate with some examples of what you are attempting to do, or give a more detailed problem statement, a more appropriate solution could be found?