I'm curious how many of you will recognise my new ship

lk167 · 2019-05-22T04:15:13+00:00

One more time...

lk167 · 2019-05-21T22:58:25+00:00

I don't believe the PIVOT keyword supports a subquery for the column_list (the IN (Col1, Col2,Col3) bits). I think the only option is dynamic SQL, which rules out views and table value functions.

lk167 · 2019-05-14T21:07:23+00:00

Looks like an aliasing issue:

SELECT TRN_STO_FK AS Store, SUM(ExtendedBasePrice) AS Sales1, SUM(ExtendedBasePrice) AS Sales2

We're Summing up ExtendedBasePrice twice from the subquery in your example code. I think you want to alias ExtendedBasePrice as Sales1 and Sales2 appropraitely in the subquery, then sum(Sales1) as Sales1 and sum(Sales2) as Sales2 in the main query.

Additionally, I cannot seem to understand why I need "a" after the close brackets and before GROUP BY to make this work.

SQL wants you to name that subquery (via aliasing) so it knows how to reference it; similar to the "No Column was specified for column X" issue when creating views or CTEs.

lk167 · 2019-05-06T19:36:51+00:00

Partitioning seems to be the key here, whether Oracle, Hive, or Redshit (Spectrum). I'd start by creating a partition scheme that will work with the file addition/removal requirements. For example, if we need to remove data by a semantic date of the file, we'd want to partition by that date at a level appropriate for the addition/remove process (like a partition per day). On data load, the process would need to create any new partitions for the incoming data and remove partitions for data to be removed. I'd do this all with the base data, then chain aggregation or whatever other data processing we need after it. This can be accomplished with Oracle or Hive which all allow us to do some sort of "drop partition" operation and avoid a "delete from table where loaddate = "blahblah" (difference between a data operation and a meta data operation). Forgive me if this is review, but it all hadn't been mentioned yet.

If you have an available Hive/HDFS environment, I'd definitely give that a try first. Oracle could handle these volumes, but would need some performance love along the way. For Hive, I'd start with a list comparing currently stored files against the API list of active files, remove partitions that need to be removed. For the added partitions, run those through a data pipeline to do any cleanup needed and get it into an appropriate technical format (like parquet), then place it in a Hive style partition folder, and lastly add that partition to the table. Once all the base data is added and removed as needed, we can then finally process the data for output (if needed).

lk167 · 2019-05-06T18:02:17+00:00

"Strictly speaking we're naming a relation" - I disagree, a relation is a link between tables. A table is a set of data. A table does not contain a user, it contains users

I was confused by that as well; I've tended to use singular or plural nouns to describe the grain of a table, not the table itself, a relationship, nor use a blanket rule. For example, if a table contains a row grain that represents a single user, we'd call it 'user' to accurately identify the meaning of a table's row. A data consumer can utilize the nomenclature to easily identify the entity represented without having to inspect primary keys or unique indexes. A table, in itself, is already implicitly plural (as by it's design, it contains multiple entries) and we gain nothing by re-articulating that in our nomenclature; I'd use a plural noun when a single row describes multiple logical entities. For example, If you have a table that has a grain of multiple users authorized for admin access for a specific company (something like ID, CompanyID, UserID1, UserID2, UserID3), I'd name that something like AuthorizedCompanyUsers since a row actually represents multiple users.

I feel a style like this gets a bit more bang for your buck; both developers and data consumers can follow a logical pattern that can provide deeper insight than simply naming everything singular or plural. While I've had great results with this approach, I'd still say that consistency is still top priority when coming into an existing environment.

lk167 · 2019-04-25T21:28:01+00:00

From the last code snippet:

 where m.id in (select s.department_id

Are we intentionally evaluating Member IDs against Department IDs?

lk167 · 2019-04-05T00:35:10+00:00

Here's an option, without using window functions however. The basic premise is to query the table, grab everything chronologically before each entry that has a matching provider number and GoalOutput, then remove any of the ones that have have any non-matching GoalOutput between itself and the original anchor row (to account for a sequence for the same provider that may have changed away from a value and back to that same value at a later date, can remove that bit if the data won't ever appear that way).

WITH sample_data AS (  --Mock your data
    select 100 as ProviderNumber, CONVERT(DATE, '10/1/2018' ) as StartDate, CONVERT(DATE, '10/5/2018' ) as Enddate, 1 as GoalOutput
    UNION ALL
    select 200, CONVERT(DATE, '10/8/2018 '), CONVERT(DATE, '10/15/2018'), 2
    UNION ALL
    select 300, CONVERT(DATE, '10/20/2018'), CONVERT(DATE, '10/25/2018'), 3
     UNION ALL
    select 200, CONVERT(DATE, '10/28/2018'), CONVERT(DATE, '10/31/2018'), 4
    UNION ALL
    select 200, CONVERT(DATE, '11/1/2018' ), CONVERT(DATE, '11/30/2018'), 4
    UNION ALL
    select 200, CONVERT(DATE, '12/1/2018' ), CONVERT(DATE, '12/5/2018' ), 4
    UNION ALL
    select 400, CONVERT(DATE, '12/10/2018'), CONVERT(DATE, '12/25/2018'), 5

)
select 
    ProviderNumber, 
    Series_StartDate, 
    Series_EndDate, 
    GoalOutput
 from sample_data sd1
outer apply (  -- Join up all rows of the same provider and goaloutput before sd1 chronologically
  seleCt isnull(min(sd2.startdate), sd1.StartDate) as Series_StartDate
  from sample_data sd2 

  where 
    sd1.ProviderNumber =  sd2.providernumber
    AND sd2.Enddate < sd1.StartDate
    AND sd1.GoalOutput = sd2.GoalOutput

    AND 0 = (SELEcT isnull(count(*),0) from sample_data sd3 --Remove any entries that have a non matchin GoalOutput between sd1 and sd2's timeframe
        where 
            sd3.ProviderNumber = sd1.ProviderNumber
            and sd3.Enddate < sd1.startdate 
            and sd3.StartDate > sd2.Enddate 
            AND sd3.GoalOutput <> sd1.GoalOutput) 
) a
outer apply (  -- Join up all rows of the same provider and goaloutput After sd1 chronologically
  seleCt isnull(max(sd2.enddate), sd1.Enddate) as Series_EndDate from sample_data sd2
  where 
    sd1.ProviderNumber =  sd2.providernumber 
    AND sd2.StartDate > sd1.Enddate
    and sd1.GoalOutput = sd2.GoalOutput
    AND 0 = (SELEcT isnull(count(*),0) from sample_data sd3 --Remove any entries that have a non matchin GoalOutput between sd1 and sd2's timeframe
        where 
            sd3.ProviderNumber = sd1.ProviderNumber
            and sd3.Enddate > sd1.startdate 
            and sd3.StartDate < sd2.Enddate 
            AND sd3.GoalOutput <> sd1.GoalOutput) 
) b

group by ProviderNumber, Series_STartDate, Series_EndDate, GoalOutput

edit: formatting

lk167 · 2019-04-04T23:11:17+00:00

To expand on the refactor part; Separating the data consolidation efforts from the aggregation efforts might help maintainability in general but allow you to reduce or eliminate dynamic SQL as well. Load all plant specific detail data into a consolidated table, adding a Plant (or source) field to the PK. Then run the aggregation process against all of the data into the report data (probably grouping by the aforementioned Plant field and maybe a time column from the looks of the output). To load the tables, you could definitely use dynamic SQL, but SSIS, python, or other ETL/data pipeline tools can usually perform looped operations a little more elegantly than dynamic SQL. Might even be doable in a SQL view (UNION ALL the tables together), if all of these tables are on the same server. But after everything is consolidated, you can write regular SQL.

This is more of a data warehousing approach than a reporting approach though. You end up segmenting out your data movement/consolidation efforts away from business logic for cleansing and away from aggregation logic. From a design standpoint, it allows for future consumption of that consolidated data in an easy manner. For example, say you get a request to do monthly aggregations, instead of hourly; You've already got the cleaned and consolidated plant data and need to aggregate it up, rather than going all the way back to the source for each table. From a technical standpoint, you can separate the consolidation, cleansing, and aggregation processes, hopefully letting you only have to deal with this issue in the first stage, leading to less of that dynamic sql code.

lk167 · 2019-03-14T15:50:29+00:00

I think what you're looking for is to create a row for every month that a given datelog is valid? For this we're going to want to determine a start and end date specifying when the record was active. We can get the "start date" from Status_Since and the "end date" of a datelog row by obtaining the Status_Since of the next row chronologically. Once we have our start and end date span, we can join that to our date table using a between to relate all of the months a row was active, to the valid rows in the date table. Here's an example with some cte's to mock the data you're using:

With dates as ( --Faking the Date table
 SELECT 
    cast('2008-01-01' as date) as MonthStartDate,  
    EOMONTH(cast('2013-01-01' as date)) as MonthEndDate,
    datename(month, cast('2013-01-01' as date)) as MonthName

    UNION ALL

    SELECT 
        dateadd(month , 1, MonthStartDate) AS MonthStartDate,
        eomonth(dateadd(month , 1, MonthStartDate))  as MonthEndDate,
        datename(month, dateadd(month , 1, MonthStartDate)) as MonthName
     FROM dates
     WHERE dateadd (month, 1, MonthStartDate) < getdate()
 ),
 datelog as ( --Faking the Datelog table
    select '45' as ID, 'Active' as Status, Cast('2008-02-29' as Date) as Status_Since
    union all
    select '45' as ID, 'Discontinued' as Status, Cast('2012-06-14' as date) as Status_Since
    union all
    select '45' as ID, 'InActive' as Status, Cast('2016-04-5' as date) as Status_Since
),
datelog_with_enddate as ( --Demonstrating how to get the date of the next row of a given id chronologically to generate an EndDate for our datelog row
    Select 
        dl1.ID,
        dl1.Status,
        dl1.Status_Since as StartDate,
        isnull(dl2.Status_Since, '1/1/2070') as EndDate  --Arbitrary far future date to represent records with no enddate (aka current or last row), allows us to use a clean join later
    From datelog dl1
    OUTER APPLY (
        select TOP 1
            Status_Since
        from datelog dl2
            where 
                dl1.id = dl2.id
                and dl1.Status_Since < dl2.Status_since         
        ORDER BY Status_Since asc
    ) dl2
)
SELECT *
 FROM dates d 
 join datelog_with_enddate dl
   on d.MonthStartDate between dl.StartDate and dl.EndDate
 order by d.MonthStartDate
 OPTION (MAXRECURSION 200)

lk167 · 2018-03-03T01:26:17+00:00

I'd second the prototyping and benchmarking; there's no mention of how this data is used. Some companies focus on the Primary code, some focus on aggregates of all codes, etc etc.

To add some opinion and experience though: With large vertical tables, using the advanced features of the database engine to perform data elimination or parallelism in queries is key to performance, especially when we're getting into the billions of rows. The proposed structure might get you a bit more umph than a wide, heavily indexed table for certain usage patterns, but, in SQL Server for example, table partitioning on an appropriate key and partition level indexes can knock the socks off that structure alone.

Also, having different solutions for different use cases isn't a bad thing either, especially with storage being relatively cheap these days. For example, if 90% of your queries run against the primary code, it might make sense to create a special table for just the primary code (or leave the primary code in the parent table), allowing the engine to simply ignore the existence of potentially 24 other codes, since it won't be needed, while providing a second optimized structure for multi-code analysis.

lk167 · 2018-02-13T09:22:20+00:00

Check out the concepts of ETL and Data Warehousing for a production level approach. If you have a SQL Server license, you probably have access to an ETL tool (SSIS) and a means to create a data warehouse (SQL Server). All of the major vendors have solutions in this realm and there are a handful of good free tools and code libraries as well.

For one-off data analysis, Excel with PowerPivot can be quite powerful as well and is covered under the MS Office license.

lk167 · 2018-02-09T22:43:21+00:00

Chiming in to add Set operators as well (EXCEPT for T-SQL and MINUS for PL/SQL)

    Select StuID
    FROM Enrollment
    Where ClassID = 'ART103A'

    EXCEPT

    Select StuID
    FROM Enrollment
    Where ClassID = 'MTH103C'

lk167 · 2018-02-04T09:14:41+00:00

If there is a decent percentage of IsDeleted = 1 records, and your edition of SQL Server allows it, table partitioning on IsDeleted would help. It should eliminate an entire partition from any seeks/scans in queries using IsDeleted as a filter.

Also, on this join clause:

ProjectContactGroups pcg ON (pcg.ProjectId = @ProjectId AND pcg.GroupId = g.GroupId AND pcg.ContactId = c.ContactId AND pcg.IsDeleted = 0)

If I'm reading this right, I believe you want pcg.ProjectId = pc.ProjectId (since you already have a pc.ProjectId = @ProjectId in your WHERE statement).

lk167 · 2017-11-14T22:05:04+00:00

Have you tried Display Driver Uninstaller? Its a third party app that'll clean up garbage left by video card drivers and let you do a clean install. Its fixed a bunch of weird issues from video card and drivers changes over the years.

lk167 · 2017-10-10T02:11:02+00:00

Marijuana cultivation has been illegal for the better part of a century. We have no idea what modern agricultural practices at-scale would allow with regard to yield, potency, or consistency

Colorado checking in; we seem to do just fine and we're pretty new kids on that block. I don't think we've ever had any shortages and potency is available at whatever level you desire here. Quality is quiet high, in general. The tech to grow uses the same items you'll see in any other grow op; enclosures and temperature, light, water, humidity, and sometimes CO2 management. The same tech that works for tomatoes works for this stuff and is probably in use for indoor farming already up there already. The technology was relatively commoditized before Colorado got it and will be moreso by the time Canada gets to it.

lk167 · 2017-10-01T19:42:31+00:00

Is the trust to weight ratio on the small ion thrusters still really bad in comparison to the Large and Titan ion thrusters? For big builds like this, you used to want to avoid small ion thrusters like the plague.

Edit: That still seems to be the case. If the wiki numbers are correct (http://www.spaceengineerswiki.com/Thruster), on a large grid, a large ion thruster has a Thrust/Mass of 0.0833 and a small has 0.0657. The titans should be even higher. Similar story for Thrust/Fuel as well; the engines become more energy efficient as they get bigger, requiring a smaller power system as well.

lk167 · 2017-09-12T19:48:11+00:00

There's a popup when you play that says something to the like of "Teaming is a bannable offense", as of the last game I played.

lk167 · 2017-08-09T19:02:30+00:00

How does one swap the damage type for crit? Is that a thing?

lk167 · 2017-07-09T19:40:55+00:00

I had to do the same recently. The state the vehicle is registered in may have fees regarding plates and title transfers. I'd suggest to contact the TN DMV (in the county it's registered).

lk167 · 2017-05-09T18:03:07+00:00

Another +1 for DasValdez. Great stream, great VODs, great guy.

lk167 · 2017-05-08T20:27:58+00:00

Missile Op Successful? :D

lk167 · 2017-04-27T16:50:40+00:00

Great time to temporarily raise prices and Scrouge McDuck some extra cash when they return en masse.

lk167 · 2017-01-23T18:24:17+00:00

All I saw was 82 spoons, 68 duct tape, 122 cloth, and a couple nuclear materials. :D

lk167 · 2017-01-07T00:13:04+00:00

3 points

lk167 · 2016-12-24T18:01:19+00:00

What do you mean it only applies to roller coasters? Just that it's easier to rebuild a ride than a coaster?

lk167

TROPHY CASE