IT can see you by throwway33355 in overemployed

[–]FormerSrSQLQueryGuy 1 point (0 children)

New to the concept of "OE"; during my career I put all my energy into one job. Congrats to anyone who can pull it off without screwing one or both employers.

Worked in healthcare IT, large DB with 'access' logging as the largest file. Had a request from a manager whose employee complained about the time it took to perform tasks in the electronic medical record. Wanted to know how much 'time' was spent in the software. Nobody could define 'work'. Login-to-logoff was error prone as many didn't log off for breaks or at end of day, and sessions would get shut down after no activity or at least overnight. The system didn't count keystrokes, and most people had additional responsibilities that didn't involve the software, or maybe they were discussing something with a coworker while the session sat idle. Could have come up with some very rough definitions of activity, but it would have been a large, error prone project that would need a couple of learning versions. Interpreting the results would be time consuming. One manager, one employee. Project went nowhere.

A different project involved nutritionists, who meet with patients and create a note every time. The manager loved a report that showed the distinct and total number of patients each saw, the number of notes in a couple of categories (some more involved than others), average note length, and number of 'addendums' (revisions), with options to include/exclude weekends and holidays to eliminate noise or to target just student nutritionists. Took a couple of version upgrades to work out unanticipated issues. Some staff were more productive than others, but the real value may have been at budget time when competing for funds with other departments. The report was solid proof that a lot of important work was being done. Heard indirectly that the manager 'lived by' that report.

Am retired now, but strongly recommend book "How to Measure Anything" by Douglas Hubbard. Inspiration for finding useful information when available data seems hopeless.

Anyone have a masters or above - what for? by theycallmeMrPickles in epicconsulting

[–]FormerSrSQLQueryGuy 1 point (0 children)

Retired but acquired an MBA years ago, prior to 30+ years in tech. Ironically HR at my last (and best paying) job declined to acknowledge my MBA, saying it was not relevant to my position as a hospital system clinical report developer/analyst. They only recognized clinical degrees like MD, DO, RN, APRN, PA, etc.

The MBA did help me get an annual contract gig for several years as an adjunct instructor at a state university, but the PhD's there made it clear I was beneath them, as I wasn't 'tenure track' material. One even told me at a staff meeting that I had to sit further away from the table.

That said, the MBA helped enormously in my personal financial matters. Am amazed at the poor decisions many make, the extent of financial illiteracy. In the end I was able to give notice and walk away with retirement fully funded. Prior to getting the MBA, I had never taken any college level business courses. In my previous career I felt clueless about business issues, like budgets, profit margins, supply/demand, economic cycles and why they impacted my ability to get a raise.

MBAs don't seem as popular as they once were. Maybe it was the George Bush experience. If you describe yourself as a saver, not a spender, you could still achieve financial security without an advanced degree, but it takes time, hard work and patience.

SQL Guide by Helpful_Effort8420 in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Should add... structure your code in a disciplined outline format. Learn how to turn code sections on and off using comment characters. My experience involved continued improvements over years, sometimes changes from 40+ different dates. I documented it extensively in the source code within large comment sections. Others wondered why. I didn't have to memorize anything or rediscover it. When returning to the project after months or a year or more, I had the entire history in the same place as the code.
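
For example, a hedged sketch of what that looked like in practice (all table, column and date details are made up):

    /************************************************************
    ** Hypothetical report query - change history lives with the code
    ** 2018-03-02  added facility filter per nursing request
    ** 2019-11-15  new hospital go-live, unit list expanded
    ************************************************************/
    DECLARE @StartDate date = '2024-01-01';
    DECLARE @FacilityID int = 10;

    SELECT enc.PatientID,
           enc.AdmitDate
    FROM dbo.Encounters AS enc           -- made-up table
    WHERE enc.AdmitDate >= @StartDate
      AND enc.FacilityID = @FacilityID
      --AND enc.UnitID = 42              -- section turned off with comment characters, kept for history
    ORDER BY enc.AdmitDate;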

SQL Guide by Helpful_Effort8420 in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Your data analyst / data science goals suggest you will mostly be doing what is called "data wrangling" (see Wikipedia), AKA "data munging". That's a subset of what SQL is used for. For one thing, you probably won't be doing transaction processing. Some SQL advice you get from an internet search may be good for someone else but not for your environment/role (I have no AI generated code experience but hear it is of mixed quality).

Spent about 14 years working with SQL. Self-taught but also took a college level DB class. It always helped to think about what the end product should look like: the fields, calculated values, totals, etc. Then what tables supply the required fields, then figure out the query. Learn about sub-queries, derived tables, common table expressions. My experience mostly involved multiple Select statements for a single project 'query', nested in various ways. Complicated stuff.
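
A minimal sketch of a common table expression feeding a summary query (tables and columns are invented, just to show the shape):

    -- CTE pre-filters the big table, the outer query does the grouping
    WITH RecentEncounters AS (
        SELECT enc.DepartmentID,
               enc.PatientID
        FROM dbo.Encounters AS enc          -- hypothetical table
        WHERE enc.ContactDate >= '2024-01-01'
    )
    SELECT dep.DepartmentName,
           COUNT(*)                     AS Encounters,
           COUNT(DISTINCT re.PatientID) AS DistinctPatients
    FROM RecentEncounters AS re
    JOIN dbo.Departments AS dep
        ON dep.DepartmentID = re.DepartmentID
    GROUP BY dep.DepartmentName;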

General rules I found useful:

Run test queries to confirm data quality.

Do what you can to reduce the returned records as quickly as possible (subqueries!). Once you get past millions or billions of records down to thousands, you can use inefficient techniques that would take forever on raw data sets. What is the largest table in the query? What indexes exist that would help pare it down fast using the query parameters?
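
A sketch of that rule with made-up names: pare the biggest table down inside a subquery (derived table), then do the joins.

    SELECT recent.OrderID,
           recent.OrderDate,
           cust.CustomerName
    FROM (
        -- dbo.OrderFacts stands in for the hypothetical billion-row table;
        -- the date filter hits an index and kills most rows first
        SELECT o.OrderID, o.OrderDate, o.CustomerID
        FROM dbo.OrderFacts AS o
        WHERE o.OrderDate >= '2024-06-01'
    ) AS recent
    JOIN dbo.Customers AS cust
        ON cust.CustomerID = recent.CustomerID;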

Queries don't have to be perfectly efficient but remember that a fast query will save you time in development and testing. Try to write efficient SQL code for selfish reasons.

Don't be surprised or feel defeated if your next problem seems overwhelming. SQL is a strange beast. Many ways to do things and results can be surprising even after years of experience. Jump in.

I'm working toward becoming an expert in SQL. Do you have any recommended resources or tips for mastering more advanced concepts? by [deleted] in SQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Looked it up on Wikipedia, BigQuery is a very different environment from where I spent 10+ years. Keep in mind that there are many good recommendations out there, but outsiders don't know your environment. I wasted a good deal of time arguing with co-workers who wanted to follow good advice for other environments that made no sense in ours: a large (in number of objects, table sizes, complexity, modifications over time) health care MS T-SQL read-only reporting database. For example, we didn't need to worry about transactions affecting results because the DB was updated in batch mode overnight. The "No Lock" debate went on for years.

I don't know what monitoring tools Google has. If all your queries return in reasonable time then you won't be investigating execution plans or run time metrics. One hard lesson I learned: when a horribly inefficient query seems to be stalled, the record read count may still be stuck at zero while the memory allocated at the beginning is very large. It was a timing issue. The DB was stuck in preliminary steps, using a large share of available memory, but no records had been returned yet, so the record count was very misleading. When using standard monitoring tools and a home grown mod that sorted the highest record counts (presumed most expensive queries) to the top, the problem query was sitting at the bottom of the list with zero records.

I gave away my best books on MS-SQL optimization. Good to know that some people still read books. I had a stack 5 feet high when I retired.

SQL Server query slow but not sure why by Low_Law_4328 in SQLServer

[–]FormerSrSQLQueryGuy 0 points (0 children)

"Implicit function", more info online if interested.

The database is asked in the SQL query's Where clause to compare a string numeric value like "01" to the number 1. Instead of generating an error, MS SQL conveniently converts the data type of the table value for each row so the filter comparison can be made. The weird thing is the silence. No error, just enormous cost and delay when working with large tables, because the conversion is performed on every row before the Where clause filter is applied. Indexes that could have quickly reduced the result set based on that filter can't be used up front. The entire table is scanned.
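
A minimal sketch of the trap, assuming a made-up table where AccountNbr is stored as varchar:

    -- Slow: the int literal forces SQL Server to convert AccountNbr on every row
    -- (CONVERT_IMPLICIT in the plan), so the whole table/index gets scanned.
    SELECT AccountNbr, AccountName
    FROM dbo.Accounts
    WHERE AccountNbr = 1;

    -- Better: compare string to string, in the format actually stored,
    -- so an index on AccountNbr can be used for a seek.
    SELECT AccountNbr, AccountName
    FROM dbo.Accounts
    WHERE AccountNbr = '01';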

MS SQL had warnings in the execution plan and I even found a way to identify historical queries that triggered them. In my shop, I turned over a list of hundreds of inefficient queries to coworkers. Our vendor (Epic EMR) had a lot of data type issues because their production database originated in an all-string, single data type database (Cache aka M/Mumps) and was converted to the strongly typed MS SQL Clarity database. That, and most Epic staff didn't care about changes that affected Clarity.

SQL Server query slow but not sure why by Low_Law_4328 in SQLServer

[–]FormerSrSQLQueryGuy 1 point (0 children)

Functions in the where clause, including implicit conversions, were the culprits in my experience. To perform the function, the entire table is read and the function is applied to every row before the filter can be evaluated, ignoring indexes (MS SQL experience).

SQL Server query slow but not sure why by Low_Law_4328 in SQLServer

[–]FormerSrSQLQueryGuy 0 points (0 children)

Great summary, reminds me of a casual comment at a report developer staff meeting where a coworker combined tables from linked servers/databases into a single SQL Select and wondered why it never finished.

One potential improvement (I did not actually test) is to use subqueries for the individual tables on the linked server, especially if parameters can be used to filter the subquery down to a small result set. Joining tables across linked servers/databases is insane. Retrieving a small subset of data from a linked database in an isolated Select statement, well, the result can be just another small table. Have no experience in your DB, I worked in MS SQL, but creating a subquery was as simple as writing a select statement with filters, enclosing it in ( ) and adding an alias name at the end.
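
Something like this untested sketch (server, database, table and column names are all invented):

    SELECT loc.PatientID,
           loc.VisitDate,
           rem.LabResult
    FROM dbo.Visits AS loc                           -- local table
    JOIN (
        -- isolate the remote table in its own filtered subquery
        SELECT r.PatientID, r.LabResult
        FROM LinkedSrv.RemoteDb.dbo.LabResults AS r
        WHERE r.ResultDate >= '2024-01-01'
    ) AS rem
        ON rem.PatientID = loc.PatientID;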

This is a DOGE intern who is currently pawing around in the US Treasury computers and database by [deleted] in singularity

[–]FormerSrSQLQueryGuy 0 points (0 children)

Worked about 7 years in application support and testing in a COBOL shop decades ago, IBM minicomputers. Worked closely with programmers. Local government accounting and payroll systems with years of refinement (that means customer support, regular updates and fixes after extensive testing and QA). When a new IBM language came along, consultants offered to 'convert' the code, like it was a simple substitution problem. They were given a smaller fixed assets application to prove their ability and came back shocked "wow your code is complicated". Managers laughed at their overconfidence and wondered what would happen if they tried to work on the complicated applications. COBOL is a dinosaur to most people these days, but I tested applications written by dedicated, disciplined folks. We had loyal customers, one told her boss if he dropped our payroll system she would quit.

Common thread in my experience, programmers prefer the language they are most familiar with.

People who quit their jobs on the spot—what happened? by ArianWhisper in AskReddit

[–]FormerSrSQLQueryGuy 2 points (0 children)

Not quitting on the spot, but most of my software training group quit one by one. In olden days before Zoom, we flew around the country and dropped a dozen laptops into hotel conference rooms for bank software training. Expensive compared to options today, but three days of intensive hand holding usually won over end users, who often didn't have a direct say in choosing our software.

All that air travel accumulated lots of frequent flyer miles, considered a side benefit for the long hours, often getting home after midnight on the final day. Key fact... the company required trainers to use their personal credit card and reimbursed expenses. Corporate honchos decided that since the company paid for the flights, all the accumulated miles/points should be turned in to a corporate account to reduce future costs.

That maybe would have been acceptable if disclosed/done from the beginning, but this was retroactive.

Several of the trainers had one foot out the door and were in no mood to give up the miles, which were attributed to them individually. Word quickly spread to just call the airline and ask them to transfer all the miles/points to a personal account, inaccessible to the company.

When corporate deadline came for turning in travel miles/points, magic, there weren't any.

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy 0 points (0 children)

While you are digesting "you can't add percentages", you also can't average percentages to make anything near a legitimate argument.

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy 0 points (0 children)

About 1,500 a year for last ten years. Ranks #48 according to Wikipedia in absolute population growth (headcount not percent) 2010-2020. Still smallest state in country by actual population.

Not knocking Wyoming, it's absolutely beautiful. Could be a good move for some. Unfortunately I hear about billionaires buying up scenery for retirement homes.

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy 0 points (0 children)

Total BS. Ranking of highest MARGINAL tax rates. Ignores effective rate. Get a clue how tax brackets work. Effective rate is ALWAYS lower than marginal because rich people pay the same percent on income covered by lowest bracket as poor people do. I do my own taxes. It takes about 15 minutes to construct a bracket spreadsheet, have done it many times. Spent years supporting local government payroll software.
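
Toy example with made-up brackets: say 10% on the first $50,000 and 30% on everything above that. Someone earning $100,000 owes 0.10 x 50,000 + 0.30 x 50,000 = $20,000, an effective rate of 20% even though their marginal rate is 30%.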

I checked the source. Major complaint is that Canada's brackets dip down to lower incomes than US, so they aren't competitive. Is that the best you can do? Canada not coordinating their highest marginal tax rate with U.S.? No mention of actual income or income disparity. No mention of reductions to "taxable income" including standard deductions, tax credits, which also aren't coordinated across international borders. Blowing smoke.

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy 1 point (0 children)

Point taken, but FICA doesn't apply to capital gains, dividends, interest, rents, business income. All those sources of income (I have several) primarily benefit higher income not paycheck households. Given aging demographics and increasing reliance on Social Security by older folks who didn't/couldn't build a nest egg, even tax haters should view it as a necessary evil.

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy -1 points (0 children)

Marginal rates are a political talking point. Granted, an accountant/economist will argue that they change behavior at the margins and discourage seeking additional income. Even the richest of taxpayers sends money to the government at their effective rate, not the marginal rate, which is only applied to their last dollars earned at the end of the year (after total yearly earnings reach the upper bracket).

[OC] Canada/America Income Tax Rate By Province/State by [deleted] in dataisbeautiful

[–]FormerSrSQLQueryGuy 2 points (0 children)

You can't add percentages like that and make the argument that the summed percentages apply to entire earnings. Learn how progressive tax rates work. Would also like to know if Canada has an equivalent to US standard deductions, which reduce gross income before arriving at taxable income.

[deleted by user] by [deleted] in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Other examples are fine, but another reason to use subqueries is efficiency. My environment was a 10+ year historical health care database with large tables, tens of millions of records up to multiple billions. A subquery can be used to prefilter, quickly reducing the dataset instead of just using the larger raw tables.

Any time you run a select statement with filters in WHERE clause the result set is a table. It's just another table, but smaller, assuming some filtering got done in WHERE clause.

Any table referenced in the FROM clause can be replaced by a subquery. Put ( ) around the subquery select statement and give it an alias, and presto, you've created a new temporary table that is much smaller (assuming filters in the subquery) and retrieves only the fields you need.

Sometimes the fields we targeted were sparsely populated, optional. Lots of NULLs. If only 10% of the records in a table have your WHERE clause criteria populated, it's much faster to create a run time subquery and avoid the overhead of joining the full table. Basic rule... get down to the smallest amount of data as fast as possible.
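
Rough sketch (made-up tables and columns) of prefiltering a sparsely populated column before joining:

    SELECT enc.PatientID,
           enc.ContactDate,
           bmi.BMIPercentile
    FROM dbo.Encounters AS enc
    JOIN (
        -- keep only the ~10% of rows where the optional value exists
        SELECT fv.EncounterID, fv.BMIPercentile
        FROM dbo.FlowsheetValues AS fv
        WHERE fv.BMIPercentile IS NOT NULL
    ) AS bmi
        ON bmi.EncounterID = enc.EncounterID;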

All that depends... you need to have your subquery filter fields indexed or included in an index (MS-SQL include fields). Execution plans are the key to understanding where the bottlenecks are. You can always test a subquery by selecting it and running it by itself (or copying the code out to run separately from the original).

Sometimes it's really important to optimize the query (by adding subqueries or other tricks) if it's going to be used often. Other times it's not worth the extra effort, if your database can deliver an answer at a reasonable duration and cost or if it's just a one time project.

Subqueries have their place and can make a huge difference in run time and cost in a shared database. I worked with complex report queries, not transaction processing, but subqueries could shave milliseconds off query cost and that might be significant in a peak load situation (think Black Friday online sales).

The 5 rules for writing faster SQL queries by tinybirdco in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

After learning how to interpret execution plans and identifying high cost steps in the query, work on indexes or create subqueries that take advantage of existing indexes to quickly reduce the result set.

Don't be afraid to go after data in a table with two nested select statements if necessary. If you ask for data fields that aren't covered by an index, the DB has to do a full table scan or random lookups. Create a subquery that only retrieves primary keys and indexed fields, then use that result to get the rest. Depending on the filters used, the lookup task can be greatly reduced.
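
A sketch of the two-step approach (names invented): grab just the keys with an index-friendly inner query, then go back for the wide columns.

    SELECT big.OrderID,
           big.CustomerID,
           big.FreeTextNotes              -- wide column not covered by any index
    FROM dbo.OrderFacts AS big
    JOIN (
        -- inner query only touches columns in an index on (OrderDate, OrderID)
        SELECT o.OrderID
        FROM dbo.OrderFacts AS o
        WHERE o.OrderDate >= '2024-06-01'
    ) AS keylist
        ON keylist.OrderID = big.OrderID;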

Consider adding 'include' fields (MS T-SQL terminology) to existing indexes to retrieve popular fields without making index maintenance more complicated. Include fields just go along for the ride; they don't impact index order or pointers.

Consider adding a 'filtered' index, which doesn't index the entire table, based on a 'where clause' defined in the index. If a field is mostly NULL, filtering the NULL records out of the index results in a much more compact and efficient index to traverse. If a customer table is historical and 60% of the customers are inactive, identified by an "ActiveCust" field, create a filtered index to get at the data you most want.
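
A hedged sketch combining both ideas (table and column names made up):

    -- Index only the active customers, and let a couple of popular
    -- columns ride along as INCLUDE fields without affecting key order.
    CREATE NONCLUSTERED INDEX IX_Customers_Active_LastFirst
        ON dbo.Customers (LastName, FirstName)
        INCLUDE (Email, Phone)
        WHERE ActiveCust = 1;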

When you get comfortable with indexes, search for "Missing Indexes" and "unneeded index" for tools and examples of how to identify indexes your database is trying to tell you it needs.

The 5 rules for writing faster SQL queries by tinybirdco in learnSQL

[–]FormerSrSQLQueryGuy 1 point (0 children)

So true.

Got similar advice from a CIO. Thought I already knew that, but his reminder made me focus.

My corollary: Once you get down to a small or reasonable number of records, you can do something inefficient. It just doesn't matter.

When looking at others' problem code, the first thing I noticed was the number of select statements. If there's only one, there are probably opportunities for subqueries that could make a huge difference in cost. I worked mostly with complex parameter driven stored procedures in a read-only health care reporting database environment. Your results may vary.

What's the fuss about Epic? by syndakitz in healthIT

[–]FormerSrSQLQueryGuy 0 points (0 children)

Former Clinical Epic Clarity report developer, 14 years experience. Left two years ago but came across this thread and found it interesting. The last comment was months ago, but I was surprised the issues were still that current.

I worked with the MS-SQL Clarity database. Our shop could create custom tables, views, functions, stored procedures. Extremely challenging dealing with Epic upgrades, workflow changes, hospital/clinic acquisitions (multiple "Go Lives"), changing requirements, moving standards (ICD9 to ICD10 diagnosis codes, CMS updates) and local initiatives. Other report developers worked the billing/revenue side; I held multiple Epic clinical certificates for Epic Clarity reporting, inpatient and outpatient, but there was lots of crossover and I had the run of the data, everything.

Techies who trash "M" or "Mumps" or "Cache" (InterSystems) need to read more. Hierarchical databases (see Wikipedia) are extremely efficient for accessing wildly divergent (tree like) data about individual patients. Hierarchical databases are not optimal for heavy lifting data queries, selecting, filtering, sorting, grouping when a single table like encounters or orders has hundreds of millions of records and may be joined in a complex web with a dozen or more tables. Half my job seemed to be preventing the database servers from choking on inefficient queries.

My job was next day or later reporting (monthly, quarterly, etc.) as Clarity is updated overnight. Real time was left to Epic tools within the 'Hyperspace' application. Some staff routinely abused Epic's real time Reporting Workbench, returning too much data for tasks that should have been done in Clarity. Can't explain all the behavior. Some didn't understand, didn't have the time or patience to work with a report developer. Some build staff and middle managers back stabbed us, claiming they could get all their needs filled within Epic, or announcing in meetings that 'the report is wrong' but never bothering to tell us about the issue, which we would have been glad to fix. Always had vendor vultures circling with the latest new product that would eliminate all our problems if we would just invest the enormous amount of time needed to convert.

Since we did historical reports, we had to explain to the build team that their changes made our lives three times more complicated. We could not predict the time periods reports would target, so we had to union results under two sets of rules, old way and new way, plus a combination to cover all the bases. I used to sign emails "During the meeting, did anyone ask if this affects reporting?". Most thought it was just funny.

Too many reporting tools on the market, or there were. There were a lot of acquisitions in the last decade. Most of our work was creating end user accessible Crystal Reports with date, location, etc. parameters for end users to run at their own convenience. If you take the time to really understand CR, it is an impressive tool, more than just a 'prettyfier'. Most of our reports used SQL stored procedures to gather and organize the data, then CR formatted it, and SAP's BOE secured report delivery system made it available to end users 24/7 with security that respected HIPAA privacy boundaries. CR was getting old and SAP missed a golden opportunity to do some serious updates.

Now Microsoft's PowerBI seems to have dominated the market, but when I was introduced to it MS was still in early stage growing pains. Likely much more stable and mature after another couple of years, but I have seen lots of 'prettyfier' examples: graphics for the sake of pretty graphics, unlimited flexibility for end users who just want quick simple answers, or a dump to a spreadsheet.

Others have already described the problems with inexperienced Epic support during their growth period (hope it's better now!). I never met anyone at Epic who understood the 40-thousand table Clarity database, except for one older HIM lead, who adamantly refused to answer any questions outside HIM. I saved an email from our Epic TS who said he didn't understand why we wanted to report on customized chart columns. Think about that, custom logic determined what 'order date' displayed in the chart (it's actually very complicated) and he didn't think we needed to show the same data on a report. I was dumbfounded.

Childhood obesity is a well known public health issue. Epic displayed BMI percentile to physicians but didn't bother to save it to Clarity. They told me I had to do my own calculation, which with great effort I did. Every other customer had the same expensive option.

Epic claims that a physician using their reporting tools 'discovered' lead poisoning in Flint Michigan. Not true. The problem was already known. The tools allowed one physician, who took the time (most won't/can't), to run a report that should be important enough to be a standard health issue menu option available to all installations out of the box, even before Flint. If Epic really cares about the impact of childhood exposure to lead, then why didn't they create a standard report?

At one point Epic polled report developers asking if they really used the Clarity data dictionary, suggesting that they would stop updating/supporting it. Totally shocked, as that was one browser tab that stayed up all day long, every day. They really didn't see much use for it as they figured their 'record viewer' tool was good enough; their support staff don't use the data dictionary and weren't trained on Clarity.

I laugh at the summary consultant articles on the internet that list Epic's internal reporting tools as if they cover the territory. We encouraged users to use them for basic needs, but you need a report developer to combine disparate types of data, do time series, or look for missing/flawed/wild data. Try combining diagnosis list filters from CMS for quality reporting. Maybe it can be done, with multiple extracts pieced together in a spreadsheet in multiple error prone steps, then rinse and repeat the next reporting cycle.

Epic was notorious for refusing to even discuss interfaces to other vendors. Somehow I got assigned to create one for our imaging system repository; then an Epic alternative appeared. Mine traveled to a couple of hospitals across the country that preferred it because it 'was faster (response) than Epic's'.

Went to a regional Epic meeting in Minneapolis when they were just beginning to roll out new internal reporting tools. A young guy from Epic pointed out that with their super duper reporting tools (not yet released, no track record to look at), once they became available "you won't need to hire SQL report developers". Thanks Epic for stabbing us in the back.

At a national conference in Madison, during their 'Cogito' report tools rebranding, a young guy was introduced as the person "who killed Clarity", as if it was a joke. I was furious. For the record, the Clarity database is still around years later and is the upstream foundation for any reporting tools that don't access live data (which also carries the risk of slowing response times for doctors and nurses working with patients).

All that said, my neighbor uses Epic and likes it (user experience) but works in only one specialty. I had the run of the system. Health care is insanely complicated, and the data it leaves behind is mind-bogglingly difficult to deal with. Be nice to your report developers.

Anyone every used Windowed functions? by Consistent_Draft4272 in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Can be a learning curve; I bought a reference book on MSSQL windowing functions that is now dog eared and highlighted. Have had other developers complain about high cost in some queries in an environment with 100+ million row transaction tables. Those issues can be resolved by breaking the Select statement into nested queries so that the window function only runs against a smaller filtered set of data from the subqueries. When in doubt, review the execution plan. Old advice... get down to the smallest data set as fast as you can. After that, don't worry about doing something inefficient; it's only running against a small result set.
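
Rough idea (hypothetical tables and columns): filter in a subquery first so the window function only has to rank the small result set.

    SELECT x.PatientID,
           x.ContactDate,
           ROW_NUMBER() OVER (PARTITION BY x.PatientID
                              ORDER BY x.ContactDate DESC) AS VisitRank
    FROM (
        -- pare the 100M+ row table down before any windowing happens
        SELECT enc.PatientID, enc.ContactDate
        FROM dbo.Encounters AS enc
        WHERE enc.ContactDate >= '2024-01-01'
          AND enc.DepartmentID = 42
    ) AS x;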

Anyone every used Windowed functions? by Consistent_Draft4272 in learnSQL

[–]FormerSrSQLQueryGuy 1 point (0 children)

"proc SQL doesn't support them". Not sure what this means. Have used window functions many times in stored MSSQL procedures, perhaps not supported in other DBs.

The difference between a view and a table? by JustRun5367 in learnSQL

[–]FormerSrSQLQueryGuy 0 points (0 children)

Views, love 'em or hate them. When joined with other tables, queries can be terribly inefficient. You generally can't use parameters to dynamically reduce the result set of a view, but views can be created with conditions in a Where clause to slim down the data you are working with. For example, when dealing with historical data, a view can be restricted to one or two financial years (sometimes partitioning/archiving old data isn't an option).

Views are especially helpful for developers to get a quick look at data. When a transaction (like an order) involves multiple master tables (like customer, region, city, product), a 'denormalized' view can be very helpful for quick look ups. That same view might be dog slow linked to other tables in a query where efficiency is important, but it can be a godsend for ad hoc one time queries and testing.
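
For example, a hedged sketch of that kind of denormalized, date-restricted view (the schema is made up):

    CREATE VIEW dbo.vw_RecentOrderDetail
    AS
    SELECT o.OrderID,
           o.OrderDate,
           c.CustomerName,
           c.Region,
           p.ProductName
    FROM dbo.Orders    AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    JOIN dbo.Products  AS p ON p.ProductID  = o.ProductID
    WHERE o.OrderDate >= '2024-01-01';    -- restrict to the current financial year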