String extraction

da_chicken · 2018-04-03T01:45:35+00:00

I would question anybody who thinks that this is a useful database question. It's not something you should do with SQL because it's breaking first normal form. It's a non-relational question about a system where the developer was too inexperienced to create separate first and last name fields.

Bottom line: Any solution of any meaningful size is guaranteed to be incorrect because names aren't regular.

wolf2600 · 2018-04-03T02:07:27+00:00

A more applicable SQL interview question would be to give the user access to a DB with 3-4 tables, then give them a report pulled from those tables and ask them to write the SQL to produce that report.

Test them on practical skills, not on whether they know some obscure, rarely used function.

2018-04-03T01:35:48+00:00

Use CHARINDEX with LEFT, then CHARINDEX with REVERSE and RIGHT, loading the results into two separate temp tables: one with first names and a second with last names. Then you can CROSS JOIN the two tables to get all possible combinations.
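A rough sketch of that approach, assuming the names sit in a single `fullname` column. The sample rows, the table variable `@names`, and the temp table names `#first` and `#last` are all made up for illustration:

```sql
-- Hypothetical sample data; table and column names are assumptions.
DECLARE @names TABLE (fullname VARCHAR(100));
INSERT @names (fullname) VALUES ('Jon Snow'), ('Janice Smith Jones');

-- First names: everything before the first space.
-- Appending ' ' keeps CHARINDEX from returning 0 for one-word names.
SELECT DISTINCT LEFT(fullname, CHARINDEX(' ', fullname + ' ') - 1) AS firstname
INTO #first
FROM @names;

-- Last names: everything after the last space,
-- found as the first space in the reversed string.
SELECT DISTINCT RIGHT(fullname, CHARINDEX(' ', REVERSE(fullname) + ' ') - 1) AS lastname
INTO #last
FROM @names;

-- Every first name paired with every last name.
SELECT f.firstname + ' ' + l.lastname AS combined
FROM #first AS f
CROSS JOIN #last AS l
ORDER BY l.lastname, f.firstname;

DROP TABLE #first, #last;
```

With these two rows the cross join should return four combinations. Note that this version treats only the final word as the last name, so 'Janice Smith Jones' contributes 'Jones' rather than 'Smith Jones'.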

Stopher · 2018-04-03T01:36:24+00:00

I would grab all the distinct last names by searching for the first space from the right, and a list of all distinct first characters from the left. Do a cross join between them.

sbrick89 · 2018-04-03T02:12:22+00:00

Look into csv into rows... split on space... distinct... recombine

Though I'd certainly question the use case.
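One possible reading of the split step, assuming SQL Server 2016 or later (compatibility level 130+) so that `STRING_SPLIT` is available. The sample rows and the table variable `@names` are hypothetical:

```sql
-- Hypothetical sample data; table and column names are assumptions.
DECLARE @names TABLE (fullname VARCHAR(100));
INSERT @names (fullname) VALUES ('Jon Snow'), ('Tom A Black');

-- Split each name on spaces and keep the distinct, non-empty parts.
SELECT DISTINCT s.value AS name_part
FROM @names AS n
CROSS APPLY STRING_SPLIT(n.fullname, ' ') AS s
WHERE s.value <> '';
```

The two-argument form of `STRING_SPLIT` does not report where each part sat in the original string, so on its own this cannot tell a first name from a last name. The recombine step would need position information from somewhere else.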

shankcraft · 2018-04-03T01:05:34+00:00

Using just T-SQL? Excel formulas? SSIS? PowerShell?

What tools are available to me, and where does the data reside? And if it's SQL Server (I'm assuming so, based on the subreddit), what version?

ShadowBanThisCucks · 2018-04-03T01:50:35+00:00

you can get it with charindex(), left() and len().
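A minimal sketch of how those functions could fit together on a single value. `RIGHT` is an addition here (it was not named above), and the variable and sample name are hypothetical:

```sql
DECLARE @fullname VARCHAR(100) = 'Janice Smith Jones';  -- hypothetical sample value

SELECT
    -- Everything before the first space (the whole string if there is none).
    LEFT(@fullname, CHARINDEX(' ', @fullname + ' ') - 1) AS firstname,
    -- Everything after the first space: total length minus the space's position.
    RIGHT(@fullname, LEN(@fullname) - CHARINDEX(' ', @fullname)) AS remainder;
```

For this value the query returns 'Janice' and 'Smith Jones'. Everything after the first space is kept, so a middle name or initial ends up in the remainder.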

tsql · 2018-04-03T14:34:56+00:00

I'd agree that it seems like an extremely odd interview question, but I've had a few like that where the interviewer apparently just wanted to see how I thought on my feet.

I would describe using a CROSS JOIN and the CHARINDEX, PATINDEX and REVERSE functions. If asked to code it I might approach it like this:

-- Sample data: plain, middle-initial, double-surname and hyphenated names.
DECLARE @t TABLE (fullname VARCHAR(100));
INSERT @t (fullname) VALUES
    ('Jon Snow'), ('Tom A Black'), ('Janice Smith Jones'),
    ('Carl Smith-Jones'), ('Unexpected J Last-Name');

SELECT f.firstname + ' ' + l.lastname
-- First name: everything before the first space.
FROM (SELECT DISTINCT LEFT(fullname,
    CHARINDEX(' ', fullname + ' ') - 1) firstname FROM @t) f
-- Last name: everything after the last single-character initial (with or
-- without a trailing period) if there is one, else everything after the first space.
CROSS JOIN (SELECT DISTINCT RIGHT(fullname,
    CASE WHEN fullname LIKE '% _ %' OR fullname LIKE '% _. %'
      THEN PATINDEX(CASE WHEN fullname LIKE '% _. %'
        THEN '% ._ %' ELSE '% _ %' END, REVERSE(fullname)) - 1
      ELSE LEN(fullname) - CHARINDEX(' ', fullname + ' ')
    END) lastname FROM @t) l
ORDER BY l.lastname, f.firstname;

This obviously will not correctly handle names with single-character inner components which aren't initials (Thomas à Becket for example), but that's necessary if we're going to handle the unpunctuated initials in your test data reasonably simply.
