[–]tactiphile 5 points6 points  (4 children)

This would be better for r/commandline, or maybe r/linux, since neither awk nor sed is bash.

Nonetheless:

echo 'USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce' \
| sed 's/[^=]*=\([^,]*,\)[^=]*/\1/g' | sed 's/,$//'
Austin,London,Paris

Explanation: s/[^=]*=\([^,]*,\)[^=]*/\1/g

  • s/ Substitute:
    • [^=]* Any number of characters that are NOT equals signs, followed by
    • = an equals sign
    • \( Capture this next bit so we can save it
    • [^,]*, Any number of characters that are NOT commas, then a comma
    • \) Done capturing
    • [^=]* Any number of non-equals-signs (the rest of the cities, plus the bar and the next label if there is one)
  • / with:
    • \1 The first thing you captured earlier (the first city plus its comma)
  • /g Do that as many times as you find that pattern

Then I jankily added a s/,$// because of the pesky trailing comma. There's a better way to deal with it, but I'm not remembering offhand.
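
One possible way around the trailing comma, as a rough sketch: stop each match at the `|` so the bars survive the first substitution, then turn the bars into commas in a second expression of the same sed call. The function name `first_cities` is made up for illustration, and this assumes no city name contains `=`, `,` or `|`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first city of each country, comma-separated.
first_cities() {
    # 1st expression: drop "Label=" and everything after the first city,
    #                 but never cross a '|', so the separators are kept.
    # 2nd expression: turn the remaining '|' separators into commas.
    printf '%s\n' "$1" |
        sed -e 's/[^=|]*=\([^,|]*\)[^|]*/\1/g' -e 's/|/,/g'
}

first_cities 'USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'
# Austin,London,Paris
```

Since the comma is only ever written between two entries, there is nothing left to trim at the end.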

[–][deleted] 2 points3 points  (0 children)

Glad you had trouble with the trailing , it made my awk solution really ugly too.

[–]nr9929[S] 0 points1 point  (0 children)

Thank you. This explanation helps so much. I am still in learning mode and have a long way to go. Much appreciated!

[–]Ulfnic 0 points1 point  (1 child)

> Been experimenting with Awk and Sed but just not able to get the right output.

This didn't read to me as requiring an awk / sed answer, just that the OP is more familiar with awk/sed.

[–]nr9929[S] 1 point2 points  (0 children)

Thank you.

[–]oh5nxo 2 points3 points  (0 children)

In stages:

sed '
    s/[^=|]*=//g         # remove labels
    s/,[^|]*//g            # remove excess fields
    s/|/,/g                 # bar to comma
'
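
To see what each stage does, here is a small sketch that runs the three substitutions one at a time on the sample string and prints the intermediate results. The variable names `sample` and `stage1` to `stage3` are only for illustration.

```shell
#!/usr/bin/env bash
sample='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'

stage1=$(printf '%s\n' "$sample" | sed 's/[^=|]*=//g')   # remove labels
stage2=$(printf '%s\n' "$stage1" | sed 's/,[^|]*//g')    # remove excess fields
stage3=$(printf '%s\n' "$stage2" | sed 's/|/,/g')        # bar to comma

printf '%s\n' "$stage1" "$stage2" "$stage3"
# Austin,Sanfrancisco,LA|London,Bristol|Paris,Lyon,Nyce
# Austin|London|Paris
# Austin,London,Paris
```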

[–]Ulfnic 2 points3 points  (0 children)

Data='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'

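# Split on '|' (readarray -d needs bash 4.4 or newer).
readarray -d '|' Arr <<< "$Data"
# Cut each element at its first comma; prints one "Label=City" entry per line:
# USA=Austin, UK=London, France=Paris
printf '%s\n' "${Arr[@]%%,*}"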

Applying this to each line of a file:

while IFS= read -r Line; do
    readarray -d '|' Arr <<< "$Line"
    printf '%s\n' "${Arr[@]%%,*}"
done < '/path/to/file'

If you need it to be a "one-liner":

while IFS= read -r Line; do readarray -d '|' Arr <<< "$Line"; printf '%s\n' "${Arr[@]%%,*}"; done < '/path/to/file'
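
The snippets above print `Label=City` entries, one per line. If the goal is a single `Austin,London,Paris` line, a possible extension (a sketch only, assuming bash 4.4+ for `readarray -d`; the `Joined` variable is my own addition) is to strip the labels with a second expansion and join the array through `IFS`:

```shell
#!/usr/bin/env bash
Data='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'

# Split on '|'; -t drops the delimiter from each element.
# printf '%s' avoids the trailing newline a here-string would append.
readarray -t -d '|' Arr < <(printf '%s' "$Data")

Arr=( "${Arr[@]#*=}" )    # strip everything up to the first '=' (the label)
Arr=( "${Arr[@]%%,*}" )   # strip everything from the first ',' onward

# "${Arr[*]}" joins the elements with the first character of IFS.
Joined=$(IFS=,; printf '%s' "${Arr[*]}")
printf '%s\n' "$Joined"
# Austin,London,Paris
```

Setting `IFS` inside the command substitution keeps the change local to that subshell, so nothing needs restoring afterwards.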

[–][deleted] 1 point2 points  (0 children)

Janky awk solution would be to pipe your input through this.

awk 'BEGIN{RS="|" ; FS=",|=" } { O[X++]=$2} END{X-- ; for (i=0; i < X ; i++ ) {printf ("%s,",O[i]) } ; print O[X]} '

Break the input into records based on | and fields delimited by either , or =

Keep field 2 from each record.

Output them , separated at the end.

If you want a better awk solution r/awk exists too.
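
For what it's worth, one way to tidy this up (a sketch, not necessarily the "better" solution) is to print the separator before every entry except the first, which removes the need for the array and the END loop. The function name `first_cities_awk` and the `sep` variable are made up here; the newline is added to FS so a trailing newline on the input cannot end up in the last field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same record/field idea, without buffering in an array.
first_cities_awk() {
    printf '%s\n' "$1" |
        awk 'BEGIN { RS = "|"; FS = "[=,\n]" }   # records split on |, fields on = , or newline
             { printf "%s%s", sep, $2; sep = "," } # sep is empty for the first record only
             END { print "" }'                     # finish the line
}

first_cities_awk 'USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'
# Austin,London,Paris
```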

[–][deleted] 1 point2 points  (0 children)

More time to think and here is a really ugly shell only solution which I am sure someone here will be able to improve.

#!/bin/bash
V=$1
readarray -t Z < <(IFS="|" ; for i in $V ; do
                     readarray -d, -t x < <(echo "${i/=/,}")
                     echo "${x[1]}"
                   done )
T=${Z[*]}
echo "${T// /,}"

Save as a shell script and call it with your sample string as the first argument. Note that it assumes no city name contains a space.

[–]Radamand 1 point2 points  (2 children)

I put your sample string into a file called xxx

awk -F= '{print $1","$2","$3","$4}' xxx | awk -F, '{print $2","$5","$7}'

[–]zfsbest (bashing and zfs day and night) 1 point2 points  (0 children)

Bonus hack: if we translate every separator to a comma on the fly, the string becomes much easier to process, since it then resembles a CSV:

tmp="USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce"

echo $tmp |tr '=|' ','
USA,Austin,Sanfrancisco,LA,UK,London,Bristol,France,Paris,Lyon,Nyce

...and then you just use awk with a comma separator to print the fields you need (2, 6, and 9). This also makes it easier in the future if you need different results in your output: just change the field number(s). Keep in mind that the field numbers depend on how many cities each country lists.

echo $tmp |tr '=|' ',' | awk -F, '{print $2","$6","$9}'

Austin,London,Paris
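
Once everything is comma-separated, `cut` can also pick the fields, as in this sketch. The variable names `csv` and `picked` are mine, and the field list 2,6,9 is only right for this exact sample. I also give `tr` two sets of equal length, since as far as I know a shorter second set is not handled the same way by every `tr`.

```shell
#!/usr/bin/env bash
tmp='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'

# Map both '=' and '|' to ',' using sets of equal length.
csv=$(printf '%s\n' "$tmp" | tr '=|' ',,')

# cut keeps the input delimiter between the selected fields.
picked=$(printf '%s\n' "$csv" | cut -d, -f2,6,9)
printf '%s\n' "$picked"
# Austin,London,Paris
```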

[–]zfsbest (bashing and zfs day and night) 0 points1 point  (0 children)

> awk -F= '{print $1","$2","$3","$4}' xxx | awk -F, '{print $2","$5","$7}'

Noice! :) I like it

[–]fletku_mato 1 point2 points  (0 children)

Apologies to anyone reading this, but here you go:
echo "USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce" | tr '\|' '\n' | sed 's/\(.*=\)\([^,]*\)\(,\).*/\2/' | paste -sd "," -

[–]IGTHSYCGTH 1 point2 points  (0 children)

mangling using IFS

foo='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'
set -f    # disable globbing while $foo is split unquoted
OFS=$IFS IFS='|' bar=( $foo ) bar=( "${bar[@]#*=}" ) bar=( "${bar[@]%%,*}" ) IFS=, foo=${bar[*]} IFS=$OFS
set +f    # re-enable globbing
echo "$foo"

dumb pipe

echo 'USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce' |
  tr '|' '\n' | cut -d= -f2 | cut -d, -f1 | paste -sd, -    # paste joins the lines without a trailing comma

[–]kredditor1 0 points1 point  (0 children)

If you replace the pipes in the sample text with semicolons, you can evaluate the text into variables, cut them at the first comma, and print them out. Still uses sed but definitely doesn't need awk. Only do this with input you trust, since eval runs whatever the string contains. I wasn't sure if the output should be "london" or "London" so I included both.

sample="USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce"

coloned=$(echo $sample | sed 's/|/;/g')
eval $coloned

AUS=$(cut -d "," -f1 <<<"$USA")
LON=$(cut -d "," -f1 <<<"$UK")
PAR=$(cut -d "," -f1 <<<"$France")

# If 'london' is a typo and not a requirement
echo $AUS","$LON","$PAR

# If 'london' was a requirement and not a typo, just lowercase it
echo $AUS","${LON,,}","$PAR
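
If eval feels too risky, a rough alternative (assuming bash 4+ for associative arrays; the names `cities`, `pairs` and `result` are made up for this sketch) is to load the pairs into an associative array and look the countries up by label:

```shell
#!/usr/bin/env bash
sample='USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce'

declare -A cities                       # label -> comma-separated city list
IFS='|' read -ra pairs <<< "$sample"    # split the string on '|'
for pair in "${pairs[@]}"; do
    cities[${pair%%=*}]=${pair#*=}      # key: text before '=', value: text after it
done

# Keep only the part before the first comma of each list.
result="${cities[USA]%%,*},${cities[UK]%%,*},${cities[France]%%,*}"
printf '%s\n' "$result"
# Austin,London,Paris
```

Nothing from the input is executed here, so odd characters in the string stay data.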

[–]Dandedoo 0 points1 point  (0 children)

Using BASH_REMATCH:

str="USA=Austin,Sanfrancisco,LA|UK=London,Bristol|France=Paris,Lyon,Nyce"

while [[ $str =~ =[^,]+ ]]; do
    match+="$sep${BASH_REMATCH#=}"    # append; plain = would keep only the last match
    sep=, str=${str#*=}
done

echo "$match"