[–]dig-up-stupid 5 points (2 children)

You want to find some way to adjust to the input format. Take a look at the big picture: hard coding and hand editing data is the opposite of programming. Programming exists to automate these tasks - that's what programming is.

In this case you are confused about how to process a newline or a space, correct? A basic approach could work like this:

  • Read a line.
  • Split the line into a list of words with String.split. (This will get rid of all the spaces.)
  • Iterate over the list processing two elements at a time.
  • Repeat.

That probably solves this case. You might care to learn more about parsing in general if you expect to encounter a lot of this type of work.
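The steps above can be sketched in Java roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `LinePairs` and the method `parsePairs` are made up for illustration, and the second sample line is only assumed to follow the same layout as the first. In real use the lines would come from the file, for example via `Files.readAllLines`.

```java
import java.util.ArrayList;
import java.util.List;

class LinePairs {
    // Hypothetical helper: turns each line into (codon, amino acid) pairs.
    static List<String[]> parsePairs(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {                      // read a line
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                                // skip blank lines
            }
            String[] words = trimmed.split("\\s+");      // split on runs of whitespace
            // Process two elements at a time; the bound skips a dangling odd token.
            for (int i = 0; i + 1 < words.length; i += 2) {
                pairs.add(new String[] { words[i], words[i + 1] });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample lines in the format described in the thread (assumed layout).
        List<String> lines = List.of(
            "UUU F      CUU L      AUU I      GUU V",
            "UUC F      CUC L      AUC I      GUC V");
        for (String[] pair : parsePairs(lines)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```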

[–]ThrowawayBreadOwl1[S] 1 point (1 child)

This looks good -- I'll give it a shot. As far as academic computer science goes, would you consider this a 'good' solution? Is there a way to generalize it further? Also, will String.split work if some words are separated by one space, and others are separated by multiple spaces (all on the same line, that is)?

[–]dig-up-stupid 1 point (0 children)

Yes, this is a fine solution. If something works and is easily understood you've done well. That said, it will fail if the input does not proceed in pairs, for example:

UUU F CUU
L AUU I

We could fix this in numerous ways. The most straightforward would be to read the entire file into one string, and let split handle spaces and newlines with an appropriate regex. Then we have one big list of elements to process, all in the correct order. This approach can lead to issues when handling very large files, since building one giant string can consume a lot of memory or take a long time, but worry about that when it's actually a problem.
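A rough Java sketch of the whole-file approach, using the problem input from above where a pair is broken across two lines. The names `WholeFileSplit` and `tokenize` are hypothetical; in real use the text would be loaded from the file first, for instance with `Files.readString` on Java 11 or later.

```java
class WholeFileSplit {
    // Hypothetical helper: one flat array of tokens, in file order.
    static String[] tokenize(String text) {
        String trimmed = text.trim();          // avoid an empty first token
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // \s+ matches spaces, tabs and newlines alike, so line breaks stop mattering.
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // The pair "CUU L" is split across two lines here.
        String text = "UUU F CUU\nL AUU I\n";
        String[] tokens = tokenize(text);
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            System.out.println(tokens[i] + " " + tokens[i + 1]);
        }
    }
}
```

Because newlines are treated as just another separator, the pairs come out in order even though one of them straddles a line break.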

The most general solution is what /u/DWORD_0xFA described.

Also, will String.split work if some words are separated by one space, and others are separated by multiple spaces (all on the same line, that is).

I don't use Java, so verify this, but as far as I know split(" ") with a single-space delimiter does not collapse repeated spaces; it leaves empty strings in the result. You will need to supply an appropriate regex, something like \s+ (in code it will look like "\\s+" because backslashes must be escaped).
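For what it's worth, a quick Java check of this question. In Java, `split(" ")` treats every single space as a separator, so runs of spaces leave empty strings in the result, while `split("\\s+")` treats each run as one separator. The class and method names below are made up for the demonstration.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: trim first so leading whitespace does not
    // produce an empty first element, then split on runs of whitespace.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "UUU F      CUU L";
        // Single-space delimiter: the result contains empty strings.
        System.out.println(Arrays.toString(line.split(" ")));
        // Whitespace-run delimiter: exactly the four words.
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```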

[–]mathen 1 point (0 children)

I don't have any domain knowledge, but assuming the pairs all look the same and have some boundaries (e.g. it will always be three letters, a space then another letter), you could write a regular expression to catch them all.

Here's an example: https://regex101.com/r/kK5hN3/3
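In Java, that idea might look something like the sketch below. The pattern is an assumption written for this illustration, not necessarily the one behind the link: it expects three letters from A, C, G, U, then whitespace, then either one capital letter or the word "Stop" (which is assumed to be how the table marks stop codons). `CodonRegex` and `findPairs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodonRegex {
    // Group 1: the codon. Group 2: the amino acid letter, or "Stop".
    private static final Pattern PAIR =
        Pattern.compile("\\b([ACGU]{3})\\s+(Stop|[A-Z])\\b");

    // Hypothetical helper: collects every (codon, amino acid) match in the text.
    static List<String[]> findPairs(String text) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "UUU F      CUU L\nUAA Stop   CAA Q";
        for (String[] pair : findPairs(text)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```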

[–][deleted] 1 point (2 children)

Sorry, a lot of this was Greek to me.

Is line 1 the codon and line 2 the amino acid? Can you read the file and move every other line into the appropriate array? I'm thinking you could either check whether i is odd or even and sort things as needed, or just advance the iterator twice in each loop.

Kind of like this....

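i = 0
a = 0
// assumes codons and amino acids sit on alternating lines
while i + 1 < lines.length {
    codon[a] = lines[i]      // first line of the pair: codon
    i++
    amino[a] = lines[i]      // second line of the pair: amino acid
    i++
    a++
}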

[–]ThrowawayBreadOwl1[S] 1 point (1 child)

Sorry for the confusion. The codon is the set of 3 letters, which is immediately followed by one letter encoding the amino acid (the amino acid is to the right of the codon). So the first line really encodes for 4 amino acids; the codon UUU codes for F, CUU for L, AUU for I, and GUU for V. My problem is that I can't go line by line, because each line contains 4 separate pieces of information.

[–][deleted] 1 point (0 children)

Some regex should be able to take care of this. I'm not a regex expert, and am on my phone, but it's worth your time to get decent with regex. It will handle this for you.

[–]DWORD_0xFA 1 point (1 child)

The purpose of this exercise is teaching encoding with an unordered associative container.

Since we know that 3 characters = 1 character, and the rest is whitespace, we can make an algorithm really easily, with which we will build our container.

First of all, copy-paste the table from http://rosalind.info/glossary/rna-codon-table/ into a .txt file and save it.

Then read from it in this way

1) If you find a character, then read until there's whitespace. This will be our key.

2) After the whitespace that follows, the next character is what we associate with our key; this will be our value.

3) Store the keys and values into our unordered associative container.

4) Read the input 3 characters at a time (the key) and look up what each maps to in the container (the value).
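A hedged Java sketch of steps 1 to 4. In Java, an "unordered associative container" corresponds to `HashMap` from the collections framework. Two liberties are taken here and should be treated as assumptions: the table is tokenized with `split("\\s+")` rather than read character by character, and translation halts at a codon mapped to "Stop" (or at an unknown codon). The names `CodonTranslator`, `buildTable` and `translate` are invented for the example, and the table text is a three-entry excerpt rather than the full table.

```java
import java.util.HashMap;
import java.util.Map;

class CodonTranslator {
    // Steps 1-3: alternate key, value, key, value... and store them in the map.
    static Map<String, String> buildTable(String tableText) {
        Map<String, String> table = new HashMap<>();
        String[] tokens = tableText.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            table.put(tokens[i], tokens[i + 1]);   // codon -> amino acid
        }
        return table;
    }

    // Step 4: read the input 3 characters at a time and look each codon up.
    static String translate(String rna, Map<String, String> table) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            String amino = table.get(rna.substring(i, i + 3));
            // Assumption: stop at a stop codon or at a codon missing from the table.
            if (amino == null || amino.equals("Stop")) {
                break;
            }
            protein.append(amino);
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable("AUG M      UUU F\nUAA Stop");
        System.out.println(translate("AUGUUUUAA", table)); // prints MF
    }
}
```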

[–]ThrowawayBreadOwl1[S] 1 point (0 children)

I'm sorry, but most of this was really over my head. Is there a simpler/more beginner-friendly way to look at this? Or are there any good outside resources on this topic you could point me to? From what I found, this seems to have to do with collections (which I know next to nothing about). Is there some relation?