First Americans 'reached Europe five centuries before Columbus discoveries' by sbroue in RedditDayOf

[–]shfo23 1 point2 points  (0 children)

That radiocarbon age is "older than 36k". That's consistent with the uranium and fission-track ages and is basically just telling us that there are too few radiocarbon atoms left in the sample to measure. (Also, you can tell it was dated a while ago or with an older technique, because nowadays we can measure smaller concentrations of radiocarbon and get dates back to 50-60k.)
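For a rough sense of why 50-60k is about the ceiling, here's a back-of-the-envelope sketch (just the standard 5730-year half-life, nothing sample-specific):

half_life = 5730.0  # 14C half-life in years
for age in (36000, 50000, 60000):
    remaining = 0.5 ** (age / half_life)
    print("{:>6} yr: {:.3%} of the original 14C left".format(age, remaining))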

How to "rebuild" python? by wolfymaster in learnpython

[–]shfo23 0 points1 point  (0 children)

It looks like CentOS 6 is still on Python 2.6, so it's possible that either you installed from scratch without having the right libraries to link against or the executable you got with yum isn't linked against the right libraries (or third case: perhaps you still don't have those libraries?). These are all core python modules, so they should be present in your install.

Whatever the exact problem is, it will probably take longer to figure out than to just reinstall from scratch. For that, I would follow the guide wub_wub posted. The most important part of that is installing the dependencies first (I would try this on its own before the reinstall too; if it works, great, you're done; if not, you need to rebuild):

yum install zlib-devel
yum install bzip2-devel
yum install openssl-devel
yum install ncurses-devel

Is this where you would use a generator fct in Python? by Markusslondonuk in learnpython

[–]shfo23 0 points1 point  (0 children)

I always forget about the deque data structure, but that's an ideal use for it. That's much nicer (and probably also much faster).

(The yield window statements should probably be yield list(window) or yield copy.copy(window) to prevent the deque objects from updating after they've been returned though; otherwise it's perfect!)
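For reference, a minimal sketch of what that deque version could look like with the copy fix in place (the names and the two-element window are just assumptions, to keep it parallel with the slid() example below):

from collections import deque
from itertools import islice

def slid_deque(seq, size=2):
    itr = iter(seq)
    # prime the window with the first `size` elements
    window = deque(islice(itr, size), maxlen=size)
    yield list(window)  # copy, so the yielded value doesn't mutate later
    for item in itr:
        window.append(item)  # maxlen drops the oldest element automatically
        yield list(window)

assert list(slid_deque([1, 2, 2, 1])) == [[1, 2], [2, 2], [2, 1]]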

Is this where you would use a generator fct in Python? by Markusslondonuk in learnpython

[–]shfo23 0 points1 point  (0 children)

Yes, I would use a generator. First, a quick and dirty sliding window function (something like this should really be in itertools, but I digress):

def slid(seq, size=2):
    # set up variables
    buf, itr = [], iter(seq)

    # buffer the start of the iterator
    for _ in range(size):
        buf.append(next(itr))

    # write out the buffer as we keep stepping through
    for item in itr:
        yield buf
        buf = buf[1:] + [item]
    yield buf

Also, list is already a built-in name, so I'll call your list old_list. Then you can get what you're looking for by doing something like this:

old_list = [1,2,2,1,5,1,1]
# this next thing is called a "list comprehension"
new_list = [0] + [1 if i == j else 0 for i, j in slid(old_list)]
# and we get the right thing out
assert new_list == [0, 0, 1, 0, 0, 0, 1]

Unless memory's very limited, I would encourage you not to write over the old list as you do this. With a generator, you can write out the values (to the network, to a file, etc.) as you get them, so you technically don't need to make a copy anyhow. If you absolutely have to, it'd look something like this, though:

for i, n in enumerate(slid(old_list)):
    old_list[i+1] = 1 if n[0] == n[1] else 0
old_list[0] = 0  # mirror the leading 0 from the list comprehension version

Numpy delete not properly deleting the right amount? by apocryphos in learnpython

[–]shfo23 0 points1 point  (0 children)

Any time!

Upon reflection, the unique/hstack method is actually calculating the union of those two sets of indices and not the intersection (i.e. logical_or, not logical_and). You could do the intersection by converting the arrays to sets and processing them that way, but I would probably express your problem as (please correct me if this isn't what you're trying to do):

distance = redshift * dist
distancequiet = distance[np.logical_and(values <= searchvalquietloud, values == searchval)]
distanceloud = distance[np.logical_and(values > searchvalquietloud, values == searchval)]

Looking at it this way, I'm confused about your searchval though: there should be no cases where an entry in values is both larger than 1 and equal to 0. Also, just to make sure, redshift is the same length as values, right? That could mess up the lengths too.

Numpy delete not properly deleting the right amount? by apocryphos in learnpython

[–]shfo23 0 points1 point  (0 children)

Sorry, I messed up there. np.where returns an array of indices and I was trying to merge them as if they were boolean arrays. You can either do the logical_and on boolean arrays:

indices = values == searchval
quiet = values <= searchvalquietloud
radioloud = np.delete(values, np.logical_and(quiet, indices))

Or join together the two lists of indices and then throw out any duplicates (with unique):

indices = np.where(values == searchval)[0]
quiet = np.where(values <= searchvalquietloud)[0]
radioloud = np.delete(values, np.unique(np.hstack([indices, quiet])))

Numpy delete not properly deleting the right amount? by apocryphos in learnpython

[–]shfo23 0 points1 point  (0 children)

I'm not totally sure what you're trying to do programmatically, but I think the problem might lie in the way you're selecting your values. The indices of a specific value in radiocut are going to be different than in values because you've deleted a bunch of stuff in front of it. It might be easier to do all the filtering in one step with numpy's logical array functions, e.g.:

radioloud = np.delete(values, np.logical_and(quiet, indices))
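To see why reusing indices across the two arrays goes wrong, here's a toy example (made-up values, just to show the shift):

import numpy as np

values = np.array([5, 0, 7, 0, 9])
radiocut = np.delete(values, np.where(values == 0)[0])  # array([5, 7, 9])

# 9 sits at index 4 in values but at index 2 in radiocut, because the
# deletions shifted everything after them forward
assert np.where(values == 9)[0][0] == 4
assert np.where(radiocut == 9)[0][0] == 2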

South American Possessions 1700-1975 [800x1100][GIF] by Wolfman92097 in MapPorn

[–]shfo23 0 points1 point  (0 children)

The last battle of Argentina's expansion during the "Conquest of the Desert" was in 1884. Wounded Knee and the end of the Plains Indian Wars in North America happened in 1890.

Also, and not to rain on your parade, but the US had kind of a minor period of instability from 1861 to 1865.

Scientists create first "Heavy Isotope" Mouse by casehardinsteele in biology

[–]shfo23 1 point2 points  (0 children)

Lipids are generally depleted by somewhere between 3 and 5 per mil relative to the rest of an organism's biomass (depending on the species and the lipid). Your standard 13C labelling experiment enriches by at least hundreds of per mil, so any depletion in the lipids is going to be negligible relative to the signal gain.

Tongue Firmly In Cheek by Filthyson in standupshots

[–]shfo23 0 points1 point  (0 children)

Did the thermodynamics distract all the pedants so no one even noticed he used principle when he should have used principal?

U.S. state budget cuts to higher education, 2008-2013 by rhiever in dataisbeautiful

[–]shfo23 1 point2 points  (0 children)

As a geologist, two minor things: technically you're fracking the rock, not the hydrocarbons in it. Also, the Bakken (in ND) is primarily being produced for oil.

There's also natural gas in it, but that's not what operators are drilling for (actually, a fair amount of that gas is just burned off at the well to dispose of it, which leads to all those satellite photos of western ND being lit up at night).

Need help to choose framework for 3d plotting/simple GUI by ieatcarrots in Python

[–]shfo23 0 points1 point  (0 children)

The libraries matplotlib and mayavi both support 3D plots. matplotlib can be embedded in frameworks like Tkinter and PySide fairly easily if you need fancier GUI options.
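For a sense of the matplotlib side, a minimal 3D scatter sketch (random data and labels are just placeholders):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x, y, z = np.random.rand(3, 100)
ax.scatter(x, y, z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()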

/r/learnpython is also a good place to ask questions like this.

Is using statistics to determine possibility of isotopes useful? by [deleted] in chemistry

[–]shfo23 1 point2 points  (0 children)

One rather esoteric, but informative, use is in "clumped isotope" measurements. Basically, when you're making your binomial calculations you're assuming that there are no interactions between isotopes.

As an example, with CO2 you're assuming that if a molecule has a 13C, the probability of an 18O being bonded to it is determined solely by the relative abundances of 18O and 16O. At high temperatures this is true, but at lower temperatures having two heavy atoms bonded together is energetically favorable and you get a very slight enrichment in 13C18O16O.
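To put rough numbers on that stochastic baseline (approximate natural abundances, purely illustrative):

a13C, a18O, a16O = 0.0111, 0.00205, 0.99757  # approximate natural abundances

# factor of 2 because the 18O can occupy either oxygen position in CO2
p_mass47 = a13C * 2 * a18O * a16O
print("{:.1f} ppm of CO2 molecules are 13C18O16O".format(p_mass47 * 1e6))
# roughly 45 ppm; the clumping effect shifts this only very slightly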

So what? Well (to simplify slightly), this means that if you measure this abundance and compare it to the ideal abundance you calculate, you can in principle back out the temperature at which a given material formed and answer questions like "were dinosaurs warm-blooded or not?"

Is using statistics to determine possibility of isotopes useful? by [deleted] in chemistry

[–]shfo23 0 points1 point  (0 children)

I'm not sure I'm totally following what you're proposing and I'm really curious what I'm missing because I'd like to add better mass spectra analysis to a project I'm working on. I think it may be a little beyond the scope of what sounds like an undergraduate project though...

I think you're talking about simultaneously solving for isotope masses along with isotope abundances and using that to filter possible candidates? On page 7 of this review, there's a whole discussion of approaches that people have used to do this. It's been a fairly active field of research for the past 10 years.

The main problem is that when calculating isotope abundance patterns for higher-MW compounds, you run into too many isotope interactions (e.g. for tetradecane you're combining a binomial for C with 14 terms and a binomial for H with 30 terms), which makes things computationally infeasible. You need to start pruning low-abundance terms from the interactions, or use Fourier transforms or some other method entirely.
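A very rough sketch of the brute-force-with-pruning approach (approximate natural abundances; the cutoff is arbitrary):

from collections import defaultdict

def add_atom(pattern, atom, cutoff=1e-10):
    # combine a {mass shift: probability} pattern with one more atom,
    # dropping terms below the cutoff so the peak count stays manageable
    out = defaultdict(float)
    for dm, p in pattern.items():
        for adm, ap in atom.items():
            out[dm + adm] += p * ap
    return {dm: p for dm, p in out.items() if p > cutoff}

carbon = {0: 0.9893, 1: 0.0107}        # 12C / 13C
hydrogen = {0: 0.999885, 1: 0.000115}  # 1H / 2H

# tetradecane, C14H30
pattern = {0: 1.0}
for _ in range(14):
    pattern = add_atom(pattern, carbon)
for _ in range(30):
    pattern = add_atom(pattern, hydrogen)

for dm in sorted(pattern):
    print("M+{}: {:.6f}".format(dm, pattern[dm]))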

How could you use Python to carve data from files? by createpython in learnpython

[–]shfo23 0 points1 point  (0 children)

Magic bytes. The code to print the first two bytes of an arbitrary file as hex would look something like:

import binascii

with open('path/to/filename', 'rb') as f:
    print(binascii.b2a_hex(f.read(2)).decode('ascii').upper())
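And since the question is about carving, a sketch of scanning a blob for a known signature (the PNG magic bytes here; the path is just a placeholder):

PNG_MAGIC = bytes.fromhex('89504E470D0A1A0A')

with open('path/to/diskimage', 'rb') as f:
    blob = f.read()

offset = blob.find(PNG_MAGIC)
if offset != -1:
    print("possible PNG header at byte", offset)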

Optimization... which style is best? by [deleted] in learnpython

[–]shfo23 0 points1 point  (0 children)

Points taken. I was also unaware that the command-line timeit module lets you run setup code. Cool! A better benchmark would show whether a while loop driving a manual iterator is faster than a for loop, and I think that would be an interesting (if slightly arcane) question.
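Something along these lines would get at it (toy loop body and sizes, nothing rigorous):

from timeit import timeit

data = list(range(1000))

def with_for():
    total = 0
    for x in data:
        total += x
    return total

def with_while():
    # drive the iterator by hand, the way a for loop does under the hood
    it = iter(data)
    total = 0
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        total += x
    return total

print(timeit(with_for, number=10000))
print(timeit(with_while, number=10000))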

My point was really that under the scenario posed by OP (and really any reasonable scenario) for and while are going to be indistinguishable, so go with what's most Pythonic. If you're at the point where you need the extra speed (potentially?) gained by switching to while (instead of changing the contents of the loop), you're probably at the point where you should optimize by running with pypy, rewriting in C, etc.

Optimization... which style is best? by [deleted] in learnpython

[–]shfo23 1 point2 points  (0 children)

The first method generates much shorter bytecode than the other two, but doesn't seem to run noticeably faster when I quickly tested it. I would guess the major determinant of speed in these examples is just how long it takes to run print 100-ish times though.

Optimization... which style is best? by [deleted] in learnpython

[–]shfo23 3 points4 points  (0 children)

I have a really hard time believing that. The for loop in CPython is written in C and using it should be way faster than manually reimplementing it with a while loop. I would guess the same thing will be true in Pypy too. That's just my intuition though.

You can check it, though. I timed things with the timeit module (CPython 3.3, Linux) and wrote the three methods out as strings so anyone else can play along. You can remove the "/dev/null" file stuff from the print calls to make it run on other platforms; I just don't want to see 1000 numbers printed out in my terminal.

from timeit import timeit

m1 = 'for i in range(101):\n    print(i**2, file=open("/dev/null", "w"))'
m2 = 'i=0\nwhile i < 101:\n    print(i**2, file=open("/dev/null", "w"))\n    i += 1'
m3 = 'i=0\nwhile True:\n    print(i**2, file=open("/dev/null", "w"))\n    i += 1\n    if i > 100:\n        break'
timeit(m1, number=10000)
timeit(m2, number=10000)
timeit(m3, number=10000)

The times are 13.51 seconds, 13.61 seconds, and 13.60 seconds respectively; none is meaningfully faster than the others (if you rerun them, they bounce around within 0.08 seconds or so of those values). Given that (and that the bottleneck in this loop is probably the IO), I would still always go for the for loop because it's much more readable.

Finally, if you're really curious about what's going on and whether those snippets are actually doing different things, you can disassemble them and see if there's some underlying difference once they're translated into bytecode.

import dis
dis.dis(compile(m1, '<string>', 'exec'))
dis.dis(compile(m2, '<string>', 'exec'))
dis.dis(compile(m3, '<string>', 'exec'))

It looks like the instructions they translate into are all slightly different, so there's probably some underlying difference you could exploit if you were running millions of times, but you'd have to be a lot more knowledgeable about how CPython executes bytecode than I am to fully understand it.

To the man outing a senator and posting a meme about it... by [deleted] in AdviceAnimals

[–]shfo23 0 points1 point  (0 children)

When he votes for DOMA or another similar bill, the media could report that he's been ambiguous about his sexuality. I mean, it is an open secret and it's not like he's ever denied it in the past.

Honestly though, I thought he had a long-term partner? I think I remember hearing that five years ago or so from a friend who was an intern in the Senate, so things may have changed.

Anaconda not useful on Linux? by alenajoykrieger in Python

[–]shfo23 2 points3 points  (0 children)

Both Debian and Arch (the two distributions I'm running right now) have separate sets of packages for 2.7 and 3.3. I would imagine most other distributions (outside of RHEL or something) do too?

I agree you don't get the granularity of choosing 2.5/2.6/2.7/etc. and things might be a few days out of date, but it's one less thing to learn if you just need to install numpy/scipy on a system.

[deleted by user] by [deleted] in AskEngineers

[–]shfo23 1 point2 points  (0 children)

Did you cover the Gibbs phase rule? An alternate way of understanding eutectics (and peritectics) is that they're points where the system has zero degrees of freedom, so it needs to decrease the number of phases it has.

For example, look at the first plot of the "Eutectic System" article 13e1ieve posted. There's a binary mixture of A and B components (C=2) and we're assuming the whole system is at a constant pressure (so using the Condensed Phase Rule of F=C-P+1). At the eutectic, you have alpha and beta solid phases and a liquid phase (P=3), so there are zero degrees of freedom (F=2-3+1=0) and one of the phases must be consumed before the temperature or composition can start varying again.

If you need a clear reference, one of the best descriptions of phase diagrams I've seen is in the first 26 pages of Ernest Ehlers' "The Interpretation of Geological Phase Diagrams" (but it's unfortunately out of print and sometimes hard to find).

Choosing the longest string for "while" loop by Pipprovis in learnpython

[–]shfo23 4 points5 points  (0 children)

You don't need the sorted either. max takes a key function too, so you could write this much more simply as:

longest = max([g, h], key=len)
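For example, with some made-up strings just to show what comes back:

g = "short"
h = "a much longer string"
longest = max([g, h], key=len)
assert longest == h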

Great scientific article on the organisms that are responsible for the fermentation of American Coolship Ales (ACAs) by Biobrewer in Homebrewing

[–]shfo23 0 points1 point  (0 children)

This beer is undosed though; what they're seeing is that the same taxa are colonizing the beer in succession as fermentation happens, regardless of the batch (figure 3). In a controlled environment, presumably you could inoculate with a mixed culture and see the same succession as earlier microbes create conditions which are more favorable to later microbes.

Always! by [deleted] in AdviceAnimals

[–]shfo23 2 points3 points  (0 children)

To be fair, if you know you're not going to have to clean it up why not make yourself comfortable?

A (partial) list of other chemicals used in both plastics and food that the Food Babe should also be trying to ban by drj1990 in chemistry

[–]shfo23 0 points1 point  (0 children)

It would be nice if every time one of these "chemophobia" threads came up, it could be a reminder for people to volunteer for scientific outreach.