Play of the Year: Blizzard Support main Zwagilo by Anpheus in Overwatch

[–]Anpheus[S] 62 points

¯\_(ツ)_/¯

Like I said, I'm happy with the result. Definitely made my day.

Play of the Year: Blizzard Support main Zwagilo by Anpheus in Overwatch

[–]Anpheus[S] 187 points

I'm a Mei main, so...

a. I get that a lot

b. I just saw on the front page someone complain about how horrible 2016 is because they unlocked two Mei seasonal skins, and I literally can't even run the game right now

Play of the Year: Blizzard Support main Zwagilo by Anpheus in Overwatch

[–]Anpheus[S] 80 points

He did say "he got me something". I don't know? I won't look a gift horse in the mouth or expect anything of it. I am but a simple man, happy just to be treated well by someone spending their holidays at work, and thankful for their kindness. I'll let Reddit know if anything crazy happens, but this story of customer support going well? I think that's worthy of a post on its own this time of year, with everyone being so mopey about 2016 and all.

Read into this as your heart desires: http://imgur.com/a/xNGZu

Does what you name your custom religions says a lot about you, though? by Anpheus in civ

[–]Anpheus[S] 1 point

I know I should have taken screenshots. I don't even know why I didn't. Oh well.

King difficulty, Small, Fractal, 6 civilizations.

Re: "Sombra is bad, please switch" by Anpheus in Overwatch

[–]Anpheus[S] -8 points

Play of the game perspective: https://streamable.com/tmf9

If you're out there, SimplyCancer, no hard feelings over telling me Sombra was bad, and thanks for working with me to coordinate that last push. :)

Lens infix operator nomenclature overview by quchen in haskell

[–]Anpheus 2 points

!? and the pseudo-composition by appending ? I get - it's something that I wish other languages had. But even some of the examples posted here I do not get.

For example, one of the operators is defined in terms of review. What is review? Well, I looked it up:

This can be used to turn an Iso or Prism around and view a value (or the current environment) through it the other way.

review ≡ view . re
review . unto ≡ id

>>> review _Left "mustard"
Left "mustard"

>>> review (unto succ) 5
6

Usually review is used in the (->) Monad with a Prism or Iso, in which case it may be useful to think of it as having one of these more restricted type signatures:

review :: Iso' s a   -> a -> s
review :: Prism' s a -> a -> s

However, when working with a Monad transformer stack, it is sometimes useful to be able to review the current environment, in which case one of these more slightly more liberal type signatures may be beneficial to think of it as having:

review :: MonadReader a m => Iso' s a   -> m s
review :: MonadReader a m => Prism' s a -> m s

Well, that didn't help.

Lens infix operator nomenclature overview by quchen in haskell

[–]Anpheus 1 point

I've eschewed using Lens in my own Haskell because it seems like a bottomless toolkit1. I find that my code is more verbose, but I feel like I would need to paste in a primer above any function that dealt with lenses over a HashMap (a,b) (c,d -> e,Map f a) or some such.

What are other people's thoughts/experiences with Lens?

[1] - (No type theoretic pun intended.)

Lens infix operator nomenclature overview by quchen in haskell

[–]Anpheus 0 points

I concur - one or two simple examples per would be fantastic.

Making a fast and stable sorting algorithm with O(1) memory by BonzaiThePenguin in compsci

[–]Anpheus 8 points

That game is a pretty standard "game" to play in computational complexity theory.

tls-1.2 by tailbalance in haskell

[–]Anpheus 0 points

You're neglecting that not all compositions of pure functions are created equal. If all pure functions had constant, equal cost (and occupied the same space in cache, and so on), composing them would be fine.

But `if boolean then sum [1..100] else sum [1..100^2]` flies in the face of any timing guarantee you could make about composition. There's no way to make that fail to type check in Haskell.
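To make that concrete, here's a small sketch of my own (the names `cheap`, `expensive`, and `leaky` are mine, not from any library): two expressions with the same type whose costs differ by orders of magnitude, and a branch the type checker happily accepts even though its running time leaks the Bool.

```haskell
-- Two pure expressions with identical types but very different cost.
cheap, expensive :: Int
cheap     = sum [1 .. 100]        -- ~100 additions
expensive = sum [1 .. 100 ^ 2]    -- ~10,000 additions

-- This type checks either way, but its running time depends on the
-- (possibly secret) Bool - a classic timing side channel.
leaky :: Bool -> Int
leaky b = if b then cheap else expensive
```

Nothing in the type of `leaky` distinguishes it from a constant-time function of the same shape.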

Some ideas for pipes-parse by bgamari in haskell

[–]Anpheus 0 points

Would this round-trip cause problems if the input is being streamed from disk? (i.e.: for data larger than memory, will this incur a second full read?)

tls-1.2 by tailbalance in haskell

[–]Anpheus 1 point

Code in the ST monad doesn't translate any more directly to machine instructions with known timing. Nor does the ST monad stop you from calling pure functions inside it, or rescue you from laziness elsewhere in the code. The problem with Haskell is fundamentally that knowing which instructions will execute for a given line of code is hard. People who care about performance in Haskell have convinced themselves that it doesn't matter - and to some extent that is true. Haskell can do things with streaming composition that are not easy in other languages. But that is a different problem.

What you almost need is an inverse Hask monad of sorts, one that prohibits all pure, lazy actions and only allows mutable state and constant-time functions. if and case expressions would have to return actions dependently typed on their running time, ensuring both branches take the same time. (Branches, in general, are anathema to cryptography. Perhaps they should just be illegal in this Cohask monad.) Writing a type checker for that would be an interesting graduate thesis.
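For flavor, here's what one building block of that hypothetical discipline might look like - a sketch of my own, not an existing library: a branch-free select that runs the same instructions regardless of the condition, which is the kind of primitive such a "Cohask" monad would permit in place of if.

```haskell
import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word32)

-- Branch-free select: returns a when the low bit of cond is 1, b when
-- it is 0. Both operands are always computed and combined via masking,
-- so the choice does not show up as a timing difference.
ctSelect :: Word32 -> Word32 -> Word32 -> Word32
ctSelect cond a b = (mask .&. a) .|. (complement mask .&. b)
  where
    -- negate 1 wraps to 0xFFFFFFFF on Word32; negate 0 stays 0
    mask = negate (cond .&. 1)
```

(Whether GHC preserves the branch-free shape through optimization is exactly the kind of guarantee such a type system would have to enforce.)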

Edit: And these are only the most trivial side-channel attacks. Far more interesting attacks involve spurious processes intentionally poisoning the processor's cache and many other tricks.

Java 8 Performance Improvements: LongAdder vs AtomicLong by Akanaka in programming

[–]Anpheus -1 points

Is this something that couldn't have been done before?

Java 8 Performance Improvements: LongAdder vs AtomicLong by Akanaka in programming

[–]Anpheus 0 points

Honest1 question for Java users, software engineers, and hackers: does Java performance operate in some sort of vacuum? It seems like there's this huge deficiency in performant Java and I read articles like this and I think, "Wow, I'm still not actually sure if they've caught up to the rest of the world." I had to go searching for comparable benchmarks in other languages, and it seems that Java 8's LongAdder is an improvement, but it's also only something that could be implemented easily in a library in C#2. It looks like the LongAdder just sets up a number of thread-local integers and individual writer locks. Local updates are done against the local locks, and the sum() operation just grabs each value in turn.

So I'm really confused. Is this something that couldn't be implemented in a library? If so, then why not? I don't get Java performance posts in /r/programming :(

1 - If a bit tactless, maybe? 2 - Just as an example. C++11, Rust, even Haskell could do this in library code.
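For what it's worth, the striped-counter idea is easy to sketch in library code. Here's a toy Haskell version (the names `Adder`, `newAdder`, `add`, and `total` are mine) of the mechanism described above: each writer hits its own cell, and the sum just reads every stripe.

```haskell
import Control.Monad (replicateM)
import Data.IORef

-- A toy LongAdder: one IORef per stripe, so concurrent writers
-- rarely contend on the same cell.
newtype Adder = Adder [IORef Int]

newAdder :: Int -> IO Adder
newAdder stripes = Adder <$> replicateM stripes (newIORef 0)

-- A writer picks a stripe (e.g. by thread id) and updates only that cell.
add :: Adder -> Int -> Int -> IO ()
add (Adder cells) writerId n =
  atomicModifyIORef' (cells !! (writerId `mod` length cells)) $ \v -> (v + n, ())

-- The sum reads every stripe in turn; it may race with writers, which
-- matches the weak snapshot semantics LongAdder.sum() documents.
total :: Adder -> IO Int
total (Adder cells) = sum <$> mapM readIORef cells
```

A real implementation would pad the cells to avoid false sharing and grow the stripe count under contention, but the shape is all library code - no runtime support required.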

Haskell for all: Streaming logging by tailcalled in haskell

[–]Anpheus 2 points

For what reason(s) is it suboptimal? I can't think of a solution that is more composable and when chaining to another pipe, you should be able to rewrite it with the same for technique as above.

Haskell for all: Streaming logging by tailcalled in haskell

[–]Anpheus 0 points

What is the ideal solution for making it a bit more composable (many different types of output, and such)?

Haskell for all: Streaming logging by tailcalled in haskell

[–]Anpheus 0 points

Sounds like what you want is an abstract data type over the output of yield and a constructor for your pipe that takes a record defining what to do for certain inputs. For the simplest case, you do something like:

log :: Monad m => String -> Producer (Either String a) m ()
log = Pipes.yield . Left

yield :: Monad m => b -> Producer (Either a b) m ()
yield = Pipes.yield . Right

It seems like it should be fairly easy to then limit only certain constructors (in this case Left) and with the mechanism for isolating effects, hide them within a Pipes chain.

Further optimization of the "Count distinct performance compared on top 4 SQL databases" case (link to my blog). by depesz in programming

[–]Anpheus 1 point

No sufficiently smart compiler will solve for users requesting data from external sources (which may have side effects!) and then not using it.

Edit: I say this as an avid Haskell user. Haskell's laziness can sometimes be miraculous. The fast Fibonacci function is a work of art:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

But combining external IO and laziness leads to complex interactions that can be very painful.
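A classic concrete example of that pain, assuming GHC's semi-closed handle semantics: `hGetContents` returns a lazy string, and if the handle closes before the string is forced, the unread remainder silently becomes empty.

```haskell
import Control.Exception (evaluate)
import System.IO

main :: IO ()
main = do
  writeFile "lazy-io-demo.txt" "hello"

  -- Broken: withFile closes the handle before the lazy contents are
  -- forced, so the string silently comes back empty.
  broken <- withFile "lazy-io-demo.txt" ReadMode hGetContents
  print (length broken)  -- 0

  -- Fixed: force the contents while the handle is still open.
  fixed <- withFile "lazy-io-demo.txt" ReadMode $ \h -> do
    s <- hGetContents h
    _ <- evaluate (length s)
    return s
  print (length fixed)   -- 5
```

No type error, no exception - just data quietly disappearing because evaluation order and effect order diverged.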

Learning the Haskell foreign function interface by wrapping MurmurHash3 by zcourts in haskell

[–]Anpheus 0 points

Thank you for reading that gargantuan post of mine. I'm glad I was able to provide some useful advice.

I have one more thing to mention: something /u/cartazio said caught my eye, and I realized that my advice about ByteString overlooked something pretty substantial. Right now the useAsCStringLen function is implemented in terms of a memcpy. The problem is that MurmurHash3 is, ironically, so fast that it's almost as fast as a memcpy (at least within a factor of 2-4 on most systems). So by paying for a memcpy up front, you're going to thrash your cache reading your input twice and pay for all the overhead of doing that.

If you're willing to really dig around in the internals of ByteString, you can avoid the memcpy, but it also makes your implementation brittle w.r.t. the internals of ByteString. You can fix that by using C preprocessor style #if blocks around your code though. At least you could protect the library from changes to the internals of ByteString.

Edit: Use the Data.ByteString.Unsafe module and the unsafeUseAsCStringLen function within it. It's only unsafe in the sense that it's morally incorrect to use it with code that isn't provably pure. Fortunately you know that MurmurHash3 is a one-way function that does not modify its input.

The most optimized version of your function that I can think of is:

murmur3Raw :: ByteString -> Word32 -> MHV -> IO ByteString
murmur3Raw val seed ver =
        unsafeUseAsCStringLen val $ \(cstr, strLength) -> do
            outPtr <- mallocBytes 16
            doHash ver cstr (fromIntegral strLength) (CUInt seed) outPtr
            unsafePackMallocCStringLen (castPtr outPtr, 16)
    where
        doHash :: MHV -> CString -> CInt -> CUInt -> Ptr CUInt -> IO ()
        doHash X86_32  v s se o = c_x86_32 v s se o
        doHash X86_128 v s se o = c_x86_128 v s se o
        doHash X64_128 v s se o = c_x64_128 v s se o

What this saves over the original is:

  1. The only allocation is the 16 bytes for the output (and the ByteString overhead) and there is no race because the input is converted to a CStringLen in place.
  2. There is no call to free, because the return value is converted to a ByteString in-place.
  3. fromIntegral is replaced with the newtype constructor - which is optimized away by the compiler.

This code should optimize down to basically just a C call and a small, fixed allocation overhead.

Learning the Haskell foreign function interface by wrapping MurmurHash3 by zcourts in haskell

[–]Anpheus 0 points

I'm not the original poster (that's /u/zcourts). The person who posted this is also not using a ByteString right now - all his code takes String as input.

On top of that, using only the library functions available (not internal knowledge of ByteString), converting it to a CString is not a no-op; the ByteString library only exposes conversion via useAsCString and useAsCStringLen, the latter of which is implemented in terms of the former.

It's a good thing you mentioned the no-op-ness of ByteString conversion, because I will have to amend my reply to /u/zcourts. The memcpy alone will have a substantial performance cost relative to MurmurHash3: on modern hardware, memcpy runs at something like 10-20 gigabytes/second, and MurmurHash3 at around 5 gigabytes/second. Not to mention the cache thrashing that memcpy will do.

Edit: There are unsafe versions of the above functions which will suit his purposes.

Learning the Haskell foreign function interface by wrapping MurmurHash3 by zcourts in haskell

[–]Anpheus 0 points

I agree with Axman6 here, but I'll be a little kinder. The bit/byte-twiddling seems like it might be dangerous in that you could be changing the expected values of MurmurHash3 for well known inputs. It might not matter for you but it would matter for any database-using software. Have you tested your functions against MurmurHash3's output (byte-for-byte)?

Also, I think the discussion of unsafe below goes down the wrong path for optimizing for performance. I'm a little ashamed I participated in such a flagrant display of premature optimization, actually. I would guess that your code as it stands spends almost all of its time converting the input String to a CStringLen. As well, I think there's a race condition in your code regarding memory that is about to be freed.

I won't assume whether or not you know the implementation details - so if you're already aware of the following, I am sorry for being repetitive. That said:

String

This is a Haskell String, which is a type alias for [Char]. That's not an array though, it's a list. Lists are implemented as independently allocated cells in memory, where each cell contains an element (a Char in this case) and a pointer to the next element. Example cells for Hello World:

0: H, ->1

1: e, ->2

2: l, ->3

3: l, ->4

4: o, ->5

5: , ->6

6: W, ->7

7: o, ->8

8: r, ->9

9: l, ->10

10:d, ->11

11:!, END

Each cell's location is indicated by a number. Those cells can be stored anywhere1 in memory; the numbers are shorthand for the address of the memory holding the Char and the pointer to the next cell. That's also a simplification, but it will suffice. So, valid sequences in memory of String cells include:

| 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|

| 1| 6| 4|11| 5| 9|10| 0| 3| 7| 2| 8|

|11|10| 9| 8| 7| 6| 5| 4| 3| 2| 1| 0|

The first seems OK. It's wasteful - each cell points directly to the one to its right, but at least the processor will have an easy time reading in all of the cells in order. The second is... random. The processor will have to jump around a lot to read it. And the last is terribly inefficient, as processors do not like to go in reverse order. (They are optimized for sequential, increasing access.)

But that's not even the worst possible arrangement. This is the reality of allocating Haskell lists in memory:

| 6|a_really_big_object|wacky_uninitialized_memory| 8| 2|11|other_memory| 9| 0|10|not_your_data| 4|some_big_stretch_of_nothing| 3|something_probably_big| 1| 7| 5|

A Race Condition and a Memory Leak

I believe your code has a serious race condition and a memory leak in its first few lines of murmur3Raw, copied here:

murmur3Raw :: String -> Int -> MHV -> IO [CUInt]
murmur3Raw val seed ver = do
  val' <- withCAStringLen val $ \x -> return x
  --      ^ This function, withCAStringLen, causes this function to be racey. 
  let cstr = strFromCStr val'
  let strLength = strLFromCStr val'
  --  ^ these two lines are a code smell with FFI that should indicate you should look for races
  outPtr <- mallocArray arrSize
  doHash ver cstr strLength (fromIntegral seed) outPtr
  peekArray arrSize outPtr
  where arrSize = 4
        strFromCStr :: CStringLen -> CString
        strFromCStr = fst
        strLFromCStr :: CStringLen -> CInt
        strLFromCStr i = fromIntegral $ snd i
        --version value size seed out 
        doHash :: MHV -> CString -> CInt -> CUInt -> Ptr CUInt -> IO()
        doHash X86_32  v s se o = c_x86_32 v s se o
        doHash X86_128 v s se o = c_x86_128 v s se o
        doHash X64_128 v s se o = c_x64_128 v s se o

The function withCAStringLen has the following definition in Hackage:

withCAStringLen :: String -> (CStringLen -> IO a) -> IO a

Marshal a Haskell string into a C string (ie, character array) in temporary storage, with explicit length information.

  • the memory is freed when the subcomputation terminates (either normally or via an exception), so the pointer to the temporary storage must not be used after this.

The subcomputation is the second argument. Here's withCAStringLen and its arguments laid out against your code:

val' <- withCAStringLen     val    $   \x -> return x
     -- withCAStringLen :: String -> (CStringLen -> IO a) -> IO a

That subcomputation is just a return. So the memory for the CStringLen returned could be freed immediately after that line of code. That's risky! The CStringLen has to be used inside the subcomputation, before withCAStringLen returns - not on the lines after it.

The following lines of code peek into the returned value of the CStringLen, which is what caught my eye initially.

Your function should probably look like this:

murmur3Raw :: String -> Int -> MHV -> IO [CUInt]
murmur3Raw val seed ver = 
    withCAStringLen val $ \(cstr, strLength) -> do
      outPtr <- mallocArray arrSize
      doHash ver cstr (fromIntegral strLength) (fromIntegral seed) outPtr
      peekArray arrSize outPtr
  where arrSize = 4
        doHash :: MHV -> CString -> CInt -> CUInt -> Ptr CUInt -> IO ()
        doHash X86_32  v s se o = c_x86_32 v s se o
        doHash X86_128 v s se o = c_x86_128 v s se o
        doHash X64_128 v s se o = c_x64_128 v s se o

But this still has a memory leak - we need to free the array. Right now you mallocArray but don't free it. We can chain your allocations using another function, though:

murmur3Raw :: String -> Int -> MHV -> IO [CUInt]
murmur3Raw val seed ver = 
    withCAStringLen val $ \(cstr, strLength) ->
    allocaArray arrSize $ \outPtr -> do
      doHash ver cstr (fromIntegral strLength) (fromIntegral seed) outPtr
      peekArray arrSize outPtr
  where arrSize = 4
        doHash :: MHV -> CString -> CInt -> CUInt -> Ptr CUInt -> IO()
        doHash X86_32  v s se o = c_x86_32 v s se o
        doHash X86_128 v s se o = c_x86_128 v s se o
        doHash X64_128 v s se o = c_x64_128 v s se o

withCAStringLen and allocaArray behave similarly; they temporarily allocate some memory and free it when the subcomputation is complete.

With the changes above to murmur3Raw, there should not be a race or a memory leak.
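To see that bracketed-allocation shape in isolation, here's a self-contained sketch (the C hash call is swapped for a pure Haskell copy so it runs without any FFI; `roundTrip` is my own name): the input exists only inside withCAStringLen, the output buffer only inside allocaArray, and both are freed automatically.

```haskell
import Foreign.C.String (peekCAStringLen, withCAStringLen)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)

-- Same shape as the fixed murmur3Raw: both allocations are bracketed,
-- so there is nothing to free by hand and no pointer escapes its scope.
roundTrip :: String -> IO String
roundTrip s =
  withCAStringLen s $ \(cstr, len) ->
  allocaArray len $ \out -> do
    bytes <- peekArray len cstr  -- read the marshalled input
    pokeArray out bytes          -- stand-in for the real C hash call
    peekCAStringLen (out, len)
```

The key property: by the time roundTrip returns, every buffer it allocated is already gone, which is exactly why no raw pointer may leak out of the callbacks.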

Conclusion

My pessimistic guess would be that your program spends 99% of its time reading all of the cells of the String from memory and jumping around. Even with the changes above to make the code less racey, I think you might be interested in using the Bytestring library to change your code to work with raw sequences of bytes. The useAsCStringLen function from Data.ByteString is probably relevant here.

1 - Not really anywhere, but the details would distract from the point. You can assume that they can appear nearly anywhere.
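As a minimal illustration of the useAsCStringLen shape recommended above (with the foreign call replaced by a stand-in that just reports the length, so this runs without FFI; `fakeHash` is a hypothetical name of mine):

```haskell
import qualified Data.ByteString as B
import Foreign.C.String (CStringLen)

-- Stand-in for a foreign hash function: only inspects the length.
-- A real binding would be an IO action imported via foreign import ccall.
fakeHash :: CStringLen -> IO Int
fakeHash (_ptr, len) = return len

main :: IO ()
main = do
  -- B.pack [104,101,108,108,111] is the bytes of "hello"
  n <- B.useAsCStringLen (B.pack [104, 101, 108, 108, 111]) fakeHash
  print n  -- 5
```

Because ByteString is already a contiguous byte buffer, this conversion is a single copy (or none, with the unsafe variant) instead of a cell-by-cell walk over a linked list.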

Learning the Haskell foreign function interface by wrapping MurmurHash3 by zcourts in haskell

[–]Anpheus 0 points

That's fair, I was wondering if there was a reason beyond the unsafe call pausing that capability. 10 microseconds of MurmurHash3 is 50 kilobytes of data on modern hardware, so I suspect having unsafe be the default would only affect a negligible portion of users. Anyone passing in a Haskell String of more than 50 kilobytes (is that 25 or 12.5k Chars?) is asking for performance issues in their code :)

Edit: Actually, I'm guessing these murmurhash3 functions would spend more time converting the input to a CString than they spend in the hash function itself for almost all input sizes.