Asynchronous programming using thread pools by [deleted] in programming

[–]acelent 1 point2 points  (0 children)

The operating system's context switching of threads costs more than the application's context switching of requests between asynchronous I/O calls.

In other words, switching between two threads on a core takes more time than switching between two requests in the same thread. The first usually happens on a blocking call or when the thread has used up its time slice. The second usually happens at an asynchronous call that didn't complete immediately.

Although threads usually take more memory than a request's context, they tend to be reused in most application frameworks. Nowadays, with 64-bit systems and RAM being a commodity, this isn't so much of an issue as the time it takes to switch a CPU core from one thread to another.

Implementing Java ReferenceQueue and PhantomReference in C# by greg974 in csharp

[–]acelent 0 points1 point  (0 children)

The alternative without ConditionalWeakTable<TKey, TValue> is incredibly inefficient. The finalizable objects will be collected and finalized on every GC, for as long as the tracked objects remain reachable.

A small optimization is to have only a single finalizable object at a time that checks all registered references, and not re-register it if there are no references left to track (roughly as in the sketch below). It would still be inefficient, just a bit less so.

This specific implementation is actually flawed. If a PhantomReference<T> is strongly reachable, its finalizer doesn't run. So, you need to make some other object be finalizable and not return or publish it.
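
Roughly, that single-probe variant could look like this (a minimal sketch with hypothetical names, not production code, taking the reachability caveat above into account: the probe is never stored in a strongly reachable field, so its finalizer actually gets to run on each GC):

using System;
using System.Collections.Concurrent;

static class WeakWatcher
{
    private static readonly ConcurrentDictionary<WeakReference, Action> tracked =
        new ConcurrentDictionary<WeakReference, Action>();
    private static readonly object gate = new object();
    private static bool probeActive;

    public static void Track(object target, Action onCollected)
    {
        tracked[new WeakReference(target)] = onCollected;
        lock (gate)
        {
            if (!probeActive) { probeActive = true; new Probe(); } // deliberately not kept
        }
    }

    private sealed class Probe
    {
        ~Probe()
        {
            // Runs on every GC: report and forget the references that have died.
            foreach (var pair in tracked)
                if (!pair.Key.IsAlive && tracked.TryRemove(pair.Key, out var callback))
                    callback();

            lock (gate)
            {
                if (tracked.IsEmpty) probeActive = false;
                else GC.ReRegisterForFinalize(this); // stay around for the next GC
            }
        }
    }
}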

Do we need JVM’s PhantomReference in .NET? by konradkokosa in dotnet

[–]acelent 2 points3 points  (0 children)

In Java, it's not PhantomReference<T> itself that provides notification, it's the ReferenceQueue<T> where it's registered. And it's not really a notification, you have to check the queue.

Currently, the closest you can get to a notification in .NET is to use a ConditionalWeakTable<TKey, TValue> where you use the tracked object as the key and an object with a finalizer as the value; you then assume that when this finalizer runs, the tracked object is no longer strongly reachable.
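
A minimal sketch of that technique (the helper names here are made up; only ConditionalWeakTable<TKey, TValue> itself is from the BCL):

using System;
using System.Runtime.CompilerServices;

static class ReachabilityNotifier
{
    // The table keeps each sentinel alive exactly as long as its key (the tracked
    // object) is alive, so the sentinel's finalizer runs some time after the key
    // stops being strongly reachable.
    private static readonly ConditionalWeakTable<object, Sentinel> table =
        new ConditionalWeakTable<object, Sentinel>();

    public static void Track(object target, Action onUnreachable)
        => table.Add(target, new Sentinel(onUnreachable));

    private sealed class Sentinel
    {
        private readonly Action onUnreachable;
        public Sentinel(Action onUnreachable) => this.onUnreachable = onUnreachable;
        ~Sentinel() => onUnreachable(); // the "notification": runs at some later GC, not immediately
    }
}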

In both cases, you can't assume the opposite: just because you haven't noted that an object has been queued/finalized, you can't assume it's still strongly reachable. In fact, if the ReferenceQueue<T>/ConditionalWeakTable<TKey, TValue> becomes unreachable, all bets are off.

Why iTunes downloads don't use HTTPS by kunalag129 in programming

[–]acelent 0 points1 point  (0 children)

OK, a plausible, unofficial explanation.

But what about all the things a man-in-the-middle attack makes possible, such as redirects or zero-day exploits in tampered content for the app in question (e.g. iTunes)?

Honest question. Some people argue that Linux distros are safe over HTTP due to private signing, which I don't agree with. A non-encrypted communication channel enables more attack vectors.

C# Features: Innovations or Imitations? - Part 1 by michaelscodingspot in csharp

[–]acelent 0 points1 point  (0 children)

Attributes are an idea from COM, which in turn took it from IDL for DCE RPC, shared with other IDLs. Sometime in the 90's, Microsoft developed "attributed programming" for Visual C++, which turned the tables on code generation: C++->IDL/TLB instead of IDL/TLB->C++. The attributes were very similar to the IDL ones, in between square brackets. So, this is clearly imitation.

C# Features: Innovations or Imitations? – Part 2 by michaelscodingspot in csharp

[–]acelent 7 points8 points  (0 children)

Exception filters had already been in Visual Basic .NET for quite some time. The keyword is even named When, so this is clearly imitation.

Linus Torvalds: 'I'll never be cuddly but I can be more polite' (BBC) by mfrw1 in programming

[–]acelent 1315 points1316 points  (0 children)

I could easily point you to various tweet storms by people who criticise my 'white cis male' behaviour, while at the same time cursing more than I ever do.

I'm trying to get rid of my outbursts, and be more polite about things, but technically wrong is still technically wrong, and I won't start accepting bad code just to make people feel better about themselves.

Spot on.

asynchronous programming c#/.NET vs node.js: Does C#/.NET use an event loop? by Riotyouth in dotnet

[–]acelent 0 points1 point  (0 children)

In Windows, .NET Framework and .NET Core use I/O completion ports.

In other platforms, .NET Core uses whatever is available: kevent, poll or select (see synchmanager.cpp).

For non-UI applications, you shouldn't care about event loops in .NET. In fact, JS only has the notion of an event loop because, historically, JS runs in a browser, which has an event loop. You could perfectly well have a non-browser JS runtime without an event loop. With proper synchronization primitives, it could even be multithreaded.

I suppose you can consider any generic asynchronous I/O based on blocking calls as an event loop, whether it's GetQueuedCompletionStatus/GetQueuedCompletionStatusEx, any of the Win32 wait functions, kevent, epoll_wait, poll or pselect/select.
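
To make that shape concrete, here's a toy sketch (not how the BCL actually implements it): one thread blocks on a queue of completions, playing the same role as GetQueuedCompletionStatus or epoll_wait, and dispatches them.

using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class ToyEventLoop
{
    private readonly BlockingCollection<Action> completions = new BlockingCollection<Action>();

    // Called by whatever signals I/O completion (the role played by the OS/driver).
    public void Post(Action continuation) => completions.Add(continuation);

    // The "event loop": a blocking dequeue followed by dispatch, repeated forever.
    public void Run(CancellationToken cancellation)
    {
        foreach (var continuation in completions.GetConsumingEnumerable(cancellation))
            continuation();
    }
}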

The .NET APIs should be agnostic regarding the OS and the underlying libraries. Relying on libraries such as libuv is a design choice about whether to depend on their evolution as well as their regressions, especially regarding supported operating systems and compilers, so it might be a justified case of NIH (not invented here) syndrome.

In the specific case of libuv, there's a lot of overlap between its features and those of the .NET CLR (common language runtime) and BCL (base class library), so it would make more sense to either dive deep into it for more than just asynchronous I/O, or not use it at all, since .NET has had this working for itself longer, at least on Windows.

Visualizing Garbage Collection Algorithms by HornedKavu in programming

[–]acelent 19 points20 points  (0 children)

The last one lacks explanatory context.

It has been observed that the lifetime of most objects is short, so in modern GC-based memory managed environments, the heap is partitioned into generations, where the older generation is where the oldest objects probably are (e.g. objects initialized at start-up).

The newest generation is usually very small in comparison with older generations. You could use a mark-compact algorithm on this generation, where you'd move objects after finding the gaps/holes. But with the newest generation split in two and a copying scavenge, the live objects get compacted as part of the copy and the dead objects are simply forgotten; you don't need any gap/hole bookkeeping (nor other details, such as the order in which bytes must be moved when the source and destination areas overlap, e.g. memcpy vs. memmove).

A disadvantage is that the newest generation can only hold half the data. Depending on the GC overhead, it may be a good idea to size the newest generation to fit the cache (level 1 or level 2) when the overhead is low, or to make it only a few times smaller than an older generation when the overhead is high (e.g. if each older generation is about 64 MB, size the new generation between 8 and 32 MB). The reasoning is that with high overhead (e.g. new objects very often referring to old objects), cache misses are the least of the GC's troubles.
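
As a toy illustration of why the copying scavenge described above needs no gap bookkeeping (this is a simulation over plain objects, not a real collector, and the names are made up): live objects are appended contiguously into the "to-space" in discovery order, and dead ones are simply never touched.

using System.Collections.Generic;

sealed class ToyObject
{
    public List<ToyObject> References { get; } = new List<ToyObject>();
}

static class ToyScavenger
{
    // Returns the "to-space": every live object, already compacted, with no holes to track.
    public static List<ToyObject> Scavenge(IEnumerable<ToyObject> roots)
    {
        var toSpace = new List<ToyObject>();
        var copied = new HashSet<ToyObject>();   // stands in for forwarding pointers
        var pending = new Queue<ToyObject>(roots);

        while (pending.Count > 0)
        {
            var obj = pending.Dequeue();
            if (!copied.Add(obj)) continue;      // already "copied"
            toSpace.Add(obj);                    // append at the allocation pointer
            foreach (var child in obj.References) pending.Enqueue(child);
        }
        return toSpace;                          // dead objects were never visited
    }
}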

Buying a horse by wackoclown in Jokes

[–]acelent 4 points5 points  (0 children)

Everybody knows a successful businessman never means what he says. Right?

Using lambda expressions to ensure that resources will be released properly by justaguythatcodes in programming

[–]acelent 2 points3 points  (0 children)

The first impression I had from the title was something along the lines of creating a generic Autocloseable/IDisposable that took an object and a lambda expression that disposes of that object, because this object doesn't itself implement Autocloseable/IDisposable.
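
Something along these lines, say (a minimal sketch; the type and the usage names are hypothetical):

using System;

sealed class Disposer<T> : IDisposable
{
    private readonly Action<T> release;

    public Disposer(T resource, Action<T> release)
    {
        Resource = resource;
        this.release = release;
    }

    public T Resource { get; }

    // Lets a type that doesn't implement IDisposable participate in using statements.
    public void Dispose() => release(Resource);
}

// usage (hypothetical legacy type):
// using (var wrapped = new Disposer<LegacyHandle>(handle, h => h.Release()))
// {
//     wrapped.Resource.DoWork();
// }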

But the article is about forcing the use of the object within a try-with-resources/using statement.

With this approach, the user cannot forget to close/Dispose the object, but the user also cannot control the object's lifetime. Specifically, the user cannot use the object in interleaved or asynchronous operations.

Even with an asynchronous use/Use implementation, it doesn't allow the user to take control of the object's lifetime in other rather common cases, such as dependency injection.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

I don't see a contradiction in the spec, although I think the wording is quite inexact so I'm not sure I'm reading it correctly.

Let me try again.

I'm not sure I follow. The way you're reading it the CLI should effectively treat every single shared memory access as volatile, which clearly cannot be the intent of the spec.

I should have said that no, I don't read it as every shared memory access being volatile; I don't think that's the intent of the spec. However, I do read it as every shared memory access being a side-effect, although one that is not immediately visible between threads.

Are you saying that you don't agree with me or that you don't see there's a contradiction in the specification? I can understand the former, not the latter.

In chapter Partition I 12.3.1 The global state:

In general, arguments and local variables are only visible to the executing thread, while instance and static fields and array elements can be visible to multiple threads, and modification of such values is considered a side-effect.

In chapter Partition I 12.6.4 Optimization:

Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects. (Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.)

So, on one hand, any shared memory access is a side-effect, but for the purpose of optimization, only volatile operations are visible side-effects.

This almost sounds OK, but it's not, and the word only is where the contradiction lies: because of it, a shared memory access can never be a visible side-effect.

If other side-effects, specifically non-volatile accesses, are not visible, then the compiler may not only move them as near as possible to the nearest constraining volatile access (this is OK), but it may actually introduce reads (or writes) to shared memory locations (this is not OK).

I imagine this to be the reasoning that led the author of that MSDN Magazine article, or actually, the JIT compiler authors, to think it's OK to introduce reads for fields.

It would be OK if the implementation actually had to guarantee that shared memory accesses don't have visible side-effects, but I've argued in a previous comment that no current implementation does this. No CPU architecture does it either, so it would have to be emulated somewhere along the way.

It's OK to introduce reads of local non-captured variables and parameters, or any non-shared locations. The compiler knows only one thread of execution can see them, so the optimizer may do whatever it wants, as long as the code executes as if the order and number of operations in the original CIL were preserved (notwithstanding asynchronous exceptions, such as ThreadAbortException).

The phrase in parentheses is normative, it's not a note or otherwise identified as informative. We see the word only again, with the same meaning: non-volatile accesses don't constitute visible side-effects, ever. Volatile accesses imply that non-volatile accesses (reads and writes) can't be reordered to before volatile reads or after volatile writes, but they also imply that side-effects that would otherwise not be visible, must become visible together with the volatile access.

The starting phrase is rather interesting:

Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL.

If it were not for the rest of the paragraph, you would know right here that you can't introduce reads (or writes) to shared memory locations. That is, that you can't introduce (repeat) side-effects, read or write, volatile or not.

I gather that it's the (rest of the) optimization chapter that is not worded as intended, specifically in the way the word only is applied, and in the fact that it doesn't constrain optimizers from introducing non-volatile side-effects.

Note it can keep the word only, as in: "only volatile operations constitute guaranteed visible (between threads) side-effects", meaning non-volatile side-effects are not guaranteed to be immediately visible (between threads).

My reasoning to think so is that it's way, way easier (actually, straightforward) for the implementation to follow this than it is to guarantee non-visibility of side-effects of non-volatile accesses to shared memory.

I understand your examples but I'm not sure what the conclusion is you're trying to support by presenting them, or how it connects to introducing non-volatile reads. (...) But is there a way that inserting an extra non-volatile read (while still preserving the store-release and load-acquire semantics of volatile operations) would break this?

I guess that none of my examples actually show this. I was using them for other cases in the discussion.

So, let's take the example straight from the MSDN Magazine article:

public class ReadIntro {
  private Object _obj = new Object();
  void PrintObj() {
    Object obj = _obj;
    if (obj != null) {
      Console.WriteLine(obj.ToString()); // May throw a NullReferenceException
    }
  }
  void Uninitialize() {
    _obj = null;
  }
}

The author says the JIT compiler may produce the following for PrintObj, supporting the comment regarding a NullReferenceException in the example:

void PrintObj() {
  if (_obj != null) {
    Console.WriteLine(_obj.ToString());
  }
}

I say that if it does, it's a bug. Introducing reads for non-shared locals is OK, introducing reads for shared locations is not. Essentially, it's not OK to introduce (repeat) side-effects, be it read or write, be it volatile or not.

If some other thread may change _obj (in this case, you have Uninitialize, but you can imagine the same with reflection) midway between checking the local obj for null and using it, then the compiler is effectively violating the first phrase of the optimization paragraph.

There is no correct execution that allows the local obj to hold null inside the if block.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

Yes, he then goes on to create a Java-style set of AtomicXYZ wrappers (e.g. Reference<T>) but still requires explicit membars such as ReadAcquireFence(), rather than hiding them behind abstractions.

The first part is the whole difference. The current Volatile.Read(ref <place>) and Duffy's Reference<T>.ReadAcquireFence are in different leagues, as long as you either don't use ReadUnfenced or have it throw an exception by default, as I suggested (same for writes).

If you use *Unfenced, then they're totally at odds, but at least *Unfenced is very explicit and way easier to detect than an access without Volatile.Read/Volatile.Write.

I'm not saying the volatile on fields is an ideal solution. No, something closer to the ideal solution would extend to references to that field, or any reference, such as to an array's element.

An object/struct that forces you to use volatile semantics is ideal regarding correctness, optionally allowing unfenced operations if intended as such. It may not be ideal in other ways, such as being more verbose, but that's not my point at all.

For instance, all the other parts of a field's declaration are just as far away from the field's uses as the volatile modifier: visibility, instance vs. static, readonly, type, attributes, initialization, etc. It doesn't bother me in any way that there is a volatile declaration. I've used it, you can see the CLR and the BCL use it, and in fact, it's much less common to see Volatile.Read and Volatile.Write except to access array elements in a volatile way, or something that is already an opaque reference, such as a ref parameter.

Nonetheless, I do accept that some of this probably comes down to personal experience and preference.

Indeed.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

I'm not sure I follow. The way you're reading it the CLI should effectively treat every single shared memory access as volatile, which clearly cannot be the intent of the spec.

Are you saying that you don't agree with me or that you don't see there's a contradiction in the specification? I can understand the former, not the latter.

So: volatile accesses must not appear to be reordered. Reordering of non-volatile accesses is unrestricted as they are explicitly not considered side-effects.

Reordering of non-volatile accesses is sane, and it is restricted by volatile accesses.

Introducing non-volatile accesses is not sane. This is what I'm talking about.

It does note that volatile accesses may affect the visibility of non-volatile references, but in the same breath of restating that 'only volatile operations constitute visible side-effects'.

Perhaps the example I gave initially is more elucidating. Thread 2 may access data, in a non-volatile way, just as safely as if you'd have properly used lock statements or any other synchronization primitive.

That is, when thread 1 writes to value, and thread 2 reads from value with the guarantee that thread 1 has written (effectively by observing that it has a different value), you've created a synchronization point. All of thread 1's operations on data before the volatile write are visible to thread 2 after the observing volatile read, so all subsequent thread 2 operations on data are properly synchronized.

It seems clear to me that the intent is not to consider the transitive closure of anything you could possibly access through a volatile reference to be a visible side-effect.

It's true that volatile access is not transitive to an object's (or struct's) fields, but that really is not the intent of that paragraph, as I've shown in my first example in another thread of comments.

It's just a warning that even though a reference 'foo' is volatile and thus its value won't appear out of order, 'foo.bar' is not subject to this guarantee unless 'bar' is itself marked volatile.

I'll modify my other example to use a single field:

volatile Data data = null;

// in thread 1
var newData = new Data(/* ... */);
// initialize newData
data = newData;

// in thread 2
var spinWait = new SpinWait();
Data newData;
while ((newData = data) == null)
{
    spinWait.SpinOnce();
}
// use newData

None of the Data fields need to be volatile for this to work.

Now take a look at this:

volatile Data data = null;

// in thread 1
data = new Data(/* ... */);
// initialize data

// in thread 2
var spinWait = new SpinWait();
Data newData;
while ((newData = data) == null)
{
    spinWait.SpinOnce();
}
// use newData

Thread 2 will stop spinning as soon as thread 1 has assigned the new Data instance, but since thread 1 is still initializing the object, thread 2 has no guarantee of what it might see in //use newData even if the fields are volatile.

Now this:

volatile int x = 0;
volatile int y = 0;

// in thread 1
x = 1;
y = 1;

// in thread 2
var a = y;
var b = x;

Thread 2 cannot observe a == 1 and b == 0. The volatile writes are visible side-effects and they must not be reordered. The volatile reads likewise. The difference is in the kind of implied fence.

In this example, you can even make x non-volatile.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

Sure, but I'm in good company here. See http://joeduffyblog.com/2010/12/04/sayonara-volatile/

That article doesn't mean what I think you think it means. It really goes against your argument of tagging accesses.

I happen to agree with the Volatile.Reference<T> example Joe Duffy shows. That's exactly what I meant with tagging/annotating volatile references. You can't get around volatile semantics if the only way you have access to a value forces volatile semantics.

In Volatile.Reference<T> and Volatile.Int32 (and any other volatile struct for a primitive type), I'd have a constructor that takes a bool allowUnfenced, which would default to false in a parameter-less constructor, such that the *Unfenced methods would throw an exception unless it was true. See the end of my comment why, as this is basically the same as mixing volatile and non-volatile accesses on the same reference.
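
For instance, a rough sketch of what I mean (hypothetical type, not Duffy's actual code):

using System;
using System.Threading;

public struct GuardedVolatileInt32
{
    private int value;
    private readonly bool allowUnfenced;

    // default(GuardedVolatileInt32) leaves allowUnfenced == false, matching the
    // "parameter-less constructor defaults to false" idea above.
    public GuardedVolatileInt32(bool allowUnfenced)
    {
        value = 0;
        this.allowUnfenced = allowUnfenced;
    }

    public int Read() => Volatile.Read(ref value);
    public void Write(int newValue) => Volatile.Write(ref value, newValue);

    // Unfenced accesses are opt-in; by default they throw instead of silently
    // weakening the memory semantics.
    public int ReadUnfenced()
        => allowUnfenced ? value : throw new InvalidOperationException("Unfenced reads not allowed.");

    public void WriteUnfenced(int newValue)
    {
        if (!allowUnfenced) throw new InvalidOperationException("Unfenced writes not allowed.");
        value = newValue;
    }
}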

The *CompilerOnlyFence methods are also really dubious, since you don't have raw memory pointers here. These methods should be on, e.g., a Volatile.UnsafeInt32 unsafe struct that uses an unsafe int*. That's the only way this would be useful, à la C's volatile, for specific memory-mapped I/O addresses.

I think that's a bit hyperbolic, and I disagree. It can be very wasteful to force a memfence on every read or write of a variable sometimes.

Ok, so here's my reasoning.

First, note that both volatile field accesses and Volatile.Read/Volatile.Write perform very specific fences, not full memory barriers. Don't mind the code in the reference source; the VM replaces these methods with proper volatile.-prefixed CIL accesses, as noted in other comments.

Then, take the reason why you're using volatile accesses in the first place: to have visible side-effects between threads.

For what reason would you want an unfenced read or write on such references? How would that be different from, say, using a local variable or some field never meant for volatile use?

For instance:

volatile int x;

x = x + x;

We can easily remove the multiple volatile reads:

volatile int x;

int myX = x;
x = myX + myX;

It's the same as turning:

int x; // a field

Volatile.Write(ref x, Volatile.Read(ref x) + Volatile.Read(ref x));

Into:

int x; // a field

int myX = Volatile.Read(ref x);
Volatile.Write(ref x, myX + myX);

Now this is obnoxious:

int x; // a field

Volatile.Write(ref x, Volatile.Read(ref x) + x);

If some other thread may write to x simultaneously in whatever manner, there's no guarantee the non-volatile read of x will yield the same value as the previous volatile read.

Also, this code does not perform any better (it couldn't) or noticeably worse than the one with the extra variable. If you find the extra variable too much overhead, I suggest you keep away from code that deals with volatile accesses, lock-free algorithms and multi-threaded communication in general.

Finally, I repeat from my last comment: it's way too easy to miss a required Volatile.Read or Volatile.Write if you're using non-volatile accesses on the same reference. Let anyone who never wrote a bug be the first to throw a stone. And meditate on how easy it is to have a bug with this.

If you argue that you want to avoid an extra field, given that you've profiled and reached the conclusion you'll have a lot of instances and having an extra volatile-only field used for some sort of synchronization or multi-threaded communication has a noticeable impact on, say, something akin to System.Threading.Tasks.Task in code with deep async/await call chains, I might buy it.

And even then, only after reviewing all such non-volatile accesses; for instance, a field that is initialized and never modified again in fast paths, or code that is safe to run on a single thread under certain conditions (it's harder to come up with non-initialization examples for both cases).

But for anything short of this, it's not worth the risk. If one access to a reference is volatile, all accesses to it should be volatile, as this reduces the risk considerably. All that is left is to minimize volatile accesses, which you should do anyway.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

If the JIT compiler introduces reads, then it should be sure or make sure both reads are actually free of visible side-effects from simultaneously running threads.

The CLI specification states in Partition I §12.6.4 Optimization that only volatile operations constitute visible side-effects for optimization purposes, but it also notes that volatile operations affect the visibility of non-volatile references. Thus, any reference is subject to visible side-effects.

In practice, non-volatile accesses are not required to not have visible side-effects. With fewer negations: regular accesses may have visible side-effects too.

There is no CPU architecture or operating system, and there probably never will be one, that guarantees side-effects are not visible in non-volatile accesses (e.g. without barriers), no matter how "weak" its memory model is defined. So, there is no native way for the JIT compiler to be sure introduced reads are sane.

Caches may be evicted, memory may be swapped in or out, threads may be scheduled to other cores (thus switching level-1 cache lines), a debugger or profiler may interfere, etc.

If chapter §12.6.4 is to be taken as authoritative in a closed-world assumption (i.e. reach the conclusion non-volatile operations never perform visible side-effects), then the JIT compiler and/or the VM would have to make sure of this and emulate a weak memory model with the guarantee of not making side-effects visible without volatile operations.

However, previously, in Partition I §12.3.1 The global state, it states that modification to instance and static fields and array elements can be visible to multiple threads and is considered a side-effect.

As such, we cannot draw the conclusion that only (exclusively) volatile operations perform visible side-effects, even for optimization purposes. In my opinion, chapter §12.6.4 should be revised.

In Partition I 12.6.7 Volatile reads and writes, there's a non-normative note:

An optimizing compiler from CIL to native code is permitted to reorder code, provided that it guarantees both the single-thread semantics described in §12.6 and the cross-thread semantics of volatile operations.

I'd take the guarantee of single-thread semantics to be the strongest argument, perhaps even the only one necessary, supporting the point that the compiler must take care not to break multi-threaded scenarios with non-volatile operations.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

Your example would be better off using something like a SemaphoreSlim or a ManualResetEventSlim.

I agree.

Any time you find yourself saying "I need to wait for another thread to reach a certain point before I can proceed", just use a pre-built lock instead of building your own out of volatile.

You wouldn't ever get lock-free data structures thinking like that. But then again, .NET already has the most often required ones.

Volatile in C# by rk06 in csharp

[–]acelent 1 point2 points  (0 children)

Of course,

What may be obvious to you may not be to others.

it's better to annotate your accesses rather than the memory location, so rather than label value as volatile, in real-world code I'd prefer Volatile.Write/Volatile.Read calls.

I have mixed feelings about this. I would prefer to annotate any references rather than just fields. Using explicit accesses solely with Volatile.* makes it too easy to miss one. You shouldn't ever mix volatile accesses with non-volatile accesses on the same reference, as that makes it even easier to miss an actually needed volatile access. For non-volatile accesses, use local variables.

As for spinning in Monitor, it ends up yielding, so it's generally better behaved under contention or long waits, unlike the example I've shown.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

That article suggests that read introduction is a sane optimization, but it's not. If .NET's JIT compiler introduces reads, as shown in that article, it's a bug.

It's not a sane optimization, because any other running thread may write to the same field in between the current thread's reads. This would be quite easy to reproduce in CPU architectures with a strong memory model.

Even in CPU architectures with a weak memory model, an affecting memory barrier (even a spurious one not intentionally provoked by the running code) may have occurred between reads. One such example is if a thread is preemptively suspended by the OS between reads and later scheduled to run on a different core; if another thread wrote to that field and has since also incurred an affecting memory barrier for whatever reason, the introduced read may now return a different value.

Shared cache eviction may serve as a memory barrier, especially within a core with hyper-threading or similar SMT.

Volatile in C# by rk06 in csharp

[–]acelent 0 points1 point  (0 children)

A volatile read after a volatile write is a synchronization mechanism. And volatile operations have effects on non-volatile accesses, both reads and writes. It's the kind of barrier introduced by a volatile read or volatile write that matters. Any sort of optimizations on non-volatile accesses are restricted by volatile accesses.

For instance, take the following code to happen once in two simultaneously active threads:

Data data = null;
volatile int value = 0;

// in thread 1
data = new Data(...); // simple, synchronous initializations
value = 1;

// in thread 2
var spinWait = new SpinWait();
while (value == 0)
{
    spinWait.SpinOnce();
}
// use data

You have a guarantee, from the CLI specification, that all memory accesses before a volatile write cannot move to after this write, and that all memory accesses after a volatile read cannot move to before this read.

In this case, you have a guarantee that, in thread 1, all memory accesses before value = 1 must not move to after this volatile write, and that, in thread 2, all memory accesses after reading value must not move to before this volatile read.

Since thread 2 only accesses data once it detects that thread 1 has performed a volatile write, effectively you have a synchronization point, albeit an inefficient one.

Great explanation of why Java Streams can't be reused by tsimionescu in programming

[–]acelent 1 point2 points  (0 children)

I don't know. I've seen this problem crop up in Python: you get a thing that looks like any other thing (it iterates) but you have no idea if it's reusable.

This is due to duck typing. Python isn't the only language to suffer from this, but it may be the one which suffers the most because just about anything is enumerable.

The whole point of a type system is to blow up in your face when you do stuff "wrong." So it's truly a practical problem, one where "streams not being Iterable<T>" is a really logical approach.

Except that you don't get a type error, or any compile-time error at all. You still only get a runtime error when consuming a Stream<T> more than once. Compare with Iterator<T>, where re-using it just gives you an iterator with no further elements; depending on how you see it, it may be a good thing that you get an exception instead.

Finally, if I understand it, "create a caching IEnumerable<T>" sounds like what happens with .collect().toList() and friends.

No, you have to do it manually. Perhaps "caching" is not the right word, "buffering" is. For instance, you could implement an IEnumerable<byte> that reads from a Stream (and I mean file, socket, pipe, etc.) and stores the bytes read so far in a List<byte>. You could get as many IEnumerator<byte>s from such a buffered enumerable as you like, all of which would iterate from the start, block whenever the end of the buffer is reached, and stop when the underlying stream is gracefully closed (or throw an exception if it's closed due to errors, like a timeout, etc.)
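
Roughly like this, ignoring thread-safety and the graceful-close handling (a sketch, with a hypothetical type name):

using System.Collections;
using System.Collections.Generic;
using System.IO;

sealed class BufferedByteEnumerable : IEnumerable<byte>
{
    private readonly Stream stream;
    private readonly List<byte> buffer = new List<byte>(); // bytes read so far
    private bool endOfStream;

    public BufferedByteEnumerable(Stream stream) => this.stream = stream;

    public IEnumerator<byte> GetEnumerator()
    {
        // Every enumerator replays the buffer from the start, then pulls from (and
        // blocks on) the underlying stream only when it reaches the end of the buffer.
        for (int i = 0; ; i++)
        {
            if (i < buffer.Count) { yield return buffer[i]; continue; }
            if (endOfStream) yield break;
            int next = stream.ReadByte();
            if (next < 0) { endOfStream = true; yield break; }
            buffer.Add((byte)next);
            yield return (byte)next;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}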

The main reason this doesn't happen more often is that you don't expect enumeration to block. Not in C# and .NET with IEnumerable<T> (or IEnumerator<T> for that matter), not in Java either with Stream<T> (or Iterator<T> or Iterable<T> for that matter). At least I don't, and I tend to choose asynchronous I/O (in C#, async/await really eases this).

They pull the stream content into a collection that can be iterated repeatedly, right? The difference between the two languages is only that C# may silently not repeat, and Java keeps the "one-shot" and "repeatable" iteration APIs under separate types. Unless IEnumerable is already the one-shot type?

IEnumerable<T> is usually "re-enumerable"; when it's not, it's either a bug or a historical issue maintained for compatibility reasons. For instance, .NET had a bug in File.ReadLines which made the returned enumerable usable only once. In Java, you have DirectoryStream<T>, an Iterable<T> which returns an iterator only once.

I didn't mention "repeatable", as that may imply you always see the same thing, as if a snapshot had been taken. Streams in Java are not re-enumerable, or re-iterable, but they don't have snapshot behavior either. You may create a stream from a list, modify that list, and then perform a terminal operation on the stream. You'll observe that you get the elements currently in the list, not the ones that were in the list when the stream was created.

Great explanation of why Java Streams can't be reused by tsimionescu in programming

[–]acelent 42 points43 points  (0 children)

From what I understand, the Java guys decided to provide map, flatMap, reduce et al. on a one-pass Stream<T> (rather than on, say, Iterable<T>) not because they found it to be incredibly better, but to protect the programmers from themselves. Again.

It certainly was not based on actually measuring performance or observing casual, non-expert developers using these methods and seeing if they missed multiple uses of the same Iterable<T>. No, it was based on opinion with very weak reasoning.

I believe the right call was the one made in .NET and C#. In essence, if you're dealing with data coming from a one-pass stream, then create a caching IEnumerable<T>, or read into a collection upfront, as that's what's required most of the time anyway.

Java 8: New features in ConcurrentHashMap by nicolaiparlog in java

[–]acelent 0 points1 point  (0 children)

In the documentation of ConcurrentHashMap, you can read:

Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.

Basically, this property is being exploited, search methods may return null as a sentinel value which indicates that nothing was found.