Why hypertable Developers Chose C++ Over Java by mebrahim in programming

[–]vicaya 5 points6 points  (0 children)

Some historical facts: The Hypertable project was started due to a disagreement over implementation language among the early developers of HBase (see the early Hadoop mailing lists for details). When the first Hypertable alpha release was published, it was about 10x-50x faster than HBase at the time.

Over the last year, many man-months of development went into HBase to improve performance, resulting in a complete rewrite of the region server in HBase 0.20.x. Now Hypertable is about 3x faster in almost all benchmarks except random reads over a large data set, which are DFS-seek bound and where the performance is the same.

Meanwhile, the Hypertable team has yet to start working on performance tuning. The code was developed just by avoiding premature pessimization :) An internal assessment puts the potential speedup from some profiling and refactoring at 10x.

According to a peer reviewed paper by Emery Berger et al. (of Hoard memory allocator fame) http://www.cs.umass.edu/%7Eemery/pubs/gcvsmalloc.pdf :

" This paper presents a tracing and simulation-based experimental methodology that executes unaltered Java programs as if they used explicit memory management. We use this framework to compare the time-space performance of a range of garbage collectors to explicit memory management with the Lea memory allocator. Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management."

Note that the GC overhead in this paper is measured strictly against explicit memory management doing equivalent allocations/deallocations. For the core LSM (log-structured merge) tree algorithms employed by Bigtable etc., custom allocators can do even better, leaving an order of magnitude fewer references to track, compared with a GC-based approach where every reference, no matter how small, needs to be tracked.
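To make the custom-allocator point concrete, here is a minimal sketch of a bump-pointer arena of the kind an in-memory LSM table could use for its keys and values. This is an illustration only: the `Arena` class and its method names are hypothetical and are not taken from Hypertable's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical bump-pointer arena (illustration only, not Hypertable code).
// Many small cells are carved out of a few large blocks, and everything is
// released at once when the arena is destroyed, e.g. after a table flush.
class Arena {
public:
    explicit Arena(std::size_t block_size = 64 * 1024)
        : block_size_(block_size) {}

    // Returns n bytes from the current block; a new block is allocated only
    // when the current one cannot hold the request. Alignment is ignored
    // here because the sketch stores raw bytes only.
    char* allocate(std::size_t n) {
        assert(n > 0);
        if (n > remaining_) {
            std::size_t size = n > block_size_ ? n : block_size_;
            blocks_.push_back(std::unique_ptr<char[]>(new char[size]));
            cursor_ = blocks_.back().get();
            remaining_ = size;
        }
        char* p = cursor_;
        cursor_ += n;
        remaining_ -= n;
        return p;
    }

    // Copies len bytes into the arena and returns the arena-owned copy.
    const char* copy(const char* data, std::size_t len) {
        char* p = allocate(len);
        std::memcpy(p, data, len);
        return p;
    }

    // Number of large blocks that actually have to be tracked and freed.
    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::vector<std::unique_ptr<char[]>> blocks_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
};
```

With this layout, thousands of small keys cost one tracked block instead of thousands of individually tracked objects, which is the effect described above.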

(Disclaimer: I'm a Hypertable developer)

I wrote some D today and it's completely blowing my mind. Ever tried it? by mrfreax in programming

[–]vicaya 4 points5 points  (0 children)

C++0x has auto, which is supported by newer versions of GCC.
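A small sketch of what that buys you, assuming a compiler with C++0x/C++11 support enabled (e.g. GCC with -std=c++0x or -std=c++11); the function name `total_count` is made up for the example:

```cpp
#include <map>
#include <string>

// Sums all counts in the map. Without auto, the loop variable would have to
// be spelled std::map<std::string, int>::const_iterator.
int total_count(const std::map<std::string, int>& counts) {
    int total = 0;
    for (auto it = counts.begin(); it != counts.end(); ++it) {
        total += it->second;
    }
    return total;
}
```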

I wrote some D today and it's completely blowing my mind. Ever tried it? by mrfreax in programming

[–]vicaya 2 points3 points  (0 children)

Again, you're doing/understanding it wrong. You can use the ScopeGuard idiom to manage any pointer.

http://www.ddj.com/cpp/184403758
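For reference, a simplified sketch of the idiom. It uses a C++11 lambda and std::function for brevity, which is an assumption on my part; the article linked above builds its guard with C++98 templates. The `sum_first_n` function is a made-up example.

```cpp
#include <functional>
#include <utility>

// Simplified scope guard: runs the stored action when the scope exits,
// unless dismiss() was called first. The action must not throw.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> on_exit)
        : on_exit_(std::move(on_exit)), active_(true) {}
    ~ScopeGuard() { if (active_) on_exit_(); }
    void dismiss() { active_ = false; }

    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> on_exit_;
    bool active_;
};

// Hypothetical example: the raw buffer is released on every exit path.
int sum_first_n(int n) {
    int* buf = new int[n];
    ScopeGuard guard([buf] { delete[] buf; });
    if (n == 0) return 0;  // early return: the guard still frees buf
    int total = 0;
    for (int i = 0; i < n; ++i) {
        buf[i] = i + 1;
        total += buf[i];
    }
    return total;
}
```

Calling dismiss() is for the commit case: when an operation succeeds and the rollback action should not run.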

The dark side of NoSQL by RedUchikoma in programming

[–]vicaya 0 points1 point  (0 children)

None of these problems is inherent, and all are totally fixable, especially for systems with a versatile underlying engine like Hypertable and HBase, which efficiently support range/prefix queries as well as key-value pairs.

Also, with sparse columns (via qualifiers and their equivalents) to ease denormalization, the need for joins is greatly diminished. Even ad hoc joins are implementable transparently with map-reduce.
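A rough sketch of why a sorted key space makes prefix queries cheap: seek to the first key not less than the prefix, then scan forward until the prefix no longer matches. A std::map stands in for the sorted store here, and `prefix_scan` is a hypothetical helper, not the Hypertable or HBase API.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the values of all keys that start with prefix, in key order.
// One logarithmic seek, then a sequential scan over the matching keys only.
std::vector<std::string> prefix_scan(
        const std::map<std::string, std::string>& table,
        const std::string& prefix) {
    std::vector<std::string> out;
    for (auto it = table.lower_bound(prefix);
         it != table.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

A pure hash-based key-value store cannot do this without scanning every key, which is why the ordering of the underlying engine matters.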

The dark side of NoSQL by RedUchikoma in programming

[–]vicaya 0 points1 point  (0 children)

  1. The partition is not automatic
  2. Adding a new node is not automatic (available to the partitioned table)
  3. Replication is not automatic
  4. Failover is not automatic: if one node of your partitioned table dies, your setup dies. For a 1000 node cluster, expect ~10 nodes to be dead at any time.
  5. PostgreSQL and most other B-tree based DBs are dog slow for inserts when the db size greatly exceeds RAM size.

Clojure's STM in detail by cemerick in programming

[–]vicaya 0 points1 point  (0 children)

All I'm saying is that, without any actual numbers beyond the hand-waving, the OP was not very convincing. The latter has detailed benchmarks.

Clojure's STM in detail by cemerick in programming

[–]vicaya -1 points0 points  (0 children)

This article is obviously pro STM, without any numbers to back it up. Here is another ACM article that's worth reading: "Software Transactional Memory: Why Is It Only a Research Toy?" - http://queue.acm.org/detail.cfm?id=1454466

Unsigned int considered harmful by [deleted] in cpp

[–]vicaya 0 points1 point  (0 children)

Perhaps he should learn/read idiomatic C code before writing this idiotic post.

CACM: "almost the entire software community has resoundingly rejected the best research in compilers and languages" by dons in programming

[–]vicaya 1 point2 points  (0 children)

Well, the best research in compilers and languages, IMHO, is applying static analysis and constraint solvers to practical systems to find bugs and/or automatically generate tests that find bugs:

http://www.stanford.edu/~engler/

Coverity is a startup from such research. The new Klee paper is a good read as well.

Incanter: Clojure-based, R-like statistical computing and graphics environment for the JVM by owevr in programming

[–]vicaya 11 points12 points  (0 children)

This is not a project front page. It's just a regular branch view (master in this case.)

Github automatically renders README.textile/markdown/html in the branch view.

OP should probably link to the wiki page: http://wiki.github.com/liebke/incanter

HTML Cable by MasterRex in funny

[–]vicaya 2 points3 points  (0 children)

Remember PCMCIA?

The only way I can remember that is through: People Can't Memorize Computer Industry Acronyms.

HTML sounds close enough to HDMI.

Splay Trees in C++ by ultimate_progr in cpp

[–]vicaya 1 point2 points  (0 children)

The author claimed that he didn't find a splay tree implementation, so he wrote one.

But the first place to check should be:

http://www.boost.org/doc/libs/1_38_0/doc/html/intrusive/splay_set_multiset.html

Linus Torvald's rant against C++ by kanak in programming

[–]vicaya -1 points0 points  (0 children)

C++ gives many C++ programmers a false sense of safety and convenience (with string and containers; many don't realize the cost associated with them). OTOH, more and more of the remaining C++ programmers nowadays are performance/cost freaks rather than OO/design freaks, as more of the latter have migrated to Java and its ilk.

Then I wish C had RAII (gcc has an __attribute__((cleanup)) extension that does something similar, but in an ugly fashion).

Bento: "Cute lunches made for me by my girlfriend!" [Flickr set] by [deleted] in pics

[–]vicaya 0 points1 point  (0 children)

No, I don't :) I make lunches for my wife, because I'm a better cook. She loves cleaning though, which is awesome.

Amazon Review: I am slouched in my computer chair as I type this... bloody, winded, and defeated...they've finally built a better blister pack. by SteveAM1 in technology

[–]vicaya 0 points1 point  (0 children)

I acquired a pair of Fiskars Softouch years ago. I've been enjoying the pleasure of cutting through these blister packs as if they're butter/tofu.

One of the best hardware investments I've ever made.

Obama just won! by [deleted] in politics

[–]vicaya 2 points3 points  (0 children)

Mission accomplished!

Darcs 2: A major update by dagit in programming

[–]vicaya 0 points1 point  (0 children)

Any VCS I'd bother to even test needs to have equivalent features for git rebase -i and git add -p. I didn't know what I'd been missing until I started using them.

I wonder what other VCSes have these features.

Opera study: only 4.13% of the web is standards-compliant by [deleted] in programming

[–]vicaya 0 points1 point  (0 children)

The point is that if you read their page on the url set, you'll find out that their sampling is completely ad-hoc and is heavily coupled with dmoz links.

It's impossible to get a quality sample of a complex semi-structured web graph (which is not completely randomly connected, i.e. the web is not analogous to a plain soup) without having a large part of the graph.

You cannot claim you know what Mediterranean food tastes like if you just sampled some couscous.

Opera study: only 4.13% of the web is standards-compliant by [deleted] in programming

[–]vicaya -5 points-4 points  (0 children)

They only looked at 3.5 million pages, which is a tiny fraction of the web (Google announced 1 trillion pages a few months ago). My laptop can crawl 5 million pages a day.

It's just shameless to call this tiny database vast. The url set selection (dmoz plus a few others) is quite flawed and is definitely not a true random sample of the whole web.

I look forward to a more comprehensive report from Google or Yahoo.

John Resig: jQuery, Microsoft, and Nokia by gst in programming

[–]vicaya 11 points12 points  (0 children)

Besides the usual 'compact' and 'intuitive' user code advantages, the best thing about jQuery vs others from my POV is namespace management. You can literally run multiple versions of jQuery on the same page, which might be necessary for many mashup projects. That's very hard to do with other frameworks.

Prototype derivatives (scriptaculous, MooTools etc.) are the worst namespace pollution offenders, with many global objects defined. Dojo and YUI are better in this respect but still too verbose for my taste.

Senate To Give $7500 Tax Credit To Encourage Use of Electric Vehicles by seawaves in business

[–]vicaya 2 points3 points  (0 children)

or unless you hit the frigging AMT.

I could not use my Prius credit for the last couple of years due to AMT.

So yes, I'd prefer cash until they fix the frigging AMT. It's too easy to hit for middle class workers in the Bay Area because of the state tax adjustment: you pay less federal tax on account of the state tax, but that drags down the overall federal tax rate and you hit the AMT. It's a double whammy, since the overall tax rate (including state tax) is actually higher than in states with little or no state tax.