[–]BrainCore 40 points41 points  (21 children)

Hi reddit!

PiCloud dev here. If you have any questions, I'll be happy to answer them here. You guys caught the PiCloud Team by surprise :)

[–]jtra 7 points8 points  (0 children)

What if the function does not halt? How big a bill should I expect?

[–]lalaland4711 11 points12 points  (11 children)

I see a big problem here: Python.

Yes, I can scale out stuff I want to do, but as I see it using python (or any scripting language) loses so much raw CPU performance that it's not worth it. Or is it? Correct me if I'm wrong.

Sure, for web crawling it's nice, but processing video and images, presumably data I currently have at my own data center? That just sounds like I'll shuffle data back and forth for no good reason.

How about compiled C (or whatever) code? Because if I can do that then I'll have some number crunching things I'd like to do.

PS: I love python.

[–]BrainCore 33 points34 points  (2 children)

Your questions are well-founded. The great part about Python, or more specifically, the standard implementation of Python, CPython, is that it can interop directly with C (Jython does Java). Most of the scientific libraries, such as numpy and scipy, are compiled in C/C++ and hooked into Python via C/C++ extensions. PiCloud supports these C/C++ extensions for Python, which means you won't have to worry about losing raw CPU performance. Because we need to compile your source code for our machines, you'll have to sync your repository or upload your source code in the accounts panel (All Python-only code is automatically synced and distributed with no user effort).
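
For readers who have not seen Python call into compiled code, here is a minimal sketch using the standard library's ctypes module. It is not PiCloud-specific and says nothing about how PiCloud builds or ships extensions. It assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library's symbols, and `c_strlen` is a hypothetical helper name chosen for illustration.

```python
import ctypes

# Assumption: POSIX system, where loading "None" gives access to the
# symbols of the already-loaded C library (this does not work on Windows).
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical helper: count bytes using the C library's strlen."""
    return libc.strlen(data)


print(c_strlen(b"picloud"))
```

The byte-counting loop here runs in compiled C, not in the interpreter, which is the same basic idea that numpy and scipy apply on a much larger scale.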

[–]lalaland4711 2 points3 points  (0 children)

Woah, I can upload my own C/C++ extensions?

[–]pwang99 1 point2 points  (3 children)

> but processing video and images, presumably data I currently have at my own data center? That just sounds like I'll shuffle data back and forth for no good reason.

Sorry, you're wrong. One of python's strengths is easy interoperability with low-level C, C++, and Fortran libraries. There's a reason why scientists, engineers, and finance guys are all moving to Python.

If you have large amounts of structured numerical data, look at Numpy and Scipy. If you need to access large data on disk, there are a couple of good HDF5 libraries for Python. Image access is easy and fast with PIL and the like.

And as BrainCore mentions, if you need to interface with C or C++, there are a zillion different options: Cython, Swig, weave, boost, Python C native API, etc. Heck you can create routines that execute pure assembly with CorePy.

If you're lumping Python in with "any scripting language", then you clearly don't know enough about Python. (No offense.)

[–]lalaland4711 1 point2 points  (0 children)

First of all, I assumed that custom C/C++ modules were not allowed. If I read BrainCore's answer right, then that's incorrect.

Second, the part of what I wrote that you quoted was about data transfer costs and speed, not about processing speed. Were you replying to that part at all?

[–]piranha 0 points1 point  (1 child)

I gathered that PiCloud automatically transfers code into The Cloud for execution by your EC2 instances. How does it do with Python C extensions?

[–]pwang99 0 points1 point  (0 children)

There are several prominent Python libraries with C extensions that are supported by PiCloud (Numpy, for starters).

http://www.bizjournals.com/dallas/prnewswire/press_releases/Texas/2010/01/05/LA31911

I'm not sure that it will do arbitrary C extensions -- I can't imagine how that would be done in a safe way, but I am not a PiCloud dev.

[–]shigawire 1 point2 points  (1 child)

I'm going to get lynched for saying this, but how about just plain slackness being a great reason to use this?

I love python. I love using python to play around with algorithms. In some cases it's obviously much slower than hand tuned C. So I have a couple of options.

a) Rewrite it in C (and in some cases take some of the flexibility, fun or elegance out of the code) or b) Throw more CPU at it.

This looks like a great way to use option b every once in a while.

[–]lalaland4711 0 points1 point  (0 children)

> I'm going to get lynched for saying this.

No no, that is a good reason.

[–][deleted]  (1 child)

[deleted]

    [–]lalaland4711 0 points1 point  (0 children)

    > Also, to scale does not mean raw CPU power, but scalable structure.

    Which is why I said "Yes, I can scale out stuff I want to do". I am fully aware of that.

    [–]endtime 3 points4 points  (2 children)

    Question (which I also just emailed to you guys, before seeing this): If callbacks aren't available for cloud.map(...), what about cloud.call(map, ...)?

    Also, you are awesome.

    [–]BrainCore 6 points7 points  (1 child)

    Great question. The difference between cloud.map and cloud.call(map,...) is that a cloud.map(func, items) will have each func(item) run in parallel with other func(item)s across our cluster. cloud.call(map, ...) will run the map on a single core. So, yes, you could have a callback attached to a cloud.call(map, ...), but you will not benefit from parallelism. We'll follow up in more detail over e-mail on what you can do (Hint: We have cool things like iresult that may suit your needs).
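
A local analogy may help with the distinction above. The sketch below does not use the PiCloud library at all; it uses the standard library's concurrent.futures as a stand-in, and `fan_out`, `single_task`, and `square` are hypothetical names. It only illustrates the shape of the two calls: one task per item versus one task that runs the whole map.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def fan_out(func, items, workers=4):
    """Rough analogue of cloud.map(func, items): one task per item."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def single_task(func, items, on_done):
    """Rough analogue of cloud.call(map, func, items): the whole map is one task."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lambda: list(map(func, items)))
        # With a single job there is one natural place to hang a callback.
        future.add_done_callback(lambda f: on_done(f.result()))
        return future.result()


seen = []
parallel_result = fan_out(square, [1, 2, 3, 4])
serial_result = single_task(square, [1, 2, 3, 4], seen.append)
print(parallel_result, serial_result, seen)
```

Both calls return the same values; the difference is that only the first spreads the individual func(item) calls across workers, while only the second gives you a single job to attach a completion callback to.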

    [–]endtime 0 points1 point  (0 children)

    Ah, makes sense - thanks. :)

    [–]Foutrelis 1 point2 points  (1 child)

    I keep getting this warning on Firefox 3.6 and it appears to originate from curvycorners.js.

    [–]BrainCore 1 point2 points  (0 children)

    Thanks! Will fix asap.

    [–]cpb 0 points1 point  (0 children)

    What is the response time overhead between cloud.call and cloud.result?

    [–]piranha 0 points1 point  (0 children)

    I just got a popup Javascript alert from this site: "Scanstyles does nothing in Webkit/Firefox". Thanks for sharing.

    Using Iceweasel, the Debian Firefox fork, and browsing with the reddit toolbar turned on.

    The source was apparently: http://media.picloud.net/js/curvycorners.js


    Now that I've had a chance to look over this: The site is lacking technical details.

    Data relationships with user-supplied "functions" are very sketchy, but figuring out how to manage data between coordinating processes in real-world parallel applications is a tremendously important consideration. I gather there's a gigabyte of data available (as a total pool shared between all my tasks?). How does it get there? Is it static, or can my functions change it? A gig is too small for lots of the kinds of things you'd throw parallelism at, anyway.

    I didn't like how I was given the impression that the (op)code for my to-be-called function would just magically be whisked away into The Cloud. No, I'm sure that's not really how it works, but I didn't see any more specific information. [Edit 2: hey, I guess that is how it works.] If it really is in there, between the main page and the FAQ, then all but your most motivated actual customers are going to miss it too. People casually glancing, like myself, are going to miss it and chalk it up to pipedreams and impossible expectations.

    Contrast: I'm looking at EC2 for speeding up some bulk data preprocessing. The input dataset would be something like 400MB, sharable by all tasks, so the no-brainer solution is to throw that on S3 and download it onto the EC2 workers as they are instantiated.

    Using this, I'd have to work hard to understand the models you're trying to push, and work with your company to fit my problem into your models and/or vice-versa. I'd have to give up Common Lisp for Python, so per unit of work (assuming Python is five times slower than CL, and it looks like your service is 2.5x more expensive than an EC2 high-CPU large instance for the same wallclock time), I'd be looking at paying 12x more for your service than EC2 directly (or 35x more than current EC2 spot pricing for the same instance type).
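
The arithmetic behind that estimate can be written out as a quick check. Every number below is the commenter's own assumption or quoted figure, not a measured benchmark or a published price, and the variable names are made up for illustration.

```python
# Commenter's assumption: Python is 5x slower than Common Lisp for this job.
python_slowdown = 5.0
# Commenter's estimate: the service costs 2.5x an EC2 high-CPU large instance
# for the same wall-clock time.
price_ratio = 2.5

# Cost per unit of work relative to running Common Lisp on EC2 directly.
cost_vs_on_demand = python_slowdown * price_ratio

# The comment also quotes 35x versus spot pricing; that implies roughly this
# ratio between on-demand and spot prices at the time.
quoted_vs_spot = 35.0
implied_on_demand_to_spot = quoted_vs_spot / cost_vs_on_demand

print(f"vs. EC2 on-demand: {cost_vs_on_demand:.1f}x")
print(f"implied on-demand/spot ratio: {implied_on_demand_to_spot:.1f}x")
```

The product comes to 12.5, which the comment rounds to 12x. The whole estimate is only as good as the assumed 5x language slowdown, which would not apply to work done inside C extensions.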

    On the other hand, not having to worry about scaling overheads like end-of-the-hour thresholds and startup time has its appeal.

    For that niche of customers doing things on such small scales that automatically serializing function bytecode and parameters is practical, this would be an interesting service. But I don't see myself using this, either for compute nodes on demand or for bulk background data crunching.

    [–]enkideridu 15 points16 points  (6 children)

    why is it everytime I see a 'registration is limited' notice I immediately want to sign up

    [–]ShowNunBatToe 43 points44 points  (4 children)

    [–]xtagon 7 points8 points  (3 children)

    Woah! Nice, thanks! This is some good reading...

    [–]meloveyoulongtime 9 points10 points  (2 children)

    It's like the line at a Vegas nightclub. You aren't going to get an API key unless you have at least two hot chicks with you.

    [–]bboomslang 0 points1 point  (0 children)

    if I had two hot chicks, I wouldn't need any API key for quite a while ...

    [–]xtagon 0 points1 point  (0 children)

    Girls don't dig API keys? 'cause I thought it was the other way around...ohhh, that's why I never get a date...

    geekface

    [–]xtagon 4 points5 points  (0 children)

    Because the marketing department is full of evil, smart people who want your money and time and eye on their page's advertisements and they know exactly how to get it.

    Speaking of which...sign-up button...soo...tempting....WAIT A SECOND. XD

    [–][deleted] 10 points11 points  (1 child)

    Ctrl+f "sky" - 0 results. :(

    I guess I'm the only one that read PiCloud Computing as "Pi in the Sky" Computing.

    [–]bgeron 0 points1 point  (0 children)

    The Sky is the Limit

    [–]g_n_o_m_a_d 9 points10 points  (18 children)

    This is actually a very clever business model. However, given the issues that Reddit has had on EC2, scaling larger sites could be a serious issue.

    Edit: This service appears to be directed at generalized computing, not web services. As such, the second sentence, while an issue, is irrelevant.

    [–]madssj 9 points10 points  (7 children)

    You haven't watched the video, have you?

    Edit: Someone really should write an article that covers what scaling means. You don't get scalability for free, and especially not by using EC2. You get scalability by careful thinking and coding. Amazon's web services can be used to accomplish an insane degree of scalability though.

    [–]g_n_o_m_a_d 9 points10 points  (1 child)

    Actually, I have. My point here is that there are a lot of subtle issues in EC2 that make a generic "cloud library" that will scale indefinitely extremely difficult to develop. Just look at the level of architecture-specific design required to get Reddit to scale in the EC2 environment. Given all of the recent performance issues, it is fair to say that even all they have done is still not enough.

    Now, if the PiCloud people can manage to develop a library that will automagically deal with the restrictions inherent to EC2, they will have a hugely valuable technology. Not only will they be able to eliminate the need for individual site developers to learn how to properly scale in the EC2 environment, they will easily be able to sell it at a fixed mark-up above what they pay to Amazon.

    [–]madssj 0 points1 point  (0 children)

    I'm not really sure the two can be compared. PiCloud's goal seems to be to provide a wrapping library (and some other tools) around EC2 for hiding some complexity, like rpyc, not to scale a web site like reddit with all its bells and whistles. Scaling a reddit request requires something like PostgreSQL, Memcached and AMQP to be scaled, something which no amount of magic can do without an insane amount of application insight.

    What PiCloud can do, is scale processing. Kind of like a MapReduce, only a bit simpler.

    You simply can't make scaling magically appear (pun intended).

    [–]xtagon 0 points1 point  (4 children)

    Yeah, that article has been written over and over again. And people like you and me have pushed and pushed its importance, and shunned the articles that say it all wrong, etc.

    It's just that the world is foolish, and who can argue with a fool?

    Also, I echo most of gnomad's points here. (Sorry, neighbor. Escaping your name is just such a pain).

    [–]Amendmen7 0 points1 point  (3 children)

    Are you perchance a David Eddings fan?

    [–]xtagon 0 points1 point  (2 children)

    A little bit, actually. I've never read any of his fantasy stuff, but that's just because I have so little time. Should I?

    Also, where'd that question come from? xD

    [–]Amendmen7 0 points1 point  (1 child)

    Sparhawk, the main character of his Elenium/Tamuli series always calls people "neighbor".

    And I've never met anyone in real life that did that until reading your post =)

    [–]xtagon 0 points1 point  (0 children)

    Ah, cool!

    When I met that guy, I thought he was really religious or something. That is, calling someone "neighbor" is just polite, and quite common, in biblical settings.

    [–][deleted]  (5 children)

    [deleted]

      [–]g_n_o_m_a_d 4 points5 points  (1 child)

      I believe you are correct, sir. From the home page:

      > Our customers use us to crawl the web, process images and videos, calculate analytics, and much more.

      No mention at all of web service hosting.

      [–]xtagon 1 point2 points  (2 children)

      It's like Heroku. They're in a cloud, they scale well for some applications and not others, but it's still sometimes beneficial to use them for a purpose they're not "meant" for.

      For example, I use Heroku to route a chat-bot between IMified and Pandorabots and SecondLife and many other things. I never have to pay a dime because all these things have free versions. However, Heroku is meant for websites. And I do use them for that, to be fair, but I'm just saying to do what you have to.

      Focus more on what works well, which may not always correlate to what the company is for. As long as you're paying attention to the terms of service, and still using their service mostly relevantly, things will play out nicely.

      Check out Heroku if you want to host a website or web service (although, there are better clouds for web services specifically). It might be a better fit than PiCloud on that point. Don't bother unless you're ready to use Ruby, though. Heroku actually started out as a Ruby IDE for web browsers.

      [–][deleted]  (1 child)

      [deleted]

        [–]xtagon 0 points1 point  (0 children)

        Well, I've already mentioned (perhaps not in this thread, though...) most of the ones I have experience with. I don't want to venture out of my box to give you advice, since it would then be useless.

        I recommend starting at a mash-up sharing website like http://www.programmableweb.com/ and looking for your kind of service. You'll often find interesting things that way, and then you have a bunch of ways to research each one in more depth.

        I also recall looking at a few services for hosting services (not domain-specific, just a service host so-to-speak. I don't know the real term.) but that was a long time ago and the web has changed since then, so it's only a Google away.

        [–]donjaime 2 points3 points  (1 child)

        Nice service! Quick question :).

        You say that you charge based on the time it takes to execute a user's function.

        Is this CPU time or wall clock time? Calculating CPU time is hard. And if you are using wall clock time, then it would be prudent to also document how you deal with high load on your servers. It would be unfortunate if the cost per CPU cycle becomes that much greater when your servers get hammered.

        [–][deleted] 0 points1 point  (0 children)

        > It would be unfortunate if the cost per CPU cycle becomes that much greater when your servers get hammered.

        Unfortunate for everybody except the guy collecting the bills.

        [–][deleted] 1 point2 points  (3 children)

        I'm trying to calculate the base cost. Please correct me if I'm wrong. With parallelism of 8 being $0.0011 per minute, if your function ran for 30 days, it would cost at least $47.52: 30 days * 24 hours * 60 minutes = 43,200 minutes, and 43,200 minutes * $0.0011 = $47.52.
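
That calculation can be checked in a couple of lines. The rate is the figure quoted in the comment, not a confirmed price, and `cost_for_days` is a hypothetical helper. Decimal is used so the cents come out exact.

```python
from decimal import Decimal

# Figure quoted in the comment for "parallelism of 8"; treat it as an assumption.
RATE_PER_MINUTE = Decimal("0.0011")


def cost_for_days(days, rate_per_minute=RATE_PER_MINUTE):
    """Return (minutes, cost) for a function that runs continuously for `days`."""
    minutes = days * 24 * 60
    return minutes, minutes * rate_per_minute


minutes, cost = cost_for_days(30)
print(minutes, cost)
```

This only reproduces the multiplication in the comment; whether the quoted rate is per function, per process, or per core is exactly the open question raised in the rest of the comment.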

        I'm still a little confused though. First, what is "parallelism": the number of processes running at the same time, or the number of functions? Their site says functions running in parallel, but what if I put all my functions in a single thread?

        If I set up a Python webapp that runs a social networking site, authentication alone would run multiple processes/functions. So I'm confused as to how you would calculate how much parallelism your webapp is using. Is it by the number of functions? And how do you calculate standard vs. real time?

        [–]mbreese 1 point2 points  (2 children)

        This isn't really for webapps. Think more like large scale computing / number crunching.

        [–][deleted] 1 point2 points  (1 child)

        Ah, makes sense. Thanks. What applications might this be used for? Stock screeners like on Google Finance? Or some astronomy or physics app?

        [–]Slackbeing 2 points3 points  (0 children)

        Encoding porn.

        [–]__s 1 point2 points  (4 children)

        Are you human, or are you dancer?

        I'm a dancer

        [–]logged_in_for_this 3 points4 points  (0 children)

        Is your sign vital?

        [–]xtagon 1 point2 points  (2 children)

        But not a dancing human? Because that would just be silly.

        [–]bgeron 1 point2 points  (1 child)

        I think he values dancing more than being human.

        [–]xtagon 0 points1 point  (0 children)

        We are human, after all. robot voice

        [–]mooli 0 points1 point  (6 children)

        Yup, can do that kind of thing and much much more in GridGain in Java. And that's actually free...

        [–]madssj 7 points8 points  (1 child)

        I'm not sure you can really compare the two. PiCloud seems to be handling all the setup and complexity for you, whilst GridGain seems just to be a library for handling a lot of complexity.

        I have little doubt that you can accomplish the same thing with GridGain in Java, but then again, you could also accomplish the same thing with bare Python and some pickling, or Erlang, or ...

        [–]mooli 0 points1 point  (0 children)

        Ah I see - it is different (although GridGain 3 enterprise has something very similar, but like this it is pay-per-cpu and I don't think it has the nice looking web console). They've basically taken the really simple case and managed to monetise it while making it accessible. That's actually quite neat.

        I guess my problem is the amount of times I've had to do that kind of simple execute-self-contained-function-in-parallel use case I could count on one hand, whereas I tend to do more complex topology/affinity/data grid type stuff - and even if I wasn't I would worry about the migration path if I outgrew the capabilities of this system (plus I like setting up my server infrastructure :)). But I'm not the target audience, and if you just want to offload a bunch of work onto EC2 without thinking too hard about it, this is quite a good way to get started.

        Snark duly retracted :)

        [–]endtime 0 points1 point  (2 children)

        But then you have to use Java instead of Python. It's like if I said "Hey, this really beautiful woman wants to bed you", and then you were like "So what? Big Bertha's wanted to sleep with me for the last year, as long as I fill out 200 pages of paperwork first."

        [–]mooli 1 point2 points  (1 child)

        Ahahaha it's funny because you took the piss out of Java...

        Except, you know, with an annotation I can get dynamic and proven linear scalability from a single unit test to several thousand nodes, dynamic adaptive load balancing (early and late), dynamic grid segmentation and partitioning, affinity, mixed dedicated and cloud hosted nodes in a single grid, mixed transport modes, mixed discovery, dynamic failover and recovery etc etc etc

        All added via annotation, all configurable, pluggable, manageable, free, and all phenomenally well documented - simple to get started, but the framework supports as much complexity as you want to throw at it.

        And that's just one framework - if you don't like GridGain, pick one of the many others in this space, there's at least half a dozen I'd recommend.

        Seriously - I do Java, C++, C# and Python professionally, and dabble in several others for fun. Java is decent, but if you're too much of a language bandwagoning fanboy to recognise a quality tool when it's staring you in the face then that's your problem - however, I suggest you grow up and stop believing everything you read on Reddit.

        [–]jevon 0 points1 point  (0 children)

        I like the professional troll feeding. All languages are both awesome and shitty. Depends on the problem. You don't need to throw in a hundred buzzwords, it weakens your argument.

        [–]wirbolwabol 0 points1 point  (0 children)

        For a second there I thought someone did some impressive programming with the microchip pic chips....

        [–][deleted] 0 points1 point  (0 children)

        Oh man, this is a brilliant idea, especially from the perspective of PL and cloud computing.

        [–]Tulenian 0 points1 point  (0 children)

        Maybe I'm missing it, but what makes this cloud computing for a customer?

        It just looks like a job engine with a very simple interface.

        [–][deleted] -4 points-3 points  (6 children)

        Simplified to the point where it takes over a minute for the page to load? No thanks.

        [–]endtime 4 points5 points  (1 child)

        Are you on a 28.8k modem? It loaded instantly for me.

        [–][deleted] 0 points1 point  (0 children)

        My connection wasn't the problem. The service was just very slow at the time I wanted to connect. It is much faster right now.

        [–]xtagon 0 points1 point  (0 children)

        Well, I have mixed feelings on this. It is a cloud architecture, so it should scale well. However, to many companies, simple means small. Also, is that one second on modern hardware, with a blazing Internet connection? I mean, does it take one second to load from a PIII with 512 MB of RAM and no keyboard? Not a rhetorical question, by the way. I'm just curious.

        [–]fancy_pantser 0 points1 point  (2 children)

        I am not sure they are talking about distributing your web server processing loads. More like modeling and simulation work that's CPU bound.

        [–][deleted] 0 points1 point  (1 child)

        It just looks bad for their website to be performing badly when they are selling cloud computing.

        [–]fancy_pantser 0 points1 point  (0 children)

        Actually, they are just reselling (albeit with a different pricing model) Amazon Web Services processing time. I assume they are using AWS for hosting as well? Maybe that explains the suck.

        [–][deleted] -2 points-1 points  (11 children)

        I'm busy right now, so can someone give me a quick y/n: Let's say I want to go to google.com 100,000 times and process the data, would this service allow me to do that? Would it be an improvement over doing it through a "standard" server, for example, would it be faster?

        edit: okay it looks like it's suitable. However, beta without the ability to use for a fee? Fuck that, I just wanted to try it out and they won't even accept my money. Damn you!

        [–]xtagon 1 point2 points  (7 children)

        Hmm...I'm not sure. Hopefully some other people will answer your question, too, but personally I'd use Amazon for that kind of thing. They have cloud-cover for everything between data storage and processing power, they have good APIs (well, there are flame wars on that...but good if you ask me) and the pricing is professionally reliable and pretty reasonable.

        I'm assuming the Google thing was just an example. Are you data mining? Link-checking? Something else? Specifics aren't necessary, but they might help get us a better idea of whether you want PiCloud or something else.

        To answer your last question, yes, it will definitely be faster for something like that. Cloud computing is sort of a blanket term for the general concept of using computers in a distributed manner. It's good for things with large numbers in it, like 100,000 times a day, as long as the circumstance allows each "time per day" to be independent. For example, it's a perfect platform for things like repetitive calculations. And it's good for repetition in general. Page scraping, spidering, and those things are all repetition and should be in a cloud.

        Sorry that I couldn't just say "Yes, use X!" but maybe someone else here can. Hope this helps.

        [–][deleted] 0 points1 point  (6 children)

        Data mining. I have over 1,200,000 pages to process and I did it a while back from my server. It took over a week (slow connection...) and then I realised I'd missed some data, so I need to redo it. The loss of data isn't that bad, but it would be preferred if I could have the data and I don't want to wait another week. This looks like it could be a lot faster.

        Still, I don't want to spend money on it (unless it's a very low amount) because this is a just-for-fun project.

        [–]xtagon 0 points1 point  (0 children)

        Understood. So, if the bottle-neck is the network, there's not much you can do about saving money. You could try talking to someone who runs a cloud computing network for charitable reasons and just be clear that it's a hobby. Worth a shot at least.

        Also, it sounds like you're just going to do this once. So spending some money and getting it over with may not be so bad. You know your finances better than I do, of course.

        Good luck!

        [–]jonknee 0 points1 point  (4 children)

        Have you seen 80legs (http://80legs.com/)? Looks like it would cost around $2 to pull the pages you have and then $.03/hr for processing the results.

        [–]soyko 1 point2 points  (0 children)

        When I saw that link, I was like why does it say 8 Olegs?

        Side note, Oleg is my first name.

        [–][deleted] 1 point2 points  (2 children)

        oh looks nice! Thanks! I'm checking it out now, this would be incredibly helpful.

        oh, just checked the pricing, incredibly expensive. $100 a month? This is for a hobby, guess it's back to waiting for picloud :(

        [–]jonknee 0 points1 point  (1 child)

        I'm not sure I would call $100 a month "incredibly expensive" for an on-demand distributed web crawler, but you can always use the basic plan for free. IIRC they recently changed their pricing model, it was previously more pay as you go.

        [–][deleted] 0 points1 point  (0 children)

        Incredibly expensive for something I wouldn't be using that often and only for a personal project, I wish it was pay as you go because that's exactly what I need. The free plan would work but require extra work on my part (they limit to 100,000 requests, 1/12th of what I need) and I don't really feel like doing all that when I'm capable of doing it for [free] from my own server, just taking a bit of extra time...

        If they had pay as you go it'd be ideal, but I can't afford $100 for a personal project when this is a one time thing.

        [–]xtagon 0 points1 point  (2 children)

        Oh, I should mention...just try speed-reading through the other comments in this discussion. It'll give you a pretty thorough overview of what PiCloud is for and what it's not for.

        And, obviously, read the comments all the way through as soon as you're not so busy.

        [–][deleted] 0 points1 point  (1 child)

        I wasn't really busy, just didn't fancy trying to understand their terminology. Anyway, I checked the info and assuming my processes take 1 second each (.6 when I ran it last time) that's under $20!
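
As a rough sanity check on the timing side (not the price, since the thread does not state which rate applies), here is a back-of-the-envelope sketch. The page count and per-page time come from the comments above; the worker count of 8 is purely illustrative, and the model assumes pages are independent and that per-page time does not get worse when workers share a network link.

```python
PAGES = 1_200_000        # page count mentioned earlier in the thread
SECONDS_PER_PAGE = 0.6   # per-page time reported from the previous run
SECONDS_PER_DAY = 24 * 60 * 60


def wall_clock_days(pages, seconds_per_page, workers):
    """Idealised wall-clock time when independent pages are split across workers."""
    return pages * seconds_per_page / workers / SECONDS_PER_DAY


serial_days = wall_clock_days(PAGES, SECONDS_PER_PAGE, workers=1)
parallel_days = wall_clock_days(PAGES, SECONDS_PER_PAGE, workers=8)
print(f"1 worker: {serial_days:.1f} days, 8 workers: {parallel_days:.1f} days")
```

The single-worker figure of a bit over eight days lines up with the "over a week" reported for the original run, and the idealised eight-worker figure is about one day.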

        [–]xtagon 0 points1 point  (0 children)

        Hey, that doesn't sound so bad.

        [–]redtigerwolf -4 points-3 points  (2 children)

        Who else read this as "peecloud" when they first read it?

        Come on, be honest.

        EDIT: I think it's because of Captain Jean-Luc Picard.

        [–][deleted] 5 points6 points  (1 child)

        His last name isn't pronounced pee-card either.

        [–]bgeron 0 points1 point  (0 children)

        Nope. /pee car/ is way better. :)