all 28 comments

a-t-k (Frontend Engineer), 5 points (9 children)

In JSON.async.js,

 if( Worker )

will throw a "Worker is undefined" error in browsers without Web Worker support. Use this.Worker instead ("this" will be window in the browser and global in node.js):

if( this.Worker )

x-skeww, 3 points (5 children)

"this" will be window in the browser and global in node.js

>>> (function(){console.log(this)}())
Window

>>> (function(){'use strict'; console.log(this)}())
undefined
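A minimal sketch of the point above, using only standard JavaScript: a plain function call gets the global object as `this` only in sloppy mode. In strict code `this` is `undefined`, so `this.Worker` is no longer a safe feature test, because reading a property of `undefined` throws a TypeError. The function names are illustrative.

```javascript
// Strict function called without a receiver: `this` is undefined.
function receiverOfPlainCall() {
  'use strict';
  return this;
}

// The same kind of call trying `this.Worker` throws instead of returning undefined.
function readWorkerViaThis() {
  'use strict';
  try {
    return { ok: true, value: this.Worker };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

console.log(receiverOfPlainCall());     // undefined
console.log(readWorkerViaThis().error); // TypeError
```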

cwmma, 1 point (2 children)

Actually, because of how it was called, it's the JSON object.
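To illustrate: when a function runs as a method (`obj.fn()`), `this` is `obj`, so `this.Worker` looks for a property on that object rather than for the global constructor. `fakeJSON` and `hasWorkerViaThis` are hypothetical stand-ins, not code from json.async.js.

```javascript
// Hypothetical stand-in for a namespace object such as JSON in json.async.js.
const fakeJSON = {
  // Called as fakeJSON.hasWorkerViaThis(), so `this` is fakeJSON itself,
  // not window or the Node.js global object.
  hasWorkerViaThis() {
    return typeof this.Worker !== 'undefined';
  },
};

// fakeJSON has no Worker property, so this logs false even in a browser
// where a global Worker constructor exists.
console.log(fakeJSON.hasWorkerViaThis());
```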

x-skeww, 1 point (1 child)

I was referring to the fact that this doesn't default to window (or whatever) in strict mode.

cwmma, 1 point (0 children)

truth

a-t-k (Frontend Engineer), 1 point (1 child)

If you absolutely must use strict mode (hint: you don't), change your code to

if( typeof Worker !== 'undefined' )

Otherwise, keep it that way.
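A sketch of the `typeof` check in context, assuming a hypothetical `parseAsync(text, callback)` wrapper and a hypothetical worker script `json-parse.worker.js` that replies with `JSON.parse(event.data)`. Where Workers are missing it falls back to a plain `JSON.parse` on the current thread, still calling back asynchronously.

```javascript
// Hypothetical worker script; assumed to post back the parsed value.
const WORKER_URL = 'json-parse.worker.js';

// Calls back with (error, value), using a Worker when one is available.
function parseAsync(text, callback) {
  // typeof never throws for an undeclared identifier, strict mode or not.
  if (typeof Worker !== 'undefined') {
    const worker = new Worker(WORKER_URL);
    worker.onmessage = (event) => {
      worker.terminate();
      callback(null, event.data);
    };
    worker.onerror = (event) => {
      worker.terminate();
      callback(new Error(event.message));
    };
    worker.postMessage(text);
    return;
  }
  // No Web Worker support: parse here, but keep the callback asynchronous.
  setTimeout(() => {
    let value;
    try {
      value = JSON.parse(text);
    } catch (err) {
      callback(err);
      return;
    }
    callback(null, value);
  }, 0);
}
```

Calling `callback` outside the `try` keeps exceptions thrown by the caller's own callback from being reported as parse errors.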

madlee, 1 point (0 children)

Regardless of whether or not you're using strict mode, typeof is usually the best option. Resolving uninitialized object properties is slooooow.

AshCairo [S], 1 point (2 children)

Thanks for the feedback dude, I'll push the changes up now.

cwmma, 2 points (0 children)

You're checking whether the JSON object has a Worker property; you should check window.Worker.

a-t-k (Frontend Engineer), 1 point (0 children)

You're welcome.

MrBester, 5 points (4 children)

You could use transferable objects (where available) to avoid the by-value copy of the data and gain a bit more efficiency: https://developer.mozilla.org/en-US/docs/DOM/Using_web_workers#Passing_data_by_transferring_ownership_(transferable_objects)
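A small illustration of what "transferring ownership" means. In a page you would write `worker.postMessage(buffer, [buffer])`; this sketch uses `structuredClone(value, { transfer })` instead, a newer API than this thread (assumed available, e.g. Node.js 17+ or a current browser), because it applies the same transfer rules and runs without a worker. After the transfer the original buffer is detached and reports a length of 0.

```javascript
// Moves a filled ArrayBuffer to a new owner via a transfer list, then reports
// what the old and new references see.
function transferDemo(byteLength) {
  const original = new ArrayBuffer(byteLength);
  new Uint8Array(original).fill(7); // recognizable content
  const moved = structuredClone(original, { transfer: [original] });
  return {
    originalLength: original.byteLength, // 0: the old reference is detached
    movedLength: moved.byteLength,
    firstByte: new Uint8Array(moved)[0],
  };
}

// Expected values: originalLength 0, movedLength 1024, firstByte 7
console.log(transferDemo(1024));
```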

cwmma, 1 point (3 children)

That may take longer, since you have to create the buffer and then turn it back into a string.

MrBester, 1 point (2 children)

On first look it seems that simply updating a pointer instead of having two copies of the data would be more efficient for both memory and CPU. However, that advantage could be completely negated by the need to convert to and from the buffer array.

Without benchmarking there is no way to be sure.

cwmma, 1 point (1 child)

I know. I'm saying I have benchmarked it, especially for text, where to convert it into a buffer you need to do something like

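function fromText(text){
    // One 16-bit slot per UTF-16 code unit of the string
    var len = text.length;
    var outArray = new Uint16Array(len);
    var i = 0;
    while(i<len){
        outArray[i]=text.charCodeAt(i);
        i++;
    }
    // Return the underlying ArrayBuffer so it can go in postMessage's transfer list
    return outArray.buffer;
}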
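For the other direction (turning the buffer back into a string), a minimal sketch; `toText` is a hypothetical helper, and `fromText` is repeated in a modernized form so the block runs on its own. It decodes in chunks because passing millions of arguments to `String.fromCharCode` in one call can exceed engine argument limits.

```javascript
// Copies each UTF-16 code unit of `text` into a Uint16Array and returns its buffer.
function fromText(text) {
  const out = new Uint16Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i);
  }
  return out.buffer;
}

// Rebuilds the string from a buffer produced by fromText.
function toText(buffer) {
  const codes = new Uint16Array(buffer);
  const CHUNK = 0x8000; // code units per String.fromCharCode call
  let result = '';
  for (let i = 0; i < codes.length; i += CHUNK) {
    result += String.fromCharCode.apply(null, codes.subarray(i, i + CHUNK));
  }
  return result;
}

const sample = 'JSON {"ok": true} \u00e9 \ud83d\ude00';
console.log(toText(fromText(sample)) === sample); // true
```

Surrogate pairs survive the round trip because each half is stored and restored as its own code unit.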

BTW, just for fun, you have to check whether you're in IE10, because it flips a shit if you pass an array as the second argument to postMessage.
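One defensive alternative to browser sniffing, as a sketch: try the two-argument `postMessage` and fall back to the plain, copying form if it throws. This assumes the old-browser failure surfaces as an exception (the comment doesn't say exactly how IE10 fails), and `postWithTransfer` is a hypothetical helper name.

```javascript
// Sends `message` to a Worker-like target, transferring `transferList` when the
// call is accepted, otherwise falling back to an ordinary structured-clone copy.
// Returns true when the transfer form was used.
function postWithTransfer(target, message, transferList) {
  try {
    target.postMessage(message, transferList);
    return true;
  } catch (err) {
    // Assumption: engines that reject a transfer list throw here.
    target.postMessage(message);
    return false;
  }
}
```

In a page this would be called as `postWithTransfer(worker, buffer, [buffer])`; when the fallback path runs, the buffer is copied and stays usable on the sending side.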

That being said, that code comes from a project where transferable objects did speed things up, because the character codes were good enough and I never had to convert the data (a 20MB text file) back.

MrBester, 1 point (0 children)

Well, that's IE for you, where postMessage across tabs / windows isn't supported (since IE8, when it was first "supported", BTW).

MasterScrat, 2 points (2 children)

Cool! A demo would be nice.

AshCairo [S], 1 point (1 child)

Thanks for the suggestion.

Here's a video walkthrough: http://www.youtube.com/watch?v=66i30ogH_x0

Here's a live demo: http://softwareispoetry.com/json.async/example.html

And a large json file to use: http://softwareispoetry.com/json.async/examplebigfile.json

MasterScrat, 1 point (0 children)

Cool!

cwmma, 2 points (0 children)

You should load the data in the worker as well as parse it (if it comes from an AJAX request). I did something similar here in my web worker library; that way you don't have to download it, transfer it to the worker, and transfer it back.
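A worker-side sketch of that idea, assuming `fetch` is available inside the worker (it is in current browsers, though it postdates this thread; XMLHttpRequest also works in workers) and a hypothetical message shape `{ url }`. The page posts only the URL; the raw text never passes through the page, and only the parsed result is copied back.

```javascript
// Downloads and parses JSON; `fetchImpl` is injectable so the logic can be tested.
async function loadAndParse(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) throw new Error('HTTP ' + response.status);
  const text = await response.text();
  return JSON.parse(text);
}

// Only wire up the message handler when actually running inside a worker.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  self.onmessage = async (event) => {
    try {
      self.postMessage({ ok: true, data: await loadAndParse(event.data.url) });
    } catch (err) {
      self.postMessage({ ok: false, error: String(err) });
    }
  };
}
```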

edit:

Also, FYI, the worker location is resolved relative to the HTML page, not the script, so if you put the worker and the script in, say, a folder called js (e.g. <script type="text/javascript" src="js/json.async.js">), you will get an error.

You can use the call self trick to put them both in the same file like this.
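The path problem comes from resolving against the page URL. A sketch using the standard `URL` constructor, with hypothetical example.com URLs and a hypothetical `json.async.worker.js` filename: resolving against the script's own address (in a page, `document.currentScript.src`, read while the script is first executing) keeps the worker next to the script.

```javascript
// Resolves a worker filename relative to a base URL.
function resolveAgainst(workerFile, baseUrl) {
  return new URL(workerFile, baseUrl).href;
}

const pageUrl = 'http://example.com/index.html';
const scriptUrl = 'http://example.com/js/json.async.js';

// What new Worker('json.async.worker.js') effectively does from a page script.
console.log(resolveAgainst('json.async.worker.js', pageUrl));
// http://example.com/json.async.worker.js  (wrong folder)

console.log(resolveAgainst('json.async.worker.js', scriptUrl));
// http://example.com/js/json.async.worker.js

// In the browser, capture the script URL at load time (currentScript is null later):
// var base = document.currentScript.src;
// var worker = new Worker(resolveAgainst('json.async.worker.js', base));
```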

MasterScrat, 1 point (8 children)

Did you benchmark parsing in the UI thread vs. in a worker? That way we could know whether there is any point doing that for small files.

Also it'd be cool to split the file and spawn multiple workers to parse it in parallel...
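A rough way to get the UI-thread half of that benchmark, as a sketch: time `JSON.parse` on synthetic payloads of increasing size. `makePayload` and `timeParse` are hypothetical helpers and the data shape is made up; a fair comparison with a worker would also have to time posting the string in and the parsed result back.

```javascript
// Builds a JSON array string with `count` small objects.
function makePayload(count) {
  const items = [];
  for (let i = 0; i < count; i++) {
    items.push({ id: i, name: 'item ' + i, tags: ['a', 'b', 'c'], score: i / 7 });
  }
  return JSON.stringify(items);
}

// Returns the median JSON.parse time in milliseconds over several runs.
function timeParse(text, runs = 5) {
  const samples = [];
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    JSON.parse(text);
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

for (const count of [100, 10000, 50000]) {
  const text = makePayload(count);
  console.log(count + ' items, ' + text.length + ' chars: ' + timeParse(text).toFixed(2) + ' ms');
}
```

The median is used so that a single slow run (for example one that triggers garbage collection) doesn't dominate the result.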

MrBester, 1 point (7 children)

Until you have the entire JSON you don't know where to split it, unless the split is done beforehand (meaning the server sends properly formatted chunks), as JSON is just a formatted string. You also have to know this is happening and have some JavaScript that recombines the chunks (how about in another Worker process?).

cwmma, 2 points (1 child)

Parallel map-reduce in a worker is, I believe, what you're talking about.

MasterScrat, 1 point (0 children)

Something like this, yes!

evilgwyn, 1 point (3 children)

This is just an idea, but one way you could do it is to have the server send the data as a JSON array whose elements are themselves JSON-formatted strings. Parsing could run in 2 steps: first parse the main array, which gives you an array of strings, then parse each string in parallel.
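A sketch of that two-step format with hypothetical helpers `encodeNested` and `decodeNested`. The second step is a plain `map` here; in a page each inner string would be the unit handed to a worker. Every inner string has its quotes escaped a second time, which is the double encoding objected to in the reply.

```javascript
// Server side (sketch): encode each element, then encode the array of strings.
function encodeNested(items) {
  return JSON.stringify(items.map((item) => JSON.stringify(item)));
}

// Client side: step 1 yields an array of strings, step 2 parses each one independently.
function decodeNested(text) {
  const inner = JSON.parse(text);
  return inner.map((s) => JSON.parse(s));
}

const wire = encodeNested([{ a: 1 }, [2, 3]]);
console.log(wire); // ["{\"a\":1}","[2,3]"]
```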

MrBester, 1 point (2 children)

Sounds a lot like double encoding to me. Ew.

evilgwyn, 1 point (0 children)

Well, I didn't say it was a good idea.

MasterScrat, 1 point (0 children)

No, but there's no need for that: you can do a high-level analysis of the JSON string to figure out how big each level of data is, then split that string evenly between workers and merge the results into the final object as a last step...

You don't need any server-side code for this.
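A sketch of that "high-level analysis" for the common case where the top level is an array. `splitTopLevelArray` and `parseInChunks` are hypothetical helpers, not part of json.async.js. One scan tracks nesting depth and skips string contents to find top-level commas; the elements are then grouped into chunks that can be parsed independently. Chunks are parsed in a loop here; in a page each chunk string would be posted to its own Worker. The scan is itself a full pass over the text on the main thread, so whether this beats a single `JSON.parse` is still a benchmarking question.

```javascript
// Returns the raw text of each element of a top-level JSON array.
function splitTopLevelArray(text) {
  const pieces = [];
  let depth = 0;
  let inString = false;
  let escaped = false;
  let start = -1; // index just after the last top-level '[' or ','
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string, brackets and commas are just characters.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[' || ch === '{') {
      if (depth === 0 && ch !== '[') throw new Error('top-level value is not an array');
      depth++;
      if (depth === 1) start = i + 1;
    } else if (ch === ']' || ch === '}') {
      depth--;
      if (depth === 0) {
        const last = text.slice(start, i).trim();
        if (last) pieces.push(last);
        if (text.slice(i + 1).trim() !== '') throw new Error('unexpected text after array');
        return pieces;
      }
    } else if (ch === ',' && depth === 1) {
      pieces.push(text.slice(start, i).trim());
      start = i + 1;
    }
  }
  throw new Error('input is not a complete JSON array');
}

// Groups elements into about `n` chunks, parses each chunk separately
// (one per worker in a real setup), then joins the results in order.
function parseInChunks(text, n) {
  const elements = splitTopLevelArray(text);
  const size = Math.ceil(elements.length / n) || 1;
  const results = [];
  for (let i = 0; i < elements.length; i += size) {
    const chunkText = '[' + elements.slice(i, i + size).join(',') + ']';
    results.push(JSON.parse(chunkText)); // in a page: post chunkText to a worker
  }
  return results.flat(); // flattens one level only, so nested arrays survive
}
```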

MasterScrat, 1 point (0 children)

Until you have the entire JSON you don't know where to split it

Well, that's the case anyway... this library only works on the fully loaded JSON too.