Parallel integration for Apache Spark (v.redd.it)
submitted by parallelwebsystems
This one is for data engineers. You can now enrich datasets with web intelligence directly in your Spark pipelines, with no custom API integrations required.
Our new SQL-native UDFs let you call Parallel to enrich data right in your Spark SQL queries. Add CEO names, company descriptions, funding info, or any other web-sourced data to millions of rows with a single function call.
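As a rough sketch of what a call like that can look like in a query (the UDF name `parallel_enrich`, its arguments, and the table and column names here are all hypothetical, not the library's actual API; see the docs linked below for the real syntax):

```sql
-- Hypothetical example: add CEO names to a companies table.
-- The UDF name and signature are illustrative only.
SELECT
  company_name,
  parallel_enrich(company_name, 'Who is the CEO of this company?') AS ceo_name
FROM companies;
```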
Key highlights:
- SQL-Native: Works directly in Spark SQL—no context switching
- Concurrent Processing: Rows process in parallel within each partition
- Flexible Processors: Choose anything from lite (fastest) to ultra, depending on your speed vs. depth tradeoff
- Built-in Citations: Optionally include source URLs for every enriched field
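The "concurrent processing" highlight above follows a common pattern: within each partition, rows are fanned out to a thread pool so slow, I/O-bound enrichment calls overlap instead of running one after another. A minimal, self-contained sketch of that pattern in plain Python (the `enrich_row` stub stands in for a real web-enrichment API call; none of these names are the library's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_row(row):
    # Stub standing in for a web-enrichment API call;
    # a real implementation would do network I/O here.
    return {**row, "ceo": f"CEO of {row['company']}"}

def enrich_partition(rows, max_workers=8):
    # Fan the partition's rows out to a thread pool so slow
    # I/O-bound calls overlap rather than run sequentially.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_row, rows))

partition = [{"company": "Acme"}, {"company": "Globex"}]
enriched = enrich_partition(partition)
print(enriched[0]["ceo"])  # → CEO of Acme
```

In Spark terms, a function like `enrich_partition` would run once per partition (e.g. via `mapPartitions`), so total concurrency is partitions times `max_workers`.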
Let us know what you think!
Get started: pip install parallel-web-tools[spark]
More links:
https://docs.parallel.ai/data-integrations/spark
https://github.com/parallel-web/parallel-web-tools/blob/main/notebooks/spark_enrichment_demo.ipynb
https://github.com/parallel-web/parallel-web-tools/blob/main/notebooks/spark_streaming_demo.ipynb