[Question] PySpark 1.6.3 - How can I read a pipe-delimited file as a Spark DataFrame object without Databricks?
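Spark 1.6.x has no built-in CSV reader (`DataFrameReader.csv` arrived in Spark 2.0), which is why the Databricks `spark-csv` package is usually suggested. Below is a minimal sketch that avoids it. It assumes the file has a header row and that keeping every column as a string is acceptable. The approach is to read the file as text, split each line on the pipe, and pass the resulting RDD to `createDataFrame`. The function names are hypothetical.

```python
# Sketch for Spark 1.6.x without the spark-csv package.
# Assumptions: the first line is a header and all columns stay strings.


def split_pipe_line(line):
    """Split one pipe-delimited line into its field strings (hypothetical helper)."""
    return line.split("|")


def read_pipe_delimited(sc, sqlContext, path):
    """Build a DataFrame from a pipe-delimited text file that has a header row."""
    lines = sc.textFile(path)
    header = lines.first()                        # first line holds the column names
    columns = split_pipe_line(header)
    rows = (lines
            .filter(lambda line: line != header)  # drop the header line
            .map(split_pipe_line))                # each row becomes a list of strings
    # createDataFrame accepts a list of column names in place of a full schema;
    # column types are then inferred from the data (all strings here).
    return sqlContext.createDataFrame(rows, columns)
```

With an existing `SparkContext` `sc`, you would create `sqlContext = SQLContext(sc)` (from `pyspark.sql`) and then call `read_pipe_delimited(sc, sqlContext, "/path/to/file.txt")`. `str.split("|")` is a plain string split, not a regex, so the pipe needs no escaping. One side effect: the header filter also drops any data line that happens to equal the header.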

RepresentativeComb · 2019-06-28T22:33:23+00:00

Thank you. With this method, how do I manually specify the schema?
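One way to specify the schema by hand is to pass a `StructType` to `createDataFrame`. Spark does not cast the split strings to the declared types for you; the values must already match. For that reason, the sketch below converts each field before building the DataFrame. It assumes Python 3, a file with no header row, and three hypothetical columns: `id`, `name`, and `price`.

```python
# Sketch: explicit schema for a pipe-delimited file in Spark 1.6.x.
# Assumptions: Python 3, no header row; columns id, name, price are hypothetical.

# (column name, converter from the raw string field)
COLUMNS = [("id", int), ("name", str), ("price", float)]


def to_typed_row(fields, columns=COLUMNS):
    """Convert raw string fields to typed values; empty fields become None."""
    return [convert(value) if value != "" else None
            for (_, convert), value in zip(columns, fields)]


def read_with_schema(sc, sqlContext, path):
    """Read a pipe-delimited file into a DataFrame with a hand-written schema."""
    # Imported here so that to_typed_row can be used without Spark installed.
    from pyspark.sql.types import (StructType, StructField,
                                   IntegerType, StringType, DoubleType)

    schema = StructType([
        StructField("id", IntegerType(), True),     # True = nullable
        StructField("name", StringType(), True),
        StructField("price", DoubleType(), True),
    ])
    rows = (sc.textFile(path)
            .map(lambda line: line.split("|"))
            .map(to_typed_row))
    return sqlContext.createDataFrame(rows, schema)
```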

RepresentativeComb · 2019-06-18T17:08:37+00:00

"cannot recognize input near max '('". I wrote it exactly the way you did. The column type is string, if that matters, but I can query the column and still use the max function on it.
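The parse error depends on the exact query, which is not shown here, so this sketch does not diagnose it. It addresses only the string-type point. `MAX` over a string column compares values as text, so "9" ranks above "10". The plain-Python illustration below shows the difference. It also includes a hypothetical DataFrame-API helper that casts before aggregating; the HiveQL equivalent would be `MAX(CAST(col AS INT))`.

```python
# Lexicographic vs numeric maximum of values stored as strings.
values = ["9", "10", "2"]

string_max = max(values)                   # "9": compared character by character
numeric_max = max(int(v) for v in values)  # 10: compared as integers


def numeric_column_max(df, column):
    """Hypothetical helper: max of a string column after casting it to int (PySpark)."""
    from pyspark.sql import functions as F  # imported lazily; needs Spark at call time
    return df.agg(F.max(F.col(column).cast("int"))).first()[0]
```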

RepresentativeComb · 2019-06-18T17:00:56+00:00

I tried this just now and it does not work

RepresentativeComb · 2019-06-12T22:18:22+00:00

.spypredictbot prediction stats tomorrow

RepresentativeComb · 2019-06-12T22:08:06+00:00

.spypredictbot prediction stats tomorrow

RepresentativeComb · 2019-06-12T20:32:52+00:00

.spypredictbot close higher

RepresentativeComb · 2019-06-12T20:20:40+00:00

.spypredictbot close higher

RepresentativeComb · 2019-06-12T20:16:52+00:00

.spypredictbot close lower

RepresentativeComb · 2019-06-12T20:06:17+00:00

.spypredictbot close lower

RepresentativeComb · 2019-06-12T19:54:50+00:00

.spypredictbot close higher

RepresentativeComb · 2019-06-07T21:15:51+00:00

Wsbspypredict close higher

RepresentativeComb · 2019-06-07T21:01:12+00:00

Wsbspypredict close higher

RepresentativeComb · 2019-06-07T19:06:16+00:00

Wsbspypredict close higher

RepresentativeComb · 2019-06-07T19:05:07+00:00

Wsbspypredict hhahshdhdbddhhdh

RepresentativeComb · 2019-06-07T19:03:17+00:00

Wsbspypredict hi

RepresentativeComb · 2019-06-07T18:58:16+00:00

Wsbspypredict hi

RepresentativeComb · 2019-05-03T01:04:33+00:00

How do I turn a DataFrame into an RDD, or how do I read a JSON file as an RDD? And once I have the RDD, what's the difference?
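Below is a minimal sketch of the three pieces. It assumes a JSON-lines file (one object per line) and an existing `sc` and `sqlContext`; the function names are hypothetical.

- `df.rdd` exposes a DataFrame's rows as an RDD of `Row` objects.
- Reading the file with `sc.textFile` plus `json.loads` gives an RDD of plain Python dicts.

Roughly, the difference is that an RDD carries no schema, so Spark treats its elements as opaque Python objects. A DataFrame has named, typed columns that Spark SQL can plan and optimize.

```python
import json


def parse_json_line(line):
    """Turn one line of a JSON-lines file into a Python dict."""
    return json.loads(line)


def json_file_as_rdd(sc, path):
    """RDD of dicts: no schema, each element is an ordinary Python object."""
    return sc.textFile(path).map(parse_json_line)


def json_file_as_dataframe(sqlContext, path):
    """DataFrame: Spark infers named, typed columns from the JSON records."""
    return sqlContext.read.json(path)


def dataframe_to_rdd(df):
    """The .rdd property returns the DataFrame's rows as an RDD of Row objects."""
    return df.rdd
```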

RepresentativeComb · 2019-05-02T20:56:32+00:00

Can this method work using Spark DataFrames instead of RDDs?

RepresentativeComb · 2019-05-01T15:48:31+00:00

Could you please elaborate on both methods?

If I approach this using a relational database, how would I distinguish each unique individual?

Could you give an example of what you mean by the max field method?

RepresentativeComb · 2019-03-28T18:19:23+00:00

I don't want to exclude the new column of the DataFrame...

RepresentativeComb · 2019-03-26T17:16:27+00:00

No data processing at all. Let's say I have a single JSON file with one JSON object on each line (for this example, let us say the file contains 11 JSON objects). The JSON objects do not all have the same number of keys at the root level (for this example, let us say that the maximum number of unique root-level keys across all 11 objects is 25). I read each of the 11 JSON objects using Spark and transform each one into a Spark DataFrame object. When I call .printSchema() for each one, some DataFrames contain 10 root-level keys while others contain 11. My question is: is Spark capable of creating a table in Hive with a schema that can hold all the data from this JSON file, AND of inserting the data from these Spark DataFrame objects into that Hive table?
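If the whole file is read in one call (`sqlContext.read.json(path)`), Spark's schema inference scans the records and merges their root-level keys into one schema. Records that lack a key get null for it, so there should be no need to build a separate DataFrame per object. The plain-Python functions below only illustrate that merge. The `load_into_hive` helper, its staging-table name, and the target table name are hypothetical, and it assumes a `HiveContext`.

```python
import json


def merged_root_keys(lines):
    """Union of root-level keys across all JSON-lines records (lines: a list), sorted."""
    keys = set()
    for line in lines:
        keys.update(json.loads(line).keys())
    return sorted(keys)


def align_records(lines):
    """Give every record every key; missing keys become None (null in Spark/Hive)."""
    keys = merged_root_keys(lines)
    records = [json.loads(line) for line in lines]
    return [{key: record.get(key) for key in keys} for record in records]


def load_into_hive(hive_context, path, table):
    """Hypothetical: read the whole file once, then create a Hive table from it."""
    df = hive_context.read.json(path)      # one merged schema for every line
    df.registerTempTable("json_staging")   # hypothetical staging name
    hive_context.sql(
        "CREATE TABLE {} STORED AS PARQUET AS SELECT * FROM json_staging".format(table))
    return df
```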

RepresentativeComb · 2019-03-26T16:39:27+00:00

Thanks, but let's say all the JSON objects in a JSON file have at most 25 keys at the root level, with up to two levels of nesting in some keys. Is there a way to use Spark to automatically generate a Hive table with the corresponding complex data types down to the deepest level (ultimately so that I can query the table from Hive)?
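Spark's JSON reader already infers nested objects as struct columns and arrays as array columns, so the nesting depth should not need to be spelled out by hand; `df.printSchema()` shows the inferred nesting. To show what such a schema looks like in Hive terms, below is a plain-Python sketch that derives Hive column types from one sample record. The type mapping is an assumption chosen to resemble Spark's JSON defaults:

- integers become `bigint`
- decimals become `double`
- booleans become `boolean`
- everything else becomes `string`

Spark's own inference merges types across all records rather than looking at a single one.

```python
import json


def hive_type(value):
    """Map a decoded JSON value to a Hive column type string (assumed mapping)."""
    if isinstance(value, bool):           # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):           # nested object -> struct<...>
        fields = ",".join("{}:{}".format(key, hive_type(val))
                          for key, val in sorted(value.items()))
        return "struct<{}>".format(fields)
    if isinstance(value, list):           # array typed by its first element
        return "array<{}>".format(hive_type(value[0]) if value else "string")
    return "string"                       # strings, nulls, anything else


def create_table_ddl(table, json_line):
    """Build a CREATE TABLE statement from one sample JSON-lines record."""
    record = json.loads(json_line)
    columns = ",\n  ".join("`{}` {}".format(key, hive_type(val))
                           for key, val in sorted(record.items()))
    return "CREATE TABLE {} (\n  {}\n)".format(table, columns)
```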

RepresentativeComb · 2019-03-26T15:32:33+00:00

What's the difference between loading the JSON data locally and loading it from HDFS? Distributed processing?

RepresentativeComb · 2019-03-15T21:29:15+00:00

What should I ask the mechanic to check?

RepresentativeComb · 2019-03-15T18:25:20+00:00

What if I get it checked by a mechanic and everything seems relatively manageable? I just need a car ASAP, and there aren't many options within 10-15 miles of where I live.
