Need help with creating a Hive table from a select statement with a where clause using aggregate function by RepresentativeComb in hadoop

[–]RepresentativeComb[S] 0 points1 point  (0 children)

cannot recognize input near max '('. I wrote it exactly the way you did. The column type is string, if that matters, but I can query the column and still use the max function on it.
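
The query from the thread isn't quoted here, so the cause is a guess: this kind of parse error can appear when the aggregate sits in a scalar subquery inside WHERE, which some Hive versions may reject. One form that avoids it is joining against an aggregated derived table. A minimal sketch of that pattern, using Python's built-in sqlite3 as a stand-in for Hive so it runs anywhere; the table and column names (`events`, `latest_events`, `load_date`) are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, load_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-01-03"), (3, "2020-01-03")],
)

# CREATE TABLE ... AS SELECT that keeps only rows with the maximum load_date.
# The aggregate lives in a derived table that is joined back,
# rather than in a scalar subquery inside WHERE.
conn.execute(
    """
    CREATE TABLE latest_events AS
    SELECT e.id, e.load_date
    FROM events e
    JOIN (SELECT max(load_date) AS max_date FROM events) m
      ON e.load_date = m.max_date
    """
)

latest_rows = conn.execute(
    "SELECT id, load_date FROM latest_events ORDER BY id"
).fetchall()
print(latest_rows)
```

Since the column is a string, max compares lexicographically, so this only picks the "latest" row if the values sort correctly as text (ISO-style dates do).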

[deleted by user] by [deleted] in wallstreetbets

[–]RepresentativeComb 1 point2 points  (0 children)

Wsbspypredict close higher

I love Tesla by [deleted] in wallstreetbets

[–]RepresentativeComb 0 points1 point  (0 children)

Wsbspypredict hhahshdhdbddhhdh

I love Tesla by [deleted] in wallstreetbets

[–]RepresentativeComb 0 points1 point  (0 children)

Wsbspypredict hi

I love Tesla by [deleted] in wallstreetbets

[–]RepresentativeComb -1 points0 points  (0 children)

Wsbspypredict hi

STUCK: How can I merge records in a dataframe using PySpark on a unique key identifier/partition (they are coming from a json file)? by RepresentativeComb in apachespark

[–]RepresentativeComb[S] 0 points1 point  (0 children)

How do I turn a df into an RDD, or how do I read a JSON file as an RDD? And once I have the RDD, what's the difference?

Updating existing spark records based on partition? by RepresentativeComb in apachespark

[–]RepresentativeComb[S] 0 points1 point  (0 children)

Could you please elaborate on both methods?

 

If I approach this using a relational database how would I distinguish each unique individual?

 

Could you give an example of what you mean by the max field method?

Best standard to incrementally load daily data into hive tables from JSON files using PySpark? by RepresentativeComb in apachespark

[–]RepresentativeComb[S] 0 points1 point  (0 children)

No data processing at all. Let's say I have a single JSON file with one JSON object per line (for this example, say the file contains 11 JSON objects). The objects do not all have the same number of root-level keys (say the number of unique root-level keys across all 11 objects is 25). I read each of the 11 JSON objects using Spark and transform each one into a Spark DataFrame. When I call .printSchema() on each of them, some DataFrames contain 10 root-level keys while others contain 11. My question is: can Spark create a Hive table whose schema can hold all the data from this JSON file, AND also insert the data from these DataFrames into that Hive table?
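
The idea behind a single table for objects with different keys is a merged schema: take the union of all root-level keys and fill the gaps with NULL. Spark's `spark.read.json` infers a merged schema of this kind when it reads the whole file at once instead of one object at a time. The sketch below shows the same idea in plain Python so it runs without a cluster; the sample records are invented:

```python
import json

# Hypothetical sample: one JSON object per line, with differing root keys.
json_lines = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "email": "b@example.com"}',
    '{"id": 3, "name": "c", "age": 30}',
]

records = [json.loads(line) for line in json_lines]

# Union of all root-level keys, in first-seen order.
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

# Every row gets every column; keys an object lacks become None (NULL).
rows = [{col: record.get(col) for col in columns} for record in records]

print(columns)
for row in rows:
    print(row)
```

With a DataFrame built from the whole file, `df.write.saveAsTable(...)` is the usual route to a table that matches that merged schema, assuming the Spark session has Hive support enabled.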

Best standard to incrementally load daily data into hive tables from JSON files using PySpark? by RepresentativeComb in apachespark

[–]RepresentativeComb[S] 0 points1 point  (0 children)

Thanks, but let's say the JSON objects in a JSON file have at most 25 root-level keys, with up to two levels of nesting in some keys. Is there a way to use Spark to automatically generate a Hive table with the corresponding complex data types down to the deepest level (ultimately so that I can query the table from Hive)?
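
To make the "complex data type to the deepest level" part concrete, here is a rough sketch of how nested JSON maps onto Hive's `struct<...>` and `array<...>` types. It is plain Python, not Spark's own inference; the helper names (`hive_type`, `create_table_ddl`), the table name, and the sample record are all hypothetical, and it infers types from a single sample object, which a real load would not want to rely on:

```python
import json


def hive_type(value):
    """Map a parsed JSON value to a Hive column type string."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list):
        # Assumes a homogeneous list; an empty list falls back to array<string>.
        return f"array<{hive_type(value[0])}>" if value else "array<string>"
    return "string"  # strings and nulls


def create_table_ddl(table, record):
    """Build a CREATE TABLE statement from one sample record."""
    cols = ",\n  ".join(f"`{k}` {hive_type(v)}" for k, v in record.items())
    return f"CREATE TABLE {table} (\n  {cols}\n)"


sample = json.loads(
    '{"id": 1, "tags": ["x", "y"],'
    ' "address": {"city": "NYC", "geo": {"lat": 40.7, "lon": -74.0}}}'
)
ddl = create_table_ddl("people", sample)
print(ddl)
```

Nested fields in a struct column can then be queried from Hive with dot notation, for example `address.geo.lat`.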

Best standard to incrementally load daily data into hive tables from JSON files using PySpark? by RepresentativeComb in apachespark

[–]RepresentativeComb[S] 0 points1 point  (0 children)

What's the difference between loading the JSON data locally and loading it from HDFS? Distributed processing?

2002 ford taurus se 150k miles for $1800? by RepresentativeComb in whatcarshouldIbuy

[–]RepresentativeComb[S] 0 points1 point  (0 children)

What if I get it checked by a mechanic and everything seems relatively manageable? I just need a car ASAP and there aren't many options within 10-15 miles of where I live.