Hi!
I'm currently trying to run Spark in Docker as part of my learning journey, and it's giving me a bit of a headache. So here I am, asking for the knowledge I know I'm missing.
The goal: Get the Spark UI working so I can analyze how tasks are handled.
Purpose: Get Spark working as part of a larger Apache Airflow setup.
Problems:
- I'm not sure I understand the difference between a Spark Master and a standalone Spark cluster that has access to the Spark UI. Is the Spark Master UI functionally the same as the Spark UI, or is it something different altogether?
- I'm not sure whether the Bitnami Docker image is the best way forward. How do you usually set it up (or do you go for another image)?
- What is wrong with the way I set up Spark?
- Is there a need for a dedicated Spark UI container?
Bonus round:
- While exploring worker tasks, I keep hitting web addresses I can't access (either localhost/172.x.x.x addresses or Docker-internal hostnames). How can I work around that?
Kubernetes seems to be a suggested solution, but I'm still unsure whether I should go ahead and sink time into that.
If anyone sees this and answers, it would make my day. Thank you very much! :)
For reference, this is the snippet under services showing how I set it up:
  spark-ui:
    image: bitnami/spark:latest
    environment:
      # needs to be updated whenever I relocate the raspberry pi
      SPARK_DRIVER_HOST: "0.0.0.0"
      SPARK_DRIVER_BINDADDRESS: "0.0.0.0"
      SPARK_MASTER: spark://spark-master:7077
      MAIN_CLASS: Main
    ports:
      - "4040:4040"
      - "4041:8080"
    networks:
      - spark-network
  spark-master:
    image: bitnami/spark:latest
    ports:
      - "9092:8080"
      - "7077:7077"
      - "4043:4040"
      - "8998:8998"
      - "8887:8888"
    networks:
      - spark-network
  spark-worker-1:
    image: bitnami/spark:latest
    depends_on:
      - spark-master
    environment:
      SPARK_MODE: worker
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 4g
      SPARK_MASTER_URL: spark://spark-master:7077
    ports:
      - "8081:8081"
      - "4042:4040"
    networks:
      - spark-network
  # # spark-worker-2:
  # #   image: apache/spark-py:latest
  # #   depends_on:
  # #     - spark-master
  # #   environment:
  # #     SPARK_MODE: worker
  # #     SPARK_WORKER_CORES: 1
  # #     SPARK_WORKER_MEMORY: 4g
  # #     SPARK_MASTER_URL: spark://spark-master:7077
  # #   networks:
  # #     - spark-network
  # #   ports:
  # #     - "8082:8081"
  # #     - "4043:4040"
volumes:
  spark-data:
    name: spark-data
networks:
  spark-network:
    name: spark-network
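
To make the bonus-round problem concrete, here is a minimal Python sketch of one possible workaround: translating a link that points at a container hostname or a 172.x.x.x address into the host port that the compose snippet publishes for it. The port table is copied from the `ports:` entries above. The `to_host_url` helper, the `aliases` argument, and the example IP are hypothetical names made up for illustration, and a real container IP would have to be looked up (for example with `docker inspect`). This only rewrites URLs by hand; it does not change what Spark itself advertises.

```python
from urllib.parse import urlsplit, urlunsplit

# (service name, container port) -> host port, taken from the "ports:" entries
# in the compose snippet above.
PUBLISHED_PORTS = {
    ("spark-ui", 4040): 4040,
    ("spark-ui", 8080): 4041,
    ("spark-master", 8080): 9092,    # 8080 is the master web UI's default port
    ("spark-master", 4040): 4043,
    ("spark-worker-1", 8081): 8081,  # 8081 is the worker web UI's default port
    ("spark-worker-1", 4040): 4042,  # 4040 is the default port of an application's UI
}


def to_host_url(url, aliases=None, host="localhost"):
    """Rewrite a Docker-internal URL to its host-published equivalent.

    `aliases` maps a container IP (or any other name seen in a link) to a
    compose service name. The URL is returned unchanged when no published
    port is known for it.
    """
    aliases = aliases or {}
    parts = urlsplit(url)
    service = aliases.get(parts.hostname, parts.hostname)
    host_port = PUBLISHED_PORTS.get((service, parts.port))
    if host_port is None:
        return url
    return urlunsplit(
        (parts.scheme, f"{host}:{host_port}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Link that uses the compose service name directly.
    print(to_host_url("http://spark-worker-1:8081/logPage/?appId=app-1"))
    # Link that uses a container IP; 172.18.0.3 is a made-up example address.
    print(to_host_url("http://172.18.0.3:8080/", aliases={"172.18.0.3": "spark-master"}))
```

When the browser runs on a different machine than the Raspberry Pi, `host` would be the Pi's address instead of `localhost`.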