Xircuits Spark Submit
Before proceeding, make sure your system is already configured to run Spark submit operations. If the following command runs successfully:
spark-submit example.py
you are ready to proceed with this section.
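As a quick pre-flight check before trying the command above, the following minimal sketch looks for a spark-submit executable, first under SPARK_HOME (the variable the default configs in this section rely on) and then on PATH. The function name find_spark_submit is hypothetical and is not part of Xircuits or Spark.

```python
import os
import shutil
from typing import Optional


def find_spark_submit() -> Optional[str]:
    """Return the path to a spark-submit executable, or None if none is found.

    Hypothetical helper for illustration; not a Xircuits or Spark API.
    """
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        # The default configs call $SPARK_HOME/bin/spark-submit, so check there first.
        candidate = os.path.join(spark_home, "bin", "spark-submit")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Otherwise fall back to whatever spark-submit is on PATH, if any.
    return shutil.which("spark-submit")


if __name__ == "__main__":
    path = find_spark_submit()
    if path:
        print(f"spark-submit found at {path}")
    else:
        print("spark-submit not found; set SPARK_HOME or add it to PATH")
```

Finding the executable only shows that Spark is installed; running spark-submit example.py is still the real test that submission works.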
Xircuits provides a user interface to submit Spark applications via custom remote run. Running it is simple:
- Create your Spark application in the Xircuits canvas.
- Compile the workflow and verify that the script has been generated.
- Run the script: open Run Options, select Spark, then select the Spark run type.
If you have chosen cluster mode, your application should appear in the Spark dashboard at localhost:8080.
Modifying the Spark Submit Parameters
You can update the Spark submit parameters by configuring the Xircuits remote submit config.
- Launch Xircuits with
xircuits --ContentsManager.allow_hidden=True
- In the View panel, click Show Hidden Files. You should be able to see the .xircuits directory.
- Open .xircuits/config.ini. You will see the remote submit configs.
- Under [CLUSTER], you may add or modify the submit parameters as desired.
For commands that span multiple lines, end each continued line with a backslash ( \ ). Do not append the Python script to the Spark submit parameters; Xircuits adds it automatically.
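To make the two rules above concrete, here is a minimal sketch of how a backslash-continued command could be flattened into a single line with the script appended at the end. It uses Python's standard configparser, which requires continuation lines to be indented. How Xircuits itself parses the file is not shown in this document, so treat this as an illustration only. The names flatten_command, build_submit_command, and my_workflow.py are hypothetical.

```python
import configparser
import shlex

# A shortened, hypothetical config for illustration. Continuation lines are
# indented because configparser needs that to treat them as one value.
SAMPLE_CONFIG = (
    "[CLUSTER]\n"
    "name = CLUSTER\n"
    "command = $SPARK_HOME/bin/spark-submit \\\n"
    "    --master yarn \\\n"
    "    --deploy-mode cluster\n"
    "msg = Running Spark Submit using YARN cluster mode\n"
    "url = http://localhost:8088/\n"
)


def flatten_command(raw: str) -> str:
    """Drop trailing backslashes and join the continued lines with spaces."""
    parts = [line.strip().rstrip("\\").strip() for line in raw.splitlines()]
    return " ".join(part for part in parts if part)


def build_submit_command(config_text: str, section: str, script: str) -> str:
    """Return the flattened command with the script appended last."""
    # interpolation=None keeps characters such as '%' in the command literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    command = flatten_command(parser.get(section, "command"))
    # The config holds only the parameters; the script is added afterwards.
    return f"{command} {shlex.quote(script)}"


if __name__ == "__main__":
    print(build_submit_command(SAMPLE_CONFIG, "CLUSTER", "my_workflow.py"))
```

Because the script is appended after the configured parameters, a script name written into config.ini as well would end up on the command line twice.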
By default, the following configs are set:
Spark Submit Local Config
[LOCAL]
name = LOCAL
command = $SPARK_HOME/bin/spark-submit
msg = Running Spark Submit using local mode
url = http://localhost:8088/
Spark Submit Cluster Config
[CLUSTER]
name = CLUSTER
command = $SPARK_HOME/bin/spark-submit \
--py-files env_spark.zip \
--archives env_spark.zip \
--master yarn \
--conf spark.yarn.appMasterEnv.LD_LIBRARY_PATH='/usr/local/cuda-11.2/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH' \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON='/usr/local/bin/python3.9' \
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON='/usr/local/bin/python3.9' \
--num-executors=8 --executor-cores=1 --executor-memory=10G --driver-memory=10G \
--name cluster_mode \
--deploy-mode cluster \
--conf spark.rpc.message.maxSize=1024 \
--conf spark.driver.maxResultSize=10G
msg = Running Spark Submit using YARN cluster mode
url = http://localhost:8088/
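After editing the configs, a small sanity check can catch common mistakes before launching a run. The sketch below assumes every section should carry the four keys seen in the defaults above (name, command, msg, url), and it flags a command that already ends with a .py script. Both rules are assumptions drawn from this page, not validation that Xircuits performs, and lint_submit_config is a hypothetical name.

```python
import configparser
from typing import List

# Assumption: the keys present in the default [LOCAL] and [CLUSTER] sections.
REQUIRED_KEYS = ("name", "command", "msg", "url")


def lint_submit_config(config_text: str) -> List[str]:
    """Return human-readable warnings for a remote submit config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    warnings = []
    for section in parser.sections():
        for key in REQUIRED_KEYS:
            if not parser.has_option(section, key):
                warnings.append(f"[{section}] is missing '{key}'")
        command = parser.get(section, "command", fallback="")
        # Split into tokens, ignoring the trailing backslash on continued lines.
        tokens = [
            token
            for line in command.splitlines()
            for token in line.rstrip("\\ ").split()
        ]
        if tokens and tokens[-1].endswith(".py"):
            warnings.append(
                f"[{section}] command ends with a script ({tokens[-1]}); "
                "Xircuits appends the script automatically"
            )
    return warnings


if __name__ == "__main__":
    # Path taken from the steps above; adjust it if your project lives elsewhere.
    with open(".xircuits/config.ini", encoding="utf-8") as handle:
        for warning in lint_submit_config(handle.read()):
            print(warning)
```

An empty result only means none of these assumed checks fired; it does not guarantee that the submit parameters are valid for your cluster.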
After changing the config, select the normal Run first and then Spark Submit so that the changes are reflected in the Spark Submit dialog.