Logging Spark Application on standalone cluster

Logging is essential for debugging an application, and logging a Spark application on a standalone cluster works a little differently from logging locally. A Spark application has two components: the driver and the executors. By default Spark uses log4j for logging. When we run Spark on a local machine or in spark-shell, it picks up the default log4j.properties from /spark/conf/log4j.properties, where the root logger is set to log4j.rootCategory=INFO, console. When we deploy the application on a Spark standalone cluster, things are different: we need to send the driver and executor logs to specific files.

So to log a Spark application on a standalone cluster, we don't need to add log4j.properties to the application jar. Instead, we create log4j.properties files for the driver and the executors.

We need to create a separate log4j.properties file for the driver and for the executors, like the one below:

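# Log everything to a rolling file
# (MaxFileSize and MaxBackupIndex are RollingFileAppender settings; the plain FileAppender does not support them)
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender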
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
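
Since the driver and executor files usually differ only in the log file path, one way to produce them is to generate both from a single template. The script below is only a sketch under assumptions: CONF_DIR, LOG_DIR and the file names log4j-driver.properties and log4j-executor.properties are hypothetical and should be adapted to your own layout.

```shell
#!/bin/sh
# Hypothetical locations (assumptions, not part of any Spark default).
CONF_DIR="${CONF_DIR:-./log4j-conf}"
LOG_DIR="${LOG_DIR:-/var/log/spark-app}"

# Write one log4j properties file for the given role (driver or executor).
# RollingFileAppender is used because MaxFileSize and MaxBackupIndex
# are rolling-appender settings.
write_log4j_conf() {
    role="$1"
    mkdir -p "$CONF_DIR"
    cat > "$CONF_DIR/log4j-$role.properties" <<EOF
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=$LOG_DIR/$role.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
EOF
}

write_log4j_conf driver
write_log4j_conf executor
```

The script only writes the two configuration files. Copying the executor file to the worker machines, and creating a log directory the Spark processes can write to, is left to you.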

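As shown above, we can create a log4j.properties file with a file appender for the driver and for the executors. When we deploy the application on the standalone cluster, we pass the path of each log4j.properties file in the Java options of the driver and the executors, as below. Because each path is a file: URL read by the local JVM, the file has to exist at that path on the machine where the driver or executor runs.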

spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J" --master spark://MASTER_IP:PORT JAR_PATH
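
The command is easier to review when it is split across lines and the driver and executor configuration paths are kept in separate variables. The following is a sketch, not a definitive script: DRIVER_CONF, EXECUTOR_CONF and the /opt/spark-app/conf paths are hypothetical, and MAIN_CLASS, MASTER_IP:PORT and JAR_PATH remain placeholders to fill in.

```shell
#!/bin/sh
# Hypothetical paths (assumptions); each file must exist on the machine that reads it.
DRIVER_CONF="${DRIVER_CONF:-/opt/spark-app/conf/log4j-driver.properties}"
EXECUTOR_CONF="${EXECUTOR_CONF:-/opt/spark-app/conf/log4j-executor.properties}"

# Dry run: "echo" prints the command instead of running it.
# Remove "echo" to actually submit the application.
# Note that the printed form drops the double quotes around the option values.
print_submit_cmd() {
    echo spark-submit \
        --class MAIN_CLASS \
        --master spark://MASTER_IP:PORT \
        --driver-java-options "-Dlog4j.configuration=file:$DRIVER_CONF" \
        --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:$EXECUTOR_CONF" \
        JAR_PATH
}

print_submit_cmd
```

Keeping the two paths in separate variables makes it explicit that the driver and the executors can each write to their own log file.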

You will now see the log files on the driver and executor machines, at the file path defined in each log4j.properties file.

We can also write our own log messages from the Spark application: create a log4j Logger object and log through it. These messages are written to the same log file defined in log4j.properties.
