Logging a Spark Application on a Standalone Cluster


Logging is very important for debugging an application, and logging a Spark application on a standalone cluster is a little different from logging one locally. A Spark application has two components: the driver and the executors. Spark uses the log4j logger by default, so whenever we run Spark on a local machine or in spark-shell it picks up the default log4j.properties from /spark/conf/log4j.properties, where the default setting is rootCategory=INFO, console. But when we deploy the application on a Spark standalone cluster it is different: we need to write the executor and driver logs to specific files.
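
For reference, the default console configuration looks roughly like the snippet below. This is a sketch based on the shape of Spark's bundled log4j.properties.template; the exact pattern varies between Spark versions:

# Default: everything at INFO and above goes to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n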

So to log a Spark application on a standalone cluster we don't need to add log4j.properties to the application jar; instead, we should create log4j.properties files for the driver and the executors.

We need to create a separate log4j.properties file for both the executors and the driver, like the one below:

# Set everything to be logged to a rolling file
log4j.rootCategory=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

As shown above, we can create a log4j.properties file with a rolling file appender for both the executors and the driver (note that MaxFileSize and MaxBackupIndex are only honored by RollingFileAppender, not the plain FileAppender). When we deploy the application on the standalone cluster, we need to pass the path of the log4j.properties file in the Java options of the driver and the executors, as below:

spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J" --master spark://MASTER_IP:PORT JAR_PATH
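
For example, with hypothetical paths and addresses filled in (the class name, config paths, jar path, and master address below are all made up for illustration), the command might look like:

spark-submit --class com.example.LoggingExample \
  --driver-java-options "-Dlog4j.configuration=file:/opt/spark/conf/log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j-executor.properties" \
  --master spark://192.168.1.10:7077 \
  /opt/jobs/logging-example.jar

Note that the file: URL passed to the executors is resolved on each worker, so the executor's log4j.properties must exist at that path on every node of the cluster.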

Now you will see the log files on the driver and executor machines at the paths you defined in the log4j.properties files.

We can also log user-defined messages in our Spark application by simply creating a log4j Logger object and using it to log; all of these messages also end up in the log file defined in the log4j.properties file. A minimal sketch of this is shown below.
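
Here is a minimal Scala sketch (the object name, app name, and logger names are made up for illustration). A logger obtained on the driver writes to the driver's log file, while a logger obtained inside a task runs on an executor and writes to that executor's log file:

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}

object LoggingExample {
  def main(args: Array[String]): Unit = {
    // Driver-side logger: messages go to the file configured in the
    // driver's log4j.properties
    val logger = Logger.getLogger(getClass)

    val sc = new SparkContext(new SparkConf().setAppName("logging-example"))
    logger.info("Application started")

    val evens = sc.parallelize(1 to 100).filter { n =>
      // This closure runs inside a task on an executor, so the message
      // is written to the executor's log file; fetching the logger here
      // also avoids trying to serialize it with the closure
      Logger.getLogger("task-logger").info("checking " + n)
      n % 2 == 0
    }.count()

    logger.info("Found " + evens + " even numbers")
    sc.stop()
  }
}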
