Databricks: Make Log4J Configurable

Goal

The goal of this blog is to describe how to make the Databricks log4j configuration configurable for debugging purposes.

Using the approaches below, we can easily change the log level (ERROR, INFO, or DEBUG) or change the appender.

Databricks Approach-1

There is no standard way to overwrite log4j configurations on clusters with custom configurations. You must overwrite the configuration files using init scripts.

The current configurations are stored in two log4j.properties files:

On the driver:
%sh cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

On the worker:
%sh cat /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties

To set class-specific logging on the driver or on workers, use the following script:

#!/bin/bash

echo "Executing on Driver: $DB_IS_DRIVER"

if [[ $DB_IS_DRIVER = "TRUE" ]]; then
LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties"
else
LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties"
fi

echo "Adjusting log4j.properties here: ${LOG4J_PATH}"
echo "log4j.<custom-prop>=<value>" >> ${LOG4J_PATH}

Replace <custom-prop> with the property name, and <value> with the property value.
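For example, to raise the log level of a specific package to DEBUG, the line the script appends might look like the following (the package name here is only illustrative):

echo "log4j.logger.org.apache.spark.sql=DEBUG" >> ${LOG4J_PATH}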

Upload the script to DBFS and attach it to a cluster as an init script through the cluster configuration UI.
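As a minimal sketch, assuming the Databricks CLI is installed and the script is saved locally as set-log4j.sh (the file name and DBFS destination below are only examples), the upload could look like this:

databricks fs cp set-log4j.sh dbfs:/databricks/init-scripts/set-log4j.sh

The DBFS path can then be referenced as a cluster-scoped init script in the cluster's configuration.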

The above script appends the custom log4j configuration to the default log4j.properties file on each node (driver and worker) in the Spark cluster.

Limitations

  • Whenever you want to change the script, you need to restart the cluster.
  • The approach depends on an init script, so only users with cluster edit permission can add it.

Databricks Approach-2

Another way to configure log4j is to use the Spark Monitoring library, which provides a method that can load a custom log4j configuration from DBFS.

Using this approach, we do not depend on the Data solutions team to set up the init script on each cluster. We can easily load the configuration by calling a method in a notebook.

Prerequisite:

  • Spark Monitoring library set up on the cluster: this library must be installed on the Databricks cluster.

Steps:

Create a custom log4j.properties file for the package or class whose logs you want to capture.

Example:

log4j.appender.custom=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.custom.layout=org.apache.log4j.PatternLayout
log4j.appender.custom.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.custom.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.custom.rollingPolicy.FileNamePattern=logs/custom-logs-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.custom.rollingPolicy.ActiveFileName=logs/customfile-active.log

log4j.logger.<package> = DEBUG, custom

The above custom log4j.properties file creates a custom appender for your package and stores the logs in logs/customfile-active.log. You can change the appender name and the file name according to your requirements.

We also applied a rollover policy that rolls the logs over on an hourly basis and produces a .gz file, which is stored in the cluster log delivery location specified in the cluster configuration.

Now that we have created the custom log4j.properties file, the next step is to copy it into DBFS.
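For example, assuming the Databricks CLI is installed, the file can be copied to the same DBFS path that is loaded in the next step:

databricks fs cp log4j.properties dbfs:/databricks/spark-monitoring/log4j.properties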

After that, you can easily load the custom log4j.properties file in the notebook itself by calling the method shown below.

import com.microsoft.pnp.logging.Log4jConfiguration

Log4jConfiguration.configure("/dbfs/databricks/spark-monitoring/log4j.properties")

Whenever you execute the notebook, it loads the custom log4j properties for your package and writes the logs at your chosen log level into the file mentioned in the configuration.
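As a quick sanity check, a minimal sketch of writing through the configured logger might look like this (the package name com.example.myapp is only a placeholder for whatever you used in log4j.logger.<package>):

import org.apache.log4j.LogManager

// Use the same package name that was configured in log4j.logger.<package>
val logger = LogManager.getLogger("com.example.myapp")
logger.debug("This message is written by the custom appender to logs/customfile-active.log")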

Set Executor Log Level

To set the log level on all executors, set it inside the JVM on each worker. Run the code below to set it:

sc.parallelize(Seq("")).foreachPartition(x => {
  import org.apache.log4j.{LogManager, Level}
  import org.apache.commons.logging.LogFactory

  LogManager.getRootLogger().setLevel(Level.DEBUG)
  val log = LogFactory.getLog("EXECUTOR-LOG:")
  log.debug("START EXECUTOR DEBUG LOG LEVEL")
})

Run the above code in the notebook, and it will change the executor root log level.

Thank you for sticking around to the end. If you like this blog, please show your appreciation by giving it a thumbs up, sharing it, and giving me suggestions on how I can improve my future posts to suit your needs. Follow me to get updates on different technologies.

Written by 

Azmat Hasan is a Software Consultant at Knoldus Software LLP. He completed his MCA at CDAC Noida in 2019. He has good knowledge of DevOps technologies, i.e., Docker, Ansible, CI/CD (Jenkins, Bamboo), Kubernetes, monitoring (Prometheus, Grafana), logging (ELK Stack), etc. He is a self-motivated, enthusiastic person who believes in striving to achieve what we can sustain over a longer period of time, instead of working for short-term benefits. He believes in working together to create synergy.
