SparkConf

What is SparkConf?

SparkConf holds the configuration for a Spark application. To start any Spark application, whether locally or on a cluster, certain configuration settings and parameters have to be set, and SparkConf is the object that carries them.

Features of SparkConf and their usage

The most commonly used features of SparkConf when working with PySpark are given below:

set(key, value) - sets a configuration property.
setMaster(value) - sets the master URL.
setAppName(value) - sets the application name.
get(key, defaultValue=None) - gets the configuration value for a key.
setSparkHome(value) - sets the Spark installation path on worker nodes.
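Consider the following example to understand some attributes of SparkConf. It is a minimal sketch: setAppName, setMaster, set, and get are standard SparkConf methods, while the master URL local[*] and the spark.executor.memory value are illustrative choices.

from pyspark import SparkConf

# Create a configuration object and set some common properties.
conf = SparkConf()
conf.setAppName("PySpark Demo App")       # application name shown in the Spark UI
conf.setMaster("local[*]")                # run locally, using all available cores
conf.set("spark.executor.memory", "1g")   # any property can be set by key

# Read a property back from the configuration.
print(conf.get("spark.app.name"))

Output: PySpark Demo App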
The first thing any Spark program does is create a SparkContext object, which tells the application how to access a cluster. To make that possible, you pass a SparkConf to the SparkContext so that it carries the configuration information about the application. The SparkContext is described in detail below.

SparkContext

What is SparkContext?

The SparkContext is the first and most essential object that gets initiated when we run any Spark application, and creating it is the most important step of any Spark driver program. It is the entry point to all Spark functionality. In the PySpark shell it is available by default as sc.

Note: Only one SparkContext can be active at a time, so creating another one instead of using the default sc will raise an error.

Parameters:

SparkContext accepts the following parameters, described below:

master - The URL of the cluster that Spark connects to.
appName - The name of your application.
sparkHome - The Spark installation directory.
pyFiles - The .zip or .py files to send to the cluster and add to the PYTHONPATH.
environment - The environment variables to set on the worker nodes.
batchSize - The number of Python objects represented as a single Java object. Set it to 1 to disable batching, to 0 to choose the batch size automatically based on object sizes, or to -1 to use an unlimited batch size.
serializer - The RDD serializer.
conf - An object of SparkConf that sets all the Spark properties.
profiler_cls - A class of custom profiler used to do profiling; the default is pyspark.profiler.BasicProfiler.

Among these parameters, master and appName are the most widely used. The following is the initial code for any PySpark application.
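The sketch below shows that initial code, assuming a local master and a placeholder application name ("First App"); in the interactive pyspark shell this step is unnecessary because sc is already provided.

from pyspark import SparkContext

# Create the SparkContext: the entry point that tells the application
# how to connect to the cluster.
sc = SparkContext(master="local", appName="First App")

# Equivalently, the configuration can be passed in as a SparkConf object:
# from pyspark import SparkConf
# conf = SparkConf().setMaster("local").setAppName("First App")
# sc = SparkContext(conf=conf)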