Pythonic run configuration#

This feature is considered experimental.

This guide introduces the experimental Pythonic configuration APIs, which make it more lightweight to define configuration for assets and jobs and to pass that configuration in at runtime.

It's often useful to provide user-chosen values to Dagster jobs or software-defined assets at runtime. For example, you might want to choose which dataset an op runs against, or provide a connection URL for a database resource. Dagster exposes this functionality through a configuration API.


Using Pythonic configuration#

Defining Pythonic configuration on assets and ops#

Configurable parameters accepted by an op or asset are specified by defining a config model that subclasses Config and adding a config parameter, annotated with that model, to the corresponding op or asset function. Under the hood, these config models use Pydantic, a popular Python library for data validation and serialization.

During execution, the specified config is accessed within the body of the op or asset using the config parameter, which is reserved specifically for this purpose.

Using software-defined assets#

Here, we define a subclass of Config holding a single string value representing the name of a user. We can access the config through the config parameter in the asset body.

from dagster import asset, Config

class MyAssetConfig(Config):
    person_name: str

@asset
def greeting(config: MyAssetConfig) -> str:
    return f"hello {config.person_name}"

Defining and accessing Pythonic configuration for a resource#

Configurable parameters for a resource are defined by specifying attributes on a resource class that subclasses ConfigurableResource. The resource below defines a configurable connection URL, which can be accessed in any method defined on the resource.

from dagster import op, ConfigurableResource

class MyDatabaseResource(ConfigurableResource):
    connection_url: str

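    def query(self, query: str):
        # get_engine is a placeholder for your own database client factory;
        # it is not provided by Dagster.
        return get_engine(self.connection_url).execute(query)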

For more information on using resources, see the Pythonic resources guide.

Specifying runtime configuration in code#

To execute a job or materialize an asset that specifies config, you'll need to provide values for its parameters. When specifying config from the Python API, we can use the run_config argument of JobDefinition.execute_in_process or materialize. This argument takes a RunConfig object, within which we can supply config on a per-op or per-asset basis. The config is specified as a dictionary whose keys are the op or asset names and whose values are the corresponding config objects.

from dagster import job, materialize, op, RunConfig

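# Assumes an op named print_greeting that accepts a MyOpConfig
# (a Config subclass with a person_name field) is already defined.
@job
def greeting_job():
    print_greeting()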

job_result = greeting_job.execute_in_process(
    run_config=RunConfig({"print_greeting": MyOpConfig(person_name="Alice")})
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig({"greeting": MyAssetConfig(person_name="Alice")}),
)

Using environment variables with config#

Ops and assets can be configured using environment variables by passing an EnvVar when constructing your config object. This is useful when the value may vary based on environment or is sensitive. If you're using Dagster Cloud, environment variables can be set up directly in the UI.

from dagster import job, materialize, op, RunConfig, EnvVar

job_result = greeting_job.execute_in_process(
    run_config=RunConfig({"print_greeting": MyOpConfig(person_name=EnvVar("PERSON_NAME"))})
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig(
        {"greeting": MyAssetConfig(person_name=EnvVar("PERSON_NAME"))}
    ),
)

Defining complex config schemas#

In some cases, you may want to define a more complex config schema. For example, you may want to define a config schema that takes in a list of files or complex data. Below we'll walk through some common patterns for defining more complex config schemas.

Attaching metadata to config fields#

Config fields can be annotated with metadata using the Pydantic Field class. This metadata provides additional information about the field.

For example, we can annotate a config field with a description, which will be displayed in the documentation for the config field. We can also add a value range to a field, which will be validated when config is specified.

from dagster import Config
from pydantic import Field

class MyMetadataConfig(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    person_name: str = Field(..., description="The name of the person to greet")
    age: int = Field(
        ..., gt=0, lt=100, description="The age of the person to greet"
    )

# Raises a validation error, because age must be greater than 0 and less than 100.
MyMetadataConfig(person_name="Alice", age=200)

Defaults and optional config fields#

Config fields can be marked as optional by specifying a default value. For example, we can make the greeting_phrase field optional by giving it a default of hello. Fields that may be absent altogether, such as person_name, can be typed as Optional and given a default value of None.

from typing import Optional
from dagster import asset, Config, materialize, RunConfig

class MyAssetConfig(Config):
    person_name: Optional[str] = None
    greeting_phrase: str = "hello"

@asset
def greeting(config: MyAssetConfig) -> str:
    if config.person_name:
        return f"{config.greeting_phrase} {config.person_name}"
    else:
        return config.greeting_phrase

asset_result = materialize(
    [greeting],
    run_config=RunConfig({"greeting": MyAssetConfig()}),
)

Basic data structures#

Many basic Python data structures can be used in your config schemas, including lists and mappings.

For example, we can define a config schema that takes in a list of user names and a mapping of user names to user scores.

from dagster import Config, materialize, asset, RunConfig
from typing import List, Dict

class MyDataStructuresConfig(Config):
    user_names: List[str]
    user_scores: Dict[str, int]

@asset
def scoreboard(config: MyDataStructuresConfig):
    ...

result = materialize(
    [scoreboard],
    run_config=RunConfig(
        {
            "scoreboard": MyDataStructuresConfig(
                user_names=["Alice", "Bob"],
                user_scores={"Alice": 10, "Bob": 20},
            )
        }
    ),
)

Nested schemas#

Schemas can be nested in one another, or in basic Python data structures.

Here, we define a schema which contains a mapping of user names to complex user data objects.

from dagster import asset, materialize, Config, RunConfig
from typing import Dict

class UserData(Config):
    age: int
    email: str
    profile_picture_url: str

class MyNestedConfig(Config):
    user_data: Dict[str, UserData]

@asset
def average_age(config: MyNestedConfig):
    ...

result = materialize(
    [average_age],
    run_config=RunConfig(
        {
            "average_age": MyNestedConfig(
                user_data={
                    "Alice": UserData(age=10, email="alice@gmail.com", profile_picture_url=...),
                    "Bob": UserData(age=20, email="bob@gmail.com", profile_picture_url=...),
                }
            )
        }
    ),
)

Union types#

Union types are supported using Pydantic discriminated unions. Each union type must be a subclass of Config. The discriminator argument to Field specifies the field that will be used to determine which union type to use.

Here, we define a config schema which takes in a pet field, which can be either a Cat or a Dog, as indicated by the pet_type field.

from dagster import asset, materialize, Config, RunConfig
from pydantic import Field
from typing import Union
from typing_extensions import Literal

class Cat(Config):
    pet_type: Literal["cat"] = "cat"
    meows: int

class Dog(Config):
    pet_type: Literal["dog"] = "dog"
    barks: float

class ConfigWithUnion(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    pet: Union[Cat, Dog] = Field(..., discriminator="pet_type")

@asset
def pet_stats(config: ConfigWithUnion):
    if isinstance(config.pet, Cat):
        return f"Cat meows {config.pet.meows} times"
    else:
        return f"Dog barks {config.pet.barks} times"

result = materialize(
    [pet_stats],
    run_config=RunConfig(
        {
            "pet_stats": ConfigWithUnion(
                pet=Cat(meows=10),
            )
        }
    ),
)