Pipeline components reference

This page documents the top-level configuration for pipeline components: sources, transforms, sinks, and enrichment tables.

These fields define the structure of your observability data pipeline. Each component is defined as a table within these sections, with component-specific configuration options.

For other top-level configuration options, see:

Global Options - Global settings like data directories and timezone
API - Configure Vector's observability API
Schema - Configure Vector's internal schema system
Secrets - Configure secrets management

enrichment_tables

optional object

All configured enrichment tables.

enrichment_tables.*

required object

An enrichment table.

enrichment_tables.*.file

required object

File-specific settings.

Relevant when: type = "file"

enrichment_tables.*.file.encoding

required object

File encoding configuration.

enrichment_tables.*.file.encoding.delimiter

optional string

The delimiter used to separate fields in each row of the CSV file.

default: ,

enrichment_tables.*.file.encoding.include_headers

optional bool

Whether or not the file contains column headers.

When set to true, the first row of the CSV file will be read as the header row, and the values will be used for the names of each column. This is the default behavior.

When set to false, columns are referred to by their numerical index.

default: true

enrichment_tables.*.file.encoding.type

required string enum

File encoding type.

Enum options

Option	Description
`csv`	Decodes the file as a CSV (comma-separated values) file.

enrichment_tables.*.file.path

required string

The path of the enrichment table file.

Currently, only CSV files are supported.
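
As an illustration, the file settings above could be combined as follows. This is a minimal sketch in YAML (Vector also accepts TOML and JSON); the table ID `hosts` and the file path are hypothetical.

```yaml
enrichment_tables:
  hosts:                              # hypothetical table ID
    type: file
    file:
      path: /etc/vector/hosts.csv     # hypothetical path to a CSV file
      encoding:
        type: csv
        delimiter: ","                # the default, shown for clarity
        include_headers: true         # first row supplies the column names
```

With include_headers set to false, columns would be referred to by their numerical index instead of by name.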

enrichment_tables.*.flush_interval

optional uint

The interval used for making writes visible in the table. Longer intervals may improve performance, but writes take longer to become visible in the table. Because every TTL scan also makes its changes visible, set this value only if it is shorter than scan_interval.

By default, all writes are made visible immediately.

Relevant when: type = "memory"

enrichment_tables.*.graph

optional object

Extra graph configuration.

Configures the output for this component when it is rendered with the graph command.

enrichment_tables.*.graph.node_attributes

optional object

Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

enrichment_tables.*.graph.node_attributes.*

required string

A single graph node attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

enrichment_tables.*.inputs

optional [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.

enrichment_tables.*.internal_metrics

optional object

Configuration of internal metrics.

Relevant when: type = "memory"

enrichment_tables.*.internal_metrics.include_key_tag

optional bool

Determines whether to include the key tag on internal metrics.

This is useful for distinguishing between different keys while monitoring. However, the tag’s cardinality is unbounded.

default: false

enrichment_tables.*.locale

optional string

The locale to use when querying the database.

MaxMind includes localized versions of some of the fields within their database, such as country name. This setting can control which of those localized versions are returned by the transform.

More information on which portions of the geolocation data are localized, and what languages are available, can be found here.

Relevant when: type = "geoip"

default: en

enrichment_tables.*.max_byte_size

optional uint

Maximum size of the table in bytes. All insertions that make this table bigger than the maximum size are rejected.

By default, there is no size limit.

Relevant when: type = "memory"

enrichment_tables.*.path

required string

Path to the MaxMind GeoIP2 or GeoLite2 binary city database file (GeoLite2-City.mmdb).

Other databases, such as the country database, are not supported by the geoip type. Use the mmdb enrichment table type for other databases.

Relevant when: type = "geoip" or type = "mmdb"
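
A sketch of the two MaxMind-backed table types, using only the fields described above. The table IDs and file locations are hypothetical, and the second file name is purely illustrative.

```yaml
enrichment_tables:
  geoip_city:                                   # hypothetical table ID
    type: geoip
    path: /etc/vector/GeoLite2-City.mmdb        # hypothetical location of the city database
    locale: en                                  # the default, shown for clarity

  other_db:                                     # hypothetical table ID
    type: mmdb                                  # use mmdb for databases other than the city database
    path: /etc/vector/other-database.mmdb       # hypothetical file
```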

enrichment_tables.*.scan_interval

optional uint

The scan interval used to look for expired records. This is provided as an optimization to ensure that TTL is updated, but without doing too many cache scans.

Relevant when: type = "memory"

default: 30

enrichment_tables.*.schema

optional object

Key/value pairs representing mapped log field names and types.

This is used to coerce log fields from strings into their proper types. The available types are listed in the Types list below.

Timestamp coercions need to be prefaced with timestamp|, for example "timestamp|%F". Timestamp specifiers can use either of the following:

One of the built-in formats listed in the Timestamp Formats table below.
The time format specifiers from Rust’s chrono library.

Types

bool
string
float
integer
date
timestamp (see the table below for formats)

Timestamp Formats

Format	Description	Example
`%F %T`	`YYYY-MM-DD HH:MM:SS`	`2020-12-01 02:37:54`
`%v %T`	`DD-Mmm-YYYY HH:MM:SS`	`01-Dec-2020 02:37:54`
`%FT%T`	ISO 8601/RFC 3339, without time zone	`2020-12-01T02:37:54`
`%FT%TZ`	ISO 8601/RFC 3339, UTC	`2020-12-01T09:37:54Z`
`%+`	ISO 8601/RFC 3339, UTC, with time zone	`2020-12-01T02:37:54-07:00`
`%a, %d %b %Y %T`	RFC 822/RFC 2822, without time zone	`Tue, 01 Dec 2020 02:37:54`
`%a %b %e %T %Y`	ctime format	`Tue Dec 1 02:37:54 2020`
`%s`	UNIX timestamp	`1606790274`
`%a %d %b %T %Y`	date command, without time zone	`Tue 01 Dec 02:37:54 2020`
`%a %d %b %T %Z %Y`	date command, with time zone	`Tue 01 Dec 02:37:54 PST 2020`
`%a %d %b %T %z %Y`	date command, with numeric time zone	`Tue 01 Dec 02:37:54 -0700 2020`
`%a %d %b %T %#z %Y`	date command, with numeric time zone (minutes can be missing or present)	`Tue 01 Dec 02:37:54 -07 2020`

Relevant when: type = "file"

enrichment_tables.*.schema.*

required string

Represents mapped log field names and types.
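
For example, a schema for a file table might coerce a few CSV columns as sketched below. The table ID, path, and column names are hypothetical; the type names and the `timestamp|` prefix are the ones listed above.

```yaml
enrichment_tables:
  users:                                # hypothetical table ID
    type: file
    file:
      path: /etc/vector/users.csv       # hypothetical path
      encoding:
        type: csv
    schema:
      user_id: integer                  # hypothetical column names throughout
      is_active: bool
      score: float
      signup_day: "timestamp|%F"        # chrono specifier, for example 2020-12-01
      last_seen: "timestamp|%F %T"      # built-in format, for example 2020-12-01 02:37:54
```

Quoting the timestamp entries keeps the `|` and `%` characters from being misread by the YAML parser.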

enrichment_tables.*.source_config

optional object

Configuration for source functionality.

Relevant when: type = "memory"

enrichment_tables.*.source_config.export_batch_size

optional uint

Batch size for data exporting. Used to prevent exporting the entire table at once and blocking the system.

By default, batches are not used and the entire table is exported.

enrichment_tables.*.source_config.export_expired_items

optional bool

Set to true to export expired items via the expired output port. Expired items ignore other settings and are exported as they are flushed from the table.

default: false

enrichment_tables.*.source_config.export_interval

optional uint

Interval for exporting all data from the table when used as a source.

enrichment_tables.*.source_config.remove_after_export

optional bool

If set to true, all data is removed from the cache after exporting. Only valid if the table is used as a source and export_interval > 0.

By default, exporting does not remove data from the cache.

default: false

enrichment_tables.*.source_config.source_key

required string

Key to use for this component when used as a source. This must be different from the component key.
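
The source_config options above might be combined as in the following sketch. The IDs and numeric values are illustrative assumptions, not recommendations; the unit of export_interval is not stated on this page, so the value is left uninterpreted.

```yaml
enrichment_tables:
  session_cache:                        # hypothetical table ID (the component key)
    type: memory
    source_config:
      source_key: session_cache_out     # hypothetical; must differ from the component key
      export_interval: 60               # illustrative value
      export_batch_size: 1000           # export in batches instead of the whole table at once
      remove_after_export: false        # the default: keep data in the cache after exporting
      export_expired_items: true        # also emit expired items on the expired output port
```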

enrichment_tables.*.ttl

optional uint

TTL (time-to-live in seconds) is used to limit the lifetime of data stored in the cache. When TTL expires, data behind a specific key in the cache is removed. TTL is reset when the key is replaced.

Relevant when: type = "memory"

default: 600

enrichment_tables.*.ttl_field

optional string

Field in the incoming value used as the TTL override.

Relevant when: type = "memory"

enrichment_tables.*.type

required string enum

The enrichment table type.

Enum options

Option	Description
`file`	Exposes data from a static file as an enrichment table.
`geoip`	Exposes data from a MaxMind GeoIP2 database as an enrichment table.
`memory`	Exposes data from a memory cache as an enrichment table. The cache can be written to using a sink.
`mmdb`	Exposes data from a MaxMind database as an enrichment table.

sinks

optional object

All configured sinks.

sinks.*

required object

A sink.

sinks.*.buffer

optional object

Configures the buffering behavior for this sink.

More information about the individual buffer types, and buffer behavior, can be found in the Buffering Model section.

sinks.*.buffer.max_events

optional uint

The maximum number of events allowed in the buffer.

Relevant when: type = "memory"

default: 500

sinks.*.buffer.max_size

required uint

The maximum allowed amount of allocated memory the buffer can hold.

When type = "disk", this must be at least ~256 megabytes (268435488 bytes).

sinks.*.buffer.type

optional string enum

The type of buffer to use.

Enum options

Option	Description
`disk`	Events are buffered on disk. This is less performant, but more durable. Data that has been synchronized to disk will not be lost if Vector is restarted forcefully or crashes. Data is synchronized to disk every 500ms.
`memory`	Events are buffered in memory. This is more performant, but less durable. Data will be lost if Vector is restarted forcefully or crashes.

default: memory

sinks.*.buffer.when_full

optional string enum

Event handling behavior when a buffer is full.

Enum options

Option	Description
`block`	Wait for free space in the buffer. This applies backpressure up the topology, signalling that sources should slow down the acceptance/consumption of events. This means that while no data is lost, data will pile up at the edge.
`drop_newest`	Drops the event instead of waiting for free space in buffer. The event will be intentionally dropped. This mode is typically used when performance is the highest priority, and it is preferable to temporarily lose events rather than cause a slowdown in the acceptance/consumption of events.

default: block
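
As a sketch of how the buffer options fit together, the fragment below configures a disk buffer at the minimum documented size. The sink and input IDs are hypothetical; the `console` sink type and its `encoding.codec` option are not covered on this page and appear only to make the fragment complete.

```yaml
sinks:
  debug_out:                      # hypothetical sink ID
    type: console                 # assumption: any sink type could be used here
    inputs: ["app_logs"]          # hypothetical upstream ID
    encoding:
      codec: json
    buffer:
      type: disk                  # more durable, less performant than memory
      max_size: 268435488         # minimum allowed for disk buffers, in bytes
      when_full: block            # the default: apply backpressure instead of dropping events
```

For a memory buffer, max_events would be set in place of max_size, and `when_full: drop_newest` would trade data loss for throughput.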

sinks.*.graph

optional object

Extra graph configuration.

Configures the output for this component when it is rendered with the graph command.

sinks.*.graph.node_attributes

optional object

Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

sinks.*.graph.node_attributes.*

required string

A single graph node attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

sinks.*.healthcheck

optional object

Healthcheck configuration.

sinks.*.healthcheck.enabled

optional bool

Whether or not to check the health of the sink when Vector starts up.

default: true

sinks.*.healthcheck.timeout

optional float

Timeout duration for healthcheck in seconds.

default: 10 (seconds)

sinks.*.healthcheck.uri

optional string

The full URI to make HTTP healthcheck requests to.

This must be a valid URI, which requires at least the scheme and host. All other components, such as port and path, are allowed as well.
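
A hedged example of the healthcheck options on a sink follows. The `http` sink type with its `uri` and `encoding.codec` options is assumed for illustration, and every ID and URL is hypothetical.

```yaml
sinks:
  collector:                                      # hypothetical sink ID
    type: http                                    # assumption: an HTTP-based sink
    inputs: ["app_logs"]                          # hypothetical upstream ID
    uri: https://collector.example.com/ingest     # hypothetical endpoint
    encoding:
      codec: json
    healthcheck:
      enabled: true                               # the default: check the sink at startup
      timeout: 5.0                                # seconds; the default is 10
      uri: https://collector.example.com/health   # hypothetical healthcheck URI
```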

sinks.*.inputs

required [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.

sinks.*.proxy

optional object

Proxy configuration.

Configure this to route traffic through an HTTP(S) proxy when making external requests.

Similar to common proxy configuration convention, you can set different proxies to use based on the type of traffic being proxied. You can also set specific hosts that should not be proxied.

sinks.*.proxy.enabled

optional bool

Enables proxying support.

default: true

sinks.*.proxy.http

optional string

Proxy endpoint to use when proxying HTTP traffic.

Must be a valid URI string.

Examples

"http://foo.bar:3128"

sinks.*.proxy.https

optional string

Proxy endpoint to use when proxying HTTPS traffic.

Must be a valid URI string.

Examples

"http://foo.bar:3128"

sinks.*.proxy.no_proxy

optional [string]

A list of hosts to avoid proxying.

Multiple patterns are allowed:

Pattern	Example match
Domain names	`example.com` matches requests to `example.com`
Wildcard domains	`.example.com` matches requests to `example.com` and its subdomains
IP addresses	`127.0.0.1` matches requests to `127.0.0.1`
CIDR blocks	`192.168.0.0/16` matches requests to any IP addresses in this range
Splat	`*` matches all hosts
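
The proxy options and the no_proxy patterns from the table above could be combined as in this sketch. The `http` sink type and its options are assumed for illustration; the IDs, hosts, and endpoint are hypothetical.

```yaml
sinks:
  collector:                                      # hypothetical sink ID
    type: http                                    # assumption: an HTTP-based sink
    inputs: ["app_logs"]                          # hypothetical upstream ID
    uri: https://collector.example.com/ingest     # hypothetical endpoint
    encoding:
      codec: json
    proxy:
      enabled: true                               # the default
      http: "http://foo.bar:3128"                 # used for HTTP traffic
      https: "http://foo.bar:3128"                # used for HTTPS traffic
      no_proxy:
        - example.com                             # domain name
        - .internal.example.com                   # wildcard domain (the domain and its subdomains)
        - 127.0.0.1                               # single IP address
        - 192.168.0.0/16                          # CIDR block
```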

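sources

optional object

All configured sources.

sources.*

required object

A source.

sources.*.graph

optional object

Extra graph configuration.

Configures the output for this component when it is rendered with the graph command.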

sources.*.graph.node_attributes

optional object

Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

sources.*.graph.node_attributes.*

required string

A single graph node attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

sources.*.proxy

optional object

Proxy configuration.

Configure this to route traffic through an HTTP(S) proxy when making external requests.

Similar to common proxy configuration convention, you can set different proxies to use based on the type of traffic being proxied. You can also set specific hosts that should not be proxied.

sources.*.proxy.enabled

optional bool

Enables proxying support.

default: true

sources.*.proxy.http

optional string

Proxy endpoint to use when proxying HTTP traffic.

Must be a valid URI string.

Examples

"http://foo.bar:3128"

sources.*.proxy.https

optional string

Proxy endpoint to use when proxying HTTPS traffic.

Must be a valid URI string.

Examples

"http://foo.bar:3128"

sources.*.proxy.no_proxy

optional [string]

A list of hosts to avoid proxying.

Multiple patterns are allowed:

Pattern	Example match
Domain names	`example.com` matches requests to `example.com`
Wildcard domains	`.example.com` matches requests to `example.com` and its subdomains
IP addresses	`127.0.0.1` matches requests to `127.0.0.1`
CIDR blocks	`192.168.0.0/16` matches requests to any IP addresses in this range
Splat	`*` matches all hosts

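transforms

optional object

All configured transforms.

transforms.*

required object

A transform.

transforms.*.graph

optional object

Extra graph configuration.

Configures the output for this component when it is rendered with the graph command.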

transforms.*.graph.node_attributes

optional object

Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

transforms.*.graph.node_attributes.*

required string

A single graph node attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

transforms.*.inputs

required [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.
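
To illustrate inputs with a wildcard, the sketch below wires a transform to every upstream component whose ID starts with `app_`, and gives the transform a graph node attribute. The `remap` transform type and its `source` option are not covered on this page and are assumed here; all IDs are hypothetical.

```yaml
transforms:
  tag_app_logs:                   # hypothetical transform ID
    type: remap                   # assumption: any transform type could be used here
    inputs: ["app_*"]             # matches upstream IDs such as app_api or app_worker
    source: |
      .pipeline = "app"
    graph:
      node_attributes:
        color: "red"              # passed to the graph node as provided
```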