Pipeline components reference

This page documents the top-level configuration for pipeline components: sources, transforms, sinks, and enrichment tables.

These fields define the structure of your observability data pipeline. Each component is defined as a table within these sections, with component-specific configuration options.

For other top-level configuration options, see:

  • Global Options - Global settings like data directories and timezone
  • API - Configure Vector's observability API
  • Schema - Configure Vector's internal schema system
  • Secrets - Configure secrets management

enrichment_tables

optional object
All configured enrichment tables.

enrichment_tables.*

required object
An enrichment table.
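enrichment_tables.*.file
File-specific settings.
Relevant when: type = "file"

enrichment_tables.*.file.encoding
File encoding configuration.

enrichment_tables.*.file.encoding.delimiter
The delimiter used to separate fields in each row of the CSV file.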
default: ,

enrichment_tables.*.file.encoding.include_headers
Whether or not the file contains column headers.

When set to true, the first row of the CSV file is read as the header row, and its values are used as the column names. This is the default behavior.

When set to false, columns are referred to by their numerical index.

default: true
enrichment_tables.*.file.encoding.type
File encoding type.
Enum options
Option | Description
csv | Decodes the file as a CSV (comma-separated values) file.

enrichment_tables.*.file.path
The path of the enrichment table file.

Currently, only CSV files are supported.
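The following is a rough sketch of a CSV-backed file table declared in a JSON configuration file (Vector also accepts TOML and YAML). The table name customers and the path /etc/vector/customers.csv are hypothetical. The semicolon delimiter only illustrates overriding the default comma.

```json
{
  "enrichment_tables": {
    "customers": {
      "type": "file",
      "file": {
        "path": "/etc/vector/customers.csv",
        "encoding": {
          "type": "csv",
          "delimiter": ";",
          "include_headers": true
        }
      }
    }
  }
}
```

With include_headers left at true, the first row of customers.csv supplies the column names used for lookups.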

enrichment_tables.*.flush_interval
The interval used for making writes visible in the table. Longer intervals might improve performance, but data takes longer to become visible in the table. Because every TTL scan makes its changes visible, only set this value if it is shorter than scan_interval.

By default, all writes are made visible immediately.

Relevant when: type = "memory"
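The fragment below is a minimal sketch of a memory table that batches writes. It assumes flush_interval is expressed in seconds, like scan_interval, and uses a hypothetical table name. flush_interval is set below scan_interval so that it actually has an effect, and max_byte_size caps the table at 100 MiB (104857600 bytes).

```json
{
  "enrichment_tables": {
    "session_cache": {
      "type": "memory",
      "ttl": 600,
      "scan_interval": 30,
      "flush_interval": 5,
      "max_byte_size": 104857600
    }
  }
}
```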

enrichment_tables.*.graph
Extra graph configuration.

Configures the output for this component when generated with the graph command.

enrichment_tables.*.graph.node_attributes
Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

A single graph node attribute in Graphviz DOT language.
Examples
{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

enrichment_tables.*.inputs
A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.
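As a sketch, assuming a table of type memory, the cache is fed by listing upstream components in inputs. Here parse_sessions and the wildcard enrich_* stand for hypothetical transform IDs.

```json
{
  "enrichment_tables": {
    "session_cache": {
      "type": "memory",
      "inputs": ["parse_sessions", "enrich_*"]
    }
  }
}
```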

enrichment_tables.*.internal_metrics
Configuration of internal metrics.
Relevant when: type = "memory"

enrichment_tables.*.internal_metrics.include_key_tag
Determines whether to include the key tag on internal metrics.

This is useful for distinguishing between different keys while monitoring. However, the tag’s cardinality is unbounded.

default: false

enrichment_tables.*.locale
The locale to use when querying the database.

MaxMind includes localized versions of some of the fields within their database, such as country name. This setting controls which of those localized versions are returned by the enrichment table.

More information on which portions of the geolocation data are localized, and what languages are available, can be found in the MaxMind documentation.

Relevant when: type = "geoip"
default: en

enrichment_tables.*.max_byte_size
Maximum size of the table in bytes. Any insertion that would make the table larger than this size is rejected.

By default, there is no size limit.

Relevant when: type = "memory"

enrichment_tables.*.path
Path to the MaxMind GeoIP2 or GeoLite2 binary city database file (GeoLite2-City.mmdb).

Other databases, such as the country database, are not supported by the geoip type. For other databases, use the mmdb enrichment table type.

Relevant when: type = "geoip" or type = "mmdb"
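For illustration, the fragment below declares one geoip table with a non-default locale and one mmdb table for a database the geoip type does not handle. The table names and file paths are hypothetical. de (German) is one of the locales MaxMind publishes.

```json
{
  "enrichment_tables": {
    "geoip_city": {
      "type": "geoip",
      "path": "/etc/vector/GeoLite2-City.mmdb",
      "locale": "de"
    },
    "asn_lookup": {
      "type": "mmdb",
      "path": "/etc/vector/GeoLite2-ASN.mmdb"
    }
  }
}
```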

enrichment_tables.*.scan_interval
The scan interval used to look for expired records. This is provided as an optimization, so that expired records are removed without doing too many cache scans.
Relevant when: type = "memory"
default: 30

enrichment_tables.*.schema
Key/value pairs representing mapped log field names and types.

This is used to coerce log fields from strings into their proper types. The available types are listed in the Types list below.

Timestamp coercions need to be prefaced with timestamp|, for example "timestamp|%F". Timestamp specifiers can use either of the following:

  1. One of the built-in formats listed in the Timestamp Formats table below.
  2. The time format specifiers from Rust’s chrono library.

Types

  • bool
  • string
  • float
  • integer
  • date
  • timestamp (see the table below for formats)

Timestamp Formats

Format | Description | Example
%F %T | YYYY-MM-DD HH:MM:SS | 2020-12-01 02:37:54
%v %T | DD-Mmm-YYYY HH:MM:SS | 01-Dec-2020 02:37:54
%FT%T | ISO 8601/RFC 3339, without time zone | 2020-12-01T02:37:54
%FT%TZ | ISO 8601/RFC 3339, UTC | 2020-12-01T09:37:54Z
%+ | ISO 8601/RFC 3339, with time zone | 2020-12-01T02:37:54-07:00
%a, %d %b %Y %T | RFC 822/RFC 2822, without time zone | Tue, 01 Dec 2020 02:37:54
%a %b %e %T %Y | ctime format | Tue Dec 1 02:37:54 2020
%s | UNIX timestamp | 1606790274
%a %d %b %T %Y | date command, without time zone | Tue 01 Dec 02:37:54 2020
%a %d %b %T %Z %Y | date command, with time zone | Tue 01 Dec 02:37:54 PST 2020
%a %d %b %T %z %Y | date command, with numeric time zone | Tue 01 Dec 02:37:54 -0700 2020
%a %d %b %T %#z %Y | date command, with numeric time zone (minutes can be missing or present) | Tue 01 Dec 02:37:54 -07 2020
Relevant when: type = "file"

enrichment_tables.*.schema.*
Represents mapped log field names and types.
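The following is a sketch of a schema for a file table. It assumes a CSV with the hypothetical columns id, active, balance, signup_date, and last_seen. The last entry shows the timestamp| prefix combined with the %F %T format from the Timestamp Formats table.

```json
{
  "enrichment_tables": {
    "customers": {
      "type": "file",
      "file": {
        "path": "/etc/vector/customers.csv",
        "encoding": { "type": "csv" }
      },
      "schema": {
        "id": "integer",
        "active": "bool",
        "balance": "float",
        "signup_date": "date",
        "last_seen": "timestamp|%F %T"
      }
    }
  }
}
```

Columns not listed in schema keep their string values.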

enrichment_tables.*.source_config
Configuration for source functionality.
Relevant when: type = "memory"

enrichment_tables.*.source_config.export_batch_size
Batch size for data exporting. Used to avoid exporting the entire table at once and blocking the system.

By default, batches are not used and the entire table is exported.

enrichment_tables.*.source_config.export_expired_items
Set to true to export expired items via the expired output port. Expired items ignore other settings and are exported as they are flushed from the table.
default: false

enrichment_tables.*.source_config.export_interval
Interval for exporting all data from the table when used as a source.

enrichment_tables.*.source_config.remove_after_export
If set to true, all data is removed from the cache after exporting. Only valid if the table is used as a source and export_interval > 0.

By default, exporting does not remove data from the cache.

default: false

enrichment_tables.*.source_config.source_key
Key to use for this component when used as a source. This must be different from the component key.
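As a hedged sketch, the fragment below exposes a memory table as a source under the key session_cache_export and forwards exported events to a console sink. All component IDs are hypothetical, and export_interval is assumed to be in seconds. The source_key differs from the table key session_cache, as required.

```json
{
  "enrichment_tables": {
    "session_cache": {
      "type": "memory",
      "inputs": ["parse_sessions"],
      "source_config": {
        "source_key": "session_cache_export",
        "export_interval": 60,
        "export_batch_size": 1000,
        "remove_after_export": false
      }
    }
  },
  "sinks": {
    "export_out": {
      "type": "console",
      "inputs": ["session_cache_export"],
      "encoding": { "codec": "json" }
    }
  }
}
```

Downstream components refer to the exported data by source_key, not by the table key.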

enrichment_tables.*.ttl
TTL (time-to-live in seconds) is used to limit the lifetime of data stored in the cache. When TTL expires, data behind a specific key in the cache is removed. TTL is reset when the key is replaced.
Relevant when: type = "memory"
default: 600

enrichment_tables.*.ttl_field
Field in the incoming value used as the TTL override.
Relevant when: type = "memory"
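This sketch sets a default TTL of 10 minutes and lets each event override it through a hypothetical top-level field named ttl_override, assumed to hold a number of seconds. It also enables the key tag on internal metrics. Because that tag's cardinality is unbounded, it is best kept off when the number of distinct keys is large.

```json
{
  "enrichment_tables": {
    "session_cache": {
      "type": "memory",
      "ttl": 600,
      "ttl_field": "ttl_override",
      "internal_metrics": {
        "include_key_tag": true
      }
    }
  }
}
```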

enrichment_tables.*.type
required string enum
The type of enrichment table.
Enum options
Option | Description
file | Exposes data from a static file as an enrichment table.
geoip | Exposes data from a MaxMind GeoIP2 database as an enrichment table.
memory | Exposes data from a memory cache as an enrichment table. The cache can be written to using a sink.
mmdb | Exposes data from a MaxMind database as an enrichment table.

sinks

optional object
All configured sinks.

sinks.*

required object
A sink.
sinks.*.buffer
optional object

Configures the buffering behavior for this sink.

More information about the individual buffer types, and buffer behavior, can be found in the Buffering Model section.

sinks.*.buffer.max_events
The maximum number of events allowed in the buffer.
Relevant when: type = "memory"
default: 500

sinks.*.buffer.max_size
The maximum allowed amount of allocated memory the buffer can hold, in bytes.

If type = "disk", this must be at least ~256 megabytes (268435488 bytes).

sinks.*.buffer.type
optional string enum
The type of buffer to use.
Enum options
disk

Events are buffered on disk.

This is less performant, but more durable. Data that has been synchronized to disk will not be lost if Vector is restarted forcefully or crashes.

Data is synchronized to disk every 500ms.

memory

Events are buffered in memory.

This is more performant, but less durable. Data will be lost if Vector is restarted forcefully or crashes.

default: memory
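The following is a sketch of a sink with a disk buffer at the documented minimum size. The sink ID, the upstream ID app_logs, and the endpoint URL are hypothetical, and the http sink is only a stand-in. A disk buffer also assumes Vector has a writable data directory (see Global Options).

```json
{
  "sinks": {
    "archive": {
      "type": "http",
      "inputs": ["app_logs"],
      "uri": "https://logs.example.com/ingest",
      "encoding": { "codec": "json" },
      "buffer": {
        "type": "disk",
        "max_size": 268435488,
        "when_full": "block"
      }
    }
  }
}
```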
sinks.*.buffer.when_full
optional string enum
Event handling behavior when a buffer is full.
Enum options
block

Wait for free space in the buffer.

This applies backpressure up the topology, signalling that sources should slow down the acceptance/consumption of events. This means that while no data is lost, data will pile up at the edge.

drop_newest

Drops the event instead of waiting for free space in the buffer.

The event will be intentionally dropped. This mode is typically used when performance is the highest priority, and it is preferable to temporarily lose events rather than cause a slowdown in the acceptance/consumption of events.

default: block
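To favor throughput over completeness, a sink can pair a bounded memory buffer with drop_newest, as in this sketch. The IDs and URL are hypothetical. Once 1000 events are queued, further events are dropped rather than slowing down the sources.

```json
{
  "sinks": {
    "best_effort_out": {
      "type": "http",
      "inputs": ["app_logs"],
      "uri": "https://logs.example.com/ingest",
      "encoding": { "codec": "json" },
      "buffer": {
        "type": "memory",
        "max_events": 1000,
        "when_full": "drop_newest"
      }
    }
  }
}
```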
sinks.*.graph
optional object

Extra graph configuration.

Configures the output for this component when generated with the graph command.

sinks.*.graph.node_attributes
Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

A single graph node attribute in Graphviz DOT language.
Examples
{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}
sinks.*.healthcheck
optional object
Healthcheck configuration.

sinks.*.healthcheck.enabled
Whether or not to check the health of the sink when Vector starts up.
default: true

sinks.*.healthcheck.timeout
Timeout duration for the healthcheck, in seconds.
default: 10 (seconds)

sinks.*.healthcheck.uri
The full URI to make HTTP healthcheck requests to.

This must be a valid URI, which requires at least the scheme and host. All other components (port, path, etc.) are allowed as well.
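The following is a sketch of a healthcheck block that shortens the timeout and points the check at a dedicated endpoint. The IDs and URLs are hypothetical, and whether a particular sink makes use of uri may depend on the sink.

```json
{
  "sinks": {
    "archive": {
      "type": "http",
      "inputs": ["app_logs"],
      "uri": "https://logs.example.com/ingest",
      "encoding": { "codec": "json" },
      "healthcheck": {
        "enabled": true,
        "timeout": 5,
        "uri": "https://logs.example.com/health"
      }
    }
  }
}
```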

sinks.*.inputs
required [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.
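In this sketch, the sink consumes every component whose ID starts with app_, plus one explicitly named transform. All IDs and the URL are hypothetical.

```json
{
  "sinks": {
    "all_app_logs": {
      "type": "http",
      "inputs": ["app_*", "parse_nginx"],
      "uri": "https://logs.example.com/ingest",
      "encoding": { "codec": "json" }
    }
  }
}
```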

sinks.*.proxy
optional object

Proxy configuration.

Configures Vector to proxy traffic through an HTTP(S) proxy when making external requests.

Similar to common proxy configuration conventions, you can set different proxies to use based on the type of traffic being proxied. You can also set specific hosts that should not be proxied.

sinks.*.proxy.enabled
Enables proxying support.
default: true
sinks.*.proxy.http
optional string

Proxy endpoint to use when proxying HTTP traffic.

Must be a valid URI string.

Examples
"http://foo.bar:3128"
sinks.*.proxy.https
optional string

Proxy endpoint to use when proxying HTTPS traffic.

Must be a valid URI string.

Examples
"http://foo.bar:3128"
sinks.*.proxy.no_proxy
optional [string]

A list of hosts to avoid proxying.

Multiple patterns are allowed:

Pattern | Example match
Domain names | example.com matches requests to example.com
Wildcard domains | .example.com matches requests to example.com and its subdomains
IP addresses | 127.0.0.1 matches requests to 127.0.0.1
CIDR blocks | 192.168.0.0/16 matches requests to any IP address in this range
Splat | * matches all hosts
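The following sketch shows proxy settings on a sink. It routes both HTTP and HTTPS traffic through one hypothetical proxy while bypassing localhost, an internal domain and its subdomains, and a private CIDR range. All hosts and IDs are hypothetical.

```json
{
  "sinks": {
    "archive": {
      "type": "http",
      "inputs": ["app_logs"],
      "uri": "https://logs.example.com/ingest",
      "encoding": { "codec": "json" },
      "proxy": {
        "enabled": true,
        "http": "http://proxy.internal:3128",
        "https": "http://proxy.internal:3128",
        "no_proxy": ["localhost", ".corp.example.com", "10.0.0.0/8"]
      }
    }
  }
}
```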

sources

optional object
All configured sources.

sources.*

required object
A source.
sources.*.graph
optional object

Extra graph configuration.

Configures the output for this component when generated with the graph command.

sources.*.graph.node_attributes
Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

A single graph node attribute in Graphviz DOT language.
Examples
{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}
sources.*.proxy
optional object

Proxy configuration.

Configures Vector to proxy traffic through an HTTP(S) proxy when making external requests.

Similar to common proxy configuration conventions, you can set different proxies to use based on the type of traffic being proxied. You can also set specific hosts that should not be proxied.

sources.*.proxy.enabled
Enables proxying support.
default: true
sources.*.proxy.http
optional string

Proxy endpoint to use when proxying HTTP traffic.

Must be a valid URI string.

Examples
"http://foo.bar:3128"
sources.*.proxy.https
optional string

Proxy endpoint to use when proxying HTTPS traffic.

Must be a valid URI string.

Examples
"http://foo.bar:3128"
sources.*.proxy.no_proxy
optional [string]

A list of hosts to avoid proxying.

Multiple patterns are allowed:

Pattern | Example match
Domain names | example.com matches requests to example.com
Wildcard domains | .example.com matches requests to example.com and its subdomains
IP addresses | 127.0.0.1 matches requests to 127.0.0.1
CIDR blocks | 192.168.0.0/16 matches requests to any IP address in this range
Splat | * matches all hosts
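Proxy settings matter for sources that make outbound requests. As a sketch, an http_client source could send its HTTPS requests through a proxy while leaving loopback traffic unproxied. The endpoint, proxy address, and source ID are hypothetical.

```json
{
  "sources": {
    "api_poll": {
      "type": "http_client",
      "endpoint": "https://api.example.com/status",
      "proxy": {
        "https": "http://proxy.internal:3128",
        "no_proxy": ["127.0.0.1"]
      }
    }
  }
}
```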

transforms

optional object
All configured transforms.

transforms.*

required object
A transform.
transforms.*.graph
optional object

Extra graph configuration.

Configures the output for this component when generated with the graph command.

transforms.*.graph.node_attributes
Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

A single graph node attribute in Graphviz DOT language.
Examples
{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}
transforms.*.inputs
required [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.
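The following is a sketch of a remap transform that consumes a wildcard of upstream IDs and sets node attributes for the graph command. The IDs are hypothetical, and the VRL program only adds a boolean field. shape = "box" is a standard Graphviz node attribute.

```json
{
  "transforms": {
    "tag_app_logs": {
      "type": "remap",
      "inputs": ["app_*"],
      "source": ".tagged = true",
      "graph": {
        "node_attributes": {
          "color": "blue",
          "shape": "box"
        }
      }
    }
  }
}
```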