Understand the full path through your distributed application.
Traces give us the big picture of what happens when a request is made to an application. Whether your application is a monolith with a single database or a sophisticated mesh of services, traces are essential to understanding the full “path” a request takes in your application.
Let’s explore this with three units of work, represented as Spans:
hello span:
{
  "name": "hello",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x051581bf3cb55c13"
  },
  "parent_id": null,
  "start_time": "2022-04-29T18:52:58.114201Z",
  "end_time": "2022-04-29T18:52:58.114687Z",
  "attributes": {
    "http.route": "some_route1"
  },
  "events": [
    {
      "name": "Guten Tag!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
This is the root span, denoting the beginning and end of the entire operation. Note that it has a trace_id field indicating the trace, but has no parent_id. That’s how you know it’s the root span.
hello-greetings span:
{
  "name": "hello-greetings",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x5fb397be34d26b51"
  },
  "parent_id": "0x051581bf3cb55c13",
  "start_time": "2022-04-29T18:52:58.114304Z",
  "end_time": "2022-04-29T22:52:58.114561Z",
  "attributes": {
    "http.route": "some_route2"
  },
  "events": [
    {
      "name": "hey there!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    },
    {
      "name": "bye now!",
      "timestamp": "2022-04-29T18:52:58.114585Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
This span encapsulates specific tasks, like saying greetings, and its parent is the hello span. Note that it shares the same trace_id as the root span, indicating it’s a part of the same trace. Additionally, it has a parent_id that matches the span_id of the hello span.
hello-salutations span:
{
  "name": "hello-salutations",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x93564f51e1abe1c2"
  },
  "parent_id": "0x051581bf3cb55c13",
  "start_time": "2022-04-29T18:52:58.114492Z",
  "end_time": "2022-04-29T18:52:58.114631Z",
  "attributes": {
    "http.route": "some_route3"
  },
  "events": [
    {
      "name": "hey there!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
This span represents the third operation in this trace and, like the previous one, it’s a child of the hello span. That also makes it a sibling of the hello-greetings span.
These three blocks of JSON all share the same trace_id, and the parent_id field represents a hierarchy. That makes it a Trace!
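To make the role of trace_id and parent_id concrete, here is a minimal sketch that rebuilds the hierarchy from the three spans. It uses only the Python standard library; the flattened field layout and the helper names `assemble_trace` and `render` are illustrative assumptions, not part of OpenTelemetry.

```python
from collections import defaultdict

# Trimmed copies of the three spans: only the fields needed for assembly.
spans = [
    {"name": "hello", "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
     "span_id": "0x051581bf3cb55c13", "parent_id": None},
    {"name": "hello-greetings", "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
     "span_id": "0x5fb397be34d26b51", "parent_id": "0x051581bf3cb55c13"},
    {"name": "hello-salutations", "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
     "span_id": "0x93564f51e1abe1c2", "parent_id": "0x051581bf3cb55c13"},
]


def assemble_trace(spans):
    """Return (root_span, children_by_parent_id) for spans of a single trace."""
    trace_ids = {span["trace_id"] for span in spans}
    if len(trace_ids) != 1:
        raise ValueError("spans belong to more than one trace")
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span  # no parent_id means this is the root span
        else:
            children[span["parent_id"]].append(span)
    return root, children


def render(span, children, depth=0):
    """Return the hierarchy as indented lines, one span per line."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


root, children = assemble_trace(spans)
print("\n".join(render(root, children)))
```

A tracing backend does essentially this grouping, usually across spans that arrive from many different processes.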
Another thing you’ll note is that each Span looks like a structured log. That’s because it kind of is! One way to think of Traces is that they’re a collection of structured logs with context, correlation, hierarchy, and more baked in. However, these “structured logs” can come from different processes, services, VMs, data centers, and so on. This is what allows tracing to represent an end-to-end view of any system.
To understand how tracing in OpenTelemetry works, let’s look at a list of components that will play a part in instrumenting our code.
A Tracer Provider (sometimes called TracerProvider) is a factory for Tracers. In most applications, a Tracer Provider is initialized once and its lifecycle matches the application’s lifecycle. Tracer Provider initialization also includes Resource and Exporter initialization. It is typically the first step in tracing with OpenTelemetry. In some language SDKs, a global Tracer Provider is already initialized for you.
A Tracer creates spans containing more information about what is happening for a given operation, such as a request in a service. Tracers are created from Tracer Providers.
Trace Exporters send traces to a consumer. This consumer can be standard output for debugging and development-time, the OpenTelemetry Collector, or any open source or vendor backend of your choice.
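The relationship between these three components can be sketched with a toy model. The classes below (`ToyTracerProvider`, `ToyTracer`, `ConsoleExporter`) are hypothetical and written against the standard library only; they are not the OpenTelemetry SDK API, and only illustrate "provider creates tracers, tracers create spans, exporter receives finished spans".

```python
import json
import secrets
import time


class ConsoleExporter:
    """Hypothetical exporter: writes each finished span to standard output."""

    def export(self, span):
        print(json.dumps(span))


class ToyTracer:
    """Hypothetical tracer: creates spans and hands them to the exporter."""

    def __init__(self, name, exporter):
        self.name = name
        self.exporter = exporter

    def record(self, span_name, work):
        """Run `work`, time it, and export the finished span."""
        span = {
            "name": span_name,
            "tracer": self.name,
            "trace_id": secrets.token_hex(16),  # 16 random bytes as hex
            "span_id": secrets.token_hex(8),    # 8 random bytes as hex
            "start_time": time.time(),
        }
        result = work()
        span["end_time"] = time.time()
        self.exporter.export(span)
        return result


class ToyTracerProvider:
    """Created once per application; its tracers all share one exporter."""

    def __init__(self, exporter):
        self.exporter = exporter
        self._tracers = {}

    def get_tracer(self, name):
        if name not in self._tracers:
            self._tracers[name] = ToyTracer(name, self.exporter)
        return self._tracers[name]


provider = ToyTracerProvider(ConsoleExporter())
tracer = provider.get_tracer("checkout")
total = tracer.record("compute-total", lambda: sum([5, 7]))
```

Swapping `ConsoleExporter` for a class that sends spans over the network would change where traces go without touching the code that creates spans, which is the point of keeping the exporter separate.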
Context Propagation is the core concept that enables Distributed Tracing. With Context Propagation, Spans can be correlated with each other and assembled into a trace, regardless of where Spans are generated. To learn more about this topic, see the concept page on Context Propagation.
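As an illustration of what gets propagated, the sketch below writes and reads a `traceparent` HTTP header. It assumes the W3C Trace Context header layout (`version-traceid-parentid-flags`), which the text above does not name; the `inject` and `extract` helpers are hypothetical and are not the OpenTelemetry propagator API.

```python
import re

# Version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Write the current span's identity into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Read the caller's span identity from incoming headers, or return None."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # all-zero IDs are treated as invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# The calling service injects; the receiving service extracts.
outgoing = inject({}, "5b8aa5a2d2c872e8321cf37308d69df2", "051581bf3cb55c13")
incoming = extract(outgoing)
```

The receiving service would start its own span with the extracted trace_id and use the extracted span ID as its parent_id, which is how spans from different processes end up in one trace.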
A span represents a unit of work or operation. Spans are the building blocks of Traces. In OpenTelemetry, they include the following information: a name, a parent span ID (empty for root spans), start and end timestamps, span context, attributes, span events, span links, and span status.
Sample span:
{
  "name": "/v1/sys/health",
  "context": {
    "trace_id": "7bba9f33312b3dbb8b2c2c62bb7abe2d",
    "span_id": "086e83747d0e381e"
  },
  "parent_id": "",
  "start_time": "2021-10-22 16:04:01.209458162 +0000 UTC",
  "end_time": "2021-10-22 16:04:01.209514132 +0000 UTC",
  "status_code": "STATUS_CODE_OK",
  "status_message": "",
  "attributes": {
    "net.transport": "IP.TCP",
    "net.peer.ip": "172.17.0.1",
    "net.peer.port": "51820",
    "net.host.ip": "10.177.2.152",
    "net.host.port": "26040",
    "http.method": "GET",
    "http.target": "/v1/sys/health",
    "http.server_name": "mortar-gateway",
    "http.route": "/v1/sys/health",
    "http.user_agent": "Consul Health Check",
    "http.scheme": "http",
    "http.host": "10.177.2.152:26040",
    "http.flavor": "1.1"
  },
  "events": [
    {
      "name": "",
      "message": "OK",
      "timestamp": "2021-10-22 16:04:01.209512872 +0000 UTC"
    }
  ]
}
Spans can be nested, as is implied by the presence of a parent span ID: child spans represent sub-operations. This allows spans to more accurately capture the work done in an application.
Span context is an immutable object on every span that contains the following: the Trace ID of the trace the span belongs to, the span’s Span ID, Trace Flags (a binary encoding containing information about the trace), and Trace State (a list of key-value pairs that can carry vendor-specific trace information).
Span context is the part of a span that is serialized and propagated alongside Distributed Context and Baggage.
Attributes are key-value pairs that contain metadata that you can use to annotate a Span to carry information about the operation it is tracking.
For example, if a span tracks an operation that adds an item to a user’s shopping cart in an eCommerce system, you can capture the user’s ID, the ID of the item to add to the cart, and the cart ID.
You can add attributes to spans during or after span creation. Prefer adding attributes at span creation to make the attributes available to SDK sampling. If you have to add a value after span creation, update the span with the value.
Attributes have the following rules that each language SDK implements: keys must be non-null string values, and values must be a non-null string, boolean, floating point value, integer, or an array of these values.
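A check for attribute rules can be sketched in a few lines. This example assumes the rules are that keys are non-null strings and values are a non-null string, boolean, floating point value, integer, or an array of those; the function name `is_valid_attribute` and the sample attribute keys are hypothetical, not an SDK API.

```python
# Types allowed for a single attribute value (bool is listed for clarity,
# although Python treats it as a subclass of int).
PRIMITIVES = (str, bool, int, float)


def is_valid_attribute(key, value):
    """Return True if the key/value pair satisfies the assumed attribute rules."""
    if not isinstance(key, str):
        return False  # keys must be non-null strings
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        # arrays may only contain primitive values
        return all(isinstance(item, PRIMITIVES) for item in value)
    return False  # None, dicts, and arbitrary objects are rejected


# The shopping cart example: user ID, item ID, and cart ID as attributes.
cart_attributes = {"user.id": "u-42", "item.id": 1001, "cart.id": "c-7"}
all_valid = all(is_valid_attribute(k, v) for k, v in cart_attributes.items())
```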
Additionally, there are Semantic Attributes, which are known naming conventions for metadata that is typically present in common operations. It’s helpful to use semantic attribute naming wherever possible so that common kinds of metadata are standardized across systems.
A Span Event can be thought of as a structured log message (or annotation) on a Span, typically used to denote a meaningful, singular point in time during the Span’s duration.
For example, consider two scenarios in a web browser: (1) tracking a page load, and (2) denoting when a page becomes interactive.
A Span is best used to track the first scenario because it’s an operation with a start and an end.
A Span Event is best used to track the second scenario because it represents a meaningful, singular point in time.
When to use span events versus span attributes
Since span events also contain attributes, the question of when to use events instead of attributes might not always have an obvious answer. To inform your decision, consider whether a specific timestamp is meaningful.
For example, when you’re tracking an operation with a span and the operation completes, you might want to add data from the operation to your telemetry. If the timestamp at which the operation completes is meaningful or relevant, attach the data to a span event. If the timestamp isn’t meaningful, attach the data as span attributes.
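The difference is easiest to see in the data each call produces. The sketch below models a span as a plain dictionary; `set_attribute`, `add_event`, and the attribute names are hypothetical stand-ins, not SDK calls. An attribute carries no time of its own, while an event always records when it happened.

```python
from datetime import datetime, timezone

span = {"name": "page-load", "attributes": {}, "events": []}


def set_attribute(span, key, value):
    """Attach timeless metadata to the span as a whole."""
    span["attributes"][key] = value


def add_event(span, name, attributes=None, timestamp=None):
    """Attach a timestamped annotation to the span."""
    moment = timestamp or datetime.now(timezone.utc)
    span["events"].append({
        "name": name,
        "timestamp": moment.isoformat(),
        "attributes": attributes or {},
    })


# The URL describes the whole operation, so it is an attribute.
set_attribute(span, "page.url", "/cart")
# The moment the page became interactive matters, so it is an event.
add_event(span, "page-interactive", {"dom.node_count": 1200})
```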
Links exist so that you can associate one span with one or more spans, implying a causal relationship. For example, let’s say we have a distributed system where some operations are tracked by a trace.
In response to some of these operations, an additional operation is queued to be executed, but its execution is asynchronous. We can track this subsequent operation with a trace as well.
We would like to associate the trace for the subsequent operations with the first trace, but we cannot predict when the subsequent operations will start. We need to associate these two traces, so we will use a span link.
You can link the last span from the first trace to the first span in the second trace. Now, they are causally associated with one another.
Links are optional but serve as a good way to associate trace spans with one another.
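The queued-operation scenario can be sketched as follows. The `new_span` helper and the span names are hypothetical, and spans are modeled as plain dictionaries rather than SDK objects. The second span starts a new trace of its own and refers back to the first through a link instead of a parent_id.

```python
import secrets


def new_span(name, trace_id=None, parent_id=None, links=()):
    """Create a span; without a trace_id it becomes the root of a new trace."""
    return {
        "name": name,
        "trace_id": trace_id or secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_id": parent_id,
        # A link stores only the identity of the span it points to.
        "links": [
            {"trace_id": linked["trace_id"], "span_id": linked["span_id"]}
            for linked in links
        ],
    }


# Last span of the first trace: the operation that queues the follow-up job.
enqueue = new_span("enqueue-email")

# First span of the second trace, started later by an asynchronous worker.
process = new_span("send-email", links=[enqueue])
```

Because `process` has no parent, it is the root of its own trace; the link is the only thing recording that `enqueue` caused it.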
Each span has a status. The three possible values are: Unset, Error, and Ok.
The default value is Unset. A span status that is Unset means that the operation it tracked successfully completed without an error.
When a span status is Error, then that means some error occurred in the operation it tracks. For example, this could be due to an HTTP 500 error on a server handling a request.
When a span status is Ok, then that means the span was explicitly marked as error-free by the developer of an application. Although this is unintuitive, it’s not required to set a span status as Ok when a span is known to have completed without error, as this is covered by Unset. What Ok does is represent an unambiguous “final call” on the status of a span that has been explicitly set by a user. This is helpful in any situation where a developer wishes for there to be no other interpretation of a span other than “successful”.
To reiterate: Unset represents a span that completed without an error. Ok represents when a developer explicitly marks a span as successful. In most cases, it is not necessary to explicitly mark a span as Ok.
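The “final call” behavior of Ok can be sketched as a small state rule. The precedence used here (Ok cannot be overwritten, and setting Unset is a no-op) is an assumption drawn from the OpenTelemetry specification rather than from the text above, and `ToySpan` is a hypothetical class, not an SDK type.

```python
from enum import Enum


class Status(Enum):
    UNSET = "Unset"
    OK = "Ok"
    ERROR = "Error"


class ToySpan:
    """Hypothetical span that tracks only its status."""

    def __init__(self, name):
        self.name = name
        self.status = Status.UNSET  # the default value

    def set_status(self, status):
        # Assumed rule: Ok is final, and explicitly setting Unset does nothing.
        if self.status is Status.OK or status is Status.UNSET:
            return
        self.status = status


span = ToySpan("handle-request")
span.set_status(Status.ERROR)  # e.g. an instrumentation library saw an HTTP 500
span.set_status(Status.OK)     # the developer decides this counts as a success
span.set_status(Status.ERROR)  # ignored: Ok was the final call
final_status = span.status
```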
When a span is created, it is one of Client, Server, Internal, Producer, or Consumer. This span kind provides a hint to the tracing backend as to how the trace should be assembled. According to the OpenTelemetry specification, the parent of a server span is often a remote client span, and the child of a client span is usually a server span. Similarly, the parent of a consumer span is always a producer and the child of a producer span is always a consumer. If not provided, the span kind is assumed to be internal.
For more information regarding SpanKind, see the SpanKind section of the OpenTelemetry specification.
A client span represents a synchronous outgoing remote call such as an outgoing HTTP request or database call. Note that in this context, “synchronous” does not refer to async/await, but to the fact that it is not queued for later processing.
A server span represents a synchronous incoming remote call such as an incoming HTTP request or remote procedure call.
Internal spans represent operations which do not cross a process boundary. Things like instrumenting a function call or an Express middleware may use internal spans.
Producer spans represent the creation of a job which may be asynchronously processed later. It may be a remote job such as one inserted into a job queue or a local job handled by an event listener.
Consumer spans represent the processing of a job created by a producer and may start long after the producer span has already ended.
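The five kinds and their typical pairings can be sketched as an enumeration plus a lookup table. The names `new_span` and `is_typical_pairing` are hypothetical; the table encodes only the pairings described above (server under client, consumer under producer), and since span kind is a hint, a mismatch here is a sign of unusual instrumentation rather than an error.

```python
from enum import Enum


class SpanKind(Enum):
    CLIENT = "client"
    SERVER = "server"
    INTERNAL = "internal"
    PRODUCER = "producer"
    CONSUMER = "consumer"


# Kind of parent that each child kind is usually found under.
TYPICAL_PARENT = {
    SpanKind.SERVER: SpanKind.CLIENT,
    SpanKind.CONSUMER: SpanKind.PRODUCER,
}


def new_span(name, kind=SpanKind.INTERNAL):
    """Create a span; the kind defaults to internal when not provided."""
    return {"name": name, "kind": kind}


def is_typical_pairing(parent, child):
    """Return True if the child's kind fits the pairings described above."""
    expected = TYPICAL_PARENT.get(child["kind"])
    return expected is None or parent["kind"] is expected


http_call = new_span("GET /orders", SpanKind.CLIENT)
handler = new_span("handle GET /orders", SpanKind.SERVER)
validate = new_span("validate-order")  # no kind given, so internal
```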
For more information, see the traces specification.