Triton Client Inference Example Code

Created on: November 4, 2024

Answered using O1 Preview by Chat01

Question

OS: Ubuntu 22.04 with Docker installed. The Triton image was installed via docker pull nvcr.io/nvidia/tritonserver:23.08-py3, and the image contains an algorithm model people_detector; the overall background is as shown in the numbered items below. Requirement: write Triton client code on the host machine that connects to the Triton Server and tests the model's inference service, including sending an image, returning the detection results, drawing bounding boxes, and saving the image. Provide the complete client test source code.

(1) yolov8n.pt was downloaded from the official ultralytics website, unmodified.
(2) yolo export model=./models/yolov8n.pt format=onnx produces yolov8n.onnx.
(3) trtexec --onnx=./models/yolov8n.onnx --saveEngine=people_detector.engine --fp16 produces people_detector.engine.
(4) Move the model into the models directory inside the Docker environment: mv people_detector.engine /models/people_detector/1/model.plan
(5) Directory structure of models inside the Docker environment:
root@08dd250b903b:/workspace# tree /models
/models
|-- car_detector
|   |-- 1
|   |   `-- model.plan
|   `-- config.pbtxt
`-- people_detector
    |-- 1
    |   `-- model.plan
    `-- config.pbtxt

4 directories, 4 files
root@08dd250b903b:/workspace#
(6) Create a config.pbtxt for each model, specifying the model's inputs, outputs, and data types.
config.pbtxt for people_detector:
nano /models/people_detector/config.pbtxt
Contents:
name: "people_detector"
platform: "tensorrt_plan"
max_batch_size: 1
input [
{
name: "images"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [3, 640, 640]
}
]
output [
{
name: "output0"
data_type: TYPE_FP32
dims: [84, 8400]
}
]
instance_group [
{
kind: KIND_GPU
}
]
(7) When Triton Server is started with the model repository, the output is as follows:
root@08dd250b903b:/workspace# tritonserver --model-repository=/models
I1104 05:55:25.323236 1225 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1104 05:55:25.323261 1225 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1104 05:55:25.323263 1225 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
I1104 05:55:25.408497 1225 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7fe236000000' with size 268435456
I1104 05:55:25.408587 1225 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1104 05:55:25.409377 1225 model_lifecycle.cc:462] loading: car_detector:1
I1104 05:55:25.409388 1225 model_lifecycle.cc:462] loading: people_detector:1
I1104 05:55:25.409711 1225 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I1104 05:55:25.409718 1225 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.15
I1104 05:55:25.409720 1225 tensorrt.cc:81] 'tensorrt' TRITONBACKEND API version: 1.15
I1104 05:55:25.409722 1225 tensorrt.cc:105] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I1104 05:55:25.410435 1225 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: people_detector (version 1)
I1104 05:55:25.410533 1225 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: car_detector (version 1)
I1104 05:55:25.410860 1225 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I1104 05:55:25.418639 1225 logging.cc:46] Loaded engine size: 8 MiB
I1104 05:55:25.418872 1225 logging.cc:46] Loaded engine size: 8 MiB
I1104 05:55:25.436701 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
I1104 05:55:25.438990 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +6, now: CPU 0, GPU 6 (MiB)
I1104 05:55:25.442070 1225 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: people_detector_0_0 (GPU device 0)
I1104 05:55:25.444352 1225 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: car_detector_0_0 (GPU device 0)
I1104 05:55:25.447806 1225 logging.cc:46] Loaded engine size: 8 MiB
I1104 05:55:25.455944 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +6, now: CPU 0, GPU 6 (MiB)
I1104 05:55:25.456093 1225 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I1104 05:55:25.456259 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 17 (MiB)
I1104 05:55:25.456390 1225 instance_state.cc:188] Created instance people_detector_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I1104 05:55:25.456467 1225 model_lifecycle.cc:819] successfully loaded 'people_detector'
I1104 05:55:25.461242 1225 logging.cc:46] Loaded engine size: 8 MiB
I1104 05:55:25.470004 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +6, now: CPU 0, GPU 23 (MiB)
I1104 05:55:25.470312 1225 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +12, now: CPU 1, GPU 35 (MiB)
I1104 05:55:25.470440 1225 instance_state.cc:188] Created instance car_detector_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I1104 05:55:25.470778 1225 model_lifecycle.cc:819] successfully loaded 'car_detector'
I1104 05:55:25.470883 1225 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1104 05:55:25.470928 1225 server.cc:631]
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-c |
| | | ompute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------+

I1104 05:55:25.470944 1225 server.cc:674]
+-----------------+---------+--------+
| Model | Version | Status |
+-----------------+---------+--------+
| car_detector | 1 | READY |
| people_detector | 1 | READY |
+-----------------+---------+--------+

I1104 05:55:25.495944 1225 metrics.cc:810] Collecting metrics for GPU 0: NVIDIA GeForce RTX 4090
I1104 05:55:25.496040 1225 metrics.cc:703] Collecting CPU metrics
I1104 05:55:25.496123 1225 tritonserver.cc:2435]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.37.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory |
| | cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+

I1104 05:55:25.496855 1225 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I1104 05:55:25.496963 1225 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I1104 05:55:25.538731 1225 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
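
With the services above running, a quick sanity check from the host that the server and model are ready can be done with the tritonclient Python package; a minimal sketch, assuming the default HTTP port 8000 shown in the log:
python
import tritonclient.http as httpclient

# Check the HTTP endpoint started above (0.0.0.0:8000)
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_ready())                  # True once startup completes
print(client.is_model_ready("people_detector"))  # True when the model shows READY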

Thought

Connecting to the server

The user is running a Docker container on Ubuntu 22.04; inside it is a model named "people_detector", a TensorRT engine built from YOLOv8. The goal is to connect to the server and test the model's inference service.

Building the connection code

I'm starting on the Triton client code: connect to the Triton Server and test the model's inference service, including sending an image, returning the detection results, drawing bounding boxes, and saving the image.

Preparing the environment

The server is being set up; in the Docker environment the model people_detector.engine is already in place. I'm considering how to tune the person-detection model's configuration file to speed up loading.

Designing the client code

I'm designing Triton client code for the YOLOv8 model. The code will preprocess the image, send it to the server, receive the detection results, process them, then draw bounding boxes on the image and save it.

Loading and processing the image

First we load the image in code using the Triton client library, then preprocess it, including resizing and normalization. Next, the image is sent to the server through the client library and the output is received: an array of shape [84, 8400].

Pinning down the output format

With YOLOv8, the output dimensions are [batch, num_classes + 5, num_anchors], where num_classes + 5 covers the class probabilities and the bounding-box coordinates.

Loading and processing the image

First, I extract the input and output information from the config file. Then I load the image, resize it to 640x640, convert it to NCHW format and float32, and send it to the server.

Building the client code

I'm developing a client that handles preprocessing, sending the request, receiving the response, postprocessing the output, drawing bounding boxes, and saving the image.

Image preprocessing and box drawing

This code first reads the image, resizes it and converts its color space, then turns it into a numeric array and normalizes it. It then runs inference through the Triton client and extracts the output data. Finally it decodes the detections, draws the detection boxes, and saves the processed image.

Verifying each step

For the YOLO model, the input image needs to be normalized to [0, 1] or zero-centered. That means: resizing to (640, 640), converting to float32, normalizing pixel values (dividing by 255), and possibly mean/std normalization. The output needs a sigmoid activation.

Fine-tuning the code details

Keeping YOLOv8's preprocessing and postprocessing steps in mind, I plan to implement the scaling, normalization, and decoding exactly, to make sure they match YOLOv8.

Processing and resizing the image

First, read the image, resize it to 640x640, and add padding. Then normalize the pixel values to [0, 1], convert to float32, and reorder the channels to CHW. That is the initial data preprocessing.

Handling the image transforms

I'm describing how to resize the image, add borders, recompute the scale ratio, and handle the different scaling and padding strategies.

Image preprocessing

Read, convert, resize, normalize, and transpose the image to prepare the model input, computing the aspect ratio and the padding amounts; a small worked example of the letterbox arithmetic follows.
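
A minimal sketch of that arithmetic, assuming a hypothetical 1280x720 input letterboxed into a 640x640 canvas:
python
# Letterbox a 1280x720 (width x height) image into a 640x640 canvas
ratio = min(640 / 720, 640 / 1280)                   # 0.5, limited by the wider side
new_w, new_h = int(1280 * ratio), int(720 * ratio)   # resized to 640 x 360
dw, dh = 640 - new_w, 640 - new_h                    # 0 and 280 pixels of padding
pad_lr, pad_tb = dw / 2, dh / 2                      # 0 left/right, 140 top/bottom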

Reshaping the output structure

I'm reshaping the model's output into the format we need. First, transpose the output from [1, 84, 8400] to [8400, 84], so that each row represents one prediction.

Reviewing the output format

I'm checking the shape of the model output; it might be, say, [1, 8400, 85] or [1, 8400, 84]. The latter suggests the objectness score may be merged into the class scores.

Organizing the data shape

I'm rearranging the output data into shape [8400, 84]. We now have 8400 predictions; the first 4 elements of each are the bbox coordinates and the remaining 80 are class probabilities. Next, a sigmoid activation may be needed on the scores, and possibly on the bbox coordinates as well.
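
A minimal NumPy sketch of that reshape and split, using a dummy array of the stated shape:
python
import numpy as np

output = np.zeros((1, 84, 8400), dtype=np.float32)  # stand-in for the server output
preds = output[0].transpose(1, 0)                   # [1, 84, 8400] -> [8400, 84]
boxes = preds[:, :4]    # per prediction: [x_center, y_center, width, height]
scores = preds[:, 4:]   # per prediction: 80 class scores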

Emphasizing code flexibility

Given the time constraints, we suggest the user adapt the code to match their model's specific requirements.

Adjusting and refining the code

Here is the adjusted code, covering image resizing, padding, preprocessing, non-maximum suppression, and postprocessing. Together these steps improve the accuracy of the inference results.

Answer

Below is a complete Python client that connects to the Triton Server, sends an image for inference, receives the detection results, draws the bounding boxes, and saves the image. Make sure you have installed the necessary Python libraries: tritonclient, numpy, and opencv-python.

python
import cv2
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException


def letterbox(image, new_shape=(640, 640), color=(114, 114, 114)):
    """Resize image with unchanged aspect ratio using padding."""
    shape = image.shape[:2]  # current shape [height, width]
    ratio = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    new_unpad = (int(round(shape[1] * ratio)), int(round(shape[0] * ratio)))
    dw = new_shape[1] - new_unpad[0]  # width padding
    dh = new_shape[0] - new_unpad[1]  # height padding
    dw /= 2  # divide padding between the two sides
    dh /= 2
    image = cv2.resize(image, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    image = cv2.copyMakeBorder(image, top, bottom, left, right,
                               cv2.BORDER_CONSTANT, value=color)
    return image, ratio, (dw, dh)


def preprocess_image(image_path):
    """Preprocess the image for model input."""
    image = cv2.imread(image_path)
    original_image = image.copy()
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image, ratio, (dw, dh) = letterbox(image, new_shape=(640, 640))
    image = image.astype(np.float32) / 255.0
    image = np.transpose(image, (2, 0, 1))  # HWC to CHW
    image = np.expand_dims(image, axis=0)   # add batch dimension
    return image, original_image, ratio, (dw, dh)


def postprocess_output(output_data, ratio, dwdh, conf_threshold=0.5):
    """Postprocess the model output to get detections."""
    output_data = output_data[0]                     # remove batch dim: [84, 8400]
    output_data = np.transpose(output_data, (1, 0))  # -> [8400, 84]

    # Split bbox coordinates and class scores. The YOLOv8 ONNX export already
    # applies sigmoid to the class scores, so no extra activation is needed here.
    boxes = output_data[:, :4]   # [x_center, y_center, width, height]
    scores = output_data[:, 4:]

    # Max score and corresponding class index for each prediction
    class_scores = np.max(scores, axis=1)
    class_ids = np.argmax(scores, axis=1)

    # Confidence thresholding
    idxs = np.where(class_scores > conf_threshold)
    boxes = boxes[idxs]
    class_scores = class_scores[idxs]
    class_ids = class_ids[idxs]

    # Convert from [x_center, y_center, width, height] to [x1, y1, x2, y2]
    boxes_xyxy = np.zeros_like(boxes)
    boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2  # x1
    boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2  # y1
    boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2  # x2
    boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2  # y2

    # Undo the letterbox padding and scaling to map back to the original image
    boxes_xyxy[:, [0, 2]] -= dwdh[0]
    boxes_xyxy[:, [1, 3]] -= dwdh[1]
    boxes_xyxy /= ratio

    detections = []
    for i in range(len(boxes_xyxy)):
        detections.append({
            'box': boxes_xyxy[i],
            'score': class_scores[i],
            'class_id': class_ids[i],
        })
    return detections


def draw_boxes(image, detections):
    """Draw bounding boxes on the image."""
    for det in detections:
        x1, y1, x2, y2 = det['box'].astype(int)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f"Class {det['class_id']}: {det['score']:.2f}"
        cv2.putText(image, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)
    return image


def main():
    # Configuration
    image_path = 'input.jpg'          # path to your input image
    output_image_path = 'output.jpg'  # path to save the output image
    model_name = 'people_detector'
    server_url = 'localhost:8000'     # Triton HTTP endpoint

    # Preprocess the image
    input_image, original_image, ratio, dwdh = preprocess_image(image_path)

    # Create the Triton client and send the request
    try:
        client = httpclient.InferenceServerClient(url=server_url)
        inputs = [httpclient.InferInput('images', list(input_image.shape), "FP32")]
        inputs[0].set_data_from_numpy(input_image)
        outputs = [httpclient.InferRequestedOutput('output0')]
        response = client.infer(model_name, inputs, outputs=outputs)
        output_data = response.as_numpy('output0')

        # Postprocess the output
        detections = postprocess_output(output_data, ratio, dwdh)

        # Draw the bounding boxes and save the image
        image_with_boxes = draw_boxes(original_image, detections)
        cv2.imwrite(output_image_path, image_with_boxes)
        print(f"Output saved to {output_image_path}")
    except InferenceServerException as e:
        print("Inference failed: " + str(e))


if __name__ == '__main__':
    main()

Notes:

  1. Install the required libraries:

    • You need the Triton client library, NumPy, and OpenCV:
      bash
      pip install tritonclient[http] numpy opencv-python
    • The server also exposes a gRPC endpoint on port 8001; a gRPC client sketch follows these notes.
  2. Code walkthrough:

    • The preprocess_image function reads and preprocesses the input image, resizing it (with letterbox padding) and normalizing it.
    • The postprocess_output function processes the model output and extracts the detections.
    • The draw_boxes function draws the detected bounding boxes on the original image.
    • The main function connects to the Triton Server, sends the inference request, receives and processes the result, and saves the image with the detection boxes.
  3. Model output handling:

    • Since the model output has shape [1, 84, 8400], it is transposed and split to extract the bounding boxes and class information.
    • The YOLOv8 ONNX export already applies sigmoid to the class scores, so the code thresholds them directly instead of activating them again.
  4. Running the code:

    • Replace 'input.jpg' in the image_path variable with the path to your input image.
    • Make sure the Triton Server is running and the model has loaded correctly (see the readiness check after the server log in the question).
    • Run the script:
      bash
      python client.py
  5. Result:

    • The script saves an image with detection boxes drawn to the specified output path.
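
As mentioned in note 1, the same request can be sent over gRPC; a minimal sketch, assuming tritonclient[grpc] is installed and the gRPC port 8001 from the server log:
python
import tritonclient.grpc as grpcclient

# Same inference call as the HTTP client, but over the gRPC endpoint (port 8001)
client = grpcclient.InferenceServerClient(url="localhost:8001")
inputs = [grpcclient.InferInput("images", [1, 3, 640, 640], "FP32")]
inputs[0].set_data_from_numpy(input_image)  # input_image from preprocess_image above
outputs = [grpcclient.InferRequestedOutput("output0")]
response = client.infer("people_detector", inputs, outputs=outputs)
output_data = response.as_numpy("output0")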

Tips:

  • If your model requires special preprocessing or postprocessing of its inputs and outputs, adjust the code accordingly.
  • If you need non-maximum suppression (NMS), you can use an existing implementation such as the NMS function in torchvision, or implement it yourself; a minimal sketch follows below.
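
A minimal pure-NumPy NMS sketch that could be applied to the boxes and scores returned by postprocess_output above (the 0.45 IoU threshold is an assumed typical value):
python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS over boxes in [x1, y1, x2, y2] format; returns kept indices."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes that overlap the top box less than the threshold
        order = order[1:][iou < iou_threshold]
    return keep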

I hope this helps you write and test your Triton client. If you have any questions, feel free to ask.
