<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Deployment on RockB</title><link>https://baeseokjae.github.io/tags/deployment/</link><description>Recent content in Deployment on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 18 May 2026 03:04:17 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/deployment/index.xml" rel="self" type="application/rss+xml"/><item><title>YOLOX Object Detection Python Deployment Developer Guide 2026</title><link>https://baeseokjae.github.io/posts/yolox-object-detection-python-deployment-developer-guide-2026/</link><pubDate>Mon, 18 May 2026 03:04:17 +0000</pubDate><guid>https://baeseokjae.github.io/posts/yolox-object-detection-python-deployment-developer-guide-2026/</guid><description>Complete guide to deploying YOLOX object detection in Python: training, ONNX/TensorRT export, FastAPI Docker API, and production benchmarks.</description><content:encoded><![CDATA[<p>YOLOX is Megvii&rsquo;s anchor-free object detection framework that ships models from 0.91M to 99.1M parameters, all deployable via PyTorch, ONNX, TensorRT, OpenVINO, or ncnn with five lines of Python. This guide covers every stage: environment setup, custom dataset training, multi-backend export, TensorRT quantization, and wrapping inference in a FastAPI/Docker production service.</p>
<h2 id="what-is-yolox-anchor-free-detection-for-python-developers">What Is YOLOX? Anchor-Free Detection for Python Developers</h2>
<p>YOLOX is an anchor-free, single-stage object detector introduced by Megvii in mid-2021 that removed the anchor hyperparameters that historically plagued YOLO variants. Instead of predicting bounding-box offsets relative to predefined anchors, YOLOX predicts each box directly from its grid cell (a center offset plus a width and height), eliminating the tedious anchor-tuning step that had to be repeated for every new dataset. The architecture pairs this anchor-free prediction with a decoupled head, with separate branches for classification and localization, which the authors showed significantly improves convergence speed and final accuracy. On COCO, the YOLOX-DarkNet53 baseline reaches 47.3% AP, and the XLarge variant pushes this to 51.1% mAP with 99.1M parameters, at the cost of the highest latency in the family. The anchor-free approach makes YOLOX a natural fit for Python deployment pipelines where dataset diversity makes anchor pre-computation impractical. For developers already familiar with NumPy-style tensor manipulation, the output format, a flat <code>(num_proposals, 5 + num_classes)</code> tensor, is far easier to post-process than anchor-grid outputs from older YOLO versions.</p>
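<p>To make the flat output format concrete, here is a minimal sketch of turning one prediction row into a corner-format box, a score, and a class id. It assumes the decoded row layout <code>(cx, cy, w, h, objectness, class scores...)</code>; the helper name <code>decode_row</code> is hypothetical and not part of the YOLOX API.</p>

```python
from typing import List, Tuple


def decode_row(row: List[float]) -> Tuple[List[float], float, int]:
    """Convert one decoded prediction row into (corner box, score, class id).

    Assumes the layout (cx, cy, w, h, objectness, class scores...).
    """
    cx, cy, w, h, objectness = row[:5]
    class_scores = row[5:]
    # Index of the highest class score
    class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
    # Final confidence combines objectness with the best class score
    score = objectness * class_scores[class_id]
    box = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    return box, score, class_id


# One proposal for a hypothetical 3-class model: 5 box/objectness values + 3 class scores
row = [100.0, 80.0, 40.0, 20.0, 0.9, 0.1, 0.7, 0.2]
box, score, class_id = decode_row(row)
print(box, round(score, 2), class_id)
```

<p>In practice the same arithmetic is applied to all rows at once with tensor slicing; the per-row version only shows where each field lives.</p>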
<h3 id="why-anchor-free-matters-for-deployment">Why Anchor-Free Matters for Deployment</h3>
<p>Anchor-free detection eliminates a class of preprocessing decisions that previously had to be baked into each exported model. With YOLOX there are no anchor configs to regenerate when the dataset changes: you retrain, export once, and the resulting graph carries no dataset-specific anchor constants. This is particularly valuable for Docker-based microservices where the model artifact needs to be immutable.</p>
<h2 id="yolox-model-variants-nano-to-xlarge--which-should-you-choose">YOLOX Model Variants: Nano to XLarge — Which Should You Choose?</h2>
<p>YOLOX model variants span roughly two orders of magnitude in parameter count, ranging from YOLOX-Nano at 0.91M parameters to YOLOX-XLarge at 99.1M, giving Python developers a single unified codebase that covers everything from Raspberry Pi 4 inference to multi-GPU datacenter throughput. The Nano and Tiny variants (0.91M and 5.06M params) target embedded hardware like Jetson Nano and mobile devices; in FP32 the Nano weights occupy under 4MB (0.91M parameters at 4 bytes each) and the Tiny weights roughly 20MB, small enough for storage-constrained edge targets. The Small and Medium variants are the workhorses for CPU-server and single-GPU deployments, while Large and XLarge are tuned for maximum accuracy on high-end GPUs where inference latency is a secondary concern. For most production APIs running on cloud GPU instances (T4, A10), YOLOX-M delivers the best throughput-accuracy tradeoff at roughly 36.9ms on a T4. Choose Nano/Tiny for battery-powered or sub-$50 hardware, Small/Medium for cloud REST endpoints with SLA latency under 100ms, and Large/XLarge for offline batch processing pipelines where accuracy is paramount.</p>
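<p>A quick way to sanity-check model size figures is to multiply the parameter count by the bytes stored per parameter. The sketch below uses the parameter counts quoted in this guide; the function name is hypothetical, and the estimate ignores graph metadata, so real files will be slightly larger.</p>

```python
# Parameter counts (in millions) as quoted in this guide
PARAMS_MILLIONS = {"nano": 0.91, "tiny": 5.06, "s": 8.94, "m": 25.3, "l": 54.2, "x": 99.1}


def weight_size_mb(params_millions: float, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in MB (10^6 bytes), ignoring graph metadata."""
    return params_millions * bytes_per_param


for name, params in PARAMS_MILLIONS.items():
    fp32 = weight_size_mb(params)
    fp16 = weight_size_mb(params, bytes_per_param=2)
    print(f"{name:>4}: fp32 ~{fp32:6.1f} MB, fp16 ~{fp16:6.1f} MB")
```

<p>Halving <code>bytes_per_param</code> models an FP16 export; INT8 engines shrink further, but their on-disk size also depends on the runtime&rsquo;s serialization format.</p>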
<table>
  <thead>
      <tr>
          <th>Variant</th>
          <th>Parameters</th>
          <th>COCO mAP</th>
          <th>T4 Latency</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Nano</td>
          <td>0.91M</td>
          <td>25.8%</td>
          <td>6.5ms</td>
      </tr>
      <tr>
          <td>Tiny</td>
          <td>5.06M</td>
          <td>32.8%</td>
          <td>9.7ms</td>
      </tr>
      <tr>
          <td>Small</td>
          <td>8.94M</td>
          <td>40.5%</td>
          <td>15.0ms</td>
      </tr>
      <tr>
          <td>Medium</td>
          <td>25.3M</td>
          <td>46.9%</td>
          <td>36.9ms</td>
      </tr>
      <tr>
          <td>Large</td>
          <td>54.2M</td>
          <td>49.7%</td>
          <td>54.0ms</td>
      </tr>
      <tr>
          <td>XLarge</td>
          <td>99.1M</td>
          <td>51.1%</td>
          <td>99.1ms</td>
      </tr>
  </tbody>
</table>
<h3 id="yolox-nano-for-edge-devices">YOLOX-Nano for Edge Devices</h3>
<p>YOLOX-Nano&rsquo;s 0.91M parameter count makes it one of the smallest YOLO-family models capable of real-time COCO detection. On a Jetson Nano with TensorRT FP16, it consistently exceeds 30 FPS at 416×416 input resolution — enough for most surveillance and robotics applications.</p>
<h2 id="installing-yolox-and-setting-up-your-python-environment">Installing YOLOX and Setting Up Your Python Environment</h2>
<p>Setting up YOLOX requires Python 3.8+, PyTorch 1.13 or 2.x, and the Megvii YOLOX package installed from source; installing from PyPI alone is not sufficient for training or advanced inference. The canonical install path clones the GitHub repo, installs dependencies from <code>requirements.txt</code>, and runs <code>pip install -e .</code> in editable mode. This matters for deployment because the source install makes the <code>yolox</code> package importable directly, which is required for loading experiment configs programmatically from Python. CUDA 11.8 or 12.x is recommended; YOLOX&rsquo;s ONNX export path calls <code>torch.onnx.export</code> internally, and running the exported model on GPU requires an <code>onnxruntime-gpu</code> build that matches your CUDA version. For Docker-based workflows, the project ships a pre-built <code>Dockerfile</code> that pins CUDA 11.8 + PyTorch 2.0, which is the safest starting point for reproducible deployments.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Clone and install</span>
</span></span><span style="display:flex;"><span>git clone https://github.com/Megvii-BaseDetection/YOLOX.git
</span></span><span style="display:flex;"><span>cd YOLOX
</span></span><span style="display:flex;"><span>pip install -r requirements.txt
</span></span><span style="display:flex;"><span>pip install -e .
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Download pretrained COCO weights (YOLOX-S as example)</span>
</span></span><span style="display:flex;"><span>wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth
</span></span></code></pre></div><h3 id="verifying-the-installation">Verifying the Installation</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> yolox
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.exp <span style="color:#f92672">import</span> get_exp
</span></span><span style="display:flex;"><span>exp <span style="color:#f92672">=</span> get_exp(exp_file<span style="color:#f92672">=</span><span style="color:#66d9ef">None</span>, exp_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;yolox_s&#34;</span>)
</span></span><span style="display:flex;"><span>print(exp<span style="color:#f92672">.</span>num_classes)  <span style="color:#75715e"># Should print 80 for COCO</span>
</span></span></code></pre></div><p>If this succeeds without import errors, the editable install is working correctly and you can proceed to inference or training.</p>
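<p>Before moving on, it can help to confirm that the other packages used later in this guide are present. The following is a small sketch using only the standard library; the module list is an assumption based on the backends covered here, so trim it to what you actually deploy.</p>

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules can be located, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names assumed for this guide; adjust to the backends you deploy
REQUIRED = ["torch", "yolox", "cv2", "onnx", "onnxruntime"]

for name, found in check_modules(REQUIRED).items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

<p>Because <code>find_spec</code> only locates the package, this check is fast and safe to run in a container health check, but it does not prove that CUDA-dependent imports will succeed at runtime.</p>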
<h2 id="running-yolox-inference-with-pretrained-coco-weights">Running YOLOX Inference with Pretrained COCO Weights</h2>
<p>Running YOLOX inference with pretrained COCO weights requires three steps: loading the experiment config, building the model, and running preprocessing, the forward pass, and post-processing. The <code>Predictor</code> class in <code>tools/demo.py</code> wraps this pipeline (aspect-preserving resize and padding to the input shape, forward pass, then <code>postprocess</code> for confidence filtering and NMS) into a single <code>.inference()</code> call that returns the detections together with an <code>img_info</code> dict holding the resize ratio. With the current, non-legacy weights the pixel values are passed through without ImageNet normalization, which is why the code below uses <code>ValTransform(legacy=False)</code>. YOLOX pretrained weights are released for all six variants and cover the full 80-class COCO vocabulary. For custom use cases, the pretrained weights detect those 80 common object categories immediately after download, without any fine-tuning. The model outputs raw predictions before NMS, which developers can intercept for custom post-processing (e.g., tracking integration where you want raw detections rather than filtered boxes).</p>
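<p>One detail of the resize step is worth making explicit: boxes come back in the coordinate space of the resized network input, so they have to be divided by the resize ratio before drawing them on the source image. The sketch below assumes the image is scaled to fit and placed at the top-left of the padded canvas, so no offset needs to be subtracted; both helper names are hypothetical.</p>

```python
from typing import List, Tuple


def letterbox_ratio(img_hw: Tuple[int, int], input_hw: Tuple[int, int]) -> float:
    """Scale factor that fits the image inside the input canvas, preserving aspect ratio."""
    return min(input_hw[0] / img_hw[0], input_hw[1] / img_hw[1])


def to_original_coords(box: List[float], ratio: float, img_hw: Tuple[int, int]) -> List[float]:
    """Map an (x1, y1, x2, y2) box from network input space back to the source image."""
    height, width = img_hw
    x1, y1, x2, y2 = (value / ratio for value in box)
    # Clamp to the image bounds in case the box spills into the padded area
    return [max(0.0, x1), max(0.0, y1), min(float(width), x2), min(float(height), y2)]


# A 1280x720 frame fed to a 640x640 network input
ratio = letterbox_ratio((720, 1280), (640, 640))
print(ratio)
print(to_original_coords([100.0, 50.0, 300.0, 200.0], ratio, (720, 1280)))
```

<p>When using the <code>Predictor</code> class, the equivalent ratio is available from the returned image info, so this step reduces to a single division on the box columns.</p>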
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> cv2
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> torch
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.exp <span style="color:#f92672">import</span> get_exp
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.utils <span style="color:#f92672">import</span> fuse_model, get_model_info
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.data.data_augment <span style="color:#f92672">import</span> ValTransform
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.utils <span style="color:#f92672">import</span> postprocess
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load experiment config</span>
</span></span><span style="display:flex;"><span>exp <span style="color:#f92672">=</span> get_exp(exp_file<span style="color:#f92672">=</span><span style="color:#66d9ef">None</span>, exp_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;yolox_s&#34;</span>)
</span></span><span style="display:flex;"><span>exp<span style="color:#f92672">.</span>test_conf <span style="color:#f92672">=</span> <span style="color:#ae81ff">0.3</span>   <span style="color:#75715e"># confidence threshold</span>
</span></span><span style="display:flex;"><span>exp<span style="color:#f92672">.</span>nmsthre <span style="color:#f92672">=</span> <span style="color:#ae81ff">0.45</span>    <span style="color:#75715e"># NMS IoU threshold</span>
</span></span><span style="display:flex;"><span>exp<span style="color:#f92672">.</span>test_size <span style="color:#f92672">=</span> (<span style="color:#ae81ff">640</span>, <span style="color:#ae81ff">640</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>model <span style="color:#f92672">=</span> exp<span style="color:#f92672">.</span>get_model()
</span></span><span style="display:flex;"><span>model<span style="color:#f92672">.</span>eval()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load weights</span>
</span></span><span style="display:flex;"><span>ckpt <span style="color:#f92672">=</span> torch<span style="color:#f92672">.</span>load(<span style="color:#e6db74">&#34;yolox_s.pth&#34;</span>, map_location<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;cpu&#34;</span>)
</span></span><span style="display:flex;"><span>model<span style="color:#f92672">.</span>load_state_dict(ckpt[<span style="color:#e6db74">&#34;model&#34;</span>])
</span></span><span style="display:flex;"><span>model<span style="color:#f92672">.</span>cuda()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Inference</span>
</span></span><span style="display:flex;"><span>preproc <span style="color:#f92672">=</span> ValTransform(legacy<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>)
</span></span><span style="display:flex;"><span>img <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>imread(<span style="color:#e6db74">&#34;image.jpg&#34;</span>)
</span></span><span style="display:flex;"><span>img_tensor, _ <span style="color:#f92672">=</span> preproc(img, <span style="color:#66d9ef">None</span>, exp<span style="color:#f92672">.</span>test_size)
</span></span><span style="display:flex;"><span>img_tensor <span style="color:#f92672">=</span> torch<span style="color:#f92672">.</span>from_numpy(img_tensor)<span style="color:#f92672">.</span>unsqueeze(<span style="color:#ae81ff">0</span>)<span style="color:#f92672">.</span>float()<span style="color:#f92672">.</span>cuda()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> torch<span style="color:#f92672">.</span>no_grad():
</span></span><span style="display:flex;"><span>    outputs <span style="color:#f92672">=</span> model(img_tensor)
</span></span><span style="display:flex;"><span>    outputs <span style="color:#f92672">=</span> postprocess(outputs, exp<span style="color:#f92672">.</span>num_classes, exp<span style="color:#f92672">.</span>test_conf, exp<span style="color:#f92672">.</span>nmsthre)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># outputs[0]: tensor of shape (N, 7) — x1, y1, x2, y2, obj_conf, class_conf, class_id</span>
</span></span><span style="display:flex;"><span>boxes <span style="color:#f92672">=</span> outputs[<span style="color:#ae81ff">0</span>][:, :<span style="color:#ae81ff">4</span>]
</span></span><span style="display:flex;"><span>scores <span style="color:#f92672">=</span> outputs[<span style="color:#ae81ff">0</span>][:, <span style="color:#ae81ff">4</span>] <span style="color:#f92672">*</span> outputs[<span style="color:#ae81ff">0</span>][:, <span style="color:#ae81ff">5</span>]
</span></span><span style="display:flex;"><span>class_ids <span style="color:#f92672">=</span> outputs[<span style="color:#ae81ff">0</span>][:, <span style="color:#ae81ff">6</span>]<span style="color:#f92672">.</span>int()
</span></span></code></pre></div><h2 id="training-yolox-on-a-custom-dataset-step-by-step">Training YOLOX on a Custom Dataset (Step-by-Step)</h2>
<p>Training YOLOX on a custom dataset follows a structured pipeline: annotate images in COCO JSON format, create an experiment (<code>.py</code>) config file that subclasses <code>Exp</code>, set <code>num_classes</code> and dataset paths, then launch training with the <code>train.py</code> entrypoint. YOLOX&rsquo;s experiment system is pure Python: every hyperparameter (batch size, learning rate schedule, augmentation policy, input resolution) lives in the experiment file, making it version-controllable and reproducible across environments. The COCO JSON format is the single required format; Roboflow, CVAT, and Labelbox all export to it directly. For datasets under 5,000 images, the recommended starting point is YOLOX-S pretrained weights with a 50-epoch schedule (set <code>max_epoch</code> accordingly; the example config below uses 100), cosine LR decay, and the default MixUp + Mosaic augmentation, which achieves competitive mAP without overfitting. For larger datasets (10K+), enabling multi-scale training (input sizes ranging from 448 to 832) typically adds 1–2 mAP points at the cost of ~30% longer training time.</p>
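<p>The 448 to 832 range quoted above follows from expressing the multi-scale range in multiples of the network stride. A short sketch of that arithmetic, assuming a stride of 32 and square inputs (the function name is hypothetical):</p>

```python
from typing import List, Tuple


def multiscale_sizes(random_size: Tuple[int, int], stride: int = 32) -> List[int]:
    """Square input sizes implied by a (min, max) range of stride multiples."""
    low, high = random_size
    return [stride * k for k in range(low, high + 1)]


sizes = multiscale_sizes((14, 26))
print(sizes[0], sizes[-1], len(sizes))
```

<p>A setting of <code>(14, 26)</code> therefore samples from 13 resolutions between 448 and 832 pixels; narrowing the range reduces the training-time overhead.</p>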
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># custom_exp.py — minimal custom dataset config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> yolox.exp <span style="color:#f92672">import</span> Exp <span style="color:#66d9ef">as</span> MyExp
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">CustomExp</span>(MyExp):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> __init__(self):
</span></span><span style="display:flex;"><span>        super()<span style="color:#f92672">.</span>__init__()
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>num_classes <span style="color:#f92672">=</span> <span style="color:#ae81ff">5</span>          <span style="color:#75715e"># your class count</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>depth <span style="color:#f92672">=</span> <span style="color:#ae81ff">0.33</span>             <span style="color:#75715e"># YOLOX-S depth multiplier</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>width <span style="color:#f92672">=</span> <span style="color:#ae81ff">0.50</span>             <span style="color:#75715e"># YOLOX-S width multiplier</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>max_epoch <span style="color:#f92672">=</span> <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>data_dir <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/data/coco_custom&#34;</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>train_ann <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;instances_train.json&#34;</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>val_ann <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;instances_val.json&#34;</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>input_size <span style="color:#f92672">=</span> (<span style="color:#ae81ff">640</span>, <span style="color:#ae81ff">640</span>)
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>random_size <span style="color:#f92672">=</span> (<span style="color:#ae81ff">14</span>, <span style="color:#ae81ff">26</span>)   <span style="color:#75715e"># multi-scale training</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>warmup_epochs <span style="color:#f92672">=</span> <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Launch training (8 GPUs)</span>
</span></span><span style="display:flex;"><span>python tools/train.py -f custom_exp.py -d <span style="color:#ae81ff">8</span> -b <span style="color:#ae81ff">64</span> --fp16 -o <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -c yolox_s.pth   <span style="color:#75715e"># load pretrained COCO weights</span>
</span></span></code></pre></div><h3 id="dataset-format-requirements">Dataset Format Requirements</h3>
<p>YOLOX requires COCO-style JSON annotations with <code>images</code>, <code>annotations</code>, and <code>categories</code> keys. Each annotation needs <code>bbox</code> in <code>[x, y, width, height]</code> format (not <code>[x1, y1, x2, y2]</code>). The <code>category_id</code> values do not have to start at a particular number: YOLOX&rsquo;s COCO dataset loader maps each <code>category_id</code> to a contiguous 0-based class index by its position in the sorted list of category IDs, so <code>num_classes</code> must equal the number of entries in <code>categories</code>. A mismatch between the two, or boxes stored in corner format, is a common cause of silent training failures where loss drops but mAP stays at zero.</p>
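<p>Checks like these are cheap to automate before a long training run. The sketch below validates the basic structure of an annotation dict already loaded with <code>json.load</code> and shows one way to build a contiguous class index from the category IDs. Both function names and the sample data are hypothetical, and the checks are a starting point rather than a full COCO validator.</p>

```python
from typing import Dict, List


def category_index_map(coco: dict) -> Dict[int, int]:
    """Map each category id to a contiguous 0-based class index (sorted by id)."""
    ids = sorted(cat["id"] for cat in coco["categories"])
    return {cat_id: index for index, cat_id in enumerate(ids)}


def find_problems(coco: dict) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    required = ("images", "annotations", "categories")
    problems = [f"missing key: {key}" for key in required if key not in coco]
    if problems:
        return problems
    known = category_index_map(coco)
    image_ids = {img["id"] for img in coco["images"]}
    for ann in coco["annotations"]:
        _, _, width, height = ann["bbox"]  # expected as [x, y, width, height]
        if not (width > 0 and height > 0):
            problems.append(f"annotation {ann['id']}: bbox width/height must be positive")
        if ann["category_id"] not in known:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
    return problems


# Tiny made-up dataset with one valid and one broken annotation
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 3, "name": "cat"}, {"id": 7, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7, "bbox": [20, 30, 100, 50]},
        {"id": 11, "image_id": 1, "category_id": 9, "bbox": [5, 5, 0, 40]},
    ],
}
print(category_index_map(sample))
print(find_problems(sample))
```

<p>The length of the category map is also the value to use for <code>num_classes</code> in the experiment file.</p>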
<h2 id="exporting-yolox-to-onnx-for-cross-platform-deployment">Exporting YOLOX to ONNX for Cross-Platform Deployment</h2>
<p>Exporting YOLOX to ONNX decouples the model from PyTorch and unlocks deployment on any runtime that supports the ONNX standard, including ONNX Runtime (CPU/GPU), TensorRT, and OpenVINO. YOLOX ships a <code>tools/export_onnx.py</code> script that handles the export pipeline and, unless told otherwise, simplifies the graph with <code>onnxsim</code>. The resulting ONNX graph takes a single <code>(1, 3, H, W)</code> float32 input, with the resolution taken from the experiment&rsquo;s <code>test_size</code>, and returns a <code>(1, num_proposals, 5 + num_classes)</code> tensor. With <code>--decode_in_inference</code> the graph decodes grid offsets into pixel-space boxes; without it, decoding is left to the caller. In both cases confidence filtering and NMS remain a post-processing step outside the graph. Exporting with <code>--decode_in_inference</code> and <code>--no-onnxsim</code> produces the most portable graph for runtime compatibility; simplifying with <code>onnxsim</code> (the script&rsquo;s default, or a separate pass afterwards) improves performance by 10–15% on ONNX Runtime CPU. ONNX opset 11 is the safe minimum; opset 17 enables more aggressive fusion in TensorRT 10.</p>
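<p>When the exported graph returns raw proposals, confidence filtering and NMS have to run on the host. As an illustration of what that step does, here is a minimal pure-Python sketch of greedy NMS over corner-format boxes. It is class-agnostic for brevity, whereas YOLOX&rsquo;s own <code>postprocess</code> suppresses boxes per class, and a production service would use a vectorized implementation instead.</p>

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Sequence[float]], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already-kept box
        if not any(iou(boxes[i], boxes[j]) > iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

<p>The default threshold of 0.45 mirrors the <code>nmsthre</code> value used in the inference example earlier in this guide.</p>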
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Export to ONNX</span>
</span></span><span style="display:flex;"><span>python tools/export_onnx.py <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --output-name yolox_s.onnx <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -f custom_exp.py <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -c best_ckpt.pth <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --batch-size <span style="color:#ae81ff">1</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --decode_in_inference
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Simplify (optional but recommended)</span>
</span></span><span style="display:flex;"><span>python -m onnxsim yolox_s.onnx yolox_s_simplified.onnx
</span></span></code></pre></div><h3 id="running-inference-with-onnx-runtime">Running Inference with ONNX Runtime</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> onnxruntime <span style="color:#66d9ef">as</span> ort
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> numpy <span style="color:#66d9ef">as</span> np
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> cv2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>sess <span style="color:#f92672">=</span> ort<span style="color:#f92672">.</span>InferenceSession(<span style="color:#e6db74">&#34;yolox_s_simplified.onnx&#34;</span>,
</span></span><span style="display:flex;"><span>                             providers<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;CUDAExecutionProvider&#34;</span>, <span style="color:#e6db74">&#34;CPUExecutionProvider&#34;</span>])
</span></span><span style="display:flex;"><span>input_name <span style="color:#f92672">=</span> sess<span style="color:#f92672">.</span>get_inputs()[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>name
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Preprocess: resize only; non-legacy YOLOX weights expect raw 0-255 BGR pixels</span>
</span></span><span style="display:flex;"><span>img <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>imread(<span style="color:#e6db74">&#34;image.jpg&#34;</span>)
</span></span><span style="display:flex;"><span>img_resized <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>resize(img, (<span style="color:#ae81ff">640</span>, <span style="color:#ae81ff">640</span>))  <span style="color:#75715e"># plain resize; YOLOX itself pads to keep the aspect ratio</span>
</span></span><span style="display:flex;"><span>img_input <span style="color:#f92672">=</span> img_resized<span style="color:#f92672">.</span>transpose(<span style="color:#ae81ff">2</span>, <span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">1</span>)  <span style="color:#75715e"># HWC→CHW, channel order stays BGR</span>
</span></span><span style="display:flex;"><span>img_input <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>ascontiguousarray(img_input, dtype<span style="color:#f92672">=</span>np<span style="color:#f92672">.</span>float32)[np<span style="color:#f92672">.</span>newaxis]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>outputs <span style="color:#f92672">=</span> sess<span style="color:#f92672">.</span>run(<span style="color:#66d9ef">None</span>, {input_name: img_input})
</span></span><span style="display:flex;"><span><span style="color:#75715e"># outputs[0]: shape (1, num_anchors, 85) for 80-class COCO</span>
</span></span></code></pre></div><h2 id="tensorrt-deployment-for-maximum-gpu-throughput">TensorRT Deployment for Maximum GPU Throughput</h2>
<p>TensorRT deployment of YOLOX reduces inference latency by 2–4× over ONNX Runtime GPU through kernel fusion, precision calibration, and layer-level optimization specific to the NVIDIA architecture. The recommended path for YOLOX TensorRT deployment in 2026 is: export to ONNX (opset 17), then use <code>trtexec</code> or the TensorRT Python API to build a serialized engine with INT8 calibration for maximum throughput. YOLOX&rsquo;s architecture is fully compatible with TensorRT 10.x layer types — no custom plugins are required for standard variants. INT8 calibration requires a calibration dataset of 200–500 representative images; using the validation set works well in practice. On an A100 GPU, TensorRT INT8 YOLOX-S achieves sub-5ms inference at batch size 8, enabling throughput of 1,600+ frames per second. For edge deployment on Jetson Orin, TensorRT FP16 YOLOX-Nano sustains 120+ FPS at 416×416, suitable for real-time video analytics at 30–60 FPS camera input with headroom for preprocessing.</p>
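<p>The throughput figure above is plain arithmetic on batch size and latency, which is useful when sizing a deployment. A small sketch, using the numbers quoted in this section (the function name is hypothetical):</p>

```python
def throughput_fps(batch_size: int, latency_ms: float) -> float:
    """Frames per second when one batch completes every latency_ms milliseconds."""
    return batch_size * 1000.0 / latency_ms


# Batch of 8 finishing in 5 ms, as in the A100 INT8 example above
print(throughput_fps(8, 5.0))

# Per-frame time budget for a 30 FPS camera stream, in milliseconds
print(round(1000.0 / 30, 1))
```

<p>Note that the calculation assumes the GPU is kept fully fed; preprocessing, host-to-device copies, and NMS on the CPU all eat into the real end-to-end number.</p>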
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> tensorrt <span style="color:#66d9ef">as</span> trt
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> numpy <span style="color:#66d9ef">as</span> np
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> pycuda.driver <span style="color:#66d9ef">as</span> cuda
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> pycuda.autoinit
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>TRT_LOGGER <span style="color:#f92672">=</span> trt<span style="color:#f92672">.</span>Logger(trt<span style="color:#f92672">.</span>Logger<span style="color:#f92672">.</span>WARNING)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">build_engine</span>(onnx_path, precision<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;fp16&#34;</span>):
</span></span><span style="display:flex;"><span>    builder <span style="color:#f92672">=</span> trt<span style="color:#f92672">.</span>Builder(TRT_LOGGER)
</span></span><span style="display:flex;"><span>    config <span style="color:#f92672">=</span> builder<span style="color:#f92672">.</span>create_builder_config()
</span></span><span style="display:flex;"><span>    config<span style="color:#f92672">.</span>set_memory_pool_limit(trt<span style="color:#f92672">.</span>MemoryPoolType<span style="color:#f92672">.</span>WORKSPACE, <span style="color:#ae81ff">4</span> <span style="color:#f92672">&lt;&lt;</span> <span style="color:#ae81ff">30</span>)  <span style="color:#75715e"># 4GB</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> precision <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;fp16&#34;</span>:
</span></span><span style="display:flex;"><span>        config<span style="color:#f92672">.</span>set_flag(trt<span style="color:#f92672">.</span>BuilderFlag<span style="color:#f92672">.</span>FP16)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">elif</span> precision <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;int8&#34;</span>:
</span></span><span style="display:flex;"><span>        config<span style="color:#f92672">.</span>set_flag(trt<span style="color:#f92672">.</span>BuilderFlag<span style="color:#f92672">.</span>INT8)
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Set up INT8 calibrator here</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    network <span style="color:#f92672">=</span> builder<span style="color:#f92672">.</span>create_network(<span style="color:#ae81ff">1</span> <span style="color:#f92672">&lt;&lt;</span> int(trt<span style="color:#f92672">.</span>NetworkDefinitionCreationFlag<span style="color:#f92672">.</span>EXPLICIT_BATCH))
</span></span><span style="display:flex;"><span>    parser <span style="color:#f92672">=</span> trt<span style="color:#f92672">.</span>OnnxParser(network, TRT_LOGGER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> open(onnx_path, <span style="color:#e6db74">&#34;rb&#34;</span>) <span style="color:#66d9ef">as</span> f:
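</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> parser<span style="color:#f92672">.</span>parse(f<span style="color:#f92672">.</span>read()):  <span style="color:#75715e"># parse() returns False on failure</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">raise</span> <span style="color:#a6e22e">RuntimeError</span>(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Failed to parse </span><span style="color:#e6db74">{</span>onnx_path<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)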
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    serialized_engine <span style="color:#f92672">=</span> builder<span style="color:#f92672">.</span>build_serialized_network(network, config)
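</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;yolox_s_</span><span style="color:#e6db74">{</span>precision<span style="color:#e6db74">}</span><span style="color:#e6db74">.engine&#34;</span>, <span style="color:#e6db74">&#34;wb&#34;</span>) <span style="color:#66d9ef">as</span> f: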
</span></span><span style="display:flex;"><span>        f<span style="color:#f92672">.</span>write(serialized_engine)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>build_engine(<span style="color:#e6db74">&#34;yolox_s_simplified.onnx&#34;</span>, precision<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;fp16&#34;</span>)
</span></span></code></pre></div><h3 id="int8-calibration-for-maximum-throughput">INT8 Calibration for Maximum Throughput</h3>
<p>INT8 quantization gives YOLOX its highest throughput on NVIDIA GPUs by running matrix multiplications in 8-bit integer arithmetic. With around 500 calibration images, accuracy degrades by less than 0.5 mAP for YOLOX-S, making INT8 viable for most production use cases where the absolute accuracy ceiling is not critical.</p>
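<p>To make the role of the calibration images concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization. It assumes a simple max-abs calibration rule, which is a simplification: TensorRT&rsquo;s calibrators choose the range differently (for example by entropy). The function names and the activation values are hypothetical.</p>

```python
def calibrate_scale(calibration_values):
    """Pick a symmetric scale so the largest observed magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in codes]


# Activations observed while running calibration images (made-up numbers).
calibration_activations = [-2.0, -0.5, 0.0, 0.75, 1.27, 2.54]
scale = calibrate_scale(calibration_activations)

# Values seen at inference time; 5.0 lies outside the calibrated range.
runtime_activations = [0.1, 1.0, 2.54, 5.0]
codes = quantize(runtime_activations, scale)
restored = dequantize(codes, scale)
errors = [abs(a - b) for a, b in zip(runtime_activations, restored)]
print(codes, errors)
```

<p>Values inside the calibrated range round-trip with an error of at most half a quantization step, while the out-of-range value saturates at code 127. That saturation is why the calibration set should cover the activation range the model sees in production.</p>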
<h2 id="openvino-and-ncnn-yolox-on-cpu-and-mobile-devices">OpenVINO and ncnn: YOLOX on CPU and Mobile Devices</h2>
<p>OpenVINO and ncnn target the two most common non-NVIDIA deployment environments: Intel CPU servers and ARM mobile/embedded devices, respectively. OpenVINO&rsquo;s Model Optimizer (<code>mo</code>) converts the YOLOX ONNX graph into an Intermediate Representation (IR) that the OpenVINO Runtime executes on Intel CPU, integrated GPU, or Myriad VPU with automatic layer fusion. On a modern Intel Xeon server (Ice Lake or Sapphire Rapids), OpenVINO FP32 YOLOX-S achieves 45–60ms per frame, which is comparable to a mid-range NVIDIA GPU and significantly faster than raw PyTorch CPU inference. ncnn, developed by Tencent, uses ARM NEON intrinsics and is the preferred path for Android and iOS deployment; the YOLOX repo includes a first-party ncnn conversion guide that handles the custom post-processing operators that ncnn&rsquo;s ONNX importer would otherwise reject. For Intel CPU cloud instances (AWS c7i, Azure Dv5), OpenVINO INT8 post-training quantization with NNCF adds roughly 2× throughput over FP32 with less than 1% mAP drop on COCO.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Export to OpenVINO IR</span>
</span></span><span style="display:flex;"><span>mo --input_model yolox_s_simplified.onnx <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>   --output_dir openvino_model/ <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>   --input_shape <span style="color:#f92672">[</span>1,3,640,640<span style="color:#f92672">]</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>   --data_type FP32
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Run OpenVINO inference (the lines below are Python, not shell; run them in a script or REPL)</span>
</span></span><span style="display:flex;"><span>from openvino.runtime import Core
</span></span><span style="display:flex;"><span>core <span style="color:#f92672">=</span> Core<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span>model <span style="color:#f92672">=</span> core.read_model<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;openvino_model/yolox_s_simplified.xml&#34;</span><span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>compiled <span style="color:#f92672">=</span> core.compile_model<span style="color:#f92672">(</span>model, <span style="color:#e6db74">&#34;CPU&#34;</span><span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>infer_req <span style="color:#f92672">=</span> compiled.create_infer_request<span style="color:#f92672">()</span>
</span></span></code></pre></div><h3 id="ncnn-deployment-on-android">ncnn Deployment on Android</h3>
<p>ncnn deployment requires first converting the exported YOLOX ONNX model to ncnn&rsquo;s Caffe-like <code>.param</code>/<code>.bin</code> format using <code>onnx2ncnn</code>. The YOLOX repo&rsquo;s <code>demo/ncnn</code> directory includes a C++ inference example with Android JNI wrappers, enabling 20–40 FPS object detection on Snapdragon 8xx-class devices at 416×416 resolution.</p>
<h2 id="building-a-production-api-with-fastapi-and-docker">Building a Production API with FastAPI and Docker</h2>
<p>Building a production YOLOX inference API with FastAPI and Docker creates a self-contained, horizontally scalable object detection microservice that accepts image uploads via HTTP and returns structured JSON bounding-box annotations. The architecture separates model loading (done once at startup) from request handling, using FastAPI&rsquo;s <code>lifespan</code> context to load the YOLOX ONNX session before the first request arrives, which keeps model-loading latency off the request path. Docker Compose manages the service stack: a GPU-enabled inference container built on an NVIDIA CUDA 12.1 / cuDNN 8 runtime image from <code>nvcr.io/nvidia/cuda</code>, optionally fronted by Nginx for SSL termination and request queuing. For high-throughput scenarios, the inference container can use a request batching middleware that accumulates requests over a 10ms window and processes them in a single batched ONNX Runtime call, improving GPU utilization from ~30% (single-request mode) to 85%+ under load. The <code>main.py</code> below handles one request at a time and does not include that middleware.</p>
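<p>The request batching idea can be sketched with plain <code>asyncio</code>. This is a minimal illustration rather than a drop-in middleware: <code>MicroBatcher</code>, <code>run_batch</code>, and <code>max_batch</code> are hypothetical names, the 10ms window comes from the description above, and the stand-in model simply doubles its inputs.</p>

```python
import asyncio


class MicroBatcher:
    """Collects items for up to `window_s` seconds, then runs them as one batch."""

    def __init__(self, run_batch, window_s=0.010, max_batch=8):
        self.run_batch = run_batch  # callable: list of inputs -> list of outputs
        self.window_s = window_s
        self.max_batch = max_batch
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Called once per request; returns when the batch holding `item` has run."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def worker(self):
        """Background task: drain the queue in time-boxed batches."""
        loop = asyncio.get_running_loop()
        while True:
            pending = [await self.queue.get()]  # block until the first request
            deadline = loop.time() + self.window_s
            while len(pending) != self.max_batch:
                remaining = deadline - loop.time()
                if remaining > 0:
                    try:
                        pending.append(
                            await asyncio.wait_for(self.queue.get(), remaining)
                        )
                    except asyncio.TimeoutError:
                        break  # window elapsed with no new request
                else:
                    break
            results = self.run_batch([item for item, _ in pending])
            for (_, future), result in zip(pending, results):
                future.set_result(result)


async def demo():
    batch_sizes = []

    def fake_model(batch):
        # Stand-in for a batched session.run call.
        batch_sizes.append(len(batch))
        return [x * 2 for x in batch]

    batcher = MicroBatcher(fake_model)
    task = asyncio.create_task(batcher.worker())
    outputs = await asyncio.gather(*(batcher.submit(i) for i in range(5)))
    task.cancel()
    return outputs, batch_sizes


outputs, batch_sizes = asyncio.run(demo())
print(outputs, batch_sizes)
```

<p>In a real service the <code>run_batch</code> callable would stack the preprocessed images and call the ONNX Runtime session once. Because that call blocks, it would normally be dispatched to a thread pool, and error handling would need to propagate exceptions to every waiting future.</p>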
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># main.py — FastAPI YOLOX inference service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> contextlib <span style="color:#f92672">import</span> asynccontextmanager
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> fastapi <span style="color:#f92672">import</span> FastAPI, UploadFile, File
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> fastapi.responses <span style="color:#f92672">import</span> JSONResponse
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> numpy <span style="color:#66d9ef">as</span> np
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> onnxruntime <span style="color:#66d9ef">as</span> ort
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> cv2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>MODEL_PATH <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;yolox_s_simplified.onnx&#34;</span>
</span></span><span style="display:flex;"><span>COCO_CLASSES <span style="color:#f92672">=</span> [<span style="color:#f92672">...</span>]  <span style="color:#75715e"># 80-class list</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@asynccontextmanager</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">lifespan</span>(app: FastAPI):
</span></span><span style="display:flex;"><span>    app<span style="color:#f92672">.</span>state<span style="color:#f92672">.</span>session <span style="color:#f92672">=</span> ort<span style="color:#f92672">.</span>InferenceSession(
</span></span><span style="display:flex;"><span>        MODEL_PATH,
</span></span><span style="display:flex;"><span>        providers<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;CUDAExecutionProvider&#34;</span>, <span style="color:#e6db74">&#34;CPUExecutionProvider&#34;</span>]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    app<span style="color:#f92672">.</span>state<span style="color:#f92672">.</span>input_name <span style="color:#f92672">=</span> app<span style="color:#f92672">.</span>state<span style="color:#f92672">.</span>session<span style="color:#f92672">.</span>get_inputs()[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>name
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">yield</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>app <span style="color:#f92672">=</span> FastAPI(lifespan<span style="color:#f92672">=</span>lifespan)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@app.post</span>(<span style="color:#e6db74">&#34;/detect&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">detect</span>(file: UploadFile <span style="color:#f92672">=</span> File(<span style="color:#f92672">...</span>), conf_thresh: float <span style="color:#f92672">=</span> <span style="color:#ae81ff">0.3</span>):
</span></span><span style="display:flex;"><span>    contents <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> file<span style="color:#f92672">.</span>read()
</span></span><span style="display:flex;"><span>    img_array <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>frombuffer(contents, np<span style="color:#f92672">.</span>uint8)
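</span></span><span style="display:flex;"><span>    img <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>imdecode(img_array, cv2<span style="color:#f92672">.</span>IMREAD_COLOR)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> img <span style="color:#f92672">is</span> <span style="color:#66d9ef">None</span>:  <span style="color:#75715e"># imdecode returns None for undecodable bytes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> JSONResponse({<span style="color:#e6db74">&#34;error&#34;</span>: <span style="color:#e6db74">&#34;could not decode image&#34;</span>}, status_code<span style="color:#f92672">=</span><span style="color:#ae81ff">400</span>)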
</span></span><span style="display:flex;"><span>
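</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Preprocess: plain resize for brevity; current (non-legacy) YOLOX checkpoints expect BGR, 0-255 input</span>
</span></span><span style="display:flex;"><span>    img_resized <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>resize(img, (<span style="color:#ae81ff">640</span>, <span style="color:#ae81ff">640</span>))
</span></span><span style="display:flex;"><span>    inp <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>ascontiguousarray(img_resized<span style="color:#f92672">.</span>transpose(<span style="color:#ae81ff">2</span>, <span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">1</span>)[np<span style="color:#f92672">.</span>newaxis], dtype<span style="color:#f92672">=</span>np<span style="color:#f92672">.</span>float32)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Inference: with --decode_in_inference the output is (1, N, 5 + num_classes)</span>
</span></span><span style="display:flex;"><span>    preds <span style="color:#f92672">=</span> app<span style="color:#f92672">.</span>state<span style="color:#f92672">.</span>session<span style="color:#f92672">.</span>run(<span style="color:#66d9ef">None</span>, {app<span style="color:#f92672">.</span>state<span style="color:#f92672">.</span>input_name: inp})[<span style="color:#ae81ff">0</span>][<span style="color:#ae81ff">0</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Post-process: each row is [cx, cy, w, h, obj_conf, per-class scores...]</span>
</span></span><span style="display:flex;"><span>    cls_ids <span style="color:#f92672">=</span> preds[:, <span style="color:#ae81ff">5</span>:]<span style="color:#f92672">.</span>argmax(axis<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>)
</span></span><span style="display:flex;"><span>    scores <span style="color:#f92672">=</span> preds[:, <span style="color:#ae81ff">4</span>] <span style="color:#f92672">*</span> preds[:, <span style="color:#ae81ff">5</span>:]<span style="color:#f92672">.</span>max(axis<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>)
</span></span><span style="display:flex;"><span>    keep <span style="color:#f92672">=</span> scores <span style="color:#f92672">&gt;=</span> conf_thresh
</span></span><span style="display:flex;"><span>    preds, scores, cls_ids <span style="color:#f92672">=</span> preds[keep], scores[keep], cls_ids[keep]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Rescale to the original image and convert to [x, y, w, h] for OpenCV NMS</span>
</span></span><span style="display:flex;"><span>    sx, sy <span style="color:#f92672">=</span> img<span style="color:#f92672">.</span>shape[<span style="color:#ae81ff">1</span>] <span style="color:#f92672">/</span> <span style="color:#ae81ff">640</span>, img<span style="color:#f92672">.</span>shape[<span style="color:#ae81ff">0</span>] <span style="color:#f92672">/</span> <span style="color:#ae81ff">640</span>
</span></span><span style="display:flex;"><span>    xywh <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>stack([(preds[:, <span style="color:#ae81ff">0</span>] <span style="color:#f92672">-</span> preds[:, <span style="color:#ae81ff">2</span>] <span style="color:#f92672">/</span> <span style="color:#ae81ff">2</span>) <span style="color:#f92672">*</span> sx, (preds[:, <span style="color:#ae81ff">1</span>] <span style="color:#f92672">-</span> preds[:, <span style="color:#ae81ff">3</span>] <span style="color:#f92672">/</span> <span style="color:#ae81ff">2</span>) <span style="color:#f92672">*</span> sy,
</span></span><span style="display:flex;"><span>                     preds[:, <span style="color:#ae81ff">2</span>] <span style="color:#f92672">*</span> sx, preds[:, <span style="color:#ae81ff">3</span>] <span style="color:#f92672">*</span> sy], axis<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Class-agnostic NMS for brevity; the 0.45 IoU threshold is illustrative</span>
</span></span><span style="display:flex;"><span>    idxs <span style="color:#f92672">=</span> cv2<span style="color:#f92672">.</span>dnn<span style="color:#f92672">.</span>NMSBoxes(xywh<span style="color:#f92672">.</span>tolist(), scores<span style="color:#f92672">.</span>tolist(), conf_thresh, <span style="color:#ae81ff">0.45</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    detections <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> i <span style="color:#f92672">in</span> np<span style="color:#f92672">.</span>array(idxs)<span style="color:#f92672">.</span>flatten():
</span></span><span style="display:flex;"><span>        x, y, w, h <span style="color:#f92672">=</span> xywh[i]
</span></span><span style="display:flex;"><span>        detections<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;bbox&#34;</span>: [float(x), float(y), float(x <span style="color:#f92672">+</span> w), float(y <span style="color:#f92672">+</span> h)],
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;score&#34;</span>: float(scores[i]),
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;class&#34;</span>: COCO_CLASSES[int(cls_ids[i])]
</span></span><span style="display:flex;"><span>        })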
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> JSONResponse({<span style="color:#e6db74">&#34;detections&#34;</span>: detections, <span style="color:#e6db74">&#34;count&#34;</span>: len(detections)})
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#75715e"># Dockerfile</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> nvcr.io/nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apt-get update <span style="color:#f92672">&amp;&amp;</span> apt-get install -y python3-pip libgl1-mesa-glx libglib2.0-0<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> requirements.txt .<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> pip3 install -r requirements.txt<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> . /app<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /app</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">CMD</span> [<span style="color:#e6db74">&#34;uvicorn&#34;</span>, <span style="color:#e6db74">&#34;main:app&#34;</span>, <span style="color:#e6db74">&#34;--host&#34;</span>, <span style="color:#e6db74">&#34;0.0.0.0&#34;</span>, <span style="color:#e6db74">&#34;--port&#34;</span>, <span style="color:#e6db74">&#34;8000&#34;</span>, <span style="color:#e6db74">&#34;--workers&#34;</span>, <span style="color:#e6db74">&#34;1&#34;</span>]<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><h2 id="yolox-vs-yolov8-vs-yolo11-which-to-use-in-2026">YOLOX vs YOLOv8 vs YOLO11: Which to Use in 2026?</h2>
<p>Choosing between YOLOX, YOLOv8, and YOLO11 in 2026 depends on three factors: ecosystem maturity, accuracy ceiling, and deployment backend flexibility. YOLO11x outperforms YOLOX-X in accuracy (54.7 mAP vs 51.1 mAP) while using roughly half the parameters (56.9M vs 99.1M), making YOLO11 the default choice for greenfield projects targeting Ultralytics&rsquo; supported export formats. YOLOv8 and YOLO11 benefit from Ultralytics&rsquo; unified training API and actively maintained export pipelines for all major runtimes. YOLOX remains the better choice in three specific scenarios: (1) existing research pipelines built on the Megvii codebase where migration cost is prohibitive, (2) ncnn deployment on Android/iOS where YOLOX&rsquo;s first-party ncnn support is better maintained than Ultralytics&rsquo; ncnn path, and (3) academic reproducibility requirements where YOLOX&rsquo;s published COCO checkpoints and experiment configs match specific paper results. For new production deployments starting in 2026, YOLO11 is generally the right choice unless one of these three scenarios applies.</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>COCO mAP</th>
          <th>Parameters</th>
          <th>TensorRT FP16 latency (T4)</th>
          <th>Ecosystem</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>YOLOX-X</td>
          <td>51.1%</td>
          <td>99.1M</td>
          <td>~40ms</td>
          <td>Megvii (OSS)</td>
      </tr>
      <tr>
          <td>YOLOv8x</td>
          <td>53.9%</td>
          <td>68.2M</td>
          <td>~28ms</td>
          <td>Ultralytics</td>
      </tr>
      <tr>
          <td>YOLO11x</td>
          <td>54.7%</td>
          <td>56.9M</td>
          <td>~24ms</td>
          <td>Ultralytics</td>
      </tr>
  </tbody>
</table>
<h3 id="when-yolox-is-still-the-right-call">When YOLOX Is Still the Right Call</h3>
<p>If your deployment targets ncnn on ARM or your codebase is already built on YOLOX experiment configs, migration to Ultralytics carries real risk for marginal accuracy gain. In those cases, staying on YOLOX with a well-tuned TensorRT export is the pragmatic choice.</p>
<h2 id="performance-benchmarks-and-real-world-considerations">Performance Benchmarks and Real-World Considerations</h2>
<p>Real-world YOLOX performance depends heavily on hardware, input resolution, batch size, and post-processing strategy; published COCO benchmarks measure model quality, not end-to-end pipeline throughput. In production video analytics pipelines, the bottleneck often shifts from inference to frame capture and preprocessing: a single T4 GPU running YOLOX-S in TensorRT FP16 can process 1,200 frames per second of pure inference, but a naively implemented OpenCV video decode loop caps at 200 FPS. To fully utilize GPU throughput, preprocessing (resize, normalize) must be moved to the GPU using NVIDIA DALI or custom CUDA kernels, which is why production deployments often show 3–4× end-to-end speedup from pipeline optimization beyond model-level tuning alone. For CPU-only deployments on cloud instances, OpenVINO consistently outperforms raw ONNX Runtime by 40–60% on Intel CPUs thanks to its oneDNN (formerly MKL-DNN) kernels, with further gains available from INT8 quantization through NNCF. Memory bandwidth is the primary constraint on CPU inference; YOLOX-S&rsquo;s 8.94M parameters (about 36 MB in FP32) fit in the L3 cache of many modern Xeon processors, which helps keep inference latency predictable under concurrent load. The practical recommendation: benchmark your specific hardware and input pipeline end-to-end, not just model inference in isolation, before selecting a model variant and runtime.</p>
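<p>One way to act on that recommendation is a small per-stage timing harness. The sketch below uses only the standard library; <code>benchmark_stages</code> is a hypothetical helper, and the three lambda stages are stand-ins that should be replaced with your real decode, preprocess, and inference callables.</p>

```python
import statistics
import time


def benchmark_stages(stages, make_input, iterations=50, warmup=5):
    """Time each named stage and the end-to-end total, in milliseconds."""
    timings = {name: [] for name, _ in stages}
    timings["end_to_end"] = []
    for i in range(warmup + iterations):
        data = make_input()
        start = time.perf_counter()
        stage_times = {}
        for name, fn in stages:
            t0 = time.perf_counter()
            data = fn(data)  # each stage consumes the previous stage's output
            stage_times[name] = (time.perf_counter() - t0) * 1000.0
        total = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # discard warm-up iterations (caches, lazy init)
            for name, ms in stage_times.items():
                timings[name].append(ms)
            timings["end_to_end"].append(total)
    return {
        name: {
            "median_ms": statistics.median(values),
            "p95_ms": sorted(values)[int(0.95 * (len(values) - 1))],
        }
        for name, values in timings.items()
    }


# Stand-in stages; swap in real decode, preprocess, and inference callables.
stages = [
    ("decode", lambda x: [v + 1 for v in x]),
    ("preprocess", lambda x: [v * 0.5 for v in x]),
    ("inference", lambda x: sum(x)),
]
report = benchmark_stages(stages, make_input=lambda: list(range(1000)), iterations=20)
for name, stats in report.items():
    print(f"{name:12s} median={stats['median_ms']:.3f} ms  p95={stats['p95_ms']:.3f} ms")
```

<p>Reporting the median and a high percentile per stage makes it obvious whether decode, preprocessing, or inference dominates, which is the information needed before choosing between a smaller model and a faster input pipeline.</p>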
<h3 id="latency-vs-throughput-trade-offs">Latency vs Throughput Trade-offs</h3>
<p>For real-time single-stream applications (one camera at 30 FPS), latency matters more than throughput: optimize for minimum per-frame latency by using batch size 1 and a TensorRT engine built for your exact input resolution. Builder timing settings such as <code>IBuilderConfig::setAvgTimingIterations</code> influence how reliably the builder picks the fastest kernels; note that <code>IBuilderConfig::setMinTimingIterations</code> is deprecated in recent TensorRT releases. For batch processing or multi-stream pipelines, maximize GPU utilization by increasing batch size and using CUDA streams to overlap asynchronous inference with preprocessing.</p>
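<p>The trade-off can be made concrete with a little arithmetic. The sketch below assumes a single stream filling each batch, and the inference times are illustrative placeholders rather than measurements; the function names are hypothetical.</p>

```python
def frame_budget_ms(fps):
    """Time available per frame before a stream at `fps` falls behind."""
    return 1000.0 / fps


def worst_case_latency_ms(batch_size, fps, batch_infer_ms):
    """Latency of the oldest frame in a batch filled from a single stream.

    The first frame waits for (batch_size - 1) later frames to arrive,
    then for the whole batch to be processed.
    """
    wait_ms = (batch_size - 1) * frame_budget_ms(fps)
    return wait_ms + batch_infer_ms


def throughput_fps(batch_size, batch_infer_ms):
    """Frames per second the model sustains at a given batch size."""
    return batch_size * 1000.0 / batch_infer_ms


# Illustrative numbers only: 4 ms per batch of 1, 10 ms per batch of 8.
single = worst_case_latency_ms(batch_size=1, fps=30, batch_infer_ms=4.0)
batched = worst_case_latency_ms(batch_size=8, fps=30, batch_infer_ms=10.0)
print(f"batch 1: {single:.1f} ms latency, {throughput_fps(1, 4.0):.0f} FPS capacity")
print(f"batch 8: {batched:.1f} ms latency, {throughput_fps(8, 10.0):.0f} FPS capacity")
```

<p>Under these assumed numbers, batching raises model capacity from 250 to 800 FPS but pushes the oldest frame&rsquo;s latency from 4 ms to roughly 243 ms on one 30 FPS stream, which is why batching pays off mainly when many streams fill a batch together.</p>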
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Q: Can YOLOX run without a GPU?</strong></p>
<p>Yes. YOLOX supports CPU inference through PyTorch, ONNX Runtime CPU, and OpenVINO. For CPU-only deployments, use YOLOX-Nano or YOLOX-Tiny to stay within practical latency budgets (under 200ms per frame on modern server CPUs).</p>
<p><strong>Q: What is the difference between YOLOX decoupled head and the original YOLO head?</strong></p>
<p>Traditional YOLO heads use a single convolutional branch to predict class probabilities and bounding box coordinates simultaneously. YOLOX&rsquo;s decoupled head uses separate branches for each task, which the authors argue reduces the conflict between the classification and localization objectives, resulting in faster convergence and an improvement of roughly 1.1 AP over the coupled-head YOLOv3 baseline in the YOLOX paper.</p>
<p><strong>Q: How do I convert YOLOX weights to ONNX for deployment?</strong></p>
<p>Run <code>python tools/export_onnx.py -f your_exp.py -c best_ckpt.pth --output-name model.onnx --decode_in_inference</code>. Then optionally run <code>onnxsim model.onnx model_simplified.onnx</code> to reduce graph complexity. The resulting ONNX model runs on ONNX Runtime, TensorRT, and OpenVINO without modification.</p>
<p><strong>Q: What is the minimum image size for YOLOX inference?</strong></p>
<p>YOLOX processes images at the resolution specified in the experiment config (<code>test_size</code>). The minimum practical resolution is 320×320 for YOLOX-Nano; smaller inputs significantly degrade detection of small objects. The network itself does not pad: preprocessing resizes each image to fit <code>test_size</code> while preserving aspect ratio and pads the remainder, and because the largest feature-map stride is 32, any <code>test_size</code> divisible by 32 is valid.</p>
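<p>The resize-and-pad geometry can be sketched in a few lines. This assumes the resized image is placed in the top-left corner of the canvas, which matches my reading of YOLOX&rsquo;s reference preprocessing; verify that against your checkout. The helper names are hypothetical.</p>

```python
def letterbox_geometry(img_h, img_w, test_size=(640, 640)):
    """Return the resize ratio and resized (h, w) for an aspect-preserving fit.

    The resized image is assumed to sit in the top-left corner of the
    test_size canvas; the remaining area is padding.
    """
    ratio = min(test_size[0] / img_h, test_size[1] / img_w)
    return ratio, (int(img_h * ratio), int(img_w * ratio))


def box_to_original(box_xyxy, ratio):
    """Map a box predicted on the padded canvas back to original pixels."""
    return [coord / ratio for coord in box_xyxy]


def valid_test_size(test_size, stride=32):
    """True if both sides are positive multiples of the largest stride."""
    return all(side > 0 and side % stride == 0 for side in test_size)


ratio, resized_hw = letterbox_geometry(img_h=720, img_w=1280)
original_box = box_to_original([100.0, 50.0, 300.0, 200.0], ratio)
print(ratio, resized_hw, original_box)
print(valid_test_size((416, 416)), valid_test_size((400, 400)))
```

<p>With top-left placement there is no padding offset to subtract, so mapping a box back to the source image is a single division by the resize ratio.</p>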
<p><strong>Q: How does YOLOX compare to YOLO11 for edge deployment in 2026?</strong></p>
<p>YOLOX-Nano (0.91M params) is smaller than YOLO11n (2.6M params) and has better-maintained first-party ncnn support for ARM deployment. For Jetson and mobile Android targets, YOLOX-Nano remains competitive. For Intel-based edge devices (NUC, industrial PCs), YOLO11 with OpenVINO export typically edges out YOLOX due to Ultralytics&rsquo; more actively maintained OpenVINO pipeline.</p>
]]></content:encoded></item></channel></rss>