基于open-gpu-kernel-modules的p2p vram映射bar1提高通信效率

news/2024/9/19 3:28:26 标签: p2p, 网络协议, 网络

背景

bar1 Base Address Register 1 用于内存映射的寄存器,定义了设备的内存映射区域,BAR1专门分配给gpu的一部分内存区域,允许cpu通过pcie总线直接访问显存VRAM中的数据。但bar1的大小是有限的,在常规的4090上,bar1只有256M,基于nvidia开源的open-gpu-kernel-modules模块通过将bar1的寄存器地址增大至32G来提高计算效率

系统版本

root@exai-165:~# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@exai-165:~# uname -a 
Linux exai-165 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

实施

  1. 编译开源的nvidia驱动模块
  2. 编译p2p模块

破解前bar1大小

root@exai-165:/opt# lspci -s 0000:81:00.0 -v
81:00.0 VGA compatible controller: NVIDIA Corporation Device 2684 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 167c
	Flags: bus master, fast devsel, latency 0, IRQ 164, IOMMU group 27
	Memory at b8000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 20030000000 (64-bit, prefetchable) [size=256M]  # 这里
	Memory at 20040000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 6000 [size=128]
	Expansion ROM at b9000000 [virtual] [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Physical Resizable BAR
	Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00] Lane Margining at the Receiver <?>
	Capabilities: [e00] Data Link Feature <?>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

nvidia驱动模块

卸载机器上原本的驱动

./NVIDIA-Linux-x86_64-535.183.01.run --uninstall

克隆开源的驱动
自行配置git使用代理

git clone --branch 550.54.15 --single-branch https://github.com/NVIDIA/open-gpu-kernel-modules.git
git branch
git checkout -b 550.54.15

因为机器上的CC和编译内核使用的gcc不是同一个版本,所以这里手工指定make使用哪个gcc

make CC=x86_64-linux-gnu-gcc-12 modules -j$(nproc)
make modules_install CC=x86_64-linux-gnu-gcc-12 modules -j$(nproc)

备注:通过机器上的多版本管理工具来实现cc版本管理不生效
验证

root@exai-165:~# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  550.54.15  Release Build  (root@exai-165)  2024年 09月 06日 星期五 10:49:38 CST
GCC version:  gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)

p2p_80">p2p

https://github.com/tinygrad/open-gpu-kernel-modules
克隆,编译,按照readme里面的来没啥问题

root@exai-165:/opt/nvidia-p2p/open-gpu-kernel-modules# ./install.sh 
make -C src/nvidia
make -C src/nvidia-modeset
make[1]: Entering directory '/opt/nvidia-p2p/open-gpu-kernel-modules/src/nvidia'
make[1]: Entering directory '/opt/nvidia-p2p/open-gpu-kernel-modules/src/nvidia-modeset'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/opt/nvidia-p2p/open-gpu-kernel-modules/src/nvidia-modeset'
cd kernel-open/nvidia-modeset/ && ln -sf ../../src/nvidia-modeset/_out/Linux_x86_64/nv-modeset-kernel.o nv-modeset-kernel.o_binary
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/opt/nvidia-p2p/open-gpu-kernel-modules/src/nvidia'
cd kernel-open/nvidia/ && ln -sf ../../src/nvidia/_out/Linux_x86_64/nv-kernel.o nv-kernel.o_binary
make -C kernel-open modules
make[1]: Entering directory '/opt/nvidia-p2p/open-gpu-kernel-modules/kernel-open'
make[2]: Entering directory '/usr/src/linux-headers-6.5.0-44-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  You are using:           cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
make[2]: Leaving directory '/usr/src/linux-headers-6.5.0-44-generic'
make[1]: Leaving directory '/opt/nvidia-p2p/open-gpu-kernel-modules/kernel-open'
make -C kernel-open modules_install
make[1]: Entering directory '/opt/nvidia-p2p/open-gpu-kernel-modules/kernel-open'
make[2]: Entering directory '/usr/src/linux-headers-6.5.0-44-generic'
  INSTALL /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia.ko
  INSTALL /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-uvm.ko
  INSTALL /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-modeset.ko
  INSTALL /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-drm.ko
  INSTALL /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-peermem.ko
  SIGN    /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-peermem.ko
  SIGN    /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-modeset.ko
  SIGN    /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-drm.ko
  SIGN    /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia.ko
  SIGN    /lib/modules/6.5.0-44-generic/kernel/drivers/video/nvidia-uvm.ko
  DEPMOD  /lib/modules/6.5.0-44-generic
Warning: modules_install: missing 'System.map' file. Skipping depmod.
make[2]: Leaving directory '/usr/src/linux-headers-6.5.0-44-generic'
make[1]: Leaving directory '/opt/nvidia-p2p/open-gpu-kernel-modules/kernel-open'
Fri Sep  6 15:24:49 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 30%   36C    P0             53W /  450W |       0MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:81:00.0 Off |                  Off |
| 31%   44C    P0             69W /  450W |       0MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:C1:00.0 Off |                  Off |
| 31%   39C    P0             55W /  450W |       0MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        Off |   00000000:C2:00.0 Off |                  Off |
| 31%   42C    P0             64W /  450W |       0MiB /  24564MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

验证

root@exai-165:/opt/nvidia-p2p/open-gpu-kernel-modules# lspci -s 0000:81:00.0 -v
81:00.0 VGA compatible controller: NVIDIA Corporation Device 2684 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 167c
	Flags: bus master, fast devsel, latency 0, IRQ 164, IOMMU group 27
	Memory at b8000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 18800000000 (64-bit, prefetchable) [size=32G]  # 这里
	Memory at 18400000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 6000 [size=128]
	Expansion ROM at b9000000 [virtual] [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Physical Resizable BAR
	Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00] Lane Margining at the Receiver <?>
	Capabilities: [e00] Data Link Feature <?>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

reference:
https://github.com/NVIDIA/open-gpu-kernel-modules
https://github.com/tinygrad/open-gpu-kernel-modules


http://www.niftyadmin.cn/n/5664904.html

相关文章

dcmtk的自动输入数据纠错模式对DICOMDIR读取的影响

软件版本 dcmtk 3.6.7 自动纠错的全局变量 输入数据的自动纠错是一个全局变量&#xff0c;定义在dcmtk/dcmdata/dcobject.h中&#xff0c;如下所示&#xff1a; /** This flags defines whether automatic correction should be applied to input* data (e.g.\ stripping …

【C语言】带你手把手拿捏指针(3)(含转移表)

文章目录 一、字符指针变量二、数组指针变量1.数组指针变量是什么2.数组指针变量的初始化 三、二维数组传参的本质四、函数指针变量1. 函数指针变量的创建2.函数指针的使用3.案例解析&#xff1a; 五、typedof关键字六、函数指针数组和转移表1.函数指针数组2.转移表 一、字符指…

TypeScript 枚举

枚举: 使用枚举我们可以定义一些带名字的常量。 使用枚举可以清晰地表达意图或创建一组有区别的用例。 TypeScript支持数字的和基于字符串的枚举。 例子&#xff1a; function a(sex: string) {console.log(张三的性别是:${sex});} a(男); a(狗); 用枚举&#xff1a; 1.en…

【大数据方案】智慧大数据平台总体建设方案书(word原件)

第1章 总体说明 1.1 建设背景 1.2 建设目标 1.3 项目建设主要内容 1.4 设计原则 第2章 对项目的理解 2.1 现状分析 2.2 业务需求分析 2.3 功能需求分析 第3章 大数据平台建设方案 3.1 大数据平台总体设计 3.2 大数据平台功能设计 3.3 平台应用 第4章 政策标准保障体系 4.1 政策…

交流电力控制电路之交流调功电路、交流电力电子开关

目录 一、交流调功电路 二、交流电力电子开关 交流调压电路可看&#xff1a;交流调压电路 交流调压电路、交流调功电路和交流电力开关的异同点&#xff1a; 一、交流调功电路 交流调功电路用于调节电力设备的功率输出&#xff0c;通过改变电路中电压、电流的有效值&#xff…

数据治理实施步骤

数据治理的实施步骤是一个系统性的过程&#xff0c;旨在确保数据的有效管理、使用和保护。以下是数据治理的一般实施步骤&#xff1a; 一、明确目标和策略 确定需求与目标&#xff1a;明确数据治理的需求和目标&#xff0c;如提高数据质量、保障数据安全、提升数据处理效率等。…

Linux配置静态IP详细步骤及联网问题,以及更改主机名问题

一&#xff0c;Linux配置静态IP详细步骤及联网问题 我的Linux操作系统版本是是CentOS7/CentOS8 1.网络适配器&#xff1a;NAT模式点击设置-网络适配器-网络连接 &#xff08;选择NAT模式&#xff09;-点击确定 2.查看网关相关配置点击 编辑-虚拟网络编辑器-选择VMnet8-点击更…

OJ题-合并K个已排序的链表

描述 合并 k 个升序的链表并将结果作为一个升序的链表返回其头节点。 数据范围&#xff1a;节点总数 0≤n≤50000≤n≤5000&#xff0c;每个节点的val满足 ∣val∣<1000∣val∣<1000 要求&#xff1a;时间复杂度 O(nlogn)O(nlogn) 下面给出C代码的两种方法&#xff1…