Because I wanted to run a local [[Artificial Intelligence]] platform called [[Ollama]], I wanted to make sure my GPU was fully utilized, since GPUs are the type of hardware best suited for these [[Vector database|vector]] calculations<ref>https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#the-benefits-of-using-gpus</ref>. I also have a 'decent' GPU, an [[PC Build 2024#Video Card (GPU)|Nvidia GeForce RTX 4060]] (the best you could get when I built the system in 2024). Trying to install the latest Nvidia driver set me off on a week-long journey of learning, frustration and perseverance, discovering the inner workings of Ubuntu 24.04, Xorg, the Linux kernel and kernel modules, DRM, Secure Boot, initramfs and more.  


I still do not have the Nvidia driver loaded, even after 40+ reboots and attempts. Instead I'm using the Nouveau driver, but at least I have a working system. I believe I've finally figured out what needs to be done to disable Nouveau and install Nvidia, a project I am now approaching with greater scrutiny. I'm documenting the things that I encounter on this journey.


Why not just continue to use [https://nouveau.freedesktop.org/index.html Nouveau], a project of the [https://www.freedesktop.org/wiki/ freedesktop] community? I mean, "if it ain't broke, don't fix it", right? In principle, I'd very much like to use Nouveau. I'm not even sure that any alternative is "better" in any way, especially since ''I am not a gamer''<ref>I'm not opposed to gaming in any way; I just don't have the time to add another hobby. This clarifies my use case and therefore my requirements.</ref>. My use case is getting the best performance from local LLMs. As I become familiar with the methods for switching video drivers reliably, I intend to run benchmarks and compare one configuration against another.
 
== Status ==
As of 2025-07-03, I'm still not running with an NVIDIA driver. According to [https://www.reddit.com/r/Ubuntu/comments/1li7wg7/how_do_i_install_nvidiadriver575_correctly/ a Reddit thread] from just days earlier, getting the right combination together has always been rather messy. The advice there was to "upgrade your system either to Ubuntu 25.04 for Wayland experience and no working suspend to RAM, or to 24.04 if you need suspend to RAM, but are Ok with using X11 instead of Wayland."
 
Since I'm already on 24.04 and have tried X11 instead of Wayland without success, I plan to make sure my home directory is on its own partition and then reinstall the OS with 25.04.
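
Before reinstalling, it is worth confirming whether <code>/home</code> is already a separate filesystem. A minimal sketch, assuming the util-linux <code>mountpoint</code> and <code>findmnt</code> utilities that ship with a stock Ubuntu install; <code>home_partition_status</code> is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether /home is its own mount point.
# mountpoint -q exits 0 only if the given directory is a mount point.
home_partition_status() {
  if mountpoint -q /home; then
    # findmnt -n -o SOURCE prints the device backing the mount, without a header
    echo "/home is a separate filesystem: $(findmnt -n -o SOURCE /home)"
  else
    echo "/home is on the root filesystem; back it up or move it before reinstalling"
  fi
}

home_partition_status
```

If it reports the root filesystem, the home directory would need to be backed up (or moved to its own partition) before the reinstall.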


== Opposite ==


== About this System ==
In your desktop environment, you can open 'System Settings' -> '[[About this System]]' ([https://docs.kde.org/stable5/en/kinfocenter/kinfocenter/index.html KInfoCenter]) to display basic information about your software and hardware environment, including the 'graphics processor'. Mine says '''NV197''', which is the codename the Nouveau project gives the card<ref>https://nouveau.freedesktop.org/CodeNames.html</ref>. Clicking 'Show More Information' reveals a multi-tab dialog with extensive graphics information for OpenCL, OpenGL, Vulkan, Window Manager and X-Server.


You can also get these details from CLI commands such as <code>glxinfo</code> and <code>lspci</code>.


'''OpenGL version''' string: 4.3 (Compatibility Profile) Mesa 24.2.8-1ubuntu1~24.04.1
If you are on a TTY (no graphical display, so <code>glxinfo</code> won't run), <code>lspci</code> at least identifies the card:
<code>lspci | grep VGA</code>
01:00.0 '''VGA''' compatible controller: NVIDIA Corporation AD107 [GeForce RTX 4060] (rev a1)
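
Plain <code>lspci</code> only identifies the hardware. Adding <code>-k</code> also lists the kernel driver bound to each device, which answers the more useful question of whether nouveau or nvidia is actually in control. A small sketch: <code>vga_driver_in_use</code> is a hypothetical helper, and it assumes the usual <code>lspci -k</code> layout where the driver appears on an indented "Kernel driver in use:" line under the device:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lspci -k` output on stdin and prints the
# "Kernel driver in use" for every VGA compatible controller it finds.
vga_driver_in_use() {
  awk '
    /VGA compatible controller/ { in_vga = 1; next }   # start of a VGA device block
    /^[^ \t]/                   { in_vga = 0 }         # any other device line ends it
    in_vga && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use: */, "")
      print
    }'
}

# Usage: lspci -k | vga_driver_in_use
```

It prints one line per VGA controller, so a machine that also has an Intel iGPU enabled shows two drivers. A GPU that <code>lspci</code> lists as a "3D controller" rather than "VGA compatible controller" would be missed by this pattern.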
If the Nvidia driver installation fails, you won't have a functioning GPU at all, because the nouveau driver is no longer available either. The same <code>glxinfo</code> command then shows "llvmpipe" as the renderer:
'''OpenGL renderer''' string: llvmpipe (LLVM 19.1.1, 256 bits)
'''OpenGL version''' string: 4.5 (Compatibility Profile) Mesa 24.2.8-1ubuntu1~24.04.1
[https://docs.mesa3d.org/drivers/llvmpipe.html LLVMpipe] is a software rasterizer within the Mesa 3D graphics library that utilizes the LLVM compiler infrastructure to perform rendering entirely on the CPU. It acts as a software fallback when a dedicated GPU or its drivers are unavailable or malfunctioning, allowing OpenGL applications to run without hardware acceleration. Essentially, LLVMpipe takes over the rendering process when the GPU can't or shouldn't be used.
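
Since the renderer string is the quickest tell of what is actually drawing the screen, a tiny classifier saves squinting at <code>glxinfo</code> output between reboots. A sketch only: the function name is hypothetical, and the patterns assume the renderer strings seen on this machine (llvmpipe, the Intel RPL-S iGPU, nouveau's NV197 codename) plus an assumed string beginning with "NVIDIA" for the proprietary driver:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads glxinfo output on stdin and prints a one-word
# label for the OpenGL renderer in use.
renderer_kind() {
  local renderer
  renderer=$(grep -m1 'OpenGL renderer string' | sed 's/.*OpenGL renderer string: *//')
  case "$renderer" in
    *llvmpipe*)         echo "llvmpipe" ;;  # software rendering on the CPU
    NVIDIA*)            echo "nvidia" ;;    # proprietary NVIDIA driver (assumed string)
    NV[0-9]*|*nouveau*) echo "nouveau" ;;   # nouveau reports a codename such as NV197
    *Intel*)            echo "intel" ;;     # integrated graphics via Mesa
    *)                  echo "unknown" ;;
  esac
}

# Usage: glxinfo -B | renderer_kind
```

<code>glxinfo -B</code> prints only the brief summary, which is enough here and much shorter than the full extension list.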
If <code>dpkg</code> shows that '''xserver-xorg-video-nouveau''' is installed, you can switch to it from, e.g., "Driver Manager" in Settings. 
Synaptic will allow you to view drivers, but you won't be able to switch from that interface (you'll get an error message about a lock file).
Although switching drivers from the system settings interface appears to complete without error, I'm not sure how well it works - if at all.
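
Checking for the package from the command line avoids guessing. A minimal sketch using <code>dpkg-query</code>; <code>pkg_installed</code> is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named Debian package is fully installed.
# dpkg-query prints the package's Status field, e.g. "install ok installed".
pkg_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed'
}

if pkg_installed xserver-xorg-video-nouveau; then
  echo "nouveau Xorg driver is installed"
else
  echo "nouveau Xorg driver is not installed"
fi
```

A package that was removed but not purged reports "deinstall ok config-files", which this check correctly treats as not installed.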
I was getting a broken desktop (single monitor, no useful output from tools like nvidia-smi) after installing the Nvidia drivers, so I tried switching back to nouveau, and it somehow eventually worked.  <pre>
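# Run as root (e.g. from a recovery-mode root shell) or prefix each command with sudo.
# Separate commands, so one pattern matching nothing does not stop the rest.
apt-get purge -y '^libnvidia-.*'
apt-get purge -y '^nvidia-.*'
apt-get purge -y '.*-575.*'
apt-get -y autoremove
apt-get -y autoclean
shutdown -r now
# Boot into recovery mode, then edit the kernel parameters and regenerate grub.cfg
vim /etc/default/grub
update-grub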
</pre>After a couple of reboots, changing the 'modeline' and 'nosplash' boot options from a recovery console or the TTY, I didn't seem to get anywhere. But when I issued a <code>startx</code> command, the system started the GNOME desktop instead of KDE (?!!??), and I had dual monitors again. Amazingly, nvidia-smi returned results, but glxinfo now says that I'm using the integrated graphics on the CPU (not the GPU), though at least it doesn't say llvmpipe:
'''OpenGL renderer''' string: Mesa Intel(R) Graphics ('''RPL-S''') 
'''OpenGL version''' string: 4.6 (Compatibility Profile) Mesa 24.2.8-1ubuntu1~24.04.1<pre>
sudo lsmod|grep -i nvidia
nvidia_uvm          2158592  4
nvidia_drm            139264  5
nvidia_modeset      1736704  6 nvidia_drm
nvidia              11550720  81 nvidia_uvm,nvidia_modeset
ecc                    45056  2 ecdh_generic,nvidia
video                  77824  3 xe,i915,nvidia_modeset
</pre>GRUB is currently back to its default ('normal') settings:<pre>
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
</pre>The NVIDIA persistence daemon is running <pre>
systemctl list-units --type service --all | grep nvidia
  nvidia-persistenced.service                          loaded    active    running      NVIDIA Persistence Daemon
</pre>[[DKMS|dkms]] shows that kernel modules are installed for two kernels<pre>
dkms status
nvidia/575.57.08, 6.8.0-60-generic, x86_64: installed (Original modules exist)
nvidia/575.57.08, 6.8.0-62-generic, x86_64: installed (Original modules exist)
</pre>
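
The two entries above are for 6.8.0-60 and 6.8.0-62, so the obvious follow-up is whether the kernel actually booted is one of them. A sketch: <code>nvidia_dkms_for_kernel</code> is a hypothetical helper, and it assumes the <code>dkms status</code> line format shown above (<code>module/version, kernel, arch: state</code>); older DKMS releases separate module and version with a comma instead of a slash:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `dkms status` output on stdin and succeeds if an
# nvidia module is in the "installed" state for the given kernel version.
nvidia_dkms_for_kernel() {
  local kernel="$1"
  grep '^nvidia/' | grep -F ", ${kernel}, " | grep -q ': installed'
}

# Compare against the running kernel.
if dkms status 2>/dev/null | nvidia_dkms_for_kernel "$(uname -r)"; then
  echo "nvidia module installed for running kernel $(uname -r)"
else
  echo "no installed nvidia module for running kernel $(uname -r)"
fi
```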


== GUI is stuck ==


== NVidia ==
Documentation for installing NVidia drivers is at https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/
 
The installation guide for the v570 of the driver (46 chapters) is at https://download.nvidia.com/XFree86/Linux-x86_64/570.153.02/README/  


I've read the whole thing.  
We explore these in more detail below.


Over at StackExchange, a user asked [https://unix.stackexchange.com/questions/352828/how-to-switch-nvidia-driver-from-nouveau-to-nvidia-proprietary how to switch graphics driver from nouveau to nvidia] and succeeded in part by '''modifying the boot parameters in GRUB''' to deny nouveau. Note that the boot parameters were used only during the transition, to stop using one driver while installing the other. It is not a configuration that gives you two GRUB boot menu entries for two graphics modes.
 
Over in the Manjaro Linux forums, a user asked a similar question: [https://forum.manjaro.org/t/how-do-i-switch-between-nvidia-and-nouveau-drivers-on-boot/92044 How do I switch between Nvidia and Nouveau drivers on boot?] They tried using
 
<code>modprobe.blacklist=nvidia systemd.setenv=GPUMOD=nouveau rd.driver.blacklist=nvidia nouveau.modeset=1 nvidia.modeset=0</code>
 
But they ultimately had to install the OS twice, on different disk partitions, so they could choose at boot which system (and therefore which graphics driver) to use.
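
Whichever parameters you try, it helps to confirm what the running kernel was actually booted with, since edits made at the GRUB menu (or in <code>/etc/default/grub</code> without running <code>update-grub</code>) are easy to lose track of. A sketch with a hypothetical helper name; it only inspects <code>modprobe.blacklist=</code>, which takes a comma-separated list of modules:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a kernel command line on stdin and succeeds if the
# given module appears in a modprobe.blacklist= parameter (comma-separated list).
cmdline_blacklists() {
  local module="$1"
  tr ' ' '\n' \
    | grep '^modprobe\.blacklist=' \
    | cut -d= -f2- \
    | tr ',' '\n' \
    | grep -qx -- "$module"
}

for mod in nouveau nvidia; do
  if cmdline_blacklists "$mod" < /proc/cmdline; then
    echo "$mod is blacklisted on this boot"
  else
    echo "$mod is not blacklisted on the kernel command line"
  fi
done
```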


=== Denylist ===
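
A common way to deny nouveau is a modprobe configuration file plus an initramfs rebuild, so the setting also applies in early boot. A sketch, assuming the two directives usually used for this (and, as far as I know, written by NVIDIA's own .run installer): <code>blacklist nouveau</code> and <code>options nouveau modeset=0</code>. The helper name and the file name <code>blacklist-nouveau.conf</code> are my own choices, and the function takes the target directory as an argument so it can be tried on a scratch directory first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a modprobe config that stops nouveau from loading.
# $1 = target directory; normally /etc/modprobe.d (requires root).
write_nouveau_denylist() {
  local dir="$1"
  cat > "${dir}/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Real use, as root (then reboot):
#   write_nouveau_denylist /etc/modprobe.d && update-initramfs -u
```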
On systems with Secure Boot enabled (like mine), you most likely need to sign the module. See [https://download.nvidia.com/XFree86/Linux-x86_64/570.153.02/README/installdriver.html#modulesigning Signing NVIDIA Kernel Module]. However, I didn't get an explicit message that signing was a problem, and I did see that the installation process signs the module with a generated key. I assume that the MOK (Machine Owner Key) process hooks into the trust system somehow.


When troubleshooting keeps turning up mysteries, you have to check your assumptions.<syntaxhighlight lang="text">
sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
</syntaxhighlight>See [[Nvidia on Ubuntu/Kernel modules|Kernel modules]]
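
"Key was rejected by service" means the kernel refused a module whose signature does not chain to a key it trusts; with Secure Boot on, a freshly generated DKMS key only counts once it has been enrolled through the MOK manager on the next boot. A sketch of the checks worth running: the function name is hypothetical, <code>mokutil</code> and <code>modinfo</code> are standard tools, and the certificate path is an assumption (where Ubuntu's DKMS setup usually keeps its key, so verify it on your system):

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the Secure Boot / module-signing facts relevant to
# "Key was rejected by service". Run as root for complete mokutil output.
secure_boot_report() {
  # Assumed location of the DKMS signing certificate on Ubuntu; override via $1.
  local cert="${1:-/var/lib/shim-signed/mok/MOK.der}"

  # Prints "SecureBoot enabled" or "SecureBoot disabled"
  mokutil --sb-state 2>/dev/null || echo "mokutil: Secure Boot state unavailable"

  # Signer recorded in the nvidia module that modprobe would load (blank if unsigned)
  echo "nvidia module signer: $(modinfo -F signer nvidia 2>/dev/null)"

  # Reports whether the certificate is already enrolled as a Machine Owner Key
  if [ -f "$cert" ]; then
    mokutil --test-key "$cert" 2>&1 || true
  else
    echo "certificate not found: $cert"
  fi
}

secure_boot_report
```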
== Tools and Troubleshooting ==
Ubuntu wants you to use the '[[Nvidia on Ubuntu/ubuntu-drivers|ubuntu-drivers]]' tool<ref>https://documentation.ubuntu.com/server/how-to/graphics/install-nvidia-drivers/</ref>.
NVIDIA seems to have settled on a new mechanism<ref>https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/</ref> instead of the (former?) downloadable .run installers: add their repository keyring with <code>wget <nowiki>https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb</nowiki></code>, <code>sudo dpkg -i cuda-keyring_1.1-1_all.deb</code> and <code>sudo apt update</code>, then install the driver with <code>sudo apt install nvidia-open</code>.
NVIDIA distributes a script called <code>nvidia-bug-report.sh</code> that you can and should run<ref>https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/27131
</ref> to collect detailed information about any problems.
== Interesting Notes ==
Usually, having 'sudo' or root privileges lets you do '''more'''. One exception is the X server: root access to the server may be restricted. In that case, running <code>glxinfo</code> as root gives
<code>Error: unable to open display</code>
while a regular user has no problem running <code>glxinfo</code>.


=== Different Desktop Environments ===
As a regular user, my desktop environment is KDE Plasma (via Kubuntu) rather than Ubuntu's default GNOME.


{{Subpages|}}


{{References}}