Driver Wars: nvidia-drivers-375.39 vs. kernel-4.11.0-rc4

Oh my gosh, it hurts so bad; I’m having trouble deciding where to even begin. I haven’t seen it this bad in years. I’m going to do things a little bit differently this time. I’m going to post the patch first, instead of last, then break it down from there. So, here’s everything I had to change to get this to work (minus one gotcha that will send me spiraling into a rant):

Phew. What a mess.

Let’s start with the fence stuff. Nvidia still hasn’t fixed this, which actually surprises me considering their vigilance when it comes to kernel changes in the last year or so. The fence stuff was prefixed with dma_ back in October 2016. So the driver code must change accordingly, and that’s why you see so many fence changes in the patch.
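For anyone who would rather not hard-code the rename, here is a minimal sketch of a version-gated alias layer. It is a userspace mock-up, not the nvidia code: the nv_fence_t and nv_fence_* names are hypothetical, the structs are stand-ins for the real ones in the kernel headers, and I am assuming the rename shipped in 4.10.

```c
#include <assert.h>

/* Mock of <linux/version.h>; a real module would include that header. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 11, 0)
#endif

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0) /* assumed cutoff */
/* Stand-in for the renamed API (really in <linux/dma-fence.h>). */
struct dma_fence { unsigned int seqno; int signaled; };
static int dma_fence_is_signaled(const struct dma_fence *f) { return f->signaled; }

/* Hypothetical driver-side aliases. */
typedef struct dma_fence nv_fence_t;
#define nv_fence_is_signaled(f) dma_fence_is_signaled(f)
#else
/* Stand-in for the old, unprefixed API (really in <linux/fence.h>). */
struct fence { unsigned int seqno; int signaled; };
static int fence_is_signaled(const struct fence *f) { return f->signaled; }

typedef struct fence nv_fence_t;
#define nv_fence_is_signaled(f) fence_is_signaled(f)
#endif

/* Driver code is written once, against the aliases only. */
static int nv_fence_done(const nv_fence_t *f)
{
    assert(f);
    return nv_fence_is_signaled(f) ? 1 : 0;
}
```

Building with -DLINUX_VERSION_CODE set to an older value flips the aliases back to the unprefixed names without touching nv_fence_done.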

Next up is a change to the vm_fault struct. The nvidia code has some compiler tests in kernel/conftest.h that appear to be responsible for catching this sort of difference between kernel versions, but for some reason they fail here, so I just hard-code the change of the virtual_address member to address, since I know what kernel version I’m on and couldn’t care less about backwards compatibility.
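If you do care about older kernels, the usual trick is to hide the member behind a tiny accessor. A rough sketch, again as a userspace mock-up: the struct below is a one-field stand-in for the real struct vm_fault, nv_fault_address is a hypothetical helper name, and the 4.10 cutoff is my assumption about when the member changed.

```c
#include <assert.h>

/* Mock of <linux/version.h>; a real module would include that header. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 11, 0)
#endif

/* One-field stand-in for struct vm_fault (really in <linux/mm.h>). */
struct vm_fault {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0) /* assumed cutoff */
    unsigned long address;   /* new member */
#else
    void *virtual_address;   /* old member */
#endif
};

/* Hypothetical helper: the only place that knows which member exists. */
static unsigned long nv_fault_address(const struct vm_fault *vmf)
{
    assert(vmf);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0)
    return vmf->address;
#else
    return (unsigned long)vmf->virtual_address;
#endif
}
```

The fault handler then calls nv_fault_address(vmf) everywhere instead of touching either member directly.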

Then we move on to the change to the return type of drm_driver.unload from int to void.
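The shape of that fix, sketched as a userspace mock-up: keep the actual teardown in one function and wrap it in a callback whose return type matches the kernel in use. The structs here are stripped-down stand-ins for the real DRM ones, and nv_drm_teardown, nv_drm_unload and nv_drm_driver are hypothetical names, not the ones in the nvidia source.

```c
#include <assert.h>

/* Mock of <linux/version.h>; a real module would include that header. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 11, 0)
#endif

/* Stripped-down stand-ins for the real DRM structs. */
struct drm_device { int loaded; };
struct drm_driver {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
    void (*unload)(struct drm_device *dev);  /* new: returns void */
#else
    int (*unload)(struct drm_device *dev);   /* old: returns int */
#endif
};

/* Hypothetical teardown shared by both variants. */
static void nv_drm_teardown(struct drm_device *dev)
{
    assert(dev);
    dev->loaded = 0;
}

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
static void nv_drm_unload(struct drm_device *dev)
{
    nv_drm_teardown(dev);
}
#else
static int nv_drm_unload(struct drm_device *dev)
{
    nv_drm_teardown(dev);
    return 0; /* the old signature expected a status code */
}
#endif

static const struct drm_driver nv_drm_driver = {
    .unload = nv_drm_unload,
};
```

Since I only care about 4.11, my patch just changes the return type outright; the #if is only worth it if the same source has to build against older kernels too.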

The drm/drm_encoder.h header now needs to be explicitly included because it was moved out of drm/drm_crtc.h.

After that, drm_helper_mode_fill_fb_struct now takes a drm_device as an additional parameter, so the call has to pass one.

Next-to-lastly we replace the silly NV_DRM_MODE_FB_CMD2_T macro, which doesn’t even work anyway (it’s another kernel/conftest.h fail), with the literal type, which is now a const struct drm_mode_fb_cmd2 instead of a struct drm_mode_fb_cmd2. This is a change I can’t take credit for. I took a hint from the other gurus on the nvidia developer forums.

And now for the real fun. After all of that the module compiles fine and thennnnn….. still won’t work. >.< You see, thanks to this concept of making some symbol exports in the kernel GPL-only, the module loader refuses to load our shiny new kernel module because somehow during all of these changes we’ve pulled in GPL-only functions in the kernel’s lib/refcount.c. Since the nvidia drivers are proprietary (and that is a problem you need to fix, nvidia!), the kernel now rejects them. Since I build my own kernel from source, this is an easy fix by just replacing EXPORT_SYMBOL_GPL with EXPORT_SYMBOL in my kernel source.

I have to disagree with the kernel hackers/devs and the hard-nose usage of GPL in general here. This is a pain in the rear, and only serves to discourage companies from making Linux drivers. It does not encourage them to make their drivers open-source and GPL licensed. This is the kind of thing that contributes to Linux not having the wide range of hardware support that Windows has. This is what gives people the impression that Linux is this arbitrary esoteric thing that is not practical for everyday use. This is why I still can’t run an audio production studio off of my Linux machine.

The real solution to this problem is this: Driver developers just need to make open-source drivers. I do support pushing that issue, but not at the expense of my desktop working.

So much politics in the world of open-source.
