Hi guys, today's topic is one I've been wanting to cover for a long time now, and I figured it should be done before Vega arrives. So what is it about? Well, you often hear these mantras repeated: that under DX11 "AMD has more driver overhead" or "NVIDIA's drivers are much more efficient", while under DX12 or Vulkan we often hear the reverse, that "AMD is better at next-gen APIs". That's the observed effect, yet there have been few in-depth attempts at explaining why. It is a complex topic to delve into, and it took me a while to work out how to explain it as a simpler concept, in a way that lets gamers better appreciate the journey both NVIDIA and AMD have taken in recent years.

Before I start, let me make this disclaimer: I'm just a software guy, unaffiliated with any of these corporations. What I present is based on my own understanding, and I could be wrong. It is ultimately up to you, if you are curious enough, to do your own research if you have doubts about what I say.

In order to tell this tale and do it justice, we have to take a short trip back in history.
Let's start with Fermi, the architecture behind the GTX 480, the really hot, loud and power-hungry beast from NVIDIA back in 2010. The biggest reason it was so power hungry was that it actually had a very powerful hardware scheduler, with advanced features that pre-date AMD's GCN architecture. Fermi had very fast context switching, allowing it to interleave graphics and compute workloads quickly, and it supported concurrent execution of multiple different kernels, allowing PhysX or CUDA to run in parallel with graphics rendering. Do these things sound familiar? They should, because that hardware is DX12 and Vulkan capable; NVIDIA just never enabled it in their drivers. The problem with Fermi was that TSMC's early 40nm process was horrible, which exacerbated the power usage. NVIDIA did tame Fermi with a redesigned chip in the GTX 580, but that initial hot and hungry launch badly hurt NVIDIA's pride.
That experience changed the way NVIDIA designed GPU architectures from then on, with everything focused on efficiency. With Kepler, NVIDIA removed a major part of the hardware scheduler, making the GPU dependent on software, that is, the driver, to process and optimally distribute workloads, and this brought a good efficiency gain. Interestingly, just as NVIDIA moved away from hardware scheduling, AMD moved towards it in their new GCN architecture. The reasons are multi-faceted. For one, Fermi-based Teslas were dominating high-performance computing and supercomputing workloads, where a good hardware scheduler is a big advantage, and AMD wanted to chase that lucrative market, so they needed a powerful and dynamic GPU compute architecture. Secondly, AMD worked with Sony while developing GCN, and on consoles the APIs are vastly different from, and superior to, those on Windows PCs.

Now, with that short hardware history out of the way, let's move on to the inner details of how AMD's and NVIDIA's drivers currently operate with these different APIs.
Let's start with a typical DX11 game: light on overall thread utilization, but heavy on the primary thread. For AMD up to now, and for NVIDIA prior to the last few years, draw call processing in DX11 is restricted to the main rendering thread. Even if a game only utilizes one or two threads while the rest idle, it can become CPU bound really quickly.

DX11 actually has a way to reduce this single-thread bottleneck and make better use of multiple cores, called command lists and deferred contexts. But at the time nobody bothered with it; being single-thread bound was the status quo back then, as game devs relied on higher-clocked, higher-IPC CPU architectures to remove the bottleneck. It took a few years after DX11 came out for a game to use its multi-threaded command list feature. That game was Civilization V, with an engine developed by some of the people now at Oxide, like Dan Baker, the creators of the Nitrous Engine and Ashes of the Singularity. A well-written summary by AnandTech's editor Ryan Smith, covering this topic back in 2011, is available on their forums.

How DX11 multi-threading works is by splitting draw call recording between an immediate context and deferred contexts. On a deferred context you pile up your draw calls, and when it's time to send them to the GPU you assemble, or batch, them into a command list and submit that on the immediate context. Submission still happens on the primary thread, but now the extra cores and threads of the CPU can assist in batching up draw calls; a halfway approach to multi-threaded rendering.
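To make that concrete, here is a minimal Direct3D 11 sketch of the pattern (error handling and pipeline setup omitted, and the draw calls themselves only hinted at in comments, so treat it as an illustration rather than a complete renderer):

    // Worker thread: record draw calls on a deferred context, then hand back a command list.
    #include <d3d11.h>
    #include <wrl/client.h>
    #include <vector>
    using Microsoft::WRL::ComPtr;

    ComPtr<ID3D11CommandList> RecordSceneChunk(ID3D11Device* device)
    {
        ComPtr<ID3D11DeviceContext> deferred;
        device->CreateDeferredContext(0, &deferred);          // one deferred context per worker thread

        // ... bind state and issue draw calls on 'deferred' exactly as you would on the
        // immediate context, e.g. deferred->DrawIndexed(indexCount, 0, 0);

        ComPtr<ID3D11CommandList> commandList;
        deferred->FinishCommandList(FALSE, &commandList);     // batch everything recorded so far
        return commandList;
    }

    // Main rendering thread: execute the batched command lists in order.
    void SubmitFrame(ID3D11DeviceContext* immediate,
                     const std::vector<ComPtr<ID3D11CommandList>>& lists)
    {
        for (const auto& list : lists)
            immediate->ExecuteCommandList(list.Get(), FALSE); // actual submission stays on one thread
    }

The recording part is what can be farmed out to worker threads; ExecuteCommandList on the immediate context remains a single-threaded affair, which is exactly the "semi" part of this approach.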
Suddenly, NVIDIA GPUs were twice as fast in Civilization V at low resolutions where the game is CPU bound. Even AMD's Richard Huddy admitted at the time that such an approach could yield up to twice the performance; we'll come back to that figure later. But it's important to note that this improvement requires both the driver and the game developer to code in a specific way to take advantage of it.

To better explain this, I've made a diagram which simplifies DX11 multi-threading. Instead of the usual main thread maxing out because it runs heavy game logic as well as all the draw call processing and submission, DX11 can split up that draw call processing, offloading some of the work to worker threads on the other cores. When those tasks complete, the results are assembled into a command list for submission. This can improve performance in situations where the main thread is heavily loaded. But there is a cost: some CPU cycles are needed to split up the workloads, and more to combine them into a command list, so there is added CPU overhead to this approach. Back in 2011, though, most games rarely maxed out anything beyond the second core, so it was the perfect solution.

Returning to the 2x increase: why not 3x or 4x with more CPU cores? It comes down to the nature of the process itself; splitting draw calls into batches cannot scale indefinitely. There are dependencies, and complexity increases the more cores you spread rendering across, because everything eventually has to be recombined on the primary thread for submission. Typically, a 2x gain in draw call throughput from DX11 multi-threading is a good result.
Now, here's the kicker, and the real magic of NVIDIA's driver team. After NVIDIA saw the huge benefit from Civilization V's use of DX11 command lists, they worked behind the scenes on a way to bypass the need for developers to multi-thread their DX11 games, a way to use more cores regardless of how the game is coded. Their solution is brilliant and rarely discussed in public. The simplest way to explain it is that NVIDIA's driver runs an active "server" thread that monitors incoming draw calls. If work is not submitted via a deferred context for command list preparation, the server thread intercepts it, slices the workload up, and sends it to worker threads to process and eventually assemble into a command list. It sounds easy, but trust me, it is not; this feature is a marvel of software engineering.
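NVIDIA has never documented how this works internally, so the following is nothing more than a conceptual sketch of the producer/worker pattern just described, with entirely made-up names (DriverServerThread, DrawCall and so on). It is not NVIDIA's code, and it glosses over the hard parts such as preserving API ordering and tracking pipeline state across batches:

    // Hypothetical illustration only: the game's render thread pushes draw calls into a queue,
    // a driver-side "server" loop slices them into batches, and worker threads turn each batch
    // into a command list that is later executed in order.
    #include <condition_variable>
    #include <cstddef>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <vector>

    struct DrawCall { /* captured state and parameters of one API call */ };
    using CommandList = std::vector<DrawCall>;           // stand-in for a driver-built command list

    class DriverServerThread {
    public:
        // Called on the game's render thread: cheap, just queue the call and return.
        void Intercept(DrawCall dc) {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(std::move(dc));
            cv_.notify_one();
        }

        // Runs on the driver's own thread: slice the stream into batches for workers.
        void Run(std::size_t batchSize, std::function<void(CommandList)> dispatchToWorker) {
            for (;;) {
                CommandList batch;
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [&] { return !pending_.empty(); });
                while (!pending_.empty() && batch.size() < batchSize) {
                    batch.push_back(std::move(pending_.front()));
                    pending_.pop();
                }
                lock.unlock();
                dispatchToWorker(std::move(batch));       // worker records/optimizes this batch
            }
        }

    private:
        std::queue<DrawCall> pending_;
        std::mutex m_;
        std::condition_variable cv_;
    };

The real driver also has to decide when this is worth doing at all, because, as we'll see, the batching itself costs CPU cycles.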
With this multi-threaded DX11 driver, NVIDIA GPUs suffer far less from primary-thread CPU bottlenecks. The driver essentially needs only a small slice of the primary thread to run the server process, with batching handled by worker threads on the other CPU cores. Games that are single-threaded suddenly run great on NVIDIA GPUs; titles like World of Tanks or Arma 3, which are single-thread bound, run much faster on them. Even 3DMark's DX11 API overhead test runs almost identically on NVIDIA GPUs in single-threaded and multi-threaded mode. As a side effect, CPU-intensive PhysX in games has almost no performance impact on NVIDIA GPUs, again because they are largely immune to primary-thread bottlenecks. It has been smooth sailing for NVIDIA in DX11 thanks to this secret sauce.
At this point you have to ask: if DX11 command lists are so good, why can NVIDIA implement them while AMD cannot? It comes down to one major difference: NVIDIA's GPUs use a software scheduler, namely the driver. After the draw calls are sent to the scheduler, it decides how to distribute the workloads optimally across the compute units, or SMXs in NVIDIA's terminology. Note that this compiling and scheduling has an inherent CPU cost, so it's not a free lunch, but in many lightly threaded games plenty of CPU resources are left untapped, so the software-scheduling overhead stays hidden. Because NVIDIA's main scheduler is software based, it can accept many command lists, even very large ones, buffering them for execution.

In contrast, AMD GPUs have a hardware scheduler: both the global and wavefront-level schedulers are present on the chip. The driver does not schedule; it just passes work to the GPU for the hardware scheduler (HWS) to handle. By design, AMD's HWS relies more on the immediate context, a constant stream of draw calls coming in, rather than big packets or command lists that have to be stored prior to execution. Basically, AMD's HWS lacks the mechanism to properly support constant use of DX11 command lists.
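You can actually observe this split from application code: Direct3D 11 lets you ask whether the installed driver builds command lists itself or whether the runtime has to emulate deferred contexts on its behalf. Historically NVIDIA's DX11 driver reports native support here while AMD's did not, though you should check on your own hardware rather than take my word for it:

    #include <d3d11.h>
    #include <cstdio>

    // Ask the runtime whether the driver natively supports multithreaded command lists.
    void ReportDriverThreadingSupport(ID3D11Device* device)
    {
        D3D11_FEATURE_DATA_THREADING caps = {};
        if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_THREADING, &caps, sizeof(caps))))
        {
            std::printf("Driver command lists:          %s\n",
                        caps.DriverCommandLists ? "yes" : "no (runtime emulation)");
            std::printf("Concurrent resource creation:  %s\n",
                        caps.DriverConcurrentCreates ? "yes" : "no");
        }
    }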
So when AMD says they did not focus on multi-threading DX11, and instead focused on breaking through the draw call ceiling directly with a new next-gen API like Mantle, you can understand why. It's not that they could but wouldn't; it's simply that GCN is incapable of multi-threading well under DX11's restrictions.

That isn't to say AMD's DX11 is awful. Games can be coded to run really well on AMD under DX11; they just need to spread their game logic across threads and leave the primary rendering thread mostly free to handle draw calls and submission to the GPU.
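As a rough illustration of that structure (a hypothetical frame loop, not any particular engine's code), the idea is simply that simulation work runs on other cores while the thread that owns the graphics device does little besides feeding it:

    #include <future>

    struct World {};
    struct RenderSnapshot {};

    RenderSnapshot UpdateSimulation(World&) {
        // AI, physics, gameplay: heavy CPU work that never touches the graphics API.
        return {};
    }

    void BuildAndSubmitDrawCalls(const RenderSnapshot&) {
        // The only place draw calls are recorded and submitted.
    }

    void RunFrame(World& world, RenderSnapshot& lastFrame)
    {
        // Kick this frame's simulation onto another core...
        auto nextFrame = std::async(std::launch::async, [&] { return UpdateSimulation(world); });

        // ...while the render thread stays free to feed the GPU with last frame's results.
        BuildAndSubmitDrawCalls(lastFrame);

        lastFrame = nextFrame.get();   // hand the fresh snapshot to the next frame
    }

Engines that invert this, running physics, scripting or CPU PhysX on the same thread that submits draw calls, are the ones that starve AMD's GPUs in the way described here.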
Take the Total War series as an example. Earlier entries like Attila and Empire were effectively single-threaded and ran very badly on AMD GPUs. For the recent Total War: Warhammer, AMD partnered with Creative Assembly to both multi-thread the engine and add DX12, and the result is that AMD GPUs run the game well in both DX11 and DX12. That comes down to proper use of game threads.

However, the reverse is also true: games can be coded in ways that really hurt AMD's performance. It's as simple as loading more game logic onto the primary thread, such as CPU-intensive effects or PhysX. With the main thread fully loaded, draw calls to AMD GPUs get stalled, and GPU utilization and performance drop. This happens in some games, and it has fed the incorrect belief that AMD has worse DX11 driver overhead. Contrary to what the mantra suggests, as games become more threaded it can actually benefit AMD, provided the game logic is spread across all the threads and the rendering thread is kept free for draw calls.

The inverse can also be true: in games that are well threaded and CPU intensive, NVIDIA's GPUs can suffer both from the overhead of DX11 command lists and from the software scheduler. In simple terms, when a game pushes all threads, the extra overhead reduces the CPU cycles available for game logic; if there isn't enough CPU power the frame rate drops, and if the CPU is powerful enough you simply see higher CPU usage for a similar frame rate. We see this in the recent generation of multi-threaded console ports, such as Call of Duty: Black Ops 3, benchmarked here on a GTX 970 and an RX 480. Note that the 970 fully maxes out the four CPU cores while delivering lower frame rates than the RX 480, which still has CPU resources in reserve. Again in Crysis 3 on a quad-core: similar frame rates, but much higher CPU utilization for the GTX 970 and 1060. That is the DX11 command list and software-scheduling overhead; it takes CPU cycles to perform these tasks, and the cost is usually hidden because most games do not stress all CPU threads this heavily. The same can be seen in The Witcher 3 in Novigrad, where the game becomes CPU bound on a quad-core: similar frame rates for all three GPUs, but CPU usage is maxed out for NVIDIA while the RX 480 still has plenty in reserve.
It is a specific set of conditions: the game has to be both CPU intensive and multi-threaded, yet still leave the rendering thread free for draw calls, for AMD GPUs to perform at their best. When you think about it, it is actually NVIDIA's driver that has the higher CPU overhead, since it needs more CPU cycles to function. The reason the approach pays off is that games have been very slow to embrace multi-threading; think back to 2012 through 2015, when most titles barely used two threads and a lot of CPU threads sat idle. Meanwhile, during those years, AMD GPUs suffered in some DX11 games and people falsely assumed they had higher driver overhead. Those games did not come close to pushing the draw call ceiling, even on AMD GPUs; the problem all along was that they fully loaded the main rendering thread. The correct statement is that NVIDIA's driver has higher CPU overhead, it uses more CPU cycles, but it can potentially achieve much higher DX11 draw call throughput when idle CPU resources are available. Conversely, AMD's driver is more direct to the GPU, but it is very vulnerable to rendering-thread bottlenecks.
As for DX12 and Vulkan, adoption has been slow, and many studios design their games for DX11 as well as these APIs. That approach doesn't yield the best results for either DX12 or Vulkan, because these APIs require a ground-up engine redesign to truly shine. The reason DX12 and Vulkan benefit AMD's GPUs is that the GCN architecture was designed for an API that can fully tap into its hardware scheduler. DX12 and Vulkan natively allow draw call recording and submission from all CPU threads, and on AMD this work goes directly to the hardware scheduler, which can expose up to 64 independent queues for distributing work to the compute units. In some ways, these new APIs are removing the shackles that AMD GPUs have been carrying throughout the DX11 era.
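To show what submission from many threads looks like under DX12, here is a stripped-down sketch: each worker owns its own command allocator and command list, recording happens fully in parallel, and only the final ExecuteCommandLists call touches the queue. Fences, pipeline state and the actual draw recording (the RecordChunk placeholder) are omitted or stubbed, so this is an outline rather than working renderer code:

    #include <d3d12.h>
    #include <wrl/client.h>
    #include <thread>
    #include <vector>
    using Microsoft::WRL::ComPtr;

    // Each worker thread records into its own allocator + command list.
    struct ThreadRecorder {
        ComPtr<ID3D12CommandAllocator>    allocator;
        ComPtr<ID3D12GraphicsCommandList> list;
    };

    void RecordChunk(ID3D12GraphicsCommandList* /*list*/, int /*chunk*/) {
        // Placeholder: set root signature, PSO, and issue the draw calls for this chunk.
    }

    void RecordAndSubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue, int threadCount)
    {
        std::vector<ThreadRecorder> recorders(threadCount);
        for (auto& r : recorders) {
            device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&r.allocator));
            device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                      r.allocator.Get(), nullptr, IID_PPV_ARGS(&r.list));
        }

        // Record on all threads in parallel -- no driver server thread required.
        std::vector<std::thread> workers;
        for (int i = 0; i < threadCount; ++i)
            workers.emplace_back([&, i] { RecordChunk(recorders[i].list.Get(), i);
                                          recorders[i].list->Close(); });
        for (auto& w : workers) w.join();

        // Submission is one call; the application, not the driver, defines the ordering.
        std::vector<ID3D12CommandList*> lists;
        for (auto& r : recorders) lists.push_back(r.list.Get());
        queue->ExecuteCommandLists(static_cast<UINT>(lists.size()), lists.data());
    }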
As for NVIDIA under DX12 and Vulkan, because of its software scheduler there is some overhead when draw call submission is spread directly across many threads, since those streams have to be reassembled. This isn't a big impact considering the draw call potential of DX12 and Vulkan, but when games do not stress draw calls and are not coded to scale as well as NVIDIA's DX11 multi-threading, performance can regress. Done right by a good developer, though, both DX12 and Vulkan have big potential in future games that push scene complexity much higher than today. We are still only on the verge of truly meaningful gains with these APIs, because game complexity has somewhat stagnated, with the current console generation limiting game design.

Knowing these differences between AMD's and NVIDIA's strategies, can we conclude which approach is better? I think we can: clearly NVIDIA has been winning, and is still winning, in market share, revenue and profits. But it also puts AMD's approach in perspective. As the underdog, they had to gamble on Mantle to spur real next-gen APIs that could fully tap into their GCN architecture as well as their high-core-count FX CPUs. Had that never been pushed, DX11 would have lived on much longer, and that is simply not in AMD's interest. NVIDIA, meanwhile, moved towards software scheduling, which benefits from an API that works best with more driver intervention, and they invested heavily in maximizing their DX11 capabilities. It is therefore in their interest to ensure DX11 remains the target API for as long as possible.
Moving forward, what is the optimal strategy for AMD and NVIDIA? Note that their strategies do not align; what is good for one is bad for the other. AMD needs Vulkan and DX12 to be adopted faster, with more multi-threaded game engines, and this is particularly important now that Ryzen CPUs are on the market. AMD's main advantage here is that the 8-core Jaguar CPUs in the major consoles dictate how game studios optimize. NVIDIA, however, has more money; simply put, they can shape how games are optimized for the PC port. In this kind of competitive battle NVIDIA has a huge advantage, because money matters here: it pays for more engineers on site with game devs and more studio sponsorships. That ultimately gives NVIDIA a major say in how the PC port is coded, and assuredly, next-gen API or not, they will make it run better on their GPUs.

AMD therefore has no choice but to design their hardware to be more resilient against NVIDIA's strategy. They started with Polaris: the primitive discard accelerator helps nullify x32 and x64 tessellation bottlenecks, and the improved HWS and instruction prefetch slightly alleviate primary-thread bottlenecks. Vega must take this further, vastly improving its HWS and shader efficiency, because DX11 is still going to matter for the foreseeable future. It's going to be a very interesting few years ahead as we watch these great tech companies battle it out; of that we can be sure.
If you've stuck around this long, things should start to add up, and perhaps something clicks in your mind regarding Ryzen's odd gaming performance. Because of NVIDIA's software scheduler and DX11 multi-threading, a driver that is not optimized for a new CPU architecture, like Ryzen with its unique CCX design, could hurt performance in a few ways. First, under DX12, with each thread submitting draw calls, the software scheduler's reorganization and optimization of these multiple sources involves heavy data sharing between threads. In DX11, the automatic multi-threading feature constantly splits draw calls across multiple worker threads and then reassembles them into a command list, again a task with heavy cross-thread dependencies. If those trips across the CCXs to share or synchronize thread data happen often, there is extra latency. The effect is exacerbated when reviewers test at low resolutions and very high frame rates, because each frame now has to be prepared by the CPU much faster, so the latency penalty kicks in.
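As a rough way to see why cross-CCX sharing costs something, here is a small, hypothetical micro-benchmark sketch: two threads bounce an atomic value back and forth, and pinning them to cores on the same CCX versus different CCXs changes how long each round trip takes. The core masks below are placeholders (topology and SMT numbering vary per system), and this is only an illustration of the latency being described, not a rigorous benchmark:

    #include <windows.h>
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Two threads take turns flipping a shared value. Every hand-off moves the cache
    // line between cores, so the elapsed time reflects core-to-core latency.
    std::atomic<int> turn{0};
    constexpr int kIterations = 200000;

    void PingPong(int myTurn, DWORD_PTR affinityMask)
    {
        SetThreadAffinityMask(GetCurrentThread(), affinityMask);   // pin to a chosen core
        for (int i = 0; i < kIterations; ++i) {
            while (turn.load(std::memory_order_acquire) != myTurn) { /* spin */ }
            turn.store(1 - myTurn, std::memory_order_release);      // hand off to the other thread
        }
    }

    int main()
    {
        // Placeholder masks: try two cores on the same CCX, then two on different CCXs,
        // after checking your own CPU's topology.
        auto start = std::chrono::steady_clock::now();
        std::thread a(PingPong, 0, DWORD_PTR(1) << 0);
        std::thread b(PingPong, 1, DWORD_PTR(1) << 8);
        a.join(); b.join();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("avg round trip: %lld ns\n", static_cast<long long>(ns / kIterations));
    }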
With AMD's GPUs, in both DX11 and DX12, draw call submission is thread-independent, as the draw calls are sent straight to the GPU's hardware scheduler. This is not to say there aren't other issues with Ryzen, such as game-specific optimizations or Windows thread management, but there is no multi-threaded, thread-dependent GPU driver compounding the problem.

I've been waiting to see whether any tech journalist would go down this route, and as expected, most of the tech press aren't interested in delving deep; there is an alarming lack of curiosity for a profession that demands it. It is refreshing, however, to see a certain YouTuber, Jim at AdoredTV, scratch the surface of this issue. You guys have no doubt seen it; if not, I will link it in the description. It's a brilliant piece of tech journalism.

This video has been in the back of my mind for a long while as I thought about how best to present it. Hopefully it has been satisfying and insightful for you guys. And by the way, sorry for the voice, I've got a nasty cold. Until next time.

AMD vs NV Drivers: A Brief History and Understanding Scheduling & CPU Overhead