I find it really interesting what was done in the case of ShmupMAME:
From https://shmupmame.wordpress.com/about/:
A few months after making the MAME shmups input delay list. Pulsewidth pointed out that Raine could emulate most of the laggier games in mame with less delay, after investigation and clarification by PsikyoFan it turns out that mame emulates a frame/sprite buffer for most arcade hardware. What I found out is that this buffer can "safely" be removed on most hardware with very little consequences. So after messing around with the mame drivers for a few days I got a first version ready for release.
Is anything similar applicable to FBA?
A few years ago some input lag tests were done to compare an old build of FBA from 2008 against ShmupMAME (I assume using Windows XP as the OS):
http://forums.shoryuken.com/discussion/comment/8133809/#Comment_8133809
The conclusion from the above tests was that FBA has an additional 2 frames of latency compared to ShmupMAME.
So in the case of MAME, the core engine's frame/sprite buffers were introducing lag, and removing them, together with additional per-driver optimisations, is what gave the greatly reduced latency. In the case of Super Street Fighter 2 Turbo (SSF2T), the base input delay of the game itself as measured on arcade hardware is 4 frames.
Now I'm really curious as to what optimisations might be possible at the driver level in FBA to get the input polling to be truly "next frame". I'm really only concerned with CPS2 emulation, so I took a quick look at cps_run.cpp, especially the Cps2Frame() function, but being unfamiliar with the source I have no idea if this is the right place to look...
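To make the frame/sprite buffer point concrete, here is a toy model in Python of how holding finished frames back before display delays the first visible reaction to a button press. This is only a sketch of the general idea, not how MAME or FBA are actually structured; simulate(), delay_in_frames() and the buffered_frames parameter are names I made up for illustration.

```python
from collections import deque


def simulate(inputs, buffered_frames):
    """Return, per displayed frame, the input sample that frame reflects.

    inputs: one input sample per emulated frame.
    buffered_frames: hypothetical number of finished frames held back
    before display (0 = show each frame as soon as it is rendered).
    None means nothing has reached the screen yet.
    """
    queue = deque()
    shown = []
    for sample in inputs:
        queue.append(sample)  # a frame is rendered from this input
        if len(queue) > buffered_frames:
            shown.append(queue.popleft())  # oldest held frame is displayed
        else:
            shown.append(None)  # buffer is still filling
    return shown


def delay_in_frames(buffered_frames, length=8):
    """Frames between a button press and its first appearance on screen."""
    press_at = 2
    inputs = ["idle"] * length
    inputs[press_at] = "press"
    shown = simulate(inputs, buffered_frames)
    return shown.index("press") - press_at


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, "buffered frame(s) ->", delay_in_frames(n), "extra frame(s) of delay")
```

In this model each buffered frame costs exactly one extra frame of delay, which is why removing an emulated buffer (where the game tolerates it) shows up directly in input lag tests.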
One thing that I haven't tried yet is changing the "Max frames to render ahead" setting (NVIDIA) from its default of 3 to 1. Presumably this would further reduce input lag and get us that bit closer to the holy grail of arcade input latency.
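For a rough sense of scale, here is a back-of-the-envelope calculation of how much delay a full render-ahead queue could add. It assumes a nominal 60 Hz refresh and the worst case where every queued frame is waiting to be displayed; real-world numbers will depend on the driver and on how full the queue actually gets. The function name queue_latency_ms() is my own.

```python
def queue_latency_ms(queued_frames, refresh_hz=60.0):
    """Worst-case delay added by a full render-ahead queue, in milliseconds.

    Assumes each queued frame waits one full refresh period (an assumption,
    not a measured figure).
    """
    return queued_frames * 1000.0 / refresh_hz


if __name__ == "__main__":
    for frames in (3, 1):
        print(frames, "queued frame(s):", round(queue_latency_ms(frames), 1), "ms")
```

Under those assumptions, going from 3 queued frames to 1 would save at most two frame periods, which is in the same ballpark as the difference measured between FBA and ShmupMAME.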