OK, but this isn't how it interleaves:
/*modify buffer to enable quad mode*/
test_buffer[0] |= 0x40;
/*set dummy cycles*/
test_buffer[1] |= 0xE0;
test_buffer[2] |= 0x40;
test_buffer[3] |= 0xE0;
The SR (status register) bytes will need to be paired, one per bank.
i.e.
test_buffer[0] = (test_buffer[0] & 0xC3) | 0x40; // SR First Bank
test_buffer[1] = (test_buffer[1] & 0xC3) | 0x40; // SR Second Bank
..
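A rough sketch of the paired update as a helper, in case it's useful. The function and macro names are mine (not from any ST or vendor library), and the bit layout (BP bits at 0x3C, quad enable at 0x40) is only what the masks above imply, so check it against your part's datasheet:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, taken from the 0xC3 / 0x40 masks in this post. */
#define SR_BP_MASK 0x3Cu /* block-protect bits, cleared */
#define SR_QE_BIT  0x40u /* quad-enable bit, set */

/* Clear the BP bits and set QE in a single status register value. */
static uint8_t sr_enable_quad(uint8_t sr)
{
    return (uint8_t)((sr & (uint8_t)~SR_BP_MASK) | SR_QE_BIT);
}

/* Apply the same change to both banks: [0] = first bank, [1] = second bank. */
static void sr_pair_enable_quad(uint8_t sr_pair[2])
{
    assert(sr_pair != NULL);
    sr_pair[0] = sr_enable_quad(sr_pair[0]);
    sr_pair[1] = sr_enable_quad(sr_pair[1]);
}
```

Keeping the mask-and-set in one place makes it harder to update one bank and forget the other.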
I would make sure the BP bits, which are sticky, are also cleared, otherwise writes and erases can fail due to the protection mechanism. This can be problematic when switching vendors, as these bit definitions can differ. I've mostly seen it when locking Micron parts by running Macronix code on them, because 0x40 is a protection bit in Micron's context.
I'm also not sure you need to switch the parts into 4-byte (32-bit) addressing mode; there are commands that explicitly define a 3- or 4-byte address.
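To illustrate, here is a small sketch that builds a command header with an explicit address width. The helper name is hypothetical, and the opcode values (0x03/0x13 for read, 0x02/0x12 for page program) are the ones commonly listed for 3-byte and 4-byte variants; treat them as assumptions and confirm them in your part's datasheet:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Commonly used opcodes; verify against the datasheet for your part. */
#define CMD_READ_3B 0x03u /* read, 3-byte address */
#define CMD_READ_4B 0x13u /* read, 4-byte address */
#define CMD_PP_3B   0x02u /* page program, 3-byte address */
#define CMD_PP_4B   0x12u /* page program, 4-byte address */

/* Write opcode + big-endian address into out; returns the header length. */
static size_t build_cmd_header(uint8_t opcode, uint32_t addr,
                               unsigned addr_bytes, uint8_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    assert(addr_bytes == 3 || addr_bytes == 4);

    out[n++] = opcode;
    if (addr_bytes == 4)
        out[n++] = (uint8_t)(addr >> 24);
    out[n++] = (uint8_t)(addr >> 16);
    out[n++] = (uint8_t)(addr >> 8);
    out[n++] = (uint8_t)addr;
    return n;
}
```

With the explicit 4-byte opcodes the part never changes addressing state, so there's nothing to undo if the loader exits early.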
For loaders, I'd keep them as simple as possible at the outset. You don't need to switch into the most exotic modes; performance is usually more to do with how fast the ST-LINK/V2 or V3 can shovel data across the interface.
In dual-bank mode the page write can expand from 256 bytes to 512 bytes, as it interleaves between the two chips.
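A minimal sketch of what that interleave means for the data, assuming a byte-wise split with even offsets landing on the first chip and odd offsets on the second (consistent with the paired SR layout above, but check your controller's reference manual). The names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHIP_PAGE_SIZE 256u                   /* page size of one chip */
#define DUAL_PAGE_SIZE (2u * CHIP_PAGE_SIZE)  /* 512 bytes across both chips */

/* Split one 512-byte dual-bank page into the 256 bytes each chip receives. */
static void split_dual_page(const uint8_t *logical,
                            uint8_t *chip1, uint8_t *chip2)
{
    assert(logical != NULL && chip1 != NULL && chip2 != NULL);

    for (size_t i = 0; i < CHIP_PAGE_SIZE; i++) {
        chip1[i] = logical[2 * i];     /* even offsets: first chip (assumed) */
        chip2[i] = logical[2 * i + 1]; /* odd offsets: second chip (assumed) */
    }
}
```

The controller does this split in hardware; the sketch is only meant to show why a 512-byte write still fits within a single 256-byte page on each chip.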