05:49 pm - Further update on the dynamic-HQ code...
...I've separated out most of the components into easier-to-digest sub-files, which I've been keeping sorted. I noticed something I mentioned in the previous post, a possible symmetry, so I started re-ordering the sorting columns and pixel orders to test the hypothesis.
It turns out there's a rotational symmetry in the algorithm as well as a bilateral segmenting. Basically, I can cut the output equations down to a quarter of their size and, by rotating them in 90-degree increments, reassemble every equation I could need. Odd scaling factors will have to be mildly special-cased so the same pixel isn't computed twice (or four times) where the quadrants overlap, but that's possibly optional.
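A minimal Python sketch of the quarter-and-rotate idea follows. It assumes the full-block kernel is rotation-covariant, meaning that rotating the 3x3 neighborhood rotates the output block the same way. The names `rot_cw`, `rot_ccw`, `assemble` and `toy_ur_kernel` are hypothetical, and the toy kernel is a plain four-pixel average, not the real HQ equations. Only the upper-right quadrant is ever computed. The other three come from rotating the neighborhood counterclockwise, running the same kernel, and rotating the result back clockwise. For odd scales the `writes` grid shows the overlap: the middle pixel is written four times and the middle-edge pixels twice.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[float]]


def rot_cw(m: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]


def rot_ccw(m: Grid) -> Grid:
    """Rotate a square grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*m)][::-1]


def toy_ur_kernel(nb: Grid) -> Grid:
    """Toy upper-right kernel for a 2x2 scale (one output pixel per quadrant):
    the average of the center, north, north-east and east neighbors."""
    return [[(nb[1][1] + nb[0][1] + nb[0][2] + nb[1][2]) / 4]]


def assemble(n: int, nb: Grid,
             ur_kernel: Callable[[Grid], Grid]) -> Tuple[List[List[Optional[float]]], List[List[int]]]:
    """Build an n x n output block from a kernel that only knows the upper-right quadrant."""
    q = (n + 1) // 2  # quadrant side; for odd n the quadrants share the middle row/column
    out: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    writes = [[0] * n for _ in range(n)]
    # Top-left corner of each quadrant, in clockwise-rotation order: UR, LR, LL, UL.
    origins = [(0, n - q), (n - q, n - q), (n - q, 0), (0, 0)]
    local = nb
    for k, (r0, c0) in enumerate(origins):
        block = ur_kernel(local)   # computed as if this were the upper-right quadrant
        for _ in range(k):         # rotate the result back into place (k * 90 degrees clockwise)
            block = rot_cw(block)
        for i in range(q):
            for j in range(q):
                out[r0 + i][c0 + j] = block[i][j]
                writes[r0 + i][c0 + j] += 1
        local = rot_ccw(local)     # bring the next quadrant's side of the neighborhood to the upper right
    return out, writes
```

With the neighborhood `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `assemble(2, nb, toy_ur_kernel)[0]` gives `[[3.0, 4.0], [6.0, 7.0]]`. The lower-right value, for example, is the average of the center, east, south-east and south pixels, even though only the upper-right kernel was written.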
And since these 90-degree increments only occur in fixed positions (i.e. upper-right gets no rotation, lower-right gets 90 degrees, lower-left 180, upper-left 270), this will be effectively 'free'. The only 'cost' is a minor expansion of the indexing table to hold 4 indexes per entry instead of only 1.
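Because each quadrant's rotation is fixed, the destination of every upper-right-frame pixel can be worked out once per scale factor and reused for every source pixel, which is one way to see why the rotation costs essentially nothing at run time. Here is a small sketch using the hypothetical names `QUADRANT_ROTATION`, `rotate_coord` and `destination_offsets`:

```python
from typing import Dict, List, Tuple

# Fixed clockwise rotation per quadrant, as listed above.
QUADRANT_ROTATION: Dict[str, int] = {"UR": 0, "LR": 90, "LL": 180, "UL": 270}


def rotate_coord(r: int, c: int, n: int, degrees: int) -> Tuple[int, int]:
    """Where (r, c) in an n x n block ends up after a clockwise rotation by `degrees`."""
    for _ in range(degrees // 90):
        r, c = c, n - 1 - r
    return r, c


def destination_offsets(n: int) -> Dict[str, List[Tuple[int, int]]]:
    """Precompute, per quadrant, where each upper-right-frame pixel lands in the n x n block."""
    q = (n + 1) // 2  # quadrant side, including the shared middle line for odd n
    ur_positions = [(i, n - q + j) for i in range(q) for j in range(q)]
    return {
        name: [rotate_coord(r, c, n, deg) for r, c in ur_positions]
        for name, deg in QUADRANT_ROTATION.items()
    }
```

In a real inner loop these offsets would presumably be baked in as constants. The dictionary only makes the fixed mapping explicit.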
The current code will have three table sets: one fixed and two dynamic. The fixed table set will translate a 12-bit value to a sub-8-bit value. The first dynamic table set will translate that reduced index to the indexes of the specific expansion coefficients to use. The second dynamic table set will actually hold those coefficients.
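The three-stage lookup can be sketched in Python as below. The table contents are toy values, and the class, function and quadrant names are hypothetical. The fixed table here simply uses the popcount of the 12-bit value as its reduced index, stored one byte per entry, so it does not reproduce the real table or its size. What the sketch shows is the chain: 12-bit value, then reduced index, then one of four coefficient indexes chosen by quadrant, then the coefficient set.

```python
from typing import List, Sequence, Tuple

QUADRANTS = ("UR", "LR", "LL", "UL")  # order of the four indexes in each index-table entry


class ScalerTables:
    """Hypothetical holder for one fixed and two dynamic table sets."""

    def __init__(self, reduce_table: bytes,
                 index_table: Sequence[Tuple[int, int, int, int]],
                 coeff_table: Sequence[Tuple[float, ...]]) -> None:
        if len(reduce_table) != 4096:
            raise ValueError("fixed table needs one entry per 12-bit value")
        self.reduce_table = reduce_table  # fixed: 12-bit value -> reduced (sub-8-bit) index
        self.index_table = index_table    # dynamic #1: reduced index -> 4 coefficient indexes
        self.coeff_table = coeff_table    # dynamic #2: the expansion coefficients themselves

    def coefficients(self, pattern12: int, quadrant: str) -> Tuple[float, ...]:
        reduced = self.reduce_table[pattern12 & 0xFFF]           # stage 1
        coeff_index = self.index_table[reduced][QUADRANTS.index(quadrant)]  # stage 2
        return self.coeff_table[coeff_index]                     # stage 3


def build_toy_tables() -> ScalerTables:
    """Toy contents: reduce by popcount (0..12), give each reduced index four
    consecutive coefficient slots, and store the slot number as a one-element tuple."""
    reduce_table = bytes(bin(p).count("1") for p in range(4096))
    index_table: List[Tuple[int, int, int, int]] = [
        (4 * r, 4 * r + 1, 4 * r + 2, 4 * r + 3) for r in range(13)
    ]
    coeff_table = [(float(i),) for i in range(13 * 4)]
    return ScalerTables(reduce_table, index_table, coeff_table)
```

For example, with the toy tables `build_toy_tables().coefficients(0b111, "LL")` reduces the pattern to index 3, picks the third of its four coefficient indexes (14), and returns `(14.0,)`.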
So... we'll have a 16k+4k+Xk total data size, with the Xk table being accessed very contiguously. 2x2 would be a total data size of 22k, 3x3 would be 28k, and 4x4 would be 28k as well, since with an odd factor each quadrant has to cover the shared middle row and column, so 3x3 stores the same 2x2 quadrant as 4x4. Dropping the rotation and reducing the 4k table to a 1k table means storing coefficients for every output pixel, so the data sizes would become 25k, 35k, or 49k. That's noticeably larger, and nothing but the 2x2 has a chance of fitting into the L1 data cache, even on modern processors. The 4k and Xk tables will only ever be accessed 16 bytes or more in a row, making them very cache-fill-friendly as long as they're kept aligned in memory. The 16k table will (perhaps unfortunately) only be accessed one byte at a time, mostly for compactness.
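The size figures can be reproduced with a little arithmetic. This sketch assumes 2k of coefficient data per stored output pixel (back-derived from the 22k and 25k totals for 2x2) and a 32k L1 data cache for the fit check. Both numbers are assumptions rather than figures stated in the post, and `total_kb` is a hypothetical helper. With rotation, only one quadrant of ceil(n/2) x ceil(n/2) pixels is stored; without it, all n x n pixels are.

```python
FIXED_KB = 16            # fixed 12-bit -> reduced-index table
INDEX_KB_ROTATED = 4     # index table with 4 coefficient indexes per entry
INDEX_KB_PLAIN = 1       # index table with a single coefficient index per entry
COEFF_KB_PER_PIXEL = 2   # assumption: back-derived from the 2x2 totals
L1D_KB = 32              # assumption: a typical 32 KB L1 data cache


def total_kb(n: int, rotated: bool) -> int:
    """Total table footprint in KB for an n x n scaler."""
    if rotated:
        q = (n + 1) // 2  # one quadrant; odd n includes the shared middle row/column
        return FIXED_KB + INDEX_KB_ROTATED + COEFF_KB_PER_PIXEL * q * q
    return FIXED_KB + INDEX_KB_PLAIN + COEFF_KB_PER_PIXEL * n * n


if __name__ == "__main__":
    for n in (2, 3, 4):
        with_rot, without_rot = total_kb(n, True), total_kb(n, False)
        print(f"{n}x{n}: {with_rot}k with rotation, {without_rot}k without "
              f"(no-rotation variant fits {L1D_KB}k L1: {without_rot <= L1D_KB})")
```

All six totals match the figures above. Under the assumed 32k cache, 2x2 is the only no-rotation variant that fits.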