Treats left and right as packed 4-component vectors of UInt8 (one component per byte) and computes dot(left, right) + acc.
uint dot4add_u8packed(uint left, uint right, uint acc);
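
The arithmetic described above can be sketched on the CPU in C. This is a minimal reference under stated assumptions, not the intrinsic itself: the names dot4add_u8packed_ref and pack_u8x4 are hypothetical, component 0 is assumed to sit in the lowest byte, and the 32-bit accumulation is assumed to wrap on overflow.

```c
#include <stdint.h>

/* Hypothetical CPU-side reference for the behavior described above.
   Assumption: component 0 occupies the lowest 8 bits. The byte order does
   not affect the result as long as both operands are packed the same way. */
static uint32_t dot4add_u8packed_ref(uint32_t left, uint32_t right, uint32_t acc)
{
    uint32_t sum = acc;
    for (int i = 0; i < 4; ++i) {
        /* Extract the i-th unsigned 8-bit component of each operand. */
        uint32_t l = (left  >> (8 * i)) & 0xFFu;
        uint32_t r = (right >> (8 * i)) & 0xFFu;
        /* Assumption: unsigned 32-bit accumulation, wrapping on overflow. */
        sum += l * r;
    }
    return sum;
}

/* Hypothetical helper: packs four bytes into one 32-bit word, x in the lowest byte. */
static uint32_t pack_u8x4(uint8_t x, uint8_t y, uint8_t z, uint8_t w)
{
    return (uint32_t)x
         | ((uint32_t)y << 8)
         | ((uint32_t)z << 16)
         | ((uint32_t)w << 24);
}
```

For example, with left = pack_u8x4(1, 2, 3, 4), right = pack_u8x4(5, 6, 7, 8) and acc = 10, the reference returns 1*5 + 2*6 + 3*7 + 4*8 + 10 = 80.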