If you were creating this in Unity and had 6 DOF movement around your scene in the headset, then yes, scaling would play a big role. But since you are just rendering a 2D spherical image, it doesn't do much. Think of your camera less as a camera and more like a reflective sphere. This sphere is infinitely small: a point. Then imagine that somehow you can still see this sphere and its reflections. Take everything it is reflecting, unwrap it into a rectangle, and that is what your "camera" is capturing. Go into the front/top/back view and measure the angles from the camera point to various objects. Those angles will not change as you scale the scene (assuming your scale origin is the camera), so the points where the rays hit the sphere will not change, and therefore the image will not change.
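You can check this angle-invariance argument numerically. The sketch below (my own illustration, not tied to any particular renderer) converts each object's position relative to the camera into the longitude/latitude pair that decides where it lands on an equirectangular image, then uniformly scales the scene about the camera and shows the angles stay the same:

```python
import math

def spherical_angles(p):
    # Convert a 3D point (relative to the camera at the origin)
    # into (longitude, latitude): the coordinates that determine
    # where it lands on an equirectangular image.
    x, y, z = p
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return lon, lat

objects = [(4.0, 1.0, 2.0), (-3.0, 5.0, -1.0), (0.5, -2.0, 8.0)]

for scale in (1.0, 10.0, 0.01):
    # Uniform scale with the camera as the scale origin:
    # every coordinate is multiplied by the same positive factor.
    angles = [spherical_angles((x * scale, y * scale, z * scale))
              for x, y, z in objects]
    print(scale, angles)
```

Running it prints the same angle pairs for every scale factor, because `atan2` only depends on the ratio of its arguments, and uniform scaling about the camera preserves those ratios.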
Caveat:
I don't know how the CV VR camera actually calculates its image, but it's possible that it internally creates six cameras, one for each face of a cube, and then stitches those images into one (similar to how cheap GoPro 360 rigs produce footage). If that's the case, there will be a tiny bit of variance as you scale the scene, since the six cameras must have a specific FOV/mm. For the most part, though, you'll still see the effect I'm describing and that you seem to be experiencing.
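For the cube-map idea, the core of the stitching step is picking which of the six 90-degree cameras "sees" a given direction: the face along the axis with the largest absolute component. A minimal sketch (the face names are my own, and this is only an illustration of the technique, not the renderer's actual code):

```python
def cube_face(d):
    # Choose the cube face a view direction d falls on:
    # the axis with the largest absolute component wins,
    # and the sign picks the positive or negative face.
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return '+x' if x > 0 else '-x'
    if ay >= ax and ay >= az:
        return '+y' if y > 0 else '-y'
    return '+z' if z > 0 else '-z'

print(cube_face((5.0, 1.0, 1.0)))   # direction mostly along +x
print(cube_face((0.0, -3.0, 1.0)))  # direction mostly along -y
```

Note that this face selection also depends only on the direction, not the distance, which is why even a cube-map implementation would mostly show the same scale-invariance.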