//----------------------------------------------------------------------------------------------------------
Figure 8. SSAO without radius test | 32 samples
The article is a bit outdated, with all those fancy compute-shader-based algorithms around, but maybe someone will find it useful.
//----------------------------------------------------------------------------------------------------------
Hemisphere based ray marched SSAO
1. Introduction
SSAO is not a new method for approximating global illumination. The very first implementation was shipped by Crytek in 2007. Still, SSAO has a lot of problems. In this work I have made an attempt to solve them.
2. Equations and theory
I start with a simplified form of the rendering equation and pick ambient occlusion out of it:
AO is a diffuse factor, so I simply put the specular part aside (although in modern real-time projects it has its own occlusion factor, simulated by RO or SSR).
Next, I separate Li into two factors, the incoming light color and brightness, and the occlusion factor:
(this approximation discards all secondary GI bounces, Error 1)
(this is an unavoidable approximation, Error 2)
where Sv is a sample vector and 2 is the normalization factor for cosine-weighted samples (it is half of the uniform sample weight). However, evaluating that dot product per sample causes a lot of undesirable calculations. To avoid them, the dot product has to be eliminated. Fortunately, importance sampling allows us to do that:
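The equations referenced above were images in the original post and did not survive extraction. As a hedged reconstruction consistent with the surrounding text (V is the visibility term, N the sample count, p and n the shaded point and its normal; the notation is mine), the standard derivation the text describes reads:

```latex
% Ambient occlusion as an integral over the hemisphere:
AO(p,\mathbf{n}) \;=\; \frac{1}{\pi}\int_{\Omega} V(p,\omega)\,(\mathbf{n}\cdot\omega)\,\mathrm{d}\omega

% Uniform hemisphere sampling, pdf = 1/(2\pi), gives the factor 2:
AO \;\approx\; \frac{2}{N}\sum_{i=1}^{N} V(p, S_{v,i})\,(\mathbf{n}\cdot S_{v,i})

% Cosine-weighted importance sampling, pdf = \cos\theta/\pi:
% the dot product cancels against the pdf, leaving plain visibility.
AO \;\approx\; \frac{1}{N}\sum_{i=1}^{N} V(p, S_{v,i})
```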
3. Sample Kernel
I take Crytek's SSAO as the starting point. The core idea is to take a number of samples in a sphere, project them onto the screen, and test the sign of the difference between each sample point's depth and the depth stored at that pixel: if it is < 0, the point is occluded:
Figure 2.
This method has two major problems:
- Half of the samples are wasted because they are below the surface
Figure 3. Crysis 1 SSAO in action | 32 samples
The first thing we need now is hemisphere sampling, as our integration tells us. There are two ways to achieve it:
- to redirect below-surface sphere kernel samples into the positive normal direction by checking the dot product between the normal and the sample vector:
sample = (dot(normal, sample) >= 0.0f) ? sample : -sample;
- to reorient hemisphere kernel samples by the TBN basis of the pixel:
sample = mul(sample, TBN);
The dot check is simple, so I will not dwell on it. The second option is trickier. We are to construct a vector basis whose z-axis is always directed along the surface normal, while the x- and y-axes must be uniform per-pixel random orthonormal vectors, so as to get rid of banding artefacts.
We need a table of normalized vectors that fills the circle. I am going to use 9 directions and store them in a 3x3 texture (you may want to read them from a 2D array instead, but the texture has proved to be slightly faster):
int2 RI = (uv * g_screen.xy) % 3;
float3 PointsOnCircle[3][3] =
{
    {float3(+0.766044, +0.642788, 0), float3(+0.173648, +0.984808, 0), float3(-0.5, +0.866025, 0)},
    {float3(-0.939693, +0.342020, 0), float3(-0.939693, -0.342020, 0), float3(-0.5, -0.866025, 0)},
    {float3(+0.173648, -0.984808, 0), float3(+0.766044, -0.642788, 0), float3(+1.0, +0.000000, 0)}
};
float3 random = PointsOnCircle[RI.x][RI.y];
Now I am going to construct the basis using the Gram-Schmidt process:
POC = PointsOnCircleTexture * 2 - 1;
Tangent = normalize(POC - normal * dot(POC, normal));
Binormal = cross(normal, Tangent);
This may require explanation:
Figure 4. Tangent calculations
After that, the sampling should look the following way:
Figure 5. Hemisphere sampling kernel
Figure 6. Hemisphere kernel – dot products | 32 samples
Figure 7. Hemisphere kernel – TBN basis align | 32 samples
phi = v * 2.0 * PI;
cosTheta = sqrt(1.0 - u);
sinTheta = sqrt(1.0 - cosTheta * cosTheta);
x = cos(phi) * sinTheta;
y = sin(phi) * sinTheta;
z = cosTheta;
4. Occlusion calculation
In order to get the occlusion factor we need to check the intersection of the sampled vector with scene geometry. The simplest way to handle it is to check the visibility of the point: read the depth at the vector's projected screen position and compare it to the actual depth of the vector. Since the coordinate space consists of the screen vertical axis, the screen horizontal axis and depth, which are in the ranges (0:1, 0:1, -1:1), we have to remap the z-coordinate of the vector to the 0:1 range, like our depth is. After that we compare sample.z to the depth: if it is less, the point is occluded.
Besides, it is necessary to make sure that the vector does not leave the sampling radius in world space. To this end, we once again compare the depth difference to the radius: if it is greater, the sample is rejected.
The result is already acceptable; nevertheless, there is still room for error.
5. Ray marching
Figure 9. Trivial occlusion (on the left) vs ray marching (on the right)
The nature of the error is that you still count lighting from a direction that has already been occluded. Light can't reach the point from that direction no matter how many times you sample it, and in classic SSAO, the greater the number of samples past the occluder, the greater the error.
To do ray marching, we sample each direction a number of times, extending the sampling vector at each step until it finds an occluder or its length exceeds the radius.
Figure 10. SSAO IS vectors | 32 samples
Figure 11. Ray marched SSAO | 16x3
Trivial raw hemisphere-based SSAO is greatly underoccluded due to the large percentage of missed occluders, and it is noisier in general:
6. Code
static const float3 kernel32[32] =
{
    float3(+0.000000, +0.000000, +1.000000) * 0.695583f,
    float3(+0.693520, +0.137950, +0.707107) * 0.706933f,
    float3(+0.461940, +0.191342, +0.866025) * 0.753630f,
    float3(+0.720074, +0.481138, +0.500000) * 0.400474f,
    float3(+0.250000, +0.250000, +0.935414) * 0.849644f,
    float3(+0.439217, +0.657334, +0.612372) * 0.183093f,
    float3(+0.234345, +0.565758, +0.790569) * 0.477398f,
    float3(+0.182490, +0.917441, +0.353553) * 0.465453f,
    float3(+0.000000, +0.250000, +0.968246) * 0.396867f,
    float3(-0.146318, +0.735589, +0.661438) * 0.666024f,
    float3(-0.213927, +0.516464, +0.829156) * 0.621573f,
    float3(-0.500784, +0.749477, +0.433013) * 0.162192f,
    float3(-0.306186, +0.306186, +0.901388) * 0.725721f,
    float3(-0.689418, +0.460655, +0.559017) * 0.092574f,
    float3(-0.611089, +0.253121, +0.750000) * 0.109925f,
    float3(-0.949641, +0.188895, +0.250000) * 0.356492f,
    float3(-0.176777, +0.000000, +0.984251) * 0.219726f,
    float3(-0.714864, -0.142195, +0.684653) * 0.135819f,
    float3(-0.489961, -0.202949, +0.847791) * 0.974883f,
    float3(-0.734922, -0.491059, +0.467707) * 0.095080f,
    float3(-0.279508, -0.279508, +0.918559) * 0.595859f,
    float3(-0.450063, -0.673567, +0.586302) * 0.742192f,
    float3(-0.243914, -0.588860, +0.770552) * 0.637056f,
    float3(-0.185720, -0.933680, +0.306186) * 0.950980f,
    float3(+0.000000, -0.306186, +0.951972) * 0.900276f,
    float3(+0.150327, -0.755746, +0.637378) * 0.035258f,
    float3(+0.224368, -0.541672, +0.810093) * 0.883426f,
    float3(+0.510324, -0.763754, +0.395285) * 0.550505f,
    float3(+0.330719, -0.330719, +0.883884) * 0.050631f,
    float3(+0.704913, -0.471008, +0.530330) * 0.852166f,
    float3(+0.632537, -0.262005, +0.728869) * 0.406028f,
    float3(+0.965339, -0.192018, +0.176777) * 0.085051f,
};
static const float3 kernel16[16] =
{
    float3(+0.000000, -0.433013, +0.901388) * 1.000000,
    float3(+0.000000, +0.353553, +0.935414) * 0.707107,
    float3(-0.250000, +0.000000, +0.968246) * 0.577350,
    float3(+0.433013, -0.000000, +0.901388) * 0.500000,
    float3(+0.433013, +0.433013, +0.790569) * 0.447214,
    float3(-0.467707, +0.467707, +0.750000) * 0.408248,
    float3(-0.353553, -0.353553, +0.866025) * 0.377964,
    float3(+0.395285, -0.395285, +0.829156) * 0.353553,
    float3(+0.653281, +0.270598, +0.707107) * 0.333333,
    float3(-0.864210, +0.357968, +0.353553) * 0.316228,
    float3(-0.894543, -0.370532, +0.350000) * 0.301511,
    float3(+0.692910, -0.287013, +0.661438) * 0.288675,
    float3(+0.331414, +0.800103, +0.500000) * 0.267350,
    float3(-0.302538, +0.730391, +0.612372) * 0.257261,
    float3(-0.344946, -0.832774, +0.433013) * 0.228199,
    float3(+0.317304, -0.766040, +0.559017) * 0.125374,
};
float4 IRSSAO(float2 uv)
{
    // world to view space
    float3 ProjectionNormal = mul(ProjectionMatrix, float4(WorldSpaceNormal, 0.0h)).xyz;
    ProjectionNormal.y = -ProjectionNormal.y;

    float2 NoiseTexUV = 0.50f * screen_res.xy * uv * 0.333333f;
    float3 jitter = tex2D(NoiseSampler, NoiseTexUV) * 2.0f - 1.0f;
    float frame_depth = ReadGbufferZ(uv);
    float AOAccumulator = 0.0f;
    float InvDepth = 1.0f / frame_depth;
    float AspectRatio = screen_res.x / screen_res.y;
    const int TraceSteps = 3;

#if 0
    float3 kernel[32] = kernel32;
    const int numSamples = 32;
#else
    float3 kernel[16] = kernel16;
    const int numSamples = 16;
#endif

    // Gram-Schmidt process
    float3 Tn = normalize(jitter - ProjectionNormal * dot(jitter, ProjectionNormal));
    float3 Bn = cross(ProjectionNormal, Tn);
    float3x3 normalBasis = float3x3(Tn, Bn, ProjectionNormal);

    UNROLL
    for (int j = 0; j < numSamples; j += 1)
    {
        float TraceResult = 0.0f;
        float rad = 0.0f;
        float3 trace_sample = 0.0f;
        float3 sample = mul(kernel[j], normalBasis);
        sample.y *= AspectRatio;

        UNROLL
        for (int i = 0; i < TraceSteps; i += 1)
        {
            rad += ao_radius_scale;
            trace_sample = sample * rad;

            float sampleDepth = ReadGbufferZ(trace_sample.xy + uv) * InvDepth;
            float kernelDepth = trace_sample.z;

            // *2 needed since depth is 0 to 1 and SS coordinate Z is in -1 to 1 range
            float dd3 = (1.0f + 2.0f * kernelDepth - sampleDepth);

            float Occlusion = saturate(dd3 * 10000.0f);
            float RadReject = 1.0f - saturate((dd3 - rad) * 1000.0f);
            TraceResult = lerp((Occlusion * RadReject), 1.0f, TraceResult);
        }
        AOAccumulator += TraceResult;
    }

    AOAccumulator = AOAccumulator * ao_saturation / numSamples;
    float final_ao = 1.0f - AOAccumulator;
    return float4(pack_ssao(final_ao), 0.0f, 0.0f, 0.0f);
}
Bonus
Versus HBAO:
Figure 12. HBAO | 8x4 | 0.910 ms Radeon 7970
Figure 13. IRAO | 16x3 | 0.650 ms Radeon 7970
Geometry Normal:
When implementing the algorithm in an actual game, you may find the G-buffer normal unsatisfactory because of overocclusion in intense normal-mapping areas. It happens because the depth buffer does not know about your normal maps and what they are trying to do, unless you have height displacement maps and detailed tessellation.
What's more, using the geometry normal turned out to be a little bit faster.