//----------------------------------------------------------------------------------------------------------


This article is a bit outdated now, with all those fancy compute-shader-based algorithms around, but maybe someone will find it useful.

//----------------------------------------------------------------------------------------------------------

**Hemisphere based ray matched SSAO**

**1. Introduction**

SSAO is not a new approach to the global illumination problem. The very first implementation was done by Crytek in 2007. Still, SSAO has a lot of problems. In this work I have made an attempt to solve them.

**2. Equations and theory**

I start with a simplified form of the rendering equation and pick ambient occlusion out of it:

AO is a diffuse factor, so I simply drop the specular part (although it actually has its own occlusion factor, simulated by RO or SSR in modern real-time projects).

Next, I separate Li into two factors, the color and brightness of the incoming light and the occlusion factor:

(this approximation precludes all secondary GI bounces, Error 1)

(this is an unavoidable approximation, Error 2)

where Sv is a sample vector and 2 is the normalization factor for cosine-weighted samples (uniformly distributed samples get an average cosine weight of one half).

However, that leads to a lot of unwanted calculations. To avoid them, the dot product has to be eliminated. Fortunately, importance sampling allows us to do that:
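The chain of approximations described in this section can be sketched in standard notation as follows. This is a reconstruction under assumptions, not a definitive statement of the formulas: the symbols $\rho$ (diffuse albedo), $L_a$ (constant ambient radiance), $V$ (visibility, 1 when a direction is unoccluded), $n$ (surface normal) and $N$ (sample count) are assumed names, and $AO$ here denotes the unoccluded fraction. Only $L_i$ and $S_v$ come from the text.

```latex
\begin{align}
L_o(p) &= \frac{\rho}{\pi}\int_{\Omega} L_i(p,\omega)\,(n\cdot\omega)\,d\omega
  && \text{diffuse part only, specular dropped}\\
L_i(p,\omega) &\approx L_a\,V(p,\omega)
  && \text{no secondary bounces (Error 1)}\\
AO(p) &= \frac{1}{\pi}\int_{\Omega} V(p,\omega)\,(n\cdot\omega)\,d\omega\\
AO(p) &\approx \frac{2}{N}\sum_{k=1}^{N} V(p,S_{v,k})\,(n\cdot S_{v,k})
  && \text{uniform samples, pdf } \tfrac{1}{2\pi}\\
AO(p) &\approx \frac{1}{N}\sum_{k=1}^{N} V(p,S_{v,k})
  && \text{cosine-weighted samples, pdf } \tfrac{n\cdot\omega}{\pi}
\end{align}
```

The factor 2 in the fourth line is the ratio of the $1/\pi$ normalization to the uniform pdf $1/(2\pi)$. In the last line the cosine in the integrand cancels against the cosine in the pdf, which is why the dot product disappears from the per-sample work.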

**3. Sample Kernel**

I take Crytek's SSAO as the starting point. The core idea is to take a number of samples in a sphere, project them onto the screen and test the sign of the difference between the depth of each sample point and the depth stored at the pixel it lands on. If the difference is < 0, the point is occluded:

**Figure 2.**

*This method has two major problems:*

*Half of the samples are wasted because they are below the surface*

**Figure 3.**

*Crysis 1 SSAO in action | 32 samples*

The first thing we need now is hemisphere sampling, as our integral requires.

There are two ways to do it:

- redirect the below-surface samples of the sphere kernel to the positive normal side by checking the dot product between the normal and the sample vector:

(dot(normal, sample) >= 0.0f) ? sample : -sample

- reorient the samples of a hemisphere kernel by the TBN basis of the pixel:

sample = mul(sample, TBN)

The dot check is simple, so I will not dwell on it. The second option is trickier.
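As a quick CPU-side illustration of the first option, here is a minimal Python sketch of the dot-product flip. The helper name `flip_to_hemisphere`, the random sphere kernel and the fixed normal are assumptions made for the example only; they are not part of the shader.

```python
import random

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def flip_to_hemisphere(normal, sample):
    """Mirror a sphere-kernel sample to the normal's side if it points below the surface."""
    if dot(normal, sample) >= 0.0:
        return sample
    return (-sample[0], -sample[1], -sample[2])

# Hypothetical test data: 32 random offsets in a cube around the point.
normal = (0.0, 0.0, 1.0)
rng = random.Random(7)
sphere_kernel = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(32)]
hemi_kernel = [flip_to_hemisphere(normal, s) for s in sphere_kernel]
```

After the flip every sample lies on the normal's side of the surface, so none of the 32 depth reads is wasted below it.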

We need to construct a vector basis whose z-axis always points along the surface normal and whose x- and y-axes are orthonormal vectors randomly rotated per pixel, so as to get rid of banding artefacts.

We need a table of normalized vectors that covers the circle. I am going to use 9 directions and store them in a 3x3 texture (you may want to read them from a 2D array instead, but the texture has proved to be slightly faster):

// pixel coordinate modulo 3 selects one of the 9 directions

int2 RI = int2(uv * g_screen.xy) % 3;

float3 PointsOnCircle[3][3] =

{

{float3(+0.766044, +0.642788, 0), float3(+0.173648, +0.984808, 0), float3(-0.5, +0.866025, 0)},

{float3(-0.939693, +0.342020, 0), float3(-0.939693, -0.342020, 0), float3(-0.5, -0.866025, 0)},

{float3(+0.173648, -0.984808, 0), float3(+0.766044, -0.642788, 0), float3(+1.0, +0.000000, 0)}

};

float3 random = PointsOnCircle[RI.x][RI.y];
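The nine table entries appear to be unit vectors spaced 40 degrees apart (40°, 80°, ..., 360°). A small Python sketch that regenerates such a table and mimics the per-pixel lookup is shown below; the function names `points_on_circle` and `lookup` and the 40 degree starting angle are assumptions read off the printed values, not something the shader defines.

```python
import math

def points_on_circle(count=9, start_deg=40.0):
    """Unit vectors in the XY plane, evenly spaced around the circle."""
    step = 360.0 / count
    points = []
    for k in range(count):
        angle = math.radians(start_deg + k * step)
        points.append((math.cos(angle), math.sin(angle), 0.0))
    return points

table = points_on_circle()
# Arrange the 9 directions as a 3x3 grid, row by row, like the shader table.
grid = [table[row * 3:(row + 1) * 3] for row in range(3)]

def lookup(pixel_x, pixel_y):
    """Pick a direction from the 3x3 grid using the pixel coordinate modulo 3."""
    return grid[pixel_x % 3][pixel_y % 3]
```

Because the lookup repeats every 3 pixels in each direction, the noise pattern tiles in 3x3 blocks, which a small blur over the AO result can later average out.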

Now I am going to construct the basis using the Gram-Schmidt process:

POC = PointsOnCircleTexture * 2 - 1;

Tangent = normalize(POC - normal * dot(POC, normal));

Binormal = cross(normal, Tangent);

This may require some explanation:

**Figure 4**.

*Tangent calculations*
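The Gram-Schmidt construction above can be checked on the CPU with a short Python sketch. The helper names (`build_basis`, `to_surface`) and the example normal are assumptions for illustration. Note that the construction breaks down if the random vector happens to be parallel to the normal, since the projected tangent then has zero length.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def build_basis(normal, poc):
    """Remove the normal component from poc, then complete the basis with a cross product."""
    d = dot(poc, normal)
    tangent = normalize(tuple(p - n * d for p, n in zip(poc, normal)))
    binormal = cross(normal, tangent)
    return tangent, binormal, normal

def to_surface(sample, basis):
    """Row-vector multiply: sample.x * T + sample.y * B + sample.z * N."""
    t, b, n = basis
    return tuple(sample[0] * t[i] + sample[1] * b[i] + sample[2] * n[i] for i in range(3))

normal = normalize((0.3, 0.5, 0.8))                   # hypothetical surface normal
basis = build_basis(normal, (0.766044, 0.642788, 0.0))  # first table direction
up = to_surface((0.0, 0.0, 1.0), basis)               # kernel "up" maps onto the normal
```

The kernel's z-axis ends up along the surface normal, while the per-pixel random vector only decides how the kernel is rotated around it.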

After that, the sampling should look like this:

**Figure 5.**

*Hemisphere sampling kernel*

**Figure 6.**

*Hemisphere kernel – dot products | 32 samples*

**Figure 7.**

*Hemisphere kernel – TBN basis align | 32 samples*

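The sample directions for the importance-sampled (cosine-weighted) kernel are generated from two uniform random numbers u and v:

phi = v * 2.0 * PI;

cosTheta = sqrt(1.0 - u);

sinTheta = sqrt(1.0 - cosTheta * cosTheta);

x = cos(phi) * sinTheta;

y = sin(phi) * sinTheta;

z = cosTheta;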
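The mapping above is easy to try out on the CPU. The Python sketch below generates a cosine-weighted kernel with it; the function name `cosine_sample`, the seed and the sample count are assumptions for the example. For this distribution the expected value of z is 2/3, which makes a handy sanity check.

```python
import math
import random

def cosine_sample(u, v):
    """Map two uniform numbers in [0, 1) to a cosine-weighted direction around +Z."""
    phi = v * 2.0 * math.pi
    cos_theta = math.sqrt(1.0 - u)
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return (math.cos(phi) * sin_theta, math.sin(phi) * sin_theta, cos_theta)

rng = random.Random(1)
kernel = [cosine_sample(rng.random(), rng.random()) for _ in range(4096)]
mean_z = sum(s[2] for s in kernel) / len(kernel)
```

Samples cluster around the normal, where the cosine term is largest, so each one can be given equal weight in the sum.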

**4. Occlusion calculation**

In order to get the occlusion factor we need to check whether the sample vector intersects the scene geometry. The simplest way to do that is to check the visibility of the sample point: read the depth at the screen position the vector projects to and compare it to the actual depth of the vector. Since the coordinate space consists of the screen vertical axis, the screen horizontal axis and depth, with ranges (0:1, 0:1, -1:1), we have to remap the z-coordinate of the vector to the 0:1 range that our depth uses. After that we compare sample.z to the stored depth: if the stored depth is smaller, the point is occluded.

Besides, it is necessary to make sure that the vector does not leave the sampling radius in world space. To this end, we compare the depth difference to the radius: if it is greater, the sample is rejected.
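The two tests described above can be sketched as one small Python function. The steep `saturate` ramps mirror the branch-free style of the shader listing in section 6; the function name `sample_occlusion` and the example depths are assumptions for illustration.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def sample_occlusion(sample_depth, stored_depth, radius):
    """Return 1.0 if the sample is hidden by geometry that is within the radius, else 0.0.

    sample_depth: depth of the sample point itself
    stored_depth: depth buffer value at the sample's screen position
    """
    diff = sample_depth - stored_depth
    occluded = saturate(diff * 10000.0)                   # sample lies behind the surface
    in_range = 1.0 - saturate((diff - radius) * 1000.0)   # occluder is not too far in front
    return occluded * in_range

hidden = sample_occlusion(0.52, 0.50, 0.1)    # just behind a nearby surface
visible = sample_occlusion(0.48, 0.50, 0.1)   # in front of the surface
too_far = sample_occlusion(0.90, 0.50, 0.1)   # occluder far outside the radius
```

Without the range term the third case would count as occluded, which produces dark halos around objects in front of a distant background.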

**Figure 8**.

*SSAO without radius test | 32 samples*

The result is already acceptable; nevertheless, there is still room for error.

**5. Ray matching**

**Figure 9**.

*Trivial occlusion (on the left) vs ray matching (on the right)*

The error comes from the fact that you still count lighting from a direction that has already been occluded. Light can't reach the point from that direction no matter how many times you sample it. And in the case of classic SSAO, the more samples land past the occluder, the greater the error.

To do ray matching (in effect a short ray march), we sample the same direction a number of times, extending the sampling vector with each step until it finds an occluder or its length exceeds the radius.
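To make the difference concrete, here is a minimal Python sketch of marching one direction in several steps. The function `march`, the `depth_at` callback and the toy scene (a thin blocker in front of a flat background) are all assumptions for illustration. Unlike the description above, the sketch does not stop at the first hit; like the shader listing, it always runs all steps and keeps the result once it is set.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def march(direction, steps, step_len, depth_at, center_depth):
    """Return 1.0 if any step along the direction is occluded, else 0.0."""
    result = 0.0
    rad = 0.0
    for _ in range(steps):
        rad += step_len
        px, py, pz = (c * rad for c in direction)
        diff = (center_depth + pz) - depth_at(px, py)
        hit = saturate(diff * 10000.0) * (1.0 - saturate((diff - rad) * 1000.0))
        # Same as lerp(hit, 1, result): once occluded, the ray stays occluded.
        result = hit + (1.0 - hit) * result
    return result

def depth_at(x, y):
    """Toy depth buffer: a thin blocker at depth 0.45, background at depth 0.5."""
    return 0.45 if 0.15 <= x < 0.25 else 0.5

direction = (0.9, 0.0, -0.1)                            # mostly sideways, slightly toward the camera
marched = march(direction, 3, 0.1, depth_at, 0.5)       # three short steps
single = march(direction, 1, 0.3, depth_at, 0.5)        # one test at the full radius
```

The single full-length test lands past the thin blocker and reports the direction as open, while the three-step march catches it on the second step.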

**Figure 10.**

*SSAO IS vectors | 32 samples*

**Figure 11**.

*Ray matched SSAO | 16x3*

Plain hemisphere-based SSAO without ray matching is greatly under-occluded because of the large percentage of missed occluders, and it is noisier in general:

**6. Code**

static const float3 kernel32[32] =

{

float3(+0.000000, +0.000000, +1.000000) * 0.695583f,

float3(+0.693520, +0.137950, +0.707107) * 0.706933f,

float3(+0.461940, +0.191342, +0.866025) * 0.753630f,

float3(+0.720074, +0.481138, +0.500000) * 0.400474f,

float3(+0.250000, +0.250000, +0.935414) * 0.849644f,

float3(+0.439217, +0.657334, +0.612372) * 0.183093f,

float3(+0.234345, +0.565758, +0.790569) * 0.477398f,

float3(+0.182490, +0.917441, +0.353553) * 0.465453f,

float3(+0.000000, +0.250000, +0.968246) * 0.396867f,

float3(-0.146318, +0.735589, +0.661438) * 0.666024f,

float3(-0.213927, +0.516464, +0.829156) * 0.621573f,

float3(-0.500784, +0.749477, +0.433013) * 0.162192f,

float3(-0.306186, +0.306186, +0.901388) * 0.725721f,

float3(-0.689418, +0.460655, +0.559017) * 0.092574f,

float3(-0.611089, +0.253121, +0.750000) * 0.109925f,

float3(-0.949641, +0.188895, +0.250000) * 0.356492f,

float3(-0.176777, +0.000000, +0.984251) * 0.219726f,

float3(-0.714864, -0.142195, +0.684653) * 0.135819f,

float3(-0.489961, -0.202949, +0.847791) * 0.974883f,

float3(-0.734922, -0.491059, +0.467707) * 0.095080f,

float3(-0.279508, -0.279508, +0.918559) * 0.595859f,

float3(-0.450063, -0.673567, +0.586302) * 0.742192f,

float3(-0.243914, -0.588860, +0.770552) * 0.637056f,

float3(-0.185720, -0.933680, +0.306186) * 0.950980f,

float3(+0.000000, -0.306186, +0.951972) * 0.900276f,

float3(+0.150327, -0.755746, +0.637378) * 0.035258f,

float3(+0.224368, -0.541672, +0.810093) * 0.883426f,

float3(+0.510324, -0.763754, +0.395285) * 0.550505f,

float3(+0.330719, -0.330719, +0.883884) * 0.050631f,

float3(+0.704913, -0.471008, +0.530330) * 0.852166f,

float3(+0.632537, -0.262005, +0.728869) * 0.406028f,

float3(+0.965339, -0.192018, +0.176777) * 0.085051f,

};

static const float3 kernel16[16] =

{

float3(+0.000000, -0.433013, +0.901388) * 1.000000,

float3(+0.000000, +0.353553, +0.935414) * 0.707107,

float3(-0.250000, +0.000000, +0.968246) * 0.577350,

float3(+0.433013, -0.000000, +0.901388) * 0.500000,

float3(+0.433013, +0.433013, +0.790569) * 0.447214,

float3(-0.467707, +0.467707, +0.750000) * 0.408248,

float3(-0.353553, -0.353553, +0.866025) * 0.377964,

float3(+0.395285, -0.395285, +0.829156) * 0.353553,

float3(+0.653281, +0.270598, +0.707107) * 0.333333,

float3(-0.864210, +0.357968, +0.353553) * 0.316228,

float3(-0.894543, -0.370532, +0.350000) * 0.301511,

float3(+0.692910, -0.287013, +0.661438) * 0.288675,

float3(+0.331414, +0.800103, +0.500000) * 0.267350,

float3(-0.302538, +0.730391, +0.612372) * 0.257261,

float3(-0.344946, -0.832774, +0.433013) * 0.228199,

float3(+0.317304, -0.766040, +0.559017) * 0.125374,

};

float4 IRSSAO(float2 uv)

{

// world to view space (w = 0, so the normal is transformed as a direction)

float3 ProjectionNormal = mul(ProjectionMatrix, float4(WorldSpaceNormal, 0.0h)).xyz;

ProjectionNormal.y = -ProjectionNormal.y;

// tile the small noise texture over the screen

float2 NoiseTexUV = 0.50f * screen_res.xy * uv * 0.333333f;

float3 jitter = tex2D(NoiseSampler, NoiseTexUV).xyz * 2.0f - 1.0f;

float frame_depth = ReadGbufferZ(uv);

float AOAccumulator = 0.0f;

float InvDepth = 1.0f / frame_depth;

float AspectRatio = screen_res.x / screen_res.y;

const int TraceSteps = 3;

#if 0

float3 kernel[32] = kernel32;

const int numSamples = 32;

#else

float3 kernel[16] = kernel16;

const int numSamples = 16;

#endif

// Gram-Schmidt process

float3 Tn = normalize(jitter - ProjectionNormal * dot(jitter, ProjectionNormal));

float3 Bn = cross(ProjectionNormal, Tn);

float3x3 normalBasis = float3x3(Tn, Bn, ProjectionNormal);

UNROLL

for (int j = 0; j < numSamples; j += 1)

{

float TraceResult = 0.0f;

float rad = 0.0f;

float3 trace_sample = 0.0f;

// rotate the kernel sample into the per-pixel basis

float3 sample = mul(kernel[j], normalBasis);

sample.y *= AspectRatio;

UNROLL

for (int i = 0; i < TraceSteps; i += 1)

{

// extend the sampling vector with each step

rad += ao_radius_scale;

trace_sample = sample * rad;

float sampleDepth = ReadGbufferZ(trace_sample.xy + uv) * InvDepth;

float kernelDepth = trace_sample.z;

// *2 needed since depth is 0 to 1 and SS coordinate Z is in -1 to 1 range

float dd3 = (1.0f + 2.0f * kernelDepth - sampleDepth);

// 1 when the sample point lies behind the stored depth

float Occlusion = saturate(dd3 * 10000.0f);

// 0 when the depth difference exceeds the current radius

float RadReject = 1.0f - saturate((dd3 - rad) * 1000.0f);

// once a step is occluded, the ray stays occluded

TraceResult = lerp((Occlusion * RadReject), 1.0f, TraceResult);

}

AOAccumulator += TraceResult;

}

AOAccumulator = AOAccumulator * ao_saturation / numSamples;

float final_ao = 1.0f - AOAccumulator;

return float4(pack_ssao(final_ao), 0.0f, 0.0f, 0.0f);

}

**Bonus**

**Versus HBAO:**

**Figure 12**.

*HBAO | 8x4 | 0.910 ms Radeon 7970*

**Figure 13**.

*IRAO | 16x3 | 0.650 ms Radeon 7970*

**Geometry Normal:**

When implementing the algorithm in an actual game you may find the G-buffer normal unsatisfactory because of over-occlusion in areas with intense normal mapping. This happens because the depth buffer knows nothing about your normal maps and what they are trying to do, unless you have height displacement maps and detailed tessellation.

What's more, using the geometry normal turned out to be a little bit faster.
