ReliableBunch Overflow Handling

What Is a ReliableBunch Overflow

When the DS (dedicated server) log contains the line below, too many ReliableBunches are still unacknowledged and the DS has closed the connection.

SendBunch: Reliable partial bunch overflows reliable buffer!

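The overflow can be triggered in SendBunch.

The overflow can also be triggered when sending an RPC.

How to Debug an Overflow

The overflow happens because too many ReliableBunches are unacknowledged (unacked). So how do we find out which ReliableBunches have not been ACKed? UE already has a facility for this: when the overflow occurs, it can print information about every outstanding ReliableBunch. Once you know which ReliableBunches are outstanding, you can review the game logic to find out why so many were sent in a burst, and then deal with the cause.

Further analysis shows that what gets printed is FOutBunch.DebugString.

Enabling the Console Variable

After enabling the console variable net.Reliable.Debug, ReliableBunch debug information is recorded in builds where !(UE_BUILD_SHIPPING || UE_BUILD_TEST) holds, that is, in every configuration except Shipping and Test.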

TAutoConsoleVariable<int32> CVarNetReliableDebug(
    TEXT("net.Reliable.Debug"),
    0,
    TEXT("Print all reliable bunches sent over the network\n")
    TEXT(" 0: no print.\n")
    TEXT(" 1: Print bunches as they are sent.\n")
    TEXT(" 2: Print reliable bunch buffer each net update"),
    ECVF_Default);

  • net.Reliable.Debug 0: the default. Nothing is recorded and nothing is printed.
  • net.Reliable.Debug 1: prints each ReliableBunch as it is sent and, on overflow, prints which ReliableBunches are still unacked.
  • net.Reliable.Debug 2: every time a ReliableBunch is sent, prints all unacked ReliableBunches of the current ActorChannel; on overflow, also prints which ReliableBunches are still unacked.
  • net.Reliable.Debug 3: prints the unacked ReliableBunches only on overflow. (Any value greater than 2 works; 3 is used here.)
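
The four levels above can be condensed into a small lookup. This is only a sketch of the behaviour as described in this post, not engine code: FReliableDebugBehavior and DescribeReliableDebugLevel are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical summary type for illustration; not an engine struct.
struct FReliableDebugBehavior
{
    bool bFillDebugString;    // FOutBunch.DebugString is populated
    bool bPrintEachBunch;     // each reliable bunch is logged as it is sent
    bool bPrintBufferOnSend;  // the channel's unacked list is logged on every send
    bool bDumpOnOverflow;     // the unacked list is logged when the buffer overflows
};

// Maps a net.Reliable.Debug value to the behaviour listed in the text.
inline FReliableDebugBehavior DescribeReliableDebugLevel(int Level)
{
    assert(Level >= 0);
    if (Level == 0) { return {false, false, false, false}; }
    if (Level == 1) { return {true, true, false, true}; }
    if (Level == 2) { return {true, false, true, true}; }
    return {true, false, false, true};  // any value greater than 2
}
```

For a live server, the last row is usually the least noisy choice, since it only logs at the moment of overflow.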

Populating FOutBunch.DebugString

Populated When Sending an RPC

UNetDriver.ProcessRemoteFunctionForChannelPrivate

Populated in ReplicateActor

UActorChannel.ReplicateActor

Populated When Splitting into Partial Bunches

UChannel.SendBunch. Note: a single partial bunch is 1M.

Log output:
LogNetTraffic: Warning: Out: Partial[1]: 2.28 RPC: ...
LogNetTraffic: Warning: Out: Partial[2]: 2.28 RPC: ...
LogNetTraffic: Warning: Out: Partial[3]: 2.28 RPC: ...
LogNetTraffic: Warning: Out: Partial[4]: 2.28 RPC: ...
LogNetTraffic: Warning: Out: Partial[5]: 2.28 RPC: ...
LogNetTraffic: Warning: Out: Partial[6]: 2.28 RPC: ...

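Output on Overflow

When the number of unacked ReliableBunches exceeds RELIABLE_BUFFER (256), the Connection is closed.

Overflow in SendBunch

If a Bunch contains GUIDs, sending it produces a GUID export Bunch plus the Bunch itself. If the total then exceeds 256, the connection is closed in UChannel::SendBunch.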

UE_LOG(LogNetPartialBunch, Warning, TEXT("SendBunch: Reliable partial bunch overflows reliable buffer! %s"), *Describe() );
UE_LOG(LogNetPartialBunch, Warning, TEXT("Num OutgoingBunches: %d. NumOutRec: %d"), OutgoingBunches.Num(), NumOutRec );
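
The SendBunch check can be modelled in a few lines. The sketch below is a simplified stand-in, not the engine's UChannel: FChannelModel, TrySendReliable and AckOne are hypothetical names, and the exact comparison (>= RELIABLE_BUFFER) is an assumption based on the 256 limit quoted above.

```cpp
#include <cassert>

// The 256 limit quoted in the text.
constexpr int RELIABLE_BUFFER = 256;

// Simplified stand-in for per-channel state; not the engine's UChannel.
struct FChannelModel
{
    int NumOutRec = 0;               // reliable bunches sent but not yet acked
    bool bConnectionClosed = false;  // set once the reliable buffer overflows
};

// Tries to queue NumOutgoing reliable bunches at once, for example one GUID
// export bunch plus one payload bunch, or N partial bunches.
// Returns false and marks the connection closed if the buffer would overflow.
// Assumption: the engine compares with >= RELIABLE_BUFFER.
inline bool TrySendReliable(FChannelModel& Channel, int NumOutgoing)
{
    assert(NumOutgoing > 0);
    if (Channel.bConnectionClosed)
    {
        return false;
    }
    if (Channel.NumOutRec + NumOutgoing >= RELIABLE_BUFFER)
    {
        Channel.bConnectionClosed = true;
        return false;
    }
    Channel.NumOutRec += NumOutgoing;
    return true;
}

// An ACK from the remote side releases one slot.
inline void AckOne(FChannelModel& Channel)
{
    if (Channel.NumOutRec > 0)
    {
        --Channel.NumOutRec;
    }
}
```

In this model a Bunch that expands into two outgoing bunches overflows one send earlier than a plain Bunch, which is why GUID exports and partial bunches make the limit easier to hit.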

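Overflow When Sending an RPC

Sending an RPC can also trigger the overflow. In that case the connection is closed in UNetDriver::ProcessRemoteFunctionForChannelPrivate.

The Bunch error is raised in the FOutBunch constructor, which checks whether the NumOutRec of the current ActorChannel has overflowed.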
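
A minimal sketch of that constructor-time check follows. FChannelModel, FOutBunchModel and SendReliableRpc are hypothetical stand-ins, and the threshold RELIABLE_BUFFER - 1 is an assumption about the engine's comparison, not a quote from it.

```cpp
constexpr int RELIABLE_BUFFER = 256;  // limit quoted in the text

// Simplified stand-in for the ActorChannel.
struct FChannelModel
{
    int NumOutRec = 0;  // unacked reliable bunches on this channel
};

// Simplified stand-in for FOutBunch: the constructor flags an error when
// the channel has no reliable slot left.
struct FOutBunchModel
{
    bool bError = false;

    explicit FOutBunchModel(const FChannelModel& Channel)
    {
        // Assumed threshold; the real constructor may differ in detail.
        if (Channel.NumOutRec >= RELIABLE_BUFFER - 1)
        {
            bError = true;
        }
    }

    bool IsError() const { return bError; }
};

enum class ERpcResult { Sent, ConnectionClosed };

// Mirrors the RPC path described above: build the bunch first, and close
// the connection if the bunch is already in an error state.
inline ERpcResult SendReliableRpc(FChannelModel& Channel)
{
    const FOutBunchModel Bunch(Channel);
    if (Bunch.IsError())
    {
        return ERpcResult::ConnectionClosed;
    }
    ++Channel.NumOutRec;
    return ERpcResult::Sent;
}
```

The point of the sketch is the ordering: the error is detected before any RPC payload is written, so the RPC path closes the connection without ever reaching SendBunch.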

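Cause of the Overflow

Judging from the code, the cause is that UChannel.NumOutRec has grown too large: unacked ReliableBunches have piled up to the point where UE treats the situation as fatal and closes the Connection. Too many unacked ReliableBunches on any single ActorChannel is enough to close the whole Connection.

Will It Overflow Again After Reconnecting

After an overflow the Connection is closed (destroyed). Reconnecting creates a new Connection, and all data from the old one is lost, so whether the overflow recurs depends entirely on what the reconnect logic sends.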

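Merging ReliableBunches

Does every ReliableBunch occupy its own NumOutRec slot? In other words, does NumOutRec increase by 1 for every ReliableBunch sent?

Answer: no. When Bunches can be merged, UE merges them first, and a merged Bunch does not take an additional slot.

Whether a Bunch Can Be Merged

Several conditions determine whether a ReliableBunch can be merged.

The Function Call Allows Merge

The call to UChannel::SendBunch must be made with its Merge argument enabled.

Bunches with ExportGUIDs Cannot Be Merged

If a Bunch needs to export GUIDs, an ExportBunch is placed in front of it.

Bunches with FOutBunch.bHasMustBeMappedGUIDs Cannot Be Merged

MustBeMappedGUIDs are used for loading assets.

ChIndex + Reliable

The ChannelIndex must be the same and both Bunches must be Reliable. In other words, only ReliableBunches from the same ActorChannel can be merged.

The Connection Has AllowMerge Enabled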

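Other Conditions

  1. There must already be earlier data; otherwise there is nothing to merge into.

  2. The recorded SendBuffer position must match the current one. The ReliableBunches must be consecutive; if an UnreliableBunch was inserted in between, they cannot be merged.

  3. There must be room to merge: the merged Bunch must not exceed the maximum size of a single Bunch.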

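Summary

Conditions for merging:
  1. The function is called with Merge enabled.
  2. The Connection allows Merge.
  3. Both Bunches belong to the same ActorChannel, and the ReliableBunches are consecutive, with no other Bunch inserted between them.
  4. A previous Bunch exists for the current Bunch to be merged into.
  5. The merged size stays within the single-Bunch limit (if it exceeded the limit, the Bunch would have to be split again, which defeats the purpose).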
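
The merge conditions collected in this section can be written as a single predicate. This is a sketch over simplified stand-in types: FBunchModel, FConnectionModel and CanMerge are hypothetical names, and the fields only mirror the conditions named in this post, including the two GUID-related ones from the earlier subsections.

```cpp
#include <cstdint>

// Simplified stand-in for FOutBunch; only the fields the conditions need.
struct FBunchModel
{
    int ChIndex = 0;
    bool bReliable = false;
    bool bHasExportGUIDs = false;        // needs an ExportBunch in front of it
    bool bHasMustBeMappedGUIDs = false;
    std::int64_t NumBits = 0;
};

// Simplified stand-in for UNetConnection.
struct FConnectionModel
{
    bool bAllowMerge = true;
    bool bHasLastOut = false;             // a previous bunch still sits in SendBuffer
    FBunchModel LastOut;
    std::int64_t LastEndBits = -1;        // SendBuffer size right after LastOut was written
    std::int64_t SendBufferBits = 0;      // current SendBuffer size
    std::int64_t MaxSingleBunchBits = 0;  // hypothetical single-bunch limit
};

// True only when every condition from the list holds.
inline bool CanMerge(const FConnectionModel& Conn, const FBunchModel& Bunch, bool bMergeRequested)
{
    return bMergeRequested                                      // caller enabled Merge
        && Conn.bAllowMerge                                     // connection allows Merge
        && Conn.bHasLastOut                                     // there is something to merge into
        && Bunch.bReliable && Conn.LastOut.bReliable            // both reliable
        && Bunch.ChIndex == Conn.LastOut.ChIndex                // same ActorChannel
        && !Bunch.bHasExportGUIDs
        && !Bunch.bHasMustBeMappedGUIDs
        && Conn.LastEndBits == Conn.SendBufferBits              // nothing written in between
        && Conn.LastOut.NumBits + Bunch.NumBits <= Conn.MaxSingleBunchBits;
}
```

Because the conditions are a plain conjunction, a single failing one (for instance a different ChIndex) is enough to force a new Bunch and a new NumOutRec slot.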

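How the Old Bunch Is Merged with the New One

UNetConnection.LastOut stores the previous Bunch. If the conditions allow, the current Bunch is merged into it and the merged Bunch replaces the previous one. See the UNetConnection.LastOut section.

How the ReliableBunch List Is Handled

UNetConnection.LastOutBunch points to the last Bunch in the ReliableBunch linked list. If a merge is possible, UNetConnection.LastOutBunch is set to the Bunch produced by merging that Bunch with the new one. See the UNetConnection.LastOutBunch section.

How Data in SendBuffer Is Handled

UNetConnection.LastStart records the SendBuffer position from before the mergeable Bunch was written. Whenever another mergeable Bunch arrives, UNetConnection.LastStart is used to remove the previously written mergeable Bunch from SendBuffer. See the UNetConnection.LastStart section.

Related Variables

UNetConnection.LastEnd

Records the earlier position of UNetConnection.SendBuffer. It is compared with the current SendBuffer: if they match, merging is possible; otherwise it is not. This variable enforces that ReliableBunches must be consecutive. For example, if an UnreliableBunch was inserted in between, they cannot be merged.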

FBitWriterMark  LastEnd; // Most recently sent bunch end.

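In UChannel.SendBunch, UNetConnection.LastEnd is set to the current position of UNetConnection.SendBuffer.

In UNetConnection::FlushNet, UNetConnection.LastEnd is reset. In other words, once the data has actually been sent, LastEnd is empty.

Why compare it with UNetConnection.SendBuffer at all? Because merging requires that the previous SendBuffer contents have not yet been sent, so that the current Bunch can still be merged into that SendBuffer.

UNetConnection.LastOut

Think of it as the Bunch that has already been written into SendBuffer but has not yet been sent. It may be a single Bunch or an already merged one, and if conditions allow it can be merged with the current Bunch again: the new Bunch is merged into UNetConnection.LastOut, which then holds the merged Bunch.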

FOutBunch    LastOut;

UNetConnection.LastStart

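The start of the most recently written Bunch; it records the SendBuffer position from before that Bunch. Whenever a Bunch arrives that can be merged with the previous one, the previous Bunch's data is cleared from SendBuffer and the merged Bunch (previous plus new) is written instead. This can happen repeatedly.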

FBitWriterMark  LastStart;    // Most recently sent bunch start.

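It records the earlier SendBuffer position.

SendBuffer is restored to its state from before the previous Bunch was written, which removes the previous Bunch from SendBuffer. This is safe because the new Bunch has been merged with the previous one and they are sent together as one Bunch.

After merging, the previous Bunch is removed from SendBuffer.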
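
To make the interplay of LastStart, LastEnd and LastOut concrete, here is a toy model in which SendBuffer is a string and the marks are plain sizes. FSendBufferModel and its methods are hypothetical; the engine uses FBitWriter and FBitWriterMark, and this sketch only imitates the pop-and-rewrite behaviour described above.

```cpp
#include <cstddef>
#include <string>

// Toy model of the send buffer and the three merge-related members.
struct FSendBufferModel
{
    std::string Data;           // stand-in for SendBuffer
    std::size_t LastStart = 0;  // buffer size before the last bunch was written
    std::size_t LastEnd = 0;    // buffer size after the last bunch was written
    std::string LastOut;        // payload of the last written (possibly merged) bunch
    bool bHasLastOut = false;

    // Writes a mergeable bunch. If nothing was appended since the last one,
    // the old copy is popped from the buffer and the merged bunch is rewritten.
    void WriteMergeable(const std::string& Payload)
    {
        if (bHasLastOut && LastEnd == Data.size())
        {
            Data.resize(LastStart);  // pop the previous bunch
            LastOut += Payload;      // merge the new payload into it
        }
        else
        {
            LastOut = Payload;
        }
        LastStart = Data.size();
        Data += "[" + LastOut + "]";
        LastEnd = Data.size();
        bHasLastOut = true;
    }

    // Writes a bunch that cannot be merged (for example an unreliable one).
    // LastEnd is left untouched, so the next mergeable write sees a mismatch.
    void WriteOther(const std::string& Payload)
    {
        Data += "(" + Payload + ")";
    }

    // Sends the buffer and clears all merge state, as FlushNet is described to do.
    std::string Flush()
    {
        const std::string Packet = Data;
        Data.clear();
        LastStart = 0;
        LastEnd = 0;
        LastOut.clear();
        bHasLastOut = false;
        return Packet;
    }
};
```

Writing a, then b, leaves a single [ab] entry in the buffer, while writing a, an unrelated bunch, then b leaves three separate entries, which is exactly the "must be consecutive" rule.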

UNetConnection.LastOutBunch

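In UE each ActorChannel has a linked list of ReliableBunches holding those awaiting an ACK, so that they can be resent after packet loss. To support merging, the last Bunch in the list has to be tracked: when a mergeable Bunch arrives, it is merged into that last Bunch. The variable is declared as follows: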

// Most recent outgoing bunch.
FOutBunch* LastOutBunch;

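Summary

When the remote end does not acknowledge in time, ReliableBunches accumulate, and once they exceed 256 UE closes the connection. To reduce the number of ReliableBunches, UE enables a merge mechanism by default: when ReliableBunches on the same ActorChannel are mergeable (they must be consecutive), they are merged and sent as a single Bunch.

Test Cases

Test code that reproduces the overflow:

Two Actors Alternately Sending RPCs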

FVector Loc = FVector::ZeroVector;
FRotator Rot = FRotator::ZeroRotator;
FActorSpawnParameters Par;
Par.SpawnCollisionHandlingOverride = ESpawnActorCollisionHandlingMethod::AlwaysSpawn;
UClass* pClass = LoadObject<UClass>(this, TEXT("Blueprint'/Game/Test/TestRepFlow/BPTestRepFlowSpawnedActor.BPTestRepFlowSpawnedActor_C'"));
TestRepFlowSpawnedActor = GetWorld()->SpawnActor<AActor>(pClass, Loc, Rot, Par);
TestRepFlowSpawnedActor->SetAutonomousProxy(true);
TestRepFlowSpawnedActor->SetOwner(this);

TestRepFlowSpawnedActor12 = GetWorld()->SpawnActor<AActor>(pClass, Loc, Rot, Par);
TestRepFlowSpawnedActor12->SetAutonomousProxy(true);
TestRepFlowSpawnedActor12->SetOwner(this);

if (ATestRepFlowSpawnedActor *pTestRepFlowSpawnedActor = Cast<ATestRepFlowSpawnedActor>(TestRepFlowSpawnedActor))
{
    ATestRepFlowSpawnedActor* pTestRepFlowSpawnedActor12 = Cast<ATestRepFlowSpawnedActor>(TestRepFlowSpawnedActor12);
    pTestRepFlowSpawnedActor->mTestNumber = 998;
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp = NewObject<UTestRepFlowSpawnedCmp>(pTestRepFlowSpawnedActor, UTestRepFlowSpawnedCmp::StaticClass());
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp->RegisterComponent();
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp->TestNumber = 999;

    for (int32 i = 0; i < 300; ++i)
    {
        // for test overflow
        pTestRepFlowSpawnedActor->S2C_TestRealiableFunc();
        //pTestRepFlowSpawnedActor->S2C_TestRealiableFunc();
        pTestRepFlowSpawnedActor12->S2C_TestRealiableFunc();

        //pTestRepFlowSpawnedActor->ServerMulti_TestReliableFunc();
        //pTestRepFlowSpawnedActor->ServerMulti_TestUnreliableFunc();
    }
    pTestRepFlowSpawnedActor->S2C_TestRealiableFunc();
    DebugLogALSV("ATestRepFlowActor::Tick Spawn TestRepFlowSpawnedCmp FrameIndex[%lu]", GFrameNumber);
}

Running this code causes the connection to be closed in UNetDriver.ProcessRemoteFunctionForChannelPrivate.

Cause of the Bunch error: the ReliableBunches within the ActorChannel overflowed.
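
Why two alternating Actors overflow while a single Actor does not can be reproduced with a toy model. Everything here is a simplification under stated assumptions: FConnectionModel, SendReliableRpc and RunReliableRpcTest are hypothetical names, no ACKs arrive during the loop, the single-bunch size limit is ignored, and the RELIABLE_BUFFER - 1 threshold is assumed.

```cpp
#include <array>
#include <cassert>

constexpr int RELIABLE_BUFFER = 256;  // limit quoted in the text

// Toy connection with two ActorChannels (index 0 and 1).
struct FConnectionModel
{
    std::array<int, 2> NumOutRec{{0, 0}};  // unacked reliable bunches per channel
    int LastOutChIndex = -1;               // channel of the bunch that is still mergeable
    bool bClosed = false;
};

// Sends one small reliable RPC on ChIndex.
inline void SendReliableRpc(FConnectionModel& Conn, int ChIndex)
{
    assert(ChIndex == 0 || ChIndex == 1);
    if (Conn.bClosed)
    {
        return;
    }
    if (Conn.LastOutChIndex == ChIndex)
    {
        return;  // merged into the previous bunch: no new NumOutRec slot
    }
    if (Conn.NumOutRec[ChIndex] >= RELIABLE_BUFFER - 1)  // assumed threshold
    {
        Conn.bClosed = true;
        return;
    }
    ++Conn.NumOutRec[ChIndex];
    Conn.LastOutChIndex = ChIndex;
}

// Sends two RPCs per iteration, either on two different channels or on the
// same one. Returns the zero-based iteration at which the connection closed,
// or -1 if it stayed open.
inline int RunReliableRpcTest(int Iterations, bool bAlternateActors)
{
    FConnectionModel Conn;
    for (int i = 0; i < Iterations; ++i)
    {
        SendReliableRpc(Conn, 0);
        SendReliableRpc(Conn, bAlternateActors ? 1 : 0);
        if (Conn.bClosed)
        {
            return i;
        }
    }
    return -1;
}
```

With alternation, every RPC lands on a different channel than the previous one, so nothing merges and each channel gains one slot per iteration until the limit is reached; without alternation everything after the first RPC merges.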

One Actor Alternately Sending Reliable and Unreliable RPCs (Does Not Work)

This approach does not cause a ReliableBunch overflow.

FVector Loc = FVector::ZeroVector;
FRotator Rot = FRotator::ZeroRotator;
FActorSpawnParameters Par;
Par.SpawnCollisionHandlingOverride = ESpawnActorCollisionHandlingMethod::AlwaysSpawn;
UClass* pClass = LoadObject<UClass>(this, TEXT("Blueprint'/Game/Test/TestRepFlow/BPTestRepFlowSpawnedActor.BPTestRepFlowSpawnedActor_C'"));
TestRepFlowSpawnedActor = GetWorld()->SpawnActor<AActor>(pClass, Loc, Rot, Par);
TestRepFlowSpawnedActor->SetAutonomousProxy(true);
TestRepFlowSpawnedActor->SetOwner(this);

TestRepFlowSpawnedActor12 = GetWorld()->SpawnActor<AActor>(pClass, Loc, Rot, Par);
TestRepFlowSpawnedActor12->SetAutonomousProxy(true);
TestRepFlowSpawnedActor12->SetOwner(this);

if (ATestRepFlowSpawnedActor *pTestRepFlowSpawnedActor = Cast<ATestRepFlowSpawnedActor>(TestRepFlowSpawnedActor))
{
    ATestRepFlowSpawnedActor* pTestRepFlowSpawnedActor12 = Cast<ATestRepFlowSpawnedActor>(TestRepFlowSpawnedActor12);
    pTestRepFlowSpawnedActor->mTestNumber = 998;
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp = NewObject<UTestRepFlowSpawnedCmp>(pTestRepFlowSpawnedActor, UTestRepFlowSpawnedCmp::StaticClass());
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp->RegisterComponent();
    pTestRepFlowSpawnedActor->TestRepFlowSpawnedCmp->TestNumber = 999;

    for (int32 i = 0; i < 300; ++i)
    {
        // Alternate a reliable and an unreliable RPC on the same actor.
        // Assumption: the unreliable call is ServerMulti_TestUnreliableFunc,
        // matching the multicast drop path in the call stack that follows.
        pTestRepFlowSpawnedActor->S2C_TestRealiableFunc();
        pTestRepFlowSpawnedActor->ServerMulti_TestUnreliableFunc();
    }
    pTestRepFlowSpawnedActor->S2C_TestRealiableFunc();
    DebugLogALSV("ATestRepFlowActor::Tick Spawn TestRepFlowSpawnedCmp FrameIndex[%lu]", GFrameNumber);
}

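Reason: the Unreliable RPCs were dropped at send time. They were dropped because the connection was over its bandwidth budget: the DS was sending more at once than the receiver could accept.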

// Call stack where the unreliable RPC is dropped instead of being sent
--ABasePlayerController.Tick
|--AActor.ProcessEvent.if.if
| |--UObject.ProcessEvent.if
| | |--AActor.CallRemoteFunction.if.for.if
| | | |--UNetDriver.ProcessRemoteFunction.if
| | | | |--UReplicationGraph.ProcessRemoteFunction.if
| | | | | |--// If we're saturated and it's not a reliable multicast, drop it.
| | | | | |--if (!(bIsReliable || IsConnectionReady(Connection)))
| | | | | |--{
| | | | | |-- return true;
| | | | | |--}

UNetConnection.QueuedBits

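To make UReplicationGraph::IsConnectionReady easier to understand, UNetConnection.QueuedBits deserves a separate explanation. It exists to limit bandwidth: the sender must not send more than the receiver's bandwidth can absorb.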

bool UReplicationGraph::IsConnectionReady(UNetConnection* Connection)
{
    if (CVar_RepGraph_DisableBandwithLimit)
    {
        return true;
    }

    return Connection->QueuedBits + Connection->SendBuffer.GetNumBits() <= 0;
}

Initialization

It is 0 at initialization.

Updates

In UNetConnection.FlushNet, after SendBuffer is sent, the number of bits in the current packet is added to UNetConnection.QueuedBits.


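In addition, on every Tick, QueuedBits is reduced by the amount of data the connection is expected to be able to send, estimated from the network tick rate and the current net speed.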
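
The QueuedBits bookkeeping can be sketched as a small budget model. FBandwidthModel and its method names are hypothetical, CurrentNetSpeed is taken to be in bytes per second, and the clamp to twice the per-tick allowance is an assumption about how idle credit is capped; none of this is engine code.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the bandwidth budget behind IsConnectionReady.
struct FBandwidthModel
{
    std::int64_t QueuedBits = 0;       // starts at 0, as stated in the text
    std::int64_t SendBufferBits = 0;   // bits written but not yet flushed
    std::int32_t CurrentNetSpeed = 0;  // assumed unit: bytes per second

    // Appends data to the pending send buffer.
    void WriteBits(std::int64_t NumBits)
    {
        assert(NumBits >= 0);
        SendBufferBits += NumBits;
    }

    // Sending the buffer charges its size against the budget.
    void FlushNet()
    {
        QueuedBits += SendBufferBits;
        SendBufferBits = 0;
    }

    // Each tick pays back what the link could have carried in DeltaSeconds.
    void Tick(double DeltaSeconds)
    {
        const std::int64_t DeltaBits =
            static_cast<std::int64_t>(CurrentNetSpeed * DeltaSeconds * 8.0);
        QueuedBits -= DeltaBits;
        // Assumed cap so that an idle connection cannot bank unlimited credit.
        const std::int64_t AllowedLag = 2 * DeltaBits;
        if (QueuedBits < -AllowedLag)
        {
            QueuedBits = -AllowedLag;
        }
    }

    // Same comparison as the IsConnectionReady listing above.
    bool IsConnectionReady() const
    {
        return QueuedBits + SendBufferBits <= 0;
    }
};
```

A burst of sends in one frame pushes QueuedBits well above zero, and the connection only becomes ready again after enough ticks have paid the debt back.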

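Summary

When one Actor alternates Reliable and Unreliable RPCs, the overflow is not triggered. When an Unreliable RPC is about to be sent, the DS finds that IsConnectionReady returns false because of the remote bandwidth limit: the connection is over budget and the remote end could not receive the data, so the Unreliable RPC is dropped. The RPCs appear to alternate, but only the Reliable RPCs are actually sent. ReliableBunches are merged when the conditions are met, and this test case happens to meet them, so they were merged and no overflow occurred.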
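
The conclusion can be checked with a toy simulation of one ActorChannel. This is a sketch under stated assumptions, not engine behaviour: FSimResult and SimulateAlternating are hypothetical names, saturation is reduced to a single flag, RpcBits and MaxBunchBits are made-up sizes, no ACKs arrive, and the RELIABLE_BUFFER - 1 threshold is assumed.

```cpp
#include <cassert>

constexpr int RELIABLE_BUFFER = 256;  // limit quoted in the text

struct FSimResult
{
    int NumOutRec = 0;          // unacked reliable bunches at the end
    int DroppedUnreliable = 0;  // unreliable RPCs dropped due to saturation
    bool bClosed = false;       // connection closed by overflow
};

// Alternates one reliable and one unreliable RPC on a single channel.
// When bSaturated is true, the unreliable RPC is dropped before it reaches
// SendBuffer, so consecutive reliable bunches stay mergeable.
inline FSimResult SimulateAlternating(int Iterations, bool bSaturated, int RpcBits, int MaxBunchBits)
{
    assert(RpcBits > 0 && RpcBits <= MaxBunchBits);
    FSimResult Result;
    bool bCanMerge = false;  // last thing in SendBuffer is a reliable bunch of this channel
    int LastOutBits = 0;     // size of that bunch
    for (int i = 0; i < Iterations; ++i)
    {
        // Reliable RPC: merge if possible, otherwise take a new slot.
        if (bCanMerge && LastOutBits + RpcBits <= MaxBunchBits)
        {
            LastOutBits += RpcBits;
        }
        else if (Result.NumOutRec >= RELIABLE_BUFFER - 1)  // assumed threshold
        {
            Result.bClosed = true;
            break;
        }
        else
        {
            ++Result.NumOutRec;
            LastOutBits = RpcBits;
            bCanMerge = true;
        }

        // Unreliable RPC: dropped when saturated, otherwise it is written
        // to SendBuffer and breaks the run of consecutive reliable bunches.
        if (bSaturated)
        {
            ++Result.DroppedUnreliable;
        }
        else
        {
            bCanMerge = false;
        }
    }
    return Result;
}
```

With saturation, 300 iterations of 64-bit RPCs and a 4096-bit bunch limit end with only a handful of slots in use; if the unreliable RPCs were actually written, the same loop would take a new slot per iteration and close the connection in this model.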