Tuesday, July 22, 2008

Activity Log : GSoC Week #13 (14.VII - 20.VII)

Last week I focused on debugging the no-registrar patch. I won't reproduce the entire mail I've sent on the main dev list last Wednesday, because it's quite long and maybe a bit over-detailed. I'll summarize it in fewer lines and add some further observations.

Let's take this as a use/test case:

We have two SC peers.
Peer A has three accounts created: one registrar mode account R1 and two no-registrar mode accounts: NR1 and NR3.
Peer B has two accounts created: one registrar mode account R2 and one no-registrar mode account NR2.

The registrar server is shut down.
Peer A has NR1 logged in.
Peer B has NR2 logged in.

Peer B (NR2) calls Peer A (NR1) - the INVITE is sent on NR1@w.x.y.z where w.x.y.z is Peer A's IP address.

The problem: The INVITE request is processed by the SIP stack instantiated by the R1 - not NR1 - account's Protocol Service Provider (which is a registrar-mode type). The stack is also the one which continues the call handling, which makes the wrong ProtocolProviderService to process any further requests.

What I can tell for sure after the debugging done last week: The reason why R1's ProtocolProviderService's SIP stack is the one which handles the INVITE request, is the fact that in this specific test case the R1 account is the one that's loaded first (which also means that the corresponding Protocol Provider Service is the first one from the three registered in the bundle context). The account loading order might differ. After deleting and re-creating all the accounts, the first account loaded was NR3 for example and the SIP stack/ProtocolProviderService handling the INVITE for NR1 were the one associated with NR3.

I tried also other configurations; all lead to the same conclusion - the first loaded account's ProtocolProviderService registered is the one which handles the requests when operating in no-registrar mode - it doesn't matter if the first loaded account it's registrar mode or not, if it's logged in or not or if the request is addressed to a totally different target.

I couldn't find why this happens. I managed to trace the code back into the JAIN SIP RI sources, up to inside the SipTransactionStack class but this doesn't help me. The stack is of course the one corresponding to the first loaded account. What I need to know is the point where the code gets there - how the stack/ProtocolProviderService is selected to handle the request. The getProviderForAccount and getRegisteredAccounts inside the ProtocolProviderFactorySipImpl don't seem to have anything to do with this, being called only prior to the call initiation, in the account loading phase, and I didn't find any bug there. Some more details and also about another test case also can be found inside the mail I've sent on the dev mailing list.

After a lot of debugging done and no success, I decided to check out the revision which was the initial subject for the original patch by Michael Koch - revision 3252, to see what makes the difference. I applied the original patch, activated the no registrar Protocol Provider Service instantiation, and unfotunately there isn't any difference - the same problem is present.

This happens however when the registrar is off. When the registrar is on the calls routed trough it, addressed to registrar mode accounts, reach their target ProtocolServiceProvider successfully.

Anyway, even if I couldn't find out what causes the problem, I made some "fixes" to the initial patch submitted:
  • added the OperationSetTypingNotifications to the others in the registrar mode ProtocolProviderService
  • cut out the REGISTERS_USE_ROUTE property from the AbstractProtocolProviderServiceSipImpl (it was a duplicate - also in the registrar mode provider service)
  • restricted the entire keep alive related functionality only to registrar mode ProtocolProviderService instantiations (I'm not very sure about this but at least the using of REGISTER for keep alive in non-registrar mode doesn't seem to make sense; though I'm not entirely sure about OPTIONS request - I'm not very experienced with SIP; anyway, it was a newer functionality added since the initial patch and I've done the modification also to exclude a potential source for the bug .. but it wasn't)
  • added an if branch in the new sayBye method from the OperationsSetBasicTelephonySipImpl for specific cases of registrar or no-registrar mode provider service (again I'm not entirely certain about if that is correct or not)

These, together with other older and more minor modifications can be found in the patch at this address (done this time against the main trunk revision):

http://students.info.uaic.ro/~eonica/sc/current-rev-no-registrar.patch

Note that this patch allows ProtocolProviderService instantiation for registrar and no-registrar mode also. The older patches had the registrar mode option blocked allowing only no-registrar. Even if the mentioned problem was present also at that time, the call because being handled through a no-registrar ProtocolProviderService as it was intended went on fine.

I'm leaving the no-registrar patch for now, because I've already spent a week with the debugging on it and I didn't manage to solve the main bug. I'll post the main things discussed
here on the dev list, maybe someone experienced with SIP and the JAIN stack can give me a hint eventually, and go back until then to my main GSoC project theme.

I've merged the encryption branch with the latest main trunk revision and I solved the conflicts, but I didn't committed it yet. I'll probably do it soon but first I'll report on the main dev list, a minor bug I found. It's about the new Hold functionality.

The bug manifests the next way: When I put a call on hold, after I release it in aproximately 20 seconds the call ends abruptly without any error message. The logger displays the message from the callStateChangedfunction inside the CallSessionImpl class : "Stopping streaming." like a CALL_ENDED event was received. I didn't debug the code further because I tested this also in no-registrar mode and it seems it doesn't happen, so it might be related with the type of registrar used. However I didn't commit the merge to the encryption branch yet, as I said; I wonder if putting the call on Hold doesn't affect somehow the ZRTP integration at some level and I must take a careful look at it (at a first sight the ZRTP exchange shouldn't be affected at least, but the first thought when encountering the bug above was that is related with the RTPConnector usage for handling the media flow in the ZRTP integration branch; anyway it also occurs between two peers built from main trunk when using my registrar so it seems it wasn't that).

For now you can find the merge patch at this address:

http://students.info.uaic.ro/~eonica/sc/last-rev-merge-patch.patch

That's pretty much all for the moment; I'll probably start thinking on the suggested JUnit testing this week.

No comments: