I ran into a bug a couple weeks ago at work that was quite interesting. We’re using a DispatchGroup and DispatchGroup.notify
to stop our pull to refresh animation once all the microservices come back with results.
Our QA team caught an issue where if you to a pull to refresh and then background the app it crashes. Of course you can’t simulate the error, it has to be on a physical device. Also, while using a debug profile it doesn’t crash, just hangs forever.
It took me a few days of experimenting and debugging but I finally figured out that the HTTP calls were being unceremoniously killed by the OS.
Our code looks something like this:
let group = DispatchGroup()
group.enter()
service1.fetch() {
group.leave()
}
...
serviceN.fetch() { // service goes off and does async things
group.leave()
}
group.notify(queue: DispatchQueue.main) {
pullToRefreshCallback()
}
In debug mode when I paused the execution I could see my app was stuck down in pthreads land and that’s when I figured it out. group.leave()
for the failing service(s) was never called because the call was ended at the OS level. There was zero feedback that I could find to know that this happened.
I also couldn’t find a method that would allow the group.notify()
to timeout. I also couldn’t find anything on UrlSession that would give me the proper feedback that something happened to my request.
So I turned to Combine:
subscription = publisher
.timeout(.milliseconds(1_000), scheduler: DispatchQueue.global(qos: .userInitiated), options: nil, customError: { .timeout }) // 1
.receive(on: DispatchQueue.main) // 2
.collect(numberOfNetworkCalls) // 3
.sink(
receiveCompletion: { [weak self] error in
completionCallback(error) // 4
subscription?.cancel() // 5
},
receiveValue: { [weak self] services in
successCallback() // 6
subscription?.cancel()
})
makeNetworkCall1. { _ in
publisher.send(.networkCall1)
}
...
- This timeout is key - when the app is backgrounded and comes back this calls the completion handler with the custom error
.timeout
otherwise we wouldn’t know it failed. - I found out the hard way that the order of
.timeout()
and.receive(on:)
is important, if.timeout()
is second the publisher will use the scheduler specified there instead of.receive(on:)
. .collect()
will wait until results of all our network calls are returned before declaring success- Handle the completion - whether a timeout or a normal completion
- There’s a slim chance that the timeout and success completion handlers can happen at the same time - explicitly ending the subscription helped with this
- If we hit
receiveValue
then we gathered all the network calls and we can handle the network responses however we want - Now that the subscription is setup, make the network calls
I feel like this wasn’t the best solution - anyone have any better ideas?
Open Questions:
- Why do the network calls not finish in the background? Even if I turn on background network at the app level it gets killed.
- Is there any other way to get the feedback that the request was killed from UrlSession? The callbacks are never executed and nothing I can find on the UrlSession delegates was helpful.
- Is there a way to shortcut a
group.notify()
after a certain amount of time? I tried this by starting a delayed async thread and sendinggroup.leave()
a bunch but it got angry about that too and threw aSIGTRAP
at me.