Handle ZMQError "Address already in use" #30

Draft
pnuu wants to merge 8 commits into pytroll:main from pnuu:bugfix-zmqerror

pnuu commented Oct 29, 2020

Member

This PR fixes an occasional zmq.error.ZMQError: Address already in use that I get in Trollflow2. It happens when a new publisher is created on the same port as one that was closed immediately before, and the OS hasn't had time to free the port.

Minimal code that shows the error with current master branch:

from posttroll.publisher import Publish
while True:
    with Publish("gdal_warper", port=50000) as pub:
        pass

pnuu added the bug label Oct 29, 2020
pnuu requested a review from mraspaud October 29, 2020 07:42
pnuu self-assigned this Oct 29, 2020

coveralls commented Oct 29, 2020

Coverage Status

Coverage increased (+0.2%) to 80.364% when pulling 254c58e on pnuu:bugfix-zmqerror into 67b8f3d on pytroll:master.

pnuu commented Oct 29, 2020

Member Author

I'm ignoring the CodeFactor and Stickler complaints; they should be handled in a separate test refactoring PR.

mraspaud commented Nov 2, 2020

Member

How about syncing with the linger timeout? Because I think this is the reason the bind fails.

pnuu commented Nov 2, 2020

Member Author

What does "linger timeout" mean?

mraspaud commented Nov 2, 2020

Member

We set the linger timeout here: https://github.com/pytroll/posttroll/blob/master/posttroll/publisher.py#L143

Basically, that's how much time we give the socket to shut down all connections cleanly, and it's set here to one second. I suspect that's why you get the error in the first place: the previous connection doesn't have time to shut down cleanly before you create a new one on the same port.

pnuu commented Nov 2, 2020

Member Author

This page says the timeout is in milliseconds: http://api.zeromq.org/2-1:zmq-setsockopt

At the moment I'm still having occasional problems even with a 2-second wait (10 x 0.2 seconds).
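
For readers following along, the "10 x 0.2 seconds" wait can be pictured as a small retry loop around the bind call. The sketch below is illustrative only and is not the code from this PR: `bind_with_retry` and its parameters are hypothetical names, and it assumes the raised exception carries an `errno` attribute equal to `errno.EADDRINUSE` (true for `OSError`; `zmq.error.ZMQError` also exposes `errno`, though the numeric value may differ on some platforms).

```python
import errno
import time


def bind_with_retry(bind, address, attempts=10, delay=0.2, sleep=time.sleep):
    """Call bind(address), retrying while the address is reported as in use.

    Hypothetical helper: `bind` is any callable that raises an exception
    with an `errno` attribute when the port is still taken.
    """
    for attempt in range(1, attempts + 1):
        try:
            return bind(address)
        except Exception as err:
            # Re-raise anything that is not "address in use", and give up
            # after the last attempt instead of sleeping again.
            if getattr(err, "errno", None) != errno.EADDRINUSE or attempt == attempts:
                raise
            sleep(delay)
```

With pyzmq this would be called roughly as `bind_with_retry(sock.bind, "tcp://*:50000")`. A fixed retry budget only papers over the problem if the OS keeps the port busy for longer than `attempts * delay`.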

mraspaud commented Nov 2, 2020

Member

Oh, OK. Then I suppose TIME_WAIT is the culprit. There's not much we can do about that.

mraspaud left a comment

Member

Thanks for adding this. Some inline comments.

Comment thread posttroll/tests/test_pubsub.py Outdated
Comment thread posttroll/tests/test_pubsub.py Outdated
Comment thread posttroll/tests/test_pubsub.py
Comment thread posttroll/publisher.py Outdated

pnuu commented Nov 6, 2020

Member Author

I started to make the requested changes and add tests, but have come to the conclusion that this won't really solve the problem I've been having. So, instead of defining a fixed port, I now let the Trollflow2 publisher take any free port from a given range by defining the POSTTROLL_PUB_MIN_PORT and POSTTROLL_PUB_MAX_PORT environment variables.
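
As a rough illustration of the port-range idea, the sketch below reads the two environment variables and probes for a free TCP port with the standard library. It is not posttroll's implementation: the function names, the fallback range, and the inclusive treatment of the upper bound are all assumptions made for this example.

```python
import os
import socket


def port_range_from_env(environ=os.environ, default=(49152, 65535)):
    """Read an inclusive (min, max) port range from the environment.

    The default range is an assumption for this sketch, not posttroll's.
    """
    min_port = int(environ.get("POSTTROLL_PUB_MIN_PORT", default[0]))
    max_port = int(environ.get("POSTTROLL_PUB_MAX_PORT", default[1]))
    if min_port > max_port:
        raise ValueError("POSTTROLL_PUB_MIN_PORT must not exceed POSTTROLL_PUB_MAX_PORT")
    return min_port, max_port


def first_free_port(min_port, max_port, host="127.0.0.1"):
    """Return the first port in [min_port, max_port] that can be bound."""
    for port in range(min_port, max_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError("no free port between %d and %d" % (min_port, max_port))
```

Probing and then binding later leaves a small window in which another process can grab the port. pyzmq's `Socket.bind_to_random_port(addr, min_port, max_port, max_tries)` avoids that window by binding directly within the range.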

I can still finish this PR if it is considered useful; IIRC I have only one of the suggestions left unfinished. @mraspaud?

pnuu commented Sep 1, 2021

Member Author

Pushed the changes in any case, but will convert to a Draft PR.

pnuu marked this pull request as draft September 1, 2021 08:08
3 participants