


Fwd: socket: Too many open files


>On 13Oct2018 14:10, Shakti Kumar <shakti.shrivastava13 at gmail.com> wrote:
>>I'm running a script which basically does a traceroute to the list of
>>hosts provided, and then pulls up some info by logging in to gateways in
>>the path.
>>I am running this script for a list of almost 40k hosts in our data
>>centers.
>>Also, I am using commands module to get the traceroute output.
>>
>>out = commands.getstatusoutput('traceroute ' + ip)
>>
>>However I observe that this particular line is failing with socket error
>>after I reach some 5k to 6k hosts.
>>I know commands module is using pipes to execute the given command and
>>this is one reason for exhaustion of file descriptors.
>>Any suggestions for improving this and getting a workaround?

>I'd figure out where your file descriptors are going.

>Is traceroute leaving sockets littering your system? If you're on Linux
>this command:

>  netstat -anp

>will show you all the sockets, their state, and the pids of the
>processes which own them. Does your script cause sockets to accrue after
>the traceroutes?

>If you write a trivial shell script to do the traceroutes:

>while read -r ip
>do
>    traceroute "$ip"
>done <file-with-ips-in-it.txt

>does it also exhibit the problem?

>If it doesn't, then traceroute may not be the problem and something
>else is leaking file descriptors.

>In fact, given that it is file descriptors, maybe sockets are not what
>is leaking?

>From another terminal, see what your Python programme has open when this
>happens with "lsof -n -p pid-of-python-programme". Maybe the leaks are
>pipes, or connections from your "logging in to gateways in the path"
>code. It may be as simple as you not closing files or connections.

>Cheers,
>Cameron Simpson <cs at cskk.id.au>
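
The lsof check suggested above can also be approximated from inside the script itself. The following is only a rough sketch, assuming a Linux-like system that exposes /proc/self/fd or /dev/fd; the helper name count_open_fds is made up for illustration and is not from the original script.

```python
import os

def count_open_fds():
    """Return roughly how many file descriptors this process has open.

    Hypothetical helper: relies on a per-process fd directory, which is
    an assumption about the platform (Linux, most BSDs, macOS).
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory briefly uses one descriptor itself,
            # so compare successive counts rather than reading them as exact.
            return len(os.listdir(fd_dir))
    raise OSError("no per-process fd directory found on this platform")
```

Logging this count every few hundred hosts would show whether it climbs steadily. If it does, something is being opened and never closed.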

Thanks Cameron, I still can't get over the fact that you were able to
pinpoint the issue even without looking at my code XD
Indeed when I started looking for sockets, I realised I was not closing
connections to ACI hosts in our datacentres, and the error was starting to
pop up at around 2900 to 3k TCP connections open to the hosts. XD
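
For a leak of this kind, one common fix is to tie each connection to a with block so it is closed even when an exception is raised. The sketch below assumes Python 3 and a plain TCP socket. The function read_banner and its parameters are hypothetical, and the real gateway login code is not shown in this thread, so it may use a different client library with its own close method.

```python
import socket

def read_banner(host, port, timeout=5.0):
    """Hypothetical helper: connect, read up to 1 KiB, and always close."""
    # The with block closes the socket on normal exit and on exceptions,
    # so descriptors do not pile up across thousands of hosts.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024)
```

If the client object does not support the with statement, wrapping it in contextlib.closing() or calling close() in a finally clause gives the same guarantee.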

Thanks Jonathan for pointing me to more apt functions. It helped a lot :)
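
Jonathan's message is not quoted here, so the exact functions he suggested are unknown. As an assumption, a frequently recommended replacement for the Python 2 only commands module is subprocess. This minimal Python 3 sketch uses a hypothetical wrapper named run_command:

```python
import subprocess

def run_command(argv, timeout=60):
    """Hypothetical wrapper: run argv and return (exit status, combined output)."""
    # Passing a list avoids the shell, so the address is never parsed as
    # shell syntax. The pipes are closed once run() returns.
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout

# Example call, assuming traceroute is installed and ip holds an address:
# status, out = run_command(["traceroute", ip])
```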

Regards,
Shakti.