GnuTLS Error When Fetching > 5 Opencast (4.x) Events #609

Open
spoonsauce opened this issue Jan 7, 2019 · 23 comments

spoonsauce commented Jan 7, 2019

Galicaster 2.1.1 throws the following exception when trying to fetch more than 5 events in a single location:

username    2019-01-07 14:11:03,995 WARNING opencast/service        Problems to connect to opencast server : (56, 'GnuTLS recv error (-110): The TLS connection was non-properly terminated.')

This appears to be related to the process_ical() function, but my Python chops aren't up to snuff to really troubleshoot this.

I can reproduce this across all of my 2.1.1 machines. 2.1.0 works, but that version is no longer available on the Xenial repository as far as I can tell? As soon as I remove events >= 6, Galicaster once again successfully fetches events.

I will try to find another way to roll-back in the meantime.

Correction: Even after rolling back I have the same problem, so this may be related to another Ubuntu package which was upgraded. Any ideas?


Alfro commented Jan 8, 2019

Hi @spoonsauce, which version of Opencast are you testing this against?
Are you testing this on a clean installation of Ubuntu?

@spoonsauce
Author

Hi @Alfro,

Apologies, I forgot a few key details. We're scrambling to try and get this resolved as quickly as possible!

A bit about our Opencast setup:

We're running Opencast 4.5, but were previously running 4.4 with the same problem. Opencast has an NGINX reverse proxy in front of it for SSL.

https://opencast.example.com (NGINX) -> 127.0.0.1:8080 (Opencast)

Our Galicaster machines are a clean install of Ubuntu 16.04 with Blackmagic software and Galicaster. There are a few different iterations with slightly varying configs, but nothing major changes from one to the other.


ppettit commented Jan 8, 2019

@spoonsauce All Galicaster is doing is calling curl and passing along the curl error, which seems to suggest a config error possibly with nginx?

does using curl directly work? make sure to try it from the galicaster machine.

this is the equivalent curl command line for the settings used by galicaster:
curl -v -k --digest -u opencast_system_account:CHANGE_ME -H "X-Requested-Auth: Digest" -H "X-Opencast-Matterhorn-Authorization: true" -H "If-None-Match: 12345" https://opencast.example.com/recordings/calendars?agentid=test

obviously you'll need to substitute the user/pass, opencast address and agentid.
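
for reference, a rough pycurl equivalent of the same request (placeholder server, credentials and agent id -- this is just a sketch, not the exact galicaster code):

from io import BytesIO
import pycurl

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://opencast.example.com/recordings/calendars?agentid=test')
c.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_DIGEST)
c.setopt(pycurl.USERPWD, 'opencast_system_account:CHANGE_ME')
c.setopt(pycurl.HTTPHEADER, ['X-Requested-Auth: Digest',
                             'X-Opencast-Matterhorn-Authorization: true',
                             'If-None-Match: 12345'])
c.setopt(pycurl.SSL_VERIFYPEER, 0)  # equivalent of curl -k
c.setopt(pycurl.SSL_VERIFYHOST, 0)
c.setopt(pycurl.WRITEFUNCTION, buf.write)
c.perform()
print(buf.getvalue())
c.close()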

(@Alfro as an aside, disabling SSL cert checks seems like a bad idea?!)


spoonsauce commented Jan 8, 2019

@ppettit Thanks for the suggestion. It looks like a direct curl command works. Here are the results (with a ton of information obfuscated):

curl -v -k --digest -u opencast_system_account:XXXXXXXX -H "X-Requested-Auth: Digest" -H "X-Opencast-Matterhorn-Authorization: true" -H "If-None-Match: 12345" https://opencast.example.com/recordings/calendars?agentid=GC-XXXXXXXX
*   Trying XXX.XXX.XXX.XXX...
* TCP_NODELAY set
* Connected to capture.example.com (XXX.XXX.XXX.XXX) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=XX; ST=XXXXXXXX; L=XXXXXXXX; O=XXXXXXXX; CN=*.example.com
*  start date: Jan  4 00:00:00 2018 GMT
*  expire date: Mar  4 12:00:00 2020 GMT
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
*  SSL certificate verify ok.
* Server auth using Digest with user 'opencast_system_account'
> GET /recordings/calendars?agentid=GC-XXXXXXXX HTTP/1.1
> Host: opencast.example.com
> User-Agent: curl/7.61.1
> Accept: */*
> X-Requested-Auth: Digest
> X-Opencast-Matterhorn-Authorization: true
> If-None-Match: 12345
> 
< HTTP/1.1 401 An Authentication object was not found in the SecurityContext
< Server: nginx/1.12.2
< Date: Tue, 08 Jan 2019 14:44:20 GMT
< Content-Type: text/html; charset=ISO-8859-1
< Connection: keep-alive
< Set-Cookie: JSESSIONID=wkhqglxoegt41358edq238aeb;Path=/
< WWW-Authenticate: Digest realm="Opencast", qop="auth", nonce="XXXXXXXX"
< Cache-Control: must-revalidate,no-cache,no-store
< Transfer-Encoding: chunked
< 
* Ignoring the response-body
* Connection #0 to host opencast.example.com left intact
* Issue another request to this URL: 'https://opencast.example.com/recordings/calendars?agentid=GC-XXXXXXXX'
* Found bundle for host opencast.example.com: 0x55d6aec97880 [can pipeline]
* Re-using existing connection! (#0) with host opencast.example.com
* Connected to opencast.example.com (XXX.XXX.XXX.XXX) port 443 (#0)
* Server auth using Digest with user 'opencast_system_account'
> GET /recordings/calendars?agentid=GC-XXXXXXXX HTTP/1.1
> Host: opencast.example.com
> Authorization: Digest username="opencast_system_account", realm="Opencast", nonce="XXXXXXXX", uri="/recordings/calendars?agentid=GC-XXXXXXXX", cnonce="XXXXXXXX", nc=00000001, qop=auth, response="909f9344143c791ccb9bf24e5fa71328"
> User-Agent: curl/7.61.1
> Accept: */*
> X-Requested-Auth: Digest
> X-Opencast-Matterhorn-Authorization: true
> If-None-Match: 12345
> 
< HTTP/1.1 200 OK
< Server: nginx/1.12.2
< Date: Tue, 08 Jan 2019 14:44:20 GMT
< Content-Type: text/calendar;charset=UTF-8
< Connection: keep-alive
< Set-Cookie: JSESSIONID=mtvm62t73lypz80kmm0anm8m;Path=/
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< ETag: mod1545416541000
* no chunk, no close, no size. Assume close to signal end
< 
BEGIN:VCALENDAR
PRODID:Opencast Matterhorn Calendar File 0.5
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTSTAMP:20190108T144420Z
DTSTART:20190412T153000Z
DTEND:20190412T165900Z
SUMMARY:XXXXXXXX
UID:aab70733-5383-451e-bd59-46a64f56aaf3
LAST-MODIFIED:20190108T123319Z
DESCRIPTION:XXXXXXXX
LOCATION:GC-XXXXXXXX
RELATED-TO:8a08a4a2-03f5-4ad8-a5ac-eed429a6ec67
ATTACH;FMTTYPE=application/xml;VALUE=BINARY;ENCODING=BASE64;X-APPLE-FILEN
 AME=episode.xml:XXXXXXXX

ATTACH;FMTTYPE=application/xml;VALUE=BINARY;ENCODING=BASE64;X-APPLE-FILEN
 AME=series.xml:XXXXXXXX

ATTACH;FMTTYPE=application/text;VALUE=BINARY;ENCODING=BASE64;X-APPLE-FILE
 NAME=org.opencastproject.capture.agent.properties:XXXXXXXX

END:VEVENT

Events repeat several times...

END:VCALENDAR
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):

I've decoded a few of the BASE64 values and they are valid. If any particular bit of obfuscated information is required, please let me know.


ppettit commented Jan 8, 2019

@spoonsauce very strange.

assuming that was done when galicaster would have failed (more than 5 recordings scheduled), can you repeat it when it would have succeeded (5 or fewer recordings scheduled) and see if anything is different?

you do get the "closing connection" and "TLS alert" response, which I don't with any server i've tried (all my traces end with * Connection #0 to host XXXX left intact), but i don't know enough about TLS to know what that means.

i'm not sure why the result would be different using the curl command directly rather than through pycurl. can't think of anything else obvious to try at the moment 😞

@spoonsauce
Author

@ppettit Interesting results, but I'm not sure what to do with them.

I dumped the output of a curl call with 6 events and with 5 events. Other than the obvious changes, the TLS alert disappears at the end of the call with 5 events.

5 events:

* Connection #0 to host opencast.example.com left intact

6 events:

* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):

I'm not a TLS expert either, but I'll look into this. Nevertheless, would this prevent Galicaster from fetching the events list? The data itself is usable in both cases.

@spoonsauce
Author

I realized I was testing on a machine with curl compiled with OpenSSL.

With GnuTLS I get the exact message Galicaster reports:

* GnuTLS recv error (-110): The TLS connection was non-properly terminated.
* Closing connection 0
curl: (56) GnuTLS recv error (-110): The TLS connection was non-properly terminated.

However, all the data is there and usable. There may well be something wrong with our NGINX setup, but I think Galicaster is assuming there is a problem whenever the above message is returned, when in fact it may be better to treat it as a warning rather than a show-stopper error?
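
Purely as a sketch of what I mean (hypothetical code, not Galicaster's actual implementation): if a response body was already received, log a warning and keep the data instead of bailing out (ideally after checking it still parses as a calendar).

import pycurl
from io import BytesIO

def perform_tolerant(curl, buf):
    # buf is the BytesIO the curl handle writes the response body into
    try:
        curl.perform()
    except pycurl.error as exc:
        code = exc.args[0]
        # 56 == CURLE_RECV_ERROR, e.g. "GnuTLS recv error (-110)"
        if code == 56 and buf.tell() > 0:
            print('warning: TLS connection closed uncleanly, using received body anyway')
        else:
            raise
    return buf.getvalue()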


ppettit commented Jan 8, 2019

(sorry i had not seen your message above when i wrote this!)

i guess one thing we could (probably should at some point anyway) do would be to let you configure the calendar cutoff; then you could set it to a fairly small value so you would only ever get fewer than 5 scheduled events returned (unfortunately you have to specify a timestamp as the cutoff, so you can't just ask for the next 4 recordings). it is not really solving this problem though.
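
something along these lines, very roughly (hypothetical helper; assumes the /recordings/calendars endpoint accepts a millisecond 'cutoff' query parameter):

import time

def calendar_params(agent_id, hours_ahead=24):
    # only ask opencast for events starting within the next hours_ahead hours
    cutoff_ms = int((time.time() + hours_ahead * 3600) * 1000)
    return {'agentid': agent_id, 'cutoff': str(cutoff_ms)}

# e.g. in client.ical(): query_params=calendar_params(self.hostname, hours_ahead=24)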


paulgration commented Jan 8, 2019 via email


ppettit commented Jan 8, 2019

@pmgration yeah that could be the case especially as it only happens with longer responses!

@spoonsauce ah, that makes more sense! could still be an nginx config problem i guess, or a bug in curl/gnutls (or both).

I suppose it really depends where the error is thrown, and whether it gets far enough to be able to grab the data that has already been sent - hard to test without being able to reproduce the problem though.

could you try adding , exc_info=1 into line 239 of galicaster/opencast/service.py so it looks like this:

            self.logger.warning('Problems to connect to opencast server: {0}'.format(exc), exc_info=1)

that should add a stack trace to the galicaster log for that exception which might give some more clues.

the other thing would be to try switching to requests instead of pycurl and see if that doesn't have the same issue. that is a fair amount of work though.
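
the calendar fetch with requests would look roughly like this (placeholders again; just a sketch, not a drop-in replacement for client.py):

import requests
from requests.auth import HTTPDigestAuth

resp = requests.get(
    'https://opencast.example.com/recordings/calendars',
    params={'agentid': 'test'},
    auth=HTTPDigestAuth('opencast_system_account', 'CHANGE_ME'),
    headers={'X-Requested-Auth': 'Digest',
             'X-Opencast-Matterhorn-Authorization': 'true',
             'If-None-Match': '12345'},
    verify=False,  # matches the current behaviour of ignoring the SSL cert
    timeout=30)
resp.raise_for_status()
ical_data = resp.text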


spoonsauce commented Jan 8, 2019

@pmgration Seems very probable. I'm not sure what (if anything) to change in NGINX's config, but it gives me something to go on.

@ppettit Here are the relevant log lines after the change:

username    2019-01-08 14:14:43,136 DEBUG   scheduler/heartbeat     galicaster-notify-short in 10
username    2019-01-08 14:14:43,137 INFO    opencast/service        Init opencast client
username    2019-01-08 14:14:43,222 INFO    opencast/service        Be careful using profiles and opencast scheduler
username    2019-01-08 14:14:43,223 INFO    opencast/service        Updating series from server
username    2019-01-08 14:14:43,356 INFO    opencast/service        Process ical
username    2019-01-08 14:14:43,570 WARNING opencast/service        Problems to connect to opencast server: (56, 'GnuTLS recv error (-110): The TLS connection was non-properly terminated.')
Traceback (most recent call last):
  File "/usr/share/galicaster/galicaster/opencast/service.py", line 237, in process_ical
    ical_data1 = self.client.ical()
  File "/usr/share/galicaster/galicaster/opencast/client.py", line 198, in ical
    icalendar = self.__call('GET', ICAL_ENDPOINT, query_params={'agentid': self.hostname}, headers={'If-None-Match': self.ical_etag})
  File "/usr/share/galicaster/galicaster/opencast/client.py", line 165, in __call
    raise RuntimeError, exc
RuntimeError: (56, 'GnuTLS recv error (-110): The TLS connection was non-properly terminated.')

Appreciate all the help on this so far!


ppettit commented Jan 9, 2019

sorry for the delay i got a bit busy!

yeah that stack trace is pretty unhelpful unfortunately 😞

try replacing galicaster/opencast/client.py with this version https://gist.github.com/ppettit/603a91e17ca85576011b26f1aa8ae1dc

make sure you install python-requests

not sure if this will help or not but maybe worth a try. it will spew out loads of debug info on the console, and hopefully helpful stack traces in the logs.

@spoonsauce
Author

No problem at all. I created a bit of a hacky work-around that limits the cutoff that Galicaster fetches from Opencast to a single day, so I am back up and running. Not ideal, but it works.

The client.py you provided does appear to work!

username    2019-01-09 11:13:59,478 INFO    opencast/service        Init opencast client
username    2019-01-09 11:13:59,584 INFO    opencast/service        Be careful using profiles and opencast scheduler
username    2019-01-09 11:13:59,584 INFO    opencast/service        Updating series from server
username    2019-01-09 11:13:59,616 INFO    opencast/service        Process ical
username    2019-01-09 11:13:59,858 INFO    opencast/client iCal modified
username    2019-01-09 11:13:59,886 INFO    utils/ical      Creating MP with UID bb8361fc-bd7a-4885-ba37-18387ae31d6c from ical
username    2019-01-09 11:13:59,890 INFO    utils/ical      Creating MP with UID 3235c3e7-6c8b-4ea3-afc4-f18fece8ba78 from ical
username    2019-01-09 11:13:59,893 INFO    utils/ical      Creating MP with UID fe3c58b9-431c-4ef2-b5b4-2d35bcf8a53c from ical
username    2019-01-09 11:13:59,896 INFO    utils/ical      Creating MP with UID 14983f4e-1458-4259-a990-ecaf7cc2449b from ical
username    2019-01-09 11:13:59,900 INFO    utils/ical      Creating MP with UID 4fe92aca-8680-411f-a5d2-9216d2be4833 from ical
username    2019-01-09 11:13:59,903 INFO    utils/ical      Creating MP with UID 8d12b255-394a-4841-9906-bd3b463372b7 from ical
username    2019-01-09 11:13:59,906 INFO    opencast/service        Setting CA configuration to server
username    2019-01-09 11:14:09,136 DEBUG   scheduler/heartbeat     galicaster-notify-short in 10
username    2019-01-09 11:14:09,137 INFO    opencast/service        Set status idle to server
username    2019-01-09 11:14:09,137 INFO    opencast/client Sending state idle

Any issues using this across our deployment as a temporary fix?

@smarquard

Have you tried using an http rather than https URL for Opencast? That would help establish if the issue is in the TLS/SSL libraries.


ppettit commented Jan 10, 2019

@smarquard it does look like it is a problem with the combination of the nginx config, and curl compiled with gnutls but not openssl.

@spoonsauce yay! Glad that worked!

As I said I did this quickly and didn't do much testing, so I would definitely test to make sure everything you need is working properly (particularly ingesting which required the most change).

You might want to delete the block starting with import logging to stop the torrent of debug stuff on the console.
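
it is just the usual verbose-logging recipe for requests, something along these lines (the exact code in the gist may differ slightly):

import logging
try:
    import http.client as http_client  # python 3
except ImportError:
    import httplib as http_client      # python 2
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger('requests.packages.urllib3')
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True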

I will tidy it up and make a pull request but that might take me a while.

@spoonsauce
Author

@ppettit I've been running several test recordings on a dev box with your modified client.py. So far only one error; I'm not sure whether it's a fluke or related to the changes:

username    2019-01-10 09:20:30,226 ERROR   core/worker     Failed Ingest for MP 052d242a-c8e6-400d-9b59-ece7e220c0c6. Exception: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Recording files look fine and are viewable. Manually ingesting produces the same error.

Also, for the record, we're using an identical NGINX configuration to the one found here: https://docs.opencast.org/r/5.x/admin/configuration/https/nginx/

I've attempted to disable chunked transfers, but no matter what I try the headers still return < Transfer-Encoding: chunked.

I will spin up a dev instance and test without HTTPS when I have a bit of time. I'd imagine it will work fine.


ppettit commented Jan 10, 2019

@spoonsauce cool, sounds like it is mostly OK then.

So that recording that failed to ingest also fails when you try to re-ingest manually?

If so, is there anything obviously different about it? Is it possible to share the repository folder for it (at least the metadata files if the video is too big)?

@spoonsauce
Author

@ppettit That's correct. There's nothing unusual about the recording that I can think of. I've also had a second one fail to ingest.

I'm happy to send over the files, though the videos may be too large. My preference would be to email them to you, but I can also just add them here and remove them once you've had a chance to download the files.


ppettit commented Jan 10, 2019

@spoonsauce sure email is fine (check my profile) or share on something like google drive if you like

another interesting thing to try would be to point your galicaster at a public opencast like https://stable.opencast.org and see if it ingests manually there

@spoonsauce
Author

@ppettit Google Drive link sent to your inbox.

Can you share publishing credentials for stable.opencast.org? I tried admin:opencast and got a 401.

I'd also like to try recompiling python-pycurl with OpenSSL instead of GnuTLS to see what that does with an un-patched client.py.


ppettit commented Jan 10, 2019

@spoonsauce cool I'll take a look tomorrow.

The default digest credentials for opencast are:

opencast_system_account:CHANGE_ME


ppettit commented Jan 11, 2019

@spoonsauce those recordings ingest ok to my test and production machines, with both gc 2.1.1 and my patched client.py so i can't see anything wrong with them.

definitely remove the debug stuff as I mentioned above - that slows things down considerably with big binary files.

we use haproxy rather than nginx though. it does still look like there is something weird with your proxy, but I guess pycurl/OpenSSL and requests are somehow more tolerant than pycurl/GnuTLS.


spoonsauce commented Jan 11, 2019

@ppettit The strange part is our proxy worked fine for 4 straight months. This GnuTLS nonsense only began when I came back from holiday.

The only change was an upgrade to Galicaster 2.1.1, which is why I initially thought that was the issue. I know there were upgrades done to switches and hypervisors over the holidays, so that may or may not have something to do with it.

On the bright side, the recompiled python-pycurl + OpenSSL library fetches schedules correctly. I'm testing ingestion as I write this. Fingers crossed!

Edit: Confirmed that ingesting works with the recompiled python-pycurl libraries.

Alfro added the bug label Jun 10, 2019