Sunday July 19 2020 • posted by james

July 2020 log exposure incident

At 14:54 UTC on Saturday, July 18th, 2020 we discovered that a small number of IRCCloud users were seeing logs from other users. This affected certain users on our “hathersage” and “stonehaven” connection servers between 12:53 UTC on Friday July 17th, and around 16:00 UTC on Saturday July 18th.

In order to resolve this issue, we have had to delete some logs for users during this period. All users affected by this issue will receive an email with more details of exactly which logs were affected.

This incident was triggered by a misconfiguration on our part, and we’re very sorry for this failure to maintain your data integrity. We take seriously the trust placed in us to safeguard your data privacy and security, and we are committed to restoring that trust.

How this happened

This issue was related to our ongoing work to move our connection servers to a more decentralised architecture, which, as we’ve mentioned before, will allow us more flexibility on where we can host these servers.

Part of this work involves connection and buffer (channel/private message) metadata being stored in a separate, local database for each connection server, for performance and reliability reasons. This means that connection and buffer IDs are generated independently on each server, rather than by a single central database. These IDs are namespaced with the ID of the server they were generated on, and each server stores the offset of the IDs it’s currently issuing.

We had originally started to migrate users to hathersage on March 22nd, in the final step of a rolling upgrade of all of our connection servers. However, this migration was aborted when the server suffered a hardware failure on April 18th. As part of the rollback, we moved these users back to another server (stonehaven).

When we finally received replacement hardware for hathersage, we restarted the migration at 12:53 UTC on July 17th. This was the first time we had placed users back on a freshly-configured server which reused a server ID, and our server deployment procedure did not prompt us to configure the offset. The considerable amount of time which had elapsed since the previous migration, due to the hardware failure and other factors, also meant that this process was not fresh in our minds.

Because of this, hathersage started to issue connection and buffer IDs which duplicated those which already existed on stonehaven. These duplicated IDs meant that logs from multiple users’ buffers were being combined in our log data store. When one user fetched a backlog, the logs from both buffers were returned.
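To illustrate the failure mode described above, here is a minimal sketch in Python. It is not our actual implementation: the ConnectionServer class, the (server_id, offset) tuple layout and the numeric values are all assumptions made for the example. It shows how a rebuilt server that reuses a server ID but starts from an unset offset will reissue IDs that are already in use, and how restoring the offset avoids the overlap.

```python
from dataclasses import dataclass


@dataclass
class ConnectionServer:
    """Hypothetical model of a connection server issuing namespaced IDs."""

    server_id: int   # namespace shared by every ID this server issues
    offset: int = 0  # last local ID issued; must be carried over on a rebuild

    def next_buffer_id(self) -> tuple[int, int]:
        """Issue the next ID as a (server_id, local_id) pair."""
        self.offset += 1
        return (self.server_id, self.offset)


# The original server issues some IDs before it is taken out of service.
original = ConnectionServer(server_id=7)
issued_before = {original.next_buffer_id() for _ in range(3)}

# A rebuilt server reuses the same server ID, but its offset was never set.
rebuilt = ConnectionServer(server_id=7)
issued_after = {rebuilt.next_buffer_id() for _ in range(3)}
collisions = issued_before & issued_after  # every new ID is a duplicate

# With the offset restored, newly issued IDs no longer overlap.
fixed = ConnectionServer(server_id=7, offset=original.offset)
issued_fixed = {fixed.next_buffer_id() for _ in range(3)}
```

In this sketch the namespace alone cannot prevent duplicates, because both generations of the server share it. Uniqueness depends entirely on the offset being carried forward.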

The fix

The immediate fix involved correctly setting the offset, and this was completed at 15:45 UTC on Saturday July 18th. At 16:07 UTC, we disabled backlog fetching for all users on hathersage and stonehaven to prevent incorrect logs from being displayed to users.

We then deleted all new buffers and connections created on hathersage during the 27-hour window since the start of the issue. This was completed at 16:28 UTC and backlog fetching was restored on hathersage.

Since the reused IDs had been in use on stonehaven for much longer, deleting all affected logs would have caused considerable data loss for those users. Instead, we began a more involved process of purging those logs of any data leaked from other users. This was completed at 22:03 UTC and backlog fetching was subsequently restored on stonehaven.

Summary of the impact to users

hathersage: Users who created new buffers and connections between 12:53 UTC on Friday July 17th, and around 16:00 UTC on Saturday July 18th may have seen logs from stonehaven accounts in those buffers. They may also have had those newly created logs exposed back to the stonehaven accounts. Those connections and buffers have now been deleted from accounts on hathersage.

stonehaven: Some users who were initially moved during the aborted migration earlier this year may have had some of their logs exposed to users now on hathersage. They may also have seen newly created logs from the affected users on hathersage in their own accounts.

All log entries that were inadvertently shared between users have now been deleted from log storage.

We will be emailing both sets of users with details on their affected logs.

How we’ll avoid this in future

We will require a manual confirmation step to set the offsets on a connection server before it can be brought into use.

We will implement additional checks when backlogs are fetched to ensure that stored logs cannot be leaked to the wrong users in the case of a buffer ID collision.
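As a rough illustration of the two safeguards above, the following Python sketch shows one way they could look. It is hypothetical throughout: the LogEntry type, its owner_id field, and the bring_into_service and fetch_backlog functions are names invented for this example and do not describe our actual code or log store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    buffer_id: int
    owner_id: int  # hypothetical: the user recorded when the line was stored
    text: str


def bring_into_service(server_name: str, confirmed_offset: Optional[int]) -> int:
    """Refuse to enable a server until an operator has confirmed its offset."""
    if confirmed_offset is None:
        raise RuntimeError(f"{server_name}: ID offset has not been confirmed")
    return confirmed_offset


def fetch_backlog(store: list[LogEntry], buffer_id: int, user_id: int) -> list[LogEntry]:
    """Return entries for a buffer only if they belong to the requesting user."""
    return [e for e in store if e.buffer_id == buffer_id and e.owner_id == user_id]


# Two users whose buffers ended up with the same ID.
store = [
    LogEntry(buffer_id=42, owner_id=1, text="alice: hello"),
    LogEntry(buffer_id=42, owner_id=2, text="bob: secret"),
]
visible = fetch_backlog(store, buffer_id=42, user_id=1)
```

The point of the second check is that a buffer ID collision should degrade to missing or separated logs rather than to one user reading another user’s messages.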

Thursday July 09 2020 • posted by russ

Report on the July 2020 extended outage

The IRCCloud service experienced an extended period of downtime between around 22:40 UTC on July 7th, 2020 and 19:10 UTC on July 8th, 2020 due to a fault with our internet service provider. This was the second such outage this year, and we are as frustrated as you are with this unacceptable situation.

This outage affected seven of the eight servers we use to handle our outgoing IRC connections. As we explained in our report on the previous outage, these outbound connection servers have an unusual networking configuration, which means they are hosted by a specialist ISP that we cannot quickly migrate away from.

We have built sufficient redundancy to ensure that we can survive the loss of one or two of these eight servers, but we don’t yet have the capability to survive the loss of all of them.

Although we haven’t yet received a full explanation from our ISP, it appears that this problem was caused by some kind of networking failure which required a technician to visit the datacenter to resolve. For reasons which are unclear, it took more than 12 hours for this to be arranged.

As we mentioned in our previous report, we have been working on making changes to our system so we can move to a new ISP. These changes are almost complete and we will now accelerate our migration away from this ISP, which we hope to complete within the next few months.

If you’re an IRCCloud subscriber, we’re happy to issue you a month’s refund in compensation for this downtime - drop us an email at team@irccloud.com with the email address associated with your account.

Friday April 24 2020 • posted by james

Pinned Channels

Pinned channels are a new way to keep your channels and conversations organised.

You can pin any channel or private message to the top of your list from the menu (gear icon) or by pressing and holding in the mobile apps.

Pinned private messages won’t automatically archive so you can keep easy access to them.

Thursday January 16 2020 • posted by russ

Report on the January 2020 extended outage

Last night we experienced approximately 12 hours of downtime between around 18:00 and 06:40 UTC, caused by a prolonged period of internet routing issues which our ISP has attributed to a failed line card in one of their routers. This was our longest period of downtime in many years and we’re very sorry for the disruption it caused.

Running a large service which interfaces with the venerable IRC protocol poses a different set of challenges to most modern web services. Firstly, we have to manage a large number of outbound IRC connections while ensuring as few disconnections as possible. Secondly, IRC networks expect our users to connect from a consistent set of IP addresses. Lastly, IRCCloud is subject to a high volume of distributed denial of service (DDoS) attacks.

These constraints mean that our outbound connection servers, which actually make your outbound IRC connections, have been hosted for years by a specialist DDoS-resistant hosting service provided by a major ISP. This is a costly part of our infrastructure, and it wouldn’t be economical for us to completely duplicate these servers elsewhere to guard against rare situations like the one last night. Switching to another ISP - even if we could find one to provide the required servers at short notice - would involve a long process of getting new IP addresses whitelisted by IRC networks.

Our current architecture also restricts us to running our outbound connection servers in relatively close proximity to the rest of our infrastructure (which is hosted on Amazon Web Services). Over the last few months we’ve been working on a significant update of our backend software to remove this restriction - in fact, we started rolling this update out yesterday.

These improvements will make it easier for us to investigate other approaches for our outbound connection servers in future, and we’ll certainly be discussing network redundancy with our ISP and future providers.

If you’re an IRCCloud subscriber, we’re happy to issue you a month’s refund in compensation for this downtime - drop us an email at team@irccloud.com.

Tuesday January 22 2019 • posted by james

Bouncer: connect with other clients

Today we’re launching one of our most requested features. Paid subscribers can now use third-party IRC clients to connect to the IRCCloud service, just as you would with a traditional bouncer.

Connect with another client menu item

Open the menu for one of your IRC or Slack connections and choose the “Connect with another client…” option for details on how to connect.

For IRC connections, you’ll be prompted to generate a unique server password.

Backlog replay

Note: backlog replay isn’t currently available for Slack connections.

Bouncer passwords are shown to you in the following format:

bnc:xxxxxxxx…

If you’d like the bouncer to replay missed messages whenever you reconnect with your client, you’ll need to change this format to include a clientid of your choosing.

This is used to identify and track the messages your client has seen to make sure we only replay undelivered messages.

The clientid can be anything, but can’t include spaces. Just make sure to use a different id for each client you use.

Once you’ve chosen a clientid, rewrite your password in the following format:

bnc@clientid:xxxxxxxx…

For example, if your generated password was bnc:abcxyz and you chose laptop as a clientid, you’d connect with the following server password:

bnc@laptop:abcxyz
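If you’d rather script the rewrite, here is a small Python sketch of the format change described above. The with_client_id function is a hypothetical helper written for this example, not part of any IRCCloud tool. It only enforces the rule stated above (no spaces in the clientid); whether other characters are accepted is not covered here.

```python
def with_client_id(password: str, client_id: str) -> str:
    """Rewrite 'bnc:<secret>' as 'bnc@<client_id>:<secret>'."""
    prefix, sep, secret = password.partition(":")
    if prefix != "bnc" or not sep or not secret:
        raise ValueError("expected a password of the form bnc:<secret>")
    if not client_id or any(ch.isspace() for ch in client_id):
        raise ValueError("clientid must be non-empty and contain no spaces")
    return f"bnc@{client_id}:{secret}"


# Using the example values from above.
example = with_client_id("bnc:abcxyz", "laptop")
print(example)  # bnc@laptop:abcxyz
```

Run it once per client with a different clientid each time, so that each client’s replay position is tracked separately.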

Security

A bouncer password grants full access to the associated network connection, so make sure to keep it safe.

You can revoke or regenerate a bouncer password at any time, in case you no longer need it or it becomes compromised. This will also disconnect any client currently using that password.

Backlog timestamps

The latest versions of most 3rd party clients support the server-time IRCv3 feature, which the bouncer will use to provide the correct timestamp for backlog replay.

However, some clients may need a little coaxing: