[Resolved] AWS Restored: Impaired functionality between Canvas and Amazon continues

All services appear to be restored.

Monitoring Feb 28, 15:59 MST

Amazon has verified that uploads to their service should be working again; users should be seeing improved performance with their uploads to Canvas. Our DevOps team is continuing to monitor the situation, but we are not currently aware of any lingering issues that affect Canvas functionality.

Update Feb 28, 14:37 MST

In our previous update, we mentioned there would still be areas of impaired functionality between Canvas and Amazon. The biggest area of impact right now is that uploads are not yet working. This includes student uploads to assignments, instructor grade uploads, and similar functions, but also the ability for Canvas’ background processes to upload files such as admin reports (which is required as part of the process to generate a report at the account level). You may continue to see issues with this, and other areas in Canvas, as Amazon works to fully restore all services.

Update Feb 28, 14:15 MST

Canvas performance and service recovery continue to progress quickly. Although many users should now be able to access Canvas, there may still be areas of impaired functionality as we work through remaining issues.

Update Feb 28, 13:54 MST

We are beginning to see positive indications of recovery and have successfully tested workflows that were previously failing. We are still awaiting full resolution, and we will provide updates as the situation continues to improve.

Update Feb 28, 13:45 MST

AWS is still working through their recovery process. Unfortunately, the number of Amazon services that have been impacted has grown in the time it took to find the root cause, and it will be a significant effort on their side to recover all of the services. They are understandably starting with the most critical ones. Since Canvas depends on so many of their services, a full recovery may still take some time.

On our side, our DevOps team has moved on to other ideas about how to get Canvas from a “service disruption” state to a “degraded performance” state. We are also discussing plans for addressing similar circumstances in the future. Our options are limited due to the perniciousness of this incident, but we are considering all of them at this time.

Update Feb 28, 13:05 MST

Amazon is continuing to work through their recovery process. On our side, our DevOps team has implemented a temporary change to ensure tools and apps not hosted on AWS (Amazon Web Services) are still accessible to those who are able to access Canvas, which is an improvement over the complete service disruption we have had since 10:37 AM MST. However, the majority of Canvas users are still unable to access their Canvas site due to the AWS outage.

We will continue our efforts to ensure a good experience with Canvas for users once they are able to access the site again, and will provide an update on the overall issue within the next 30 minutes.

Update Feb 28, 12:29 MST

As Amazon works to restore availability in their systems, our DevOps team continues their efforts to expedite the process to restore access to Canvas. We will provide a new update on their progress in 30 minutes or less.

Update Feb 28, 12:04 MST

Amazon Web Services has informed us that they have identified the underlying root cause of the issue and they are beginning the remediation process. Our internal DevOps team continues to explore options to facilitate faster recovery.

Update Feb 28, 11:52 MST

Amazon is still working to restore server access for sites that have been affected by their outage today, including many Canvas sites. They will keep us updated on their progress.

Identified Feb 28, 11:27 MST

Amazon has narrowed the scope of their investigation and has identified a specific region impacted by the networking issue. They are actively working on a solution. Our own DevOps team is investigating options that may allow us to work around the problem. We will provide another update in 15 minutes.

Identified Feb 28, 11:27 MST

Amazon has identified the issue as being limited to a set of servers in the US. They are actively working on finding a fix to address the errors you are seeing.

Update Feb 28, 11:08 MST

Amazon has updated their status page to indicate they are investigating increased error rates for their servers. They are working with us to provide updates on the issue; we will update this page with any new information. In the meantime, you can monitor their status page at https://status.aws.amazon.com/. Other Amazon Web Services applications may be affected.

Update Feb 28, 11:03 MST

Amazon Web Services is currently experiencing what appears to be a large-scale networking issue that has impacted Instructure along with many other companies. We are working with Amazon to diagnose the problem and waiting for updates on their mitigation timeline. We will post an update as soon as we have more information.

Investigating Feb 28, 10:50 MST

Canvas is currently experiencing an outage that we are investigating. Our DevOps team has determined that this is an AWS (Amazon Web Services) outage. We will post updates as they become available.

Updates will follow as they become available.

Posted in Educational Technologies

[Completed] VPN Maintenance

The VPN Maintenance scheduled for today has been completed. All VPN services are expected to be operational at this time.

Posted in Banner and Admin Systems, Network and Telecom, Systems, Workday

[Completed] Network upgrade in the Science Building on Feb 25

The network upgrade in the Science Building, SE-43, has been completed.

Thank you

Posted in Network and Telecom

VPN Maintenance – Saturday February 25 from 4:00 a.m. until 7:00 a.m.

OIT will be performing maintenance on the Faculty/Staff VPN from 4:00 a.m. until 7:00 a.m. tomorrow. During this time, VPN services will be intermittent. The backup VPN will be available during this time for users who need VPN access. Additional details on the backup VPN are available on the VPN page: www.fau.edu/security/vpn.php

Posted in Banner and Admin Systems, Network and Telecom, Systems, Workday

Network upgrade in the Science Building on Feb 25

On Saturday, February 25, 2017, between 12:00 am and 6:00 am, OIT will upgrade the network in the FAU Science Building, SE-43. This upgrade will cause intermittent interruptions to the network in the Science Building during the implementation and testing of the new hardware. If you experience difficulties after the times outlined above, please contact the FAU help desk at http://www.fau.edu/helpdesk or 561-297-3999 for immediate assistance.

Thank you

Posted in Network and Telecom

OIT Systems Maintenance, Thurs. Feb 23 from 3:00AM to 7:00AM

The Office of Information Technology will be performing routine systems maintenance, which includes but is not limited to rebooting Windows servers as deemed necessary, installing critical patches, and other health checks, on Thursday, 02/23/2017 from 3:00 AM to 7:00 AM during the OIT maintenance window. In addition, the following systems will be specifically impacted:

***************************************************
College of Medicine SharePoint Site Migration
When: 4:00AM to 5:00AM
Affected services: CoM SharePoint site collection and any subsites – https://sharepoint.fau.edu/com/
Affected systems: On-Prem SharePoint
Description of maintenance: We will be completing the migration of the College of Medicine’s SharePoint site to Office 365.
On-Prem URL: https://sharepoint.fau.edu/com/
365 URL: https://fau.sharepoint.com/sites/com
User impact: Users will need to update any bookmarks, saved links, and any calendars that point to https://sharepoint.fau.edu/com/

***************************************************
Financial Aid 8.28.0.3 patch
When: 3:00AM – 4:00AM
Affected services: Not Applicable
Affected systems: Banner Financial Aid
Description of maintenance: The Financial Aid office has requested that the installation of patch 8.28.0.3 be expedited. This patch provides the basis for a procedure that will continue to load 2016-2017 COD system-generated files into Banner after the COD system goes live for 2017-2018 with a new schema.
User impact: No user impact is anticipated

***************************************************
Banner 8 Student patch 8.10.2.1 – critical install
When: 3:00AM – 4:00AM
Affected services: Not Applicable
Affected systems: Banner 8 Student module
Description of maintenance: This Banner 8 patch provides data updates to accommodate a recent change to the AMCAS (Medical School) application file. Without the change, the CoM applications cannot be loaded into Banner.
User impact: No user impact is anticipated

***************************************************

Posted in Banner and Admin Systems, Systems

[Completed] Network maintenance on February 18 from 12:01 am – 3:00 am

The network maintenance scheduled for February 18 has been completed.

Thank you

Posted in Network and Telecom, Systems

Network maintenance on February 18 from 12:01 am – 3:00 am

OIT will be performing network maintenance on Saturday, February 18 from 12:01 a.m. until 3:00 a.m. Due to the nature of the maintenance, users may experience intermittent Internet outages lasting approximately 10 minutes during this window. The Internet connection will be rerouted to the backup connection to minimize any downtime.

Thank you

Posted in Network and Telecom, Systems

OIT Systems Maintenance, Thurs. Feb 16 from 3:00AM to 7:00AM

The Office of Information Technology will be performing routine systems maintenance, which includes but is not limited to rebooting Windows servers as deemed necessary, installing critical patches, and other health checks, on Thursday, 02/16/2017 from 3:00 AM to 7:00 AM during the OIT maintenance window. In addition, the following systems will be specifically impacted:

***************************************************
Migration of authentication system supporting Office 365
When: 3:00AM – 6:00AM
Affected services: Sign-on to all Office 365 services. Sign-on to College of Business websites
Affected systems: boc22adfsp, boc22adfs, nwadfsp, nwadfs
Description of maintenance: OIT will be migrating the ADFS 2.1 farm to ADFS 3.0
User impact: Sign-on to Office 365 services may be intermittent during the window

***************************************************
Migrate CoE SharePoint Sites to Office 365
When: 4:00AM – 5:00AM
Affected services: CoE site collection and any other subsites – https://sharepoint.fau.edu/coe/
Affected systems: SharePoint
Description of maintenance: We will be completing the migration of the College of Education’s SharePoint site to Office 365.
User impact: Users will need to update any bookmarks, saved links, and any calendars that point to https://sharepoint.fau.edu/coe/

***************************************************
Migration of SharePoint from the old BIG-IP to the new BIG-IP
When: 6:00AM – 7:00AM
Affected services: sharepoint.fau.edu
Affected systems: sharepoint
Description of maintenance: We are continuing the move of servers from the older BIG-IP load balancer hardware to the new hardware. This move will be for the backend servers that support SharePoint.
User impact: SharePoint may be unavailable for 5 minutes while DNS is updated

***************************************************

Migration of tms (video bridge) from the old BIG-IP to the new BIG-IP
When: 6:00AM – 7:00AM
Affected services: Tms and Tmsbridge
Affected systems: tms.fau.edu, tmsbridge.fau.edu
Description of maintenance: We are continuing the move of servers from the older BIG-IP load balancer hardware to the new hardware. This move will be for the backend servers that support the video bridge.
User impact: Users may not be able to connect to tmsbridge

***************************************************

Posted in Email, Instructional Technologies, Systems

[Completed] Network Maintenance Saturday February 11th from 6:00 a.m. until 7:00 a.m.

The network maintenance scheduled for this morning has been completed.

Posted in Banner and Admin Systems, Network and Telecom, Systems