[Resolved] AWS Restored: Impaired functionality between Canvas and Amazon continues

All services appear to be restored.

Monitoring Feb 28, 15:59 MST

Amazon has verified that uploads to their service should be working again; users should be seeing improved performance with their uploads to Canvas. Our DevOps team is continuing to monitor the situation, but we are not currently aware of any lingering issues that affect Canvas functionality at this time.

Update Feb 28, 14:37 MST

In our previous update, we mentioned there would still be areas of impaired functionality between Canvas and Amazon. The biggest area of impact right now is that uploads are not yet working. This includes student uploads to assignments, instructor grade uploads, and similar functions, but also the ability for Canvas’ background processes to upload files such as admin reports (which is required as part of the process to generate a report at the account level). You may continue to see issues with this, and other areas in Canvas, as Amazon works to fully restore all services.

Update Feb 28, 14:15 MST

Canvas performance and service recovery continue to progress quickly. Although many users should now be able to access Canvas, there may still be areas of impaired functionality as we work through remaining issues.

Update Feb 28, 13:54 MST

We are beginning to see positive indications of recovery and have successfully tested workflows that were previously failing. We are still awaiting full resolution, and we will provide updates as the situation continues to improve.

Update Feb 28, 13:45 MST

AWS is still working through their recovery process. Unfortunately, the number of Amazon services that have been impacted has grown in the time it took to find the root cause, and it will be a significant effort on their side to recover all of the services. They are understandably starting with the most critical ones. Since Canvas depends on so many of their services, a full recovery may still take some time.

On our side, our DevOps team has moved on to other ideas about how to get from a “service disruption” state to a “degraded performance” state in Canvas. We are also discussing plans for addressing similar circumstances in the future. Our options are limited due to the perniciousness of this incident, but we are considering all of them at this time.

Update Feb 28, 13:05 MST

Amazon is continuing to work through their recovery process. On our side, our DevOps team has implemented a temporary change to ensure tools and apps not hosted on AWS (Amazon Web Services) are still accessible to those who are able to access Canvas, which is an improvement over the complete service disruption we have had since 10:37 AM MST. However, the majority of Canvas users are still unable to access their Canvas site due to the AWS outage.

We will continue our efforts to ensure a good experience with Canvas for users once they are able to access the site again, and will provide an update on the overall issue within the next 30 minutes.

Update Feb 28, 12:29 MST

As Amazon works to restore availability in their systems, our DevOps team continues their efforts to expedite the process to restore access to Canvas. We will provide a new update on their progress in 30 minutes or less.

Update Feb 28, 12:04 MST

Amazon Web Services has informed us that they have identified the underlying root cause of the issue and they are beginning the remediation process. Our internal DevOps team continues to explore options to facilitate faster recovery.

Update Feb 28, 11:52 MST

Amazon is still working to restore server access for sites that have been affected by their outage today, including many Canvas sites. They will keep us updated on their progress.

Identified Feb 28, 11:27 MST

Amazon has narrowed the scope of their investigation and has identified a specific region impacted by the networking issue. They are actively working on a solution. Our own DevOps team is investigating options that may allow us to work around the problem. We will provide another update in 15 minutes.

Identified Feb 28, 11:27 MST

Amazon has identified the issue as being limited to a set of servers in the US. They are actively working on finding a fix to address the errors you are seeing.

Update Feb 28, 11:08 MST

Amazon has updated their status page to indicate they are investigating increased error rates for their servers. They are working with us to provide updates on the issue; we will update this page with any new information. In the meantime, you can monitor their status page at https://status.aws.amazon.com/. Other Amazon Web Services applications may also be affected.

Update Feb 28, 11:03 MST

Amazon Web Services is currently experiencing what appears to be a large-scale networking issue that has impacted Instructure along with many other companies. We are working with Amazon to diagnose the problem and waiting for updates on their mitigation timeline. We will keep you posted as soon as we have more information.

Investigating Feb 28, 10:50 MST

Canvas is currently experiencing an outage that we are investigating. Our DevOps team has determined that this is an AWS (Amazon Web Services) outage. We will post updates as they become available.

Posted in Educational Technologies

[Completed] OIT network maintenance January 22 from 6:00 am – 7:00 am

Network maintenance has been completed.

Thank you

Posted in Banner and Admin Systems, Educational Technologies, Email, Network and Telecom, Systems

OIT network maintenance January 22 from 6:00 am – 7:00 am

On Sunday, January 22, 2017, between 6:00 am and 7:00 am, OIT will implement an emergency update to the core network. Due to the nature of the maintenance, users may experience intermittent network outages lasting approximately 10 – 20 minutes during this time period. This outage will affect connectivity to the Internet as well as connectivity within FAU.

Thank you

Posted in Banner and Admin Systems, Educational Technologies, Email, Network and Telecom, Systems

Do Not Open Phishing Email with subject line “Important announcement from President John Kelly”

If you received an email that appears to come from President John Kelly with the subject line “Important announcement from President John Kelly”, do not open it; please delete it immediately. This is a phishing email.

If you already opened the attached document, contact the OIT Help Desk at 561.297.3999 for further assistance.

Posted in Banner and Admin Systems, Educational Technologies, Email, Instructional Technologies, Network and Telecom, Systems, Uncategorized, Workday

[Completed] OIT network maintenance January 7 from 1:00 am – 6:00 am

The network maintenance scheduled for January 7 has been completed.

Thank you

Posted in Banner and Admin Systems, Educational Technologies, Email, Network and Telecom, Systems

OIT network maintenance January 7 from 1:00 am – 6:00 am

On Saturday, January 7, 2017, between 1:00 am and 6:00 am, OIT will implement an emergency update to the core network. Due to the nature of the maintenance, users may experience intermittent network outages lasting approximately 20 – 30 minutes during this time period. This outage will affect connectivity to the Internet as well as connectivity within FAU.

Thank you

Posted in Banner and Admin Systems, Educational Technologies, Email, Network and Telecom, Systems

[Resolved] Intermittent access to Blackboard

The intermittent access to Blackboard has been resolved.

Posted in Educational Technologies, Systems

Intermittent access to Blackboard

OIT is currently investigating a problem with intermittent access to Blackboard. We hope to speedily correct any issues found and apologize for the inconvenience.

Posted in Educational Technologies, Systems

Resolved – Wireless networks in Davie and Jupiter able to access SSO

Update: Jupiter and Davie campuses are now able to access SSO from the wireless networks.

Posted in Banner and Admin Systems, Educational Technologies, Email, Network and Telecom, Systems, Workday

[CANCELLED] – OIT Systems Maintenance, Thurs. Nov 3 from 12:01AM to 7:00AM

***UPDATE***

Maintenance for Thursday, Nov 3 has been postponed until next Thursday.

*****************************************

The Office of Information Technology will be performing routine systems maintenance, which includes but is not limited to rebooting Windows servers as deemed necessary, installing critical patches, and other health checks, on Thursday, 11/03/2016 from 12:01 AM to 7:00 AM during the OIT maintenance window. In addition, the following systems will be specifically impacted:

***************************************************

Oracle quarterly DB patches (mcoracle)
When: 3:00AM to 7:00AM
Affected services: Luminis production (MyFAU)
Affected systems: mcoracle
Description of maintenance: Applying Oracle PSU to DBs on host mcoracle
User impact: MyFAU and Luminis will become unavailable during the maintenance window.

***************************************************

Oracle quarterly DB patches (boc22ora05)
When: 3:00AM to 7:00AM
Affected services: IEA Data Warehouse, Resource 25
Affected systems: boc22ora05
Description of maintenance: Applying Oracle PSU to DBs on host boc22ora05.
User impact: The IEA Data Warehouse will be unavailable during this time. The Resource 25 application will experience interruptions in service as well.

***************************************************

Oracle quarterly DB patches (boc22ora03)
When: 3:00AM to 7:00AM
Affected services: Identity Manager production and the MSOP instance, which involves Owlapp, grouper, undergrad app, grad survey, lumsync, pwd expire, swift, frevvo, fau visitor
Affected systems: boc22ora03
Description of maintenance: Applying Oracle PSU to DBs on host boc22ora03
User impact: Applications that log in to the MSOP or IDMP instances will become temporarily unavailable during the maintenance window.

***************************************************

Oracle quarterly DB patches (boc22ora01)
When: 3:00AM to 7:00AM
Affected services: Report Caster and WebFOCUS production
Affected systems: boc22ora01
Description of maintenance: Applying Oracle PSU to DBs on host boc22ora01.
User impact: Report Caster and WebFOCUS will be temporarily unavailable at times during the maintenance window. Jobs may not execute in Report Caster.

***************************************************

Migration of virtual servers to new BigIP load balancers (SSO)
When: 5:00AM to 6:00AM
Affected services: SSO
Affected systems: Bigip Load Balancers
Description of maintenance: Our current BigIP load balancers are being upgraded, so we are moving secure websites hosted by the old ones to the new ones.
User impact: SSO may be inaccessible for up to 5 minutes.

***************************************************

Migration of www and MyFAU to the new BigIP load balancers
When: 5:00AM to 6:00AM
Affected services: FAU website and MyFAU
Affected systems: BigIP Load Balancers
Description of maintenance: Our current BigIP load balancers are being upgraded, so we are moving secure websites hosted by the old ones to the new ones.
User impact: Users may experience a brief interruption while trying to access these websites.

***************************************************

Disk maintenance on several systems
When: 3:00AM to 6:00AM
Affected services: Test Blackboard, Metadata Management for Middleware Services, OIT App Dev server, Test Talisma, WildFly Application Server for Middleware Identity Management (Dev, Test and Prod), Gitlab for OIT App Dev team, LDAP, ADFS, Jira Test
Affected systems: boc22l6jirat, ldapnw1-el6, ldapboc1-el6, ldapboc0-el6, shibidp1t, bblt3sp10, bblt2sp10, bblt1sp10, bbda2, bbda1, metadata, boc22devproc, boc22taldev1, gitlab, wildflymc1d, wildflymc1t, wildflymc1, wildflymc2, wildfly3, wildfly1t, wildfly1d, boc22adfs
Description of maintenance: OIT will be shutting down several systems to perform disk maintenance.
User impact: The services mentioned above will be unavailable intermittently during the maintenance window.

***************************************************

Verasmart phone billing upgrade
When: 12:01AM – 2:00AM
Affected services: Phone billing
Affected systems: boc22comm1
Description of maintenance: OIT will be upgrading the Verasmart phone billing software.
User impact: Phone billing will be unavailable during this time.

***************************************************

Posted in Banner and Admin Systems, Educational Technologies, Email, Systems