Developing on Staxmanade

Oops - how a simple bit of automation put NuGet services on edge...

(Comments)

This past week I received an email from Microsoft's NuGet team asking if I could look into a bit of an issue with DefinitelyTyped's NuGet package publishing.

Some Background

A really long time ago, I wanted to access DefinitelyTyped packages within Visual Studio via the NuGet package manager. So I quickly wrote up a powershell script to accomplish this. This script has run almost continuously ever since, and primarily without issue.

There's been a couple tweaks/issues along the way - as to be expected, but it's been primarily hands-off.

As of today, these NuGet packages have been downloaded over 5,268,852 times - wow.

What does the automation do?

All of the NuGet packages generated for DefinitelyTyped are run through a build process on the good servers at AppVeyor (Thanks AppVeyor).

Every 2 hours the task does some git-fu to figure out what DT packages have updated (since the last run) and publishes updated NuGet packages for each updated DT package.

The initial problem report:

First let me say that thanks to Yishai and Maarten from Microsoft who brought the issue to my attention and were extremely polite and patient with the raised issue. So thank you, thank you, thank you for the support and being so friendly while working through this...

service status image of problem with nuget

Looking at status.nuget.org

It was pretty easy to see that every 2 hours a large spike in uploads to NuGet was happening.

service status image of problem with nuget

service status image of problem with nuget

While I can't say for certain this incident report on the status page was due to the NuGet automation, it was around the same time the automation was pushing extra builds (and right before I was contacted by Microsoft).

Was that my automation oops?

I didn't recall getting an error email from AppVeyor so I was initially suspicious. But logging in and looking at build history: hmmm. Looking back at my email, looks like I did receive the first failed build email - but must have been busy day as I didn't happen to notice that one email (when I usually do from other projects).

service status image of problem with nuget

YIKES!... so I quickly responded to Microsoft saying I'd shut down the automated portion and dig into it.

The Problem & Resolution

Turns out the problem was due to a large pull request that updated about every package in the DT repo. This meant it had to publish every single package, but for some reason (not showing in the logs) the AppVeyor job was failing at the end and not saving the fact that the packages were being updated on NuGet...

I have a way to run the NuGet publish manually on a local machine so I pulled down the project and ran the complete build. This took quite a while (almost 3 hours) and eventually I discovered the problem.

At the end of the script is a git commit -m {msg} command. This is an important step as it records what has been updated/published. The problem was due to the large number of packages published, the {msg} was so big that it was throwing an error saying the command line command was too long to execute. Causing the system to not complete the cycle and end up re-publishing all nuget packages every 2 hours.

I was able to manually commit with a shorter message and it brought the system back to normal.

And below is NuGet status after the fix.

service status image of problem with nuget

Thanks NuGet!

Turns out the NuGet team put some time into optimizing the publishing process of their service - so maybe there was a benefit to this whole fiasco, but hopefully we won't be hammering the system going forward :)

So I'd like to say thanks again to the NuGet team for your kind support email and professional way of handling the issue. This is a great example of how Microsoft is helping the OSS community and their support is really taking off and showing promise!

Also Good Point

service status image of problem with nuget

Happy NuGetting!

Comments