Wednesday, November 19, 2008

Meltdown

It's been quite a long time since I've posted anything. I just didn't seem to have any energy to do so after the brain meltdown of my job caused me to revert mentally to two years of age and start to loudly demand biscuits and my stuffed elephant. You'll be glad to hear that I'm better now. I have medication. The very patient staff at the asylum have even managed to teach me to read and write again.

I also have a new job, much to the disgust of Roy who I left behind at the old employer cleaning up after me.

In the interview for the new position, they seemed a little vague on what my duties would actually be. This didn't unduly concern me, given that my interview was informal to say the least. A pair of technicians were interviewing me and seemed a little hazy on what they were supposed to ask. I probably should have paid a little more attention at the time, though, to the only other member of the team I was to be joining. He looked as though he hadn't slept in a month.

On my first day, I was introduced to The System. It seems I hadn't really been hired to be a UNIX Systems Administrator in any sense of the word I understood. I had been hired to look after The System.

The System is a sprawling catacomb of home-built Perl that acts as a tool to perform monitoring, configuration management, customer relationship management, documentation, change control, trouble ticketing and making toast. It has a PHP front-end that I'm told only works for the simplest of tasks – anything even vaguely complicated and I'll need to write SQL to poke my commands straight into the database.

It seems the parent company have been trying in vain to get my department to replace The System with an off-the-shelf product for years, but that my department believe The System is a far better solution. 'This way' the architect beamed, 'we can just make it do anything we want instead of being forced into someone else's idea of management!'

I was supposed to have my first training session in using The System yesterday afternoon – unfortunately my mentor was called away abruptly. It seems The System has had a bit of a hiccup ... for about six months ... and hasn't been monitoring some systems it should have been. My mentor, who actually seems like quite a nice guy, informed me wryly that this has happened before.

Afternoon of day two, and I'm still sitting here sipping chilled water and flicking through Engadget waiting for that first lesson. I can't wait.

Tuesday, November 11, 2008

Packaging

How to obtain a package:

1. Minion locates upstream site with package
2. Minion wget's it
3. Minion replicates it out to your various package repositories
4. Minion says it's done
5. Your installation fails
6. You discover minion got the URL wrong, and has instead replicated out a copy of a 404 page named after the package.
7. Smack minion. Bad minion. No biscuit.
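For minions who would like to keep their biscuit, here is a minimal sketch of a check between steps 2 and 3. It is in Python rather than wget, and the names `looks_like_package` and `fetch_package`, as well as the short table of file signatures, are my own illustration and nothing The Minion ever used. The idea is simply to refuse to save anything that doesn't start with the bytes the filename promises.

```python
import urllib.request

# Leading bytes of a few common package formats (illustrative, not exhaustive).
MAGIC = {
    ".rpm": b"\xed\xab\xee\xdb",
    ".gz": b"\x1f\x8b",
    ".tgz": b"\x1f\x8b",
    ".bz2": b"BZh",
    ".deb": b"!<arch>\n",
}


def looks_like_package(filename, head):
    """Return True if the first bytes match the format the filename implies."""
    for suffix, magic in MAGIC.items():
        if filename.endswith(suffix):
            return head.startswith(magic)
    # Unknown suffix: at least refuse anything that looks like an HTML page.
    return not head.lstrip().lower().startswith((b"<!doctype", b"<html"))


def fetch_package(url, dest):
    """Download url to dest, raising instead of saving an error page."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 404.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if not looks_like_package(dest, data[:64]):
        raise ValueError(f"{url} did not return something that looks like {dest}")
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest
```

Only replicate what `fetch_package` returns without complaint. A checksum from the upstream site would be a better test still, if upstream publishes one.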

Thursday, October 23, 2008

Choices

Your VM host is out of space. You can't add more disk to it, as the chassis is full. There's no SAN.

You have two choices:
(a) use a spare VM host of a similar spec and migrate some machines to it
(b) break the RAID-1 on the existing host, and run it with no disk redundancy

If you chose (a), you're wrong. The obvious answer is (b).

Right?

Tuesday, August 12, 2008

Networking is hard

You have a new VM to build. It has three IP addresses on the same network. Naturally, you set up the name of the machine in the DNS with one of them, and have suitable application names for the others. Maybe you CNAME things as well. And you'll need a virtual NIC too.

That's why you do this:

# host myvmname
myvmname.domain has address 10.10.1.20
myvmname.domain has address 10.10.1.22
myvmname.domain has address 10.10.1.21

And why you don't provide any other names.

And why you give the VM three virtual NICs, one for each IP address. All of them on the same network.

Networking is hard.
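For contrast, the naming scheme from the first paragraph can be sketched in a few lines of Python. This is only an illustration: `plan_records` and the application names are made up, and it covers just the DNS side (one name per address), not the single virtual NIC that should carry all three.

```python
def plan_records(hostname, domain, addresses, app_names):
    """Give the first address the machine name and each other address an application name."""
    if len(addresses) != len(app_names) + 1:
        raise ValueError("need exactly one application name per extra address")
    names = [hostname] + list(app_names)
    # One A record per address, each with its own name.
    return [(f"{name}.{domain}", "A", ip) for name, ip in zip(names, addresses)]


if __name__ == "__main__":
    # "app1" and "app2" are hypothetical application names.
    for record in plan_records(
        "myvmname", "domain",
        ["10.10.1.20", "10.10.1.21", "10.10.1.22"],
        ["app1", "app2"],
    ):
        print(*record)
```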

Wednesday, July 16, 2008

Requirements. You don't has them.

User: Can you copy thisapplog to othermachine please?
Me: Where do I find thisapplog at the moment?
User: I don't understand
Me: What machine is thisapplog on?
User: I don't know
Me: ...

Resisting the urge to put a "copy" of the log file containing nothing but:
[Wed 16 Jul 2008 04:43:56] error: you fail at specifying logs

Friday, June 27, 2008

Friday 5pm

It's 5pm on Friday. You are a few hours out from a massive migration of data between SANs. The planning for this has been rough, but it is starting to look sane.

That is, until you discover right then that no-one has previously mentioned that the disks you'll be getting from the new SAN are a different size to the ones on the old one.

It's not like that would be an important detail.

This is Friday 5pm for sysadmins.

Tuesday, June 17, 2008

Gah!

No, I cannot burn your 50GB of data to 'a DVD'. Even if my regulation issue company brick had a Blu-ray disc burner, I'd still have to get the data back to it through many layers of VPN and NAT that at some points only manage to pass data at slow internet speeds. This is just a silly idea, honestly. Buy a real backup solution.

Wednesday, May 7, 2008

The Trouble Ticket

When I first started a job at a recent employer, I was quickly assigned The Ticket. This ticket had been in the system since time immemorial, and had been passed around all of the administrators in the team. The comment history scrolled for pages and pages.

On the surface, it didn't really look all that bad. There was a small development company who were dissatisfied with their current revision control system and practices, and who wanted a new system put in place and some training in how to use it. They were very happy with open source solutions and they were using a revision control system old enough that there were a lot of scripts to migrate from it to almost any of the newer systems.

Apart from everyone having a different opinion on which system to migrate them to, I was fairly hopeful it was going to be an interesting project, or at the very least not terribly difficult.

To start with, I brought the discussion around to the customer's current usage patterns.

'Well, one of the problems that we have is disk space on the local developer workstations. We don't want a checkout of the code on every system, so we check out the code onto a file server and then everyone edits it from there'

This sounded extremely odd to me, but I assumed naively that people had their own checkouts in their own shell accounts.

'Oh no, we share the code out on a Samba share, and everyone maps it as a drive on their workstations'

Oh no.

It turns out that the entire office were mounting the same Samba share read/write, all using the same username and password, and all editing the files as the same user in the revision control system. There was absolutely no way to figure out who was editing what, and conflicts happened regularly. When they did, work would grind to a halt for hours while everyone tried to figure out a compromise. What was worse, the disk space problem that had prompted this odd solution was mostly a product of their lack of understanding of how their revision control system worked. Not everyone needed a full checkout of the repository, but they didn't seem to have figured out yet that you don't need to pull the whole thing.

I made my escape quickly and when I got back to the office I ignored The Ticket.

Until the next new Sysadmin started, and then I assigned it to him.

Tuesday, April 15, 2008

Interns

I work for a very large company that has very American-style practices. Most of the time they don't affect us here in not-america, but the tradition of internship is one that my company has implemented in all of its world-wide offices.

I rather wish they hadn't.

We've been assigned an intern to babysit in our very small, extremely busy team. We work with Linux. He's never touched Linux before. Ever. He doesn't like what he's seen of it so far. Apparently working at the command line is something that for him went out with the dinosaurs.

He doesn't seem to be particularly good with Windows either, though; at least, I ended up configuring his wireless networking with him. We assumed then that he must be a reasonable Java developer, as that's what he's studied. Seems not, after hearing him in conversation with a peer of mine who is an exceptionally talented developer. The Intern came off sounding, well, a little thick.

He is however bright eyed and bushy tailed and very, very keen. He seems to have really taken a shine to my colleague, who with his 13 years of industry experience probably seems like a bit of a father figure, or at least a brotherly type. Shame my colleague is about ready to strangle him - the constant barrage of 'But if you used Windows wouldn't that just work?' infuriates my colleague beyond rational behaviour.

I see myself patiently explaining, many times over, that most of these technologies do not scale very well on Windows platforms.

He also seems to think that he should have more 'responsibility'. I think he sees himself as a manager, which is great. My team already has plenty of management and I fully support his wish to be a manager.

Far, far away from me.


Tuesday, March 25, 2008

Little piggie, little piggie let me in.

I love how BigCorp(tm) think it's a great idea to use a Windows domain controller (ADS/KRB5) to authenticate their Linux users against.

What a marvelous idea! It means we can all have a single password throughout the organization!

It sounds great in a perfect world, where:

  • Networks/interfaces don't fail.
  • Accounts are not locked out when a user attempts to authenticate more than once every 5 seconds (really nasty when attempting to do something like: for i in `cat hosts.txt`; do ssh $i /bin/something; done )
  • Machines and the DC always agree on the time (particularly across large subnets, regions and physical locations).
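Until that perfect world arrives, the loop in the second point can at least be slowed down. The following is a rough Python sketch, not what we actually ran: `run_on_hosts` is a made-up name, and the six-second gap is my assumption, chosen only to sit above the five-second lockout window.

```python
import subprocess
import time


def run_on_hosts(hosts, command, min_gap=6.0, run=subprocess.run,
                 sleep=time.sleep, clock=time.monotonic):
    """Run command over ssh on each host, starting attempts at least min_gap seconds apart."""
    results = {}
    last_start = None
    for host in hosts:
        if last_start is not None:
            # Wait out whatever is left of the gap since the previous login.
            wait = min_gap - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[host] = run(["ssh", host, command])
    return results


if __name__ == "__main__":
    # hosts.txt is the same one-host-per-line file as in the shell loop.
    with open("hosts.txt") as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    run_on_hosts(hosts, "/bin/something")
```

It turns a thirty-second job into a coffee break on a large host list, but a slow loop beats a locked account.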

The one that gets me....
  1. Lose connectivity to the subnet that contains the Windows Domain Controllers.
  2. Customer raises issue 'Can't login'.
  3. Customer expects us to 'fix the issue'.
  4. We can't even log in (even on the console as root with a local password), as the PAM config specifies it needs to check the KRB5 realms.
  5. Customer gets narky.
  6. Customer is aware of the issue, but refuses to acknowledge it as a problem.

The solution... sit it out until hopefully the network comes back. Failing that.. a reboot using the boot option of 'single'. That's if the customer allows you to reboot the machine.

The joys of corporate stupidity. *sigh*

Thursday, March 6, 2008

1+1= ?

16GB of swap space required.

15GB SSD as the only onboard disk.

Are you sure you don't see something wrong with this picture?

Wednesday, January 23, 2008

I just don't believe you

Somehow I find it very hard to believe that you did not realise at any point while creating the severity 2 ticket in our trouble ticketing system that this action was going to page out the sysadmin on call. I find it even harder to believe that you would be surprised they would get upset with you about this on finding out that the issue was not an urgent one but rather an on-going issue you'd been experiencing for months that you wanted some data collected on.

Right now if someone would invent me stab-over-ip I'd bake them cookies.

Monday, January 21, 2008

Epic Fail

I discovered today when I picked up the pager for my on-call duties this week that I've been deleted from a certain customer's trouble ticketing system, along with about 100 random users. This is going to make it slightly difficult to respond to tickets.

Only slightly.