Wednesday, November 11, 2009

Mailinator for Testing

Recently I implemented a captcha system in Mailinator.

If you're a normal user - you've probably never seen it. That's because it doesn't get activated until emails with the same subject get read more than like 10 times in a minute. Needless to say, if you're just using Mailinator to sign up for something here and there, you won't come anywhere near that threshold.

But if you're a script - or a person signing up for some website over and over and over (and over) - you hit this pretty fast. It has done a fantastic job of stopping scripts cold and slowing down humans zealously working over some site. The main problem with the latter is that many of those sites don't like that and may then ban Mailinator. That's not good for anyone.

What surprised me about the captcha system is how many people emailed me that I broke their test scripts. That is, they had email system tests (i.e. "Thanks for registering!") that use Mailinator as an endpoint. Their test then (with variation per user, of course) checks the Mailinator inbox to verify that their system correctly sent the email.

ManyBrain, Inc. (the company that owns Mailinator) has for a long time offered testing packages that completely bypass Mailinator's abuse system for high-volume testing or other emailing.

I'd be interested, however, in improving this service and making it more mainstream. If you use Mailinator for testing - or would like to - I'd like to hear from you.

How would you use it? What kind of volume? I can't believe anyone is excited about scraping HTML (that wasn't particularly designed to be scraped) to get test results. Mailinator already has a (secret, shhh) JSON interface that's not public. That would seem to be the way to go.

My thought is a JSON-based SOAP/REST/whatever API that allows testing scripts to read emails. What's a reasonable method and threshold of use to start charging? Possibly a sister site to Mailinator itself would be a better home for the testing service.
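To make the idea concrete, here's a rough sketch of what a test against such an API might look like. To be clear, everything in it is hypothetical: the URL, the query parameter, and the response contents are invented for illustration, since the real JSON interface isn't public.

// Hypothetical sketch only -- the URL, path, and parameters below are invented.
// The real (private) JSON interface is not documented here.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RegistrationEmailTest {
    public static void main(String[] args) throws Exception {
        // Imagined endpoint: fetch the "fred" inbox as JSON
        URL url = new URL("http://testing.example.com/api/inbox?to=fred");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        StringBuilder json = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            json.append(line);
        }
        in.close();

        // A real test would parse the JSON; this just asserts on the raw text.
        if (json.indexOf("Thanks for registering!") >= 0) {
            System.out.println("PASS: registration email arrived");
        } else {
            System.out.println("FAIL: registration email not found");
        }
    }
}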

If there's enough interest to support the development, I'd be excited to get this up and running.

Email me at paul@manybrain.com

Monday, September 21, 2009

More Alternate Domains!

I've added a few more alternate domains. If you've never used them, they are simply other domains (for example, sogetthis.com) that forward all email to Mailinator.

Simply, if you send email to fred@mailinator.com - or - fred@sogetthis.com, it works exactly the same way, and your email will arrive in the "fred" inbox at Mailinator either way.

You can find alternate domains (including the new ones mixed in) on the front page of Mailinator on the left column.

I'll be sneaking more into the rotation soon!

Tuesday, June 9, 2009

A Beautiful Race Condition

I recently gave a keynote at the ICIS 2009 conference in Shanghai. The topic was why multithreaded programming seemed so easy, yet turned out to be so hard. The fun part was I investigated (per my last post and this one) several old, personal concurrency demons I knew existed but wanted to know more about.

One of those was, indeed, my favorite race condition. It doesn't escape me that it's probably wholly unhealthy to even *have* a favorite race condition (akin to having a favorite pimple or something) - but nonetheless, the elegance of this one still makes my heart aflutter.

The scenario of this race is that we assume, not terribly inaccurately, that race conditions can, at times, cause corrupted data. However, what if we have a situation where we sort of don't mind some corrupted data? A "good enough" application, as it were.

The dangerous part of all this is if we assume (without digging in) what kind of data corruption can happen. As you'll see, you might just not get the type of data corruption you were hoping for (which is one of the sillier sentences I've ever written).

The particular instance of this kind of happy racing I've encountered is where someone uses a java.util.HashMap as a cache. I've never done such a thing myself, but I heard about this race - hence this analysis. They may use it with a linked list or maybe just raw, but the baseline is that they figure a synchronized HashMap will be expensive - and in their case, a race condition inside the HashMap will just lose (or double up on) an entry now and then.

That is - a race condition between two (or more) threads might accidentally drop an entry causing an extra cache miss - no biggie. Or, it may cause one thread to re-cache an entry that didn't need it. Also no biggie. In other words, a slightly imprecise, yet very fast cache is ok by them. (of course, this assumption is dead wrong - don't do that - read on for why!)

So they set up a HashMap in some global manner, and allow any number of nefarious threads to bang away on it. Let them put and get to their hearts' content.
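Just to pin down the pattern being described, here's a minimal sketch of my own (the class and method names are illustrative, not from any real project):

import java.util.HashMap;
import java.util.Map;

// The anti-pattern described above: one global, unsynchronized HashMap shared
// as a cache by many threads. Don't do this -- read on for why.
public class RacyCache {

    private static final Map<String, String> CACHE = new HashMap<String, String>();

    public static String lookup(String key) {
        String value = CACHE.get(key);            // racy read
        if (value == null) {
            value = expensiveComputation(key);    // cache miss, recompute
            CACHE.put(key, value);                // racy write -- may trigger a resize
        }
        return value;
    }

    private static String expensiveComputation(String key) {
        return key.toUpperCase();                 // stand-in for real work
    }
}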

Now, if you happen to know how HashMap works, you know that when the size of the map exceeds a given threshold, it resizes itself. It does that by creating a new bucket array of twice the previous size, and then putting every old element into that new bucket array.

Here's the core of the loop that does that resize:


 1:  // Transfer method in java.util.HashMap -
 2:  // called to resize the hashmap
 3:
 4:  for (int j = 0; j < src.length; j++) {
 5:    Entry e = src[j];
 6:    if (e != null) {
 7:      src[j] = null;
 8:      do {
 9:        Entry next = e.next;
10:        int i = indexFor(e.hash, newCapacity);
11:        e.next = newTable[i];
12:        newTable[i] = e;
13:        e = next;
14:      } while (e != null);
15:    }
16:  }


Simply, after line 9, variable e points to a node that is about to be put into the new (double-wide) bucket array. Variable next holds a reference to the next node in the existing table (because in line 11, we'll destroy that relation).

The goal is that nodes in the new table get scattered around a bit. There's no care to keep any ordering within a bucket (nor should there be). HashMaps don't care about ordering; they care about constant-time access.

Graphically, let's say we start with the HashMap below. This one only has 2 buckets (the default of java.util.HashMap is 16) which will suffice for explanatory purposes (and save room).

As our loop starts, we assign e and next to A and B, respectively. The A node is about to be moved, the B node is next.



We have created a double-sized bucket array (in this case, size=4) and migrated node A in iteration 1.



Iteration 2 moves node B and iteration 3 moves node C. Note that next=null is the ending condition of our while loop for migrating any given bucket (read that again, it's important to the end of the story).



Also important to the story, note that the migration inverted the order of nodes A and B. This is a consequence of the smart idea of inserting new nodes at the top of the list instead of traversing to find the end each time and plunking them there. A normal put operation would still have to check that it's inserting (and not replacing), but given that a resize can't replace, this saves us a lot of "find the end" traversals.
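To see the trade-off concretely, here's a tiny sketch of my own (plain singly linked nodes, not HashMap's actual Entry class): head insertion is constant time per node but reverses the order, while tail insertion preserves order at the cost of walking the whole chain every time.

// Illustrative only -- bare singly linked nodes, not java.util.HashMap internals.
class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class InsertDemo {
    // O(1): the new node becomes the head; repeated inserts reverse the order.
    static Node insertAtHead(Node head, Node n) {
        n.next = head;
        return n;
    }

    // O(length): walk to the end; order is preserved but every insert traverses.
    static Node insertAtTail(Node head, Node n) {
        n.next = null;
        if (head == null) return n;
        Node cur = head;
        while (cur.next != null) cur = cur.next;
        cur.next = n;
        return head;
    }
}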

Finally, after iteration 3, our new HashMap looks like this:



Our resize accomplished precisely the mission it set out to. It took our 3-deep bucket and morphed it into a 2-deep and 1-deep one.

Now, that's all well and good, but this article isn't about HashMap resizing (exactly); it's about a race condition.

So, let's assume that in our original happy HashMap (the one above with just 2 buckets) we have two threads. And both of those threads enter the map for some operation. And both of those threads simultaneously realize the map needs a resize. So, simultaneously they both go try to do that.

As an aside, the fact that this HashMap is unsynchronized opens it up to a scary array of unimaginable visibility issues, but that's another story. I'm sure that using an unsynchronized HashMap in this fashion can wreak evil in ways man has never seen; I'm just addressing one possible race in one possible scenario.

Ok.. back to the story.

So two threads, which we'll cleverly name Thread1 and Thread2, are off to do a resize. Let's say Thread1 beats Thread2 by a moment. And let's say Thread1 (by the way, the fun part about analyzing race conditions is that nearly anything can happen - so you can say "Let's say" all darn day long and you'll probably be right!) gets to line 10 and stops. That's right, after executing line 9, Thread1 gets kicked out of the (proverbial) CPU.



 1:  // Transfer method in java.util.HashMap -
 2:  // called to resize the hashmap
 3:
 4:  for (int j = 0; j < src.length; j++) {
 5:    Entry e = src[j];
 6:    if (e != null) {
 7:      src[j] = null;
 8:      do {
 9:        Entry next = e.next;
           // Thread1 STOPS RIGHT HERE
10:        int i = indexFor(e.hash, newCapacity);
11:        e.next = newTable[i];
12:        newTable[i] = e;
13:        e = next;
14:      } while (e != null);
15:    }
16:  }


Since it passed line 9, Thread1 did get to set its e and next variables. The situation looks like this (I've renamed e and next to e1 and next1 to keep them straight between the two threads as both threads have their own e and next).



Again, Thread1 didn't actually get to move any nodes (by this time in the code, it did allocate a new bucket array).

What happens next? Thread2, that's what. Luckily, what Thread2 does is simple - let's say it runs through the full resize. All the way. It completes.

We get this:



Note that e1 and next1 still point to the same nodes. But those nodes got shuffled around. And most importantly the next relation got reversed.

That is, when Thread1 started, it had node A with its next as node B. Now it's the opposite: node B has its next as node A.

Sadly (and paramount to the plot of this story), Thread1 doesn't know that. If you're thinking that the invertedness of Thread1's e1 and next1 is important, you're right.

Here's Thread1's next few iterations. We start with Thread2's bucket picture because that's really the correct "next" relation for our nodes now.







Everything is sort of looking OK... except for our e and next at this point. The next iteration will plunk A into the front of the bucket 3 list (it is, after all, next) and will assign its next to whatever happens to already be there - that is, node B.



Whoa. Thar she blows.

So right about now Thread1 goes into what we like to call in the biz an "infinite loop". Any subsequent get operation that hits this bucket starts searching down the list and goes into, yep - an infinite loop. Any put operation that first needs to scan the nodes to see if it's going to do a replace will, you guessed it, go into an infinite loop. Basically, this map is a pit of infinite loopiness.
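If you want to see why in miniature, here's a self-contained sketch (my own types, shaped like but not copied from HashMap's internals) of the bucket scan a get does. Once the race has left A and B pointing at each other, the loop's exit condition (e becoming null) can never be reached for any key that isn't in the cycle.

// A sketch of why a cycle in a bucket's chain hangs lookups. These are my own
// illustrative types, not java.util.HashMap source.
class Entry {
    final int hash;
    final Object key;
    Object value;
    Entry next;

    Entry(int hash, Object key, Object value) {
        this.hash = hash;
        this.key = key;
        this.value = value;
    }
}

class BucketScan {
    // Walk the chain until we find the key or hit null -- just like a get.
    static Object find(Entry head, int hash, Object key) {
        for (Entry e = head; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        Entry a = new Entry(1, "A", "valueA");
        Entry b = new Entry(1, "B", "valueB");
        b.next = a;
        a.next = b;            // the post-race state: A and B point at each other

        find(b, 1, "C");       // key isn't in the cycle: this call never returns
        System.out.println("never reached");
    }
}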

If you remember, we noted that race conditions cause data corruption. Well, yeah, that's what we have here - just very unlucky data corruption on pointer structures. I'm the first to admit this stuff is tricky - if you find errors in my analysis, I'll happily fix this post.

Now, I had the happy fortune for a time of sharing an office with Josh Bloch, who wrote java.util.HashMap. When I innocently mentioned he had a bug in his code given this behavior, he did indeed (to use Josh's words) go non-linear on me.

And he was right. This is not a bug. HashMap is built specifically for its purpose, and this implementation is not intended to be threadsafe. There's a gaggle of ways to make it threadsafe, but in its plain, vanilla (and very fast) form, it's not. And needless to say, you shouldn't be using it that way.
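For the record, a couple of the standard JDK ways to get a thread-safe map (this snippet is mine, just for illustration):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SafeMaps {
    // Wrap every call in a single lock: simple and correct, but fully serialized.
    static final Map<String, String> locked =
            Collections.synchronizedMap(new HashMap<String, String>());

    // Or use a map built for concurrent access, with finer-grained locking.
    static final Map<String, String> concurrent =
            new ConcurrentHashMap<String, String>();
}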

Race conditions are nothing to mess with, and the worst ones are the ones that don't crash your program but let it wander down some unintended path. Synchronization isn't just for fun, you know.

And nefarious uses of HashMap aside, I still attest - this is, indeed, a beautiful race.

Addendum: I've been yelled at a few times for calling any race condition "beautiful". I'll defend myself by pointing to our apparent human nature to call intricate complexity beautiful (e.g. waves crashing on a shore, nature in general).

Most races end up being about data-corruption. This one is data-corruption that manifests as control-flow-corruption. And it does so fantastically without an error (infinite loops notwithstanding).

As the evolution analogy goes, if you drive a needle into a pocket watch - chances are you'll simply destroy it. But there's a tiny chance you'll actually make it a better watch (clearly, not the case here). And another tiny chance you'll simply make it something "different" - but still, per se, functioning.

Again, my use of "beautiful" might be more appropriate as "a complex mutation with surprising non-destruction" :)

Monday, June 8, 2009

On Java Visibility

A semi-famous example of broken Java synchronization is this:

class SomeClass {

  private boolean keepGoing = true;

  public boolean get() {
    return keepGoing;
  }

  public synchronized void set(boolean x) {
    keepGoing = x;
  }
}

I believe this example is in Josh Bloch's book "Effective Java" (my copy of which, I now notice, I somehow lost in my most recent move - don't tell Josh).

The idea here is that someone (ostensibly) thought that they'd save some performance by not synchronizing the getter and just synchronizing the setter. There definitely is a cost to synchronization, and needless to say, getting doesn't change anything - so why bother paying that cost for gets?

As has been pointed out long before this post, without synchronization on the getter, there is no guarantee of visibility when doing the get. That is, if one thread calls set(false), there's no guarantee that any other thread will know it happened.

Consider code that might use the code above:


class SomeOtherClass {

  SomeClass keepGoing = new SomeClass();
  int x;

  class Thread1 implements Runnable {
    public void run() {
      while (keepGoing.get()) x++;
      System.out.println("done1");
    }
  }

  class Thread2 implements Runnable {
    public void run() {
      keepGoing.set(false);
    }
  }
}


Let's say you start a Thread1 running. And of course, keepGoing.get() is true, so it just keeps looping along. Then let's say an eternity (or maybe 2 seconds) later you start Thread2.
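A minimal driver for that scenario might look like this (the class name and the two-second sleep are mine, not from any original test):

// A driver for the scenario above. Assumes the SomeClass/SomeOtherClass
// definitions shown earlier; the harness itself is illustrative.
class VisibilityDemo {
    public static void main(String[] args) throws InterruptedException {
        SomeOtherClass outer = new SomeOtherClass();

        new Thread(outer.new Thread1()).start();  // spins on keepGoing.get()
        Thread.sleep(2000);                       // let it loop for a while
        new Thread(outer.new Thread2()).start();  // sets keepGoing to false

        // Without visibility guarantees, "done1" may never be printed.
    }
}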

If we had reliable visibility, Thread1 would end the moment after Thread2 sets keepGoing to false.

If you've read Josh's book, it's no surprise that it doesn't. Specifically, Thread1 doesn't end. It keeps going. Thread2 ends happily and Thread1 never ends.

The only interesting part to me was that this demonstration always worked. Always. Adding to the complexity of visibility concerns is that memory is "leaky". That is, even without guaranteed visibility you often get unreliable visibility.

So, I dug a little deeper. If you're adventurous enough to grab a debug version of the JDK and run it with the -XX:+PrintOptoAssembly option, you get to see the optimizations the JIT is making. That is, you see the assembly-code version of your Java code, post-optimization. Check out Kohsuke Kawaguchi's blog for some instructions.

So here's Thread1's loop code after JIT optimization at runtime:

02c movq R10, precise klass manybrain/test/Main: 0x0000000040a50968:Constant:exact * # ptr
036 movsbl R8, [R10 + #596 (32-bit)] # byte ! Field manybrain/test/Main.keepGoing
03e testl R8, R8
041 je,s B4 P=0.000000 C=147944.000000
041
043 B2: # B3 <- B1 Freq: 1
043 movl R8, [R10 + #592 (32-bit)] # int ! Field manybrain/test/Main.x
043
04a B3: # B3 <- B2 B3 top-of-loop Freq: 1e-35
04a incl R8 # int
04d movl [R10 + #592 (32-bit)], R8 # int ! Field manybrain/test/Main.x
054 testl rax, [rip + #offset_to_poll_page] # Safepoint: poll for GC # manybrain.test.Main$Thread1::run @ bci:14 L[0]=_
# OopMap{r10=Oop off=84}
05a jmp,s B3
05a
05c B4: # N53 <- B1 Freq: 4.76837e-07
05c movl RSI, #27 # int
061 nop # 2 bytes pad for loops and calls


If you're old school, welcome home.

If you're not, then this might look like a lot of goop. So let's parse out just the interesting parts.

Line 41 is a conditional jump. Basically, if keepGoing (per the comment in line 36) is false, we jump to line 5c (label B4) and end the code segment. You and I know that keepGoing started true, so basically that jump isn't followed.

Lines 43-5a are the loop that does x++.

And check out line 5a. That is an unconditional jump back to the top of the loop. So what does all this mean? It means the JIT did some very aggressive optimization. In fact, remember our original loop from Thread1?


while (keepGoing.get()) x++;


The JIT has optimized this to:

if (keepGoing.get()) {
  while (true) x++;
}


No wonder the loop never ends. You've got bigger problems now than a little leaky visibility issue. I'm positive I'm oversimplifying, but effectively the JIT saw that your get method wasn't synchronized and assumed it could optimize accordingly. If you didn't guarantee visibility, it didn't need to either. Obviously, add the synchronized modifier to the get method and all this badness won't happen.
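For completeness, here's what the fix looks like with a synchronized getter, plus a volatile variant the post doesn't mention but which also guarantees visibility (class names are mine):

class SomeClassSynchronized {

    private boolean keepGoing = true;

    // Synchronizing both accessors restores the visibility guarantee.
    public synchronized boolean get() {
        return keepGoing;
    }

    public synchronized void set(boolean x) {
        keepGoing = x;
    }
}

class SomeClassVolatile {

    // Alternative (not discussed in the post): a volatile field also
    // guarantees that writes are visible to subsequent reads.
    private volatile boolean keepGoing = true;

    public boolean get() {
        return keepGoing;
    }

    public void set(boolean x) {
        keepGoing = x;
    }
}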

The moral of the story is much like the inevitable topics discussed at lunch with Jeremy Manson: you can't optimize away correct synchronization, as all you'll probably do is optimize the "correctness" part away.

Monday, January 26, 2009

How long before an email is removed from the system?

I get this question a fair bit. How long, exactly, is an email available to be read after it enters the system?

It's hard to answer because it's literally dependent upon the rate at which new email arrives. In effect, new email "pushes out" old email. That's a bit simplified, as some email is thrown away based upon spam rules (if you get 100,000 emails with the same subject, you can probably say they aren't all going to be read).

So, what's the average? This weekend, email was lasting around 10 hours before getting "pushed out" of memory. I've seen that drop to an hour - but mostly it's around 5-7 hours.

I'll note that the primary Mailinator use case is that people tend to read email soon after it arrives. In other words, if email only lasted 15 minutes, we'd actually fill the needs of many users.

(I'd be *very* interested in hearing ways you use Mailinator that need mail to stick around longer.)

Thursday, January 8, 2009

No, Mailinator didn't spam you

I must say, it's rare these days, but I still now and then get emails from folks who think Mailinator spammed them. As it says on our contact page, this is pretty unlikely. I say that because Mailinator is custom software, and that software contains no way for a user to actually "send" email. There's no chance of Mailinator being an open relay as it stands. And of course, there is no place at all on the site to accept an email for sending.

I suppose it's possible, but it would involve a hacker breaking into the server, installing some other email server (which would likely conflict with Mailinator itself), configuring it, and then pummeling it for their nefarious purposes. Given that any self-respecting spammer has a billion zombies at their disposal and that this would definitely be discovered very quickly (my colo vendor loves to watch my bandwidth), it doesn't seem like an efficient way to spam.

In any case, I still get accused of spamming at times. And all that accusation takes is for a spammer to forge the return address as a Mailinator address. Let me tell you, forging a return address is stunningly easy. Here are three million or so guides on how to do it, if you're wondering.

Below is an actual email header someone sent me.

The interesting parts are really the first two lines. As you can see, the forged return path is ronb@mailinator.com. Now, if you know Mailinator, you know ANYONE can check that inbox. It belongs to no one and everyone (as outlined in the FAQ - Mailinator guarantees NO PRIVACY. All emails are viewable by ANYONE).

The 2nd line (i.e. Received:) shows the IP (and DNS name) of the server that actually sent the email. Something at abac.net. That looks like a hosting company somewhere. One thing I can tell you, though, is that that server has zero to do with Mailinator. The spam email never touched the Mailinator server. So even if I devoted my life to stopping this email, there's nothing I could do.



Return-Path: ronb@mailinator.com
Received: from 216.55.169.94 (216-55-169-94.dedicated.abac.net 216.55.169.94)
    by smtpin4.mail.de.uu.net (8.14.1/8.14.1) with SMTP id n083RPV6001157;
    Thu, 8 Jan 2009 03:27:26 GMT
Message-Id: 200901080327.n083RPV6001157@smtpin4.mail.de.uu.net
From: "RON" ronb@mailinator.com
Reply-To: "RON" ronb@mailinator.com
To: xxxxxxxxxxxxxx --> edited



This is sort of similar to a phishing attack. Someone gets an email that appears to be from their bank, goes to the phishing site, then loses all their money. In truth, their bank had nothing whatsoever to do with any of that, but the bank still gets blamed.

The saddest part for me is that even after I respond to people showing them the real culprit, it's not uncommon for them to stay mad at me. I suppose it's because they no longer know who to yell at, and I'm still available for the job.

Mailinator is about letting you protect your real email address. That might be to prevent spam, but at times it might even be to receive spammy email you really want (just not at your primary address).

Regardless what you use it for, it won't email you. It just doesn't do that.

Plenty of people threaten to blacklist Mailinator from ever sending them email again. Yes, please do! As I've said in the past, feel free to put mailinator.com on the tippy-tippy-top of all your spam blacklists. Mailinator doesn't send any email at all - so you can be sure any email that looks like it came from a mailinator address is forged. And I'll sleep just fine if such email gets blacklisted.