Monday, November 19, 2007

Filling a need: a (true) Java Online Regular Expression tester

I found myself doing a lot of little tests spawned by a bunch of regular expression tweaks.

There are of course some existing online regex testers, but none I found seemed bent towards
Java (right off, I was switching back and forth between code and the online tester, and none
I found supported the idea that Java Strings require double-backslashing).
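
To make the double-backslash point concrete, here is a small sketch. The class name, the pattern, and the sample strings are made up for illustration and are not taken from the tester. The regex you would type into most online testers is \d{3}-\d{4}; inside a Java string literal every backslash has to be written twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the tester.
final class DoubleBackslashDemo {
    // Regex as typed in a generic tester:  \d{3}-\d{4}
    // Same regex as a Java string literal: "\\d{3}-\\d{4}"
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

    // True only if the whole input matches the pattern.
    static boolean matchesExactly(String input) {
        return PHONE.matcher(input).matches();
    }

    // Returns the first substring that matches, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = PHONE.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

Calling DoubleBackslashDemo.firstMatch("call 555-1234 now") finds the embedded number, while matchesExactly on the same string fails because the surrounding text is not part of the pattern.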

So I put one together this weekend (beta) at:

http://www.testregex.com

The exact matcher is quick and dirty and runs in JavaScript. The other buttons are actual
RPCs back to the server, so they really do run "in Java". This has already been handy
for me. If you find it useful, bookmark it!

Thursday, November 1, 2007

Introducing the Mailinator Widget !

Want to be the envy of your friends? Gain the admiration of your peers? No problem.

Put one of these shiny Mailinator widgets in your webpage and off you go! The box below monitors the foobaz@mailinator.com inbox (it auto-updates), but you can customize the box to monitor any Mailinator inbox you like (as usual, it's probably best to pick an unusual name if you want to avoid lots of spam).

The widget is definitely in beta right now - so please let us know about any bugs!

Wednesday, October 31, 2007

new Mailinator alternate domains - (including putthisinyourspamdatabase.com)

With the site redesign we hadn't gotten around to re-adding the alternate domains to Mailinator (i.e., these are domains that are equivalent to @mailinator.com, so sending email to duke@putthisinyourspamdatabase.com is the same as sending email to duke@mailinator.com).

The delay was due in part to all the new domains we added. Well, here is the list. Note this is NOT the complete list - because in general, we don't know what the complete list is. Anyone (even you!) can point any domain you like at mailinator.com and it will happily accept the email sent to that domain. So if you have a domain or two lying around - feel free to point it to us! (and let us know!)

Here's the list! (note: capitalization doesn't matter, it's just added for effect)

anythingYouWant@mailinator.com is the same as:

anythingYouWant@PutThisInYourSpamDatabase.com
anythingYouWant@ThisIsNotMyRealEmail.com
anythingYouWant@binkmail.com
anythingYouWant@SpamHerePlease.com
anythingYouWant@SpamHereLots.com
anythingYouWant@SendSpamHere.com
anythingYouWant@chogmail.com
anythingYouWant@SpamThisPlease.com
anythingYouWant@frapmail.com
anythingYouWant@obobbo.com
anythingYouWant@devnullmail.com

We'll get these on the site very soon.

Monday, October 8, 2007

Mailinator gots new shoes!

Wednesday, September 26, 2007

And then there were 12 (million emails per day that is)

Sorry for the long silence, but it should definitely not be construed as a lack of activity. We've been very busy working on the "next big thing", not to mention some fun stuff for Mailinator.

And just in time, Mailinator's user count is at an all-time high. And so is the spam count! We hit an all-time high of 12 million emails/day this week. That's 500k emails an hour, 8333 emails a minute, and 138 emails a second. Still, of course, on our single Mailinator server (holding up several thousand simultaneous connections).
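
For anyone who wants to check the arithmetic, here is a tiny sketch of the conversion from a daily count to hourly, per-minute, and per-second rates. The class and method names are hypothetical; integer division truncates, which is how the numbers above were rounded.

```java
// Hypothetical helper for converting a daily email count into smaller rates.
final class EmailRates {
    static long perHour(long perDay)   { return perDay / 24; }
    static long perMinute(long perDay) { return perDay / (24 * 60); }
    static long perSecond(long perDay) { return perDay / (24 * 60 * 60); }
}
```

With 12,000,000 emails per day this gives 500,000 per hour, 8,333 per minute, and 138 per second.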

Damn, you go spammers.

Sunday, May 20, 2007

Japanese Spammers got it going on

So I've been up doing some *looong* overdue fixing on handling international character sets in Mailinator. Seeing some international spam, I must say, English-language spammers got nothing on Japanese spammers.

These guys take spamming as an art form!

I cut and pasted one below - actually, I'm not exactly "sure" it's spam, but it sure is pretty :)

-----------------------------------------------------------

Inbox: azz
From: eaglebomber@one-crest.com
Subject: ワケあって欲求不満なんで
Charset: ISO-2022-JP


"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#
■ 今月のイチオシ! ■■■ 中出し・顔写・緊縛・エトセトラ… ■■■■■
■■■■■■■■■■■■■ あなたの求める全てのエロが    ■■■■■
■■■■■■■■■■■■■   ここでなら、体感できます!!!!!■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■ ┏━━┳┓ ┏┓  ┏━━┳━━┓┏━━┳━━┳━━┳━━┓ ■■
■■ ┃┏┓┃┃ ┃┃  ┗┓┏┫┏┓┃┃━━┫┏┓┃━━┫━━┫ ■■
■■ ┃┏┓┃┗━┫┗━┓┏┛┗┫┗┛┃┃┏━┫┏┓┫━━┫━━┫ ■■
■■ ┗┛┗┻━━┻━━┛┗━━┻━━┛┗┛ ┗┛┗┻━━┻━━┛ ■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■ ┏┓  ┏━━━┓┏┓ ┏┓┏┓┏━┓  ┏━━┓┏━━━━━━┓
■┏┛┗━━┓━━┓┃┃┃ ┃┃┃┃┗━┛┏┓┃┏┓┃┃┏━━━━┓┃
■┗┓┏━┓┃  ┃┃┃┃ ┃┃┃┃   ┃┃┃┃┃┃┃┃┏━━┓┃┃
■ ┃┃┏┛┃  ┃┃┃┃ ┃┃┃┃   ┃┃┗┛┃┃┃┃┃┏┓┃┃┃
■ ┃┃┗━┛  ┃┃┃┃ ┗┛┃┃   ┃┃  ┃┃┃┃┃┃┃┃┃┃
■ ┃┗━━┓ ┏┛┃┃┗━┓┏┛┃┏━━┛┃ ┏┛┃┃┃┃┗┃┃┃┃
■ ┗━━━┛ ┗━┛┗━━┛┗━┛┗━━━┛ ┗━┛┃┃┗━┗┛┃┃
■     直  メ  即  会  い  掲  示  板   ┃┗━━━━┛┃
■ ・・・………━━━━━━━━━━━━━━━━━━━┗━━━━━━┛
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■                           ■■■■
■■■■     http://adap.jp/main.php?adv=LP18325      ■■■■
■■■■                           ■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■                              ■■■■
■┌───────────┐                  ■■■
■│   セフレリンクが  │┏━━━━━━━━━━━━━━━━━━━┓
■│必ずヤレル!?つのワケ│┃                   ┃
■└───┐┌──────┘┃ ?アドレス・電話番号公開掲示板だから ┃
■    ││       ┃  自由に連絡がトレル!!        ┃
■    ││   │\  ┃                   ┃
■    ││   │ \ ┃ ?目的が決まっている人ばかりなので、 ┃
■    │└───┘  \┃  スグ待ち合わせデキル!!       ┃
■    └────┐  /┃                   ┃
■         │ / ┃ ?検索機能も充実で、初心者でも簡単に ┃
■■        │/  ┃  利用デキル!!            ┃
■■■           ┃                   ┃
■■■■          ┃ まずは無料で、即会いセックスを   ┃
■■■■■          ┃             お試し下さい!!┃
■■■■■■        ┗━━━━━━━━━━━━━━━━━━━┛
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


┏━━┓                本┃日┃即┃ハ┃メ┃!!┃
┃\/┃ アドレス公開BBS      ━┛━┛━┛━┛━┛━┛
┗━━┛━━━━━━━━━━━━━━━━━━━━━━━━………・・・
                     TEL & アドレス公開   
 名前:Dr.SHIHO さん   ┏━┫   ┏━━┳━┳┓┏┳┓
                 ┃□┃━━┓┃┏┓┃┃┃┃┃┃┃
 年齢:29 歳          ┃==┃\/┃┃┗┛┃┃┃┃┣╋┫
                 ┗━┛━━┛┗━━┻┻━┛┗┻┛

                                  
 はじめまして、しほって言います。1人しか経験がないので色々教えてく
 れるセフレを探してます。


  ┏━━┓    ┏━━━━━━━━━━━━━━━━━━━━━┓
 ┏┛ ━┻━━┓ ┃       直接メール!!        ┃
 ┛  ━┳━━┛ ┃                     ┃
    ━┫    ┃  http://adap.jp/main.php?adv=LP18325  ┃
 ━┓ ━┫    ┗━━━━━━━━━━━━━━━━━━━━━┛
  ┗━━┛      
・・・………━━━━━━━━━━━━━━━━━━━━━━━━━━━━




      実際のサイトはこんな感じ!! レッツ・ヌプヌプ♪      

 簡┃単┃?┃S┃T┃E┃P┃
 ━┛━┛━┛━┛━┛━┛━┛────────────────────
     あなたの選んだタイプにマッチする女性を紹介します。     
       簡単?ステップで、好みの女性を今すぐ!!         

              ▼▲▼▲▼▲▼              
               ▼▲▼▲▼
                ▼▲▼                
                 ▼                 
                                   
             ┏━┓
┏━━┳━━┳━━┳━━┓┗┓┃
┃┏━┻┓┏┫━━┫ □┃ ┃┃    好みのタイプを選んで下さい!! 
┗━━┃┃┃┃━━┫┏━┛┏┛┗┓┏┓     ※欲張り過ぎに注意!! 
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛     
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄
 ┌┐      ┌┐      ┌/      ┌┐
 └┘ ★人妻  └┘ ★巨乳  〆┘ ★SM  └┘ ★セレブ   

 ┌/      ┌┐      ┌┐      ┌┐
 〆┘ ★20代  └┘ ★30代  └┘ ★40代  └┘ ★中出し   

           ※チェックは例です。ここでは、20代とSM♪♪♪♪

 
             ┏━━┓
┏━━┳━━┳━━┳━━┓┗━┓┃
┃┏━┻┓┏┫━━┫ □┃┏━┛┃   あなたのことを教えて下さい!! 
┗━━┃┃┃┃━━┫┏━┛┃━━┓┏┓ 
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛                
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄

 あなたの名前と年齢を教えて下さい。───────────────  
   │\               │\             
  ┌┘ \ ニックネーム      ┌┘ \ 年齢         
  └┐ / ┌───────┐   └┐ /  ┌──┐      
   │/  │       │さん  │/   │  │歳     
       └───────┘         └──┘      

 お住まいと、ログインに使用するパスワードを決めて下さい。────  
   │\               │\             
  ┌┘ \ お住まい        ┌┘ \ パスワード      
  └┐ / ┌───────┐   └┐ /  ┌───┐     
   │/  │       │    │/   │   │     
       └───────┘         └───┘     

             ┏━━┓
             ┗━┓┃
┏━━┳━━┳━━┳━━┓┏━┛┃
┃┏━┻┓┏┫━━┫ □┃┗━┓┃                  
┗━━┃┃┃┃━━┫┏━┛┏━┛┃┏┓ ボタンを押すだけ!!!     
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛                
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄
   ┏━━━━                          
   ┃┏━━━━━━━━━━━━━━━━━━━━━━━━━┓   
   ┃┃                         ┃   
   ┃┃   あなたの好みの女性からメールを受け取る!!  ┃    
   ┃┃                         ┃   
     ┃   http://adap.jp/main.php?adv=LP18325     ┃   
    ┃          ┏┓             ┃┃  
     ┃          ┃┃             ┃┃  
    ┃         ┏┫┣┳┳┓          ┃┃   
    ┗━━━━━━━  ┃┫┃┃┃┃ ━━━━━━━━━┛┃  
              ┃    ┃        ━━━━┛  
               ┗┓  ┏┛ ポチッ!! 

     あとは、2人が決める待ち合わせの場所まで一直線です!     
     あなたも人生が変わるような出会いを体験してください!!     

Saturday, May 5, 2007

Nonblocking-Reader/Blocking-Writer in Java

So it's Saturday. I should be off like chasing women or motorcycle racing or something, but instead I'm at home obsessing over non-blocking threadsafe algorithms.

(yeah. again.)

In particular, I had an email exchange with (my ex-Ph.D. advisor) Doug Lea regarding java.util.concurrent.locks' ReentrantReadWriteLock (which Doug also happens to be the author of). The beauty of this class is that it provides a solution to the classic Reader/Writer problem. That is, it allows many threads interested in reading something to operate simultaneously (as reading does not conflict with reading). However, it also gives you a writer lock that, when switched on, blocks all readers and other writers. This follows the assumption that writing could corrupt simultaneous readers or writers.

This wasn't very easy in Java prior to JDK 1.5.

Our discussion was over a microbenchmark I wrote to prove to myself how great it was that I could have a flock of readers all banging on a map (or something) simultaneously. I wrote (or borrowed) a standard regularly synchronized map wrapper:

public synchronized V put(K key, V value) {
  return map.put(key, value);
}

public synchronized V get(Object key) {
  return map.get(key);
}


That version will have readers waiting for both writers and other readers. I also wrote one that takes advantage of the new Reader/Writer lock:


ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
Lock readLock = lock.readLock();
Lock writeLock = lock.writeLock();

public V put(K key, V value) {
  writeLock.lock();
  try {
    return map.put(key, value);
  } finally {
    writeLock.unlock();
  }
}

public V get(Object key) {
  readLock.lock();
  try {
    return map.get(key);
  } finally {
    readLock.unlock();
  }
}


Here, readers will not wait for other readers; however, they will wait for writers (and writers wait for everyone).

This example does do better, but maybe not as "better" as I would think. Pounding the code with 50 simultaneous readers runs about 6.5 times faster (i.e., I got 6.5 times as many operations in a 1 second interval) than the fully synchronized version.

Interestingly, with 5 readers and 20 writers the synched version does about 10000 writes and 1200 reads. The reader/writer version does 300 writes and 10000 reads (sort of the opposite). In other words, the writers (probably expectedly) really get hurt in the reader/writer configuration. (If you're complaining microbenchmarks suck, you can stop. Highly contended parallel access to a map is pretty real - and if nothing else, I was really just trying to make threads fight, and this seemed like a reasonable way to do it).

Doug quickly pointed out that the reader/writer lock has a fair bit of overhead as compared to a simple synchronized method and that the map operations are so fast, they simply don't hold the lock long enough to cause much contention.

I probably should have quit there and played some more desktop tower defense, but I tried something else. Of course it seems that (in true double-checked locking form) if a reader can know that no writers are about, then there's no need for it to do any locking at all. Thus I played around awhile with a non-blocking threadsafe reader. This didn't go so well as I found counterexamples to every version of my code right as I typed the final semi-colon.

Then last nite I woke up at 3am (yeah, I know) and wrote this:


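// Sequence counter: odd while a write is in progress, even otherwise.
AtomicInteger writeCounter = new AtomicInteger();

public V get(Object key) {
  int save = 0;
  V value = null;
  do {
    // Spin until no writer is active (counter is even).
    while (((save = writeCounter.get()) & 1) == 1);
    value = map.get(key);
    // Retry if a writer started or finished while we were reading.
  } while (save != writeCounter.get());
  return value;
}

// Serializes writers; readers never take this lock.
Lock lock = new ReentrantLock();

public V put(K key, V value) {
  lock.lock();
  try {
    writeCounter.getAndIncrement();  // now odd: write in progress
    map.put(key, value);
    writeCounter.getAndIncrement();  // even again: write complete
  } finally {
    lock.unlock();
  }
  return value;
}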


The write lock used in the put method is pretty standard. Only one writer can be active at any time. The reader code (i.e., the get method) is the tricky part. Effectively, a reader won't start while it knows that a writer is active. And, once it's done, it checks again to see if a writer is now active or was while it was running. If so, it throws away its result and tries again. In theory, a reader might reread the same value 1000 (or infinite) times before it's satisfied that the data is good. In practice, it only happens a very few times.
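
Here is the same idea packaged as a self-contained class you can compile, offered as a sketch rather than a finished implementation. The class name OptimisticReadMap, the backing HashMap, and the version() accessor are assumptions added for illustration. One deliberate deviation from the snippet above: the second increment sits in a finally block, so a put that throws cannot leave the counter odd and strand readers in the spin loop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: blocking writers, non-blocking (retrying) readers.
final class OptimisticReadMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    // Odd while a write is in progress, even otherwise.
    private final AtomicInteger writeCounter = new AtomicInteger();
    private final Lock lock = new ReentrantLock();

    V get(Object key) {
        int save;
        V value;
        do {
            // Wait until no writer is active.
            while (((save = writeCounter.get()) & 1) == 1) {
                // busy-wait, as in the original snippet
            }
            value = map.get(key);
            // If the counter moved, a writer overlapped this read: retry.
        } while (save != writeCounter.get());
        return value;
    }

    V put(K key, V value) {
        lock.lock();
        try {
            writeCounter.getAndIncrement();      // odd: write started
            try {
                map.put(key, value);
            } finally {
                writeCounter.getAndIncrement();  // even: write finished
            }
        } finally {
            lock.unlock();
        }
        return value;
    }

    // Exposed only so the counter can be inspected in tests.
    int version() {
        return writeCounter.get();
    }
}
```

Note that the retry only throws away a result computed during a write. The reader still walks a plain HashMap while a writer may be changing it, so whether the read itself is always well-behaved depends on the map implementation.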

In threadsafe (blocking) code we're used to coding such that all important code can only happen when it's safe. Nonblocking code works more like a "teenager model" - it says "Let's try it and then when we're done, we'll see if what we did was ok. If not, we'll try it again until it is ok."

If the very busy looking do or while loops seem odd to you, they're quite normal in non-blocking threadsafe code. See Brian Goetz's great introduction to nonblocking algorithms. In the right circumstance (as Brian is happy to point out), nonblocking code can smoke blocking code even if it ends up busy looping awhile. And of course, still be 100% correct. Of course, the latter statement is often difficult to make happen.

I love this kind of all-hell's-breaking-loose code. I created some tests to try to expose a race condition and so far it looks good. I ran this idea by Jeremy Manson ("Dr. Memory Model") and he confirmed for me that the use of atomic ints above will provide the needed memory consistency (keep in mind, Jeremy is not a big fan of lock-free algorithms, it seems). My empirical tests have verified it so far - I can't make this code race (I can easily make races by changing just tiny little bits). I'm going to play with some benchmarks and see what I can drum up.

(Edit: Honestly, if you're looking for a non-blocking hashtable the above is thought-provoking but not terribly clever. You'd be wise to check out Cliff Click's Nonblocking Hashtable implementation. It's more comprehensive, proven, and is very likely faster.)

Thursday, April 26, 2007

Need a Server? Save $100 using Serverbeach code BA83A6U6H2

The lonely little Mailinator server runs at Serverbeach and their referral program has helped immensely over time covering our server costs.

If you're in the market to rent a server, check out Serverbeach, and if they meet your needs, use our referral code BA83A6U6H2. You'll save $100 and Mailinator will get an account credit.

It's like spam fighting for free :)

Friday, April 20, 2007

Mailinator Alternate Domains

There is a perpetual list of alternate domains that receive mailinator email. That is, if you find a site bans mailinator.com (after several years, it's surprising how few actually do) you can use one of these instead - then still check the email at www.mailinator.com:

* mailinator2.com
* fuckingduh.com
* sogetthis.com
* mailin8r.com
* mailinator.net
* klassmaster.com

Tuesday, April 10, 2007

Schaa-wing! - Mailinator on N3rd.tv

I have no fricken idea what these guys are saying, but how cool is this!?

N3rd.Tv Episode 0x1E

Saturday, March 31, 2007

Redditted and Digged

My Mailinator Architecture article was on reddit and digg today.

These were posted up there when the article was originally written but didn't get pushed to the front. Someone resubmitted apparently and off they went.

I've gotten a ton of nice comments - thanks folks!

Friday, March 30, 2007

Get any Spam about winning an HDTV lately?

We sure did.. check these guys out.

85h38m54s 529120 Enter to win a 58" HDTV from XEROX

They sent us 529 thousand emails over the last 85 hours. Check our Subjects page to see the realtime tally.
(If it's not there anymore, then they've finally stopped sending)

Wednesday, February 28, 2007

Spammer buggy software

I just saw these on the subjects page. Seems some spammer's script isn't running right - instead of grabbing random words like it's supposed to, it's just sending the command it was going to use.

15m10s 21 RANDLINE[./content/words1.txt]
14m3s 19 RANDLINE[./content/words1.txt] RANDLINE[./content/words2.txt]

Hehe. Oh well.. no pump and dump today guys.

Tuesday, February 20, 2007

Received over 5million emails today

It is at 5,178,247 at the moment to be exact. That's 59 a second. I'm really starting to wonder what the limit is (I'm sure some DOS'r will let me know someday :) ).

The eensy little single Athlon 2100 server is *still* not breaking a sweat.

EDIT: Yeah, that was a short lived record - one day later, we're at 6.1million

Sunday, February 18, 2007

Incoming nasty subject page up & running

As I mentioned before, it's at http://www.mailinator.com/subjects.jsp.

I ironed out a few kinks (keeping around spam longer than mailinator wanted to was harder than I thought, the system is pretty draconian about deleting spam).

Anyway - enjoy!

Tuesday, February 13, 2007

What Spam is Mailinator getting - RIGHT Now?

If you read the Mailinator architecture article (post below), you know that Mailinator filters lots of emails in order to handle the millions it gets each day. Now, one way to do that is to filter bulk email. After all, the purpose of Mailinator is to get an email here or there for signing up for stuff or whatever - typically, you don't care about reading the latest in bulk email (as I'm sure you get enough of that in your own email account).

So, if enough emails with the same subject come in, in a short enough period of time, we stop accepting them. And keep a note that we shouldn't receive these emails anymore. Of course, converting that information from a crufty data structure to a shiny web page is just an hour of hacking away. So, here's the webpage:

Web Page: Banned Subjects








In System   Count   Box              Subject
2h23m26s    1019    lmyfaqn          RE: Marcenaria: Curso Completo com
2h23m23s    953     oeonlygchlmdyh   Cialis Shop
2h23m34s    924     nahvzovamkti     Viagra Shop
2h23m32s    399     semsprjskinums   We are ready to give you a loan
1h2m31s     380     shadow007        Short 30 second form
2h23m26s    375     barbaraam        Refinance approved

This page is in BETA! It will change some. It's also only updated every minute or so.

Note:

Some of these emails get deleted entirely (rather than leaving the few initial copies that get in before we figure out we're about to receive about 9 zillion of them). So if you go to a mailbox looking for one of these spams and it's empty, it's because the email got deleted for other reasons (e.g., it's a bounce message, or it had a trigger word ("viagra", "schoolgirl", etc.) in its title).

Also, the time "In System" indicates when we first saw that message at all. So if it says, like, 35 minutes, then we've been receiving a steady stream of those spams for 35 minutes. Once the steady stream stops (for a few minutes), it will leave the list.
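
To make the bookkeeping concrete, here is a minimal sketch of how a same-subject tracker along these lines could work. It is an illustration under assumptions, not Mailinator's actual code: the class name, the threshold, and the quiet period are all hypothetical. Each subject keeps a first-seen time, a last-seen time, and a count; once the count passes the threshold the subject is refused, and if nothing arrives for the quiet period the entry starts over.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical same-subject tracker; thresholds are made-up parameters.
final class SubjectTracker {
    private static final class Entry {
        long firstSeen;
        long lastSeen;
        int count;
    }

    private final int threshold;     // copies accepted before refusing
    private final long quietMillis;  // silence needed before a subject resets
    private final Map<String, Entry> entries = new HashMap<>();

    SubjectTracker(int threshold, long quietMillis) {
        this.threshold = threshold;
        this.quietMillis = quietMillis;
    }

    // Records one arrival and returns true if the email should be accepted.
    synchronized boolean accept(String subject, long nowMillis) {
        Entry e = entries.get(subject);
        if (e == null || nowMillis - e.lastSeen > quietMillis) {
            // First sighting, or the stream stopped long enough: start over.
            e = new Entry();
            e.firstSeen = nowMillis;
            entries.put(subject, e);
        }
        e.lastSeen = nowMillis;
        e.count++;
        return e.count <= threshold;
    }

    // The "In System" time for a subject, or -1 if it has never been seen.
    synchronized long inSystemMillis(String subject, long nowMillis) {
        Entry e = entries.get(subject);
        return e == null ? -1 : nowMillis - e.firstSeen;
    }
}
```

Passing the clock in as a parameter keeps the sketch easy to test; a real server would presumably also expire old entries so the map does not grow forever.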

Suggestions welcome!
(I do plan to have an XML feed for this eventually. If you'd make use of that, let me know.)

Monday, February 12, 2007

Situation resolved

Not sure if it was exactly a hardware issue. Serverbeach replaced the drive and put the old primary in as a secondary. I couldn't mount it until I ran fsck on it. After that, it did have a bunch of lost files but otherwise seems to be working ok.

Anyway, Mailinator is back up and de-spamming!

Sunday, February 11, 2007

WTF - harddrive crash

Harddrive went smush today it seems. Serverbeach is on the job replacing it as we speak (or, I guess, as I type). Should be back up shortly thereafter.

It should be fun to watch the email after it's up. Usually, if we go down awhile, a lot of email gets "pent up" and volume is notably higher for a day or two.

Thursday, February 1, 2007

Mailinator's 2006 Stats

I'm definitely a stat junkie and the free Google Analytics definitely feeds my habit. Of course, that's just web stats. For email stats, I've kept my own (far shoddier) statistics. Here's some stats:

Web Stats
The number of hits Mailinator gets is surprisingly consistent. Note the below graph is done with Google Analytics, thus it does not count people on browsers with JavaScript turned off and it does not count RSS hits (which are several tens of thousands a day - way more than web hits).



The consistency surprises me because of Mailinator's nature. Personally, I use it "now and then" - like when I need it. Maybe (maybe) once a week when I sign up for something. It's not the type of site that you just hang out and browse on. So interestingly, this "now and then" use spreads across its users in a very uniform way.



What's also interesting is how people find Mailinator. It would seem that people just "know about it". A very large percentage simply come directly to it. A smaller but notable percentage get there through Google. StumbleUpon gives a surprising number of referrals (in fact, it's 10 times the number from Yahoo).

Note that Mailinator's Alexa rating averages around 30,000. I've seen it as good as 19,000 and as bad as 60,000 or so in the last year (seems to bounce a lot).

Email Stats
Sorry this section doesn't have such pretty graphs, like I said, this stuff is all home grown and far less pretty. Also - if you remember (from the architecture entry below) Mailinator's SMTP server is home-grown. That's one reason it can handle this volume but alas it is also a work in progress, thus some stats were lost at times - in other words, the following numbers are estimates although I feel they're still pretty representative.

Average number of emails per day: 1.234 million
Number of total emails for 2006: 450.74 million
Percentage rejected for same subject: 8.7%

Number of total emails for 2005: 280.68 million

Note that "same subject" means mailinator got emails with the exact same subject over and over and over (i.e., bulk email).

Obviously, spam is way way up. And honestly, the surge happened mostly in Q4 of 2006. If 2007 keeps on the track its on now, we'll be headed for (get this) - 1.29 BILLION emails for the year.

How's that for lotsa spam?

Another interesting note is that Mailinator got a total number of web users (not counting those with javascript off and not counting RSS users) of about 1.3million visitors for the year. Say we double that to account for non-javascript and RSS, making it 2.6million for the year.

Assuming that each visitor only cared about 1 email each, that means of the 450 million emails we got, only about 0.5% were actually looked at. Or 172 of every 173 emails were crap that no one wanted!
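
The estimate is just a ratio of the two figures above (2.6 million assumed visitors against 450.74 million emails). A short sketch of the arithmetic, with hypothetical class and method names:

```java
// Hypothetical helper reproducing the "looked at" estimate.
final class LookedAtEstimate {
    // Fraction of emails that were looked at, assuming one email per visitor.
    static double fractionLookedAt(double visitors, double totalEmails) {
        return visitors / totalEmails;
    }

    // Roughly how many emails arrived for each one that was looked at.
    static long emailsPerLookedAt(double visitors, double totalEmails) {
        return Math.round(totalEmails / visitors);
    }
}
```

Plugging in 2.6e6 and 450.74e6 gives a fraction a little under 0.6% and about one looked-at email in every 173.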


Monday, January 22, 2007

How much porn is too much?

Friday, January 19, 2007

Spam surge still surging

Crazy crazy. Daily volume has hit a new high of just over 5million emails (with one hourly peak heading for a 6.5million/day rate - or 4513/min or 75/sec).

One interesting phenomenon is the number of socket timeouts mailinator gets. Any SMTP gurus out there? The IPs I've tracked on this very often seem to go to DSL lines and such. I'm thinking these are zombie systems sending at a slow rate to avoid detection.

Now I don't care much if I lose emails from zombie computers (as it's always junk no one wants) - but if they really do send slowly - this might be a metric to differentiate such emails on other SMTP servers.

The Architecture of Mailinator

Almost 3.5 years ago I started the Mailinator(tm) service. I got the bulk of the idea from my drunk roommate at the time and the first incarnation took me all of about 3 days to code up. In some senses it was a crazy idea. As far as I know, it was the first site of its kind. A web-based email service that allowed any incoming email to create an inbox. No sign-up. No personal information. Send email first, check email later.



This became ridiculously handy for things like signing up for websites that send you one confirmation email, then save or sell or spam your email address forever. And of course, it *is* very handy for users. But think about it from mailinator's side. It's basically signing up to receive spam for that address forever. That's a tall order and one that seems to have the possibility of a terrible demise. Someday, enough email could come in that will simply smush Mailinator. But, as of this writing, that day isn't today.

I have in that 3.5 years received hundreds of "thank you" emails, a pile of "it doesn't work" emails, a radio interview, articles in the Washington Post, New York Times, and Delta Skymiles magazine, 1 call from both Scotland Yard and the LAPD, and a total of 4 subpoenas (1 of those being a Federal Grand Jury subpoena issued by the FBI).

At this point, Mailinator averages approximately 2.5million emails per day. I have seen hourly spikes that would result in about 5million in a day. (Edit: Feb 2007 - One month later we're averaging 4.5million emails a day with spikes over 6million) In addition, the system also services several thousand web users and several thousand RSS users per day.


In the world of email services, this probably isn't all that much. The most interesting part to me is that the complete set of hardware that mailinator uses is one little server. Just one. A very modest machine with an AMD 2GHz Athlon processor, 1G of RAM (although it really doesn't need that much), and a boring 80G IDE hard drive (check ServerBeach's Category 1 Powerline 2100 for the exact specs). And honestly, it's really not very busy at all. I've read the blogs of some copycat services of Mailinator where their owners were upgrading their servers to some big iron. This was really the impetus for me writing down this document - to share a different point of view.

Mailinator easily handling a few million emails a day wasn't always the case. The initial mailinator system was quite busy. And in fact, got overwhelmed about a year ago when email traffic started topping 800,000 a day (that's my recollection anyway). In an effort to squeeze life out of the server and as an exercise in putting together some principles I always championed about server development, I rewrote the system from scratch. I have no idea what the current limit of the existing system is, but at 2.5million a day, it's not even breaking a sweat.

If you don't know what Mailinator is, take a small tour through the (rather funny) FAQ.

Lossy lossie lossee

There is a very important point to note about the Mailinator service. And that is, that it is indeed - free. Although it might not seem like it, it has an immense impact on the design (as you'll see). This allowed me to favor performance across the board of the design. This fact influenced decisions from how I dealt with detecting spam all the way down to how I synchronized some code blocks. No kidding.

The basic tenet is that I do not have to provide perfect service. In order to do that, my hardware requirements would be much higher. Now that would all be fine and dandy if people were paying for the service. I could then provide support and guarantees. But given it's free I instead went for, in order, these two design decisions:

1) Design a system that values survival above all else, even users (as of course, if it's down, users aren't really getting much out of it)
2) Provide 99.99% uptime and accuracy for users.

If you wonder what I mean about "survival" in the first line, it basically means that Mailinator is attacked on literally a daily basis. I wanted to make a system that could survive the large majority of those attacks. Note - I'm not interested in it surviving all of them. Because again, if some zombie network decided to Denial-of-service me - I really have no chance of thwarting it without some serious hardware. The good news is that if someone goes to all the trouble of smashing Mailinator (again referencing the fact we're lossy), I really don't lose much sleep over it. It sucks for my users - but there really isn't anything I can do anyway. I'm not trying to be cavalier about this - I went to great lengths to handle attacks, I'm just saying it's a cold reality that I simply cannot stop them all. Thus I accept them as part of the game.

The platform

The original Mailinator used a relatively standard unix stack of applications including a Java based web application running in Tomcat. Mailinator is and was, of course, always just a hobby. I had a day job (or 2) so months of development just was never an option. I chose Java for no other reason than I knew Java better than anything else. For email, it used sendmail with a special rule that directed any incoming email to mailinator.com to one single mailbox.

Sendmail --> disk --> Mailinator <-- Tomcat Servlet Engine

The Java based mailinator app then grabbed the emails using IMAP and/or POP (it changed over time) and deleted them. I should have used an mbox interface but I never got around to implementing that. The system then loaded all emails into memory and let them sit there. Mailinator only allowed about 20000 emails to reside in memory at once. So when a new one came, the oldest one got pushed out.
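
The "oldest one gets pushed out" behavior is a bounded FIFO. Here is a minimal sketch of that idea, assuming emails are held as plain strings in arrival order; the class name and the use of ArrayDeque are illustrative choices, not a description of the original code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded in-memory store: the newest email evicts the oldest.
final class BoundedEmailStore {
    private final int capacity;
    private final Deque<String> emails = new ArrayDeque<>();

    BoundedEmailStore(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds an email; returns the evicted (oldest) email, or null if none.
    synchronized String add(String email) {
        String evicted = null;
        if (emails.size() == capacity) {
            evicted = emails.pollFirst();
        }
        emails.addLast(email);
        return evicted;
    }

    synchronized int size() {
        return emails.size();
    }
}
```

With a capacity of about 20000 this gives the behavior described: how long an email survives depends entirely on how fast new ones arrive.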

The FAQ advertises that emails stick around for "a couple of hours." And that was true, but exactly how long depended on the rate of incoming emails. You'll also note an interesting side effect: since all emails lived in memory, if the server came down - all emails were lost! Talk about exploiting the fact that my service was free, huh? This may seem dubious but the code was really quite stable and ran for weeks and months without downtime.

I thought about saving emails into a database of course but honestly, all this bought me was emails that stuck around longer. And, that in and of itself sort of went against my intent for mailinator. The idea was: sign up for something, go to Mailinator, click the link, and forget about it. If you want a mailbox where emails last a few days, that's fine, but there are many other alternatives out there - that's not what Mailinator is about. I forgot the database idea and now shoot for mails that last somewhere around 3-4 hours.

This all worked fabulously for awhile. It pretty much filled up all 1G of ram of the server. Finally when the incoming email rate started surpassing 800,000 a day, the system started to break down. I believe it was primarily the disk contention between unix mail apps and the Java app locking mailboxes. Regardless, there were many issues with that system that bugged me for a long time. The root of most of those problems really boiled down to one thing - the disk. The disk activity of sendmail, procmail, logging and whatever else was a silly bottleneck. And it needed to go.

More than a year ago now I did a full rewrite. Much of the anti-spam code that I'll describe later was already in this code-base but was improved and extended for the new system.

Synchronous vs. Asynchronous I/O

I've read a fair number of articles on the wonders of asynchronous I/O (java's NIO library). I don't doubt them but I decided against using it. Primarily, again, because I did a great deal of work in multithreaded environments and knew that area well. I figured if I had performance issues later, I could always switch over to NIO as a learning experience.

The biggest thing I knew I needed to do with Mailinator was to remove the unix application components. Mailinator needed to stop outsourcing its email receipt and do it itself. This basically meant I needed to write my own SMTP server. Or at least, a subset of one. Firstly, Mailinator has never had the ability to send email so I didn't need to code that part up. Second, I had really different needs for receiving email. I wanted to get it as fast as possible -or- refuse it as fast as possible.

SMTP has a rich dialog for errors but I chose to only support one error message. And that error is, appropriately enough - "User Unknown". That's a touch ironic since Mailinator accepts any user at all. Simply said, if you do anything that the Mailinator server doesn't like - you'll get a user unknown error. Even if you haven't sent it the username yet.
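
As a rough illustration of the "one error for everything" policy, here is a sketch of a reply function. It is not Mailinator's server code: the class name, the short list of accepted commands, and the exact reply strings are assumptions. The numeric codes are the conventional SMTP ones (250 for success, 550 for mailbox unavailable); the post does not say which code the real server sends.

```java
import java.util.Locale;

// Hypothetical reply policy: anything unexpected gets the same single error.
final class OneErrorSmtp {
    static final String OK = "250 OK";
    static final String REJECT = "550 User Unknown";

    // Returns the reply for one command line from the client.
    static String reply(String commandLine) {
        if (commandLine == null) {
            return REJECT;
        }
        String upper = commandLine.trim().toUpperCase(Locale.ROOT);
        boolean recognized = upper.startsWith("HELO ")
                || upper.startsWith("EHLO ")
                || upper.startsWith("MAIL FROM:")
                || upper.startsWith("RCPT TO:");
        // No per-problem diagnostics: one answer for every failure.
        return recognized ? OK : REJECT;
    }
}
```

A real conversation also involves DATA and QUIT, which use other success codes; they are left out to keep the sketch focused on the single-error idea.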

I looked at Apache James, a pure Java SMTP server, as a base, but it was way too comprehensive for my needs. I really just found some code examples and the SMTP specs and wrote things basically from scratch. From there, I was able to get an email, parse it, and put it right into memory. This bypassed the old system's step of writing it to disk entirely. From wire to user, Mailinator mail never touches the disk. In fact, the Mailinator server's disk is pretty darn idle all things considered.

Now to address persistence concerns right away - Mailinator doesn't run diskless, but it does run very asynchronously with regard to the disk. Emails are not written to disk EVER unless the system is coming down and is instructed to write them first (so it can reload them upon reboot). This little fact has been very handy when I've been subpoenaed. I simply do not have access to any emails that were sent to Mailinator in the past. If it is possible that I can get an email - so can you just by checking that inbox. If you can't get it, then that means it's long deleted from memory and nothing is going to get it back.

Mailinator also used to do logging (again, shut off because of pesky subpoenas). But it did it in a very "batchy" way. It wrote several thousand log lines to memory before doing one disk write. In effect, we never want to have contention based on the incredibly slow disk.
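A rough sketch of that batching idea, assuming a hypothetical BatchingLog class and a configurable batch size (neither is from the real code): lines pile up in memory and hit the underlying writer in one go once the batch is full.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffer log lines in memory, write them out in batches. */
class BatchingLog {
    private final Writer out;
    private final int batchSize; // e.g. several thousand lines
    private final List<String> pending = new ArrayList<>();

    BatchingLog(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    /** Cheap in the common case: just appends to an in-memory list. */
    synchronized void log(String line) {
        pending.add(line);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** One write for the whole batch; also worth calling on shutdown. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        StringBuilder batch = new StringBuilder();
        for (String line : pending) {
            batch.append(line).append('\n');
        }
        try {
            out.write(batch.toString());
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending.clear();
    }
}
```

The trade-off is the one described above: a crash loses whatever is still sitting in the pending list.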

Now if this all sounds a bit shaky, as in we might just lose an email now and then - you're right. But remember, our goal is 99.99% accuracy. Not 100%. That's an important distinction. The latest incarnation of Mailinator literally runs for months unattended. We do lose emails once in a while - but it's rare and usually involves a server crash. We accept the loss and by far most users never encounter it.

Emails

The system now is one unit. The web application, the email server, and all email storage run in one JVM.


The system uses under 300 threads. I can increase that number but haven't seen a need as of yet. When an email arrives (or attempts to arrive) it must pass a strong set of filters that are described below. If it gets past those filters it is then stored in memory - however, it is first compressed to save in-memory space. Over 99% of emails that arrive are never looked at, so we only ever decompress an email if someone actually "looks" at it.
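Here is a small sketch of the compress-on-arrival, decompress-on-read idea. The class name is invented and GZIP is my assumption for the example; the post doesn't say which compression scheme the real system uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: keep only the compressed bytes, inflate when someone reads. */
class CompressedEmail {
    private final byte[] compressed;

    CompressedEmail(String rawMessage) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(rawMessage.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.compressed = buffer.toByteArray();
    }

    /** Bytes actually held in memory for this email. */
    int storedSize() {
        return compressed.length;
    }

    /** Pays the decompression cost only on the rare occasion an email is viewed. */
    String read() {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                plain.write(chunk, 0, n);
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since almost no email is ever read, nearly all the CPU cost lands on the write side, once per email.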

Because of this, I am able to store many more emails than the original system's 20000. The current Mailinator stores about 80000 emails and uses under 300M of RAM. I probably should increase this number as plenty of RAM is just sitting around. The average email lifespan is about 3-4 hours with this pool. The amount of incoming email has gone way up, so even by increasing this pool, we're largely staying steady as far as email lifespan. I could probably kick that up to 200,000 or so and increase the lifespan accordingly but I haven't seen a great need yet.

Another inherent limit that the system imposes is on mailboxes themselves. Popular mailboxes such as joe@mailinator.com and bob@mailinator.com get much more email than average. Every inbox is limited to only 10 emails. Thus, popular boxes inherently limit the amount of email they can occupy in the pool. Use of popular inboxes is discouraged anyway, and they generally become the creme de la cesspool of spam.
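The two limits (a fixed-size pool overall, and a small cap per inbox) can be sketched like this. It's an illustration only: EmailPool and its methods are hypothetical names, and evicting the oldest email first is my assumption about how the pool makes room.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a bounded email pool with a per-inbox cap. */
class EmailPool {
    private static final class Stored {
        final String inbox;
        final String message;

        Stored(String inbox, String message) {
            this.inbox = inbox;
            this.message = message;
        }
    }

    private final int maxTotal;    // e.g. 80000
    private final int maxPerInbox; // e.g. 10
    private final Set<Stored> arrivalOrder = new LinkedHashSet<>();
    private final Map<String, Deque<Stored>> inboxes = new HashMap<>();

    EmailPool(int maxTotal, int maxPerInbox) {
        this.maxTotal = maxTotal;
        this.maxPerInbox = maxPerInbox;
    }

    synchronized void add(String inbox, String message) {
        Stored incoming = new Stored(inbox, message);
        Deque<Stored> box = inboxes.computeIfAbsent(inbox, k -> new ArrayDeque<>());
        box.addLast(incoming);
        arrivalOrder.add(incoming);

        // A busy inbox pushes out its own oldest email, not somebody else's.
        if (box.size() > maxPerInbox) {
            arrivalOrder.remove(box.removeFirst());
        }
        // If the whole pool is full, the oldest email anywhere goes.
        if (arrivalOrder.size() > maxTotal) {
            Iterator<Stored> it = arrivalOrder.iterator();
            Stored oldest = it.next();
            it.remove();
            Deque<Stored> oldBox = inboxes.get(oldest.inbox);
            oldBox.removeFirst(); // the oldest overall is also the oldest in its inbox
            if (oldBox.isEmpty()) {
                inboxes.remove(oldest.inbox);
            }
        }
    }

    synchronized List<String> read(String inbox) {
        List<String> messages = new ArrayList<>();
        Deque<Stored> box = inboxes.get(inbox);
        if (box != null) {
            for (Stored s : box) {
                messages.add(s.message);
            }
        }
        return messages;
    }

    synchronized int size() {
        return arrivalOrder.size();
    }
}
```

With this shape, email lifespan falls out of pool size divided by arrival rate, which is why a bigger pool means a longer lifespan.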

Two more memory-conserving measures are that no incoming email can be over 100k and all attachments are immediately discarded. That latter feature was put in years ago but obviously really ruins this whole new wave of image spam (if you see a few seemingly "empty" emails in some popular boxes, they might have been image spam that got their images thrown away).

Spam and Survival

I'd like to emphasize here that Mailinator's mission is NOT to filter spam. If you want penis enlargement or sheep-of-the-month club emails, that's pretty much what Mailinator is good for. We are clear in the FAQ. Mailinator provides pretty good anonymity - but we do NOT guarantee it. We also do NOT guarantee ANY privacy. It's really easier that way for us. Still, it does a pretty damn good job even so. We might log you (used to and it might get turned on again someday, never know) and we DO respond to subpoenas (that whole "jail" thing is a strong motivator).

So, in essence I have no real interest in filtering out spam. I do however, have a great deal of interest in keeping Mailinator alive. And spammers have this nasty habit of sending Mailinator so much crap that this can be an issue. So - Mailinator has a simple rule. If you do anything (spammer or not) that starts affecting the system - your emails will be refused and you may be locked out.

In the new system I created a data structure I call an AgingHashmap. It is, as the name indicates, a hashmap (String->int) whose elements "age".

The first type of spammer I encountered was one machine blasting me with thousands of emails. So, now, every time an email arrives, its sender's IP is put into an AgingHashmap with a counter of 1. If that IP does not send us any more email for (let's say) a minute, then that entry automatically leaves the AgingHashmap. But, let's say that IP address sends us another email 2 seconds later. We then find the first entry in the AgingHashmap and increase that counter to 2. If we see another email from that IP, it goes to 3 and so on. Eventually, when that counter reaches some threshold we ban all emails from that IP for some amount of time.

We can put this in words as so (values are examples):
If any IP address sends us 20 emails in 2 minutes, we will ban all email from that IP address for 5 minutes. Or more precisely, we will ban all email from that IP until it stops trying to send to us for at least 5 minutes.

This is really what the AgingHashmap is good for. We can set up some parameters and detect the frequency of some input, then cause a ban on that input. If some IP address sends us email every second for 100 days straight, we'll ban (or throw away) every last email after the first 20.
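To give a feel for it, here is a minimal sketch of such an aging counter. It is a guess at the shape, not the real AgingHashmap: the class name, method names and parameters are all assumptions. An entry is forgotten once its key has been quiet for the window, and a key that has gone past the threshold has to stay quiet for the full ban period before it is forgiven. Time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an "aging" counter map keyed by IP, subject, etc. */
class AgingCounterMap {
    private static final class Entry {
        int count;
        long lastSeenMillis;
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final int threshold;     // hits allowed before a ban, e.g. 20
    private final long windowMillis; // quiet time after which a normal entry ages out
    private final long banMillis;    // quiet time a banned key needs to be forgiven

    AgingCounterMap(int threshold, long windowMillis, long banMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.banMillis = banMillis;
    }

    /** Records one hit for key at nowMillis; returns true if the key is banned. */
    synchronized boolean hit(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e != null && isExpired(e, nowMillis)) {
            entries.remove(key);
            e = null;
        }
        if (e == null) {
            e = new Entry();
            entries.put(key, e);
        }
        e.count++;
        e.lastSeenMillis = nowMillis; // hammering while banned keeps the ban alive
        return e.count > threshold;
    }

    /** Drops entries that have aged out; a background thread could call this now and then. */
    synchronized void purge(long nowMillis) {
        entries.values().removeIf(e -> isExpired(e, nowMillis));
    }

    synchronized int size() {
        return entries.size();
    }

    private boolean isExpired(Entry e, long nowMillis) {
        long quietNeeded = e.count > threshold ? banMillis : windowMillis;
        return nowMillis - e.lastSeenMillis >= quietNeeded;
    }
}
```

Usage would look something like `if (ipCounts.hit(senderIp, System.currentTimeMillis())) { /* refuse */ }`. The same structure works unchanged with any other string as the key.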

Here's a graph of an average 24 hours of banned IP address emails. Notice at 10am and 11am some joker (i.e., some single IP address) sent us over 19000 emails per hour.



I do have some code that has Java talk back to unix's iptables system to do very hard blocking of IP addresses but it's not on right now. Partially because there's no need (yet) and partially because I like to see the stats.

The funny part of this is the error Mailinator gives. Remember the "User Unknown"? Once an IP address is banned and then it tries to open a new connection it will send the SMTP greeting of "HELO". Mailinator will then reply "User Unknown" and close the connection. Of course, it didn't even get the username yet.

Zombies

The next problem came from zombie networks. Now we were getting spam from thousands of different IPs all sending the same message. We could no longer key in on IP address. As a layer of defense past IP we created an AgingHashmap based on subjects. That is, if we get (again, example numbers) something like 20 emails with the same subject within 2 minutes, all emails with that subject are then banned for 1 hour.

Here's a similar graph. Keep in mind these emails got past the IP filter - so basically they are "same subject" emails from many disparate sources.



You could argue we should ban them forever, but then we'd have to keep track of them and the Mailinator system is inherently transient. Forgetting is core to what it does. This blocking is more expensive than IPs as comparing subjects can be costly. And of course, we have to have enough of a conversation with the sending server to actually get the subject.

Pottymouth!

Finally, we ran into some issues on emails that just weren't cool. As I said, I'm far more interested in keeping Mailinator alive than blocking out your favorite porn newsletter. But, some unhappy people used Mailinator for some really not happy purposes. Simply put, as a last layer, subjects are searched for words that indicate hate or crimes or just downright nastiness.

Boing

Another major influx that happened early on was a plethora of bounce messages. Now that's sort of odd, isn't it? I mean Mailinator doesn't send email. In fact, it CAN'T send email so how could it get bounce messages? Well, some spammy type folks thought it'd be neat to send out spam from their servers using forged Mailinator addresses as a return address. Thus when those emails bounced, the bounce came here.

What's worse is I still get email from people who think Mailinator sent them spam. It's very frustrating to defend myself against people who are ignorant of how email works and ready to crucify me for sending them spam (especially ironic is that I run a free, anti-spam website). As I've said in my FAQ - please feel free to add mailinator.com to the tippy tippy tippy top of your spam blacklists. If you EVER get an email from mailinator.com, it's a forged spam.

The good news is that bounces are very easy to detect, and are really the first line of our defense. Bouncing SMTP servers aren't particularly evil, they're just doing their job so when I say "user unknown" they believe me and go away.

On an abstract level, here is what happens to an email as it enters the system.



(and to be fair, there might just be another layer or two that's not on that diagram!)
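In code form, the layers described so far could be strung together roughly like this. Treat it as a sketch under assumptions: the class names are invented, the ordering (bounces, then IP, then subject, then bad words) is my reading of the sections above, bounces are assumed to be recognizable by an empty envelope sender (the usual SMTP convention), and plain sets stand in for the aging counters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical view of the few fields the filters look at. */
class IncomingMail {
    final String senderIp;
    final String envelopeFrom; // assumed empty for bounce messages
    final String subject;

    IncomingMail(String senderIp, String envelopeFrom, String subject) {
        this.senderIp = senderIp;
        this.envelopeFrom = envelopeFrom;
        this.subject = subject;
    }
}

/** Hypothetical sketch: an ordered list of reasons to refuse an email. */
class FilterChain {
    private final List<Predicate<IncomingMail>> rejectRules = new ArrayList<>();

    FilterChain rejectIf(Predicate<IncomingMail> rule) {
        rejectRules.add(rule);
        return this;
    }

    /** Cheap rules go first; the first rule that matches refuses the email. */
    boolean accepts(IncomingMail mail) {
        for (Predicate<IncomingMail> rule : rejectRules) {
            if (rule.test(mail)) {
                return false;
            }
        }
        return true;
    }

    /** Example wiring; the sets stand in for the frequency-based bans. */
    static FilterChain example(Set<String> bannedIps,
                               Set<String> bannedSubjects,
                               Set<String> blockedWords) {
        return new FilterChain()
            .rejectIf(m -> m.envelopeFrom.isEmpty())           // bounces
            .rejectIf(m -> bannedIps.contains(m.senderIp))     // one machine blasting us
            .rejectIf(m -> bannedSubjects.contains(m.subject)) // zombie networks
            .rejectIf(m -> {                                   // pottymouth
                String subject = m.subject.toLowerCase(Locale.ROOT);
                for (String word : blockedWords) {
                    if (subject.contains(word)) {
                        return true;
                    }
                }
                return false;
            });
    }
}
```

Whichever rule fires, the sender sees the same "User Unknown" reply.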

Anti-Spam revolt

There are 2 more, somewhat conflicting features of the Mailinator server that should be noted. For one, it's a clear fact that when we're busy, we're busy. An easy DoS against us would be to open a socket to our server and leave it open. This is an inherent vulnerability in any server (maybe especially multithreaded servers). So, as a basic idea, Mailinator closes all connections if they are silent for more than a second or two. Actually, the amount of time is variable (read below). Clearly, we are DoS'able by sending us many many connections, but this blocks at least one trivial way of bringing us down.
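With blocking sockets, one simple way to get this behavior in Java is a read timeout on the socket. The sketch below assumes that approach (the post doesn't say how the real server does it), and ImpatientConnection is a made-up name; the timeout is a parameter since the real value varies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: drop any client that stays silent for too long. */
class ImpatientConnection {
    private final Socket socket;
    private final BufferedReader in;

    ImpatientConnection(Socket socket, int timeoutMillis) {
        this.socket = socket;
        try {
            // From now on a blocked read gives up after timeoutMillis of silence.
            socket.setSoTimeout(timeoutMillis);
            this.in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next line, or null if the client went quiet or disconnected. */
    String nextLine() {
        try {
            return in.readLine();
        } catch (SocketTimeoutException tooQuiet) {
            hangUp();
            return null;
        } catch (IOException e) {
            hangUp();
            throw new UncheckedIOException(e);
        }
    }

    void hangUp() {
        try {
            socket.close();
        } catch (IOException ignored) {
            // nothing useful to do if close itself fails
        }
    }
}
```

The worker thread handling the connection gets freed as soon as nextLine() returns null, so a silent client can only tie it up for one timeout.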

Secondly, although we demand that servers talking to us be very speedy, we reserve the right to be very NOT speedy. Here's the logic. When Mailinator is not terribly busy, we still demand responses quickly, but we give responses slowly. In fact, the less busy we are, the slower we give responses. It is possible that sending an email into the Mailinator SMTP server could take a very long time (like 10 or 20 or 30 seconds) even for a very small amount of data.

Why? Well.. think about it. Let's say you're spamming. You want to send out a zillion emails as fast as possible. You want every receiving SMTP server to get your email, deliver it to the poor sod who wants (or doesn't want) weener enlargement and then close the connection so you can go on to the next. If you encounter some darn SMTP server that takes 20 seconds to receive your email, the speed at which you can send out your emails diminishes. You might just even think about avoiding such SMTP servers.

It might be a pipe dream to think this is slowing down any spammers, but this does tend to keep my quieter times lasting longer. And it doesn't really hurt me - or my users. And if we eventually get terribly busy, those delays are scaled down to make sure we don't lose any emails.
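The scaling could be as simple as the sketch below. The linear formula, the class name and both parameters are assumptions for illustration; all the post says is that replies get slower as the server gets less busy and that the delay shrinks away under load.

```java
/** Hypothetical sketch: the idler the server, the slower it answers. */
class ReplyThrottle {
    private final long maxDelayMillis; // delay per session when completely idle
    private final int busyThreshold;   // active connections at which the delay hits zero

    ReplyThrottle(long maxDelayMillis, int busyThreshold) {
        this.maxDelayMillis = maxDelayMillis;
        this.busyThreshold = busyThreshold;
    }

    /** Delay shrinks linearly as load rises, and vanishes once we are busy. */
    long delayMillisFor(int activeConnections) {
        if (activeConnections >= busyThreshold) {
            return 0;
        }
        int idleCapacity = busyThreshold - Math.max(activeConnections, 0);
        return maxDelayMillis * idleCapacity / busyThreshold;
    }
}
```

A connection handler would sleep for the returned time before replying, so a quiet server stretches each session out while a loaded one answers immediately.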

Sites will ban it

Every time I read some comment about Mailinator, someone always points out something like "Yeah, well sites will start banning any email from Mailinator and then it will be worthless". Guys. It's been 3 years. A handful of sites have indeed blocked email from Mailinator, but my user base and the number of read emails has only gone up. Clearly, people are finding Mailinator more useful than ever.

I have at times added additional domains (like sogetthis.com and fakeinformation.com) that point to Mailinator. Often if a site bans mailinator.com proper, you can use one of those to the same effect.

Overall

Many copycat sites have appeared over the years, which is pretty reasonable. The idea itself is obvious. The only real hurdle was that it seemed impossible to do given the amount of useless email you'd get. But the copycats had the advantage of seeing that Mailinator actually does work, so they knew what to shoot for. Only a few post their daily email numbers but I've yet to see any that come close to Mailinator's incoming email (not that this is necessarily a good thing). I also see that many are using an architecture similar to Mailinator's original, which is just fine so long as they either don't get any massive increases in email or are happy to keep buying bigger hardware.

Overall, Mailinator has been a great experience. It was a terribly fun exercise in optimization, security, and generally making things work. Thousands of people use it every day and it's amazing how many people know about it when it comes up in conversation. I've thought many times about how to make a business around it, and there is always an angle, but I've just been too busy with other things.

My hope is that it's useful for you and that you tell your friends.


Note: Eternal thanks to Jack Lawrence of Syracuse, NY who, in a drunken stupor gave me the core idea (story here), Nicci Gabriel of www.sideofsauce.com for the seriously cool web design, and to Brian Pipa of www.candyaddict.com who, as a big fan of Mailinator, added the very cool "Spam Map" and the RSS feeds.