Props  |  Art of War  |  Leadership  |  Management  |  Software  |  Security  |  Project Management  |  Music  |  Movies  |  Stuff  |  Info
 
dragonCrew Topics
Truism:
Good, Fast, Cheap. Pick any two - you can't have all three.

Poor management can increase software costs more rapidly than any other factor.

Latest Developer Updates from MySQL AB

 
Prev  Previous |  Next  next  

MySQL


MySQL is finally growing up and becoming a serious "enterprise" player. MySQL is having a major impact on the business model and ultimately the pricing of the big boys or in business speak, the Tier 1 companies in the database space.

Who Uses MySQL?

The commercial database big boys have created the FUD factor when it comes to MySQL. For example, it doesn't scale. Well, here's the proof MySQL does scale and on the cheap. Hardware not included!!!

  • Google uses MySQL for AdSense and AdWords
  • Yahoo
  • Omniture runs over 250 billion transactions per quarter on a farm of MySQL servers
  • Flickr
  • YouTube - checkout these stats
  • Craigslist
  • Habbo Hotel
  • Technorati
  • Digg
  • Second Life
  • FeedBurner
  • Wikipedia
  • Travelocity
  • Weather.com
  • HotorNot
  • Nokia and Alcatel build mobile phone networks that run on MySQL
  • Mars rover - huh - go figure
"By 2010, Global 2000 IT organizations will use open-source products in 80 percent of their infrastructure-focused software investments".
Gartner, Gartner – 2005

Jay Pipes' MySQL Performance Tuning Best Practices presentation at Google.

MySQL 5

Key Features

Some features listed below were actually added in 4.1!

  • Stored Procedures and Functions
  • Triggers
  • Views
  • Subqueries
  • Cursors
  • Information Schema
  • Archive Storage Engine
  • Clustering
  • Replication
  • Instances

Creating a Cluster

This HOWTO was designed for a classic setup of two servers behind a loadbalancer. The aim is to have true redundancy - either server can be unplugged and yet the site will remain up.

Notes:

You MUST have a third server as a managment node but this can be shut down after the cluster starts. Also note that I do not recommend shutting down the managment server (see the extra notes at the bottom of this document for more information). You can not run a MySQL Cluster with just two servers And have true redundancy.

Although it is possible to set the cluster up on two physical servers you WILL NOT GET the ability to "kill" one server and for the cluster to continue as normal. For this you need a third server running the managment node.

I am going to talk about three servers:

mysql1.domain.com 		192.168.0.1
mysql2.domain.com 		192.168.0.2
mysql3.domain.com 		192.168.0.3

Servers 1 and 2 will be the two that end up "clustered". This would be perfect for two servers behind a loadbalancer or using round robin DNS and is a good replacement for replication. Server 3 needs to have only minor changes made to it and does NOT require a MySQL install. It can be a low-end machine and can be carrying out other tasks.

Alex Davies, HOWTO set up a MySQL Cluster for two servers – February 8, 2005

From SD Times in July 2004

Database Marketshare

MySQL 4.0

The Good

  • FREE!! Me Likes!!!
  • Tons of free tools
  • Pretty Fast
  • Supports multiple Operating Systems
  • Not bloated with stuff you're never going to use - like Oracle. Sorry Larry
  • Tons of documentation and forums
  • Tons of ISP support on the cheap

The Bad

  • No stored proc's - me-ouchhhh - what's up with that?
  • No UNION - oh well
  • No sub-queries. May be a good thing especially if rooksters start using them
  • No Table Partitioning
  • Too many data types. Well, let me clarify. Why have a tinyint where you can define the number of bytes to be the same as an integer. Example, tinyint(1) and integer(1). I don't get it.

Pet Peeves

  • ADOdb is so bloated. It's going to end up like Java. Objectified to death. But there is a minimalist install instructions.
  • Developers who design databases and tables like business users do with Microsoft Access. Here's a true story. A "programmer" created a text field and used it for storing dates and then used it for reporting. Nice. What's even funnier is he never even indexed the field. But in his defence, he was only being paid to program. He didn't last long though.
  • ProgressSQL is actually technically better as a Open Source database. Well, it gives MySQL something to shoot for.
  • Hard core normalization freaks. Have you ever heard about speed let alone denormalization?
  • C like naming conventions

Tips

Article Type Description
Configuring Apache for Maximum Performance Article Apache server performance can be improved by adding additional hardware resources such as RAM, faster CPU etc. But, most of the time, the same result can be achieved by custom configuration of the server. This article looks into getting maximum performance out of Apache with the existing hardware resources, specifically on the Linux systems.
MySQL Basics Article Has good stuff on Windows
Users are Evil (or, How to Protect Yourself From SQL Injection) Article Be Paranoid and take practical meaures
SQL Injection Attacks by Example Article "SQL Injection" is subset of the an unverified/unsanitized user input vulnerability ("buffer overflows" are a different subset), and the idea is to convince the application to run SQL code that was not intended. If the application is creating SQL strings naively on the fly and then running them, it's straightforward to create some real surprises.
Pros and Cons of MySQL Table Types Article Of all the positive things that MySQL brings to the table, probably the most overlooked is multiple table types. This facet of the application is overlooked as a feature and more importantly is overlooked at design time.
Comparison of different SQL implementations Article information relevant for people who are porting SQL from one product to another and/or are interested in possibilities and limits of 'cross-product' SQL.
How-To setup a MySQL replicating cluster Article This guide is designed to help do the initial setup on a MySQL cluster in which multiple MySQL servers all serve the same content through the use of the replication function. Make sure server versions are the same, see Replication Compatibility Between MySQL Versions
Securing a MySQL Server on Windows Article There is no patch required to prevent future exploits. Forbot acted by exploiting poorly configured MySQL installations that had been installed with no root password or with a weak root password. Some examples of the passwords Forbot tried to use to access the MySQL root account include abcd1234 and 654321
Securing MySQL (Linux) Article The measures we'll talk about below will enable you to better secure your database, but be sure to secure the underlying operating system too.
How To Set Up Database Replication In MySQL Article This tutorial describes how to set up database replication in MySQL. MySQL replication allows you to have an exact copy of a database from a master server on another server (slave), and all updates to the database on the master server are immediately replicated to the database on the slave server so that both databases are in sync. This is not a backup policy because an accidentally issued DELETE command will also be carried out on the slave; but replication can help protect against hardware failures though.
Building a High-Availability MySQL Cluster Article Hold on before you decide to implement MySQL cluster, as it certainly doesn't suit all jobs yet. The MySQL NDB engine currently runs its database completely in memory. This means that you have to be able to fit your database in memory. If you have a 1GB dataset and you want to have your data spread over two nodes, you need 1GB of memory per node. If you have four nodes, you can deal with 512MB per node. The first thing that becomes clear is that you have to look at your dataset. Giant data-warehousing databases won't fit into the MySQL NDB engine yet.
Tags: Database schemas Blog Discussion of options and solutions for creating a better del.icio.us schema
Comparing MySQL performance Article Detailed comparison of MySQL performance under a number of different operating systems, including Linux, FreeBSD, NetBSD, OpenBSD and Solaris using the Super Smack and SysBench benchmarking utilities.
Storing images in MySQL Article Benchmark of using the DB and not using the DB.

Here is an apposing view from Kristian Köhntopp, Serving Images From A Database.

Naming Convention Article Wow, someone who hates camelCase!! Sweet
Flickr.com - PHP/mysql case study Article Must Read and don't forget to download the PDF! Tips: don't use HtmlEntities(). Also, Javascript has patchy Unicode Support
MySQL Gotchas Article This is not an "anti-MySQL" list, although it does contain critical comments
MySQL Optimization, part 1 Article While optimization is possible with limited knowledge of your system or application, the more you know about your system, the better your optimization will be. This article, the first of two parts, covers some of the different points you will need to know for optimizing MySQL. It is excerpted from chapter six of the book MySQL Administrator's Guide, by MySQL AB (Sams, 2004; ISBN: 0672326345)
MySQL Optimization, part 2 Article Covers Optimizing Database Structure, How MySQL Uses Indexes, Shared Key Cache Access, Restructuring a Key Cache, Tuning Server Parameters, How MySQL Uses Memory and Using Symbolic Links for Databases on Unix
Best MySQL admin tool Blog Navicat, SQLyog, DBManager, MySQL Administrator, MySQL Front and at least 10 others

SQL 2003 standard

A review of how PostgreSQL, DB2, MS SQL, MySQL and Oracle implement the SQL 2003 standard. Comparison of different SQL implementations.

2005 MySQL Application of the Year

The MySQL Application of the Year is awarded to the developers of some of the best production applications that showcase MySQL's innovative features and provide tangible benefits to their organization and its users.

  • CNET Networks uses MySQL to store FULLTEXT searchable data for many of their Websites' search applications.
  • Friendster uses MySQL to serve more than 85 million dynamic page views per day and is capable of supporting over 1.5 billion MySQL queries per day.
  • Wikipedia is the most popular wiki in the world, operating in more than 50 languages. Wikipedia uses MySQL to process more than 200 million queries and 1.2 million updates per day.

Just too funny

In the real world, non-technical people are making technical decisions. Here is the classic result for AdWords. It's nice to know that no one is immune from such follies, not even the mighty Google!!! Thanks to Xooglers, ex-Googlers, for exposing the truth.

AdWords was built using the MySQL database, which is open-source and therefore available for free. It is by now also nearly as full-featured as the best commercial databases, but back in 2000 this was not the case. MySQL was quite a capable system, but missing a few (what some would consider basic) features. These missing features were obviously not a show-stopper, as we managed to get AdWords to work without them, but in a few cases it did take some extra programming to work around one of these missing features. On the plus side, MySQL was fast and reliable and, as I have already noted, free.

We finally decided to go with a commercial database (I won't say which one) over the objections of a number of engineers, including myself. To ease the transition it was decided to convert AdWords over to the new system first, and to do the main ads system later. It was a project on a par with the internationalzation effort in terms of the tedious work required to comb over nearly all of the AdWords code and change all of the database queries. (Databases are supposed to all be compatible with one another, but in reality they pretty much aren't.)

To make a long story short, it was an unmitigated disaster. The new system was slower than molasses in February. Some heroic optimization efforts eventually produced acceptable performance, but it was never as good as the old MySQL-based system had been. For a long time we were stuck with the worst of all possible worlds, with the two ads systems running on two different databases. It was still that way when I left Google in October of 2001, but I have heard through the grapevine that they eventually went back to MySQL. (Since then, MySQL has added many of the features that had been missing at the time.)

The moral of the story is that sometimes, and in particular with free software, you get more than what you pay for. There are a lot of companies out there paying dearly for commercial databases (and operating systems for that matter). As far as I'm concerned they might as well be flushing that money down the toilet. Actually, they might be better off. We certainly would have been.

Xooglers, Let's get a real database – December 9, 2005

War Stories

O'Reilly has started a great series on Database War Stories. O'Reilly basically asks the new kings of the hill about their existing database infrastructure and challenges.

Flickr

Cal Henderson of Flickr. "Flickr has total stored unique data : 935 GB and total stored duplicated data : ~3TB." Keep in mind these numbers are only of the database and do not include the photos and thumbs. Me-ouchhh!

tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundereds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user's photo). if we want to fetch it in the context of both user's, then we need to store it in both shards, or scan every shard for one of the views (which doesn't scale). we store alot of data twice, but then theres the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands, open transaction 2, write commands, commit 1st transaction if all is well, commit 2nd transaction if 1st commited) but there still a chance for failure when a box goes down during the 1st commit.

Tim O'Reilly, Database War Stories #3: Flickr – April 27, 2006

craigslist

Eric Scheide of craigslist. After reading this, CTO's and CIO's from Fortune 1000 companies will start scratching their heads wondering how craigslist can manage all this with only 19 people!!

all database machines are on 64 bit linux boxen/ 14 local drives with 16gig of ram.

craigslist runs clusters of dbs for each of our various services: forums: 1 master and 1 slave (mostly for backup) myIsam tables everywhere. DataDir size including indexes 17G. Largest table is approaching 42 million rows.
classifeds: 1 master and 12 slaves. We have various flavor of slave databases. we have teamreader, longreader, thrashbox, for backups and very long adhoc queries and a few extra boxen. At times we have an offsite slave incase the colo goes dark. Currently this is on hold until we get a bigger pipe to our office location. Current footprint including indexes 114G, 56 million rows in the largest table (it's time to archive some of those oh yes it is) yesterday we wrote 330000 new rows to this table; Myisam everywhere, mostly because it works.
ArchiveDB: 1 master 1 slave. holds all craiglsist postings older than about 3 months. Looks very similar to classifieds except bigger. 238Gigs, 96 million rows. Oh yea we use merge tables all over the archive spliting data into more managable chunks. We may do this in production soon.
searchdbs: 16 of these in 4 clusters. We take live postings and split them by area/category type (sfbay/housing) and then use myisam full text indexing. each cluster only contains a subset of all positngs. We find the right host/table in software. This runs good, but do not think this solution will scale for much more than year. Indexing is expensive and we have a lot of churn.
Authdb: 1 master and 1 slave. smallish.
a few smaller "junk" db's that have transient data.

Tim O'Reilly, Database War Stories #5: craigslist – April 28, 2006

Checkout the slides about Google's BigTables and it's home grown Google File System.

 

MySQL B-Roll Collapse

Tagging Collapse

Tools Collapse

Classes Collapse

Optimization Collapse

Resources Collapse

Data Modelling Collapse

Rollyo

Make Poverty History
 

 
Top of Page