
Importing MediaWiki Meta Help: Namespace without overwriting content

These are the steps needed to import the MediaWiki help pages from MediaWiki.org without overwriting your existing content.

Who this is for

You’ll need to know a bit of MySQL, have a working Java installation, and have a working MediaWiki installation. These instructions are for those who already have content in their MediaWiki, and wish to preserve it. The instructions on MediaWiki (currently) zap your content.

Prerequisites

Java JRE or JDK – currently requires the Sun JDK (see Errors below)
Populated MySQL MediaWiki DB
MediaWiki 1.4.X (this will be updated when I move to 1.5.X)
MySQL 4.1.X
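
A quick sanity check before starting — this little loop just confirms the command-line tools used in the steps below are on your PATH (the binary names are the standard ones; nothing here is specific to my setup):

```shell
# Report which of the required tools are installed.
check=$(for tool in java mysql mysqldump mysqladmin; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf 'found: %s\n' "$tool"
  else
    printf 'MISSING: %s\n' "$tool"
  fi
done)
printf '%s\n' "$check"
```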

Tools needed

mwdumper.jar – converts XML to SQL to import into a MySQL database
importDump.php (included with MediaWiki 1.5+) – imports database dumps directly

Steps

  1. Download the latest pages dump (pages_current.xml.bz2) from MetaWiki. Download the latest image dump from MetaWiki.
  2. Back up your current MediaWiki DB. REALLY! DO NOT SKIP THIS STEP. At shell:
    mysqldump --all-databases > <date>-full-backup.sql
  3. Create a new empty database in MySQL. At shell:
    mysqladmin -u root -p create <tmp_databasename>
  4. At shell:
    java -server -jar mwdumper.jar --output=stdout --format=sql:1.4
     --filter=namespace:NS_HELP,NS_TEMPLATE
     pages_current.xml.bz2 | mysql -u <username> -p <tmp_databasename>
  5. Delete cur_id from temp database. At shell:
    mysql -u <username> -p <tmp_databasename>
     -e "ALTER TABLE cur DROP COLUMN cur_id;"
  6. At shell:
    mysqldump -u <username> -p -c -n -t
     --skip-add-drop-table <tmp_databasename> > dropped-wikidb.sql
  7. (optional*) Open dropped-wikidb.sql in vim and run the following substitutions (type “:” first to enter command-line mode):
    %s/TABLE `/TABLE `<specdb>/g
    %s/TABLES `/TABLES `<specdb>/g
    %s/INTO `/INTO `<specdb>/g

    where <specdb> is any MediaWiki instance name you’ve created. Note the backticks – they are NOT single quotes.

  8. At shell:
    mysql -p -u <username> < dropped-wikidb.sql
  9. Check your wiki! Oh, it is b0rked? You DID back up your database, right?

* Only needed if you’ve installed MediaWiki with localized table names
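
For review, the steps above can be sketched as a single dry-run script. It prints each command rather than executing it, so you can inspect the sequence before running the lines by hand (or piping the output to sh). The database and account names (wikidb, metahelp_tmp, wikiuser) are placeholders of mine, not anything MediaWiki requires:

```shell
#!/bin/sh
# Dry-run sketch of steps 2-8: print the commands for review.
WIKIDB=wikidb          # your live MediaWiki database (assumed name)
TMPDB=metahelp_tmp     # scratch database for the import (assumed name)
DBUSER=wikiuser        # MySQL account with rights on both (assumed name)

CMDS=$(cat <<EOF
mysqldump -u $DBUSER -p --all-databases > \$(date +%F)-full-backup.sql
mysqladmin -u $DBUSER -p create $TMPDB
java -server -jar mwdumper.jar --output=stdout --format=sql:1.4 --filter=namespace:NS_HELP,NS_TEMPLATE pages_current.xml.bz2 | mysql -u $DBUSER -p $TMPDB
mysql -u $DBUSER -p $TMPDB -e "ALTER TABLE cur DROP COLUMN cur_id;"
mysqldump -u $DBUSER -p -c -n -t --skip-add-drop-table $TMPDB > dropped-wikidb.sql
mysql -u $DBUSER -p $WIKIDB < dropped-wikidb.sql
EOF
)
printf '%s\n' "$CMDS"
```

Step 7 is omitted here since it only applies with localized table names.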

Notes/Shortcomings

What is missing:

  • Images
  • Links to help text outside the Help: namespace

Some weirdness in imported info: I had some odd effects where a Help: namespace link appears orphaned, but, upon clicking it, I was presented with the edit box — with the content box already filled with the appropriate text.

Limitations:

  • Cannot “upgrade” Help pages/Templates. Since we remove the cur_id created at the source site (“Meta”), I’m not certain that upgrades are possible. I don’t know whether importing a new object with a newer cur_id but the same name/meta-information as an older object would upset the database. But this could just reflect my ignorance of MediaWiki’s internal structures…

Solutions to try:

  • Find Images! Figure out where MW expects them to be.
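
One possible lead: newer MediaWiki releases ship a maintenance/importImages.php script that bulk-uploads every file in a directory, so once the image dump is unpacked somewhere, something like the following pre-flight check could work. The image-dump directory name and the placeholder files below are my own assumptions, not part of the real dump:

```shell
# Pre-flight check: list what would be imported before running the script.
mkdir -p image-dump
touch image-dump/Example.png image-dump/Diagram.svg   # stand-ins for unpacked dump files
files=$(find image-dump -type f | sort)
printf '%s\n' "$files"    # review this list first
# Then, from the MediaWiki root, something like:
#   php maintenance/importImages.php image-dump/
```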

References

Conversation on #mediawiki on Freenode, November 4, 2005 11:30 PM to 12:14 AM

<NightMonkey> Howdy. Is there an easy way to populate a fresh MediaWiki install's Help: with the MediaWiki Handbook?
<brion> at this time we don't have an installable, redistributable set of help pages
<TimStarling> no reason we couldn't have one though
<TimStarling> is there?
<brion> time and labor
<brion> there've been several attempts to reorg the various doc pages
<brion> afaik none has produced an actual downloadable package to this date
<TimStarling> we could just do an XML dump of meta's Help namespace
<TimStarling> maybe import it to mediawiki.org first and delete any unnecessary pages
<NightMonkey> I'd love it. I'm a SysAdmin, so I could deal with a database dump of some sort, if that is what is necessary. I'm not worried about any non-English pages, or even Wikipedia-specific pages - I can edit any I find.
<brion> NightMonkey: well if you're brave...
<brion> fetch the special/meta page dump from download.wikimedia.org
<brion> use the mwdumper too to extract pages with --filter=namespaces:NS_HELP
<brion> and then import that into your wiki with importDump.php
<brion> it *might* work :D
<NightMonkey> brion: Cool! I'll give it a try. Is that the whole Wikimedia namespace? I'll edit it to just include the Help: namespace, if that's the case.
<brion> the dump is the entire meta.wikimedia.org site
<brion> but mwdumper can extract subsets based on namespace or a list of page titles
<NightMonkey> brion: Thank you. I don't need to use importDump.php if I use mvdumper, correct? (I have a 1.4.11 MediaWiki)
<brion> mwdumper's database import only works on a clean (empty) database, as it includes the page id numbers
<brion> however you could dump into an empty cur table, then copy those entries to your own skippiing the cur_id
<NightMonkey> brion: Ah, I see.

http://download.wikimedia.org/

Errors

mwdumper.jar Java exception:

Exception in thread “main” java.io.IOException: Parser has reached the entity expansion limit “64,000” set by the Application.
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
at org.mediawiki.dumper.Dumper.main(Unknown Source)

Adding -DentityExpansionLimit=10000000 or a similarly high number to the java command only allows it to get further through the XML, but it still dies with a different exception.

This error occurs while using “Blackdown JDK 1.4.2.02”. Using Sun JDK 1.5.0.05 fixes this problem. I haven’t tested other JDKs/JREs, sorry.
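
For reference, placement of the property matters: JVM options like -DentityExpansionLimit must come before -jar, or java passes them to mwdumper as program arguments. As noted, raising the limit only delays the failure on Blackdown — switching JDKs is the actual fix. The command, echoed for review:

```shell
# JVM flags (-D...) go before -jar; everything after -jar belongs to mwdumper.
cmd="java -server -DentityExpansionLimit=10000000 -jar mwdumper.jar --output=stdout --format=sql:1.4 pages_current.xml.bz2"
printf '%s\n' "$cmd"
```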
