Importing content into Drupal using Node Import

by Sean Corrales

I just wrapped up some data imports for a few projects we’ve been working on and wanted to share some tips I gleaned from the process.

  • Use find and replace
    When working with large sets of data, an editor with find and replace is invaluable. I prefer to use the same text editor I use for coding since I’m most familiar with it. Depending on your data, there’s a lot you might want to do with find and replace, including removing extraneous characters or replacing UTF-8 characters with their HTML entity codes, which brings me to the next item…
  • UTF-8 characters will mess up your day
    …unless you respect them. While reviewing samples of the data I was importing (step 7), I was using find and replace to swap UTF-8 characters for their respective HTML entities. As I found out after I finished my node imports, you can instead save the CSV file with UTF-8 encoding, which is a lot less painful than find and replace. If you attempt to import a file that isn’t UTF-8 encoded but contains such characters, Node Import will still create the nodes, but it truncates the text of any affected field at the point where it encounters one of those characters.
  • Use Taxonomy to help keep your data straight
    If you’re doing large data imports, taxonomy can be a great help. I like to create a taxonomy vocabulary specifically for data imports, with terms that describe the type of import and even identify multiple attempts. For example, I might have terms like “Blog import – 1st pass”, “Blog import – 2nd pass”, etc. By attaching the matching term to every node created by a given import, I can easily search for and find that data later. Did something go wrong on my third pass of blog imports? No problem; I just filter my content by the respective taxonomy term and I have all the nodes from that import ready for editing or deleting.
  • Node references have to be exact
    If you’re importing data with node references, be sure the title you’re referencing is exactly the same as the title of the existing node. For example, if you’re trying to import blog entries and associate the posts back to a user’s content profile, the name of the profile needs to match exactly. I had a case similar to this, and it turned out that an extra space that wasn’t even visible to the end user was enough to break the import. I had to delete the single extra space after every author’s middle initial. Even an extra space at the end of the node title is enough to make the node reference fail.
  • Download and use the error file
    After you import data, Node Import reports how many rows were successful and how many had errors. You can also download a CSV of all the rows that had errors. Do this! Once you’ve resolved the issues that prevented those nodes from being imported, you can feed the error file back into Node Import. On some imports, I have to import 3 or 4 times to get all the errors resolved.
  • Dealing with missing data that’s required
    Sometimes you want to create nodes but don’t have all the required data for the content type. In these cases, I’ll usually mark the field I’m missing data for as not required, attach some sort of taxonomy term to indicate these nodes need to be reviewed, and then do the import. Then you can resolve the issue at another time. Another option is to replace the NULL values with something else. When I imported stories that didn’t have authors, I replaced the NULL author value with “John Doe” so all of those stories could easily be found after the import.
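
For the UTF-8 tip, re-encoding the CSV can be scripted instead of done by hand in an editor. Here is a minimal sketch in Python. It assumes the export was saved as Windows-1252 (`cp1252`), which is only a guess about a typical source file; the function names are hypothetical, and you would swap in whatever encoding your data actually uses.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from source_encoding and return the same text as UTF-8."""
    # Assumption: the file really is in source_encoding. Bytes that are
    # invalid in that encoding raise UnicodeDecodeError instead of being
    # silently mangled.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_encoding: str = "cp1252") -> None:
    """Read a CSV in its original encoding and write a UTF-8 copy."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(reencode_to_utf8(raw, source_encoding))
```

Calling `convert_file("export.csv", "export-utf8.csv")` (both file names are placeholders) leaves the original untouched and gives you a UTF-8 copy to upload.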
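
For the node reference tip, the stray spaces can be cleaned out of the CSV before the import. The sketch below trims leading and trailing whitespace and collapses repeated spaces in one column. The column name is whatever your file uses (`author` in the usage note is just an example), and it assumes the titles of the existing nodes are themselves single-spaced; if they aren't, collapsing spaces would make things worse rather than better.

```python
import csv
import io


def normalize_title(title: str) -> str:
    """Strip surrounding whitespace and collapse runs of whitespace to one space."""
    return " ".join(title.split())


def clean_column(csv_text: str, column: str) -> str:
    """Return csv_text with the given column normalized in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # A short row leaves the column as None, so fall back to "".
        row[column] = normalize_title(row[column] or "")
        writer.writerow(row)
    return out.getvalue()
```

For example, `clean_column(text, "author")` turns a value like `Jane Q.  Public ` (two spaces after the initial, one trailing) into `Jane Q. Public`.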
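
The placeholder and tagging tips can also be applied to the CSV ahead of time. This is a rough sketch, not a description of how Node Import itself works: the column names `author` and `import_tag` are hypothetical, and adding the term as an extra column is only one way to do it. How that column gets mapped to your import vocabulary depends on your own Node Import settings and isn't shown here.

```python
import csv
import io


def prepare_rows(csv_text, import_tag, author_column="author",
                 tag_column="import_tag", placeholder="John Doe"):
    """Fill missing authors with a placeholder and tag every row with the import pass."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if tag_column not in fieldnames:
        fieldnames.append(tag_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = (row.get(author_column) or "").strip()
        # Treat both an empty cell and a literal "NULL" as missing data.
        if value == "" or value.upper() == "NULL":
            row[author_column] = placeholder
        # Same term on every row, e.g. "Story import - 1st pass".
        row[tag_column] = import_tag
        writer.writerow(row)
    return out.getvalue()
```

After the import, filtering content by the term for that pass, or searching for the placeholder author, gives you the list of nodes that still need attention.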

