Linux Format forums
Help, discussion, magazine feedback and more
 

Find and Replace

 
LeeNukes
LXF regular


Joined: Sun Jun 21, 2009 9:11 pm
Posts: 954
Location: At the bar

Posted: Wed Apr 21, 2010 3:49 pm    Post subject: Find and Replace

Hello,

I need to go through two large files: one has 3,307,339 words and the other has 3,998,911 words.

In these two files I need to find and replace occurrences of certain terms. At the moment the terms and their replacements are in two columns of a spreadsheet.

So wherever the text in cell A1 occurs, for example, it should be replaced with the text in B1, and so on.

I understand that I will probably need to export these, maybe to a CSV or a tab-separated file, and then use regular expressions to do the find and replace, like this:

http://www.regular-expressions.info/perl.html

What I'm wondering is: how do I read each entry into a variable?

The two files I'm checking against are parallel texts: one is in English and one is in Spanish, but they are sentence-aligned (this shouldn't matter; it's just background).

The content I need to replace looks like this:

Code:
intermediate   intermedio
Linux   Linux
mount   montaje


So every occurrence of Linux should be replaced with Linux (probably a bad example), intermediate should be replaced with intermedio, and so on.
_________________
Join GiffGaff and get 5 free credit
nelz
Site admin


Joined: Mon Apr 04, 2005 12:52 pm
Posts: 8439
Location: Warrington, UK

Posted: Wed Apr 21, 2010 4:13 pm

Create a file containing all the replacement rules and then use sed to do the replacements.
Code:
sed --file=rules english.csv >spanish.csv

where rules contains
Code:
s/intermediate/intermedio/g
s/friend/amigo/g
s/1sf (troll)/gringo/g
...
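One thing to watch with rules like these: a plain s/mount/montaje/g also rewrites the inside of longer words, so "mountain" would become "montajeain". A minimal sketch, assuming GNU sed (its \b word-boundary escape is a GNU extension) and hypothetical file names english.txt and spanish.txt, anchors each rule to whole words:

```shell
#!/bin/sh
# Sketch only: assumes GNU sed, where \b matches a word boundary.
# File names are hypothetical stand-ins for the real corpus files.

# Each rule only matches the term as a whole word.
cat > rules <<'EOF'
s/\bintermediate\b/intermedio/g
s/\bmount\b/montaje/g
EOF

# Tiny sample input: "mountain" must survive untouched.
printf 'mount the intermediate disk on the mountain\n' > english.txt

sed --file=rules english.txt > spanish.txt
cat spanish.txt
```

sed runs every rule in order against each line, so a later rule can rewrite text produced by an earlier one. If a replacement could itself match another source term, order the rules with that in mind.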

_________________
"Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein)
Bazza
LXF regular


Joined: Sat Mar 21, 2009 11:16 am
Posts: 1468
Location: Loughborough

Posted: Wed Apr 21, 2010 4:21 pm

Hi nelz...

> "s/1sf (troll)/gringo/g"

ROTFL...

You're on form boyo... ;oD

Apologies for butting in LeeNukes but that caught my humour...
_________________
73...

Bazza, G0LCU...

Team AMIGA...
LeeNukes
LXF regular


Joined: Sun Jun 21, 2009 9:11 pm
Posts: 954
Location: At the bar

Posted: Wed Apr 21, 2010 4:30 pm

There are hundreds of replacements that need doing. Are you suggesting I write s/<original word>/<replacement word>/g for every single word?
nelz
Site admin


Joined: Mon Apr 04, 2005 12:52 pm
Posts: 8439
Location: Warrington, UK

Posted: Wed Apr 21, 2010 5:32 pm

How else would it know which words to replace or what to replace them with?

Unless you want to send it through Babelfish :)
LeeNukes
LXF regular


Joined: Sun Jun 21, 2009 9:11 pm
Posts: 954
Location: At the bar

Posted: Wed Apr 21, 2010 7:05 pm

It's for a translation system; the point is to guarantee certain translations.
nelz
Site admin


Joined: Mon Apr 04, 2005 12:52 pm
Posts: 8439
Location: Warrington, UK

Posted: Wed Apr 21, 2010 7:11 pm

So however you did it, you'd need a translation table of some sort?
LeeNukes
LXF regular


Joined: Sun Jun 21, 2009 9:11 pm
Posts: 954
Location: At the bar

Posted: Wed Apr 21, 2010 7:14 pm

I would have thought it would be possible to read in the content between the commas of a CSV and use that for the find and replace.

So, assuming I'd set the first entry as variable1 and the second as variable2, then do:

s/$variable1/$variable2/g

I just don't know how it's done.
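Reading each pair into two variables is doable in the shell. A rough sketch, assuming the spreadsheet has been exported as a tab-separated file called pairs.tsv (a hypothetical name; tabs avoid CSV quoting trouble if a term contains a comma), GNU sed, and terms containing no sed metacharacters such as / . * [ \ or &. Rather than running sed once per pair, which would mean hundreds of passes over several million words, the loop writes one rule per pair and sed then makes a single pass:

```shell
#!/bin/sh
# Sketch: read each "english<TAB>spanish" pair into $from and $to,
# write one whole-word sed rule per pair, then run sed once.
# Assumes GNU sed (\b) and no sed metacharacters in the terms.

tab=$(printf '\t')

# Hypothetical sample data standing in for the spreadsheet export.
printf 'intermediate\tintermedio\nLinux\tLinux\nmount\tmontaje\n' > pairs.tsv
printf 'Linux users mount an intermediate share\n' > english.txt

: > rules                                   # start with an empty rules file
while IFS="$tab" read -r from to; do
    [ -n "$from" ] || continue              # skip blank lines
    # \\b in the printf format prints a literal \b for sed
    printf 's/\\b%s\\b/%s/g\n' "$from" "$to" >> rules
done < pairs.tsv

sed --file=rules english.txt > spanish.txt
cat spanish.txt
```

If the export was made on Windows, each line may end in a carriage return that would end up inside $to; running the file through tr -d '\r' first avoids that.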
nelz
Site admin


Joined: Mon Apr 04, 2005 12:52 pm
Posts: 8439
Location: Warrington, UK

Posted: Wed Apr 21, 2010 10:48 pm

You could, but you'd still need to define the variable pairs somewhere. If you have them in a file, you can use sed itself to turn that file into a suitable sed script.
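A sketch of letting sed write the sed script, assuming GNU sed (\t, \r and \b are GNU extensions) and a hypothetical pairs.tsv with exactly one english<TAB>spanish pair per line and no sed metacharacters in the terms. The expressions strip any Windows line ending, drop blank lines, then wrap each pair as s/\bword\b/replacement/g; using | as the delimiter keeps the slashes readable:

```shell
#!/bin/sh
# Sketch: generate a sed rules file from a two-column, tab-separated list.
# Assumes GNU sed and that every non-blank line contains a tab.

printf 'intermediate\tintermedio\nmount\tmontaje\n' > pairs.tsv   # hypothetical sample
printf 'mount an intermediate share\n' > english.txt

sed -e 's|\r$||'      \
    -e '/^$/d'        \
    -e 's|\t|\\b/|'   \
    -e 's|^|s/\\b|'   \
    -e 's|$|/g|' pairs.tsv > rules
# The first tab becomes \b/, then s/\b is prefixed and /g appended,
# e.g. "mount<TAB>montaje" becomes s/\bmount\b/montaje/g

cat rules
sed --file=rules english.txt > spanish.txt
cat spanish.txt
```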
All times are GMT
Copyright 2011 Future Publishing, all rights reserved.

