| Author |
Message |
LeeNukes LXF regular

Joined: Sun Jun 21, 2009 9:11 pm Posts: 954 Location: At the bar
|
Posted: Wed Apr 21, 2010 3:49 pm Post subject: Find and Replace |
|
|
Hello,
I need to go through two large files: one has 3307339 words and the other has 3998911 words.
In these two files I need to find and replace occurrences of words; at the moment the words are in two columns in a spreadsheet.
So where there is an occurrence of A1, for example, replace it with B1, and so on.
I understand that I will probably need to export these, maybe to a CSV or tab-separated file, and then use regular expressions to do the find and replace, like this:
http://www.regular-expressions.info/perl.html
What I'm wondering is how to take each entry as a variable.
The two files I'm checking against are parallel texts, so one is in English and one is in Spanish, but they are sentence aligned (this shouldn't matter, it's just background).
The content I need to replace looks like this:
| Code: | intermediate intermedio
Linux Linux
mount montaje
|
So for every occurrence of Linux I want to make sure it is replaced with Linux (probably a bad example); intermediate should be replaced with intermedio, and so on. _________________ Join GiffGaff and get £5 free credit |
 |
nelz Moderator

Joined: Mon Apr 04, 2005 12:52 pm Posts: 8036 Location: Warrington, UK
|
Posted: Wed Apr 21, 2010 4:13 pm Post subject: |
|
|
Create a file containing all the replacement rules and then use sed to do the replacements.
| Code: | sed --file=rules english.csv >spanish.csv |
where rules contains
| Code: | s/intermediate/intermedio/g
s/friend/amigo/g
s/1sf (troll)/gringo/g
... |
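One thing to watch with plain rules like these is that they also match inside longer words, so s/mount/montaje/g would change "mountain" as well. A minimal sketch of one way round that, assuming GNU sed (the \b word-boundary escape and the --file long option are GNU extensions); the file name rules.sed and the sample sentence are only for illustration:

```shell
# Hypothetical rules file; \b is a GNU sed word-boundary extension
printf '%s\n' 's/\bmount\b/montaje/g' > rules.sed

# "mount" on its own is replaced, but "mountain" is left alone
result=$(printf '%s\n' 'mount the mountain' | sed --file=rules.sed)
echo "$result"
```

Whether whole-word matching is wanted depends on the texts, so treat this as an option rather than a requirement.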
_________________ "Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein) |
 |
Bazza LXF regular

Joined: Sat Mar 21, 2009 11:16 am Posts: 1393 Location: Loughborough
|
Posted: Wed Apr 21, 2010 4:21 pm Post subject: |
|
|
Hi nelz...
> "s/1sf (troll)/gringo/g"
ROTFL...
You're on form boyo... ;oD
Apologies for butting in LeeNukes but that caught my humour... _________________ 73...
Bazza, G0LCU...
Team AMIGA... |
 |
LeeNukes LXF regular

Joined: Sun Jun 21, 2009 9:11 pm Posts: 954 Location: At the bar
|
Posted: Wed Apr 21, 2010 4:30 pm Post subject: |
|
|
There are hundreds of replacements that need doing. Are you suggesting I write s/<original word>/<replacement word> for each occurrence of a word?
 |
nelz Moderator

Joined: Mon Apr 04, 2005 12:52 pm Posts: 8036 Location: Warrington, UK
|
Posted: Wed Apr 21, 2010 5:32 pm Post subject: |
|
|
How else would it know which words to replace or what to replace them with?
Unless you want to send it through Babelfish
 |
LeeNukes LXF regular

Joined: Sun Jun 21, 2009 9:11 pm Posts: 954 Location: At the bar
|
|
 |
nelz Moderator

Joined: Mon Apr 04, 2005 12:52 pm Posts: 8036 Location: Warrington, UK
|
Posted: Wed Apr 21, 2010 7:11 pm Post subject: |
|
|
So however you did it, you'd need a translation table of some sort?
 |
LeeNukes LXF regular

Joined: Sun Jun 21, 2009 9:11 pm Posts: 954 Location: At the bar
|
Posted: Wed Apr 21, 2010 7:14 pm Post subject: |
|
|
I would have thought it would be possible to read in the content between the commas of a CSV and use that for the find and replace.
So, assume I'd set the first entry as variable1 and the last part as variable2, then do:
s/$variable1/$variable2/g
I just don't know how it's done.
 |
nelz Moderator

Joined: Mon Apr 04, 2005 12:52 pm Posts: 8036 Location: Warrington, UK
|
Posted: Wed Apr 21, 2010 10:48 pm Post subject: |
|
|
You could, but you'd need to define the variable pairs somewhere. If you already have them in a file, sed itself can turn that file into a suitable rules script for the replacement run.
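As a rough sketch of that idea, assuming the spreadsheet has been exported as a tab-separated file with one "original<TAB>replacement" pair per line, and that no term contains a slash or a regex special character (those would need escaping first). The file names terms.tsv, rules, sample.txt and sample.out are made up for the example:

```shell
# terms.tsv is a hypothetical tab-separated export of the spreadsheet
printf 'intermediate\tintermedio\nLinux\tLinux\nmount\tmontaje\n' > terms.tsv

# A literal tab character, so the sed script does not rely on \t support
tab=$(printf '\t')

# Build one rule per line: prefix "s/", turn the tab into "/", append "/g"
sed "s/^/s\//; s/$tab/\//; s/\$/\/g/" terms.tsv > rules

# Apply the generated rules to a small sample text
printf 'an intermediate mount\n' > sample.txt
sed --file=rules sample.txt > sample.out
cat sample.out
```

The generated rules file has the same s/original/replacement/g shape as the hand-written one earlier in the thread, so the same sed --file=rules command applies it to the real texts.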