Find Possible Word Rectangles

Question

Johnny is trying to create crossword puzzles, but he is having difficulty making words match each other.

He has come up with several simple word rectangles: that is, groups of words that form a rectangle where all horizontal and vertical paths form a word.

//2x2 
PA
AM

//2x3
GOB
ORE

//3x3
BAG
AGO
RED

//3x4
MACE
AGES
WEES

However, to make a good puzzle, he needs some word rectangles that are somewhat larger than 3x4. Instead of agonizing over the arrangement letters for hours, Johnny would prefer to have a program that does this for him--and in as few characters as possible, because long blocks of code are extremely intimidating for casual programmers like Johnny.

Given

a text file dictionary where words are separated by newlines in alphabetical order,
input specifying the number of rows and columns in the words rectangle (which can be provided however is most convenient in your programming language of choice)

generate at least one word rectangle. If it is not possible to generate a word rectangle with the given lexicon and dimensions, the program need not have defined behaviour. It is not necessary for the program to be able to generate rectangles which contain more than 64 letters or have dimensions that exceed 8 in either direction. The program should be able to complete in a reasonable amount of time, say, in thirty minutes or less.

EDIT: If you are doing an NxN rectangle, you are permitted to use a smaller dictionary file that only contains words that have a length of N letters.

I'm curious if anyone has an implementation that can generate an 8x8 from that word list in under ½ hour. — MtnViewMark, May 14, 2011 at 6:05
@MtnViewMark If not, I'll be willing to lower the size requirement. Even so, I think that Keith's implementation could be made quite a bit faster if he could do some checking to reduce the number of possiblities. — Peter Olson, May 14, 2011 at 14:44
@MtnViewMark, rather than testing merely whether a prefix is viable I use a trie which stores the number of children under each node to give a heuristic for which continuations are most likely to give a solution. (I then pick one at random weighted by the heuristic - I should probably try just picking the best candidate, since variability of results isn't explicitly in the spec). I'm testing the reduction to set cover, solving with Knuth's dancing links, on the basis that it applies heuristics globally rather than to prefixes, but so far not promising. Maybe there's a better reduction. — Peter Taylor, May 17, 2011 at 8:42

Community · Accepted Answer · 2020-06-17 09:04:33Z

Haskell, 586 characters

import Data.List
import qualified Data.Vector as V
import System
data P=P[String](V.Vector P)
e=P[]$V.replicate 128 e
(a:z)∈(P w v)|a>' '=z∈(V.!)v(fromEnum a);_∈(P w _)=w
p∫w=q p w where q(P u v)=P(w:u).V.accum q v.x;x(a:z)=[(fromEnum a,z)];x _=[]
l%n=foldl'(∫)e.filter((==n).length)$l
d§(p,q:r)=map(\w->p++w:r)$q∈d;_§(p,_)=[p]
(d¶e)i=filter(not.any(null.(∈e))).map transpose.(d§).splitAt i
s i n d e m|i==n=["":m]|1<3=(d¶e)i m>>=(e¶d)i>>=s(i+1)n d e
p[b,a,n]w=take n$s 0(a`max`b)(w%a)(w%b)$replicate b$replicate a ' '
main=do{g<-map read`fmap`getArgs;interact$unlines.concat.p g.lines}

Invoked by supplying 3 arguments: number of rows, number of columns, number of solutions; and the word list is accepted on stdin:

$> ghc -O3 2554-WordRect.hs 
[1 of 1] Compiling Main             ( 2554-WordRect.hs, 2554-WordRect.o )
Linking 2554-WordRect ...

$> time ./2554-WordRect 7 7 1 < 2554-words.txt

zosters
overlet
seriema
trimmer
element
remends
startsy

real    0m22.381s
user    0m22.094s
sys     0m0.223s

As you can see 7×7 runs relatively fast. Still timing 8×8 and 7×6....

It would be 9 characters shorter to remove the number of solutions argument and just produce all solutions, but then it becomes impossible to time.

Edit: (585 → 455) replaced custom data structure with a simple map of prefix string to possible replacements; oddly, this is a bit slower, perhaps because Map String a is slower than a hand-built tree of Map Char a...
Edit: (455 → 586) Bigger?!?!! This version does more optimization of the search space, using both the techniques of my original solution, and of the python and awk solutions. Further, the custom data structure, based on Vector is much faster than using a simple Map. Why do this? Because I'm think a solution that is closer to the goal of 8x8 in under ½ hour is preferable to a shorter solution.

What version of GHC are you using? Using the same commands (except I need --make after ghc) with the word list posted in the question, I'm getting words that aren't in the list, like "zymesz" and "youthy". — Joey Adams, May 14, 2011 at 16:52
Perhaps the requirements should be altered to say that the output grid shouldn't be its own transpose. — Peter Taylor, May 14, 2011 at 17:09
@Joey: Did you convert the words.txt file to local line endings? I'm running on Mac OS X, and I converted the file to \n line endings before using it. — MtnViewMark, May 14, 2011 at 19:21
Thanks, that worked. The input file has to be in lowercase and have line endings compatible with one's system. — Joey Adams, May 14, 2011 at 21:36

Community · Accepted Answer · 2020-06-17 09:04:33Z

5

Python, 232 chars

x,y=input()
H=[]
P={}
for w in open('words.txt'):
 l=len(w)-2
 if l==x:H+=[w]
 if l==y:
  for i in range(y+1):P[w[:i]]=1
def B(s):
 if(x+2)*y-len(s):[B(s+w)for w in H if all((s+w)[i::x+2]in P for i in range(x))]
 else:print s
B('')

It can only handle up to 6x6 in the 1/2 hour limit, though.

edited Jun 17, 2020 at 9:04

CommunityBot

1

answered May 12, 2011 at 18:21

Keith Randall

20.9k2 gold badges44 silver badges105 bronze badges

1

\$\begingroup\$ Does it generate all the pair or valid words pair only? When I read vertical from above to bottom, it does not seems to form a valid word. Eg., I got one result "aahs","abet","lack", but vertical reading "stk" is not in the words list, and also for above I pass parameter 3,3 but its return words 3x4 \$\endgroup\$
– YOU
May 13, 2011 at 6:20
\$\begingroup\$ Hmmm, that's not what I get. I suspect the problem is with linebreaks in the dictionary. The provided file has Windows linebreaks (\r\n) in it, so the length of the words is len(w)-2. If you somehow converted the line breaks (or if Python on Windows does that for you?), change it to len(w)-1 and that should fix it. \$\endgroup\$
– Keith Randall
May 13, 2011 at 16:17
\$\begingroup\$ ...and change the other +2s to +1s. \$\endgroup\$
– Keith Randall
May 13, 2011 at 16:17
\$\begingroup\$ Ah, I see. I tested with Windows, and Linux. Python on Windows automatically removed \r from w, and on Linux I have converted file to Unix format, so both didn't work. \$\endgroup\$
– YOU
May 14, 2011 at 3:42
\$\begingroup\$ This is a very elegant solution. \$\endgroup\$
– asoundmove
May 15, 2011 at 5:26

Add a comment |

Community · Accepted Answer · 2020-06-17 09:04:33Z

Java (1065 bytes)

import java.util.*;public class W{public static void main(String[]a){new
W(Integer.parseInt(a[0]),Integer.parseInt(a[1]));}W(int w,int h){M
H=new M(),V=new M();String L;int i,j,l,m,n=w*h,p[]=new int[n];long
I,J,K,M,C=31;long[]G=new long[h],T=new long[w],W[]=new long[n][],X;try{Scanner
S=new Scanner(new java.io.File("words.txt"));while(0<1){L=S.nextLine();l=L.length();for(i=0;i>>l<1;i++){K=0;for(j=0;j<l;j++)K+=(i>>j&1)*(L.charAt(j)-96L)<<5*j;if(l==w)H.put(K,H.g(K)+1);if(l==h)V.put(K,V.g(K)+1);}}}catch(Exception
E){}while(n-->0){j=1;if(W[n]==null){M=1L<<62;for(i=w*h;i-->0;){m=i/w;l=i%w*5;if((G[m]>>l&C)<1){X=new
long[27];I=K=0;for(;K++<26;){J=H.g(G[m]+(K<<l))*V.g(T[i%w]+(K<<5*m));X[(int)K]=K-32*J;I+=J;}if(I<1)j=0;if(I<M){M=I;p[n]=i;W[n]=X;}}}}X=W[n];Arrays.sort(X);M=X[0]*j;X[0]=0;K=M&C;i=p[n]%w;j=p[n]/w;l=5*i;m=5*j;G[j]&=~(C<<l);G[j]+=K<<l;T[i]&=~(C<<m);T[i]+=K<<m;if(M>=0){W[n]=null;n+=2;}}for(long
A:G){L="";for(i=0;i<w;)L+=(char)(96+(C&A>>5*i++));System.out.println(L);}}class
M extends HashMap<Long,Long>{long g(Long s){return get(s)!=null?get(s):0;}}}

A long way from being the shortest, but I think it's the closest to meeting the timing constraints. I saved 14 bytes by assuming that the input file has been filtered to words of the right lengths; on my netbook, if you feed the whole words.txt then it spends the first minute preprocessing it, discarding most of what it produces, and then takes a mere 20 or so seconds to solve 7x7. On my desktop it does the whole thing in under 15 seconds, giving:

rascals
areolae
serrate
coroner
alanine
latents
seeress

I've let it run for over 50 hours without finding a solution to 8x7 or 8x8. 8-letter words seem to be a critical boundary for this problem - it just hovers around half-full without making much progress.

The approach used is full pivoting and a heuristic based on the number of possible horizontal completions times the number of possible vertical completions. E.g. if we have intermediate grid

*ean*
algae
*ar**
*ier*
*nee*

then we give the top-left corner a heuristic value of count(aean*)count(aa***) + count(bean*)count(ba***) + ... + count(zean*)count(za***). Of all the cells we pick the one with the smallest heuristic value (i.e. hardest to satisfy), and then work though the letters in descending order of the amount they contributed to the heuristic value of that cell (i.e. starting with the most likely to succeed).

\$\begingroup\$ Lovely approach and explanation. \$\endgroup\$
– MtnViewMark
May 25, 2011 at 5:44 — MtnViewMark, May 25, 2011 at 5:44

Ming-Tang · Accepted Answer · 2011-05-15 23:05:51Z

F#

Backtracking solution, but I'm going to optimize the search space later.

open System

(*-NOTES
    W=7 H=3
    abcdefg<-- searching from wordsW
    abcdefg
    abcdefg
    ^
    partial filtering from wordsH
  *)

let prefix (s : char[]) (a : char[]) =
  a.[0..s.Length - 1] = s

let filterPrefix (s : char[]) =
  Array.filter (prefix s)

let transpose (s : char[][]) =
  [|
    for y = 0 to s.[0].Length - 1 do
      yield [|
        for x = 0 to s.Length - 1 do
          yield s.[x].[y]
      |]
  |]

[<EntryPoint>]
let main (args : String[]) =
  let filename, width, height = "C:\Users\AAA\Desktop\words.txt", 3, 3
  let lines = System.IO.File.ReadAllLines filename |> Array.map (fun x -> x.ToCharArray())
  let wordsW = Array.filter (Array.length >> ((=) width)) lines
  let wordsH = Array.filter (Array.length >> ((=) height)) lines

  let isValid (partial : char[][]) =
    if partial.Length = 0 then
      true
    else
      seq {
        for part in transpose partial do
          yield Seq.exists (prefix part) wordsH
      }
      |> Seq.reduce (&&)

  let slns = ref []
  let rec back (sub : char[][]) =
    if isValid sub then
      if sub.Length = height then
        slns := sub :: !slns
      else
        for word in wordsW do
          back <| Array.append sub [| word |]

  back [| |]
  printfn "%A" !slns
  0

asoundmove · Accepted Answer · 2011-05-15 20:24:43Z

awk, 283

(may need to add 14 for parameter input flags)

Call with e.g. awk -v x=2 -v y=2...
Find first match and print it (283 chars):

{if(x==L=length)w[++n]=$0;if(y==L)for(;L>0;L--)W[substr($0,1,L)]++}END{for(i[Y=1]++;i[j=1]<=n;){b[Y]=w[i[Y]];while(j<=x){s="";for(k=1;k<=Y;k++)s=s substr(b[k],j,1);W[s]?0:j=x;j++}if(W[s])if(Y-y)i[++Y]=0;else{for(k=1;k<=Y;k++)print b[k];exit}i[Y]++;while(Y>1&&i[Y]>n)i[--Y]++}print N}

Find number of matches (245 chars, much slower):

{if(x==L=length)w[++n]=$0;if(y==L)for(;L>0;L--)W[substr($0,1,L)]++}END{for(i[Y=1]++;i[j=1]<=n;){b[Y]=w[i[Y]];while(j<=x){s="";for(k=1;k<=Y;k++)s=s substr(b[k],j,1);W[s]?0:j=x;j++}W[s]?Y-y?i[++Y]=0:N++:0;i[Y]++;while(Y>1&&i[Y]>n)i[--Y]++}print N}

For both programs (of course more so for the solutions count), the running time far exceeds 30 minutes for some values of x and y.

Just as a matter of interest, here is the word count for each word length:

Stack Exchange Network

Find Possible Word Rectangles

5 Answers 5

Haskell, 586 characters

Python, 232 chars

Java (1065 bytes)

F#

awk, 283

Your Answer

Not the answer you're looking for? Browse other questions tagged
code-golf
crossword
or ask your own question.

Hot Network Questions

Find Possible Word Rectangles

5 Answers 5

Haskell, 586 characters

Python, 232 chars

Java (1065 bytes)

F#

awk, 283

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged code-golfcrossword or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
code-golf
crossword
or ask your own question.